Siaw Young - Notes on "Rebuilding a Web Server"

Some notes I took while watching Rebuilding a Web Server, a brief walkthrough by Marc-André Cournoyer on writing a simple Rack-compliant web server. The code for the class is here.

Concurrency

The entire stack looks like this:

Browser -> Socket -> HTTP Parser -> Rack -> Your App

There’s also a scheduler running alongside, handling concurrent connections. Such a scheduler can be implemented in different ways: threads, pre-forked processes, or an event loop.

Threads

A naive implementation would look like this, spawning a new thread for each incoming socket connection:

# inside the server's class definition
...
  def start
    loop do
      socket = @server.accept
      Thread.new do
        connection = Connection.new(socket, @app)
        connection.process
      end
    end
  end
...

Web servers like Puma use threads. Spawning a thread for every connection is relatively expensive, and the number of threads is unbounded under load, so threaded servers usually spawn a fixed number of threads (a thread pool) at boot and reuse them.
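A minimal sketch of the thread pool idea, not Puma's actual implementation. The pool size, the one-line "protocol", and the use of port 0 (any free port) are assumptions made so the example can run on its own; in the class's server, the worker body would be Connection#process.

```ruby
require "socket"

POOL_SIZE = 3                              # hypothetical pool size
server = TCPServer.new("127.0.0.1", 0)     # port 0: let the OS pick a free port
port = server.addr[1]
queue = Queue.new                          # thread-safe queue of accepted sockets

# Spawn the workers once, up front. Each one loops, pulling sockets off the queue.
workers = Array.new(POOL_SIZE) do |id|
  Thread.new do
    while (socket = queue.pop)             # a nil job tells the worker to stop
      socket.gets                          # stand-in for parsing the request
      socket.write("handled by worker #{id}\n")
      socket.close
    end
  end
end

# The acceptor only accepts and enqueues; it never spawns a thread per connection.
acceptor = Thread.new do
  5.times { queue << server.accept }
  POOL_SIZE.times { queue << nil }         # one stop signal per worker
end

# Five sequential test clients.
replies = Array.new(5) do
  client = TCPSocket.new("127.0.0.1", port)
  client.puts("GET / HTTP/1.1")
  reply = client.gets
  client.close
  reply
end

acceptor.join
workers.each(&:join)
server.close
puts replies
```

Which worker handles which connection depends on thread scheduling, so the worker ids in the output will vary between runs.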

Pre-forked Processes

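Preforking is a popular concurrency model used by servers such as Unicorn and Nginx (Nginx additionally runs an event loop inside each worker process). fork creates a copy of the current process, and the copy runs as a child of the original (parent) process. Because the child inherits the parent's open file descriptors, the two of them share the same listening socket¹.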

# inside the server's class definition
...
  def initialize(port, app)
    @server = TCPServer.new(port)
    @app = app
  end

  def prefork(workers)
    workers.times do
      fork do
        start
      end
    end
    Process.waitall
  end

  def start
    loop do
      socket = @server.accept
      connection = Connection.new(socket, @app)
      connection.process # goes on to process the raw socket data
    end
  end
...

server.prefork(5) # for 5 child worker processes

Worker processes are forked beforehand, and all of them share the same listening socket. Each idle worker blocks in accept, and the kernel hands the next incoming connection to one of them. Load balancing is thus left to the OS, and the server needs no scheduling logic of its own, which is presumably quite efficient.
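The following self-contained sketch makes the shared listening socket visible: each worker replies with its own pid. The worker count, the port (0, meaning any free port), and the one-line exchange are assumptions for the demo, and it needs a platform where fork is available.

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0) # opened BEFORE forking so children inherit it
port = server.addr[1]

pids = Array.new(3) do
  fork do
    trap("TERM") { exit!(0) }          # let the parent stop the worker cleanly
    loop do
      socket = server.accept           # every worker blocks on the same listening socket
      socket.gets                      # stand-in for Connection#process
      socket.write("pid #{Process.pid}\n")
      socket.close
    end
  end
end

# Six sequential test clients; the kernel decides which worker gets each one.
replies = Array.new(6) do
  client = TCPSocket.new("127.0.0.1", port)
  client.puts("hello")
  reply = client.gets
  client.close
  reply
end

pids.each { |pid| Process.kill("TERM", pid) }
Process.waitall
server.close
puts replies
```

The pids printed are always those of the children, never the parent, since the parent never calls accept. How the connections are spread across workers is up to the kernel.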

Event Loop

We can run an event loop in Ruby using a gem called eventmachine. eventmachine is a feature-packed gem that handles accepting connections, reading from sockets, and writing to sockets for us.

# inside the server's class definition
...
  def start_event_machine
    EM.run do
      EM.start_server "localhost", 3000, EMConnection do |conn|
        conn.app = @app
      end
    end
  end

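  class EMConnection < EM::Connection
    attr_accessor :app

    # called by eventmachine once the connection is established
    def post_init
      @parser = Http::Parser.new(self)
    end

    # called by eventmachine whenever data arrives on the socket
    def receive_data(data)
      @parser << data # feed the raw bytes to the parser, which calls back into self
    end
    ...
  end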
...

server.start_event_machine
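To get a feel for what an event loop does, here is a rough single-threaded sketch built on IO.select from the standard library. It is not how eventmachine is implemented; the echo behaviour, the two test clients, and the port (0, any free port) are assumptions for the demo.

```ruby
require "socket"

server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
clients = []   # accepted sockets the loop is currently watching
handled = 0

# Two client connections opened up front, to give the loop something to do.
conns = Array.new(2) { TCPSocket.new("127.0.0.1", port) }
conns.each_with_index { |conn, i| conn.write("ping #{i}\n") }

# One thread watches every socket and reacts only when one becomes readable.
while handled < 2
  readable, = IO.select([server] + clients)
  readable.each do |io|
    if io == server
      clients << server.accept_nonblock   # new connection: start watching it
    else
      data = io.readpartial(1024)         # data arrived: the receive_data step
      io.write("echo: #{data}")
      io.close
      clients.delete(io)
      handled += 1
    end
  end
end

replies = conns.map(&:gets)
conns.each(&:close)
server.close
puts replies
```

No connection ever gets its own thread or process; the loop simply never blocks on any one socket.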

`readpartial`

readpartial is an instance method of the IO class in Ruby. It lets us read data off a socket as soon as any data is available, instead of waiting for a fixed number of bytes to arrive. The APIDock entry on readpartial elaborates further:

readpartial is designed for streams such as pipe, socket, tty, etc. It blocks only when no data immediately available. This means that it blocks only when following all conditions hold.

the byte buffer in the IO object is empty.

the content of the stream is empty.

the stream is not reached to EOF.

Using the readpartial method, we can read off a socket like this:

data = socket.readpartial(1024) # reads at most 1024 bytes from the I/O stream
puts data

# do other things with data

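sysread is a method with similar functionality, but it bypasses Ruby's internal IO buffer, so it should not be mixed with buffered reads on the same stream.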
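The blocking rules are easy to try out without a network. This small sketch uses IO.pipe as a stand-in for a socket (an assumption for the demo; the behaviour on a socket is analogous):

```ruby
reader, writer = IO.pipe

writer.write("hello")
first = reader.readpartial(1024)   # returns the 5 available bytes; does not wait for 1024

writer.write("world")
writer.close
second = reader.readpartial(1024)  # data is still sitting in the pipe, so no EOF yet

eof = begin
  reader.readpartial(1024)         # nothing left and the writer is closed
  false
rescue EOFError
  true
end
reader.close

p first, second, eof
```

Note that at end of stream readpartial raises EOFError rather than returning nil, so a read loop built on it needs a rescue.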

`http_parser.rb`

http_parser.rb is a gem that wraps around Node’s HTTP parser. It provides the Http::Parser class used in the EMConnection example above.

Rack

Rack is a set of specifications that web servers, middleware applications, and application frameworks must adhere to. Rack apps must have a single point of entry named call, which takes the request environment (a hash) and must return an array containing the status code, the headers, and the body of the response.
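As a rough illustration, a Rack app can be as small as a lambda. The env hash below is a hand-made stand-in with only two keys; a real server builds a much fuller one from the parsed HTTP request.

```ruby
# A Rack app is any object that responds to call(env).
app = lambda do |env|
  body = "Hello from #{env["PATH_INFO"]}\n"
  # [status code, headers hash, body that responds to each]
  [200, { "content-type" => "text/plain" }, [body]]
end

# Hand-made stand-in for the environment a server would pass in.
env = { "REQUEST_METHOD" => "GET", "PATH_INFO" => "/notes" }

status, headers, body = app.call(env)
puts status
body.each { |chunk| puts chunk }
```

The body is an array rather than a plain string because the server only relies on it responding to each.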

Things which behave exactly like Rack tells them to (e.g. Unicorn, Rails) are Rack-compliant, and the benefit of this is that Rack-compliant things can be used in conjunction, layered on top of each other, or swapped out and replaced, without each having knowledge of the other (yep, abstraction).
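The layering can be sketched with a tiny middleware. Timing and the x-runtime header are hypothetical names chosen for this example; the only real convention used is that a middleware is itself a Rack app that wraps another one.

```ruby
# The innermost app.
app = ->(env) { [200, { "content-type" => "text/plain" }, ["hi\n"]] }

# Hypothetical middleware: takes the next app in its constructor
# and exposes the same call(env) interface.
class Timing
  def initialize(app)
    @app = app
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)   # delegate to the wrapped app
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    [status, headers.merge("x-runtime" => format("%.6f", elapsed)), body]
  end
end

stack = Timing.new(app)   # the server only ever sees `stack`
status, headers, body = stack.call({ "PATH_INFO" => "/" })
puts status
puts headers.keys
```

Neither the server nor the inner app knows Timing exists, which is what makes the pieces swappable.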

Noah Gibbs’s book Rebuilding Rails offers an excellent practical tutorial on Rack. The book covers more than just Rack, but the chapters on Rack are particularly illuminating.

KIV: Notes on Rebuilding Rails

Footnotes

More explicitly, they share the same socket because of the file descriptor inheritance that happens in fork. According to the Linux man page for fork(2):

The child inherits copies of the parent’s set of open file descriptors. Each file descriptor in the child refers to the same open file description (see open(2)) as the corresponding file descriptor in the parent. This means that the two descriptors share open file status flags, current file offset, and signal-driven I/O attributes.
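The shared file offset is easy to observe with an ordinary file instead of a socket. This is a small sketch, assuming a platform where fork is available; the tempfile name and contents are made up for the demo.

```ruby
require "tempfile"

file = Tempfile.new("fork-demo")
file.write("0123456789")
file.flush
file.rewind

pid = fork do
  file.sysread(5)   # child consumes the first 5 bytes via the inherited descriptor
  exit!(0)          # leave immediately, skipping at_exit handlers
end
Process.wait(pid)

rest = file.sysread(5)   # parent continues where the child left off
file.close!              # close and delete the tempfile
puts rest
```

sysread is used here so that Ruby's own read buffering does not get in the way of observing the kernel-level offset.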
