Ruby HTTP server from the ground up


Getting something to work quickly is important when you are starting out, but if you want to become better at programming, it's important to understand a few levels below the abstractions you are used to working with.

When it comes to web development, it's important to know how HTTP works, and what better way to learn than to go through a baptism by fire and build our own HTTP server?

How does HTTP look anyway?

HTTP is a plaintext protocol implemented over TCP, so we can easily inspect what requests look like (HTTP/2 is no longer plaintext; it's binary for efficiency).
One way to look at the request structure is to use curl with the -v (verbose) flag:

curl http://example.com/something -H "x-some-header: value" -v
Outputs
GET /something HTTP/1.1
Host: example.com
User-Agent: curl/7.64.1
Accept: */*
x-some-header: value

And in response we get
HTTP/1.1 404 Not Found
Age: 442736
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Sat, 03 Jul 2021 15:02:03 GMT
Expires: Sat, 10 Jul 2021 15:02:03 GMT
...
Content-Length: 1256

<!doctype html>
<html>
<head>
...
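To make the structure explicit, here is a small sketch that splits a raw HTTP message into its three parts: the start line, the header lines and the body. The `split_http_message` helper is a hypothetical name used only for illustration. Note that on the wire every line ends with CRLF (`\r\n`), which curl's output doesn't show.

```ruby
# Split a raw HTTP message into start line, header lines and body.
# `split_http_message` is a hypothetical helper, not part of any library.
def split_http_message(raw)
  # an empty line (CRLF CRLF) separates the head from the body
  head, body = raw.split("\r\n\r\n", 2)
  start_line, *header_lines = head.split("\r\n")
  [start_line, header_lines, body.to_s]
end

raw_request = "GET /something HTTP/1.1\r\n" \
              "Host: example.com\r\n" \
              "x-some-header: value\r\n" \
              "\r\n"

start_line, header_lines, body = split_http_message(raw_request)
puts start_line          # GET /something HTTP/1.1
puts header_lines.length # 2
puts body.empty?         # true (a GET request normally has no body)
```

A response has the same shape; only the start line differs (a status line instead of a request line).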

The plan

Let's define the steps we are going to need:

  • Listen on a local socket for incoming TCP connections
  • Read incoming request's data (text)
  • Parse the text of the request to extract method, path, query, headers and body from it
  • Send the request to our app and get a response
  • Send the response to the remote socket via the connection
  • Close the connection

With that in mind, let's set up the general structure of our program:

require 'socket'

class SingleThreadedServer
  # ENV values are strings, so convert to an integer either way
  PORT = ENV.fetch('PORT', 3000).to_i
  HOST = ENV.fetch('HOST', '127.0.0.1').freeze
  # number of incoming connections to keep in a buffer
  SOCKET_READ_BACKLOG = ENV.fetch('TCP_BACKLOG', 12).to_i

  attr_accessor :app

  # app: a Rack app
  def initialize(app)
    self.app = app
  end

  def start
    socket = listen_on_socket
    loop do # continuously listen to new connections
      conn, _addr_info = socket.accept
      request = RequestParser.call(conn)
      status, headers, body = app.call(request)
      HttpResponder.call(conn, status, headers, body)
    rescue => e
      puts e.message
    ensure # always close the connection
      conn&.close
    end
  end
end

SingleThreadedServer.new(SomeRackApp.new).start
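`SomeRackApp` is not defined anywhere in this article; it stands for any object that responds to `call(env)`. A minimal hypothetical version, just enough to exercise the server, might look like this:

```ruby
# A hypothetical stand-in for SomeRackApp: any object that responds
# to call(env) and returns [status, headers, body] will do.
class SomeRackApp
  def call(env)
    body = "You requested #{env['PATH_INFO']}\n"
    [200, { 'Content-Type' => 'text/plain' }, [body]]
  end
end

# the server would pass the parsed request here; we fake it with a hash
status, headers, body = SomeRackApp.new.call({ 'PATH_INFO' => '/hello' })
puts status     # 200
puts body.first # You requested /hello
```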

Listening on a socket

A "full" version of the implementation of listen_on_socket looks like this:

def listen_on_socket
    socket = Socket.new(:INET, :STREAM)
    socket.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, true)
    socket.bind(Addrinfo.tcp(HOST, PORT))
    socket.listen(SOCKET_READ_BACKLOG)
    socket # return the socket itself, not the result of #listen
end

However, there's a lot of boilerplate here and all this code could be replaced with:

def listen_on_socket
    socket = TCPServer.new(HOST, PORT)
    socket.listen(SOCKET_READ_BACKLOG)
    socket # return the socket itself, not the result of #listen
end
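The two variants differ slightly in what `accept` returns, which is why the server loop writes `conn, _addr_info = socket.accept`. The sketch below shows the difference; it assumes the loopback interface is available and uses port 0 so that the OS picks a free port.

```ruby
require 'socket'

# Low-level variant: Socket#accept returns a [socket, addrinfo] pair.
raw_server = Socket.new(:INET, :STREAM)
raw_server.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, true)
raw_server.bind(Addrinfo.tcp('127.0.0.1', 0)) # port 0: any free port
raw_server.listen(5)
raw_port = raw_server.local_address.ip_port

# the connection waits in the backlog until we accept it
raw_client = TCPSocket.new('127.0.0.1', raw_port)
raw_conn, raw_addr = raw_server.accept
puts raw_conn.class # Socket
puts raw_addr.class # Addrinfo

# High-level variant: TCPServer#accept returns just the connection,
# so the second variable of the destructuring ends up nil.
tcp_server = TCPServer.new('127.0.0.1', 0)
tcp_port = tcp_server.addr[1]

tcp_client = TCPSocket.new('127.0.0.1', tcp_port)
tcp_conn, tcp_addr = tcp_server.accept
puts tcp_conn.class # TCPSocket
p tcp_addr          # nil

[raw_client, raw_conn, raw_server, tcp_client, tcp_conn, tcp_server].each(&:close)
```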

Parsing a request

Before we start, let's define what the end result should look like. We want our server to be Rack compatible. Here's an example I found of what Rack expects in its environment as part of the request:

{"GATEWAY_INTERFACE"=>"CGI/1.1", "PATH_INFO"=>"/", "QUERY_STRING"=>"", "REMOTE_ADDR"=>"127.0.0.1", "REMOTE_HOST"=>"localhost", "REQUEST_METHOD"=>"GET", "REQUEST_URI"=>"http://localhost:9292/", "SCRIPT_NAME"=>"", "SERVER_NAME"=>"localhost", "SERVER_PORT"=>"9292", "SERVER_PROTOCOL"=>"HTTP/1.1", "SERVER_SOFTWARE"=>"WEBrick/1.3.1 (Ruby/2.2.1/2015-02-26)", "HTTP_HOST"=>"localhost:9292", "HTTP_ACCEPT_LANGUAGE"=>"en-US,en;q=0.8,de;q=0.6", "HTTP_CACHE_CONTROL"=>"max-age=0", "HTTP_ACCEPT_ENCODING"=>"gzip", "HTTP_ACCEPT"=>"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8", "HTTP_USER_AGENT"=>"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36", "rack.version"=>[1, 3], "rack.url_scheme"=>"http", "HTTP_VERSION"=>"HTTP/1.1", "REQUEST_PATH"=>"/"}

We are not going to return all of these params, but let's at least return the most important ones.

The first thing we need to do is parse the request line. Its structure probably looks familiar to you:

MAX_URI_LENGTH = 2083 # a common practical limit; the HTTP standard itself doesn't define one

def read_request_line(conn)
    # e.g. "POST /some-path?query HTTP/1.1"

    # read until we encounter a newline, max length is MAX_URI_LENGTH
    request_line = conn.gets("\n", MAX_URI_LENGTH)

    method, full_path, _http_version = request_line.strip.split(' ', 3)

    path, query = full_path.split('?', 2)

    [method, full_path, path, query]
end
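To try the parsing without opening a socket, we can feed it a `StringIO`, which responds to `gets` the same way. That substitution is an assumption made for the demo, and the limit is passed as a default argument here rather than a constant.

```ruby
require 'stringio'

def read_request_line(conn, max_length = 2083)
  request_line = conn.gets("\n", max_length)
  method, full_path, _http_version = request_line.strip.split(' ', 3)
  path, query = full_path.split('?', 2)
  [method, full_path, path, query]
end

# StringIO stands in for the TCP connection
conn = StringIO.new("POST /some-path?page=2&sort=asc HTTP/1.1\r\nHost: example.com\r\n\r\n")
http_method, full_path, path, query = read_request_line(conn)

p http_method # "POST"
p full_path   # "/some-path?page=2&sort=asc"
p path        # "/some-path"
p query       # "page=2&sort=asc"

# only the first line was consumed; the headers are still waiting
next_line = conn.gets
p next_line   # "Host: example.com\r\n"
```

When the path has no `?`, `query` comes back as `nil`.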

After the request line come the headers.

Let's recall what they look like; each header is on a separate line:

Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Content-Length: 1256

And here's how we can read them:

MAX_HEADER_LENGTH = (112 * 1024) # how it's defined in Webrick, Puma and other servers

def read_headers(conn)
    headers = {}
    loop do
        line = conn.gets("\n", MAX_HEADER_LENGTH)&.strip

        break if line.nil? || line.strip.empty?

        # header name and value are separated by a colon,
        # usually (but not necessarily) followed by a space
        key, value = line.split(/:\s*/, 2)

        headers[key] = value
    end

    headers
end

As a result we get:

{
    "Cache-Control" => "max-age=604800",
    "Content-Type" => "text/html; charset=UTF-8",
    "Content-Length" => "1256"
}
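The empty line is what tells us the headers are over. A quick sketch with a `StringIO` in place of the connection (an assumption for the demo) shows that reading stops there and leaves the body untouched:

```ruby
require 'stringio'

def read_headers(conn, max_length = 112 * 1024)
  headers = {}
  loop do
    line = conn.gets("\n", max_length)&.strip
    # an empty line (or end of stream) marks the end of the headers
    break if line.nil? || line.empty?

    key, value = line.split(/:\s*/, 2)
    headers[key] = value
  end
  headers
end

conn = StringIO.new(
  "Host: localhost:9292\r\n" \
  "Content-Length: 5\r\n" \
  "\r\n" \
  "hello"
)

headers = read_headers(conn)
p headers['Host']           # "localhost:9292" (only the first colon splits)
p headers['Content-Length'] # "5"

rest = conn.read
p rest                      # "hello"
```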

Next we need to read the body. Not all requests are expected to have one; in this simplified server we only read it for POST and PUT:

def read_body(conn:, method:, headers:)
    return nil unless ['POST', 'PUT'].include?(method)

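    # header names are case-insensitive and read_headers keeps them as
    # sent (e.g. "Content-Length"), so match the name ignoring case
    _name, content_length = headers.find { |name, _| name.casecmp?('content-length') }
    remaining_size = content_length.to_i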

    conn.read(remaining_size)
end
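Content-Length matters because the connection stays open after the body, so there is no end-of-file to stop at; we have to read exactly that many bytes. Here is a sketch of that behaviour, again with a `StringIO` standing in for the socket and with the header looked up case-insensitively (both are choices made for this demo):

```ruby
require 'stringio'

def read_body(conn:, method:, headers:)
  return nil unless %w[POST PUT].include?(method)

  # header names are case-insensitive, so ignore case when matching
  _name, length = headers.find { |name, _value| name.casecmp?('content-length') }
  conn.read(length.to_i)
end

payload = 'name=ruby'
# pretend another request follows right after the body
stream = payload + 'GET /next HTTP/1.1'
headers = { 'Content-Length' => payload.bytesize.to_s }

conn = StringIO.new(stream)
body = read_body(conn: conn, method: 'POST', headers: headers)
leftover = conn.read

p body     # "name=ruby"
p leftover # "GET /next HTTP/1.1"

# a GET is not read at all
get_body = read_body(conn: StringIO.new(stream), method: 'GET', headers: headers)
p get_body # nil
```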

Having all the blocks from above we can finish our simplified implementation:

require 'stringio'
require 'uri'

class RequestParser
  class << self
    def call(conn)
      method, full_path, path, query = read_request_line(conn)

      headers = read_headers(conn)

      body = read_body(conn: conn, method: method, headers: headers)

      # read information about the remote connection
      peeraddr = conn.peeraddr
      remote_host = peeraddr[2]
      remote_address = peeraddr[3]

      # our port
      port = conn.addr[1]
      {
        'REQUEST_METHOD' => method,
        'PATH_INFO' => path,
        # Rack requires QUERY_STRING to be present, even if empty
        'QUERY_STRING' => query.to_s,
        # rack.input needs to be an IO stream, even when there is no body
        "rack.input" => StringIO.new(body.to_s),
        "REMOTE_ADDR" => remote_address,
        "REMOTE_HOST" => remote_host,
        "REQUEST_URI" => make_request_uri(
          full_path: full_path,
          port: port,
          remote_host: remote_host
        )
      }.merge(rack_headers(headers))
    end

    # ... (methods we implemented above)

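    def rack_headers(headers)
      # rack expects header names to be upper cased, with dashes
      # replaced by underscores and prefixed with HTTP_
      # (except Content-Type and Content-Length, which get no prefix)
      headers.transform_keys do |key|
        name = key.upcase.tr('-', '_')
        %w[CONTENT_TYPE CONTENT_LENGTH].include?(name) ? name : "HTTP_#{name}"
      end
    end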

    def make_request_uri(full_path:, port:, remote_host:)
      request_uri = URI::parse(full_path)
      request_uri.scheme = 'http'
      request_uri.host = remote_host
      request_uri.port = port
      request_uri.to_s
    end
  end
end
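The Rack environment names header keys in the CGI style seen in the sample hash earlier (`HTTP_ACCEPT_LANGUAGE`): upper case, dashes replaced with underscores, and an `HTTP_` prefix, with Content-Type and Content-Length carrying no prefix. A small sketch of that conversion, where `rack_env_key` is a hypothetical helper name:

```ruby
# Convert an HTTP header name into the key Rack expects in env.
# `rack_env_key` is a hypothetical helper used for illustration.
def rack_env_key(header_name)
  name = header_name.upcase.tr('-', '_')
  # these two are passed without the HTTP_ prefix
  %w[CONTENT_TYPE CONTENT_LENGTH].include?(name) ? name : "HTTP_#{name}"
end

puts rack_env_key('Accept-Language') # HTTP_ACCEPT_LANGUAGE
puts rack_env_key('x-some-header')   # HTTP_X_SOME_HEADER
puts rack_env_key('Content-Type')    # CONTENT_TYPE
```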

Sending a response

Let's skip the Rack app part for now (we'll implement it later) and implement sending a response first:

class HttpResponder
  STATUS_MESSAGES = {
    # ...
    200 => 'OK',
    # ...
    404 => 'Not Found',
    # ...
  }.freeze

  # status: int
  # headers: Hash
  # body: array of strings
  def self.call(conn, status, headers, body)
    # status line
    status_text = STATUS_MESSAGES[status]
    conn.send("HTTP/1.1 #{status} #{status_text}\r\n", 0)

    # headers
    # we need to tell how long the body is before sending anything,
    # this way the remote client knows when to stop reading
    # Content-Length counts bytes, not characters
    content_length = body.sum(&:bytesize)
    conn.send("Content-Length: #{content_length}\r\n", 0)
    headers.each_pair do |name, value|
      conn.send("#{name}: #{value}\r\n", 0)
    end

    # tell that we don't want to keep the connection open
    conn.send("Connection: close\r\n", 0)

    # separate headers from body with an empty line
    conn.send("\r\n", 0)

    # body
    body.each do |chunk|
      conn.send(chunk, 0)
    end
  end
end

Here's an example of what we can send:

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Content-Length: 53

<html>
<head></head>
<body>hello world</body>
</html>
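To inspect the exact bytes without a socket, the same response can be assembled into a single string. This is only a sketch: `build_response` is a hypothetical helper, and it computes Content-Length with `bytesize`, because the header counts bytes and a multibyte character would otherwise make the client stop reading too early.

```ruby
# Assemble a complete HTTP/1.1 response as one string.
# `build_response` is a hypothetical helper used for illustration.
def build_response(status, status_text, headers, body)
  content_length = body.sum(&:bytesize)
  head = ["HTTP/1.1 #{status} #{status_text}", "Content-Length: #{content_length}"]
  headers.each_pair { |name, value| head << "#{name}: #{value}" }
  head << 'Connection: close'
  # every line ends with CRLF; an empty line separates head and body
  head.join("\r\n") + "\r\n\r\n" + body.join
end

chunks = ['héllo'] # 5 characters, but 6 bytes in UTF-8
response = build_response(200, 'OK', { 'Content-Type' => 'text/plain; charset=UTF-8' }, chunks)

puts chunks.first.length   # 5
puts chunks.first.bytesize # 6
puts response
```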

Rack App

Any Rack app needs to return status, headers, body. Status is an integer, body is an array of strings (chunks).

With that in mind let's make an app that's going to read files from the file system based on the request path:

class FileServingApp
  # read file from the filesystem based on a path from
  # a request, e.g. "/test.txt"
  def call(env)
    # this is totally insecure, but good enough for the demo
    path = Dir.getwd + env['PATH_INFO']
    # File.file? is false for directories, which File.read can't handle
    if File.file?(path)
      body = File.read(path)
      [200, { "Content-Type" => "text/html" }, [body]]
    else
      [404, { "Content-Type" => "text/html" }, ['']]
    end
  end
end
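The insecure part is that a path such as `/../secret.txt` escapes the working directory. One possible guard, shown here only as a sketch and not as a complete defence, is to expand the path and check that it still lives under the root. `resolve_path` and the `/srv/www` root are hypothetical, and the example assumes Unix-style paths.

```ruby
# Return the absolute path for a request path, or nil when it would
# escape the root directory. `resolve_path` is a hypothetical helper.
def resolve_path(root, request_path)
  root = File.expand_path(root)
  # expand_path collapses "." and ".." segments
  candidate = File.expand_path(File.join(root, request_path))
  candidate.start_with?(root + File::SEPARATOR) ? candidate : nil
end

p resolve_path('/srv/www', '/test.txt')        # "/srv/www/test.txt"
p resolve_path('/srv/www', '/../secret.txt')   # nil
p resolve_path('/srv/www', '/a/../../etc/pwd') # nil
```

In `FileServingApp` the nil case would simply fall through to the 404 branch.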

Final word

That was pretty simple, was it not?
Because we skipped all the corner cases!

If you want to dive into the topic in greater detail, I encourage you to jump into the WEBrick code; it's implemented in pure Ruby. You can learn more about Rack from this article.

If you want to see the full version of the code we just wrote, you can check out the GitHub repo: github.com/TheRusskiy/ruby3-http-server/blob/master/servers/single_threaded_server.rb.

Next we are going to experiment with different ways of processing requests: single threaded server, multi-threaded server and even Fibers / Ractors from Ruby 3.
Head over to part #2.
