
What if I told you that you could create an API backend that didn’t require any code? Crazy, right? Wrong! Picture a client-side JavaScript app that displays a pithy saying. Typically, it would be developed in one of two ways: if the number of quotes is small, they might be hard-coded in the app; if there is a large number of witticisms, an API will be built and the client will hit it whenever it needs something to say.

The first approach doesn’t scale. Every new saying is more for the client to download even though it needs exactly one. Not to mention that a quick View Source removes all of the mystery from your app.

The second can scale out to infinity and beyond, but now you need to code a backend and create and maintain infrastructure. Now your afternoon project is real work.

What we need is a way to get one joke from the server without having to download them all and without having to spin up an API.

Enter HTTP Range

There’s a feature of the HTTP protocol that lets the browser request just a chunk of a file from the server: the Range header. Sending only a portion of the file is called byte serving.

The format is quite simple:

Range: bytes=500-1000

The first number is the (zero-indexed) offset of the starting byte; the second is the offset of the ending byte (not the size! I repeat, not the size!), and that byte is included in the response. If a web server supports HTTP Range, just that chunk of bytes will be returned with a 206 Partial Content status (otherwise the header is ignored and the entire file is returned as it normally would be).
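To make the inclusive end offset concrete, here is a minimal Ruby sketch of what a server does with the header. The byte_serve helper is hypothetical and handles only the single bytes=start-end form shown above; real servers also accept open-ended and multiple ranges.

```ruby
# Hypothetical helper: return the bytes a Range header selects from a body.
# Only the simple "bytes=start-end" form is handled; anything else falls
# back to the whole body, mimicking a server that ignores the header.
def byte_serve(body, range_header)
  match = /\Abytes=(\d+)-(\d+)\z/.match(range_header.to_s)
  return body unless match

  first = match[1].to_i
  last  = match[2].to_i
  # An end before the start makes the header invalid, so it is ignored.
  return body if first > last
  # Simplification: a real server would answer 416 Range Not Satisfiable
  # here; this sketch just returns the whole body.
  return body if first >= body.bytesize

  # Both offsets are inclusive, so the chunk is (last - first + 1) bytes long.
  body.byteslice(first..last)
end

body = 'abcdefghij'
puts byte_serve(body, 'bytes=2-5') # prints "cdef": bytes 2, 3, 4 and 5
puts byte_serve(body, nil)         # no header: prints the whole body
```

Note that bytes=2-5 yields four bytes, not three: treating the second number as a size or as an exclusive end is the classic off-by-one here.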

While it rarely comes up for most developers, it’s what allows you to skip around when streaming audio or video. Because of that, byte serving is supported by most major web servers and many caching technologies.
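One way to find out whether a particular server honors Range is to send a ranged request and look at the status code: 206 Partial Content means the range was served, while 200 means the header was ignored and the whole file came back. Here is a rough sketch using Ruby's standard Net::HTTP; the helper names and the example URL are placeholders, not part of any real setup.

```ruby
require 'net/http'
require 'uri'

# Build a GET request that asks for an inclusive byte range, e.g. "500-1000".
def build_range_request(uri, range)
  Net::HTTP::Get.new(uri, 'Range' => "bytes=#{range}")
end

# Send the request and report whether the server answered with 206.
def range_supported?(url, range = '0-0')
  uri = URI(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(build_range_request(uri, range))
  end
  response.code == '206'
end

# Usage (needs network access, so it is left commented out):
# puts range_supported?('https://example.com/sayings.json')
```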

Doing Something with Range

So, how does that help us? Well, in our witty example, if our sayings were in a file on the server and we knew the starting and ending offset of each one, we could use the following jQuery to read exactly one:

var range = '36-78'; // Inclusive start and end offsets of one saying
$.ajax('sayings.json', {
  type: 'GET',
  dataType: 'json',
  cache: false,
  // Ask the server for just this slice of the file
  headers: { 'Range': 'bytes=' + range },
  success: function(data) { console.log(data); },
  error: function(jqXHR, textStatus, errorThrown) {
    console.log("AJAX Error: " + textStatus);
    console.log(errorThrown);
  }
});

This presumes ‘sayings.json’ exists on the server and that ‘36-78’ is the range of a saying. How would we set that up? Given a text file like:

Yeah, well. The Dude abides.
That rug really tied the room together.
Look, let me explain something to you. I'm not Mr. Lebowski. You're Mr. Lebowski. I'm the Dude.
Hey, careful, man, there's a beverage here!

The following Ruby code will create two JSON files:

require 'json'
source = 'sayings.txt'
destination = 'sayings.json'
index = 'sayings-index.json'
ranges = []
File.open(destination, 'w') do |f|
  f.puts "[" # Wrap whole file in array
  File.foreach(source) do |line|
    offset = f.pos # Save current file offset
    f.puts line.chomp.to_json + ',' # Output JSON escaped line.
    ranges << "#{offset}-#{f.pos - 3}" # Save range of this line
  end
  # Overwrite the final trailing comma with a space so the array is valid JSON
  unless ranges.empty?
    f.seek(-2, IO::SEEK_CUR)
    f.puts ' '
  end
  f.puts "]" # Close array
end
# Write the index of offsets
File.open(index, 'w') { |f| f.puts ranges.to_json }

sayings.json is the list of sayings in a JSON formatted array. We format the file as a JSON array so that if the server doesn’t support HTTP Range, we still get back valid data.

sayings-index.json is an array of the byte range of each line in the file. To create it we use File#pos to record the offset in the file before and after we write each line.

What’s with that magic f.pos - 3 in the code? First, f.pos returns the position the next character will be written to, which is one past the last byte written (-1). Second, we have the “,” needed to make the JSON valid (-2). Third, we have to account for the line feed at the end of the line (-3). Thus, subtracting 3 from the position moves us back to the closing quote on the string, which is exactly the inclusive end offset that Range expects.
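The offset arithmetic is easy to check without a web server. The sketch below repeats the per-line bookkeeping from the script above against an in-memory StringIO (a stand-in for sayings.json), then slices each recorded range back out and parses it as JSON. The helper names are made up for illustration, and it assumes every line ends in a single line feed byte.

```ruby
require 'json'
require 'stringio'

# Write each line as a JSON string inside an array wrapper and record the
# inclusive byte range of every string.
def build_sayings(lines)
  io = StringIO.new
  ranges = []
  io.puts '['
  lines.each do |line|
    offset = io.pos                     # offset of the opening quote
    io.puts line.to_json + ','
    ranges << "#{offset}-#{io.pos - 3}" # offset of the closing quote
  end
  io.puts ']'
  [io.string, ranges]
end

# Cut one inclusive range out of the buffer, as a server would for a
# Range request, and parse the result.
def slice_saying(buffer, range)
  first, last = range.split('-').map(&:to_i)
  JSON.parse(buffer.byteslice(first..last))
end

lines = ['Yeah, well. The Dude abides.',
         'That rug really tied the room together.']
buffer, ranges = build_sayings(lines)

p ranges                                              # => ["2-31", "34-74"]
p ranges.map { |r| slice_saying(buffer, r) } == lines # => true
```

Each slice starts and ends on a double quote, so every range is a complete JSON string on its own.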

The Codeless API

Once you have the files, your page can load the index and then a randomly selected saying:

var sayings;
$.ajax('sayings-index.json', {
    type: "GET",
    dataType: 'json',
    error: function(jqXHR, textStatus, errorThrown) {
        console.log("AJAX Error: " + textStatus);
        console.log(errorThrown);
    },
    success: function(data, textStatus, jqXHR) {
        sayings = data;
        display_saying();
    }
});

function display_saying() {
    // Pick a random entry from the index
    var saying = Math.floor(Math.random() * sayings.length);
    $.ajax('sayings.json', {
        type: "GET",
        dataType: 'json',
        cache: false,
        headers: {
            "Range": "bytes=" + sayings[saying]
        },
        error: function(jqXHR, textStatus, errorThrown) {
            console.log("AJAX Error: " + textStatus);
            console.log(errorThrown);
        },
        success: function(data, textStatus, jqXHR) {
            alert(data);
        }
    });
}
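One detail the page code glosses over: a server that ignores Range replies with the whole array rather than a single string. Since the position in the index is already known, the client can cope with both cases. The logic is sketched here in Ruby, like the generator script, rather than jQuery; pick_saying is a hypothetical helper and the response bodies are hand-written samples.

```ruby
require 'json'

# body  - the response text for sayings.json (partial or complete)
# index - the position of the requested saying in the index
def pick_saying(body, index)
  data = JSON.parse(body)
  # A ranged response is a single JSON string; an unranged one is the full array.
  data.is_a?(Array) ? data.fetch(index) : data
end

partial = '"That rug really tied the room together."'
full    = '["Yeah, well. The Dude abides.", "That rug really tied the room together."]'

puts pick_saying(partial, 1) # the server honored the Range header
puts pick_saying(full, 1)    # the server ignored it and sent everything
```

Both calls print the same saying, so the app keeps working either way; it just downloads more than it needs in the second case.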

Is it fast?

This is actually a very efficient way to fetch data. It boils down to two very low-level syscalls:

lseek(fd, range_start, SEEK_SET);
read(fd, buf, range_end - range_start + 1); /* the end offset is inclusive */

Implementations vary, but if the web server caches the data it can be reduced to an even simpler memory lookup. Something like:

memcpy(response, &cache[range_start], range_end - range_start + 1);
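The seek-and-read version is easy to try from Ruby, whose File#seek and File#read are thin wrappers over the same idea. This is only a sketch: read_range is a made-up helper and the temporary file stands in for sayings.json. Because HTTP range ends are inclusive, the read length is last - first + 1.

```ruby
require 'tempfile'

# Read the inclusive byte range first..last from a file by seeking to the
# start offset and reading (last - first + 1) bytes.
def read_range(path, first, last)
  File.open(path, 'rb') do |f|
    f.seek(first, IO::SEEK_SET)
    f.read(last - first + 1)
  end
end

Tempfile.create('sayings') do |tmp|
  tmp.write("[\n\"The Dude abides.\"\n]\n")
  tmp.flush
  puts read_range(tmp.path, 2, 19) # prints "The Dude abides." including the quotes
end
```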

A Familiar Pattern

While the pattern of finding data using a seek offset isn’t something most developers commonly run into, it’s one they take advantage of every day. Pretty much every database engine there is uses it. In fact, you could describe a database as an attempt to find the most efficient way to reduce a query to a seek offset.

Congratulations! You’ve just implemented your own database using the HTTP protocol.

Next up, we’ll look at serving something more than just lines of text.
