Web-Scale HTTP Tail

Previously, while talking about features of the HTTP Range header, I wrote:

Last and likely least, you can read the last N bytes of a file by requesting a negative offset: […] Honestly, I’ve never come up with a use case for that.

A friend pointed out a possible use case, “tailing” a file on a server. As you surely know, the UNIX tail command displays the last N lines of a file. We could get this effect by using a negative offset Range request. It might look something like:

require 'uri'
require 'net/http'

n = 10
buffer_size = 4096

uri = URI("http://example.com/access_log")
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Get.new(uri.path)
request['Range'] = "bytes=-#{buffer_size}"
response = http.request(request)

abort "Unexpected response code" if response.code != '206'
lines = response.body.split(/\r?\n/) # Handle CR/LF or just LF
lines.shift # Discard first, incomplete line.
puts lines.last(n) # Print up to n complete lines.

The data returned will be a string; lines = response.body.split(/\r?\n/) splits it into lines, allowing for lines that are terminated with either CR/LF or just LF.

Because we are reading bytes, not lines, it’s pretty likely that our byte range will start in the middle of a line. For that reason it’s best to discard the first line with lines.shift.
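
For example, here’s roughly what that looks like with a made-up chunk of log data whose first line was cut off by the byte range:

# Hypothetical tail end of a log; the byte range starts mid-line.
body = "s/site.css HTTP/1.1\" 200\n10.0.0.2 - \"GET /about HTTP/1.1\" 200\n"

lines = body.split(/\r?\n/)
# => ["s/site.css HTTP/1.1\" 200", "10.0.0.2 - \"GET /about HTTP/1.1\" 200"]

lines.shift # drop the truncated first fragment
lines       # => only complete lines remain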

The real tail reads the file backwards in chunks until it has N lines. We can’t do that: requesting additional chunks would be meaningless, since the file might have changed between requests. Instead, we need to read more than we expect to need. lines.last(n) ensures we get up to n lines, so if we come up short nothing will break. You can adjust buffer_size as needed.
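
If coming up short matters, one possible refinement (a sketch, not part of the original script; http_tail is just a name for illustration) is to reissue the request with a progressively larger suffix until we have enough lines. Each attempt is a single, self-contained request, so there’s no stitching of chunks:

require 'uri'
require 'net/http'

# Sketch: grow the requested suffix until we collect n complete lines.
# The doubling factor and the max_buffer cap are arbitrary choices.
def http_tail(uri, n, buffer_size: 4096, max_buffer: 1024 * 1024)
  http = Net::HTTP.new(uri.host, uri.port)
  loop do
    request = Net::HTTP::Get.new(uri.request_uri)
    request['Range'] = "bytes=-#{buffer_size}"
    response = http.request(request)
    abort "Unexpected response code" if response.code != '206'

    lines = response.body.split(/\r?\n/)
    lines.shift # the first line is probably incomplete
    return lines.last(n) if lines.size >= n || buffer_size >= max_buffer

    buffer_size *= 2 # not enough complete lines yet; ask for a bigger slice
  end
end

puts http_tail(URI("http://example.com/access_log"), 10)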

“tail” is useful, but how about “tail -f”? The real “tail -f” works roughly like this:

f = File.open('file')
f.seek(0, IO::SEEK_END)       # Jump to the end of the file
previous_size = f.pos
loop do
  if f.size > previous_size   # the file has grown
    puts f.read               # print everything appended since the last read
    previous_size = f.size
  end
end

We can simulate this in HTTP:

require 'uri'
require 'net/http'

def http_size(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  request = Net::HTTP::Head.new(uri.path)
  response = http.request(request)
  response['content-length'].to_i # Content-Length is a String; convert so sizes compare as numbers
end

url = "http://example.com/access_log"
uri = URI(url)
previous_size = http_size(uri)

loop do
  current_size = http_size(uri)
  if current_size > previous_size
    http = Net::HTTP.new(uri.host, uri.port)
    request = Net::HTTP::Get.new(uri.request_uri)
    request['Range'] = "bytes=#{previous_size}-#{current_size - 1}" # byte ranges are zero-based and inclusive
    response = http.request(request)
    abort "Unexpected response code" if response.code != '206'
    puts response.body.split(/\r?\n/) # Handle CR/LF or just LF
    previous_size = current_size
  end
  sleep 5
end

We make an HTTP HEAD request to get the size of the file. If it has grown, we use the Range header to fetch just the new data.

A couple of caveats for HTTP tail -f:

Because stat()ing a file doesn’t have a lot of overhead, the real tail doesn’t sleep. The HTTP version needs to sleep so it doesn’t turn into a DoS attack. Depending on your needs, you might want to adjust that sleep or schedule the request so it doesn’t block.
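
As one sketch of the scheduling idea (nothing the original code does; http_tail_f and poll_interval are made-up names), the polling loop can run in a background thread with its own adjustable interval, reusing the http_size helper from above:

# Sketch: run the poll loop off the main thread so it doesn't block anything else.
def http_tail_f(uri, poll_interval: 30)
  Thread.new do
    previous_size = http_size(uri)
    loop do
      current_size = http_size(uri)
      if current_size > previous_size
        http = Net::HTTP.new(uri.host, uri.port)
        request = Net::HTTP::Get.new(uri.request_uri)
        request['Range'] = "bytes=#{previous_size}-#{current_size - 1}"
        response = http.request(request)
        puts response.body.split(/\r?\n/) if response.code == '206'
        previous_size = current_size
      end
      sleep poll_interval # tune this; shorter means more load on the server
    end
  end
end

tailer = http_tail_f(URI("http://example.com/access_log"), poll_interval: 60)
# ... the rest of the program keeps running ...
tailer.join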

The code presumes it’s getting complete lines, which is probably reasonable for logs. However, if incomplete lines become a problem, you would want to buffer the last line when it doesn’t end with a line feed and display it as part of the next request.
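
A minimal sketch of that buffering (complete_lines and carry are made-up names, not from the post): hold back any trailing fragment that doesn’t end in a newline and prepend it to the next chunk.

# Returns [complete_lines, leftover_fragment] for a chunk of new data.
def complete_lines(chunk, carry)
  data = carry + chunk
  lines = data.split(/\r?\n/, -1) # -1 keeps a trailing "" when data ends in a newline
  carry = lines.pop || ""         # "" if the chunk ended cleanly, otherwise the partial line
  [lines, carry]
end

carry = ""
# Inside the tail -f loop, instead of printing response.body directly:
#   lines, carry = complete_lines(response.body, carry)
#   puts lines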

You’d probably want some security and some error handling, but there you have it, tail, two ways, in HTTP.
