Web-Scale HTTP Tail
Previously, while talking about features of the HTTP Range header, I wrote:
Last and likely least, you can read the last N bytes of a file by requesting a negative offset: […] Honestly, I’ve never come up with a use case for that.
A friend pointed out a possible use case: “tailing” a file on a server. As you surely know, the UNIX tail command displays the last N lines of a file. We can get the same effect with a negative-offset Range request. It might look something like this:
require 'uri'
require 'net/http'
n = 10
buffer_size = 4096
uri = URI("http://example.com/access_log")
http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Get.new(uri.path)
request['Range'] = "bytes=-#{buffer_size}" # Ask for the last buffer_size bytes
response = http.request(request)
abort "Unexpected response code" if response.code != '206'
lines = response.body.split(/\r?\n/) # Handle CR/LF or just LF
lines.shift # Discard first, incomplete line.
puts lines.last(n) # Print up to n complete lines
The data returned will be a string. lines = response.body.split(/\r?\n/) splits it into lines, allowing for lines that are terminated with either CR/LF or just LF. Because we are reading bytes, not lines, it’s pretty likely that our byte range will start in the middle of a line. For that reason it’s best to discard the first line with lines.shift.
The real tail reads the file backwards in chunks until it has N lines. We can’t do that here: requesting additional chunks would be meaningless, since the file might have changed between requests. Instead, we read more than we expect to need. lines.last(n) returns at most “n” lines, so if we come up short nothing will break. You can adjust buffer_size as needed.
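If you’d rather not hand-tune buffer_size, one option is to retry with a larger range until you have enough lines. Here’s a rough sketch of that idea, wrapped in a hypothetical http_tail helper; the doubling strategy and the max_bytes cap are my own assumptions, not anything the server requires:

require 'uri'
require 'net/http'

# Hypothetical helper: fetch the last n complete lines of a URL,
# doubling the requested suffix range until we have enough lines
# (or hit max_bytes).
def http_tail(uri, n, buffer_size: 4096, max_bytes: 1_048_576)
  http = Net::HTTP.new(uri.host, uri.port)
  lines = []
  while buffer_size <= max_bytes
    request = Net::HTTP::Get.new(uri.path)
    request['Range'] = "bytes=-#{buffer_size}"
    response = http.request(request)
    abort "Unexpected response code" if response.code != '206'
    lines = response.body.split(/\r?\n/) # Handle CR/LF or just LF
    lines.shift                          # Discard first, likely incomplete line
    break if lines.size >= n             # Got enough, stop growing
    buffer_size *= 2                     # Came up short, ask for a bigger chunk
  end
  lines.last(n)
end

puts http_tail(URI("http://example.com/access_log"), 10)

Like the original, this sketch assumes the first line in each chunk is incomplete, so if the requested range ends up covering the whole file the real first line gets dropped too.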
“tail” is useful, but how about “tail -f”? The real “tail -f” works roughly like this:
f = File.open('file')
f.seek(0, IO::SEEK_END) # Jump to the end of the file
previous_size = f.pos
loop do
  if f.size > previous_size
    puts f.read
    previous_size = f.size
  end
end
We can simulate this in HTTP:
require 'uri'
require 'net/http'

def http_size(uri)
  http = Net::HTTP.new(uri.host, uri.port)
  request = Net::HTTP::Head.new(uri.path)
  response = http.request(request)
  response['content-length'].to_i # Compare sizes as integers, not strings
end

url = "http://example.com/access_log"
uri = URI(url)
previous_size = http_size(uri)

loop do
  current_size = http_size(uri)
  if current_size > previous_size
    http = Net::HTTP.new(uri.host, uri.port)
    request = Net::HTTP::Get.new(uri.request_uri)
    request['Range'] = "bytes=#{previous_size}-#{current_size - 1}" # Byte ranges are inclusive
    response = http.request(request)
    abort "Unexpected response code" if response.code != '206'
    puts response.body.split(/\r?\n/) # Handle CR/LF or just LF
    previous_size = current_size
  end
  sleep 5
end
We make an HTTP HEAD request to get the size of the file. If it has grown, we use the Range header to fetch just the new data.
A couple of caveats for HTTP tail -f:
Because stat()ing a local file doesn’t have a lot of overhead, the real tail doesn’t sleep. The HTTP version needs to, so it doesn’t turn into a DoS attack. Based on your needs, you might want to adjust that sleep or schedule the requests to avoid blocking.
The code presumes it’s getting complete lines, which is probably reasonable for logs. If incomplete lines become a problem, you would want to buffer the last line when it doesn’t end with a line feed and display it as part of the next request, as in the sketch below.
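For example, here’s one rough way to carry an unterminated trailing line over to the next request. It reuses the http_size helper and the polling loop from above; only the partial-line handling is new:

require 'uri'
require 'net/http'

uri = URI("http://example.com/access_log")
previous_size = http_size(uri) # http_size helper as defined above
partial = ""                   # Carries an unterminated trailing line between requests

loop do
  current_size = http_size(uri)
  if current_size > previous_size
    http = Net::HTTP.new(uri.host, uri.port)
    request = Net::HTTP::Get.new(uri.request_uri)
    request['Range'] = "bytes=#{previous_size}-#{current_size - 1}"
    response = http.request(request)
    abort "Unexpected response code" if response.code != '206'

    data = partial + response.body
    lines = data.split(/\r?\n/, -1) # -1 keeps a trailing empty field when data ends in a newline
    partial = lines.pop             # "" if the chunk ended cleanly, otherwise the unfinished line
    puts lines
    previous_size = current_size
  end
  sleep 5
end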
You’d probably want some security and some error handling, but there you have it: tail, two ways, in HTTP.