Previously, I worked through how to get messages from an IMAP server and work with the message headers. Let’s look at extracting data from those messages.

As before, we need to connect to the server, authenticate, and select the INBOX:

#!/usr/bin/env ruby

require 'net/imap'
imap = Net::IMAP.new('mail.example.com', ssl: true)
password = ENV['IMAP_PASSWORD'] # keep credentials out of the script
begin
  imap.authenticate('PLAIN', 'spike', password)
rescue Net::IMAP::NoResponseError
  abort 'Authentication failed'
end

imap.select('INBOX')

This time we use #select, which opens the INBOX read/write so we can make changes (#examine opens a mailbox read-only). Let’s grab the oldest unread message:

ids = imap.search(['UNSEEN'])
abort 'No unread messages' if ids.empty?
id = ids.min # lowest sequence number = oldest message in the mailbox
raw_message = imap.fetch(id, 'RFC822').first.attr['RFC822']

(#fetch always returns an Array.)

‘RFC822’ says “request the entire message, in RFC 822 format, as a string.” As I noted in my last post, the Ruby IMAP library returns some not-very-useful objects. While it’s possible to extract what we need from them, we’ll get a much cleaner interface if we use the mail Ruby gem.

require 'mail'
message = Mail.read_from_string raw_message

The mail gem gives us a much nicer object:

message.subject    # => "Subject"
message.body.to_s  # => "This is a test.\n\nBob\n"

Where it really shines is with attachments:

require 'json'
message.multipart?              # => true
message.parts.map(&:mime_type)  # => ["text/plain", "application/json"]
json = JSON.parse(message.parts[1].body.decoded)

Why would we have a message with a JSON attachment? Because we put it there! Since we already have the mail gem, sending one is just as easy:

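require 'mail'
require 'json'

# Mail delivers via SMTP on localhost:25 unless configured otherwise, e.g.:
# Mail.defaults { delivery_method :smtp, address: 'smtp.example.com', port: 587 }

report = { bot: 1138, temp: 42, flux: 10 }
mail = Mail.new do
  from    'bot1138@example.com'
  header['X-Bot-ID'] = '1138'
  to      'bot-status-queue@gmail.com'
  subject 'Hourly update from Bot 1138'
  body    'See attached'
  add_file filename: 'status.json', content: report.to_json
end
mail.deliver!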

There is also no shortage of ways to email attachments from the command line with out-of-the-box tools, which might be more appropriate than getting Ruby running on a small device.

So yeah, that’s really the how, not so much the why. As for the why: it’s a fairly easy way to create a distributed network of data sources, such as a large number of small Internet of Things devices.

Yes, the more common approach would be to create an API and have the devices ping it. Out of the box, that may even be easier. However, this approach needs far less infrastructure: an AWS EC2 free tier instance and a free Gmail account are all it takes. You can manage a large volume of incoming data with no more than a cron job. SMTP’s store-and-forward design absorbs network outages and server load, since mail servers queue messages and retry delivery later.

One last issue we need to deal with: how do we flag messages we’ve already processed? If we’re very optimistic, the current code will do. When you open a folder read/write (which is what #select does) and fetch the body of a message with #fetch (as opposed to just the headers or other metadata), the server automatically marks the message as “Seen”. So any message we read drops out of the UNSEEN search.

The downside of this approach is that it assumes reading a message is the same as processing it. If our job dies for some reason after reading a message, that message could end up dropped on the floor.
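If you’d rather decide for yourself when a message counts as read, IMAP can fetch without that side effect: the BODY.PEEK[] fetch item returns the same content as RFC822 but leaves the \Seen flag untouched, and Net::IMAP reports the result under the 'BODY[]' key. A minimal sketch, assuming `imap` is a connected Net::IMAP object with a mailbox selected; both helper names are hypothetical:

```ruby
# Hypothetical helper: fetch a message's full raw text without setting \Seen.
# `imap` is assumed to be a connected Net::IMAP object with a mailbox selected.
def peek_raw_message(imap, id)
  fetched = imap.fetch(id, 'BODY.PEEK[]')
  return nil if fetched.nil? || fetched.empty?

  # The response is keyed 'BODY[]', even though we asked for BODY.PEEK[].
  fetched.first.attr['BODY[]']
end

# Hypothetical helper: set \Seen explicitly, once processing has succeeded.
def mark_seen(imap, id)
  imap.store(id, '+FLAGS', [:Seen])
end

# Usage (against a real server):
#   raw = peek_raw_message(imap, id)
#   ... process raw ...
#   mark_seen(imap, id)
```

If the job dies between the peek and `mark_seen`, the message is still unseen on the next run.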

A safer approach is to flag messages that have been successfully processed. The simplest way to do so is to flag the message as “Deleted”:

imap.store(id, "+FLAGS", [:Deleted])

And change our search to find all undeleted messages:

ids = imap.search(['NOT','DELETED'])
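If you go the Deleted route, purging is a single extra command: EXPUNGE permanently removes every message in the selected mailbox that carries the \Deleted flag. A minimal sketch, assuming `imap` is a Net::IMAP connection with INBOX opened read/write via #select; the helper name is hypothetical:

```ruby
# Hypothetical helper: mark the given sequence numbers \Deleted, then purge them.
# Assumes `imap` is a Net::IMAP connection with a mailbox open read/write.
def purge_processed(imap, ids)
  return if ids.empty? # nothing to do; avoid sending an empty sequence set

  imap.store(ids, '+FLAGS', [:Deleted])
  imap.expunge
end
```

Run it once, at the end of a job: expunging renumbers the remaining messages, so any sequence numbers you still hold afterward are stale.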

This works well, and processed messages can easily be purged. However, if you want to keep a record of past messages, you probably want to use a different flag, lest the messages be permanently deleted. In that case, the “Flagged” flag will do the job:

imap.uid_store(uid, "+FLAGS", [:Flagged])

(we’ll get the UID below), and our search becomes:

ids = imap.search(['NOT','DELETED','NOT','FLAGGED'])

Putting it all together, our cron job runs something like this:

#!/usr/bin/env ruby

require 'net/imap'
require 'mail'
require 'json'

imap = Net::IMAP.new('mail.example.com', ssl: true)

# Credentials come from the environment, since a cron job can't answer a prompt.
begin
  imap.authenticate('PLAIN', ENV['IMAP_USER'], ENV['IMAP_PASSWORD'])
rescue Net::IMAP::NoResponseError
  abort 'Authentication failed'
end

imap.select('INBOX')

# You might consider filtering on subject as well.

ids = imap.search(['NOT', 'DELETED', 'NOT', 'FLAGGED'])

unless ids.empty?
  imap.fetch(ids, ['UID', 'RFC822']).each do |imap_message|
    message = Mail.read_from_string imap_message.attr['RFC822']
    attachment = message.attachments.detect { |a| a.content_type.start_with? 'application/json' }
    # Messages without a JSON attachment are skipped and stay unflagged.
    next if attachment.nil?

    data = JSON.parse(attachment.body.decoded) # => {"bot"=>1138, "temp"=>42, "flux"=>10}
    # Do something important with the data!

    # Flag only after the data is handled, so a crash leaves the message queued.
    uid = imap_message.attr['UID']
    imap.uid_store(uid, '+FLAGS', [:Flagged])
  end
end

imap.logout
imap.disconnect
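As written, one malformed attachment makes JSON.parse raise and ends the whole run, leaving the rest of the queue for the next cron invocation. If you’d rather skip the bad message and keep going, one option is to rescue per message and flag only on success. The sketch below also asks for UIDs up front with #uid_search and #uid_fetch, which take the same arguments as #search and #fetch; UID FETCH responses always include the UID, so it can be read straight from each result. `process_queue` is a hypothetical name, and the block is a stand-in for “do something important” (parsing the message with the mail gem, for example), which keeps the helper itself free of that dependency:

```ruby
# Hypothetical helper: run the block on each unprocessed message, flagging it
# only if the block finishes without raising. Returns the number flagged.
# Assumes `imap` is a Net::IMAP connection with INBOX selected read/write.
def process_queue(imap, log: $stderr)
  uids = imap.uid_search(['NOT', 'DELETED', 'NOT', 'FLAGGED'])
  return 0 if uids.empty?

  flagged = 0
  imap.uid_fetch(uids, 'RFC822').each do |fetched|
    uid = fetched.attr['UID']
    begin
      yield fetched.attr['RFC822']
    rescue StandardError => e
      # Leave the message unflagged so a later run can retry it.
      log.puts "UID #{uid}: #{e.class}: #{e.message}"
      next
    end
    imap.uid_store(uid, '+FLAGS', [:Flagged])
    flagged += 1
  end
  flagged
end

# Usage (against a real server):
#   process_queue(imap) do |raw|
#     message = Mail.read_from_string(raw)
#     ... extract and handle the JSON attachment ...
#   end
```

A message that keeps failing will be retried on every run, so in practice you may want to watch the log or move such messages to a separate folder.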

Even if this isn’t an approach you’d ever need, I hope it gets you thinking about how you can leverage the technologies all around you.
