Previously, I worked through how to get messages from an IMAP server and work with the message headers. Let’s look at extracting data from those messages.
As before, we need to connect to the server, authenticate, and select the INBOX:
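Here’s a minimal sketch of that setup using Ruby’s net/imap library. It assumes Gmail’s IMAP server (imap.gmail.com, port 993 over SSL) and reads credentials from IMAP_USER and IMAP_PASSWORD environment variables. The host, the variable names, and the connect_inbox helper are illustrative, not from the original code.

```ruby
require 'net/imap'

# Open an SSL connection, log in, and select INBOX read/write.
# The Gmail host and the IMAP_USER / IMAP_PASSWORD variables are assumptions.
def connect_inbox(host: 'imap.gmail.com', port: 993)
  imap = Net::IMAP.new(host, port: port, ssl: true)
  imap.login(ENV.fetch('IMAP_USER'), ENV.fetch('IMAP_PASSWORD'))
  imap.select('INBOX') # #select is read/write; #examine would be read-only
  imap
end
```

Calling imap = connect_inbox returns a session ready for searching. Depending on account settings, Gmail may require an app password for this kind of login.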
This time we used #select, which opens the INBOX read/write so we can make changes. Let’s grab the oldest unread message:
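A sketch of that step, assuming imap is an authenticated Net::IMAP session with INBOX selected. The oldest_unread_raw name is hypothetical. It searches for UNSEEN messages and fetches the one with the lowest sequence number, which is the one added to the mailbox earliest.

```ruby
require 'net/imap'

# Return the raw RFC 822 text of the oldest unread message, or nil if none.
# `imap` is assumed to be a logged-in Net::IMAP session with INBOX selected.
def oldest_unread_raw(imap)
  id = imap.search(['UNSEEN']).min # lowest sequence number = earliest added
  return nil unless id

  # #fetch returns an Array of FetchData, even for one message.
  # On a read/write mailbox, fetching RFC822 also sets the \Seen flag.
  imap.fetch(id, 'RFC822').first.attr['RFC822']
end
```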
(Note that #fetch always returns an Array, even when we ask for a single message.)
‘RFC822’ requests the entire message, in RFC 822 format, as a string. As I noted in my last post, the Ruby IMAP library returns some objects that aren’t very useful. While it’s possible to extract what we need from them, we’ll get a much cleaner interface if we use the mail Ruby gem.
The mail gem gives us a much nicer object:
Where it really shines is when there are attachments:
Why would we have a message with a JSON attachment? Because we put it there! Since we already have the mail gem, we can use it to send one, too:
There is also no shortage of ways to email attachments from the command line using out-of-the-box tools, which might be more appropriate than getting Ruby running on a small device.
So yeah, that’s really the how, not so much the why. As for the why: it’s a fairly easy way to create a distributed network of data sources, such as a large number of small Internet of Things devices.
Yes, the more common approach would be to create an API and have the devices ping it. Out of the box, that may even be easier. However, this approach needs far less infrastructure: an AWS EC2 free tier instance and a free Gmail account are all it takes. You can manage a huge volume of incoming data with no more than a cron job, and because mail servers queue and retry messages they can’t deliver right away, SMTP takes care of network outages and server load for you.
One last issue we need to deal with: how do we flag messages we’ve already processed? If we’re very optimistic, the current code would do. When you open a folder read/write (which is what #select does) and read the body of a message with #fetch (as opposed to just reading the header or other metadata), the message is automatically marked as “Seen”. So any message we read will drop out of our search for unread messages.
The downside of this approach is that it assumes reading a message is the same as processing it. If our job dies for some reason after reading a message, that message could end up dropped on the floor.
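One way to decouple reading from processing is IMAP’s BODY.PEEK[] fetch item, which returns the same content as BODY[] but, per the IMAP specification, does not set \Seen. A minimal sketch, with peek_raw as a hypothetical helper taking a selected session and a sequence number:

```ruby
require 'net/imap'

# Fetch a message's full text without setting the \Seen flag.
# The server answers a BODY.PEEK[] request with a BODY[] item,
# so that is the key read back from the FetchData.
def peek_raw(imap, id)
  imap.fetch(id, 'BODY.PEEK[]').first.attr['BODY[]']
end
```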
A safer approach is to flag messages that have been successfully processed. The simplest way to do so is to flag the message as “Deleted”:
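In net/imap, that’s a STORE command adding the \Deleted flag. A sketch, with mark_processed as a hypothetical helper taking a session and a sequence number:

```ruby
require 'net/imap'

# Mark a message as processed by adding the \Deleted flag.
# The message stays in the mailbox until the folder is expunged.
def mark_processed(imap, id)
  imap.store(id, '+FLAGS', [:Deleted])
end
```

Nothing is actually removed until the mailbox is expunged, for example with Net::IMAP#expunge or by closing it with #close.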
And change our search to find all undeleted messages:
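IMAP has a search key for exactly this, UNDELETED. A sketch, with pending_ids as a hypothetical helper that sorts so the oldest message comes first:

```ruby
require 'net/imap'

# Sequence numbers of every message not yet flagged \Deleted, oldest first.
def pending_ids(imap)
  imap.search(['UNDELETED']).sort
end
```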
This works well, and processed messages can easily be purged. However, if you want to keep a record of past messages, you probably want to use a different flag, lest the messages be permanently deleted. In that case, the “Flagged” flag will do the job:
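A sketch of the Flagged variant. It uses #uid_store with a UID rather than a sequence number, since sequence numbers shift when other messages are expunged while UIDs don’t. The mark_processed_by_uid name is hypothetical.

```ruby
require 'net/imap'

# Mark a message as processed with \Flagged, so an expunge never removes it.
# `uid` is the message's UID, not its sequence number.
def mark_processed_by_uid(imap, uid)
  imap.uid_store(uid, '+FLAGS', [:Flagged])
end
```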
(We’ll get to where the UID comes from below.) Our search then becomes:
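The matching search key is UNFLAGGED, and #uid_search returns UIDs rather than sequence numbers, which is where the UIDs for #uid_store come from. A sketch, with pending_uids as a hypothetical helper:

```ruby
require 'net/imap'

# UIDs of every message not yet flagged as processed, oldest first.
def pending_uids(imap)
  imap.uid_search(['UNFLAGGED']).sort
end
```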
Putting it all together, our cron job runs something like this:
Even if this isn’t an approach you’d ever need, I hope it gets you thinking about how you can leverage the technologies all around you.