Encrypting Lots of Sensitive Data With Ruby (on Rails)
Previously I wrote about how to use public key encryption to automatically encrypt data using Ruby (and thus Rails). Because this method can encrypt data without a password, it’s very useful for securing information received from a form, without the person filling in the form having to do anything special. However, public key encryption has limits: the amount of data you can encrypt with this method is limited by the key size you use, and after a point increasing the key size isn’t a practical option. We solve this problem using a combination of public-key encryption and symmetric-key encryption.
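To make the size limit concrete, here is a minimal sketch using Ruby's standard openssl library. It assumes a 2048-bit RSA key and the library's default PKCS#1 v1.5 padding; the variable names are mine, chosen for illustration.

```ruby
require 'openssl'

# A throwaway 2048-bit key pair, generated only for this demonstration.
key_pair = OpenSSL::PKey::RSA.new(2048)

# With the default PKCS#1 v1.5 padding, a 2048-bit (256-byte) key can
# encrypt at most 256 - 11 = 245 bytes in a single operation.
fits = key_pair.public_encrypt('x' * 245)

# One byte more and OpenSSL refuses to encrypt.
too_big_rejected = begin
  key_pair.public_encrypt('x' * 246)
  false
rescue OpenSSL::PKey::PKeyError
  true
end
```

Anything longer than a couple of hundred bytes therefore needs a different approach, which is what the rest of this article builds.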
First a little theory, and when I say “a little” I mean it. Under the hood, encryption is a headache-inducing branch of mathematics. If you want the literal truth, there is plenty of good reading out there.
Symmetric-key cryptography is what people tend to think of when, and if, they think of encryption. A password is used to encrypt some information, and that same password must be entered to retrieve the information. Under the hood is an algorithm, or cipher, which, simply put, is a mathematical function that transforms the data into something obscure and then back again. For our purposes we need a block cipher. A block cipher takes a small, fixed-size chunk of data (128 bits in the case of AES) and encrypts it. There are many, but in this example we’ll use the Advanced Encryption Standard (AES), which is the de facto standard.
(There are also stream ciphers, which work on streams of data, encrypting a phone call for example, but they tend to trade security for speed and can be difficult to use correctly.)
We also need a little glue. Because block ciphers operate on small chunks of data, they need to be applied again and again. However, given the same key and the same input data, the cipher will always produce the same encrypted output; any redundancies in the input will be exposed as redundancies in the output and make it vulnerable to a number of attacks. To avoid this we use a mode of operation called Cipher Block Chaining (CBC). CBC uses data from one block to further obfuscate the data in the next, effectively hiding any redundancies.
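The redundancy problem is easy to see in a few lines. This sketch compares CBC with ECB, the naive mode that simply applies the cipher to each block independently (ECB is my choice for the comparison). The key, IV and input are throwaway values.

```ruby
require 'openssl'

key   = OpenSSL::Random.random_bytes(16)  # throwaway 128-bit key
plain = 'A' * 32                          # two identical 16-byte blocks

# ECB: every block is encrypted on its own.
ecb = OpenSSL::Cipher.new('aes-128-ecb')
ecb.encrypt
ecb.key = key
ecb_out = ecb.update(plain) + ecb.final

# CBC: each block is mixed with the previous ciphertext block first.
cbc = OpenSSL::Cipher.new('aes-128-cbc')
cbc.encrypt
cbc.key = key
cbc.iv  = OpenSSL::Random.random_bytes(16)
cbc_out = cbc.update(plain) + cbc.final

ecb_repeats = ecb_out[0, 16] == ecb_out[16, 16]  # true: the repetition leaks
cbc_repeats = cbc_out[0, 16] == cbc_out[16, 16]  # false: the repetition is hidden
```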
While simple and secure, symmetric-key cryptography can be problematic: everyone needs to know the password to encrypt data, and everyone who has the password (or, as the pros say, the shared secret key) can decrypt data. This works well if you have a small set of people who need to know the password and a secure way to distribute it (over drinks in a dark corner of a seedy bar is poetic, if not necessarily secure), but in the case of a web site with hundreds or thousands of people entering data, it’s not practical.
Public-key cryptography can be thought of as symmetric-key encryption with two passwords, or keys, called a key pair: the public key, which encrypts data, and the private key, which decrypts it. Because the public key cannot be used to decrypt the data it encrypts, it can be safely given out or installed on a web site, allowing anyone to encrypt data to be sent to the owner of the key. The private key is kept safe and is typically itself encrypted with a symmetric cipher and an additional password.
The solution is actually quite simple. We generate a random password and use that for the symmetric-key encryption. We then encrypt the random password using the public key, and store both the encrypted password and the encrypted data. When we need to get at the data, we use the private key and its password to decrypt the random password, which in turn is used to decrypt the data.
Well, it’s almost that simple. In order to randomize the data, the CBC glue requires an initialization vector (IV). This article has a good explanation of why, but for our purposes we can just think of it as a second random password we need to encrypt and save.
OK, enough talk, let’s encrypt some text:
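A minimal sketch of this step, using Ruby's standard openssl library. I'm assuming AES with a 256-bit key in CBC mode; plain_data, random_key and random_iv are names I've picked for illustration.

```ruby
require 'openssl'

plain_data = 'Some sensitive text, of any length.'

# AES with a 256-bit key, in CBC mode.
cipher = OpenSSL::Cipher.new('aes-256-cbc')
cipher.encrypt

# random_key and random_iv generate random values, set them on the
# cipher, and return them so we can keep hold of them.
random_key = cipher.random_key
random_iv  = cipher.random_iv

# update encrypts the data; final flushes the last (padded) block.
encrypted_data = cipher.update(plain_data) + cipher.final
```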

At this point we could just save encrypted_data in the database and it would be well protected. So well in fact that we couldn’t get it back. To do that we’re going to need to save the random password and IV.
Generate a key pair. Be sure to choose a good password as this is the one that will decrypt everything.
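One way to do this from Ruby itself is sketched below. The 2048-bit key size, the AES cipher protecting the private key, and the placeholder password are all my assumptions; pick your own.

```ruby
require 'openssl'

# Placeholder only: use a long, hard-to-guess password of your own.
password = 'a much better password than this one'

# Generate a 2048-bit RSA key pair.
key_pair = OpenSSL::PKey::RSA.new(2048)

# Export the private key as PEM text, itself encrypted with AES under
# the password. This string is what you would write to a file and guard.
private_pem = key_pair.to_pem(OpenSSL::Cipher.new('aes-256-cbc'), password)
```

The resulting private_pem can only be loaded again by someone who knows the password.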

Now extract the public key:
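A sketch of the extraction in Ruby. The key pair is regenerated here only so the snippet runs on its own; in practice you would load the one you just created.

```ruby
require 'openssl'

# Stand-in for the key pair generated in the previous step.
key_pair = OpenSSL::PKey::RSA.new(2048)

# The public half on its own, as PEM text. This is the part that is safe
# to install on the web server.
public_pem = key_pair.public_key.to_pem
```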

See my previous article for more details on what we’re doing here.
Now we can use the public key to encrypt the random key and IV:
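Roughly, that looks like the following. The first few lines only recreate a key pair, random key and IV so the snippet is self-contained; public_encrypt uses the library's default padding.

```ruby
require 'openssl'

# Stand-ins for the values produced in the earlier steps.
key_pair   = OpenSSL::PKey::RSA.new(2048)
public_key = OpenSSL::PKey::RSA.new(key_pair.public_key.to_pem)

cipher = OpenSSL::Cipher.new('aes-256-cbc')
cipher.encrypt
random_key = cipher.random_key
random_iv  = cipher.random_iv

# Both values are tiny (32 and 16 bytes), so they fit comfortably within
# what a 2048-bit RSA key can encrypt.
encrypted_key = public_key.public_encrypt(random_key)
encrypted_iv  = public_key.public_encrypt(random_iv)
```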

Now if we store all three pieces (encrypted_key, encrypted_iv, and encrypted_data), we have successfully encrypted our original data.
Of course we’ll want to get that data back, and to do so we reverse the process:
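A sketch of the round trip. The setup half repeats the encryption steps with throwaway values so the snippet runs by itself; the second half is the actual decryption. The password and sample text are placeholders.

```ruby
require 'openssl'

# --- Setup: recreate what the encryption steps would have stored ---
password    = 'a much better password than this one'
key_pair    = OpenSSL::PKey::RSA.new(2048)
private_pem = key_pair.to_pem(OpenSSL::Cipher.new('aes-256-cbc'), password)

cipher = OpenSSL::Cipher.new('aes-256-cbc')
cipher.encrypt
encrypted_key  = key_pair.public_encrypt(cipher.random_key)
encrypted_iv   = key_pair.public_encrypt(cipher.random_iv)
encrypted_data = cipher.update('Some sensitive text') + cipher.final

# --- Decryption: the reverse of the steps above ---
# Unlock the private key with its password.
private_key = OpenSSL::PKey::RSA.new(private_pem, password)

# Recover the random key and IV, then use them to decrypt the data.
decipher = OpenSSL::Cipher.new('aes-256-cbc')
decipher.decrypt
decipher.key = private_key.private_decrypt(encrypted_key)
decipher.iv  = private_key.private_decrypt(encrypted_iv)

decrypted_data = decipher.update(encrypted_data) + decipher.final
```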

password is the password you used when generating the keypair.
Now let’s put it all together in an Active Record model:
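Here is one way it might look, written as a mixin so it also runs outside Rails. Treat it as a sketch under assumptions: the SensitiveData module name, the 'public_key_pem' and 'private_key_pem' keys in APP_CONFIG, binary columns named encrypted_data, encrypted_key and encrypted_iv, and the Record stand-in at the bottom are all hypothetical.

```ruby
require 'openssl'

# Hypothetical mixin. In Rails you would include it in an Active Record
# model whose table has binary columns encrypted_data, encrypted_key and
# encrypted_iv.
module SensitiveData
  CIPHER = 'aes-256-cbc'

  def self.included(base)
    base.send(:attr_accessor, :plain_data) # virtual attribute, never stored
    # Only hook the callback when the host class supports it (i.e. in Rails).
    base.before_save :encrypt_sensitive if base.respond_to?(:before_save)
  end

  # Encrypts plain_data if present; otherwise leaves stored values alone.
  def encrypt_sensitive
    return true if plain_data.nil? || plain_data.empty?

    public_key = OpenSSL::PKey::RSA.new(APP_CONFIG['public_key_pem'])
    cipher = OpenSSL::Cipher.new(CIPHER)
    cipher.encrypt
    random_key = cipher.random_key
    random_iv  = cipher.random_iv
    self.encrypted_data = cipher.update(plain_data) + cipher.final
    self.encrypted_key  = public_key.public_encrypt(random_key)
    self.encrypted_iv   = public_key.public_encrypt(random_iv)
    true
  end

  # Decrypts into plain_data using the private key's password.
  def decrypt_sensitive(password)
    return nil if encrypted_data.nil?

    private_key = OpenSSL::PKey::RSA.new(APP_CONFIG['private_key_pem'], password)
    cipher = OpenSSL::Cipher.new(CIPHER)
    cipher.decrypt
    cipher.key = private_key.private_decrypt(encrypted_key)
    cipher.iv  = private_key.private_decrypt(encrypted_iv)
    self.plain_data = cipher.update(encrypted_data) + cipher.final
  end

  # Wipes both the plain and the encrypted values; save afterwards.
  def clear_sensitive
    self.plain_data = self.encrypted_data = self.encrypted_key = self.encrypted_iv = nil
  end
end

# --- Stand-ins so the sketch runs without Rails or a database ---
key_pair = OpenSSL::PKey::RSA.new(2048)
APP_CONFIG = {
  'public_key_pem'  => key_pair.public_key.to_pem,
  'private_key_pem' => key_pair.to_pem(OpenSSL::Cipher.new('aes-256-cbc'), 'passwd')
}
Record = Struct.new(:encrypted_data, :encrypted_key, :encrypted_iv) do
  include SensitiveData
end

record = Record.new
record.plain_data = 'very sensitive'
record.encrypt_sensitive            # Rails would call this via before_save
record.plain_data = nil             # as if the record had been reloaded
recovered = record.decrypt_sensitive('passwd')
```

The callback is registered only when the including class responds to before_save, which is what lets the same module be exercised with a plain Struct here.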

When creating or updating your model you don’t have to do anything: if “plain_data” is present it will be automatically encrypted. When you want to view the plain text you call “@record.decrypt_sensitive('passwd')”; that could be done with a little AJAX that prompts for a password and populates the “plain_data” field. The encrypted data is only updated when “plain_data” is present. This is done so that the record can be updated without decrypting (and re-encrypting) the encrypted data, which is handy in an application where not everyone has access to the sensitive data. To actually clear the encrypted data, call “@record.clear_sensitive” and then save.
Setting up the APP_CONFIG hash is left as an exercise for the reader.
Clearly, this screams for a plugin; watch this space.