Thursday, November 4, 2010

RC4 for jackasses like me

The RC4 stream cipher is the simplest thing ever, but all the articles I found assumed that you had some knowledge of crypto. So what I'm trying to provide here is a beeline to a rough conceptual understanding of RC4 and stream ciphers in general, without any prerequisite knowledge of cryptography.

Got it? Good. And since you understood...1

The purpose of a cipher to provide an opaque pipe between two parties so that you can send messages without anyone else understanding those messages except the other party, even if third parties intercept those messages.

A cipher needs two components:

  1. An encryption algorithm that translates your message (plaintext/cleartext) into gibberish (ciphertext.)

  2. A decryption algorithm that translates that gibberish into the original message.

Say I'm White Power Mike and I want to send a secret message to Aryan Jim, without allowing it to be intercepted by the horde of foreigners. Being a racist, White Power Mike doesn't do the logic thing very well and uses a decoder ring. This decoder ring maps a letter in the original message to a letter in the coded message. The message is decrypted by doing the inverse.

Unfortunately, the horde includes some number of excellent cryptoanalysts. They use frequency analysis (finding how frequently every letter in the ciphertext occurs and comparing it to the average frequency of the letter in various English texts) to figure out what White Power Mike is saying.

Que lastima, bitches.

There are two big problems with White Power Mike's cipher implementation:

  1. Securely achieving a mutual understanding of the decoding/encoding scheme is difficult.

  2. The plaintext and ciphertext have a static bijective mapping between them, which makes the cipher vulnerable to all sorts of cryptoanalysis.

Modern ciphers, like RC4, try to fix these problems.

RC4 is a synchronous stream cipher. You have a pseudo-random keystream that you combine with your message bit by bit (or byte by byte since text messages are in byte-sized chunks) using a symmetric operation (almost always xor.) Your encryption keystream should always be synchronized with the other party's decryption keystream and vice versa. That means if you consume a bit from the keystream to encrypt a message, the receiving party must consume a bit to decrypt the message. If any message is lost, the whole cipher is toast.

Let's go through an encryption/decryption cycle with a hypothetical synchronous stream cipher.

Message:
Curse those handsome foreign devils!

Bytes:2
0x43 0x75 0x72 0x73 0x65 0x20 0x74 0x68 0x6f 0x73 0x65 0x20 0x68 0x61 0x6e 0x64 0x73 0x6f 0x6d 0x65 0x20 0x66 0x6f 0x72 0x65 0x69 0x67 0x6e 0x20 0x64 0x65 0x76 0x69 0x6c 0x73

To encrypt \C, White Power Mike asks his encryption key-stream function for a byte. Let's say he gets 0x20. 0x43 xor 0x20 is 0x63.

Then he'll go and encrypt \u. Suppose his key-stream returns 0x51. 0x75 xor 0x51 is 0x24.

Eventually, White Power Mike will have an encrypted message that he'll pass along to Aryan Jim.

Aryan Jim gets an encrypted message from White Power Mike:

0x63 0x75 .......

He'll ask for a byte from his keystream, which should return 0x20 because it's synchronized. 0x63 xor 0x20 is 0x43, 0x75 xor 0x51 is 0x24 and so on.

Suppose White Power Mike sends another message and Aryan Jim misses it. Their keystreams are no longer synchronized, which means that future messages from White Power Mike will not be decrypted properly.

Because the keystreams must be synchronized, the generation of new bits can't be fully random. You will have to maintain a state array of bytes and generate a keystream from this state. To make things extra crazy I mean secure, you will also want to mutate the state array deterministically every time you make a new key byte. Therefore, your keystream function will have two functions:

  1. To generate keys based on the key state

  2. To mutate the keystream state

RC4 is a specification that defines how you initialize the keystream state, how key bytes are generated from the keystream state, and how the keystream state is mutated for every key byte generated. I won't go into the implementation details here because that's what Wikipedia is for.

Hope this helps clear up any confusion about RC4.




1 Would you stop scheming and looking hard? I got a great big bodyguard.
2 Assuming ASCII... obviously, racists can't unicode. Also, here's the Clojure function I used: (fn [text] (map #(Integer/toHexString (int %)) text)). If Python was the new Perl, Clojure is the new Python.

No comments: