When writing a private or sensitive document, encryption is always your friend. Whether it is your holiday shopping list or a secret government document, PDF encryption is sometimes looked upon as the solution. Now, why anybody would ever use PDF is beyond me. All I know is that it’s not uncommon to stumble upon one of these documents and not be able to access its contents.
Before I can explain how one might go about cracking a PDF, I have to explain how encryption works from a PDF standpoint. When protecting a document in any modern PDF viewer, you can set two passwords. The first is required to open the document, and is also known as the user password. This is the password the encryption key is derived from. The second, also known as the owner password, is optional and, if present, sets permissions (printing, copying, editing and so on) that should not be allowed unless it is given. You do not need to know the owner password to decrypt the document, so it mostly provides a false sense of security: once the correct user password has been given, it is up to the PDF viewer to honor the rules set by the creator of the PDF. Technically, with just the user password, any PDF viewer can read, copy, print, edit, etc. The restrictions are not enforced by the encryption itself, even though pretty much every professional PDF viewer respects them.
The PDF encryption process is slightly more complicated than one might think. Normally, something like this might be done by MD5 hashing the password and using part of the output as an encryption key. In my opinion, this is a perfectly good and reliable method of security, but Adobe clearly doesn’t think so. The encryption key used in a PDF file is calculated by mashing up some information from the PDF document along with the passphrase entered by the user. This encryption key can then be verified by decrypting a buffer stored in the PDF file. If the output is a pre-defined string (given in Adobe’s specification), then the encryption key is correct. This is the only easy way to validate a password for a PDF file, because the format uses plain RC4, which accepts any key, no matter how wrong, and decrypts without error or warning.
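To make that last point concrete, here is a minimal RC4 sketch. The function name `rc4_crypt` is made up for illustration and is not taken from my cracker. Notice that nothing in it can fail: a wrong key simply produces different output bytes, which is why the decrypted buffer has to be compared against a known string.

```c
#include <stddef.h>
#include <stdint.h>

/* Plain RC4. Encrypting and decrypting are the same operation, and there is
 * no way for the routine to report that the key was "wrong".
 * key_len must be at least 1. */
void rc4_crypt(const uint8_t *key, size_t key_len,
               const uint8_t *in, uint8_t *out, size_t len)
{
    uint8_t s[256];
    uint8_t a = 0, b = 0, j = 0, t;
    size_t i;

    /* Key scheduling: start from the identity permutation and shuffle it
     * under control of the key bytes. */
    for (i = 0; i < 256; i++)
        s[i] = (uint8_t)i;
    for (i = 0; i < 256; i++) {
        j = (uint8_t)(j + s[i] + key[i % key_len]);
        t = s[i]; s[i] = s[j]; s[j] = t;
    }

    /* Keystream generation: XOR each input byte with the next keystream byte. */
    for (i = 0; i < len; i++) {
        a = (uint8_t)(a + 1);
        b = (uint8_t)(b + s[a]);
        t = s[a]; s[a] = s[b]; s[b] = t;
        out[i] = in[i] ^ s[(uint8_t)(s[a] + s[b])];
    }
}
```

Calling it twice with the same key gives back the original bytes. Calling it with any other key still "succeeds" and just returns garbage.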
So, knowing all of this, I set out to make a PDF cracker. The normal procedure when cracking a password with brute force is something like this: pick a password, check if it’s correct, and if not, repeat. This all seems very simple, but how do we perform the second step? Like I said above, you need several pieces of information from the PDF document (all available in its binary contents), and of course the magic, pre-defined Adobe encryption string.
As I discovered, getting data from a PDF file is pretty straightforward. PDF files are split into parts. A part can either be an “Object” or a “Stream.” For my cracker program, I would have to find the trailer object, read the document ID, then find the encryption object. The encryption object contains all of the pieces of data (other than the document ID) that are needed for generating the encryption key.
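As a rough illustration of the “read the document ID” step, here is a small sketch that scans a buffer for a key such as `/ID` and decodes the first hex string (`<...>`) that follows it. It is a simplification and not the parser from my cracker: the function name `pdf_first_hex_after` is hypothetical, it assumes the value is written as a hex string, and it would be fooled by a key whose value starts with `<<`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Finds `key` in buf and decodes the first <hex string> after it into out.
 * Returns the number of bytes decoded, or -1 if the key or string is missing
 * or out is too small. memcmp is used instead of strstr because PDF data can
 * contain NUL bytes. */
long pdf_first_hex_after(const uint8_t *buf, size_t len, const char *key,
                         uint8_t *out, size_t out_cap)
{
    size_t klen = strlen(key);
    size_t i, n = 0;
    int hi = -1;

    if (klen == 0 || len < klen)
        return -1;
    for (i = 0; i + klen <= len; i++)
        if (memcmp(buf + i, key, klen) == 0)
            break;
    if (i + klen > len)
        return -1;                      /* key not found */

    for (i += klen; i < len && buf[i] != '<'; i++)
        ;                               /* skip to the opening '<' */
    if (i == len)
        return -1;

    for (i++; i < len && buf[i] != '>'; i++) {
        int v = hex_val(buf[i]);
        if (v < 0)
            continue;                   /* ignore whitespace between digits */
        if (hi < 0) {
            hi = v;                     /* first digit of a pair */
            continue;
        }
        if (n == out_cap)
            return -1;
        out[n++] = (uint8_t)((hi << 4) | v);
        hi = -1;
    }
    if (i == len)
        return -1;                      /* no closing '>' */
    if (hi >= 0) {                      /* odd digit count: pad with a 0 digit */
        if (n == out_cap)
            return -1;
        out[n++] = (uint8_t)(hi << 4);
    }
    return (long)n;
}
```

For a trailer containing `/ID [<0aFF10> <bbbb>]`, calling it with the key `"/ID"` yields the three bytes `0a ff 10`.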
The process of actually reading objects from a PDF file is pretty simple, but I will not describe it here. To see how I do it, check out my code for it in PDFReader.c in my cracker program. Anyway, with this information, all that was left to do was actually write the code to test a password.
Generating the encryption key is a multi-step process. First, you pad the user password (the passphrase used for encryption) to a 32-byte buffer. You then append the hash of the “owner” password (readily available in the PDF’s contents). Finally, you append a four-byte, little-endian permissions number (also found in the PDF file), followed by the document ID. MD5 hashing this gives you a 16-byte buffer, the first five bytes of which are the encryption key. This key is then used to decrypt the “user” hash from the encryption object. If the result is equal to a special string defined by Adobe, then the user password and the other information were correct.
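The buffer that gets hashed can be sketched as follows. This is an illustration rather than the code from my repository, and the name `pdf_key_input` is made up. As far as I understand the specification, short passwords are padded with the same 32-byte Adobe string that the decrypted “user” hash is later compared against, so that is what `PDF_PAD` holds here. The MD5 and RC4 steps are left out to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The 32-byte padding string from Adobe's PDF specification. */
static const uint8_t PDF_PAD[32] = {
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A
};

/* Builds the buffer that is fed to MD5:
 *   padded password (32) | owner hash (32) | permissions (4, LE) | document ID
 * Returns the number of bytes written, or 0 if out is too small. */
size_t pdf_key_input(const char *password,
                     const uint8_t owner_hash[32], int32_t permissions,
                     const uint8_t *doc_id, size_t doc_id_len,
                     uint8_t *out, size_t out_cap)
{
    size_t pw_len = strlen(password);
    size_t total = 32 + 32 + 4 + doc_id_len;
    uint32_t p = (uint32_t)permissions;

    if (out_cap < total)
        return 0;
    if (pw_len > 32)
        pw_len = 32;                    /* longer passwords are truncated */

    /* Password first, then as much of the padding string as is needed. */
    memcpy(out, password, pw_len);
    memcpy(out + pw_len, PDF_PAD, 32 - pw_len);

    memcpy(out + 32, owner_hash, 32);

    /* Permissions as four bytes, least significant byte first. */
    out[64] = (uint8_t)(p & 0xFF);
    out[65] = (uint8_t)((p >> 8) & 0xFF);
    out[66] = (uint8_t)((p >> 16) & 0xFF);
    out[67] = (uint8_t)((p >> 24) & 0xFF);

    if (doc_id_len > 0)
        memcpy(out + 68, doc_id, doc_id_len);
    return total;
}
```

To finish the check, you would MD5 the returned bytes, take the first five bytes of the digest as the RC4 key, decrypt the 32-byte “user” hash with it, and compare the result to `PDF_PAD`.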
My PDFCrack program itself is a command-line UNIX application. It either takes a dictionary file or reads passwords to test from standard input. If no dictionary source is specified, it generates strings systematically in order, testing each and every string as a potential password.
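Generating strings “systematically in order” can be done like an odometer: bump the last character, carry to the left when it wraps, and grow the string by one character once every position has wrapped. The sketch below shows one way to do that. The function name `next_candidate` and the idea of passing the alphabet as a parameter are my own for this example, not necessarily how PDFCrack does it.

```c
#include <stddef.h>
#include <string.h>

/* Advances s to the next candidate over `alphabet`, shortest strings first.
 * s must be a NUL-terminated buffer with room for max_len + 1 bytes and
 * should start out empty. Returns 1 if s now holds a new candidate, or 0
 * once every string up to max_len characters has been produced. */
int next_candidate(char *s, size_t max_len, const char *alphabet)
{
    size_t n = strlen(alphabet);
    size_t len = strlen(s);
    size_t i = len;

    if (n == 0)
        return 0;

    /* Work from the last character towards the first, like an odometer. */
    while (i > 0) {
        const char *pos = strchr(alphabet, s[i - 1]);
        /* A character outside the alphabet is treated as already wrapped. */
        size_t idx = pos ? (size_t)(pos - alphabet) : n - 1;

        if (idx + 1 < n) {
            s[i - 1] = alphabet[idx + 1];   /* no carry needed: done */
            return 1;
        }
        s[i - 1] = alphabet[0];             /* wrap and carry to the left */
        i--;
    }

    /* Every position wrapped, so move on to the next length. */
    if (len == max_len)
        return 0;
    s[len] = alphabet[0];
    s[len + 1] = '\0';
    return 1;
}
```

With the alphabet `"ab"` and a maximum length of 2, starting from an empty string, successive calls produce `a`, `b`, `aa`, `ab`, `ba`, `bb`, and then the function returns 0. The empty password is never produced, so it would need to be tested separately.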
As I always do, I have posted this code to a GitHub repository. Unfortunately, this cracker only works on PDF versions 1.3 and 1.4 (as exported by Pages ’09 on Mac OS X). It could probably be expanded for other, newer versions of the PDF specification, but I have no real reason to do that myself. If you would like to expand on it, please, fork my repository and get to work. That’s what open source is for, isn’t it?