Challenge 8: Cryptanalysis

Introduction


Cryptography comprises techniques that protect information from being read or altered by unauthorized parties. It is a key element of secure communication on today's Internet, and it is used in many widespread protocols such as SSL and SSH.

Cryptanalysis describes techniques to break cryptographic schemes and recover encrypted data without knowledge of the key. Just as there are many different cryptographic schemes, many different cryptanalysis approaches have been developed. These techniques either exploit weaknesses of the encryption scheme in use or resort to a brute-force method in which a large number of possible key values are tried. This challenge aims to give you some hands-on experience with cryptanalysis.

Detailed Description


The key idea of this challenge is to familiarize yourself with simple cryptanalysis techniques. You will analyze a shared-key cipher and launch a brute-force attack on the UNIX crypt() password encryption function. The challenge is divided into two tasks:

Task 1: Your first task is to cryptanalyze a poly-alphabetic substitution cipher. To do so, make sure you have your customized ciphertext file crypt.txt, which is located in your home directory on bandit. The ciphertext was obtained by XORing each character (byte) of the plaintext with the corresponding byte of an encryption key. The key is multiple bytes long, and it is your task to determine exactly how long. Because the key has a fixed, limited length, it is repeated as often as necessary to encrypt an arbitrarily long plaintext. Each ciphertext byte is displayed as a 2-digit hexadecimal value; for easier reading, the values are separated by spaces and arranged in 8 columns.

The plaintext consists only of lowercase ASCII letters, uppercase ASCII letters, and hyphen characters (-). That is, all spaces and punctuation characters (such as dot, comma, or exclamation mark) except hyphens have been removed. This is a standard procedure with substitution ciphers: it makes cryptanalysis a little harder because obvious word boundaries are no longer present.
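
To make the encryption scheme concrete, here is a minimal sketch of repeating-key XOR in Java. The class name XorDemo, its method names, and the sample plaintext are made up for illustration, and the key 1d2a32 is only an example value, not the key of any real ciphertext.

```java
// Illustrative sketch only: class, method names and sample data are hypothetical.
class XorDemo {
    // Parse whitespace-separated two-digit hex values (the layout described
    // for crypt.txt) into raw bytes. Assumes the input is not empty.
    static byte[] parseHex(String text) {
        String[] tokens = text.trim().split("\\s+");
        byte[] out = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            out[i] = (byte) Integer.parseInt(tokens[i], 16);
        }
        return out;
    }

    // XOR every byte with the key, repeating the key as often as needed.
    // XOR is its own inverse, so the same method encrypts and decrypts.
    static byte[] xor(byte[] data, byte[] key) {
        byte[] out = new byte[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = (byte) (data[i] ^ key[i % key.length]);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        byte[] key = { 0x1d, 0x2a, 0x32 };   // made-up example key
        byte[] cipher = xor("Minas-Tirith".getBytes("US-ASCII"), key);

        // Print the ciphertext as space-separated hex values.
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < cipher.length; i++) {
            hex.append(String.format("%02x ", cipher[i] & 0xff));
        }
        System.out.println(hex.toString().trim());

        // Decrypting with the same key recovers the plaintext.
        byte[] plain = xor(parseHex(hex.toString()), key);
        System.out.println(new String(plain, "US-ASCII"));
    }
}
```

Note that byte i of the text is always combined with key byte i modulo the key length. This is why the ciphertext can be split into one independent "column" per key position once the key length is known.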

The first task is solved when you have recovered the meaningful plaintext that corresponds to your ciphertext. To do this, you have to determine (a) the length of the key and (b) the bytes that make up the key.
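
One possible way to approach (a) and (b), not necessarily the intended one, is to exploit the restricted plaintext alphabet: for a guessed key length, a key byte is only plausible at a given key position if it decrypts every ciphertext byte at that position to a letter or a hyphen. The sketch below assumes the ciphertext has already been parsed into a byte array; the class KeyCandidates and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: class and method names are hypothetical.
class KeyCandidates {
    // The plaintext alphabet stated in the task: letters and hyphens.
    static boolean allowed(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '-';
    }

    // All key bytes that decrypt every byte at positions pos, pos + keyLen,
    // pos + 2 * keyLen, ... to an allowed plaintext character.
    static List<Integer> candidates(byte[] cipher, int keyLen, int pos) {
        List<Integer> result = new ArrayList<Integer>();
        for (int k = 0; k < 256; k++) {
            boolean ok = true;
            for (int i = pos; i < cipher.length && ok; i += keyLen) {
                ok = allowed((cipher[i] & 0xff) ^ k);
            }
            if (ok) {
                result.add(Integer.valueOf(k));
            }
        }
        return result;
    }

    // Smallest key length up to maxLen for which every key position has at
    // least one candidate byte, or -1 if no such length exists.
    static int guessKeyLength(byte[] cipher, int maxLen) {
        for (int len = 1; len <= maxLen; len++) {
            boolean plausible = true;
            for (int pos = 0; pos < len && plausible; pos++) {
                plausible = !candidates(cipher, len, pos).isEmpty();
            }
            if (plausible) {
                return len;
            }
        }
        return -1;
    }
}
```

Every multiple of the true key length passes this test as well, which is why the sketch returns the smallest plausible length. A position may still have several candidate bytes left; frequency analysis or reading the partially decrypted text can narrow them down.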

Task 2: The library function crypt() on UNIX systems is the password encryption function. It is based on the Data Encryption Standard (DES) algorithm. The crypt() function accepts two parameters, a salt and a key. The salt is a two-character string that is used to perturb the algorithm in one of 4096 different ways. The key is the password that the user has typed in.

Classic UNIX systems encrypt the plaintext passwords of users using the crypt() function and store these encrypted passwords in the password file (obviously, it would not be wise to store passwords in plaintext!). Whenever a user wants to log in, the password typed at the login prompt is encrypted using crypt() and the result is compared against the encrypted password in the password file. If the encrypted passwords match, access is granted.

The following is a standard example entry in a UNIX password file:

  user:FundbSsM8Hkbo:105:100:Mr. X:/home/user:/bin/bash
   

In this case, the username user has an encrypted password FundbSsM8Hkbo. IMPORTANT: The first two characters in this password entry are the salt as described above. In this case, the salt for the password is Fu, and the password entry has been generated by invoking crypt() with the salt Fu and the key pi8ck (i.e., the password of the user is pi8ck). Also, remember that standard UNIX passwords are at most 8 characters long. If a longer password is entered, the rest of the string is truncated. For example, if the password being entered is HelloWorldHowAreYou, the actual password that is used for crypt() is HelloWor.
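
The two details above (the salt is the first two characters of the encrypted password, and only the first 8 characters of a password count) can be sketched as follows. The class PasswdEntry is a hypothetical helper, not something the assignment prescribes.

```java
// Illustrative sketch only: PasswdEntry is a hypothetical helper class.
class PasswdEntry {
    final String user;   // first field of the password file line
    final String hash;   // second field: salt followed by the encrypted password
    final String salt;   // first two characters of the second field

    PasswdEntry(String line) {
        // Limit -1 keeps empty trailing fields instead of dropping them.
        String[] fields = line.split(":", -1);
        if (fields.length < 2 || fields[1].length() < 2) {
            throw new IllegalArgumentException("malformed entry: " + line);
        }
        user = fields[0];
        hash = fields[1];
        salt = hash.substring(0, 2);
    }

    // Only the first 8 characters of a password are significant.
    static String truncate(String password) {
        return password.length() > 8 ? password.substring(0, 8) : password;
    }
}
```

For the example entry shown earlier, this yields the username user and the salt Fu, and truncate("HelloWorldHowAreYou") returns HelloWor.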

Your second task is to write a program in Java called Cr4ck.java that launches a dictionary-based brute-force attack on all the encrypted passwords contained in a standard UNIX password file (i.e., very similar to the password cracker John the Ripper). Your program should output to <stdout> all username/password pairs that it cracked, and then terminate.

After the brute-force attack is completed and all possible password combinations have been tried, your program should exit with the exit code 0 (after printing all usernames and passwords to <stdout>). In case of errors (e.g., wrong number of arguments, file not found, etc.), your program should exit with the exit code 1.

The synopsis for Cr4ck is as follows:

  java -cp .:<path to jcrypt> Cr4ck <password file name> <dictionary file name>
    

...where <password file name> is the name (i.e., path and name) of the UNIX password file from which your password cracker reads the encrypted passwords and usernames, and <dictionary file name> is the name (i.e., path and name) of the dictionary file that contains the list of words to be used during the brute-force attack (see wordlist.txt for an example dictionary file). Also, do not forget to set your CLASSPATH if necessary (see the Makefile and the discussion about the jcrypt library below).
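
The argument handling and exit codes described above could be organized roughly as in the following sketch. The class name Cr4ckSkeleton and the helper readLines are assumptions made for illustration; the attack itself is left out. The code avoids language features newer than Java 5.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: names are hypothetical, the attack is omitted.
class Cr4ckSkeleton {
    // Read all non-empty lines of a text file.
    static List<String> readLines(String path) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(path));
        try {
            List<String> lines = new ArrayList<String>();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().length() > 0) {
                    lines.add(line);
                }
            }
            return lines;
        } finally {
            in.close();
        }
    }

    // Returns the exit code: 0 on success, 1 on any error.
    static int run(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: Cr4ck <password file name> <dictionary file name>");
            return 1;
        }
        try {
            List<String> passwdLines = readLines(args[0]);
            List<String> words = readLines(args[1]);
            // The attack would go here: generate candidates for each word,
            // encrypt them, and print every match as <username>:<password>.
            return 0;
        } catch (IOException e) {
            System.err.println("error: " + e.getMessage());
            return 1;
        }
    }
}
```

In this layout, the main method of the real program would consist of the single statement System.exit(run(args)). Error messages go to the standard error stream so that <stdout> contains nothing but cracked username/password pairs.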

To make life easier for you and to enable you to implement exhaustive searches that will not last hours or days, you can assume that the password of a user consists only of a word in the dictionary file and at most two additional special characters. For this challenge, we define a special character to be either a digit [0-9] or one of the four characters %, #, &, and !.

In other words, whenever you are trying a word in the dictionary file, you need to generate all "valid" password combinations of this word. For example, for a word test, the following are only some examples of the many possible password combinations according to our definition:

  test, test1, test2, te3st, te1!st, test#%, 12test, !te5st, t0est&, etc.
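
A straightforward (not necessarily the fastest) way to enumerate these combinations is to insert one special character at every possible position, and then repeat that step on each result to obtain the combinations with two special characters. The class Candidates below is a hypothetical sketch of this idea.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only: class and method names are hypothetical.
class Candidates {
    // The ten digits plus the four extra characters defined above.
    static final String SPECIALS = "0123456789%#&!";

    // Every string obtained by inserting exactly one special character
    // at any position of s (before, inside, or after it).
    static Set<String> insertOne(String s) {
        Set<String> out = new LinkedHashSet<String>();
        for (int pos = 0; pos <= s.length(); pos++) {
            for (int c = 0; c < SPECIALS.length(); c++) {
                out.add(s.substring(0, pos) + SPECIALS.charAt(c) + s.substring(pos));
            }
        }
        return out;
    }

    // The word itself plus all variants with one or two special characters.
    // The set removes duplicates such as "1" + "1test" reached in two ways.
    static Set<String> generate(String word) {
        Set<String> out = new LinkedHashSet<String>();
        out.add(word);
        Set<String> one = insertOne(word);
        out.addAll(one);
        for (String s : one) {
            out.addAll(insertOne(s));
        }
        return out;
    }
}
```

Since only the first 8 characters of a password are significant, candidates longer than 8 characters could additionally be truncated before they are added to the set, so that variants that differ only beyond the eighth character are encrypted once.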
    

Once your program is started, it should first parse the password file and create a list of usernames and encrypted passwords. Then, for each word in the dictionary, it should generate all possible password combinations (as described above) and encrypt each generated password using jcrypt. The encrypted form of each generated password is then compared against the encrypted passwords in the password file. If there is a match, a password has been cracked. In this case, the username and the corresponding cracked password have to be printed to <stdout>, each pair on a single line in the form:

  <username>:<password>
    

As you can see, the username and the password should be separated by the ":" character. Note that the line order in which you print the username/password data is not important.
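
The comparison step can be sketched as follows. To keep the example self-contained, the actual encryption is hidden behind a hypothetical Hasher interface; in the lab environment, an implementation would delegate to jcrypt (check /opt/jcrypt/jcrypt.java for the actual method name and parameter order). Grouping the entries by salt is a design choice of this sketch: each candidate then has to be encrypted only once per distinct salt rather than once per user.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: Matcher and Hasher are hypothetical names.
class Matcher {
    // Stand-in for the real password encryption function.
    interface Hasher {
        String hash(String salt, String password);
    }

    // Build an index: salt -> (encrypted password -> usernames).
    static Map<String, Map<String, List<String>>> index(List<String> passwdLines) {
        Map<String, Map<String, List<String>>> bySalt =
                new HashMap<String, Map<String, List<String>>>();
        for (String line : passwdLines) {
            String[] f = line.split(":", -1);
            if (f.length < 2 || f[1].length() < 2) {
                continue;   // skip malformed lines and empty password fields
            }
            String salt = f[1].substring(0, 2);
            Map<String, List<String>> byHash = bySalt.get(salt);
            if (byHash == null) {
                byHash = new HashMap<String, List<String>>();
                bySalt.put(salt, byHash);
            }
            List<String> users = byHash.get(f[1]);
            if (users == null) {
                users = new ArrayList<String>();
                byHash.put(f[1], users);
            }
            users.add(f[0]);
        }
        return bySalt;
    }

    // Encrypt the candidate once per distinct salt and return one
    // "<username>:<password>" line for every matching entry.
    static List<String> tryCandidate(String candidate,
            Map<String, Map<String, List<String>>> bySalt, Hasher hasher) {
        List<String> hits = new ArrayList<String>();
        for (Map.Entry<String, Map<String, List<String>>> e : bySalt.entrySet()) {
            List<String> users = e.getValue().get(hasher.hash(e.getKey(), candidate));
            if (users != null) {
                for (String user : users) {
                    hits.add(user + ":" + candidate);
                }
            }
        }
        return hits;
    }
}
```

The returned lines are already in the required output format and can be printed to <stdout> as they are found.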

Here is a UNIX password file that you can use to test your application (of course, you can and should create your own password files at home for testing): passwd.txt. And here is a dictionary file that should allow you to crack all the passwords in that password file: wordlist.txt.

The password file and the dictionary file we have given you should deliver the following output (lines can be in any order):

  aotj:1ope#nde
  lobgot:9field1
  wapl:net1work
  sysadm:te%st
  testuser:test
  loser:pi8ck
  gehrer:sparen!!
  kartalkaya:v!isito7
  cuuniv:7&operat
  mastad:save7%
    

How fast the passwords are cracked will depend on the efficiency of your implementation. However, if your program takes longer than a couple of minutes to crack the passwords, you should think about how your implementation can be optimized. Please also make sure that your program terminates. If your application goes into an endless loop, your submission will be blocked after a timeout. You should not be wasting resources. The same is also true on bandit.

Here is the Makefile that we will use to compile your program.

Make sure that your program runs in the lab environment before you submit. Also, note that we use JDK Version 1.5.0 so your program needs to work with that version.

Note: Of course, we do not want you to re-implement crypt. Thus, for this challenge, we recommend that you use a Java implementation of crypt(), which is called jcrypt. On the lab machines, jcrypt is already installed for you under /opt/jcrypt/. This program/library is very simple to use. Just check the source code at /opt/jcrypt/jcrypt.java, and you can easily determine the method you need to call.

Hints


  • The ciphertext in the first task contains a battle report from Lord of the Rings. That is, you will find a long list of sentences that describe military events that are set in the Lord of the Rings universe. As previously mentioned, the text contains only lowercase and uppercase ASCII letters as well as hyphens.
  • To recover the XOR key, you need to do frequency analysis. Look for the characters that are most frequent in the English language. You can use UNIX tools such as sort, awk, cat, and uniq to extract, sort, and display information from the encrypted text for further analysis.
  • As mentioned previously, we will use Java 1.5.0 to test your program (this version of Java is installed on the lab machines).
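
As an alternative to the UNIX tools mentioned in the hints, the frequency analysis can also be sketched in Java. The sketch assumes that the key length is already known and that the letter e is the most frequent character in each column of the plaintext, which is typical for English text but not guaranteed for every column. The class ColumnFrequency and its methods are hypothetical names.

```java
// Illustrative sketch only: class and method names are hypothetical.
class ColumnFrequency {
    // Count how often each byte value occurs at positions
    // pos, pos + keyLen, pos + 2 * keyLen, ... of the ciphertext.
    static int[] histogram(byte[] cipher, int keyLen, int pos) {
        int[] counts = new int[256];
        for (int i = pos; i < cipher.length; i += keyLen) {
            counts[cipher[i] & 0xff]++;
        }
        return counts;
    }

    // Byte value with the highest count (the lowest value wins ties).
    static int mostFrequent(int[] counts) {
        int best = 0;
        for (int v = 1; v < counts.length; v++) {
            if (counts[v] > counts[best]) {
                best = v;
            }
        }
        return best;
    }

    // Assumption: the most frequent ciphertext byte in this column is the
    // encryption of 'e', so XORing it with 'e' yields the key byte.
    static int guessKeyByte(byte[] cipher, int keyLen, int pos) {
        return mostFrequent(histogram(cipher, keyLen, pos)) ^ 'e';
    }
}
```

If a guessed key byte decrypts some bytes of its column to characters other than letters and hyphens, the guess is wrong, and the next most frequent English letters (for example t or a) are worth trying instead of e.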

Deliverables


To submit your challenge solution to us, you need to follow these steps:

  1. Cryptanalyze the ciphertext that you received for the first task and determine the XOR key. Write this key onto the first line of a file called key.txt. The bytes of the key should be written as a single hexadecimal number with lowercase letters a to f. For example, assume that the key is three bytes long. In this case, it consists of exactly six hexadecimal digits taken from the set [0-9a-f]. An example would be 1d2a32.
  2. Develop your brute-force UNIX password cracker program in Java, writing the code into the Cr4ck class in a file called Cr4ck.java. Make sure that your Java program compiles and runs in the lab environment.
  3. In the directory where your files key.txt and Cr4ck.java are located, run /usr/local/bin/submit8.
  4. Read any error or success messages. Then, wait a couple of minutes and check your e-mail on bandit to view the results of the automatic grading program.
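
The key format required for key.txt in step 1 (two lowercase hexadecimal digits per key byte, with no separators) can be produced as in the following sketch. The class KeyFormat is a hypothetical helper.

```java
// Illustrative sketch only: KeyFormat is a hypothetical helper class.
class KeyFormat {
    // Format each key byte as exactly two lowercase hex digits.
    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < key.length; i++) {
            // "& 0xff" avoids sign extension for bytes >= 0x80;
            // "%02x" keeps the leading zero for bytes below 0x10.
            sb.append(String.format("%02x", key[i] & 0xff));
        }
        return sb.toString();
    }
}
```

For the three-byte example key from step 1, toHex(new byte[] { 0x1d, 0x2a, 0x32 }) returns 1d2a32.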

Administrative Information and Deadline


This is an individual project. The project is due on Thursday, 09.06.2011, 23:59:59 PST.