Cryptanalysis is an important part of creating strong cryptosystems. Relying on “security through obscurity” (which violates Kerckhoffs’s principle) can lead to weak cryptosystems if their creators did not consider all possible attack vectors. By contrast, the cryptographic algorithms in common use today have been published for cryptanalytic review. The ones currently considered “trusted” are those for which no effective attack has yet been discovered.
Simple cryptanalytic techniques
Modern cryptographic algorithms are designed to resist all known cryptanalytic techniques. However, a few simple techniques can be useful for evaluating the security of (and potentially breaking) older or amateur cryptosystems.
Entropy calculations
Entropy is the measure of the amount of randomness that exists within a system. A strong cryptographic algorithm should produce a ciphertext with very high randomness, indicating that there is little or no useful information linking the ciphertext to the original plaintext or secret key. This makes entropy testing a useful tool for identification of encrypted data. While entropy can be calculated manually, tools like Binwalk and radare2 have built-in entropy testers that can be used to identify encrypted data within a file. After encrypted data has been identified, other features can be used to help identify the encryption algorithms used. Some examples of useful information include:
- Ciphertext and block length
- Function names
If the encryption algorithm can be identified, it is possible to determine if it is a broken algorithm. Alternatively, knowledge of the algorithm can help in the search for an encryption key within a file.
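The entropy measure described in this section can be computed by hand in a few lines. The sketch below is an illustration only, not how Binwalk or radare2 implement their entropy testers: it calculates Shannon entropy in bits per byte, and the helper name `shannon_entropy` and the sample inputs are assumptions made for the example. Output from `os.urandom` stands in for well-encrypted data.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Repetitive English text uses few byte values, so its entropy is low.
english = b"the quick brown fox jumps over the lazy dog " * 100
# Random bytes are an assumed stand-in for the output of a strong cipher.
random_like = os.urandom(4096)

print(f"English text: {shannon_entropy(english):.2f} bits/byte")
print(f"Random bytes: {shannon_entropy(random_like):.2f} bits/byte")
```

Values close to 8 bits per byte suggest encrypted or compressed data, so a high score alone does not prove that encryption is present.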
Character frequency analysis
Unlike a good ciphertext, natural languages are anything but random. With sufficient knowledge of a language, it is often possible to guess which letter comes next after a given series. For example, in the English language, which letter almost always comes after the letter Q? This lack of randomness is useful for cryptanalysis because it can make weak ciphers easy to break. Character frequency analysis can easily break substitution and rotational ciphers.
The graph above shows the relative frequencies of letters in the English language. As shown, some letters (such as E, T and A) are much more common than others (such as Z, Q and J). This is useful for analysis of substitution and rotational ciphers: as long as the ciphertext is long enough, the most common letter in the ciphertext is likely to map to E, the second most common is likely to map to T, and so on.
With a rotational cipher, a single correct match is enough to determine the step size and decrypt the message. With a substitution cipher, every pairing must be determined; however, knowledge of a few letters within a word makes it possible to guess the remainder.
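The rotational-cipher case can be sketched as follows. This is a minimal illustration that assumes the plaintext is English, that the most frequent ciphertext letter stands for E, and that the ciphertext contains at least one letter; the function names `rotate` and `guess_shift` and the sample message are hypothetical.

```python
import string
from collections import Counter

ALPHABET = string.ascii_uppercase


def rotate(text: str, shift: int) -> str:
    """Shift every letter by `shift` positions; leave other characters alone."""
    out = []
    for ch in text.upper():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)
    return "".join(out)


def guess_shift(ciphertext: str) -> int:
    """Guess the step size by assuming the most common letter maps to E."""
    letters = [ch for ch in ciphertext.upper() if ch in ALPHABET]
    most_common = Counter(letters).most_common(1)[0][0]
    return (ALPHABET.index(most_common) - ALPHABET.index("E")) % 26


plaintext = "MEET ME NEAR THE WESTERN GATE WHEN THE BELL RINGS SEVEN TIMES"
ciphertext = rotate(plaintext, 3)

shift = guess_shift(ciphertext)
print(shift, rotate(ciphertext, -shift))
```

On short or unusual messages the most common letter may not be E, in which case the next candidates (T, A and so on) can be tried in turn.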
Encoding vs. encryption
Encoding and encryption are both techniques for data obfuscation. However, their implementation and effects are very different. Encryption requires a secret key for encryption and decryption. Without knowledge of this secret key, the plaintext cannot be retrieved from the ciphertext. Encoding algorithms apply a reversible operation to data without using a secret key. This means that anyone with knowledge of the encoding algorithm can reverse it. Encoding algorithms are commonly used in malware as a simple replacement for encryption. However, they are easily reversed if the encoding algorithm can be identified.
Base64 encoding
Base64 is an encoding technique designed to make it possible to send any type of data over protocols limited to alphanumeric characters and symbols. This is accomplished by mapping sequences of three bytes to sets of four characters. Each character represents a sequence of six bits (four sets of six bits is twenty-four bits, which is the length of three bytes) using one of sixty-four printable characters, as shown in the table above. Base64 uses padding, so an input whose length is not an exact multiple of three bytes produces an encoded version with one or two equal signs (=) at the end. The combination of the base64 character set and these optional equal signs makes this encoding style relatively easy to identify. Base64 is used to make unprintable data printable, so a common use is to encode encrypted data. However, in some cases, encoding is used instead of encryption, which leaves the data easily recoverable.
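The padding behavior and the ease of reversal can be seen with Python's standard `base64` module. The helper `looks_like_base64` is a hypothetical heuristic written for this example, not a library function; it only checks that a string decodes cleanly.

```python
import base64
import binascii


def looks_like_base64(text: str) -> bool:
    """Heuristic: True if text is valid, correctly padded base64."""
    try:
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# Inputs of 3, 4 and 5 bytes need zero, two and one padding characters.
for raw in (b"abc", b"abcd", b"abcde"):
    encoded = base64.b64encode(raw).decode("ascii")
    decoded = base64.b64decode(encoded)  # no key needed to reverse it
    print(raw, "->", encoded, "->", decoded)

print(looks_like_base64("YWJjZA=="))     # valid alphabet and padding
print(looks_like_base64("not base64!"))  # space and "!" are not in the alphabet
```

A check like this produces false positives, since short ordinary words such as "test" are also valid base64, so it is best combined with other evidence.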
URL encoding
URL encoding is another example of an encoding style designed to allow data to be passed in a protocol with a constrained character set. In this case, URL encoding allows characters that are reserved in URLs, such as ? and /, to be included as data in the path, query string or other parts of a URL. As shown above, URL encoding replaces a character with a percent sign (%) followed by the two-digit hexadecimal value of its byte (its ASCII code, for ASCII characters). This eliminates the reserved character from the URL but allows it to be easily retrieved when needed. However, URL encoding is commonly abused in injection attacks or as a simple layer of obfuscation, since it defeats simple string matching.
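A short sketch using Python's standard `urllib.parse` module shows both the intended use and the obfuscation effect. The sample strings and the naive filter check are assumptions chosen for illustration.

```python
from urllib.parse import quote, unquote

# Intended use: carry reserved characters as data inside a URL.
raw = "name=a b&path=/etc/passwd?"
encoded = quote(raw, safe="")  # safe="" also encodes "/", which is kept by default
print(encoded)
print(unquote(encoded) == raw)  # decoding restores the original string

# Abuse: an encoded payload slips past a naive substring filter.
payload = quote("<script>alert(1)</script>", safe="")
blocked_by_naive_filter = "<script>" in payload
print(payload, blocked_by_naive_filter)

# Decoding before matching restores the string the filter was looking for.
print("<script>" in unquote(payload))
```

This is why input filters generally need to decode data before inspecting it, rather than matching on the raw request.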
Getting started with cryptanalysis
Most modern encryption algorithms are secure against known attacks, and many of the “broken” ones require knowledge of advanced mathematics to understand the attacks. However, many older encryption and encoding algorithms can be easily broken with simple techniques. This is useful because many malware variants use these weaker forms of encryption. An understanding of basic cryptanalytic concepts and techniques can be very valuable in cybersecurity.
Sources
English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU, norvig.com
Encoding and Decoding Base64 Strings in Python, Stack Abuse
URL Encoding, chrisrng.svbtle.com