Index > Course > 2021-02-25: Base64 and Beyond: Malware Encryption Techniques
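Quick recap: malware uses crypto very differently from everyone else. Malware authors assume their keys will eventually be found, so they tend to use the fastest, simplest crypto available (often XOR), which is still enough to hide strings and payloads and evade simple detection.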
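A minimal sketch of repeating-key XOR obfuscation of the kind described above. The key and the URL are made-up examples, and `xor_bytes` is a hypothetical helper name. Because (x ^ k) ^ k == x, the same function both encrypts and decrypts.

```python
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the key, repeating the key as needed.

    Applying it twice with the same key returns the original data.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


secret = b"http://example.invalid/payload"  # hypothetical embedded string
key = b"\x5a"                                # hypothetical single-byte key
obfuscated = xor_bytes(secret, key)
recovered = xor_bytes(obfuscated, key)
```

Once an analyst knows the key, recovering the plaintext is a single call, which is why the choice of cipher matters so little here.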
Base64 was made for transmitting email attachments. It converts binary data into text drawn from a set of 64 printable ASCII characters, since email could originally carry only printable characters.
There is more than one base64 alphabet, but the most common is [A-Za-z0-9+/], with the = character used for padding. The mapping is as follows:
Every 3 bytes of binary map to 4 base64 characters, because each 6-bit group becomes 1 base64 character.
data:    |    A     |    T     |    T     |
8bit:    |   0x41   |   0x54   |   0x54   |  (hex)
binary:  | 01000001 | 01010100 | 01010100 |
6bit:    | 010000 | 010101 | 010001 | 010100 |
decimal: |   16   |   21   |   17   |   20   |
base64:  |   Q    |   V    |   R    |   U    |
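The table above can be reproduced in a few lines. This is a minimal sketch that handles exactly one 3-byte block (no padding), and `encode_block` is a hypothetical helper name. It cross-checks the result against Python's standard `base64` module.

```python
import base64

# Standard base64 alphabet: [A-Za-z0-9+/]
STD_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_block(chunk: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) into 4 base64 characters."""
    if len(chunk) != 3:
        raise ValueError("expected exactly 3 bytes")
    n = int.from_bytes(chunk, "big")  # e.g. b"ATT" -> 0x415454
    # Split the 24 bits into four 6-bit groups, most significant first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(STD_ALPHABET[i] for i in indices)


manual = encode_block(b"ATT")                     # "QVRU", indices 16, 21, 17, 20
library = base64.b64encode(b"ATT").decode("ascii")
```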
If the input length is not a multiple of 3, the leftover bits are padded with zeros to form one final base64 character. Standard practice is to then pad with = so that the base64 text is a multiple of 4 characters long, but this isn't strictly required.
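A short sketch of padding in practice, using Python's standard `base64` module. Python's decoder rejects missing padding by default, so the hypothetical helper `b64decode_unpadded` restores the = characters before decoding. An unpadded length that leaves a remainder of 1 when divided by 4 can never be valid base64, and the decoder will still raise an error for it.

```python
import base64

# 1, 2 and 3 input bytes give 2, 3 and 4 meaningful characters,
# padded with "=" up to a multiple of 4.
PADDING_EXAMPLES = {
    b"A": base64.b64encode(b"A"),      # b"QQ=="
    b"AT": base64.b64encode(b"AT"),    # b"QVQ="
    b"ATT": base64.b64encode(b"ATT"),  # b"QVRU"
}


def b64decode_unpadded(s: str) -> bytes:
    """Decode base64 text that may have had its '=' padding stripped."""
    s = s.rstrip("=")
    return base64.b64decode(s + "=" * (-len(s) % 4))
```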
Malware can be sneaky: instead of the standard base64 alphabet, it may use a modified one:
aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/
Moving the lowercase a to the front shifts every uppercase letter up by one position, so decoding with the standard alphabet yields a scrambled result.
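Once the custom alphabet is known, decoding it is just a character substitution back to the standard alphabet followed by a normal base64 decode. This sketch assumes the modified alphabet keeps 0-9, + and / in their standard positions; the helper names are hypothetical.

```python
import base64

STD_ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
# Modified alphabet: lowercase 'a' moved to the front (assumed layout).
CUSTOM_ALPHABET = b"aABCDEFGHIJKLMNOPQRSTUVWXYZbcdefghijklmnopqrstuvwxyz0123456789+/"

# Translation tables mapping each character to the one at the same index
# in the other alphabet. '=' is in neither, so padding passes through.
TO_STD = bytes.maketrans(CUSTOM_ALPHABET, STD_ALPHABET)
FROM_STD = bytes.maketrans(STD_ALPHABET, CUSTOM_ALPHABET)


def custom_b64encode(data: bytes) -> bytes:
    return base64.b64encode(data).translate(FROM_STD)


def custom_b64decode(text: bytes) -> bytes:
    return base64.b64decode(text.translate(TO_STD))


encoded = custom_b64encode(b"ATT")   # b"PUQT" instead of b"QVRU"
naive = base64.b64decode(encoded)    # standard decode: scrambled bytes
```

Applied to the data from the table, "ATT" encodes to "PUQT" under the custom alphabet, and a conventional decoder turns that into garbage rather than "ATT".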
The same logic applies to stronger ciphers: analysts can still watch the malware run its decryption routine and recover the key, so AES provides no real benefit over something simple like XOR.
The strings command can reveal a lot about a binary. Libraries that embed identifying strings, such as OpenSSL, can be recognized immediately, as can references to algorithms like AES.
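A rough Python approximation of what `strings` does: report runs of printable ASCII characters of at least a minimum length (4 here, matching the usual default). This is a simplified sketch; the real tool also has options for other encodings and section handling that are not modeled, and `ascii_strings` is a hypothetical helper name.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (space through '~') of length >= min_len."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Usage would look like `ascii_strings(open(path, "rb").read())`, where `path` points at the sample under analysis.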
Crypto algorithms use constants, or arrays of constants (for example, the AES S-box or the SHA-256 initial hash values). If these are present in a binary, it likely uses the corresponding algorithm. Tools such as KANAL, a plugin for PEiD, scan for these constants.
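A toy version of constant scanning, not how KANAL itself is implemented. The signature table is a tiny hypothetical sample: the first row of the AES forward S-box and the first four SHA-256 initial hash words stored as little-endian 32-bit integers. Real tools use far larger tables and also look for other byte orders and encodings.

```python
import struct

# Tiny, illustrative signature table (real scanners ship many more).
SIGNATURES = {
    # First 16 bytes of the AES (Rijndael) forward S-box.
    "AES S-box": bytes([0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5,
                        0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76]),
    # SHA-256 initial hash values H0..H3 as little-endian 32-bit words.
    "SHA-256 IV": struct.pack("<4I", 0x6A09E667, 0xBB67AE85,
                              0x3C6EF372, 0xA54FF53A),
}


def find_crypto_constants(data: bytes) -> dict[str, list[int]]:
    """Return {signature name: [offsets]} for every signature found in data."""
    hits: dict[str, list[int]] = {}
    for name, sig in SIGNATURES.items():
        offsets = []
        pos = data.find(sig)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(sig, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits
```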
The IDA Entropy plugin can scan a binary for high-entropy regions. These could be crypto constants or ciphertexts.
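A sketch of the kind of calculation an entropy scanner performs, assuming byte-level Shannon entropy over fixed, non-overlapping windows. This is not the IDA plugin's API. The window size and threshold are arbitrary illustrative choices: random data in a 256-byte window typically scores a little above 7 bits per byte, while plain text and code usually score lower.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, at most 8.0."""
    if not block:
        return 0.0
    n = len(block)
    return -sum(c / n * math.log2(c / n) for c in Counter(block).values())


def high_entropy_regions(data: bytes, window: int = 256, threshold: float = 7.0):
    """Yield (offset, entropy) for each full window at or above the threshold."""
    for off in range(0, len(data) - window + 1, window):
        e = shannon_entropy(data[off:off + window])
        if e >= threshold:
            yield off, e
```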