SuperAES
Points: 404
Author: Shadowwws
Writeup granularity: Very detailed
Description: Come try my super AES encryptor
This simple challenge is about breaking a custom stream cipher based on a LCG. I found it very interesting as it highlights an important concept about LCG parameter selection.
I was able to solve it during the competition and even got the 🩸 first blood 🩸 for it !
First look
We are given a single script chall.py
and the address of a remote server. This script provides a stream cipher based on AES-ECB and a Linear congruential generator.
Main cipher
class SuperAES:
def __init__(self,key,lcg):
self.aes = AES.new(key,AES.MODE_ECB)
self.lcg = lcg
def encrypt(self,plaintext):
ciphertext = b""
for i in range(0,len(plaintext),16):
ciphertext += self.encrypt_block(plaintext[i:i+16])
return ciphertext
def encrypt_block(self,block):
keystream = self.aes.encrypt(int(self.lcg.next_state()).to_bytes(16,"big"))
return bytes([k^b for k,b in zip(keystream,block)])
Here, the keystream is generated using the 16-byte output of the LCG, encrypted with AES. The keystream is then XORed with the plaintext to produce the ciphertext.
Now that we have understood the encryption process, we can say that there are no major flaws in this system: This is a stream cipher similar to AES-CTR, but with a Linear Congruential Generator instead of a counter. As long as the counter is not reused, the cipher is secure.
Thus, the weakness must be in the LCG.
Linear Congruential Generator
m = 288493873028852398739253829029106548736
a = int(time.time())
b = a%16
s = random.randint(1,m-1)
class LCG:
def __init__(self, a, b, m, seed):
self.a = a
self.b = b
self.m = m
self.state = seed
self.counter = 0
def next_state(self):
ret = self.state
self.state = (self.a * self.state + self.b) % self.m
return ret
At first glance, this looks like a standard LCG with fixed $m$ and a random seed. However $a$ and $b$ depend on the current time and can be easily predicted.
The vulnerability
When we run the LCG on random values for $a$, we notice that the output becomes constant after some iterations. After a few tries, we find that this occurs for all values of $a$ when $a = 0 \mod 14$.
This is simply because 14 is the product of all prime factors of $m$:
$$\begin{align*} m &= 288493873028852398739253829029106548736 \\ &= 2^{66}\ 7^{22} \\ \end{align*}$$
Since the output of the LCG is constant, the keystream will be constant as well. This means that the same keystream block will be used to encrypt multiple blocks of the plaintext. There are multiple ways to recover the plaintext, the easiest one is to use what we know about the flag format.
assert len(flag) == 33
assert flag.startswith(b"GCC{")
#assert flag.endswith(b"}") (not in the challenge)
The exploit
The exploit part of this challenge is a very common one. We simply have to match the known values of the plaintext to the ciphertext and deduce the keystream bytes at that location. Since the block size (16) does not divide the length of the flag (33), we recover different parts of the repeated keystream block.
def compute_flag(data: bytes)->bytes:
"""
Compute the flag from the given data and known information about it.
"""
LEN_FLAG = 33
LEN_BLOCK = 16
known_flag = b"GCC{" + b"\xFF" * 28 + b"}" # Unknown bytes are 0xFF
known_keystream_block = [None for _ in range(16)]
# Guess the known_keystream_block using the info we know
for i in range(len(data)):
if known_flag[i%LEN_FLAG] != 0xFF:
# Overwrite the previous value, as it might not be in the constant state yet
known_keystream_block[i%LEN_BLOCK] = data[i] ^ known_flag[i%LEN_FLAG]
# Compute the flag
plaintext = bytes([data[i] ^ known_keystream_block[i%LEN_BLOCK] for i in range(len(data))])
return plaintext[-LEN_FLAG:]
Now that we know how to recover the flag when $a = 0 \mod 14$, we can write the full exploit.
Because it is not possible to predict the exact time at which the server will receive our connection, we will have to try multiple values. This is also very common in this type of challenges, as network communication and the server time are not always predictable.
import socket
import time
while True: # Break when we find the flag
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("challenges1.gcc-ctf.com", 4001))
data_recv = s.recv(1024)
s.send(b"49\n")
data_recv = s.recv(10000)
data = bytes.fromhex(data_recv.decode())
candidate = compute_flag(data)
if candidate.startswith(b"GCC{"):
print(candidate)
break
time.sleep(.95) # To avoid missing a value, send more
Luckily, 14 is quite small and we can try all possible values in reasonable time. Imagine having to wait two minutes for your exploit to complete, yikes.
A summary of the code and some test data is available in the solve.ipynb notebook.
Going Further
After the competition, I really wanted to understand why the output of the LCG becomes constant when the product of the prime factors of $m$ (here 14) divides $a$.
$$m = 2^{66}\ 7^{22} = 56^{22}$$
b = 0
To explain this, we will first understand why this works for $a = 0 \mod 112$, which is used the indended solution. In this simple case, we have both
$$\begin{cases} a = 0 \mod 14 \\ a = 0 \mod 16 \\ \end{cases}$$
Since $a = 0 \mod 16$, we have $b = 0$ and the LCG becomes:
$$\begin{align*} x_{i+1} &= a \cdot x_i \mod m \ \ \ \ (1)\\ &= 56 \cdot 2 \cdot x_i \mod m \\ \end{align*}$$
This means that $x_{22} = 56 ^{22} \cdot 2^{22} \cdot x_0 = m \cdot 2^{22} \cdot x_0 = 0 \mod m$. And because of (1), all the following values will be 0 as well.
General case
However, we will show that it is not necessary to have $b = 0$ for this to happen. The only requirement for the LCG to become constant is that $a$ is a multiple of 14. From the regular LCG formula:
$$\begin{align*} x_{i+1} &= a \cdot x_i + b \mod m \\ \end{align*}$$
We can show recursively that for any $i>0$:
$$\begin{align*} x_{i} &= a^{i} \cdot x_0 + \sum_{j=0}^{i-1} a^{j} \cdot b \mod m \\ \end{align*}$$
When $a = 0 \mod 14$, we have $a = 2 \cdot 7 \cdot k$ for some integer $k$.
Similarly to the previous case, when $i>66$, $a^{i} = 2^{i} \cdot 7^{i} \cdot k^{i} = m \cdot K = 0 \mod m$. This means that the first term becomes 0 as it is a multiple of $m = 2^{66}\ 7^{22}$. Since it was the only term depending on $x_0$, the output becomes predictable:
$$\begin{align*} x_{i} &= \sum_{j=0}^{i-1} a^{j} \cdot b \mod m \\ &= b \cdot \sum_{j=0}^{i-1} a^{j} \mod m \\ \end{align*}$$
In addition, the next step will not change the value, as the new first term will also be a multiple of $m$. This is why the output becomes constant.
This constant value can also be written as $x_{i} = -b \cdot (a-1)^{-1} \mod m$ when $a-1$ is invertible modulo $m$.