Robots, Oracles and Protocols; Breaking Cryptography Through Information Leakage

The field of Cryptography is amongst the most important in all of academia, while also being the most difficult to get right. Being trusted to safeguard applications ranging from cellular communications to digital banking and internet access, this area of research is highly trusted by the masses while paradoxically being subjected to the largest amount of scrutiny and seeing major vulnerabilities arise on a regular basis. According to Schneier’s law, “any person can invent a security system so clever that she or he can’t think of how to break it”. As a result, those who claim to have an unbreakable cryptosystem simply because they can’t break it are either geniuses or fools, with the latter being far more common.

Many cryptographic algorithms thought to be secure may turn out to have numerous weaknesses due to some simple oversight. Adaptive-chosen-ciphertext attacks are a good example of this as an attacker can use them to completely break a cipher, only having been given a small bit of information that may seem completely innocuous. To show how this is possible I will be going over two theoretical forms of this vulnerability to build up an understanding, before explaining the vulnerability behind the recent ‘ROBOT Attack’ which rendered websites such as Facebook and Paypal vulnerable.

(Note that this article will assume a basic understanding of the RSA cryptosystem. If you are not familiar, my previous article explains it in more depth).

Parity Oracles

In cryptography, an oracle vulnerability is a form of side-channel attack where an affected device or server leaking small amounts of information about a plaintext message can be used to fully decrypt a ciphertext.

A Parity oracle (also known as an LSB oracle) is the simplest form of RSA oracle which takes a ciphertext, decrypts it, and only returns the single least-significant bit of the plaintext, which is analogous to just revealing whether the plaintext is odd or even. This situation may seem contrived but as you will see, the techniques introduced here will useful in other scenarios.

More formally, an oracle O that takes a ciphertext C as input can be represented as:

Or in python as:

def oracle(c):
    return int(bin(pow(c,d,n))[-1])

Strangely, just knowing whether a ciphertext’s decryption is even or odd is enough information to fully recover the plaintext. The secret to how this is achieved lies within the homomorphic properties of RSA.

RSA is partially homomorphic over multiplication, meaning that given two or more ciphertexts, it is possible to multiply them together in such a way that the decryption of the resulting ciphertext is equal to the multiplication of each of the corresponding plaintexts.

For example, if you have a ciphertext C and you would like to double the value of its corresponding plaintext M, You could create a modified ciphertext C'=2^e*C , and the decryption of C' would be 2M.

Using this trick, we can now get the oracle to perform decryption on modified ciphertexts to obtain more information about the plaintext.

From our knowledge of RSA, we know that the message cannot be greater than the modulus. As such, we know that the plaintext M lies in the range 0 < M < N. Although we know the range, brute forcing all possible messages within this range is near impossible. However, by determining the parity of modified ciphertexts, we can decrypt the message using an adaptive chosen ciphertext attack to restrict this range.

If we use the above trick to send 2M to the decryption oracle, we can determine a lot about M‘s range by knowing whether it is odd or even.

By definition, 2M will always be an even number. However, as the modulus N is the product of two odd (prime) numbers, subtracting N from 2M will always result in an odd number.

As such, there are two possible scenarios depending on whether 2M is odd or even.

Using this method, we have managed to divide the search space in half just by knowing one bit of information about the plaintext. However, we can keep going by creating submitting more modified ciphertexts. By then creating the encryption of 4M by submitting C*4^e, we are presented with four possible scenarios:

Keep in mind here that when 2M is odd, the value was reduced modulo N, and so 4M must be represented as 2(2m-n).

You may notice that every step is reducing the search interval by half. This means that by multiplying our ciphertext by successive powers of two, we can perform a binary search for the plaintext and fully recover it in log₂(n) steps.

The algorithm find the plaintext using this method can be represented in python as follows:

upperBound = n
lowerBound = 0
for i in range(n.bit_length()):
    chosenCt = ciphertext*pow(2**i, e, n) % n
    lsb = oracle(chosenCt)
    if lsb == 0:
        upperBound= (upperBound + lowerBound) // 2
    elif lsb == 1:
        lowerBound = (lowerBound + lowerBound) // 2

The speed of this program depends a lot on your setup, but cracking 1024-bit RSA with an oracle hosted on a local Flask server took around 10 seconds on my computer.

Now that we’ve covered a rather simple type of RSA oracle, I’ll now cover a slightly more complicated vulnerability that requires a less specific oracle.

Middle-Bit Oracles

In the previous example, I may have made these attacks seem overly simplistic by stating that you can fully decrypt a ciphertext just by having a single bit of the plaintext leaked. The method I’ve covered only works when the least significant bit is leaked as this reveals the parity of the plaintext, which can reveal lots of valuable information. If instead the oracle were to leak bits from the middle of the plaintext, we would have to take a more sophisticated approach. To demonstrate how this could be done, I will walk through the exploitation of an oracle that instead leaks two consecutive bits from the middle of the plaintext. (This form of oracle appeared in the 2016 Tokyo Westerns CTF.)

In this situation, we are given access to an oracle which will decrypt a ciphertext of our choice with a private key and return two consecutive bits from the middle of the plaintext. This oracle could be represented in python as:

def oracle(c):
    m = pow(c, d, n)
    return (m >> (n.bit_length()//2)) & 0b11

These bits alone do not reveal much to us. However, if we can somehow use these bits to determine whether a given plaintext is even or odd, we can decrypt our ciphertext using a similar approach to the one we used before. Once again, we can achieve this by utilising RSA’s multiplicatively homomorphic properties.

Recall that given a ciphertext C with a corresponding plaintext M, it is possible to create a modified ciphertext C'=d^e*M such that its decryption M'=d*m. It is also possible to use this method to effectively divide the plaintext by calculating d‘s modular multiplicative inverse.

Multiplying a plaintext M by the modular multiplicative inverse of 2 will do one of two things. If M is even, it is simply transformed into M/2. If M is odd however, it is transformed into (M+N)/2 .

In the case that M is even, dividing M by 2 is equivalent to shifting it right by one bit. Therefore, if we query the oracle O twice to obtain the middle bits of both D(c) and D(2^-e*C), we can get four bits a, b, c, d:

If M is even, the bits from the second oracle will overlap with the bits from the first oracle such that a=d, meaning that if LSB(M)=0, a must equal d. However, if LSB(M)=1, the second oracle will transform M in an unpredictable way, and so we can assume that there is no predictable correlation between (a,b) and (c,d). Therefore, we can construct an LSB oracle as such:

Keep in mind that this oracle is probabilistic as you cannot guarantee that LSB(M)=0 if a=d, and it is still possible for a to equal d if LSB(M)=1. Now using a similar — but more sophisticated — approach, we can narrow the possible intervals for M using RSA’s homomorphic properties.

For the sake of simplifying the following basic number theory, we’ll make the assumption thatLSB(M)=1. If this is not the case, you just first need to find M'=r*M such that LSB(r*M)=1, which can be divided out after the message is decrypted.

The first step is to multiply C with increasing values of d^e until a value of d is found such that it can be guaranteed that d*M mod N is odd.

knowing d, we can now rewrite this information in a more useful way. As performing a modulo reduction over N is equivalent to subtracting N from a number until it is in the range [0,N], we can denote the above as:

Here, we know that both M and N are odd. Additionally, we know that the difference of two numbers is odd if and only if those two numbers have opposing parities. Therefore, we can write:

Using the maths we’ve covered in the first part, we know that given d, for a valid value of r, M lies within the interval:

As we know that r< d, and r and d have opposite parities, knowing a valid d, we can find a list of possible intervals that M may lie within. For example:

Now, this does not narrow the interval very much. Our message still only resides inside one of the possible intervals, and both are far too large to manually search. But we can still narrow the interval by multiplying d by some value until a new LSB can be confirmed, creating a new set of possible intervals. From this, we know that the message must reside in the intersection of both sets of intervals. For example:

Through doing this we have reduced the possible candidates for M by a factor of four by knowing the least significant bits of two modified ciphertexts, but there is one other important optimisation that we can make. Whenever a new valid value of d is found such that LSB(dM mod N)=1, we can use the previously determined intervals to narrow the possible values of r. With a given value of d, for each possible interval:

Using all of the above, we can now begin to construct an algorithm to narrow narrow the interval until it feasibly possible to manually search all of the possible message space. We can implement this algorithm in python very simply, using libnum.ranges to conveniently keep track of the various intervals.

intervals = Ranges((0, n - 1))
d = 2
while intervals.len > 100:
    d %= n
    c_prime = (pow(d, e, n) * C) % n
    if lsb(c_prime) != 1:
        d += 1
    current = Ranges()
    for a, b in intervals._segments:
        ra = a * d // n
        rb = b * d // n
        if ra % 2 == d % 2:
            ra+= 1
        r = ra
        while r <= rb:
            a = r * n // d
            b = (r + 1) * n // d
            current = current | Ranges((l, r))
            r += 2
    intervals = intervals & current
    d *= 2

After the message space has been sufficiently reduced, recovering the message from the ciphertext is trivial.

for i in intervals:
    if pow(i, e, n) == c:
        print('m =', i)

Now that we’ve managed to fully decrypt a ciphertext from a small amount of information leakage, we can use what we’ve covered to perform a more sophisticated and practical attack.

Bleichenbacher’s Attack

The vulnerabilities that I have described up until now may seem overly contrived and specific to the point that you would not expect to see them appear in any modern applications, and you would be right to think that. These forms of weaknessess were long considered to only be of theoretical concern up until 1998 when Daniel Bleichenbacher demonstrated such an RSA vulnerability that affected thousands of web servers using the SSL protocol at the time. Additionally, while this is an old vulnerability, poorly deployed countermeasures and modern research mean that this attack still surfaces from time to time, most recently and notably in the ‘ROBOT Attack’ (Return Of Bleichenbacher’s Oracle Threat) which was discovered just in 2017 and found to effect websites such as Facebook and Paypal.

You may have noticed that textbook RSA as we have been using has many inherent flaws. Along with being completely deterministic, as we have covered, its homomorphic properties make it trivial to predictively alter a ciphertext. To protect against these weaknesses, practical RSA uses padding to extend and add randomness to plaintext messages before they are encrypted. The most common and relevant form of padding in our case is PKCS #1 V 1.5 padding.

A padded message begins with the bytes 0x0 0x2, which is followed by a string of random non-zero bytes, and finishes with a zero byte and the plaintext message at the end. The random bytes are added such that the full length of the message will be the same length as the modulus. The random padding prevents the cipher from being deterministic, and the rigid structure makes it difficult to predictably alter an encrypted message without creating a padding error.

However, what Bleichenbacher discovered in 1998 is that given an oracle that reveals whether a ciphertext’s decryption is correctly padded or not (whether it is PKCS conforming), an attacker can use this to fully decrypt the message.

To cover how this can be done, we’ll start by setting some more rigorous definitions. As all padded messages begin with 0x000x02 and extend to the length of the modulus, we can say that a message is PKCS compliant if it exists within the range:


In this situation, assume that we have a message ‘M₀’ that we are trying to decrypt using this vulnerability. In this model, we make the assumption that we know that M₀ is properly padded. Knowing that our message is properly padded, we at least know a base range that our message resides in. Bleichenbacher also describes a simple method of fully recovering a message that is not properly padded, but for the sake of time we will be assuming that we know the message’s format.

Once again, the next step is to modify the ciphertext by multiplying by successive values of s^e. However predictably, most modifications that we make will result in a message that is improperly padded, and will be of no use to us. Therefore, we want to find the first positive integer s such that s^e*c is properly padded. This first integer will be denoted as s₁. Once this integer is found, our second message which we know is also properly will be denoted as M₁.

As we know that M₁ is also properly padded, we know the interval in which it resides, and can use this to narrow the possible intervals for M₀. Once again, we will represent a reduction over a modulo N as subtracting N from a number 0 or more times.

Thus, given this formula and range for M₁, we can construct a new restricted range for m₀.

This range is more narrow, although it depends on a value of r we do not know. However we can limit the possible values of r based on the fact that both M₀ and M₁ are in the range [2B,3B-1].

So, by knowing a valid value of S₁, we can find a range of values of r which will each give us an interval in which M₀ may reside. But before we get to generating possible values of S₁, there’s an significant lower bound that we can calculate which will make generating S₁ easier later on.

From our determined range of r, we can gather that

If r=0 in our definition of modulo N, this means that N does not need to be subtracted from our plaintext to bring it into the range [0,N-1] However, as M₀ will always be at least 2B:

Therefore, we know that our value of S₁ will always be greater than N/3B, which can save a lot of time searching for this first value, which already takes a substantial amount of time. Hence we know where to start our search for the smallest possible value of S₁.

s1 = math.ceil(n/(3*B))
m1 = (c * pow(s1,e,n)) % n
while not Oracle(m1):
    s1 += 1
    m1 = (c * pow(s1,e,n)) % n

Note that this step can take a very long time. This form of attack is also known as Bleichenbacher’s “Million Message attack” owing to how an attacker may have to generate an amount of messages in the hundreds of thousands or possibly millions.

Once a value of S₁ is found, we can use it to find the upper and lower bounds of r, which will let us calculate possible bounds for M₀. For example, if we discover that 5<r<11, then there will be five valid intervals for M₀, which we can intersect with our initial interval of [2B,3B-1] like we did in the middle-bit oracle.

We can then narrow the interval further by generating more values of S (S₁, S₂, S₃ … Sᵢ) for each step, and then intersecting the currently known intervals with the intervals gathered from Sᵢ to find a more narrow bound at each step.

One further optimisation we can make is that at each step we can limit the possible values of r by ignoring all values that do not exist within one of the intervals that has already been discovered. As a general formula, during step i, if we have a set of intervals [a,b] which we know M₀ must lie within:

Searching for these successive values of Sᵢ is a very slow process. However, once we reach a point where there is only a single interval that M₀ can reside in, we can switch to another more efficient method of searching for M₀.

From our previous definition of the bounds that M₀ resides in at a given step, we can derive

Given that we’re assuming that we’ve reached a point where we’ve narrowed the search range for M₀ to a single interval, we will say that M₀ resides only within the interval [a,b]. Therefore, Sᵢ exists within the bounds

Following Bleichenbacher’s original paper, if we now select r such that

We can determine that

Therefore, each new value of Sᵢ will be at least double that of Sᵢ₋₁, whereas when there was more than one interval, all we knew for certain is that Sᵢ > Sᵢ₋₁. As the intervals at each step have size B/Sᵢ, doubling Sᵢ at each step will half the search space that remains, and so we can converge on M₀ using a binary search as we did with our parity oracle.

Now that we know all of the formulas and steps, this attack is rather simple to lay out.

That’s everything you should need to know to understand adaptive-chosen-ciphertext attacks. If you’ve followed the article up to here, you should now a system that leaks seemingly inconsequential bits of information can render a seemingly secure cryptosystem insecure. If you’d like to learn more about these vulnerabilities, you can read Bleichenbacher’s original paper Here. Additionally, if you’d like to read into modern applications of these vulnerabilities, you can look into the more recent ‘DROWN Attack’ and ‘ROBOT Attack’.