Application vulnerability due to Non Random Hash Functions

The excerpt below is from an article that explains the possibility of a Denial of Service (DoS) attack exploiting non-random hash functions used in hash data structures.

[…] the condition can be leveraged by exploiting predictable collisions in the underlying hashing algorithms.

To verify this, I went through Oracle's reference implementation of Java's HashMap and indeed found a static hash function being used:

    static int hash(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
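For reference, here is a rough Python port of that supplemental hash (my own sketch, not Oracle's code; it assumes a non-negative 32-bit input, where Python's `>>` behaves like Java's unsigned `>>>`):

```python
def supplemental_hash(h: int) -> int:
    """Python sketch of the JDK supplemental hash shown above.

    Assumes h is a non-negative 32-bit int, so Python's >> matches
    Java's unsigned shift >>>.
    """
    h &= 0xFFFFFFFF                  # keep 32 bits, like a Java int
    h ^= (h >> 20) ^ (h >> 12)
    return (h ^ (h >> 7) ^ (h >> 4)) & 0xFFFFFFFF
```

Note that this mixing step cannot break collisions: two keys whose hashCode() is already identical (e.g. 2112) stay identical after it.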

Another paper on the topic states:

A Tomcat 6.0.32 server parses a 2 MB string of colliding keys in about 44 minutes of i7 CPU time, so an attacker with about 6 kbit/s can keep one i7 core constantly busy. If the attacker has a Gigabit connection, he can keep about 100,000 i7 cores busy.

How can we safeguard against this vulnerability? And what about all the open-source software we use (Tomcat, etc.) that relies on this implementation?

Lupulin answered 29/12, 2011 at 15:47 Comment(6)
I read somewhere that Tomcat released a fix for this issue. The quick and safe way is to upgrade.Scammony
To be sure, I would patch HashMap's hash or String's hashCode myself, unless Tomcat has a fix you can use.Ayeaye
@thinksteep I took Tomcat as an example. There might be numerous other applications depending on the reference implementation, unless they roll out their own implementation of hash data structures, e.g. as Peter suggested. I am wondering why the RI doesn't directly provide a fix for this.Lupulin
Suggest migration to Serverfault or Security.SE.Purpure
Although this, of course, applies across Java (and a number of other languages) generally, it is worth noting that Tomcat has fixes in the trunk for the 6 and 7 branches already. See markmail.org/message/…Headwaters
It's worth knowing that in Java SE 7, there is an alternative hashing mode available in the JRE which safeguards against this problem; although it is disabled by default for backwards compatibility reasons. In Java 8, this was addressed more fully by JEP-180 (openjdk.java.net/jeps/180)Dorwin

Understanding Attack Vector

How HashMaps work

Say a comment form on a blog accepts the parameters – first_name, last_name, comment – as post parameters. Internally, Tomcat stores these parameters as a HashMap.

The logical structure of this HashMap is like this:


"first_name" --> "Sripathi"
"last_name" --> "Krishnan"
"comment" ---> "DoS using poor Hashes"

But the physical structure is different. The keys are first converted into a hashCode, and then the hashCode is converted into an array index.

The ideal physical structure thus becomes:


0 --> "Sripathi"
1 --> "Krishnan"
2 --> "DoS using poor Hashes"

But the possible keys are infinite, so at some point two different keys will have the same hash code. This is a hash collision.

With collisions, the physical structure becomes:


0 --> "Sripathi", "Krishnan"
1 --> Empty
2 --> "DoS using poor hashes"

Hash Collisions and impact on performance

When you have hash collisions, inserting a new entry means iterating over all the elements in a single hash "bucket" sequentially just to find out if it already exists in the map. Inserting one element can approach O(n) complexity if all elements hash to the same value. Inserting n elements in this worst case makes it O(n*n) complexity.
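The quadratic cost can be sketched in a few lines of Python (a toy model of a single chained bucket, not Java's actual implementation): every insert scans the whole chain before appending.

```python
def insert_all(keys):
    """Insert keys into one shared bucket (all keys hash-collide),
    counting key comparisons; returns (bucket, comparisons)."""
    bucket, comparisons = [], 0
    for k in keys:
        for existing in bucket:      # scan the chain for a duplicate
            comparisons += 1
            if existing == k:
                break
        else:                        # no duplicate found: append
            bucket.append(k)
    return bucket, comparisons

# n distinct colliding keys cost 0 + 1 + ... + (n-1) comparisons:
bucket, cost = insert_all(range(1000))   # cost == 1000*999//2 == 499500
```

With a well-spread hash the inner scan would average O(1); with engineered collisions it degenerates to the O(n*n) total described above.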

In short : If you insert thousands of keys that have the same hashCode, the server will require a lot of CPU cycles.

How do you generate keys with the same Hash?

In Java, "Aa" and "BB" have the same hash code.
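You can check this yourself with a quick re-implementation of Java's String.hashCode() (the well-known h = 31*h + c polynomial over the characters, truncated to a signed 32-bit int):

```python
def java_string_hash(s: str) -> int:
    """Re-implementation of Java's String.hashCode()."""
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF        # wrap like a Java int
    return h - 0x100000000 if h >= 0x80000000 else h  # signed 32-bit

# "Aa": 31*ord('A') + ord('a') = 31*65 + 97 = 2112
# "BB": 31*ord('B') + ord('B') = 31*66 + 66 = 2112
```

The collision exists because 'B' is one greater than 'A' while 'a' is 31 less than 'B' plus 32, so the two characters' weighted sums cancel out exactly.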

Because of a property called "equivalent substrings" (if two strings of equal length have the same hash code, then any block-wise concatenation built from them also collides), we can generate many more strings with the same hash code, just by starting with these 2 strings.

First Iteration : "AaAa", "AaBB", "BBAa", "BBBB" have the same hash code

Now, we have 4 strings with the same hash code. We can permute them to generate 16 strings that will have the same hash code. For example :


"AaAaAaAa", "AaAaBBBB", "AaAaAaBB", "AaAaBBAa",
"BBBBAaAa", "BBBBBBBB", "BBBBAaBB", "BBBBBBAa",
"AaBBAaAa", "AaBBBBBB", "AaBBAaBB", "AaBBBBAa",
"BBAaAaAa", "BBAaBBBB", "BBAaAaBB", "BBAaBBAa",

All these 16 strings have the same hash code.

You can now take these 16 strings, and generate 256 strings that have the same hashcode.

In short : It is very easy to generate a large set of strings that all have exactly the same hash code.
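The doubling trick can be sketched in a few lines of Python, given any two colliding seed strings (here "Aa" and "BB"):

```python
from itertools import product

def java_hash(s: str) -> int:
    """Java's String.hashCode(), modulo 2**32 (the sign doesn't
    matter for comparing collisions)."""
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return h

def colliding_strings(blocks: int, seeds=("Aa", "BB")):
    """Every concatenation of `blocks` seed strings: 2**blocks
    distinct keys, all sharing one hash code."""
    return ["".join(p) for p in product(seeds, repeat=blocks)]

keys = colliding_strings(4)             # 16 keys: "AaAaAaAa" ... "BBBBBBBB"
hashes = {java_hash(k) for k in keys}   # a single shared hash value
```

colliding_strings(15) already yields 32,768 colliding keys, which is plenty for one POST request.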

How do you attack the server?

  1. Create thousands of strings that have the same hash code (see above)
  2. Construct a POST request like this - AaAa=&AaBB=&BBAa=&BBBB= ....
  3. Submit the form
  4. Repeat in a loop, and create several threads so that all server resources are used up

Because this is just a POST request, an attacker can also use innocent browsers to attack a server. Just find a website with a cross site scripting vulnerability, embed code to make a POST request, and then use social engineering to spread the link to as many users as you can.

Prevention

In general, the underlying platform cannot fix this; it is considered an application framework problem. In other words, Tomcat has to fix it, not Oracle/Sun.

Possible fixes include :

  1. Restrict the number of POST parameters - Tomcat 6.0.35+ has a new parameter maxParameterCount. The default value is 10,000. The lower the better, as long as it does not break your functionality.

  2. Restrict the size of the POST request - For the attack to work, the payload has to be huge. The default POST size allowed by Tomcat is 2 MB. Reducing this to, say, 200 KB will reduce the effectiveness of the attack. The parameter in Tomcat is maxPostSize.

  3. Web Application Firewall - If you have a web application firewall, you can configure it to block requests that look suspicious. This is a reactive measure, but is nice to have in case you cannot use one of the above solutions.

FYI - Tomcat's documentation is here - http://tomcat.apache.org/tomcat-6.0-doc/config/http.html
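For fixes 1 and 2, both limits are attributes on the HTTP Connector in Tomcat's conf/server.xml. A sketch (the attribute names are documented at the link above; the values here are illustrative, not recommendations):

```xml
<!-- conf/server.xml: illustrative limits, tune for your application -->
<Connector port="8080" protocol="HTTP/1.1"
           maxParameterCount="1000"
           maxPostSize="204800" />
<!-- maxPostSize is in bytes: 204800 = 200 KB -->
```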

Cornfield answered 29/12, 2011 at 17:55 Comment(7)
Sorry -- it should be "Aa" and "BB", both have the hashcode 2112. Updated post accordingly.Cornfield
Java's HashMap does not do chaining when a collision occurs; if there is already a value stored for key 'k', it will replace that value with the new value. According to the Java documentation: "Associates the specified value with the specified key in this map. If the map previously contained a mapping for this key, the old value is replaced". Kindly check docs.oracle.com/javase/1.4.2/docs/api/java/util/…, java.lang.Object). So I don't completely agree with your "Hash Collisions and impact on performance" part. Your explanation of hashing in general is nice.Leatherneck
@rao_555 : I am speaking about two different keys that have the same hash value. Consider a HashMap with keys "Aa" and "BB". In this case, the value will not be replaced, and performance will degrade because the HashMap essentially becomes a linked list.Cornfield
Please fix the "First Iteration" line as well. "AAAA", "AABb", "BbAA", "BbBb" is wrong. The rest is correct. The example would be clearer, however, with Ea and FB, since the point is that a and B differ by 31, while E and F differ by 1.Preconceive
@Sripathi Krishnan On my machine, the hash code for "Aa" and "BB" is 2260. The code I used is int hash = hash("Aa".hashCode());Sandeesandeep
The JDK can and should fix this. This can be done by using a cryptographic hash function like SipHash.Newbill
@RajeshPantula You are confusing hash keys and hashed key values (the hash codes). A hash key "collision" of course replaces the value of the key. But a hash-code collision most definitely performs chaining. Just try adding "Aa" and "BB" to a hash map and confirm that (a) the string keys have the same hashCode value and (b) both items remain distinct keys in the map.Disdainful

The simplest solution is to upgrade to a fixed version of Tomcat. However, I suspect you want to know the details of what the Tomcat developers would need to change.

This attack works by exploiting a common implementation detail of hash data structures - using linked lists to hold all the values whose hash is the same. Adding values to this linked list is inefficient as the size of the list gets large. An attacker can create a list of values that are known to generate colliding hashes, forcing this inefficient behavior. In order to protect against this, you have a few options:

  • Prevent collisions - prevent the attacker from generating colliding values by having some (pseudo) random factor in your hash function. Perl has done this for a long time.

  • Use something besides linked lists for your buckets - the attack works because inserting N items into a linked list has N^2 growth. If you use a balanced tree or some other structure with N log N growth when inserting, the problem should be mitigated. This may sacrifice some best/average-case performance in order to limit how bad the worst case is.
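The first option can be illustrated with a toy Python sketch (not Perl's or Java's actual scheme): mix a secret per-process seed into an FNV-1a-style hash, so the attacker can no longer precompute collisions offline.

```python
import random

FNV_PRIME = 16777619

def seeded_hash(s: str, seed: int) -> int:
    """FNV-1a-style 32-bit hash using a secret seed as the offset
    basis. Toy sketch only; production fixes use keyed hashes
    such as SipHash."""
    h = seed & 0xFFFFFFFF
    for byte in s.encode():
        h = ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFF
    return h

seed = random.getrandbits(32)   # chosen once per process, kept secret

# Keys engineered to collide under Java's hash no longer all collide:
keys = ["AaAa", "AaBB", "BBAa", "BBBB"]
hashes = {seeded_hash(k, seed) for k in keys}
```

Note that simply seeding Java's polynomial hash with a random initial value would not help: for equal-length strings the seed term cancels out and the engineered collisions survive, which is why the mixing has to involve the seed at every step.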

Lien answered 29/12, 2011 at 16:44 Comment(5)
Thanks Peter. My question was more from the perspective of Java default hash structures. They use static hash function. I guess I have to override hashing part for now to safeguard against this vulnerabilityLupulin
The fact that Java doesn't randomize its hashes is not a problem. It's the fact that it isn't done by servlet containers when storing post parameters in maps that is a problem, because it makes it easy to make an efficient DoS attack without needing much bandwidth. You don't need to change anything in your Java program, unless you're implementing a web server which stores POST parameters in a HashMap.Natiha
@JBNizet I agree for this particular case; it is the most easily exploitable. But let's say a former developer of a public website who knows the internals of the system can always attack the weak hashing methods. At least he can slow the system down, because he knows which function is being used. Please pardon me for assuming evil developers :)Lupulin
@JB Nizet - If you are being properly paranoid, you need to be careful about any Hash structure that stores user controllable data, not just POST parameters. There seems little downside to always using a "safe" hashing algorithm. If performance is super important, then you could explicitly use an unsafe algo. Or, we could use a Hash structure that switches to a safe algorithm when we notice exploitative keys being inserted.Lien
The problem is not strictly that linked-list resolution is used. Hash collisions will affect other hash table implementations, e.g. those not using separate chaining, just as easily. The problem is that the same hash is generated, which defeats the point of spreading keys over the available buckets - any open addressing scheme which only uses the initial hash value will have the same degenerate issue. (Without the use of additional/internal hash functions, I don't know of a way an "O(1)" hash implementation could avoid this attack.)Hipolitohipp

The affected versions are Apache Tomcat <= 5.5.34, <= 6.0.34, and <= 7.0.22, as per the link you provided. The page lists Apache Tomcat >= 5.5.35, >= 6.0.35, and >= 7.0.23 as fixed versions.

Ayeaye answered 29/12, 2011 at 16:14 Comment(0)

Here's some Python code to generate those keys. I haven't tested it yet, but would be interested in feedback:

#!/usr/bin/python
import sys

alphabet = ["Aa", "BB"]

def emit_keys(prefix, length):
    # Recursively build every `length`-block concatenation of the
    # colliding seed strings and print it as an empty POST parameter.
    if length != 0:
        for block in alphabet:
            emit_keys(prefix + block, length - 1)
    else:
        sys.stdout.write(prefix + "=&")

for n in range(1, 16):
    emit_keys("", n)   # keys of 1 to 15 blocks: 2 + 4 + ... + 2**15 keys
Caernarvonshire answered 29/5, 2012 at 17:31 Comment(0)

Java's HashMap/Hashtable performs a 'resize' operation when the number of filled entries reaches a threshold, so you cannot assume a fixed number of buckets is waiting for you. Selecting a bucket takes two steps: first, compute the hash value of the specified key; second, take the remainder of that hash with the total bucket count (a size that changes with each 'resize').
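Since HashMap keeps a power-of-two table, the bucket is effectively selected as hash & (capacity - 1), so the same key can land in a different bucket after a resize. A small sketch (2260 is the supplemental hash of the hashCode 2112 shared by "Aa"/"BB", per the comments above):

```python
def bucket_index(h: int, capacity: int) -> int:
    """HashMap-style bucket selection: capacity is a power of two,
    so h & (capacity - 1) equals h % capacity."""
    return h & (capacity - 1)

h = 2260                 # supplemental hash of 2112 ("Aa"/"BB")
bucket_index(h, 16)      # index 4 in a 16-bucket table
bucket_index(h, 32)      # index 20 after resizing to 32 buckets
```

Resizing moves entries between buckets, but it does not help against this attack: keys with identical hash values still share one bucket at every capacity.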

Hydrazine answered 6/1, 2012 at 6:16 Comment(0)
