Is it possible to have same hashcode for different strings using java's hashcode function?or if it is possible then what is the % of its possibility?
A Java hash code is 32bits. The number of possible strings it hashes is infinite.
So yes, there will be collisions. The percentage is meaningless - there is an infinite number of items (strings) and a finite number of possible hashes.
char
array and arrays in Java (using the standard JVM) have a maximum size. Therefore the number of possible strings is not infinite. –
Doering YES. A lot.
Look at following pair
- "FB" and "Ea"
can return same hash code even though the characters in it are not same.
Basically it is the sum of characters in a string multiplied by an integer.
String
hashCode
Javadocs define the algorithm used to generate the hash code for a string: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
. If you use this algorithm for both "FB" and "Ea", the result is 2236 for each. –
Ible if it is possible then what is the % of its possibility?
That is not a particularly meaningful question.
However, unless there is some systemic bias in the String::hashcode
function or the way that you are generating the String
objects, the probability that any two randomly generated String
objects will have the same hash code will be 1 in 232.
This assumes that the Strings are chosen randomly from the set of all possible String values. If you restrict the set in various ways, the probability will vary from the above number.
(For instance, the existence of the "FB" / "Ea" collision means that the probability of a collision in the set of all 2 letter strings is higher than the norm. And we could chose a set such that the probability for a hash collision for non-equal strings is zero. Trivially the set "A" and "B".)
Another thing to note is that the chance of 232 different strings chosen at random (from a much larger unbiased set of strings) having no hash collisions is vanishingly small. To understand why, read the Wikipedia page on the Birthday Paradox.
In reality, the only way you are going to get no hash collisions in a set of 232 different strings is if you select or generate the strings in a way that is designed to avoid this. Even forming the set by selecting randomly generated strings and rejecting colliders is going to be computationally expensive. To produce such a set efficiently, you would need to exploit the properties of the String::hashCode
algorithm, which (fortunately) is specified.
Yes, it is entirely possible. The probability of a string (or some other object type -- just assuming you'll be using strings in this example) having the same hashcode as some other string in a collection, depends on the size of that collection (assuming that all strings in that collection are unique). The probabilities are distributed as follows:
- With a set of size ~9,000, you'll have a 1% chance of two strings colliding with a hash in the set
- With a set of size ~30,000, you'll have a 10% chance of two strings colliding with a hash in the set
- With a set of size ~77,000, you'll have a 50% chance of two strings colliding with a hash in the set
The assumptions made, are:
- The hashCode function has no bias
- Each string in the aforementioned set is unique
This site explains it clearly: http://eclipsesource.com/blogs/2012/09/04/the-3-things-you-should-know-about-hashcode/ (Look at "the second thing you should know")
This wouldn't directly answer your question, but I hope it helps.
The below is from the source code of java.lang.String
.
/**
* Returns a hash code for this string. The hash code for a
* <code>String</code> object is computed as
* <blockquote><pre>
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
* </pre></blockquote>
* using <code>int</code> arithmetic, where <code>s[i]</code> is the
* <i>i</i>th character of the string, <code>n</code> is the length of
* the string, and <code>^</code> indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
int h = hash;
int len = count;
if (h == 0 && len > 0) {
int off = offset;
char val[] = value;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
Yes, it is possible for two Strings to have the same hashcode. If you take a look at the now-deleted Wikipedia article, you will see that both "FB"
and "Ea"
have the same hashcode. There is nothing in the method contract saying a hashCode()
should be used to compare for equality, you want to use equals()
for that.
Since Java 1.2, String implements hashCode()
by using a product sum algorithm over the entire text of the string.
Yes, by definition of the pigeon-hole concept, two different strings can produce the same hashcode and code should always be written to cater for such conditions (typically, by not breaking.)
The percentage of collisions for random strings should be minimal. However, if you hash strings from external sources, an attacker could easily create hundreds of thousands of strings having the same hashcode. In a java HashMap these would all map to the same bucket and effectively turn the map into a linked list. Access times to the map would then be proportional to the map size instead of constant, leading to a denial of service attack.
See this page on Effective DoS attacks against Web Application Plattforms for further information links to the presentation.
Yes this is possible, because one of the contract between equals() & hashCode() method of Object class is.......... If two object are not equal according to equals() method then there is no guaranty that their hashCode will be same, the hashCode may/may not be equal. i.e, if obj1.equals(obj2) return false then obj1.hashCode()==obj2.hashCode() may/may not return true. Example:
String str1 = "FB";
String str2 = "Ea";
System.out.println(str1.equals(str2));// false
System.out.println(str1.hashCode() == str2.hashCode()); // true
//You can run the below code with -Xmx2100m and can get multiple results, enough to fill your console
`
import java.util.HashMap;
public class TestHashCollision {
public static void main(String[] args) {
final String TEXT = "was stored earlier had the same hash as";
HashMap<Integer,String> hs=new HashMap<>();
long t1=System.currentTimeMillis();
long t2=System.currentTimeMillis();
for(long l=0;l<Long.MAX_VALUE;l++) {
String key="d"+l;
if(hs.containsKey(key.hashCode())) {
System.out.println("'"+hs.get(key.hashCode())+"' "+TEXT+" '"+key+"'");//System.exit(0);
} else {
hs.put(key.hashCode(),key);
}
t2=System.currentTimeMillis();
if(t2-t1>10000) {
t1=System.currentTimeMillis();
System.out.println("10 seconds gone! size is:"+hs.size());
}
}
System.out.println("Done");
}
}
`
Yes(not just in Java, it applies to any language), it can produce the same hash-code for different strings. I am recalling a rule taught by my professor, it might be useful here -
Two same strings/value must have the same hashcode, but the converse is not true.
example in python
>>> hash('same-string')
-5833666992484370527
>>> hash('same-string')
-5833666992484370527
There might be another string which can match the same hash-code, so we can't derive the key using hash-code.
The reason for two different string to have the same hash-code is due to the collision.
"tensada".hashCode()
"friabili".hashCode());
The java hash function return equal values here.
© 2022 - 2024 — McMap. All rights reserved.