A hash set works with "buckets". It stores each value in a bucket chosen from that value's hash code. A bucket can hold several members when different values happen to land in the same bucket; within a bucket, the set uses the equals(Object) method to tell members apart.
So let's say we construct a hash set with 10 buckets, for argument's sake, and add the integers 1, 2, 3, 5, 7, 11 and 13 to it. The hash code for an int is just the int. We end up with something like this:
- (empty)
- 1, 11
- 2
- 3, 13
- (empty)
- 5
- (empty)
- 7
- (empty)
- (empty)
The typical way to use a set is to check whether a member is in it. So when we ask, "Is 11 in this set?" the hash set will take 11 modulo 10, get 1, and look in the 2nd bucket (we're numbering our buckets from 0, of course).
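To make the bucket idea concrete, here is a minimal sketch of a toy set with 10 buckets. ToyHashSet and its methods are made up for illustration; the real java.util.HashSet sizes its table and computes the bucket index differently, so treat this as a model of the explanation rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: java.util.HashSet uses a different table size
// and a different bucket-index calculation.
class ToyHashSet {
    private final List<List<Object>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Pick a bucket from the hash code; floorMod keeps the index non-negative.
    int bucketIndex(Object value) {
        return Math.floorMod(value.hashCode(), buckets.size());
    }

    // Only one bucket is searched, and equals(Object) decides the match.
    boolean contains(Object value) {
        for (Object member : buckets.get(bucketIndex(value))) {
            if (member.equals(value)) {
                return true;
            }
        }
        return false;
    }

    // Returns false and leaves the set unchanged if an equal member exists.
    boolean add(Object value) {
        if (contains(value)) {
            return false;
        }
        buckets.get(bucketIndex(value)).add(value);
        return true;
    }

    List<Object> bucket(int index) {
        return buckets.get(index);
    }

    public static void main(String[] args) {
        ToyHashSet set = new ToyHashSet(10);
        for (int n : new int[] {1, 2, 3, 5, 7, 11, 13}) {
            set.add(n);
        }
        System.out.println(set.bucketIndex(11)); // 1
        System.out.println(set.bucket(1));       // [1, 11]
        System.out.println(set.contains(11));    // true
        System.out.println(set.add(11));         // false, 11 is already there
    }
}
```

Looking up 11 only ever touches bucket 1, which is why membership checks stay fast no matter how many members the other buckets hold.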
This makes it really, really fast to see whether a member belongs to a set, because only one bucket has to be searched. If we add another 11, the hash set looks to see if it's already there. It won't add it again if it is. It uses the equals(Object) method to determine that, and of course, 11 is equal to 11.
The hash code for a string like "abc" depends on the characters in that string. When you add a duplicate string, "abc", the hash set will look in the right bucket, and then use the equals(Object) method to see if the member is already there. The equals(Object) method for String also depends on the characters, so "abc" equals "abc".
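A short sketch of the String case, using two distinct String objects with the same characters. The class and variable names are only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class StringDuplicateDemo {
    public static void main(String[] args) {
        String first = "abc";
        String second = new String("abc"); // a distinct object, same characters

        System.out.println(first == second);                       // false
        System.out.println(first.equals(second));                  // true
        System.out.println(first.hashCode() == second.hashCode()); // true

        Set<String> set = new HashSet<>();
        System.out.println(set.add(first));  // true
        System.out.println(set.add(second)); // false, treated as a duplicate
        System.out.println(set.size());      // 1
    }
}
```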
When you use a StringBuffer, though, each StringBuffer has a hash code, and equality, based on object identity rather than on its contents. It doesn't override the basic equals(Object) and hashCode() methods it inherits from Object, so every StringBuffer looks to the hash set like a different object. They're not actually duplicates.
When you print the StringBuffers to the output, you're calling the toString() method on the StringBuffers. That makes them look like duplicate strings, which is why you're seeing that output.
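The following sketch reproduces the effect with two buffers holding the same text. The variable names are made up here and will not match the code in the question.

```java
import java.util.HashSet;
import java.util.Set;

class StringBufferSetDemo {
    public static void main(String[] args) {
        StringBuffer a = new StringBuffer("abc");
        StringBuffer b = new StringBuffer("abc");

        // Identity-based equality: same contents, different objects.
        System.out.println(a.equals(b));                       // false
        System.out.println(a.toString().equals(b.toString())); // true

        Set<StringBuffer> buffers = new HashSet<>();
        buffers.add(a);
        buffers.add(b);
        System.out.println(buffers.size()); // 2
        System.out.println(buffers);        // [abc, abc]

        // Converting to String first gives content-based behavior.
        Set<String> strings = new HashSet<>();
        strings.add(a.toString());
        strings.add(b.toString());
        System.out.println(strings.size()); // 1
    }
}
```

If the goal is a set of distinct texts, storing the toString() results (or plain Strings from the start) is one way to get it.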
This is also why it's very important to override hashCode() if you override equals(Object). Otherwise the set may look in the wrong bucket, and you get some very odd and unpredictable behavior!
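As a sketch of what goes wrong, compare two hypothetical classes: BrokenPoint overrides only equals(Object), while Point overrides both. Both classes are invented for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {
    // Overrides equals only; hashCode() is still the identity-based default.
    static class BrokenPoint {
        final int x;
        final int y;

        BrokenPoint(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BrokenPoint
                    && ((BrokenPoint) o).x == x
                    && ((BrokenPoint) o).y == y;
        }
    }

    // Overrides both, so equal points always share a hash code.
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Point
                    && ((Point) o).x == x
                    && ((Point) o).y == y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    public static void main(String[] args) {
        Set<Point> good = new HashSet<>();
        good.add(new Point(1, 2));
        System.out.println(good.contains(new Point(1, 2))); // true

        Set<BrokenPoint> broken = new HashSet<>();
        broken.add(new BrokenPoint(1, 2));
        // Almost always false: the equal object has a different hash code,
        // so the set searches a bucket that does not hold the stored point.
        System.out.println(broken.contains(new BrokenPoint(1, 2)));
    }
}
```

The second lookup is not guaranteed to fail, since two identity hash codes can occasionally land in the same bucket, which is exactly the "odd and unpredictable" part.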