Java profiling: java.lang.Object.hashCode takes half of the CPU time but never explictly called

Asked 26/6, 2010 at 16:51 Answered 7/12, 2010 at 7:34

Solved java profiling benchmarking hashcode hprof

I have been benchmarked my multihreaded program using -agentlib:hprof=cpu=samples and was surprised to find the following line in the results:

rank   self  accum   count trace method
   1 52.88% 52.88%    8486 300050 java.lang.Object.hashCode

I never explicitly call hashCode() in my program. What can be the reason for this? How can I understand the source for this time "waste" and whether it is normal or not?

Thanks, David

Stedfast answered 26/6, 2010 at 16:51 Comment(1)

It would be nice if you were to explain why this is a problem in the first place. – Rompers 26/6, 2010 at 18:19

Most likely you're using very intensively a Map such as a HashMap.

HashMap used the hashCode to distribute the objects. If you're using many objects with this data structure, is very important your .equals and your .hashCode method are properly implemented.

See: Effective Java Item 8: Always override hashCode when you override equals

Oud answered 26/6, 2010 at 16:52 Comment(3)

Thanks! You are mot probably right. I can actually relinquish my use of the random access capabilities (is that how you call it?), and I don't care about the order of the objects. I just need to be able to add objects then iterate over all of them. Also, this is indeed a set (I don't need the same object more than once), but I will also never attempt to add it more than once... Should I use a list instead (although I don't care about the ordering)? What is the most efficient data structure for such a set? – Stedfast 26/6, 2010 at 17:3

It is indeed a HashSet. I now converted all to ArrayLists. Now this line disappears from the profiler but the total run time remained very similar. Strange. isn't it?! – Stedfast 26/6, 2010 at 17:10

@David B May, or may be not . You may be doing basically the same with a different data structure. You could probably think on a different algorithm, but it's hard to tell from here. :) – Oud 26/6, 2010 at 17:24

One thing you should do is to check out matching stack trace to see who is calling it; changes are it is indeed HashMap.

But beyond this, I have noticed that hprof tends to vasty overestimate calls to hashCode(); and I really would like to know how and why. This is based on actually knowing rough performance profile of code; and I have seen 50% percent cpu use (by sampling), where it is all but certain that it absolutely will not take that long. Implementation of hashCode() just returns an int field, and method is final (on final object). So it is basically a profiler artifact of some sort... just no idea how or why, or how to get rid of it.

Archiearchiepiscopacy answered 7/12, 2010 at 7:34 Comment(0)

You are mot probably right. I can actually relinquish my use of the random access capabilities (is that how you call it?), and I don't care about the order of the objects. I just need to be able to add objects then iterate over all of them. Also, this is indeed a set (I don't need the same object more than once), but I will also never attempt to add it more than once... Should I use a list instead (although I don't care about the ordering)? What is the most efficient data structure for such a set?

A HashSet is implemented as a HashMap that maps the key to itself, so switching to a HashSet won't make much difference, performance-wise.

The other alternatives are a TreeSet, or (assuming that your application will never try to insert a duplicate) one of the List classes. If your application is such that a List will work, then an ArrayList or LinkedList will be more efficient than either a HashSet or TreeSet.

However, there is something very fishy about your application spending 50% of its time in hashCode methods. Unless the hash tables are resized, the hashCode method should only be called once per set or map operation. So either there is a lot of map/set resizing going on, or you are doing huge numbers of set add operations. (AFAIK, the Object hashcode method is cheap, so the cost of each call should not be an issue.)

EDIT

Is nextInt() really expensive? Any alternatives?

No it is not expensive. Take a look at the code. The Random class (and the nextInt() method) does make use of an AtomicLong to make it thread-safe, and you might save a few cycles if you coded a non-thread-safe version. The source code is in your JDK installation directory ... take a look.

Comfortable answered 26/6, 2010 at 17:27 Comment(1)

There indeed many add operations. As mentioned, I switched to ArrayList but the total time just slightly improved. Now, the top ranking CPU time consumer is: rank self accum count trace method 1 19.30% 19.30% 2727 300122 java.util.Random.next I indeed call rand.nextInt() many times to get integers between 0 and ~ 10^7. Is nextInt() really expensive? Any alternatives? – Stedfast 26/6, 2010 at 17:43

Recommended topics

Hot tags