My python program executes faster than my java version of the same program. What gives?
R

20

16

Update: 2009-05-29

Thanks for all the suggestions and advice. Using them, I made my production code execute 2.5 times faster on average than my best result from a couple of days ago. In the end I was able to make the java code the fastest.

Lessons:

  • My example code below shows the insertion of primitive ints, but the production code is actually storing strings (my bad). When I corrected that, the python execution time went from 2.8 seconds to 9.6 seconds. So right off the bat, java was actually faster when storing objects.

  • But it doesn't stop there. I had been executing the java program as follows:

    java -Xmx1024m SpeedTest

But if you set the initial heap size as follows you get a huge improvement:

java -Xms1024m -Xmx1024m SpeedTest

This simple change reduced the execution time by more than 50%. So the final result for my SpeedTest is python 9.6 seconds. Java 6.5 seconds.

Original Question:

I had the following python code:

import time
import sys

def main(args):    
    iterations = 10000000
    counts = set()
    startTime = time.time();    
    for i in range(0, iterations):
        counts.add(i)
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
    main(sys.argv)

It executed in about 3.3 seconds on my machine, but I wanted to make it faster, so I decided to port it to java. I assumed that because java is compiled and is generally considered to be faster than python, I would see a big payoff.

Here is the java code:

import java.util.*;
class SpeedTest
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;
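        // pre-sized so the backing HashMap never needs to rehash (see the update below)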
        HashSet counts = new HashSet((2*iterations), 0.75f);

        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++)
        {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());
    }
}

So this java code does basically the same thing as the python code. But it executed in 8.3 seconds instead of 3.3.

I have extracted this simple example from a real-world program to simplify things. The critical element is that I have a collection (a set in python, a HashSet in java) that ends up with a lot of members, much like the example.

Here are my questions:

  1. How come my python implementation is faster than my java implementation?

  2. Is there a better data structure to use than the hashSet (java) to hold a unique collection?

  3. What would make the python implementation faster?

  4. What would make the java implementation faster?

UPDATE:

Thanks to all who have contributed so far. Please allow me to add some details.

I have not included my production code because it is quite complex and would generate a lot of distraction. The case I present above is as simplified as possible. By that I mean that the java add() call (a HashMap put under the hood) seems to be much slower than the python set's add().

The java implementation of the production code is also about 2.5 - 3 times slower than the python version -- just like the above.

I am not concerned about vm warmup or startup overhead. I just want to compare the code from my startTime to my totalTime. Please do not concern yourselves with other matters.

I initialized the hashset with more than enough buckets so that it should never have to rehash. (I will always know ahead of time how many elements the collection will ultimately contain.) I suppose one could argue that I should have initialized it to iterations/0.75. But if you try it you will see that execution time is not significantly impacted.
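
As a rough sketch (the class name here is just for illustration), sizing it exactly so that it holds all the elements without rehashing looks like this, since a HashMap resizes once its element count exceeds capacity * loadFactor:

import java.util.HashSet;

class CapacitySketch
{
    public static void main(String[] args)
    {
        int iterations = 10000000;
        // just enough capacity so that capacity * 0.75 >= iterations
        // (HashMap rounds this up to the next power of two internally)
        int capacity = (int) Math.ceil(iterations / 0.75);
        HashSet<Integer> counts = new HashSet<Integer>(capacity, 0.75f);
        System.out.println("requested capacity = " + capacity);
    }
}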

I set -Xmx1024m for those who were curious (my machine has 4GB of RAM).

I am using java version: Java(TM) SE Runtime Environment (build 1.6.0_13-b03).

In the production version I am storing strings (2-15 chars) in the HashSet, so I cannot use primitives, although that is an interesting case.

I have run the code many, many times. I have very high confidence that the python code is between 2.5 and 3 times faster than the java code.

Rasberry answered 27/5, 2009 at 22:48 Comment(12)
Oh man... must... resist... temptation... to... flamebait... answer... :-)Kitts
Your python program would also probably run faster if you used xrange() rather than range().Bessbessarabia
I wasn't aware that Java was generally considered to be faster than Python. In fact I remember seeing data indicating that (for a certain test, don't remember what it was though) Java was slower than Python. Certainly, if I wanted to speed up a Python program I'd rewrite it in C, not Java.Heptavalent
Are you quite alright, David? Written sensibly, Java performance should be within spitting distance of the equivalent C (although it will probably take far less time to write). Python (all known implementations) generally has to rely on better algorithms or on shipping work out to libraries written in other languages.Drupe
Your test is missing a key element: access time into the set. How many milliseconds do 1000 set.contains() calls take? For items that are in the set and items that aren't? Running the test locally in java I find that calling set.contains() 10000 times takes less than 20 milliseconds for both cases (hits and misses).Lizalizabeth
The fact that you can use libraries written in other languages (like C ...which is what CPython is written in) to speed up execution of some easy-to-read Python is one of Python's really fantastic features. Why would that be a bad thing?Picturize
Java's fantastic feature is that reasonable implementation are almost as fast as C with no additional effort.Drupe
You could also use the psyco module for python to speed up repetitive execution even more, sometimes by factors of thousands. psyco.sourceforge.netTali
It isn't entirely fair to say that java is compiled, but python is not. True, java has a linker and other static compilation/optimization goodies. Nevertheless, basic python execution DOES byte compile to *.pyc files, and it's the byte-compiled version that is executed. Java is not dissimilar in that javac produces VM-specific byte-compilation. On subsequent executions, the *.pyc is used directly if the *.py has not changed. You can also ask python to generate a *.pyo (optimized byte compilation) for further performance enhancements over subsequent executions.Kitts
Even if you run something many many times, if your individual Java methods are executed only a few times then they may be interpreted and not compiled. This is what people refer to as "warm-up" ... running methods enough times that they'll get compiled and won't just be interpreted.Muscadel
@Tom - You could say the very same thing for Python. I think the real question here is that if the Python implementation works faster, why not use it? ..or to really confuse things, you could try Jython (the Python implementation based on Java) :-)Picturize
I don't think 9 seconds is the final result for Python. Take a Peek at this post: #918859Emetine
H
22

You're not really testing Java vs. Python, you're testing java.util.HashSet using autoboxed Integers vs. Python's native set and integer handling.

Apparently, the Python side in this particular microbenchmark is indeed faster.

I tried replacing HashSet with TIntHashSet from GNU trove and achieved a speedup factor between 3 and 4, bringing Java slightly ahead of Python.

The real question is whether your example code is really as representative of your application code as you think. Have you run a profiler and determined that most of the CPU time is spent in putting a huge number of ints into a HashSet? If not, the example is irrelevant. Even if the only difference is that your production code stores other objects than ints, their creation and the computation of their hashcode could easily dominate the set insertion (and totally destroy Python's advantage in handling ints specially), making this whole question pointless.
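
For illustration, the Trove variant looks roughly like this (a sketch, assuming Trove 2.x, where TIntHashSet lives in the gnu.trove package; in Trove 3 it moved to gnu.trove.set.hash):

import gnu.trove.TIntHashSet;

class TroveSpeedTest
{
    public static void main(String[] args)
    {
        int iterations = 10000000;
        // TIntHashSet stores primitive ints directly, so no Integer boxing occurs
        TIntHashSet counts = new TIntHashSet(2 * iterations, 0.75f);

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++)
        {
            counts.add(i);
        }
        long totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = " + (totalTime / 1000f));
        System.out.println(counts.size());
    }
}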

Heliograph answered 28/5, 2009 at 9:26 Comment(0)
M
12

I suspect the reason is that Python uses the integer value itself as its hash, and the hashtable-based implementation of set uses that value directly. From the comments in the source:

This isn't necessarily bad! To the contrary, in a table of size 2**i, taking the low-order i bits as the initial table index is extremely fast, and there are no collisions at all for dicts indexed by a contiguous range of ints. The same is approximately true when keys are "consecutive" strings. So this gives better-than-random behavior in common cases, and that's very desirable.

This microbenchmark is somewhat of a best case for Python because it results in exactly zero hash collisions. Whereas, if Java's HashSet is rehashing the keys, it has to perform additional work and also gets much worse behavior with collisions.

If you store range(iterations) in a temporary variable and do a random.shuffle on it before the loop, the runtime is more than 2x slower, even though the shuffle and list creation are done outside the timed loop.
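
A rough Python sketch of that experiment (the list is built and shuffled outside the timed region, so only the insertions are measured):

import random
import time

iterations = 10000000
keys = range(iterations)   # list of consecutive ints
random.shuffle(keys)       # destroy the contiguous-range best case

counts = set()
startTime = time.time()
for i in keys:
    counts.add(i)
print 'total time =', time.time() - startTime
print len(counts)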

Mealworm answered 28/5, 2009 at 8:46 Comment(1)
+1 I'm pretty sure this is exactly what makes Python faster here.Heliograph
K
7

It has generally been my experience that python programs run faster than java programs, despite the fact that java is a somewhat "lower level" language. Incidentally, both languages are compiled into byte code (that's what those .pyc files are -- you can think of them as something like .class files). Both languages are byte-code interpreted on a virtual stack machine.

You would expect python to be slower at things like attribute access, for example a.b. In java, that a.b will resolve into a dereference. Python, on the other hand, has to do one or more hash table lookups: check the local scope, check the module scope, check the global scope, check builtins.
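
One way to see that dynamism is to disassemble a trivial attribute access with the dis module (a Python 2 sketch; the names a and b are made up for illustration):

import dis

class A(object):
    b = 42

a = A()

def f():
    return a.b   # both 'a' and 'b' are resolved by dictionary lookups at run time

dis.dis(f)
# prints, roughly:
#   LOAD_GLOBAL   0 (a)
#   LOAD_ATTR     1 (b)
#   RETURN_VALUE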

On the other hand, java is notoriously bad at certain operations such as object creation (which is probably the culprit in your example) and serialization.

In summary, there's no simple answer. I wouldn't expect either language to be faster for all code examples.

Correction: several people have pointed out that java isn't so bad at object creation any more. So, in your example, it's something else. Perhaps it's autoboxing that's expensive, perhaps python's default hashing algorithm is better in this case. In my practical experience, when I rewrite java code in python, I always see a performance increase, but that could be as much due to the rewrite itself as to the language, since rewriting in general tends to lead to performance improvements.

Kenway answered 27/5, 2009 at 23:4 Comment(7)
bad object creation is no longer a huge issue after java 1.5. while its still the "slowest" its way faster now. serialization still does suckSumerian
Dross. For the most part Java gets compiled into optimised machine code, which Python generally does not. Object creation is incredibly fast in Java (faster than pretty much any C++ implementation). (Particularly on Azul hardware!)Drupe
Actually, Java used to be bad at object creation, but hasn't been for several years now. This bit of old information lingers on, despite the fact that it's so untrue today.Muscadel
@Tom Hawtin, Faster than pretty much any C++ implementation? Really I would like to see a benchmark of this.Heptode
Still, Java undeniably has to create objects for all the ints in this case. If Python manages to avoid that by having a special case for an all-int set, it probably explains the difference.Heliograph
@grepsedawk, Tom said "Object Creation" is faster. See here: en.wikipedia.org/wiki/Java_performance#Program_speedDetective
@Detective looking at the wikipedia reference #52 to dr dobbs journal. They are using an older glibc which has a potentially slower general purpose allocator than newer glibc implementations. Not to mention using a custom allocator for object creation will show you the real reason why tuned C++ code always beats Java. But maybe that is what Tom meant by "pretty much" even though custom allocators are very common in C++. Example : Boost.PoolHeptode
T
7

Another possible explanation is that sets in Python are implemented natively in C code, while HashSets in Java are implemented in Java itself. So, sets in Python should be inherently much faster.

Trip answered 27/5, 2009 at 23:6 Comment(3)
Why would C be much faster than Java?? There might be implementation differences, for instance a common one is to shove small integers into references rather than actually have an object.Drupe
Tom Hawtin - Because for most cases C IS faster than Java?Heptode
"for most cases C IS faster than Java" --- is a thing that most Java programmers had started to belive false since 1.5.Dalhousie
S
6

I'd like to dispel a couple of myths I saw in the answers:

Java is compiled, yes, to bytecode but ultimately to native code in most runtime environments. People who say C is inherently faster aren't telling the entire story; I could make a case that byte-compiled languages are inherently faster because the JIT compiler can make machine-specific optimizations that are unavailable to way-ahead-of-time compilers.

A number of things that could make the differences are:

  • Python's hash tables and sets are the most heavily optimized objects in Python, and Python's hash function is designed to return similar results for similar inputs: hashing an integer just returns the integer, guaranteeing that you will NEVER see a collision in a hash table of consecutive integers in Python.
  • A secondary effect of the above is that the Python code will have high locality of reference as you'll be accessing the hash table in sequence.
  • Java does some fancy boxing and unboxing of integers when you add them to collections. On the bonus side, this makes arithmetic way faster in Java than Python (as long as you stay away from bignums) but on the downside it means more allocations than you're used to.
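
A small sketch of the boxing point in the last bullet: adding an int to a collection goes through Integer.valueOf(), which by default only caches a narrow range of values, so a loop over millions of distinct ints allocates millions of Integer objects:

class BoxingSketch
{
    public static void main(String[] args)
    {
        // the cache is guaranteed for -128..127; outside it, each boxing
        // operation typically creates a fresh Integer object
        System.out.println(Integer.valueOf(100) == Integer.valueOf(100));   // true (cached)
        System.out.println(Integer.valueOf(1000) == Integer.valueOf(1000)); // usually false (two objects)
    }
}
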
Sabbatarian answered 28/5, 2009 at 9:53 Comment(2)
I'm pretty sure compilers like gcc have no trouble making machine specific optimizations. Look up the march flag. Maybe you mean something else though like profile guided optimization? But that is also available in gcc.Heptode
But when you compile a program using GCC you typically don't enable all possible features, because some of your users will probably be using old CPUs that don't have those features. A JIT only has to produce machine code for one machine, and it knows exactly which CPU features it supports.Proclaim
M
5

Edit: A TreeSet might be faster for the real use case, depending on allocation patterns. My comments below deal only with this simplified scenario. However, I do not believe that it would make a very significant difference. The real issue lies elsewhere.

Several people here have recommended replacing the HashSet with a TreeSet. This sounds like very strange advice to me, since there's no way that a data structure with O(log n) insertion time is faster than an O(1) structure that preallocates enough buckets to store all the elements.

Here's some code to benchmark this:

import java.util.*;
class SpeedTest
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;
        Set counts;

        System.out.println("HashSet:");
        counts = new HashSet((2*iterations), 0.75f);
        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++) {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());

        counts.clear();

        System.out.println("TreeSet:");
        counts = new TreeSet();
        startTime = System.currentTimeMillis();
        for(int i=0; i<iterations; i++) {
            counts.add(i);
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
        System.out.println(counts.size());
    }
}

And here's the result on my machine:

$ java -Xmx1024M SpeedTest
HashSet:
TOTAL TIME = 4.436
10000000
TreeSet:
TOTAL TIME = 8.163
10000000

Several people also argued that boxing isn't a performance issue and that object creation is inexpensive. While it's true that object creation is fast, it's definitely not as fast as primitives:

import java.util.*;
class SpeedTest2
{    
    public static void main(String[] args)
    {        
        long startTime;
        long totalTime;
        int iterations = 10000000;

        System.out.println("primitives:");
        startTime = System.currentTimeMillis();
        int[] primitive = new int[iterations];
        for (int i = 0; i < iterations; i++) {
            primitive[i] = i;
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );

        System.out.println("primitives:");
        startTime = System.currentTimeMillis();
        Integer[] boxed = new Integer[iterations];
        for (int i = 0; i < iterations; i++) {
            boxed[i] = i;
        }
        totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = "+( totalTime/1000f) );
    }
}

Result:

$ java -Xmx1024M SpeedTest2
primitives:
TOTAL TIME = 0.058
boxed:
TOTAL TIME = 1.402

Moreover, creating a lot of objects results in additional overhead from the garbage collector. This becomes significant when you start keeping tens of millions of live objects in memory.

Metrist answered 27/5, 2009 at 23:33 Comment(5)
I said TreeSet may be better on the number of objects score. I did not say that it would be faster.Drupe
Tom: It wasn't directed towards you. Didn't even see your post before I posted this. :)Metrist
In my experience Hash* is slower with general use than Tree*. This benchmark is very good for Hash*, since all your objects are nicely even distributed. But in the real world you'll get clumping, which would lead to sub O(log n) performance.Sumerian
Heh, I skipped over the two previous answers mentioning TreeSet. I only though of it as a bit of an after thought.Drupe
Pyro: We don't know anything about the actual problem though, so it's hard to make a good recommendation about the choice of data structure. You make a good point, but your post didn't clarify that you meant general purpose usage rather than this very case.Metrist
G
4

I find benchmarks like this to be meaningless. I don't solve problems that look like the test case. It's not terribly interesting.

I'd much rather see a solution for a meaningful linear algebra solution using NumPy and JAMA. Maybe I'll try it and report back with results.

Garrettgarrick answered 27/5, 2009 at 23:20 Comment(4)
You're right, benchmarks like this don't prove a lot when there are so many other factors. That makes your second point just as moot though; just because it works for one situation doesn't mean it's going to be any better for others.Picturize
Benchmarking doing something sensible does make sense. Benchmarking nonsense, less so. (I don't know about NumPy and JAMA. Are they wrappers over C code? :)Drupe
@Jon Cage - It's not moot if I'm interested in what I learn and I use problems that are representative of what I'd really do with the libraries.Garrettgarrick
@Tom Hawtin - JAMA is pure Java. NumPy has an API for integrating with C, but I believe it's pure Python.Garrettgarrick
L
3

I'm not too familiar with python, but I do know HashSet can't contain primitives, so when you say counts.add(i) the i there is getting autoboxed into a new Integer(i) call. That's probably your problem.

If for some reason you really needed a 'set' of integers between 0 and some large n, it's probably best declared as a 'boolean[] set = new boolean[n]'. Then you could go through the array and mark items that are in the set as 'true' without incurring the overhead of creating n Integer wrapper objects. If you wanted to go further than that you could use a byte[] of size n/8 and use the individual bits directly. Or perhaps BigInteger.
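
A minimal sketch of the boolean[] idea, assuming you only need membership tests for ints in the range [0, n):

class BooleanSetSketch
{
    public static void main(String[] args)
    {
        int n = 10000000;
        boolean[] set = new boolean[n];   // all false initially

        long startTime = System.currentTimeMillis();
        for (int i = 0; i < n; i++)
        {
            set[i] = true;                // "add" i to the set; no boxing at all
        }
        long totalTime = System.currentTimeMillis() - startTime;

        // count the members to mimic counts.size()
        int size = 0;
        for (boolean b : set)
        {
            if (b) size++;
        }
        System.out.println("TOTAL TIME = " + (totalTime / 1000f));
        System.out.println(size);
    }
}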

EDIT

Stop voting my answer up. It's wrong.

EDIT

No really, it's wrong. I get comparable performance if I do what the question suggests and populate the set with N Integers. If I replace the contents of the for loop with this:

    Integer[] ints = new Integer[N];
    for (int i = 0; i < N; ++i) {
        ints[i] = i;
    }

Then it only takes 2 seconds. If you don't store the Integer at all then it takes less than 200 millis. Forcing the allocation of 10000000 Integer objects does take some time, but it looks like most of the time is spent inside the HashSet put operation.

Lizalizabeth answered 27/5, 2009 at 22:54 Comment(2)
I thought that too. But is this different for Python?Oversexed
I thought exactly the same thing until, like you, I tried it.Muscadel
D
3

There's a number of issues here which I'd like to bring together.

First, if it's a program that you are only going to run once, does it matter if it takes an extra few seconds?

Secondly, this is just one microbenchmark. Microbenchmarks are pointless for comparing performance.

Startup has a number of issues.

The Java runtime is much bigger than Python's, so it takes longer to load from disk and takes up more memory, which may be important if you are swapping.

If you haven't set -Xms you may be running the GC only to resize the heap. Might as well have the heap properly sized at the start.

It is true that Java starts off interpreting and then compiles. Around 1,500 iterations for Sun client [C1] Hotspot and 10,000 for server [C2]. Server Hotspot will give you better performance eventually, but takes more memory. We may see client Hotspot use server compilation for very frequently executed code for the best of both worlds. However, this should not usually be a question of seconds.

Most importantly you may be creating two objects per iteration. For most code, you wouldn't be creating these tiny objects for such a proportion of the execution. TreeSet may be better on the number-of-objects score, with 6u14 and Harmony getting even better.

Python may possibly be winning by storing small integer values in references instead of actually having an object. That is undoubtedly a good optimisation.

A problem with a lot of benchmarks is that you are mixing a lot of different code up in one method. You wouldn't write code you cared about like that, would you? So why are you attempting to performance-test something which is unlike the code you would actually like to run fast?

Better data structure: Something like BitSet would seem to make sense (although that has synchronisation on it, which may or may not impact performance).

Drupe answered 27/5, 2009 at 23:21 Comment(0)
T
2

You need to run it multiple times to get a real idea of "how fast" each runs. The JVM startup time [for one] is adding to the single running time of the Java version.

You're also creating a HashSet with a large initial capacity, which means the backing HashMap will be created with that many available slots, unlike the Python version, where you create a basic set. Hard to tell if that would hinder though, as when your HashSet grows it will have to rehash the stored objects into a larger table.

Trinatrinal answered 27/5, 2009 at 22:53 Comment(5)
As the OP measures the time spent in the loop the JVM startup time does not applyFein
Not necessarily, if you're writing a fast response tool that can start processing data as soon as it starts then you want it to start fast.Summertime
"warm up time" if you prefer that phrase.Drupe
I don't even get what you're trying to say with this anymore. As others have mentioned, the startup time doesn't count, which renders your first statement pretty much incorrect. As for the second, the preallocation of storage has to be an advantage for java in this case, so what's your point?Metrist
I agree my initial assessment was hasty, but not completely incorrect. The JVM will optimize the compiled code in some cases (the -server flag for instance) and it's been shown countless times that the first run (or more) through an app can take longer to execute some sections of code than later runs.Trinatrinal
S
2

Are you using the -server flag with the jvm? You can't test for performance without it. (You also have to warm up the jvm before doing the test.)

Also, you probably want to use TreeSet<Integer>. HashSet will be slower in the long run.

And which jvm are you using? The newest I hope.

EDIT

When I say use TreeSet, I mean in general, not for this benchmark. TreeSet handles the real-world issue of non-even hashing of objects. If you get too many objects in the same bin in a HashSet, you will perform at about O(n).

Sumerian answered 27/5, 2009 at 22:56 Comment(3)
He initializes startTime after the creation of the HashSet.Tannin
Why would TreeSet be faster than HashSet? HashSet does resize (although that is not necessary in this case, as it is given an oversized capacity (which can reduce performance due to poor cache locality)).Drupe
So don't use a bad hash function. Good hash function and you remain at O(1).Drupe
G
2

If you really want to store primitive types in a set, and do heavy work on it, roll your own set in Java. The generic classes are not fast enough for scientific computing.

As Ants Aasma mentions, Python bypasses hashing and uses the integer directly. Java creates an Integer object (autoboxing), and then casts it to an Object (in your implementation). This object must be hashed, as well, for use in a hash set.

For a fun comparison, try this:

Java

import java.util.HashSet;
class SpeedTest
{ 
  public static class Element {
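    // note: no hashCode()/equals() override, so the default identity-based hash is used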
    private int m_i;
    public Element(int i) {
      m_i = i;
    }
  }

  public static void main(String[] args)
  {        
    long startTime;
    long totalTime;
    int iterations = 1000000;
    HashSet<Element> counts = new HashSet<Element>((int)(2*iterations), 0.75f);

    startTime = System.currentTimeMillis();
    for(int i=0; i<iterations; ++i)
    {
      counts.add(new Element(i));
    }
    totalTime = System.currentTimeMillis() - startTime;
    System.out.println("TOTAL TIME = "+( totalTime/1000f) );
    System.out.println(counts.size());
  }
}

Results:

$java SpeedTest
TOTAL TIME = 3.028
1000000

$java -Xmx1G -Xms1G SpeedTest
TOTAL TIME = 0.578
1000000

Python

#!/usr/bin/python
import time
import sys

class Element(object):
  def __init__(self, i):
    self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    startTime = time.time();    
    for i in range(0, iterations):
        counts.add(Element(i))
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)


if __name__ == "__main__":
  main(sys.argv)

Results:

$./speedtest.py 
total time = 20.6943161488
1000000

How's that for 'python is faster than java'?

Gratis answered 28/5, 2009 at 9:30 Comment(0)
M
1

How much memory did you start the JVM with? It depends! When I run the JVM with your program with 1 gig of RAM:

$ java -Xmx1024M -Xms1024M -classpath . SpeedTest 
TOTAL TIME = 5.682
10000000
$ python speedtest.py 
total time = 4.48310899734
10000000

If I run the JVM with less memory, it takes longer ... considerably longer:

$ java -Xmx768M -Xms768M -classpath . SpeedTest 
TOTAL TIME = 6.706
10000000
$ java -Xmx600M -Xms600M -classpath . SpeedTest 
TOTAL TIME = 14.086
10000000

I think the HashSet is the performance bottleneck in this particular instance. If I replace the HashSet with a LinkedList, the program gets substantially faster.

Finally -- note that Java programs are initially interpreted and only those methods that are called many times are compiled. Thus, you're probably comparing Python to Java's interpreter, not the compiler.

Muscadel answered 27/5, 2009 at 23:14 Comment(6)
That's curious. The microbenchmark doesn't appear to create much reclaimable garbage, and you are setting -Xms. I assume you are not swapping.Drupe
A gig of RAM? Dedicated to running the JVM on startup? Wow... ummmm... wow. VMWare doesn't even need that much to virtualize WinXP with good speed. Wow.Kitts
It's only about 100 bytes per stored object, although I would hope it actually requires somewhat less.Drupe
I assume the memory's affect on speed has something to do with the performance of the HashSet.Muscadel
Note: When I start the JVM with 1/2 Gig, it gets an OOM error when running. I didn't try to see how much RAM the Python implementation takes when it runs (although I'd imagine it's less).Muscadel
As you said yourself, the bottleneck is HashSet.add() - and the JIT gets plenty of opportunities to compile that. That's definitely not the problem here.Heliograph
P
1

Just a stab in the dark here, but some optimizations that Python is making that Java probably isn't:

  • The range() call in Python is creating all 10000000 integer objects at once, in optimized C code. Java must create an Integer object each iteration, which may be slower.
  • In Python, ints are immutable, so you can just store a reference to a global "42", for example, rather than allocating a slot for the object. I'm not sure how Java boxed Integer objects compare.
  • Many of the built-in Python algorithms and data structures are rather heavily optimized for special cases. For instance, the hash function for integers is simply the identity function (see the quick check below). If Java is using a more "clever" hash function, this could slow things down quite a bit. If most of your time is spent in data structure code, I wouldn't be surprised at all to see Python beat Java given the amount of effort that has been spent over the years hand-tuning the Python C implementation.
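
A quick check of that last point in a CPython 2 interpreter:

>>> hash(42)
42
>>> [hash(i) for i in range(5)]
[0, 1, 2, 3, 4]
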
Predilection answered 28/5, 2009 at 1:53 Comment(3)
Integers are immutable in Java as well. Those in at least [-128, 127] will be shared. Those outside will be slower to allocate because of the check to see if they are inside. HashSet will rehash the hashCode from elements to make sure that even implementations that don't put their randommost bits near the bottom perform well. Vastly more effort has been put into making Java perform than Python, and it's also easier to do.Drupe
I realize that ints are immutable in Java, but I was under the impression that Integer objects were mutable. As to the "vastly more effort", this may be true in some cases, but since many of Java's collections are written in Java and then (maybe) JIT compiled, where Python's are written in C and ahead-of-time compiled, Java's not really competing against Python here; it's competing against C -- and according to this microbenchmark, losing.Predilection
I should mention that the use of range() is actually slowing the Python down quite a bit. Using xrange() instead (which is closer to the spirit of what the Java is doing) will skip the creation of the temporary list and (in my tests on Python 2.6.2) yields a speedup of 15-20%.Predilection
G
1

A few changes for faster Python.

#!/usr/bin/python
import time
import sys

import psyco                 #<<<<  
psyco.full()

class Element(object):
    __slots__=["num"]        #<<<<
    def __init__(self, i):
        self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    startTime = time.time();
    for i in xrange(0, iterations):
        counts.add(Element(i))
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
  main(sys.argv)

Before

(env)~$ python speedTest.py
total time = 8.82906794548
1000000

After

(env)~$ python speedTest.py
total time = 2.44039201736
1000000

Now some good old cheating and ...

#!/usr/bin/python
import time
import sys
import psyco

psyco.full()

class Element(object):
    __slots__=["num"]
    def __init__(self, i):
        self.num = i

def main(args):    
    iterations = 1000000
    counts = set()
    elements = [Element(i) for i in range(0, iterations)]
    startTime = time.time();
    for e in elements:
        counts.add(e)
    totalTime = time.time() - startTime
    print 'total time =',totalTime
    print len(counts)

if __name__ == "__main__":
  main(sys.argv)

(env)~$ python speedTest.py
total time = 0.526521921158
1000000
Glissade answered 2/6, 2009 at 13:53 Comment(0)
P
1

Well, if you're going to tune the Java program, you might as well tune the Python program too.

>>> import timeit
>>> timeit.Timer('x = set()\nfor i in range(10000000):\n    x.add(i)').repeat(3, 1)
[2.1174559593200684, 2.0019571781158447, 1.9973630905151367]
>>> timeit.Timer('x = set()\nfor i in xrange(10000000):\n    x.add(i)').repeat(3, 1)
[1.8742368221282959, 1.8714439868927002, 1.869229793548584]
>>> timeit.Timer('x = set(xrange(10000000))').repeat(3, 1)
[0.74582195281982422, 0.73061800003051758, 0.73396801948547363]

Just using xrange makes it about 8% faster on my machine. And the expression set(xrange(10000000)) builds exactly the same set, but 2.5x faster (from 1.87 seconds to 0.74).

I like how tuning a Python program makes it shorter. :) But Java can do the same trick. As everyone knows, if you want a dense set of smallish integers in Java, you don't use a hash table. You use a java.util.BitSet!

BitSet bits = new BitSet(iterations);

startTime = System.currentTimeMillis();
bits.set(0, iterations, true);
totalTime = System.currentTimeMillis() - startTime;
System.out.println("TOTAL TIME = "+( totalTime/1000f) );
System.out.println(bits.cardinality());

That should be fairly quick. Unfortunately I don't have the time to test it right now.

Proclaim answered 20/11, 2009 at 23:53 Comment(0)
B
0

I agree with Gandalf about the startup time. Also, you are allocating a huge HashSet, which is not at all similar to your python code. I imagine if you put this under a profiler, a good chunk of time would be spent there. Also, inserting new elements is really going to be slow with this size. I would look into TreeSet as suggested.

Berberine answered 27/5, 2009 at 23:1 Comment(2)
he starts recording the time after creating hashsetSumerian
Why would insert time be slow? There's potentially a cache miss there, but you are probably going to get multiple cache misses with a TreeSet.Drupe
A
0

The biggest issue is probably that what the given code measures is wall time -- what your watch measures -- but what should be measured to compare code runtime is process time -- the amount of time the cpu spends executing that particular code and not other tasks.

Alphonse answered 27/5, 2009 at 23:24 Comment(1)
Assuming the test was done in reasonable conditions and repeated, the two results should be roughly comparable.Drupe
D
0

You can make the Java microbenchmark much faster by adding just a simple little extra.

    HashSet counts = new HashSet((2*iterations), 0.75f);

becomes

    HashSet counts = new HashSet((2*iterations), 0.75f) {
        @Override public boolean add(Object element) { return false; }
    };

Simple, faster and gets the same result.

Drupe answered 27/5, 2009 at 23:59 Comment(4)
+1. Very unhelpful, but a fun way to illustrate the problem with the benchmark. :)Metrist
The benchmark is not the problem. It might not have God-like resolution, but I believe you will have a difficult time arguing that the python version is slower than the java version, as implemented. Let's just think of this as an exercise in data structures. Why might one set implementation be faster than the other?Rasberry
Why is it an exercise in pointlessness? The production code could execute for hundreds of seconds and I am trying to figure out a way to shave off time. If someone can help me shave off 10%, I will happily take it. A better understanding of how these data structures work is just a bonus.Rasberry
It's not production code. It's nothing like production code. It behaves nothing like production code. It is indeed misleading as to what production code will do.Drupe
U
0

You might want to see if you can "prime" the JIT compiler into compiling the section of code you're interested in, by perhaps running it as a function once beforehand and sleeping briefly afterwards. This might allow the JVM to compile the function down to native code.
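
A rough sketch of that idea, assuming the work is pulled into a method so HotSpot has a hot method to compile (the structure here is illustrative, not taken from the question):

import java.util.HashSet;

class WarmupSketch
{
    static int fillSet(int iterations)
    {
        HashSet<Integer> counts = new HashSet<Integer>(2 * iterations, 0.75f);
        for (int i = 0; i < iterations; i++)
        {
            counts.add(i);
        }
        return counts.size();
    }

    public static void main(String[] args) throws InterruptedException
    {
        int iterations = 10000000;

        fillSet(iterations);    // untimed warm-up run so the JIT can compile fillSet()
        Thread.sleep(1000);     // give the compiler a moment, as suggested above

        long startTime = System.currentTimeMillis();
        int size = fillSet(iterations);
        long totalTime = System.currentTimeMillis() - startTime;
        System.out.println("TOTAL TIME = " + (totalTime / 1000f));
        System.out.println(size);
    }
}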

Unpin answered 28/5, 2009 at 1:59 Comment(0)
