Am I stupid or is Julia just insanely faster than python?
I'm trying to do a pretty simple task in Python that I have already done in Julia: take an array of 3-d elements and build a dictionary mapping each unique value to the list of indices where it occurs (note the array is 6,000,000 elements long). I have done this in Julia and it is reasonably fast (6 seconds). Here is the code:

function unique_ids(itr)
    # create a dictionary whose keys have the element type of itr
    d = Dict{eltype(itr), Vector}()
    # iterate through the values in itr
    for (index, val) in enumerate(itr)
        # check if the dictionary already has this value as a key
        if haskey(d, val)
            push!(d[val], index)
        else
            # add the value with its first index if it's not in d yet
            d[val] = [index]
        end
    end
    return collect(values(d))
end

So far so good. However, when I try doing this in Python, it seems to take forever, so long that I can't even tell you how long. So the question is, am I doing something dumb here, or is this just the reality of the differences between these two languages? Here is my Python code, a translation of the Julia code.

from tqdm import tqdm

def unique_ids(row_list):
    d = {}
    for index, val in tqdm(enumerate(row_list)):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())

Note that I use strings for the keys of the dict in Python, since a list cannot be used as a dictionary key (lists are unhashable).
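A tuple, unlike a list, is hashable in Python and can therefore serve as a dict key directly, which avoids the `str` conversion entirely. A minimal standalone demonstration (not the author's code):

```python
d = {}
d[(1, 2, 3)] = [0]          # a tuple key works: tuples are immutable and hashable
assert (1, 2, 3) in d

try:
    d[[1, 2, 3]] = [0]      # a list key fails: lists are mutable, hence unhashable
except TypeError as exc:
    print(exc)              # prints: unhashable type: 'list'
```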

Heraldry answered 15/10, 2022 at 15:59 Comment(15)
It is possible to have a tuple as a key; that will cut down the time greatly. – Quickfreeze
If the keys in your Julia dict are Strings, you can quite likely still speed up the Julia code by using Symbols or ShortStrings.jl instead of plain Strings (it depends on the particular use case, but the speedup could be significant). – Sprain
Can you provide a minimal reproducible example that people can copy and paste, or an example of what row_list looks like? People will explain what you are doing wrong in Python. – Quickfreeze
It's not uncommon for Julia to have a 100x-1000x speedup compared to pure Python for loops. – Rainie
My suspicion is that the tqdm call causes the whole list to be read into memory at once, where you'd like to keep a generator. Does it help if you take it out? It's definitely not necessary, and your Julia code does not seem to be doing anything like this. Perhaps see also #49320507. – Oppugnant
I would expect it to be faster, since Julia does JIT compilation. That said, if it takes so long that it does not complete, perhaps it has a bug; have you run it in a debugger to ensure it is progressing? You are probably not stupid. I am not familiar with either language, but are they even equivalent implementations? – Radke
See julialang.org/benchmarks. Julia is certainly expected to be far faster than idiomatic Python. – Entreat
Your Julia dictionary is not concretely typed; that may cause some loss of performance. Vector by itself is not fully specified. You presumably mean Dict{eltype(itr), Vector{Int}}(). – Uella
I checked; fixing the type signature of the dictionary gave me a 2.5x speedup. – Uella
Minor note: d[str(val)].extend([index]) is silly; just use d[str(val)].append(index). Also, if val is already a str, stop calling str on it; if it's not a str, call it once upfront and don't convert it multiple times. And collections.defaultdict(list) exists to avoid the needlessly complex existence check you've got for each key. – Heer
Note that the Julia code should use pairs(itr) rather than enumerate(itr) in the loop header, to be properly generic. Using pairs is just as performant (it actually improves performance very slightly, by 1-2%), so it's a good practice to get into. – Tutor
I made a simple multi-threaded version of the Julia code; feedback welcome. On my machine, julia --threads=auto starts with 4 threads available, and with that, this code runs about 2.5x faster than the serial version. – Tutor
(That's 2.5x compared to the code with the concretely typed Dict{eltype(itr), Vector{Int}}(). When compared to the code in the question, it's 3.5x faster for me.) – Tutor
Thanks a lot for all this great information! I'm learning a lot about small coding details, and I love it. The second comment has a properly coded function that works well, so I will use that. Thank you all! – Heraldry
@Maboi Make sure to also use Vector{Int} in the dictionary signature; that's a fundamental concern in Julia. – Uella
I think the bottom line is that this type of function can definitely run in Python in less than 6 seconds.

As many people have pointed out, the main issues are tqdm and using a string key in the dictionary; take those out and it gets a lot faster. Interestingly, swapping to collections.defaultdict really helps as well. If I get a moment I will write the equivalent function in C++ using the Python C API and append those results.

I have included the test code below. For 1 test iteration with 6,000,000 elements I get 4.9 secs best in Python; this is on an i9-9980XE processor (I don't know what your test was run on). Note that the key type is critical: if I swap the tuple for an int, the time is 1.48 secs, so how the input data is presented makes a huge difference.

| Method | Time | Relative |
|--------|------|----------|
| Original | 16.01739319995977 | 10.804717667865178 |
| Removing tqdm | 12.82462279999163 | 8.650997501336496 |
| Removing to string and just using tuple | 5.3935559000237845 | 3.6382854561971936 |
| Removing double dictionary lookup | 4.682285099988803 | 3.1584895191283664 |
| Using collections defaultdict | 4.493273599946406 | 3.0309896277014063 |
| Using defaultdict with int key | 1.4824443999677896 | 1.0 |

Looking at a smaller dataset (1,000,000 elements) but more iterations (100), I get a closer gap:

| Method | Time | Relative |
|--------|------|----------|
| Original | 253.63316379999742 | 4.078280268213264 |
| Removing tqdm | 195.89607029996114 | 3.1498999032904 |
| Removing to string and just using tuple | 69.18050129996845 | 1.1123840004584065 |
| Removing double dictionary lookup | 68.65376710001146 | 1.1039144073575153 |
| Using collections defaultdict | 62.19120489998022 | 1.0 |

The Julia benchmarks do look very interesting. I haven't had a chance to look at them in detail, but I do wonder how much the Python benchmarks leverage libraries like numpy and scipy.
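For what it's worth, the same grouping task can itself be pushed into numpy. This is only a sketch of one possible vectorized approach (the function name `unique_ids_np` and the assumption that the input is an (n, 3) integer array are mine, not from the question):

```python
import numpy as np

def unique_ids_np(rows):
    """Group row indices by unique row, using numpy instead of a Python loop."""
    arr = np.asarray(rows)
    # label each row with the id of its unique value
    _, inverse = np.unique(arr, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against shape differences across numpy versions
    # sort row indices by label, then split at the label boundaries
    order = np.argsort(inverse, kind="stable")
    boundaries = np.flatnonzero(np.diff(inverse[order])) + 1
    return np.split(order, boundaries)

groups = unique_ids_np([(1, 2, 3), (0, 0, 0), (1, 2, 3)])
# groups: [array([1]), array([0, 2])] -- indices grouped per unique row
```

Whether this beats the defaultdict loop depends on the data; `np.unique` with `axis=0` sorts the rows, so it trades the hash-table lookups for a sort.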

With this test code:

from tqdm import tqdm
import timeit, random
from collections import defaultdict

random.seed(1)

rand_max = 100
data_size = 1000000
iterations = 100

data = [tuple(random.randint(0, rand_max) for i in range(3)) for j in range(data_size)]
data2 = [t[0] for t in data]

def method0(row_list):
    d = {}
    for (index,val) in tqdm(enumerate(row_list)):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())

def method1(row_list):
    d = {}
    for index,val in enumerate(row_list):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())

def method2(row_list):
    d = {}
    for index, val in enumerate(row_list):
        if val in d:
            d[val].extend([index])
        else:
            d[val] = [index]
    return list(d.values())

def method3(row_list):
    d = {}
    for index, val in enumerate(row_list):
        if (l := d.get(val)):
            l.append(index)
        else:
            d[val] = [index]
    return d.values()

def method4(row_list):
    d = defaultdict(list)
    for (index,val) in enumerate(row_list):
        d[val].append(index)
    return list(d.values())

assert (m0 := method0(data)) == method1(data)
assert m0 == method2(data)
assert (m0 := sorted(m0)) == sorted(method3(data))
assert m0 == sorted(method4(data))

t0 = timeit.timeit(lambda: method0(data), number=iterations)
t1 = timeit.timeit(lambda: method1(data), number=iterations)
t2 = timeit.timeit(lambda: method2(data), number=iterations)
t3 = timeit.timeit(lambda: method3(data), number=iterations)
t4 = timeit.timeit(lambda: method4(data), number=iterations)

tmin = min((t0, t1, t2, t3, t4))

print('| Method                                  | Time | Relative |')
print('|-----------------------------------------|------|----------|')
print(f'| Original                                | {t0} | {t0 / tmin} |')
print(f'| Removing tqdm                           | {t1} | {t1 / tmin} |')
print(f'| Removing to string and just using tuple | {t2} | {t2 / tmin} |')
print(f'| Removing double dictionary lookup       | {t3} | {t3 / tmin} |')
print(f'| Using collections defaultdict           | {t4} | {t4 / tmin} |')
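One gap in the script above: the "int key" row of the first table is not produced by it, since `data2` is defined but never timed. A self-contained sketch of that missing measurement, reusing the same defaultdict pattern (the harness and sizes here are mine, scaled down for illustration):

```python
import random
import timeit
from collections import defaultdict

random.seed(1)
data = [tuple(random.randint(0, 100) for _ in range(3)) for _ in range(100_000)]
data2 = [t[0] for t in data]  # int keys instead of tuple keys, as in data2 above

def group_indices(row_list):
    # group indices by value; same shape as the defaultdict method above
    d = defaultdict(list)
    for index, val in enumerate(row_list):
        d[val].append(index)
    return list(d.values())

t_tuple = timeit.timeit(lambda: group_indices(data), number=1)
t_int = timeit.timeit(lambda: group_indices(data2), number=1)
print(f"tuple key: {t_tuple:.4f}s, int key: {t_int:.4f}s")
```

Int keys hash faster than tuple keys (no per-element hashing), which is where the 1.48-second figure comes from.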
Mong answered 15/10, 2022 at 21:19 Comment(0)
Python is slower; it's just not as slow as the author thinks. To show this with optimized code:

| Language | Time (seconds) |
|----------|----------------|
| CPython 3.11 | 4.6 |
| CPython + Numba (JIT) | 3.3 |
| Julia 1.8.5 | 1.9 |

Python is only about 2.5 times slower at this, and only about 1.7 times slower when Numba is in the mix. So if you need every last drop of performance, use Julia; but Python wins in other respects, like producing a lightweight executable or having more libraries for general-purpose, non-numerical programming.

Optimized Python only code:

import random
from collections import defaultdict

input_list_len = 6_000_000
rand_max = 100
row_list = []
for _ in range(input_list_len):
    row_list.append(tuple(random.randint(0,rand_max) for x in range(3)))

def unique_ids(row_list):
    d = defaultdict(list)
    for (index,val) in enumerate(row_list):
        d[val].append(index)

    return list(d.values())

import time
t1 = time.time()
output = unique_ids(row_list)
t2 = time.time()
print(f"total time = {t2-t1}")

Optimized numba LLVM jit code:

import random
import numba
from numba.typed import List, Dict
input_list_len = 6_000_000
rand_max = 100

@numba.njit
def generate_list():
    row_list = List()
    for _ in range(input_list_len):
        a = random.randint(0,rand_max)
        b = random.randint(0,rand_max)
        c = random.randint(0,rand_max)
        row_list.append((a,b,c))
    return row_list

row_list = generate_list()

@numba.njit("ListType(ListType(int64))(ListType(UniTuple(int64,3)))")
def unique_ids(row_list):
    d = Dict()
    for (index,val) in enumerate(row_list):
        if val in d:
            d[val].append(index)
        else:
            a = List()
            a.append(index)
            d[val] = a

    return List(d.values())

import time
t1 = time.time()
output = unique_ids(row_list)
t2 = time.time()
print(f"total time = {t2-t1}")

Optimized julia code:

using Random
Random.seed!(3);

input_list_len = 6_000_000
rand_max = 100
data::Vector{Tuple{Int64,Int64,Int64}} = Vector{Tuple{Int64,Int64,Int64}}()
for _ in 1:input_list_len
    push!(data, Tuple{Int64,Int64,Int64}(rand(0:rand_max,3)))
end

function unique_ids(itr)
    # create dictionary where keys have the element type of itr
    d = Dict{eltype(itr), Vector{Int}}()
    # iterate through values in itr
    for (index, val) in enumerate(itr)
        # check if the dictionary already has this key
        if haskey(d, val)
            push!(d[val], index)
        else
            # add value of itr if it's not in d yet
            d[val] = [index]
        end
    end
    return collect(values(d))
end

@time unique_ids(data);
@time unique_ids(data);
Quickfreeze answered 15/10, 2022 at 16:18 Comment(14)
Or use PyPy, which is also a JIT. – Malevolent
Languages are not interpreted or compiled; implementations are. CPython (what everyone thinks of as "Python") compiles Python source to Python bytecode, which is then interpreted by a virtual machine. There are other Python implementations, though: PyPy is also a JIT compiler for Python. – Akbar
You guys are correct; I forgot that part for a second as I was trying to mimic the author's question. – Quickfreeze
"write this operation in C/C++ and add it to CPython very easily to be faster than julia or any other compiled language"... pardon? C/C++ are compiled languages, and Julia is sometimes faster than them (there are operations where it gets into Fortran territory; and yes, Fortran is still faster than C in some cases). See julialang.org/benchmarks: the blue line is C, and you'll see there are several languages with benchmarks where they come out faster than an idiomatic C implementation. So I'm not sure what your statement means, and even less sure about its correctness. – Entreat
...remember, calling between C and Python means you've got ABI overhead at the interface point, plus whatever parts are native Python are still running at the Python interpreter's pace (and whatever parts of the C module call into CPython to inspect Python-native structures are likewise bound by the Python runtime's performance decisions and tradeoffs). It'll never be faster than 100% C, unless you're using the word to mean "faster to develop", which is an entirely different comparison. – Entreat
To be honest, it's the best of both worlds: applications are usually faster to develop in Python, and tight loops are rewritten in C to get near-C performance. If someone is developing a large-scale application on a tight time budget, Python is usually the tool for the job. I'll modify the answer slightly to reflect that. – Quickfreeze
I agree that Python often makes excellent tradeoffs, but edit2 doesn't describe things in terms of tradeoffs; it's making absolute assertions. I have a lot of history with Python and C under my belt, and while I like Julia immensely in theory (it's a language about as readable and easy to write as Python, while compiling to code comparable to, if not faster than, what a good C compiler puts out), Julia also has a history of making changes that break API compatibility across releases more frequently than folks might expect from a language that's been stable longer. – Entreat
...which is to say, I would often personally choose Python+C instead of Julia for compelling, practical reasons (library availability being another of them), but I would never claim that the combination has better performance; Julia's combination of ease of use and performance is exceptional. – Entreat
(Folks reviewing the above comments should consider them in the context of the "edit2" section as it existed as of revision 8 of this answer.) – Entreat
What history of breaking API compatibility is that, @CharlesDuffy? Are you saying that they are not keeping the guarantees of semantic versioning, which they claim to adhere to? – Uella
@DNF, to be honest, it's quite possible that I'm still judging them by pre-1.0 churn. I would need to dig into an ex-employer's codebase I no longer have access to in order to find the examples I have in mind (or nail down a specific timeline). – Entreat
I would think that is the case. 1.0 was released in 2018, and I am not aware of any breaking changes since; that is supposed to be a guarantee. – Uella
Pre-2018 is extremely believable. – Entreat
I ran your code as well as Julia's, and Julia's code is still 2x faster. If I time the whole code including data creation in both languages, Julia is even 7x faster. – Technics

© 2022 - 2024 — McMap. All rights reserved.