Am I stupid or is Julia just insanely faster than python?
I'm trying to do a pretty simple task in Python that I have already done in Julia: take an array of 3-d elements and build a dictionary mapping each unique value to the list of indices where it occurs (note the array is 6,000,000 elements long). I have done this in Julia and it is reasonably fast (6 seconds). Here is the code:

function unique_ids(itr)
    # create a dictionary whose keys have the element type of itr
    d = Dict{eltype(itr), Vector}()
    # iterate through the values in itr
    for (index, val) in enumerate(itr)
        # check if the dictionary already has this value as a key
        if haskey(d, val)
            push!(d[val], index)
        else
            # add the value with its first index if it's not in d yet
            d[val] = [index]
        end
    end
    return collect(values(d))
end

So far so good. However, when I try doing this in Python, it seems to take forever, so long that I can't even tell you how long. So the question is, am I doing something dumb here, or is this just the reality of the differences between these two languages? Here is my Python code, a translation of the Julia code.

from tqdm import tqdm

def unique_ids(row_list):
    d = {}
    for index, val in tqdm(enumerate(row_list)):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())

Note that I use strings for the keys of the dict in Python, since a list cannot be used as a dictionary key (lists are unhashable).
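A tuple, unlike a list, is hashable in Python and can therefore serve as a dict key directly, which avoids the `str` conversion entirely. A minimal standalone demonstration (not the author's code):

```python
d = {}
d[(1, 2, 3)] = [0]          # a tuple key works: tuples are immutable and hashable
assert (1, 2, 3) in d

try:
    d[[1, 2, 3]] = [0]      # a list key fails: lists are mutable, hence unhashable
except TypeError as exc:
    print(exc)              # prints: unhashable type: 'list'
```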

Heraldry answered 15/10, 2022 at 15:59 Comment(15)
It is possible to have a tuple as a key; that will cut down the time greatly. – Quickfreeze
If the keys in your Julia dict are Strings, you can quite likely still speed up the Julia code by using Symbols or ShortStrings.jl instead of plain Strings (it depends on the particular use case, but the speedup could be significant). – Sprain
Can you provide a minimal reproducible example that people can copy and paste, or an example of what row_list looks like? People will explain what you are doing wrong in Python. – Quickfreeze
It's not uncommon for Julia to have a 100x-1000x speedup compared to pure Python for loops. – Rainie
My suspicion is that the tqdm call causes the whole list to be read into memory at once, where you'd like to keep a generator. Does it help if you take it out? It's definitely not necessary, and your Julia code does not seem to be doing anything like this. Perhaps see also #49320507. – Oppugnant
I would expect it to be faster, since Julia does JIT compilation. That said, if it takes so long that it does not complete, perhaps it has a bug; have you run it in a debugger to ensure it is progressing? You are probably not stupid. I am not familiar with either language, but are they even equivalent implementations? – Radke
See julialang.org/benchmarks. Julia is certainly expected to be far faster than idiomatic Python. – Entreat
Your Julia dictionary is not concretely typed; that may cause some loss of performance. Vector by itself is not fully specified. You presumably mean Dict{eltype(itr), Vector{Int}}(). – Uella
I checked; fixing the type signature of the dictionary gave me a 2.5x speedup. – Uella
Minor note: d[str(val)].extend([index]) is silly; just use d[str(val)].append(index). Also, if val is already a str, stop calling str on it; if it's not a str, call it once upfront and don't convert it multiple times. And collections.defaultdict(list) exists to avoid the needlessly complex existence check you've got for each key. – Heer
Note that the Julia code should use pairs(itr) rather than enumerate(itr) in the loop header, to be properly generic. Using pairs is just as performant (it actually improves performance very slightly, by 1-2%), so it's a good practice to get into. – Tutor
I made a simple multi-threaded version of the Julia code; feedback welcome. On my machine, julia --threads=auto starts with 4 threads available, and with that, this code runs about 2.5x faster than the serial version. – Tutor
(That's 2.5x compared to the code with the concretely typed Dict{eltype(itr), Vector{Int}}(). When compared to the code in the question, it's 3.5x faster for me.) – Tutor
Thanks a lot for all this great information! I'm learning a lot about small coding details, and I love it. The second comment has a properly coded function that works well, so I will use that. Thank you all! – Heraldry
@Maboi Make sure to also use Vector{Int} in the dictionary signature; that's a fundamental concern in Julia. – Uella
I think the bottom line is that this type of function can definitely run in Python in less than 6 seconds.

As many people have pointed out, the main issues are tqdm and using a string key in the dictionary; take those out and it gets a lot faster. Interestingly, swapping to collections.defaultdict really helps as well. If I get a moment I will write the equivalent function in C++ using the Python C API and append those results.

I have included the test code below. For 1 test iteration with 6,000,000 elements I get 4.9 secs best in Python; this is on an i9-9980XE processor (I don't know what your test was run on). Note that the key type is critical: if I swap the tuple for an int, the time is 1.48 secs, so how the input data is presented makes a huge difference.

| Method | Time | Relative |
|--------|------|----------|
| Original | 16.01739319995977 | 10.804717667865178 |
| Removing tqdm | 12.82462279999163 | 8.650997501336496 |
| Removing to string and just using tuple | 5.3935559000237845 | 3.6382854561971936 |
| Removing double dictionary lookup | 4.682285099988803 | 3.1584895191283664 |
| Using collections defaultdict | 4.493273599946406 | 3.0309896277014063 |
| Using defaultdict with int key | 1.4824443999677896 | 1.0 |

Looking at a smaller dataset (1,000,000 elements) but more iterations (100), I get a closer gap:

| Method | Time | Relative |
|--------|------|----------|
| Original | 253.63316379999742 | 4.078280268213264 |
| Removing tqdm | 195.89607029996114 | 3.1498999032904 |
| Removing to string and just using tuple | 69.18050129996845 | 1.1123840004584065 |
| Removing double dictionary lookup | 68.65376710001146 | 1.1039144073575153 |
| Using collections defaultdict | 62.19120489998022 | 1.0 |

The Julia benchmarks do look very interesting. I haven't had a chance to look at them in detail, but I do wonder how much the Python benchmarks leverage libraries like numpy and scipy.
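For what it's worth, the same grouping task can itself be pushed into numpy. This is only a sketch of one possible vectorized approach (the function name `unique_ids_np` and the assumption that the input is an (n, 3) integer array are mine, not from the question):

```python
import numpy as np

def unique_ids_np(rows):
    """Group row indices by unique row, using numpy instead of a Python loop."""
    arr = np.asarray(rows)
    # label each row with the id of its unique value
    _, inverse = np.unique(arr, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against shape differences across numpy versions
    # sort row indices by label, then split at the label boundaries
    order = np.argsort(inverse, kind="stable")
    boundaries = np.flatnonzero(np.diff(inverse[order])) + 1
    return np.split(order, boundaries)

groups = unique_ids_np([(1, 2, 3), (0, 0, 0), (1, 2, 3)])
# groups: [array([1]), array([0, 2])] -- indices grouped per unique row
```

Whether this beats the defaultdict loop depends on the data; `np.unique` with `axis=0` sorts the rows, so it trades the hash-table lookups for a sort.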

With this test code:

from tqdm import tqdm
import timeit, random
from collections import defaultdict

random.seed(1)

rand_max = 100
data_size = 1000000
iterations = 100

data = [tuple(random.randint(0, rand_max) for i in range(3)) for j in range(data_size)]
data2 = [t[0] for t in data]

def method0(row_list):
    d = {}
    for (index,val) in tqdm(enumerate(row_list)):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())

def method1(row_list):
    d = {}
    for index,val in enumerate(row_list):
        if str(val) in d:
            d[str(val)].extend([index])
        else:
            d[str(val)] = [index]
    return list(d.values())

def method2(row_list):
    d = {}
    for index, val in enumerate(row_list):
        if val in d:
            d[val].extend([index])
        else:
            d[val] = [index]
    return list(d.values())

def method3(row_list):
    d = {}
    for index, val in enumerate(row_list):
        if (l := d.get(val)):
            l.append(index)
        else:
            d[val] = [index]
    return d.values()

def method4(row_list):
    d = defaultdict(list)
    for (index,val) in enumerate(row_list):
        d[val].append(index)
    return list(d.values())

assert (m0 := method0(data)) == method1(data)
assert m0 == method2(data)
assert (m0 := sorted(m0)) == sorted(method3(data))
assert m0 == sorted(method4(data))

t0 = timeit.timeit(lambda: method0(data), number=iterations)
t1 = timeit.timeit(lambda: method1(data), number=iterations)
t2 = timeit.timeit(lambda: method2(data), number=iterations)
t3 = timeit.timeit(lambda: method3(data), number=iterations)
t4 = timeit.timeit(lambda: method4(data), number=iterations)

tmin = min((t0, t1, t2, t3, t4))

print('| Method                                  | Time | Relative |')
print('|-----------------------------------------|------|----------|')
print(f'| Original                                | {t0} | {t0 / tmin} |')
print(f'| Removing tqdm                           | {t1} | {t1 / tmin} |')
print(f'| Removing to string and just using tuple | {t2} | {t2 / tmin} |')
print(f'| Removing double dictionary lookup       | {t3} | {t3 / tmin} |')
print(f'| Using collections defaultdict           | {t4} | {t4 / tmin} |')
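One gap in the script above: the "int key" row of the first table is not produced by it, since `data2` is defined but never timed. A self-contained sketch of that missing measurement, reusing the same defaultdict pattern (the harness and sizes here are mine, scaled down for illustration):

```python
import random
import timeit
from collections import defaultdict

random.seed(1)
data = [tuple(random.randint(0, 100) for _ in range(3)) for _ in range(100_000)]
data2 = [t[0] for t in data]  # int keys instead of tuple keys, as in data2 above

def group_indices(row_list):
    # group indices by value; same shape as the defaultdict method above
    d = defaultdict(list)
    for index, val in enumerate(row_list):
        d[val].append(index)
    return list(d.values())

t_tuple = timeit.timeit(lambda: group_indices(data), number=1)
t_int = timeit.timeit(lambda: group_indices(data2), number=1)
print(f"tuple key: {t_tuple:.4f}s, int key: {t_int:.4f}s")
```

Int keys hash faster than tuple keys (no per-element hashing), which is where the 1.48-second figure comes from.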
Mong answered 15/10, 2022 at 21:19 Comment(0)
Python is slower; it's just not as slow as the author thinks. To show this with optimized code:

| Language | Time (seconds) |
|----------|----------------|
| CPython 3.11 | 4.6 |
| CPython + Numba (JIT) | 3.3 |
| Julia 1.8.5 | 1.9 |

Python is only about 2.5 times slower at this, and only about 1.7 times slower when Numba is in the mix. So if you need every last drop of performance, use Julia; but Python wins in other respects, like producing a lightweight executable or having more libraries for general-purpose, non-numerical programming.

Optimized Python only code:

import random
from collections import defaultdict

input_list_len = 6_000_000
rand_max = 100
row_list = []
for _ in range(input_list_len):
    row_list.append(tuple(random.randint(0,rand_max) for x in range(3)))

def unique_ids(row_list):
    d = defaultdict(list)
    for (index,val) in enumerate(row_list):
        d[val].append(index)

    return list(d.values())

import time
t1 = time.time()
output = unique_ids(row_list)
t2 = time.time()
print(f"total time = {t2-t1}")

Optimized numba LLVM jit code:

import random
import numba
from numba.typed import List, Dict
input_list_len = 6_000_000
rand_max = 100

@numba.njit
def generate_list():
    row_list = List()
    for _ in range(input_list_len):
        a = random.randint(0,rand_max)
        b = random.randint(0,rand_max)
        c = random.randint(0,rand_max)
        row_list.append((a,b,c))
    return row_list

row_list = generate_list()

@numba.njit("ListType(ListType(int64))(ListType(UniTuple(int64,3)))")
def unique_ids(row_list):
    d = Dict()
    for (index,val) in enumerate(row_list):
        if val in d:
            d[val].append(index)
        else:
            a = List()
            a.append(index)
            d[val] = a

    return List(d.values())

import time
t1 = time.time()
output = unique_ids(row_list)
t2 = time.time()
print(f"total time = {t2-t1}")

Optimized julia code:

using Random
Random.seed!(3);

input_list_len = 6_000_000
rand_max = 100
data::Vector{Tuple{Int64,Int64,Int64}} = Vector{Tuple{Int64,Int64,Int64}}()
for _ in 1:input_list_len
    push!(data, Tuple{Int64,Int64,Int64}(rand(0:rand_max,3)))
end

function unique_ids(itr)
    # create dictionary where keys have the element type of itr
    d = Dict{eltype(itr), Vector{Int}}()
    # iterate through values in itr
    for (index, val) in enumerate(itr)
        # check if the dictionary already has this key
        if haskey(d, val)
            push!(d[val], index)
        else
            # add value of itr if it's not in d yet
            d[val] = [index]
        end
    end
    return collect(values(d))
end

@time unique_ids(data);
@time unique_ids(data);
Quickfreeze answered 15/10, 2022 at 16:18 Comment(14)
Or use PyPy, which is also a JIT. – Malevolent
Languages are not interpreted or compiled; implementations are. CPython (what everyone thinks of as "Python") compiles Python source to Python bytecode, which is then interpreted by a virtual machine. There are other Python implementations, though: PyPy is also a JIT compiler for Python. – Akbar
You guys are correct; I forgot that part for a second as I was trying to mimic the author's question. – Quickfreeze
"write this operation in C/C++ and add it to CPython very easily to be faster than julia or any other compiled language"... pardon? C/C++ are compiled languages, and Julia is sometimes faster than them (there are operations where it gets into Fortran territory; and yes, Fortran is still faster than C in some cases). See julialang.org/benchmarks: the blue line is C, and you'll see there are several languages with benchmarks where they come out faster than an idiomatic C implementation. So I'm not sure what your statement means, and even less sure about its correctness. – Entreat
...remember, calling between C and Python means you've got ABI overhead at the interface point, plus whatever parts are native Python are still running at the Python interpreter's pace (and whatever parts of the C module call into CPython to inspect Python-native structures are likewise bound by the Python runtime's performance decisions and tradeoffs). It'll never be faster than 100% C, unless you're using the word to mean "faster to develop", which is an entirely different comparison. – Entreat
To be honest, it's the best of both worlds: applications are usually faster to develop in Python, and tight loops are rewritten in C to get near-C performance. If someone is developing a large-scale application on a tight time budget, Python is usually the tool for the job. I'll modify the answer slightly to reflect that. – Quickfreeze
I agree that Python often makes excellent tradeoffs, but edit2 doesn't describe things in terms of tradeoffs; it's making absolute assertions. I have a lot of history with Python and C under my belt, and while I like Julia immensely in theory (it's a language about as readable and easy to write as Python, while compiling to code comparable to, if not faster than, what a good C compiler puts out), Julia also has a history of making changes that break API compatibility across releases more frequently than folks might expect from a language that's been stable longer. – Entreat
...which is to say, I would often personally choose Python+C instead of Julia for compelling, practical reasons (library availability being another of them), but I would never claim that the combination has better performance; Julia's combination of ease of use and performance is exceptional. – Entreat
(Folks reviewing the above comments should consider them in the context of the "edit2" section as it existed as of revision 8 of this answer.) – Entreat
What history of breaking API compatibility is that, @CharlesDuffy? Are you saying that they are not keeping the guarantees of semantic versioning, which they claim to adhere to? – Uella
@DNF, to be honest, it's quite possible that I'm still judging them by pre-1.0 churn. I would need to dig into an ex-employer's codebase I no longer have access to in order to find the examples I have in mind (or nail down a specific timeline). – Entreat
I would think that is the case. 1.0 was released in 2018, and I am not aware of any breaking changes since; that is supposed to be a guarantee. – Uella
Pre-2018 is extremely believable. – Entreat
I ran your code as well as Julia's, and Julia's code is still 2x faster. If I time the whole code including data creation in both languages, Julia is even 7x faster. – Technics

© 2022 - 2024 — McMap. All rights reserved.