How does PostgreSQL PL/Python compare with Python outside it in terms of performances?
Asked Answered
C

1

11

I run the exact same Python function, one as a PostgreSQL PL/Python, and the other one outside PostgreSQL as a usual Python script.

Surprisingly, when I call the PostgreSQL PL/Python using select * from pymax7(20000);, it takes on average 65 seconds, while when I call the usual Python script python myscript.py 20000 it takes an average 48 seconds. The averages were computed running the queries and scripts 10 times.

Should such a difference be expected? How does Python inside the PostgreSQL RDBMS (PL/Python) compares with Python outside it in terms of performances?

I'm running PostgreSQL 9.1 and Python 2.7 on Ubuntu 12.04 64bits.

PostgreSQL PL/Python:

CREATE FUNCTION pymax7 (b integer)
  RETURNS float
AS $$    
  a = 0
  for i in range(b):
    for ii in range(b):
      a = (((i+ii)%100)*149819874987) 
  return a
$$ LANGUAGE plpythonu;

Python:

import time
import sys

def pymax7 (b):     
    a = 0
    for i in range(b):
        for ii in range(b):
            a = (((i+ii)%100)*149819874987) # keeping Python busy
    return a

def main():    
    numIterations = int(sys.argv[1])        
    start = time.time()
    print pymax7(numIterations)
    end = time.time()
    print "Time elapsed in Python:"
    print str((end - start)*1000) + ' ms'        

if __name__ == "__main__":
    main()
Crucible answered 15/5, 2013 at 23:58 Comment(7)
Comparison with other PL languages would be more fair.Moffat
Since PL/x calls the interpreter for x (x = Python ; Perl ; etc), isn't it an interesting comparison to assess how much overhead PL/x has over x? Unless I'm missing something?Crucible
It will not call the interpreter executable, it will bind the target language using embedding mechanisms (in the case of Python, dynamically linking to python libraries, I guess).Moffat
Doesn't the PostgreSQL interpreter embed CPython? (someone says so in the 6th comment of https://mcmap.net/q/1072969/-can-nltk-be-used-in-a-postgres-python-stored-procedure)Crucible
That's extremely weird, as PostgreSQL indeed just embeds CPython. I would expect differences when you're doing SQL queries from within PL/Python, but not with regular processing. There should be some small overhead involved in launching PL/Python functions, but not seconds worth.Isometropia
I noticed the issue when writing 2 modules for MADlib (PostgreSQL addon) and the difference of runtime seems to increase as the number of iteration increases (= it's not a fixed cost): img849.imageshack.us/img849/7456/sofranckgraph.png ; img20.imageshack.us/img20/9621/sonazeengrph.png The only thing I can't think of right now would be that the CPython embedded within PostgreSQL is less optimized than the one I use outside PostgreSQL. Do you know if there is any way to check which Python interpreter PostgreSQL 9.1 embeds?Crucible
(although I would find it surprising that there would be such a difference of performance between 2 CPython versions)Crucible
F
5

There shouldn't be any difference. Both of your test cases have about the same run time for me, 53 seconds plus or minus 1.

I did adjust the PL/Python test case to use the same measuring technique as the plain Python test case:

CREATE FUNCTION pymax7a (b integer)
  RETURNS float
AS $$
  import time
  start = time.time()
  a = 0
  for i in range(b):
    for ii in range(b):
      a = (((i+ii)%100)*149819874987)
  end = time.time()
  plpy.info("Time elapsed in Python: " + str((end - start)*1000) + ' ms')
  return a
$$ LANGUAGE plpythonu;

This would tell you if there is any non-Python overhead involved. FWIW, for me, the difference between what this printed and what psql on the client printed as the total time was consistently less than 1 millisecond.

Fidelia answered 16/5, 2013 at 12:59 Comment(3)
Thanks Peter, interesting! Could you please tell me which version of PostgreSQL, Python, and OS you use? I've run the test again on a brand new Ubuntu 12.04 64bits with PostgreSQL 9.1 and Python 2.7.3 as in the original question, I still have the runtime difference. Maybe this is due to the fact that PostgreSQL 9.1 (released on 2011-09-12) embeds some older version of Python than the Python 2.7.3 I use (released on 2012-08-01).Crucible
Although PostgreSQL 9.1 seems to use Python 2.7: packages.debian.org/sid/postgresql-plpython-9.1 (why does the package need dep: libpython2.7 (>= 2.7) whereas PostgreSQL embeds CPython?)Crucible
I tested with PostgreSQL 9.2 (with Ubuntu 12.10 this time), I still have the runtime difference on my server (although overall 10% faster, probably due to some new version of CPython). I use virtual machines, so maybe this can interfere although I wouldn't understand why. I've also tried using plpython3u, it is far slower (88 seconds), but when running the Python script using python3 it is also slower (75 seconds) and anyway I still have the runtime difference, which is the focus of this thread.Crucible

© 2022 - 2024 — McMap. All rights reserved.