How are import statements in plpython handled?
Asked Answered
P

2

17

I have a plypython function which does some json magic. For this it obviously imports the json library.

Is the import called on every call to the function? Are there any performance implication I have to be aware of?

Pappus answered 22/2, 2013 at 11:24 Comment(0)
J
20

The import is executed on every function call. This is the same behavior you would get if you wrote a normal Python module with the import statement inside a function body as oppposed to at the module level.

Yes, this will affect performance.

You can work around this by caching your imports like this:

CREATE FUNCTION test() RETURNS text
LANGUAGE plpythonu
AS $$
if 'json' in SD:
    json = SD['json']
else:
    import json
    SD['json'] = json

 return json.dumps(...)
$$;

This is admittedly not very pretty, and better ways to do this are being discussed, but they won't happen before PostgreSQL 9.4.

Jumbo answered 22/2, 2013 at 13:32 Comment(8)
Ah, only after your answer I looked up in the documentation that there is a global dictionary SD. Thank you very much.Pappus
@Peter Eisentraut do you know if there have been any improvements to this issue in recent pg versions?Reaction
There have not been any changes to this.Jumbo
How about now? Also, where would you find that change?Nocturne
Is this still the recommended way for handling slow imports in PostgreSQL 11 and above?Alkyne
Yes, no changes.Jumbo
@PeterEisentraut: Unless PL/Python functions behave differently than Python's, then you may be mistaken. Importing a module for the first time is expensive because of the steps involved between locating the script and building the module object. Once built, though, that object is cached in sys.modules and subsequent imports will simply fetch it from there.Oliguria
@PeterEisentraut I think this is wrong. if you run do $$ import functools $$ language plpython3u; Then do $$ import sys if 'functools' in sys.modules: plpy.notice('functools has already been imported') else: plpy.notice('functools has not been imported') $$ language plpython3u; The output is 'functools has already been imported'Nocturne
O
5

The declaration in the body of a PL/Python function will eventually become an ordinary Python function and will thus behave as such. When a Python function imports a module for the first time the module is cached in the sys.modules dictionary (https://docs.python.org/3/reference/import.html#the-module-cache). Subsequent imports of the same module will simply bind the import name to the module object found in the dictionary. In a sense, what I'm saying may cast some doubt on the usefulness of the tip given in the accepted answer, since it makes it somewhat redundant, as Python already does a similar caching for you.

To sum things up, I'd say that if you import in the standard way of simply using the import or from [...] import constructs, then you need not worry about repeated imports, in functions or otherwise, Python has got you covered.

On the other hand, Python allows you to bypass its native import semantics and to implement your own (with the __import__() function and importlib module). If this is what you're doing, maybe you should review what's available in the toolbox (https://docs.python.org/3/reference/import.html).

Oliguria answered 15/3, 2020 at 20:55 Comment(3)
This is correct. import will not do anything if the function has already been imported in a different function. I ran into this issue when writing a module around plpython. You need to restart the database (which in turn restarts the interpreter) for imports to be re-run. Or like you suggest, mess around with importlib.Nocturne
Note that the imported modules are cached per database connection. So if your backend has multiple connections to the database (e.g., through a connection pool) each database connection has it's own set of cached modules. This hit me because I had a "large" import in a BEFORE INSERT OR UPDATE ON my_table trigger. This trigger caused a noticeable 3-4 second delay for the first triggering transaction of every database connection. Ouch. I fixed it via a manual preload on connection initialization (move the import delay up front).Telethermometer
This is the answer, not the accepted one. TLDR: import all you want, it will always only get executed once per python process.Sapsago

© 2022 - 2024 — McMap. All rights reserved.