Best practice for lazy loading Python modules
M

5

60

Occasionally I want lazy module loading in Python. Usually because I want to keep runtime requirements or start-up times low and splitting the code into sub-modules would be cumbersome. A typical use case and my currently preferred implementation is this:

jinja2 = None

class Handler(...):
    ...
    def render_with_jinja2(self, values, template_name):
        global jinja2
        if not jinja2:
            import jinja2
        env = jinja2.Environment(...)
        ...

I wonder: is there a canonical/better way to implement lazy module loading?

Millenarianism answered 14/11, 2010 at 13:47 Comment(2)
Another reason to lazy load might be because it's a poorly written third party module that depends on global environment settings that need to be changed before importing. (These modules are much fun, of course.)Imperator
FWIW, lazy import functionality is drafted in a PEP and in discussion as of Aug 2022.Anis
C
87

There's no reason for you to keep track of imports manually -- the VM maintains a cache of modules that have already been imported (sys.modules), and any subsequent attempt to import that module results in a quick dict lookup in sys.modules and nothing else.

The difference between your code and

def render_with_jinja2(self, values, template_name):
    import jinja2
    env = jinja2.Environment(...)

is zero -- when we hit that code, if jinja2 hasn't been imported, it is imported then. If it already has been, execution continues on.
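A quick way to see the cache at work. This is only a minimal sketch: the function name get_module and the choice of the string module are arbitrary, picked for illustration.

```python
import sys


def get_module():
    # Executed on every call, but only the first import of a module
    # anywhere in the process does real work; later ones hit sys.modules.
    import string
    return string


first = get_module()
second = get_module()

# Both calls hand back the very same module object held in sys.modules.
same_object = first is second is sys.modules["string"]
```

The local name is rebound on each call, but the module itself is never reloaded.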

Cassycast answered 14/11, 2010 at 13:55 Comment(4)
Also, an obvious difference is that your code pulls jinja2 only into the local scope. Or am I missing something here?Millenarianism
@mdorseif -- there are two different issues. (1) yes, the jinja2 name is only visible inside that function's scope, but (2) the jinja2 module remains cached in sys.modules, so additional imports (on subsequent calls to that function or anywhere else) are effectively no-ops. I assume that you were planning on putting that 'global jinja2; if not jinja2:' code in multiple functions?Cassycast
@bgporter: occasionally. I have probably used a dozen lazy import approaches over the years and started wondering what the best way to do it is. There seems to be some consensus that my example (module = None) is "bad practice", but little reasoning. I think being able to glance at the module header to see the imports is something which shouldn't be lost. Perhaps a 'from builtins import None as jinja2' approach? But that looks horrible, too.Millenarianism
@mdorseif: the reason jinja2 = None is bad is because it is so unusual. When I see that line, I don't think, "He's going to import it later, jinja2 is a module". It looks like you are initializing a global variable to None. The only reason I have any inkling to your intent is because "jinja2" is a well-known module name in the community. If I saw chemical3 = None, I'd have no idea what you are doing. Import the module locally in functions, it's the best way. (cont)Hilda
E
31
class Handler(...):
    ...
    def render_with_jinja2(self, values, template_name):
        import jinja2
        env = jinja2.Environment(...)
        ...

There's no need to cache the imported module; Python does that already.

Expurgate answered 14/11, 2010 at 13:54 Comment(2)
The drawback is that you can't just look at the top of a py file and see what modules it uses, but you could put a comment if you think it's necessary. This is certainly better/cleaner than what the OP was doing, howeverMichelle
The Unladen Swallow talk at PyCon pointed out that for subpackage imports (which is different to this case) there's a reasonable amount of work done parsing and looking up the import levels in sys.modules, so don't assume imports are cheap in a frequently-called function. (He got a 20% performance improvement optimizing a Django function that did this.)Triserial
M
12

The other answers have covered the actual details, but if you are interested in a lazy loading library, check out apipkg, which is part of the py package (of py.test fame).

Manikin answered 14/11, 2010 at 14:2 Comment(3)
It allows you to load only once, globally (rather than in each of the 27 places where the first call to jinja2 might happen), and have the actual loading be performed automatically when (and only when) the first access is attempted. Nice Python magic. (Don't try this in C++.)Leonilaleonine
Oh, this is amazing. It doesn't even try to load until you access some attribute on the lazy loaded thing. So even doing from mypkg import path (from the example) doesn't import anything until I try to access something on path.Imperator
This works for libs you create, but not for lazy loading 3rd party libs. AFAICT this doesn't solve OP's issue.Proudlove
G
4

Nice pattern from sqlalchemy: dependency injection:

@util.dependencies("sqlalchemy.orm.query")
def merge_result(query, *args):
    #...
    query.Query(...)

Instead of declaring all "import" statements at the top of the module, it will only import a module when it's actually needed by a function. This can resolve circular dependency problems.
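To make the idea concrete, here is a minimal sketch of such a decorator. It is not SQLAlchemy's implementation: the dependencies function below is a hypothetical stand-in, and json is used only as a convenient example module.

```python
import functools
import importlib


def dependencies(*module_names):
    """Hypothetical decorator: import the named modules on the first call
    and pass them to the function as leading positional arguments."""
    def decorator(func):
        resolved = []  # filled in lazily, once

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if not resolved:
                # Build the full list first so a failed import leaves no
                # partial state behind.
                resolved[:] = [importlib.import_module(n) for n in module_names]
            return func(*resolved, *args, **kwargs)

        return wrapper
    return decorator


@dependencies("json")
def to_json(json, value):
    # "json" here is the injected module, imported on the first call only.
    return json.dumps(value)
```

Nothing is imported when the decorated function is defined, which is why this kind of pattern can sidestep circular imports at module load time.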

Gram answered 6/5, 2020 at 23:15 Comment(2)
I think this is what I want... is the util decorator defined directly in sqlalchemy? Is there nothing that does this outside of sqlalchemy?Airlike
Even though this tool is part of sqlalchemy, it does no database-related stuff. You're quite free to use it. As for me, I'd prefer local imports, i.e. within the function's body.Gram
P
1

A better solution has these properties:

  • drop-in replacement for import statement. In other words: can be used for global or local imports, no extra checks at point of use, same syntax as regular module usage
  • no messing with globals() dict
  • can be used to import 3rd party libs, i.e. no code changes
  • minimal performance overhead

Solution

Here's my solution with those properties. The concept is very simple:

import importlib

class LazyLoader () :
    'thin shell class to wrap modules.  load real module on first access and pass thru'

    def __init__ (me, modname) :
        me._modname  = modname
        me._mod      = None
   
    def __getattr__ (me, attr) :
        'import module on first attribute access'

        if me._mod is None :
            me._mod = importlib.import_module (me._modname)
        
        return getattr (me._mod, attr)

Usage

import sys
math = LazyLoader ('math')  # equivalent to : import math

math.ceil (1.7)  # module loaded here on first access

math is now a name with the same scope as one bound by an import statement. You can define it at the top of your module along with other imports. Or you can define it at local scope in a function or class. Decorators in other solutions only work at function scope.

If you need to set values in the module directly, such as logging.VERBOSE = 15, then you can define __setattr__ similarly. Same with __dir__ if you want to do dir(module). Usually these aren't needed.
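A rough sketch of what that could look like, assuming the wrapper's own two fields are stored with object.__setattr__ so they don't get forwarded to the module. The _load helper is a name introduced here for illustration.

```python
import importlib


class LazyLoader:
    """Lazy module wrapper that also forwards attribute writes and dir()."""

    def __init__(self, modname):
        # Bypass our own __setattr__, which forwards to the real module.
        object.__setattr__(self, "_modname", modname)
        object.__setattr__(self, "_mod", None)

    def _load(self):
        # Import the real module on first use, then reuse it.
        if self._mod is None:
            object.__setattr__(self, "_mod", importlib.import_module(self._modname))
        return self._mod

    def __getattr__(self, attr):
        return getattr(self._load(), attr)

    def __setattr__(self, attr, value):
        setattr(self._load(), attr, value)

    def __dir__(self):
        return dir(self._load())


logging = LazyLoader("logging")
logging.VERBOSE = 15  # triggers the import, then sets the value on the real module
```

Note that assigning to the wrapper triggers the import too, since the value has to land on the real module.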

Improvement

Now a slight improvement that avoids the is None check on every access:

import importlib

class LazyLoader () :
    'thin shell class to wrap modules.  load real module on first access and pass thru'

    def __init__ (me, modname) :
        me._modname  = modname
        me._mod      = None
   
    def __getattr__ (me, attr) :
        'import module on first attribute access'

        try :
            return getattr (me._mod, attr)
        
        except Exception as e :
            if me._mod is None :
                # module is unset, load it
                me._mod = importlib.import_module (me._modname)
            else :
                # module is set, got different exception from getattr ().  reraise it
                raise

        # retry getattr if module was just loaded for first time
        # call this outside exception handler in case it raises new exception
        return getattr (me._mod, attr)

# end class            

Usage is the same.

Performance

# Regular top-level import
> python3 -m timeit -s 'import math' -c 'math.floor'
10000000 loops, best of 5: 32.4 nsec per loop

# Local import (same with or without -s)
> python3 -m timeit -s 'import math' -c 'import math ; math.floor'
2000000 loops, best of 5: 126 nsec per loop

# LazyLoader with exceptions
> python3 -m timeit -s 'from LazyLoader import LazyLoader ; math = LazyLoader ("math")' -c 'math.floor'
500000 loops, best of 5: 453 nsec per loop

# LazyLoader with "if mod is None"
> python3 -m timeit -s 'from LazyLoader import LazyLoader ; math = LazyLoader ("math")' -c 'math.floor'
500000 loops, best of 5: 540 nsec per loop

# path splitting as time comparison :
> python3 -m timeit -s 'import os; path = "/foo/bar"' -c 'os.path.split (path)'
500000 loops, best of 5: 879 nsec per loop

A 400 ns penalty is quite good. 1000 calls adds 400 microsec. A million calls adds 400 millisec. Unless your called function is very fast and you're making an ungodly number of calls, you shouldn't notice any difference.
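For comparison, the standard library has importlib.util.LazyLoader, which postpones executing a module until the first attribute access on it. The sketch below follows the recipe in the importlib documentation. lazy_import is a helper name, not a stdlib function, and colorsys is just an arbitrary example module.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up laziness; does not run the module yet
    return module


colorsys = lazy_import("colorsys")
hsv = colorsys.rgb_to_hsv(1, 0, 0)  # module body is executed here
```

Unlike the wrapper class, this registers the lazy module in sys.modules, so a later plain import of the same name gets the same object.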

Proudlove answered 11/4 at 19:16 Comment(1)
Hey @shmulvad, thanks for suggestions, appreciate your feedback. Your edits were rejected before I saw them. Anyway, I don't add typing because python typing is a disaster imo, clutters up the code. Ok for automated tools, not for discussions. And I use "me" because python got it wrong with "self". Interpreters are stupid beasts. Would they say "Self, I suggest that we acquire a baked pastry post-haste"? No. They say "ME WANT COOKIE!!! NOM NOM NOM". Me reminds us that they're mindless automatons with no independent judgment or self-control. "Delete all files? Sure why not!"Proudlove
