improving speed of Python module import
N

7

66

The question of how to speed up importing of Python modules has been asked previously (Speeding up the python "import" loader and Python -- Speed Up Imports?) but without specific examples and has not yielded accepted solutions. I will therefore take up the issue again here, but this time with a specific example.

I have a Python script that loads a 3-D image stack from disk, smooths it, and displays it as a movie. I call this script from the system command prompt when I want to quickly view my data. I'm OK with the 700 ms it takes to smooth the data as this is comparable to MATLAB. However, it takes an additional 650 ms to import the modules. So from the user's perspective the Python code runs at half the speed.

This is the series of modules I'm importing:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os

Of course, not all modules are equally slow to import. The chief culprits are:

matplotlib.pyplot   [300ms]
numpy               [110ms]
scipy.signal        [200ms]
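
Figures of this kind can be reproduced roughly by timing each import. The following is only a sketch, not the method used to obtain the numbers above: time_import is a hypothetical helper, and standard-library modules stand in for the real ones so the snippet runs anywhere. A module is only slow the first time it is imported in a process, so each measurement is best taken in a fresh interpreter.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds taken to import module_name in this process.

    Only the first import of a module in a process is expensive; later
    imports are served from the sys.modules cache.
    """
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in module names (assumption); substitute numpy, scipy.signal, etc.
    for name in ("json", "decimal"):
        print(f"{name:20s} [{time_import(name) * 1000:.1f} ms]")
```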

I have experimented with using from, but this isn't any faster. Since Matplotlib is the main culprit and it's got a reputation for slow screen updates, I looked for alternatives. One is PyQtGraph, but that takes 550 ms to import.

I am aware of one obvious solution, which is to call my function from an interactive Python session rather than the system command prompt. This is fine, but it's too MATLAB-like; I'd prefer the elegance of having my function available from the system prompt.

I'm new to Python and I'm not sure how to proceed at this point. Since I'm new, I'd appreciate links on how to implement proposed solutions. Ideally, I'm looking for a simple solution (aren't we all!) because the code needs to be portable between multiple Mac and Linux machines.

Nihi answered 4/5, 2013 at 10:54 Comment(10)
Check that it's producing .pyc versions of the Python modules - loading those is a bit faster. But those numbers are quite plausible even if the pyc files are there.Daciadacie
Also, if you have a lot of .egg directories on sys.path, it looks for modules inside each one, which slows things down. Use a distribution package manager or pip to install them in a better layout. You're unlikely to get a major speed up, though.Daciadacie
I had noticed the pyc suggestion in an earlier question, but I don't know where to look for the pyc versions of the modules. Right now I'm on a Mac.Nihi
If you're using 3.2 or above, look for __pycache__ directories within the modules (i.e. .../site-packages/matplotlib/__pycache__). For older versions, the .pyc files go right next to the .py files. They're usually created automatically, but in some cases Python doesn't have write permissions where the modules are stored.Daciadacie
Yep, the pyc files are there.Nihi
Maybe the only way to get a substantial speedup is to cache the modules in memory - either with a ramdisk, or with a running Python process that you just signal to redo the calculations.Daciadacie
I wonder if you can import the modules and then freeze the process to a file, so that you can use it by restoring the image and calling the function. BLCR and DMTCP look like the sort of tools you'd need.Daciadacie
I think you're right that caching is the only solution. What do you mean by "signalling to a running Python process"? The other alternatives you mention are, at the moment, probably more trouble than they're worth. That might change in the future depending on how much code I port to Python. Then again, I suppose once I figure out the RAM disk option I could have it set it up automatically each time I boot the machine.Nihi
Well, you could have a Python process continually running in the background, then just have a tiny script which will use some interprocess communication mechanism to tell the Python process to run your function. Unix signals might be the way to go, e.g. use SIGUSR1 as the trigger.Daciadacie
You could also have a Python process monitor a directory where you put new data. watchdog seems interesting, although I've never used it.Tithe
H
28

You could build a simple server/client pair: the server runs continuously, creating and updating the plot, and the client just tells it which file to process next.

I wrote a simple server/client example based on the basic example from the socket module docs: http://docs.python.org/2/library/socket.html#example

Here is server.py:

# expensive imports
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os

# Echo server program
import socket

HOST = ''                 # Symbolic name meaning all available interfaces
PORT = 50007              # Arbitrary non-privileged port
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((HOST, PORT))
s.listen(1)
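while True:
    conn, addr = s.accept()
    print('Connected by', addr)
    data = conn.recv(1024)  # bytes holding the filename sent by the client
    if not data:
        # an empty message shuts the server down
        conn.close()
        break
    conn.sendall(b"PLOTTING:" + data)
    # update plot
    conn.close()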

and client.py:

# Echo client program
import socket
import sys

HOST = ''    # The remote host
PORT = 50007              # The same port as used by the server
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((HOST, PORT))
s.sendall(sys.argv[1].encode())  # sockets carry bytes, so encode the filename
data = s.recv(1024)
s.close()
print('Received', repr(data))

You just run the server:

python server.py

This performs the imports once. The client then sends the name of the new file to plot over the socket:

python client.py mytextfile.txt

The server then updates the plot.

On my machine, running your imports takes 0.6 seconds, while running client.py takes 0.03 seconds.
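
The same request/reply pattern can also be sketched for Python 3 with the per-request work factored into a function. This is an illustration under assumptions rather than the code above: handle_request, serve and send_filename are hypothetical names, and handle_request only echoes the filename where the real script would load, smooth and display the data.

```python
import socket
import threading


def handle_request(filename):
    # Hypothetical placeholder: the real server would load and plot
    # the file here, using the modules it imported at start-up.
    return "PLOTTING:" + filename


def serve(sock, max_requests=None):
    """Answer requests on an already listening socket.

    max_requests=None serves forever; a number stops after that many.
    """
    served = 0
    while max_requests is None or served < max_requests:
        conn, _addr = sock.accept()
        with conn:
            data = conn.recv(1024)
            if data:
                reply = handle_request(data.decode("utf-8"))
                conn.sendall(reply.encode("utf-8"))
        served += 1


def send_filename(port, filename, host="127.0.0.1"):
    """Client side: send one filename and return the server's reply."""
    with socket.create_connection((host, port)) as s:
        s.sendall(filename.encode("utf-8"))
        return s.recv(1024).decode("utf-8")


if __name__ == "__main__":
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    threading.Thread(target=serve, args=(listener, 1), daemon=True).start()
    print(send_filename(port, "mytextfile.txt"))
```

Binding to 127.0.0.1 rather than '' keeps the server reachable only from the local machine, which is probably what you want for a personal viewing tool.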

Herder answered 8/5, 2013 at 0:45 Comment(2)
By the way, for plotting, you could take a look at Chaco: pypi.python.org/pypi/chacoHerder
Thanks, I think your solution is probably the way to go. I've since switched my code to PyQtGraph, because it's faster than Matplotlib at generating the dynamic plots I'm producing. Chaco is certainly worth a look too.Nihi
D
94

Not an actual answer to the question, but a hint on how to profile the import speed with Python 3.7 and tuna (a small project of mine):

python3 -X importtime -c "import scipy" 2> scipy.log
tuna scipy.log
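
If you would rather inspect the log without a viewer, the -X importtime output can be parsed directly. The sketch below assumes the log lines have the form "import time: <self us> | <cumulative us> | <module>" as printed by CPython 3.7+; slowest_imports is a hypothetical helper and is not part of tuna.

```python
import subprocess
import sys


def slowest_imports(module, top=5):
    """Run `python -X importtime -c "import <module>"` in a fresh
    interpreter and return the `top` rows with the largest cumulative time
    as (cumulative_us, self_us, name) tuples."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us, cumulative_us = int(parts[0]), int(parts[1])
        except ValueError:
            continue  # the header line has text instead of numbers
        rows.append((cumulative_us, self_us, parts[2].strip()))
    return sorted(rows, reverse=True)[:top]


if __name__ == "__main__":
    for cumulative_us, self_us, name in slowest_imports("json"):
        print(f"{cumulative_us:>10} us  {name}")
```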


Dubuffet answered 12/7, 2018 at 8:39 Comment(1)
I'd like to try use this visualizer for my project to see where the imports are slow but I have an issue. A bunch of my modules are in different folders (same parent folder) ie there is Code\Widgets\Button, Code\Gui\Color for example. I am importing them by modifying the sys.path: sys.path.insert(0,os.path.abspath("..")). However, when I run the tuna program in CMD, I get an error saying that No module named .... I believe that it isn't using the sys.path modification. Do you know a quick fix to this? ThanksBinnings
R
9

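You can import your modules manually instead, using imp. See documentation here. (Note that imp has been deprecated since Python 3.4 and was removed in Python 3.12; importlib is its replacement.)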

For example, import numpy as np could probably be written as

import imp
# The last tuple is (suffix, mode, type); type 5 is imp.PKG_DIRECTORY
np = imp.load_module("numpy", None, "/usr/lib/python2.7/dist-packages/numpy", ('', '', 5))

This will spare python from browsing your entire sys.path to find the desired packages.
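
On current Python versions the same idea can be expressed with importlib. The sketch below is an assumption about how one might do it, not a drop-in replacement for the line above: load_from_path is a hypothetical helper, and the demo loads a throwaway single-file module rather than numpy. Whether skipping the sys.path search saves a noticeable amount of time will depend on how long sys.path is.

```python
import importlib.util
import os
import sys
import tempfile


def load_from_path(module_name, file_path):
    """Load a module from an explicit file path, skipping the sys.path search.

    For a package, file_path would be the path to its __init__.py.
    """
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {module_name!r} from {file_path!r}")
    module = importlib.util.module_from_spec(spec)
    # Register the module before executing it, as the import system does.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


if __name__ == "__main__":
    # Demo with a temporary module (hypothetical name demo_mod).
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo_mod.py")
        with open(path, "w") as handle:
            handle.write("VALUE = 42\n")
        print(load_from_path("demo_mod", path).VALUE)
```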

See also:

Manually importing gtk fails: module not found

Ruinous answered 20/3, 2016 at 22:33 Comment(0)
P
4

You can use lazy imports, but it depends on your use case.

If it's an application, you can first import only the modules needed for the GUI and then, after the window has loaded, import everything else.

If it's a module and users do not need all of its dependencies, you can import them inside the functions that use them.

[warning] I think this goes against PEP 8, and it is discouraged in some places, but the reasons are mostly readability (I may be wrong, though) and bundling by some builders (e.g. PyInstaller), which can be solved by adding the missing dependencies param to the spec.

If you use lazy imports, add comments so that users know there are extra dependencies.

Example:

import numpy as np

# Lazy imports
# import matplotlib.pyplot as plt

def plot():
    import matplotlib.pyplot as plt
    
    # Your function here
    # This will be imported during runtime 
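
If you prefer to keep the import at the top of the file, the standard library offers importlib.util.LazyLoader, which postpones executing a module until one of its attributes is first accessed. The sketch below follows the recipe in the importlib documentation; lazy_import is the helper name used there, and colorsys is just a cheap stand-in for an expensive module. I have not measured how well this behaves with large packages such as matplotlib.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose code only runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # with LazyLoader this only sets up the deferral
    return module


if __name__ == "__main__":
    colorsys = lazy_import("colorsys")        # nothing executed yet
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # the real import happens here
```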

For some specific libraries I think it's a necessity.

You can also create what we might call an API in __init__.py.

scikit-learn is an example. If you import sklearn and then try to use a model, the model is not found and an error is raised; you need to be more specific and import the submodule directly. Although this can be inconvenient for users, it is, in my opinion, good practice and can reduce import times significantly.

As a rule of thumb, a small fraction of the imported libraries accounts for most of the import time. A very simple tool for analysis is line_profiler:

import line_profiler
import atexit

profile = line_profiler.LineProfiler()
atexit.register(profile.print_stats)

@profile
def profiled_function():

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt


profiled_function()

This gives the following results:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    20                                               @profile
    21                                               def profiled_function():
    22
    23         1    2351852.0 2351852.0      6.5          import numpy as np
    24         1    6545679.0 6545679.0     18.0          import pandas as pd
    25         1   27485437.0 27485437.0     75.5          import matplotlib.pyplot as plt

matplotlib accounts for about 75% of the import time of the three libraries (this does not mean that it's badly written; it just needs a lot of machinery for graphical output).

Note:

Once a library has been imported in one module, importing it again in other modules costs almost nothing, because loaded modules are cached globally (in sys.modules).
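
The caching can be seen with a small experiment. This is only a sketch: import_twice is a hypothetical helper, and json stands in for a heavier library.

```python
import importlib
import sys
import time


def import_twice(module_name):
    """Import a module twice and return (first_s, second_s, same_object)."""
    t0 = time.perf_counter()
    first = importlib.import_module(module_name)
    t1 = time.perf_counter()
    second = importlib.import_module(module_name)  # served from sys.modules
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, first is second


if __name__ == "__main__":
    first_s, second_s, same = import_twice("json")
    print(f"first: {first_s * 1e6:.0f} us, second: {second_s * 1e6:.0f} us, same object: {same}")
```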

Another note:

For standard-library modules (e.g. pathlib, subprocess), do not bother with lazy loading; in my experience their import times are close to zero and don't need to be optimized.

Propst answered 27/12, 2021 at 14:14 Comment(1)
Just a question: some optimization websites do suggest such solution (e.g. medium.com/analytics-vidhya/… ), however "pylint" considers it a "misconduct" - so is the lazy import pythonic?Paravane
T
3

1.35 seconds isn't long, but I suppose if you're used to half that for a "quick check" then perhaps it seems so.

Andrea suggests a simple client/server setup, but it seems to me that you could just as easily call a very slight modification of your script and keep its console window open while you work:

  • Call the script, which does the imports then waits for input
  • Minimize the console window, switch to your work, whatever: *Do work*
  • Select the console again
  • Provide the script with some sort of input
  • Receive the results with no import overhead
  • Switch away from the script again while it happily awaits input

I assume your script is identical every time, i.e. you don't need to give it an image stack location or any particular commands each time (but these are easy to add as well!).

Example RAAC's_Script.py:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import scipy.ndimage
import scipy.signal
import sys
import os

print('********* RAAC\'s Script Now Running *********')

while True: # Loops forever
    # Display a message and wait for user to enter text followed by enter key.
    # In this case, we're not expecting any text at all and if there is any it's ignored
    input('Press Enter to test image stack...')

    '''
    *
    *
    **RAAC's Code Goes Here** (Make sure it's indented/inside the while loop!)
    *
    *
    '''

To end the script, close the console window or press ctrl+c.

I've made this as simple as possible, but it would require very little extra to handle things like quitting nicely, doing slightly different things based on input, etc.
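
As a sketch of those extras (the names run_loop and process are hypothetical, and the quit commands are an arbitrary choice), the loop could pass whatever was typed to the processing function and exit cleanly on q, Ctrl+C or end of input:

```python
def run_loop(process, read=input, write=print):
    """Call process(text) for every line entered until the user quits.

    Returns the number of times process was called. read and write are
    parameters only so the loop can be driven without a real console.
    """
    runs = 0
    while True:
        try:
            text = read("Press Enter to run, or type q to quit: ").strip()
        except (EOFError, KeyboardInterrupt):
            write("")  # finish the prompt line before leaving
            break
        if text.lower() in ("q", "quit", "exit"):
            break
        process(text)  # e.g. treat text as the path of the image stack
        runs += 1
    return runs


if __name__ == "__main__":
    run_loop(lambda text: print("processing", repr(text)))
```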

Tensiometer answered 28/8, 2014 at 10:51 Comment(0)
E
1

I have only done a basic test, shown below, but it suggests that runpy can solve this when you want a whole Python script to start faster (without putting any of its logic into test_server.py).

test_server.py

import socket
import sys
import time
import runpy
import matplotlib.pyplot  # expensive import, done once at server start-up

HOST = 'localhost'
PORT = 50007

serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    serversocket.bind((HOST, PORT))
except OSError:
    # bind() fails when the port is taken, e.g. by another server instance
    print("Server is already running")
    sys.exit(1)

# Allow up to 100 pending connections in the backlog
serversocket.listen(100)

while True:
    connection, address = serversocket.accept()
    buf = connection.recv(64)  # script path; at most 64 bytes are read
    connection.close()
    if len(buf) > 0:
        buf_str = buf.decode("utf-8")
        now = time.time()
        runpy.run_path(path_name=buf_str)
        after = time.time()
        duration = after - now
        print("I received " + buf_str + " script and it took " + str(duration) + " seconds to execute it")

test_client.py

import socket
import sys

HOST = 'localhost'
PORT = 50007

clientsocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
clientsocket.connect((HOST, PORT))

message = sys.argv[1].encode()

clientsocket.send(message)

test_lag.py

import matplotlib.pyplot

Testing:

$ python3 test_client.py test_lag.py
I received test_lag.py script and it took 0.0002799034118652344 seconds to execute it

$ time python3 test_lag.py

real    0m0.624s
user    0m1.307s
sys     0m0.180s

Based on this, the module is already loaded in the server process, so the script runs without paying the import cost again.
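
The mechanism can be tried without sockets. In this sketch run_script is a hypothetical wrapper and json stands in for the expensive module: the script's own import statement finds the module already present in sys.modules, and runpy.run_path returns the script's globals.

```python
import json  # stand-in for the expensive module pre-loaded by the server
import os
import runpy
import tempfile


def run_script(path):
    """Execute a script inside the current process and return its globals.

    Modules already imported by this process are reused by the script.
    """
    return runpy.run_path(path, run_name="__main__")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        script = os.path.join(folder, "demo_script.py")
        with open(script, "w") as handle:
            handle.write("import json\nRESULT = json.dumps([1, 2])\n")
        print(run_script(script)["RESULT"])
```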

Expansionism answered 10/11, 2022 at 0:31 Comment(0)
C
0

I think using threading would not be a bad idea:

from concurrent.futures import ThreadPoolExecutor

imports = [
    "import numpy as np",
    "import matplotlib.pyplot as plt",
    "import matplotlib.animation as animation",
    "import scipy.ndimage",
    "import scipy.signal",
    "import sys",
    "import os",
]


def importModule(statement):
    # Run one import statement, binding the result in the module's globals
    try:
        exec(statement, globals())
    except Exception as e:
        print(f"Error importing {statement}: {e}")


with ThreadPoolExecutor(max_workers=len(imports)) as executor:
    futures = {executor.submit(importModule, statement): statement for statement in imports}
    for future in futures:
        future.result()
Corbett answered 15/11, 2023 at 21:40 Comment(2)
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.Nordstrom
I don't think this will actually speed anything up, the GIL almost certainly applies to importsPelerine
