How to use C extensions in Python to get around the GIL

Asked 18/8, 2010 at 16:46 Answered 18/8, 2010 at 17:16

I want to run a CPU-intensive program in Python across multiple cores and am trying to figure out how to write C extensions to do this. Are there any code samples or tutorials on this?

Smaltite answered 18/8, 2010 at 16:46 Comment(0)

You can already break a Python program into multiple processes. The OS will schedule those processes across all the cores.

Do this.

python part1.py | python part2.py | python part3.py | ... etc.

The OS will ensure that each part uses as many resources as possible. You can trivially pass information along this pipeline by using cPickle (pickle in Python 3) on sys.stdin and sys.stdout.
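As a rough sketch of one such pipeline stage, assuming Python 3 (where cPickle is simply pickle and the binary pipes are sys.stdin.buffer and sys.stdout.buffer): the names read_items, run_stage, and square are hypothetical and only illustrate the idea. Each stage unpickles objects from its input until end of file, transforms them, and pickles the results to its output.

```python
import io
import pickle


def read_items(stream):
    """Yield unpickled objects from a binary stream until end of input."""
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            return


def run_stage(transform, in_stream, out_stream):
    """Apply transform to each incoming object and pickle the result downstream."""
    for item in read_items(in_stream):
        pickle.dump(transform(item), out_stream)
    out_stream.flush()


def square(x):
    # Hypothetical per-item work done by this stage.
    return x * x


if __name__ == "__main__":
    # In-memory demonstration. A real stage would instead call
    # run_stage(square, sys.stdin.buffer, sys.stdout.buffer).
    upstream = io.BytesIO()
    for n in range(5):
        pickle.dump(n, upstream)
    upstream.seek(0)

    downstream = io.BytesIO()
    run_stage(square, upstream, downstream)
    downstream.seek(0)
    print(list(read_items(downstream)))
```

Because every stage only reads from one stream and writes to another, the stages share nothing and can run as separate processes joined by shell pipes.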

Without too much work, this can often lead to dramatic speedups.

Yes -- to the haterz -- it's possible to construct an algorithm so tortured that it may not be sped up much. However, this often yields huge benefits for minimal work.

And.

The restructuring for this purpose will exactly match the restructuring required to maximize thread concurrency. So. Start with shared-nothing process parallelism until you can prove that sharing more data would help, then move to the more complex shared-everything thread parallelism.

Afraid answered 18/8, 2010 at 17:10 Comment(0)

Take a look at multiprocessing. It's an often overlooked fact that operating systems prefer it when you don't share data globally and don't cram loads of threads into a single process.
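A minimal sketch of what that can look like with multiprocessing.Pool. The count_primes function and the input sizes are made up for illustration; any CPU-bound function that takes picklable arguments would do.

```python
from multiprocessing import Pool


def count_primes(limit):
    """CPU-bound toy task: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


if __name__ == "__main__":
    limits = [20000, 30000, 40000, 50000]
    # Pool() starts one worker process per core by default; each worker
    # has its own interpreter and therefore its own GIL.
    with Pool() as pool:
        results = pool.map(count_primes, limits)
    print(dict(zip(limits, results)))
```

The `if __name__ == "__main__":` guard matters on platforms that start workers by re-importing the main module.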

If you still insist that your CPU intensive behaviour requires threading, take a look at the documentation for working with the GIL in C. It's quite informative.

Bots answered 18/8, 2010 at 16:59 Comment(3)

The biggest problem I ran into when trying to use multiprocessing instead of threading is that with 1000+ threads (processes) you get a separate instance of the Python interpreter for each one. This gets extremely expensive in terms of memory. – Oxyhydrogen 5/10, 2011 at 12:48

@nalroff: That doesn't sound right. The memory used for the majority of the interpreter is shared by all instances of that interpreter. Only the pages that differ will increase total memory usage. Make sure that you're looking at the right value. It's also worth noting that processes do not use significantly more memory than additional threads. – Bots 5/10, 2011 at 22:55

In every instance I have used the multiprocessing module in Python, I have always seen a dramatic difference in memory usage between processes and threads. Anyway, the threading module seems to be sufficiently fast for threaded web scraping and performance testing of a web app, which is all I'm using it for anyway. – Oxyhydrogen 6/10, 2011 at 12:42

This is a good use of a C extension. The keyword you should search for is Py_BEGIN_ALLOW_THREADS.

http://docs.python.org/c-api/init.html#thread-state-and-the-global-interpreter-lock

P.S. I mean that if your processing is already in C, as with image processing, then releasing the lock in a C extension is a good approach. If your processing code is mainly in Python, the other answers' suggestion to use multiprocessing is better. It is usually not justified to rewrite the code in C for background processing.
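To see the effect from the Python side without writing any C, here is a small sketch that relies on an existing C-backed function instead of a custom extension. It assumes the documented behaviour of the standard library's hashlib, which releases the GIL while hashing data larger than 2047 bytes, so plain threads can hash several large buffers on different cores. The helper names digest and digest_all are hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(block):
    # The hashing runs in C; for large buffers hashlib releases the GIL
    # while it works, so other threads can run at the same time.
    return hashlib.sha256(block).hexdigest()


def digest_all(blocks, workers=4):
    """Hash every block on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, blocks))


if __name__ == "__main__":
    # Four 8 MiB buffers, each filled with a different byte value.
    blocks = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
    for hex_digest in digest_all(blocks):
        print(hex_digest)
```

A custom extension that wraps its long-running C code in Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS would be called from threads in the same way.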

Honeymoon answered 18/8, 2010 at 17:15 Comment(0)

-1

Have you considered using one of the Python MPI libraries, like mpi4py? Although MPI is normally used to distribute work across a cluster, it works quite well on a single multicore machine. The downside is that you'll have to refactor your code to use MPI's communication calls (which may be easy).

Florio answered 18/8, 2010 at 16:55 Comment(0)

-2

multiprocessing is easy. If that's not fast enough, your question is complicated.

Concrete answered 18/8, 2010 at 17:16 Comment(0)
