Gevent/Eventlet monkey patching for DB drivers

2

9

After Gevent/Eventlet monkey patching, can I assume that whenever a DB driver (e.g. redis-py, pymongo) does I/O through the standard library (e.g. socket), that I/O will be asynchronous?

So is eventlet's monkey patching enough to make, e.g., redis-py non-blocking in an eventlet application?

From what I know, it should be enough as long as I take care with connection usage (e.g. use a different connection for each greenlet). But I want to be sure.

If you know what else is required, or how to use DB drivers correctly with Gevent/Eventlet, please share that too.

Boilermaker answered 23/1, 2013 at 23:36 Comment(0)
18

You can assume it will be magically patched if all of the following are true.

  • You're sure the I/O is built on top of standard Python sockets or other things that eventlet/gevent monkeypatches. No files, no native (C) socket objects, etc.
  • You pass aggressive=True to patch_all (or patch_select), or you're sure the library doesn't use select or anything similar.
  • The driver doesn't use any (implicit) internal threads. (If the driver does use threads internally, patch_thread may work, but it may not.)

If you're not sure, it's pretty easy to test—probably easier than reading through the code and trying to work it out. Have one greenlet that just does something like this:

import gevent

# Ticker greenlet body: yields to the hub on every sleep.
while True:
    print("running")
    gevent.sleep(0.1)

Then have another that runs a slow query against the database. If it's monkeypatched, the looping greenlet will keep printing "running" 10 times/second; if not, the looping greenlet will not get to run while the program is blocked on the query.

So, what do you do if your driver blocks?

The easiest solution is to use a truly concurrent threadpool for DB queries. The idea is that you fire off each query (or batch) as a threadpool job and block the calling greenlet (but not the whole event loop) on the completion of that job. (For really simple cases, where you don't need many concurrent queries, you can just spawn a threading.Thread for each one instead, but usually you can't get away with that.)

If the driver does significant CPU work (e.g., you're using something that runs an in-process cache, or even an entire in-process DBMS like sqlite), you want this threadpool to actually be implemented on top of processes, because otherwise the GIL may prevent your greenlets from running. If it doesn't (and especially if you care about Windows), you probably want to use OS threads. (However, this means you can't use patch_thread(); if you need to do that, use processes.)

If you're using eventlet, and you want to use threads, there's a built-in simple solution called tpool that may be sufficient. If you're using gevent, or you need to use processes, this won't work. Unfortunately, blocking a greenlet (without blocking the whole event loop) on a real threading object is a bit different between eventlet and gevent, and not documented very well, but the tpool source should give you the idea. Beyond that part, the rest is just using concurrent.futures (see futures on pypi if you need this in 2.x or 3.1) to execute the tasks on a ThreadPoolExecutor or ProcessPoolExecutor. (Or, if you prefer, you can go right to threading or multiprocessing instead of using futures.)
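As a minimal sketch of the concurrent.futures half of this (the part that is the same for eventlet and gevent), a blocking call can be handed to a ThreadPoolExecutor. Here slow_query is a hypothetical stand-in for a blocking driver call, submit_query is an illustrative helper, and the pool size of 4 is an arbitrary assumption. The greenlet-side wait is deliberately left out, since it differs between the two libraries; the done-callback only marks where that notification would hook in. This also assumes the threading module has not been monkeypatched.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a blocking driver call (not a real driver API).
def slow_query(sql, delay=0.05):
    time.sleep(delay)  # simulates time spent blocked on the database
    return {"sql": sql, "thread": threading.current_thread().name}

# Assumption: a small fixed pool; size it to the concurrency you need.
db_pool = ThreadPoolExecutor(max_workers=4)

def submit_query(sql, on_done=None):
    """Run slow_query on a real OS thread and return its Future."""
    future = db_pool.submit(slow_query, sql)
    if on_done is not None:
        # Called once the query finishes (normally on the worker thread).
        # This is where you would wake the waiting greenlet.
        future.add_done_callback(on_done)
    return future

if __name__ == "__main__":
    done = threading.Event()
    fut = submit_query("SELECT 1", on_done=lambda f: done.set())
    done.wait()
    print(fut.result())
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor keeps the same submit/Future interface, which is what makes the threads-versus-processes choice mostly a one-line change on Unix.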


Can you explain why I should use OS threads on Windows?

The quick summary is: If you stick to threads, you can pretty much just write cross-platform code, but if you go to processes, you're effectively writing code for two different platforms.

First, read the Programming guidelines for the multiprocessing module (both the "All platforms" section and the "Windows" section). Fortunately, a DB wrapper shouldn't run into most of this. You only need to deal with processes via the ProcessPoolExecutor. And, whether you wrap things up at the cursor-op level or the query level, all your arguments and return values are going to be simple types that can be pickled. Still, it's something you have to be careful about, which otherwise wouldn't be an issue.
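To make the pickling requirement concrete, here is a small sketch that checks whether a value could be sent to or from a process-pool worker. The helper is_picklable and the sample values are illustrative assumptions, and a threading.Lock stands in for a live connection or cursor, which typically cannot be pickled either.

```python
import pickle
import threading

def is_picklable(obj):
    """Return True if obj can be pickled, as process-pool jobs require."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False

# Simple query arguments and row data cross process boundaries fine.
query_args = ("SELECT * FROM users WHERE id = %s", (42,))
rows = [{"id": 42, "name": "alice"}]

# Live resources do not; a lock stands in here for a connection or cursor.
live_resource = threading.Lock()

if __name__ == "__main__":
    print(is_picklable(query_args))
    print(is_picklable(rows))
    print(is_picklable(live_resource))
```

In practice this means each worker process should open its own connection, and only the query text, parameters, and plain result data should pass through the pool.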

Meanwhile, Windows has very low overhead for its intra-process synchronization objects, but very high overhead for its inter-process ones. (It also has very fast thread creation and very slow process creation, but that's not important if you're using a pool.) So, how do you deal with that? I had a lot of fun creating OS threads to wait on the cross-process sync objects and signal the greenlets, but your definition of fun may vary.

Finally, tpool can be adapted trivially to a ppool for Unix, but it takes more work on Windows (and you'll have to understand Windows to do that work).

Megen answered 24/1, 2013 at 1:55 Comment(3)
Thanks for a great answer! Can you explain why I should use OS threads on Windows ("especially if you care about Windows")? - Boilermaker
@RobertZaremba: I'll edit the answer, because it's a bit too long to fit in a comment. - Megen
@RobertZaremba: I know a lot of people preach that as some kind of dogma, but it's silly, if you understand the differences. Using lots of threads is always bad; if you need 1000 tasks running, use greenlets. Using threads to parallelize CPU-bound tasks doesn't work in Python; if you need that, use multiprocessing. But when neither of those are relevant, using greenlets or multiprocessing over threads means you're getting something which is designed to work as much like threads as possible but only gets 90% of the way there, for no actual benefit. Just use threads. - Megen
2

abarnert's answer is correct and very comprehensive. I just want to add that there is no "aggressive" patching in eventlet; that is probably a gevent feature. Also, if a library uses select, that is not a problem, because eventlet can monkey patch that too.

Indeed, in most cases eventlet.monkey_patch() is all you need. Of course, it must be done before creating any sockets.

If you still have any issues, feel free to open an issue or write to the eventlet mailing list or G+ community. All relevant links can be found at http://eventlet.net/

Lesialesion answered 24/1, 2013 at 9:27 Comment(1)
Yes, aggressive is gevent-specific. It means that everything that can't be patched gets deleted. So, if you just monkey.patch_all() (or monkey.patch_select()), your third-party library may find that, e.g., select.epoll exists and use that in preference to select.select, and end up blocking on epoll; if you do monkey.patch_all(aggressive=True), select.epoll won't exist, the library will fall back to select.select, which is patched, and everything will be fine. - Megen
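The fallback described in this comment can be illustrated with a toy sketch. choose_poller is a hypothetical example of how a third-party library might pick its polling mechanism; it is not gevent's code or any real driver's code, and the two namespaces are stand-ins for the select module before and after aggressive patching.

```python
import types

def choose_poller(select_module):
    """Mimic a library that prefers epoll and falls back to select."""
    if hasattr(select_module, "epoll"):
        return "epoll"
    return "select"

# Stand-ins for the select module before and after aggressive patching.
unpatched = types.SimpleNamespace(select=object(), epoll=object())
aggressive = types.SimpleNamespace(select=object())

if __name__ == "__main__":
    print(choose_poller(unpatched))
    print(choose_poller(aggressive))
```

With the epoll attribute gone, the library has no choice but to take the select path, which is the one that was actually patched.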
