I am working on a web backend / API provider that grabs realtime data from a 3rd party web API, puts it in a MySQL database and makes it available over an HTTP/JSON API.
I serve the API with Flask and access the DB with SQLAlchemy Core.
For the realtime data grabbing part, I have functions that wrap the 3rd party API: each one sends a request, parses the returned XML into a Python dict and returns it. We'll call these API wrappers.
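A minimal sketch of what such an API wrapper might look like, assuming a flat XML response. It uses Python 3's `urllib.request` rather than `urllib2`, and the element names and the `get_quote` function are hypothetical:

```python
import urllib.request
import xml.etree.ElementTree as ET


def xml_to_dict(xml_text):
    """Flatten the root's direct children into a dict, e.g. <q><a>1</a></q> -> {'a': '1'}."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


def get_quote(url, timeout=10):
    """Hypothetical API wrapper: GET `url`, parse the XML body, return a dict."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return xml_to_dict(resp.read())


print(xml_to_dict("<quote><symbol>ABC</symbol><price>1.5</price></quote>"))
# → {'symbol': 'ABC', 'price': '1.5'}
```

Keeping the parsing in its own function means it can be tested without any network access; nested XML would need a recursive version of `xml_to_dict`.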
I then call these wrappers from other functions that take the returned data, do any processing needed (time zone conversions etc.) and put it in the DB. We'll call these processors.
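A minimal sketch of a processor under stated assumptions: the wrapper's dict has hypothetical `symbol`, `price` and `time` keys, timestamps are ISO-8601 strings that carry a UTC offset, and the DB write is injected as an `insert_row` callable so the example runs without MySQL:

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Parse an ISO-8601 timestamp with a UTC offset and return it as naive UTC
    (a convenient form for a MySQL DATETIME column)."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc).replace(tzinfo=None)


def process_quote(raw, insert_row):
    """Hypothetical processor: normalize one API-wrapper dict and pass it to `insert_row`."""
    row = {
        "symbol": raw["symbol"],
        "price": float(raw["price"]),
        "observed_at": to_utc(raw["time"]),
    }
    insert_row(row)
    return row


stored = []
process_quote({"symbol": "ABC", "price": "1.5", "time": "2024-01-01T12:00:00+02:00"}, stored.append)
print(stored)
```

With SQLAlchemy Core, `insert_row` could be something like `lambda row: conn.execute(quotes.insert(), row)` inside a `with engine.begin() as conn:` block, where `quotes` is a hypothetical `Table`.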
I've been reading about asynchronous I/O and eventlet specifically and I'm very impressed.
I'm going to incorporate it in my data grabbing code, but I have some questions first:
- Monkey patching - Is it safe for me to monkey patch everything? Considering I use Flask, SQLAlchemy and a bunch of other libs, are there any downsides to monkey patching (assuming there is no late binding, i.e. I patch before importing anything else)?
- Granularity - How finely should I divide my tasks? I was thinking of creating a GreenPool into which I periodically spawn processors. Then, once a processor reaches the point where it calls the API wrappers, the wrappers would start a GreenPile to fetch the actual HTTP data using eventlet.green.urllib2. Is this a good approach?
- Timeouts - I want to make sure no greenthread ever hangs. Is it a good approach to set an eventlet.Timeout of 10-15 seconds on every greenthread?
FYI, I have about 10 different sets of realtime data, and a processor is spawned every ~5-10 seconds.
Thanks!