peewee and peewee-async: why is async slower
I am trying to wrap my head around Tornado and async connections to PostgreSQL. I found a library that can do this: http://peewee-async.readthedocs.io/en/latest/.

I devised a little test to compare traditional Peewee and Peewee-async, but somehow the async version is slower.

This is my app:

import peewee
import tornado.web
import logging
import asyncio
import peewee_async
import tornado.gen
import tornado.httpclient
from tornado.platform.asyncio import AsyncIOMainLoop

AsyncIOMainLoop().install()
app = tornado.web.Application(debug=True)
app.listen(port=8888)

# ===========
# Defining Async model
async_db = peewee_async.PooledPostgresqlDatabase(
    'reminderbot',
    user='reminderbot',
    password='reminderbot',
    host='localhost'
)
app.objects = peewee_async.Manager(async_db)
class AsyncHuman(peewee.Model):
    first_name = peewee.CharField()
    messenger_id = peewee.CharField()
    class Meta:
        database = async_db
        db_table = 'chats_human'


# ==========
# Defining Sync model
sync_db = peewee.PostgresqlDatabase(
    'reminderbot',
    user='reminderbot',
    password='reminderbot',
    host='localhost'
)
class SyncHuman(peewee.Model):
    first_name = peewee.CharField()
    messenger_id = peewee.CharField()
    class Meta:
        database = sync_db
        db_table = 'chats_human'

# defining two handlers - async and sync
class AsyncHandler(tornado.web.RequestHandler):

    async def get(self):
        """
        An asynchronous way to create an object and return its ID
        """
        obj = await self.application.objects.create(
            AsyncHuman, messenger_id='12345')
        self.write(
            {'id': obj.id,
             'messenger_id': obj.messenger_id}
        )


class SyncHandler(tornado.web.RequestHandler):

    def get(self):
        """
        A traditional synchronous way
        """
        obj = SyncHuman.create(messenger_id='12345')
        self.write({
            'id': obj.id,
            'messenger_id': obj.messenger_id
        })


app.add_handlers('', [
    (r"/receive_async", AsyncHandler),
    (r"/receive_sync", SyncHandler),
])

# Run loop
loop = asyncio.get_event_loop()
try:
    loop.run_forever()
except KeyboardInterrupt:
    print(" server stopped")

and this is what I get from ApacheBench:

ab -n 100 -c 100 http://127.0.0.1:8888/receive_async

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    4   1.5      5       7
Processing:   621 1049 256.6   1054    1486
Waiting:      621 1048 256.6   1053    1485
Total:        628 1053 255.3   1058    1492

Percentage of the requests served within a certain time (ms)
  50%   1058
  66%   1196
  75%   1274
  80%   1324
  90%   1409
  95%   1452
  98%   1485
  99%   1492
 100%   1492 (longest request)




ab -n 100 -c 100 http://127.0.0.1:8888/receive_sync
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        2    5   1.9      5       8
Processing:     8  476 277.7    479    1052
Waiting:        7  476 277.7    478    1052
Total:         15  481 276.2    483    1060

Percentage of the requests served within a certain time (ms)
  50%    483
  66%    629
  75%    714
  80%    759
  90%    853
  95%    899
  98%   1051
  99%   1060
 100%   1060 (longest request)

Why is sync faster? Where is the bottleneck I'm missing?

Contribute asked 1/10, 2016 at 6:37

For a long explanation:

http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/

For a short explanation: synchronous Python code is simple and mostly implemented in the standard library's socket module, which is pure C. Async Python code is more complex than synchronous code. Each request requires several executions of the main event loop code, which is written in Python (in the asyncio case here) and therefore has a lot of overhead compared to C code.

Benchmarks like yours show async's overhead dramatically, because there's no network latency between your application and your database, and you're doing a large number of very small database operations. Since every other aspect of the benchmark is fast, these many executions of the event loop logic add a large proportion of the total runtime.
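
To see that overhead in isolation, here is a minimal, database-free sketch (an illustration, not a benchmark of Peewee itself): it times many tiny operations called directly versus awaited through the asyncio event loop, so the per-operation scheduling cost shows up on its own.

# Illustrative only: direct calls vs. calls that yield to the asyncio event loop.
import asyncio
import time

N = 100000

def tiny_sync():
    return 1 + 1

async def tiny_async():
    await asyncio.sleep(0)  # yield to the event loop once, as a real I/O await would
    return 1 + 1

async def run_all():
    for _ in range(N):
        await tiny_async()

start = time.perf_counter()
for _ in range(N):
    tiny_sync()
print("direct calls:   %.3fs" % (time.perf_counter() - start))

start = time.perf_counter()
asyncio.get_event_loop().run_until_complete(run_all())
print("via event loop: %.3fs" % (time.perf_counter() - start))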

Mike Bayer's argument, linked above, is that low-latency scenarios like this are typical for database applications, and therefore database operations shouldn't be run on the event loop.

Async is best for high-latency scenarios, like websockets and web crawlers, where the application spends most of its time waiting for the peer, rather than spending most of its time executing Python.

In conclusion: if your application has a good reason to be async (it deals with slow peers), having an async database driver is a good idea for the sake of consistent code, but expect some overhead.

If you don't need async for another reason, don't do async database calls, because they're a bit slower.

Vermiform answered 1/10, 2016 at 13:54
So an async web framework like Sanic (github.com/channelcat/sanic) can speed this up? It uses Python 3.5 + uvloop. – Angeloangelology

Database ORMs introduce many complexities for async architectures. There are several places within an ORM where blocking can take place, and altering them all to an async form can be overwhelming. The places where blocking occurs can also vary depending on the database. My guess as to why your results are so slow is that there are a lot of unoptimized calls to and from the event loop (I could be severely wrong; I mostly use SQLAlchemy or raw SQL these days). In my experience, it's generally quicker to execute database code in a thread and yield the result when it's available. I can't really speak for Peewee, but SQLAlchemy is well suited to running in multiple threads and there aren't too many downsides (but the ones that do exist are very, VERY annoying).

I'd recommend rerunning your experiment with a ThreadPoolExecutor and the synchronous Peewee module, executing the database functions in threads. You will have to make changes to your main code, but it would be worth it if you ask me. For example, if you opt for callback-style code, your ORM queries might look like this:

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=10)

def queryByName(name):
    # submit the blocking ORM call (db_model.findOne is a placeholder for your
    # model's query method) to a worker thread; submit() returns a Future
    query = executor.submit(db_model.findOne, name=name)
    query.add_done_callback(processResult)

def processResult(query):
    # the callback receives the completed Future; result() returns the ORM object
    orm_obj = query.result()
    # do stuff with the result

You could use yield from or await in coroutines, but that was a bit problematic for me. Also, I'm not well versed in coroutines yet. This snippet should work well with Tornado so long as devs are careful about deadlocks, db sessions, and transactions. These factors can really slow down your application if something goes wrong in the thread.
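
A rough sketch of the await form, assuming the executor defined above plus the SyncHuman model and the AsyncIOMainLoop setup from the question (Tornado 4.3+), might look like this:

import asyncio
import tornado.web

class ThreadedSyncHandler(tornado.web.RequestHandler):

    async def get(self):
        # run the blocking Peewee call in the thread pool and await the result;
        # run_in_executor returns a future the asyncio loop can wait on
        loop = asyncio.get_event_loop()
        obj = await loop.run_in_executor(
            executor, lambda: SyncHuman.create(messenger_id='12345'))
        self.write({'id': obj.id, 'messenger_id': obj.messenger_id})

Register it with app.add_handlers just like the other two handlers and benchmark it against them.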

If you're feeling very adventurous, MagicStack (the company behind asyncio) has a project called asyncpg and it's supposed to be very fast! I've been meaning to try it, but haven't found the time :(
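
A minimal asyncpg sketch, assuming the reminderbot database and chats_human table from the question, might look something like this:

import asyncio
import asyncpg

async def fetch_one():
    # connection parameters match the database used in the question
    conn = await asyncpg.connect(
        user='reminderbot', password='reminderbot',
        database='reminderbot', host='localhost')
    try:
        row = await conn.fetchrow(
            "SELECT id, messenger_id FROM chats_human LIMIT 1")
        return dict(row) if row else None
    finally:
        await conn.close()

print(asyncio.get_event_loop().run_until_complete(fetch_one()))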

Cubiculum answered 1/10, 2016 at 20:51
I can agree with most of your answer, but this sentence: "MagicStack (the company behind asyncio)" wrongly suggests that they are the authors of, or responsible for, asyncio. They have contributed to async/await, but that makes them nothing more than another contributor, another piece in the system. Anyway, I've upvoted you since your example is useful and can help other readers research this area. – Jokjakarta
