python logging performance comparison and options

I am researching high-performance logging in Python and so far have been disappointed by the performance of the Python standard logging module - but there seem to be no alternatives. Below is a piece of code to performance-test four different ways of logging:

import logging
import timeit
import time
import datetime
from logutils.queue import QueueListener, QueueHandler
import Queue
import threading

tmpq = Queue.Queue()

def std_manual_threading():
    start = datetime.datetime.now()
    logger = logging.getLogger()
    hdlr = logging.FileHandler('std_manual.out', 'w')
    logger.addHandler(hdlr)
    logger.setLevel(logging.DEBUG)
    def logger_thread(f):
        while True:
            item = tmpq.get(0.1)
            if item is None:
                break
            logging.info(item)
    f = open('manual.out', 'w')
    lt = threading.Thread(target=logger_thread, args=(f,))
    lt.start()
    for i in range(100000):
        tmpq.put("msg:%d" % i)
    tmpq.put(None)
    lt.join()
    print datetime.datetime.now() - start

def nonstd_manual_threading():
    start = datetime.datetime.now()
    def logger_thread(f):
        while True:
            item = tmpq.get(0.1)
            if item is None:
                break
            f.write(item+"\n")
    f = open('manual.out', 'w')
    lt = threading.Thread(target=logger_thread, args=(f,))
    lt.start()
    for i in range(100000):
        tmpq.put("msg:%d" % i)
    tmpq.put(None)
    lt.join()
    print datetime.datetime.now() - start


def std_logging_queue_handler():
    start = datetime.datetime.now()
    q = Queue.Queue(-1)

    logger = logging.getLogger()
    hdlr = logging.FileHandler('qtest.out', 'w')
    ql = QueueListener(q, hdlr)


    # Create root logger and attach the queue handler
    root = logging.getLogger()
    root.setLevel(logging.DEBUG) # Log level = DEBUG
    qh = QueueHandler(q)
    root.addHandler(qh)

    ql.start()

    for i in range(100000):
        logging.info("msg:%d" % i)
    ql.stop()
    print datetime.datetime.now() - start

def std_logging_single_thread():
    start = datetime.datetime.now()
    logger = logging.getLogger()
    hdlr = logging.FileHandler('test.out', 'w')
    logger.addHandler(hdlr)
    logger.setLevel(logging.DEBUG)
    for i in range(100000):
        logging.info("msg:%d" % i)
    print datetime.datetime.now() - start

if __name__ == "__main__":
    """
    Conclusion: std logging about 3 times slower so for 100K lines simple file write is ~1 sec while std
    logging ~3. If threads are introduced some overhead causes to go to ~4 and if QueueListener and events
    are used with enhancement for thread sleeping that goes to ~5 (probably because log records are being
    inserted into queue).
    """
    print "Testing"
    #std_logging_single_thread() # 3.4
    std_logging_queue_handler() # 7, 6, 7 (5 seconds with sleep optimization)
    #nonstd_manual_threading() # 1.08
    #std_manual_threading() # 4.3

  1. The nonstd_manual_threading option works best since there is no overhead from the logging module, but you obviously miss out on a lot of features such as formatters, filters and the nice interface.
  2. std_logging in a single thread is the next best thing, but it is still about 3 times slower than nonstd_manual_threading.
  3. The std_manual_threading option dumps messages into a thread-safe queue and uses the standard logging module in a separate thread. That comes out about 25% higher than option 2, probably due to context-switching costs.
  4. Finally, the option using logutils's QueueHandler comes out the most expensive. I tweaked the code of logutils/queue.py's _monitor method to sleep for 10 ms after processing 500 messages, as long as there are fewer than 100K messages in the queue. That brings the runtime down from 7 seconds to 5 seconds (probably by avoiding context-switching costs); a sketch of that tweak follows this list.
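
For reference, the sleep tweak described in item 4 amounts to something like the following drain loop, shown standalone rather than as a patch to logutils (the function name, batch size and thresholds here are illustrative, not logutils's actual code):

import time

def drain(q, handler, batch=500, backlog_limit=100000, sleep_s=0.01):
    # Pull log records off the queue and hand them to a single handler,
    # sleeping briefly after each batch while the backlog stays small.
    processed = 0
    while True:
        record = q.get()          # blocks until a record (or the sentinel) arrives
        if record is None:        # sentinel: stop draining
            break
        handler.handle(record)
        processed += 1
        if processed % batch == 0 and q.qsize() < backlog_limit:
            time.sleep(sleep_s)   # yield so producer threads can run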

My question is: why is there so much performance overhead with the logging module, and are there any alternatives? For a performance-sensitive app, does it even make sense to use the logging module?

P.S.: I have profiled the different scenarios, and it seems that LogRecord creation is expensive.

Trifling answered 20/2, 2016 at 6:53 Comment(5)
At the risk of being "that guy" -- How many lines of output do you expect to generate from your process, and how many lines of output would you need to write to spend 1 additional second of computation time? What percentage of your total computation time would logging represent? It seems like you would have to log HUGE amounts of data before you ever noticed a real difference in runtime -- at least for longer-running processes (where it matters)... – Aleasealeatory
Fair point, but I'm trying to make an informed decision about the best performance I can get out of the logging module. The app I have in mind receives thousands of real-time events per second, and the better my logger performs, the more I will feel "ok" logging these events, since they can be useful in recreating the conditions in which a problem happens. – Trifling
Re. your comment on my answer - thanks for the data points. Was the socket receiver on a separate machine? Also, you could try overriding SocketHandler.makePickle with a leaner version (in case you don't need to send everything - and pickle may be overkill for sending a dict, anyway). How much time was shaved off by your steps 2, 3, 4? – Austenite
@VinaySajip The socket was local (same machine). Step 2 saves about 10-12% and involves setting logging._srcfile = None, logging.logThreads = 0 and logging.logProcesses = 0. Step 3 has fluctuating results: removing the acquire and release from StreamHandler's emit at times lowers the latency by almost 20%, but at other times only by 2%; this could be random, based on how the OS is scheduling the threads. Step 4 brings it down by 15%. – Trifling
Thanks for the additional info. – Austenite

The stdlib logging package provides a lot of flexibility and functionality for developers / devops / support staff, and that flexibility obviously comes at some cost. If the need for performance trumps the need for flexibility, you need to go with something else. Did you take the optimisation steps described in the docs? A typical logging call takes of the order of tens of microseconds on reasonable hardware, which hardly seems excessive. However, logging in tight loops is seldom advisable, if only because the amount of info generated might take too much time to wade through.

The code to find the caller can be quite expensive, but is needed if you want e.g. filename and line number where the logging call was made.
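
The documented optimisation steps boil down to a few module-level flags plus guarding expensive calls; a minimal sketch (these are the same settings mentioned in the question's comments):

import logging

# Skip per-record work that this application doesn't need.
logging._srcfile = None      # don't call sys._getframe() to find the caller's file/line
logging.logThreads = 0       # don't record thread name/ident
logging.logProcesses = 0     # don't record the process id

logger = logging.getLogger(__name__)

# Guard expensive argument construction behind an explicit level check.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("state dump: %s", "expensive-to-build detail")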

QueueHandler is intended for scenarios where the logging I/O will take significant time and can't be done in-band. For example, a web application whose logs need to be sent by email to site administrators cannot risk using SMTPHandler directly, because the email handshake can be slow.
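
A minimal sketch of that pattern, assuming Python 3.2+ where QueueHandler and QueueListener live in logging.handlers (the logutils package used in the question provides the same classes for older Pythons); the mail host and addresses are placeholders:

import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)

# The slow handler lives behind the queue; only the listener thread touches it.
smtp_handler = logging.handlers.SMTPHandler(
    mailhost="localhost",
    fromaddr="app@example.com",
    toaddrs=["admin@example.com"],
    subject="Application error")
smtp_handler.setLevel(logging.ERROR)

listener = logging.handlers.QueueListener(log_queue, smtp_handler)
listener.start()

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(logging.handlers.QueueHandler(log_queue))

root.error("something went wrong")   # returns quickly; the email is sent off-thread
listener.stop()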

Don't forget that thread context switching in Python is slow. Did you try SocketHandler? There is a suitable starting point in the docs for a separate receiver process that does the actual I/O to file, email etc. So your process is only doing socket I/O and not doing context switches just for logging. And using domain sockets or UDP might be faster still, though the latter is of course lossy.
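
A sending-side sketch, assuming a separate receiver process (such as the socket receiver in the logging cookbook) is listening on the default port and doing the actual file I/O; DatagramHandler would be the UDP variant:

import logging
import logging.handlers

sock_handler = logging.handlers.SocketHandler(
    "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT)

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(sock_handler)

# Each call pickles the LogRecord's dict and writes it to the socket;
# formatting and file I/O happen in the receiver process.
root.info("msg:%d", 42)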

There are other ways to optimise. For example, standard handlers in logging do locking around emit(), for thread safety - if in a specific scenario under your control there is no contention for the handler, you could have a handler subclass that no-ops the lock acquisition and release. And so on.
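
As a sketch of that last idea: in CPython's logging implementation, Handler.acquire() and Handler.release() are no-ops when self.lock is None, so a subclass can drop the locking by overriding createLock(). This is only safe if a single thread ever logs through the handler:

import logging

class UnlockedFileHandler(logging.FileHandler):
    # FileHandler without per-record locking; use only when exactly one
    # thread ever emits through this handler.
    def createLock(self):
        self.lock = None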

Austenite answered 20/2, 2016 at 11:28 Comment(1)
Thanks for the tips. I agree that it is obvious that the flexibility will come at a cost. I tried a few things: 1. Implemented a SocketHandler and a Unix datagram handler; they both increase the time threefold - I imagine it's the pickling. 2. Did some of the performance optimizations mentioned on the page. 3. Tried logging without locking in StreamHandler's emit. 4. Tried flushing every 100 invocations rather than every invocation. #2, 3, 4 all helped shave some time; #4 was by far the most effective. – Trifling
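
The "flush every 100 invocations" idea from the comment above can also be approximated with the stdlib's MemoryHandler, which buffers records and only forwards them to a target handler when the buffer fills or a severe record arrives (not exactly what the commenter did - they flushed the stream less often - but the same batching idea):

import logging
import logging.handlers

file_handler = logging.FileHandler("buffered.out", "w")

# Buffer 100 records at a time; anything at ERROR or above flushes immediately.
buffered = logging.handlers.MemoryHandler(
    capacity=100, flushLevel=logging.ERROR, target=file_handler)

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(buffered)

for i in range(1000):
    root.info("msg:%d", i)
buffered.close()   # flushes any records still in the buffer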

If you want a better answer, try to describe your problem in more detail: why do you need to log such a huge number of messages? Logging was designed to record important information, especially warnings and errors, not every line you execute.

If logging takes more than 1% of your processing time, you are probably using it wrongly, and that's not logging's fault.

Second, related to performance: do not build the message before sending it to the logging module (replace "format % params" with "format, params" in the call, as shown below). This is because logging does the formatting for you, and only when the message is actually going to be emitted, which is much faster.
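
For example (the logger name and values here are just illustrative):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
n, elapsed = 100000, 1.08

# Eager: the string is built up front, even if this level is disabled.
logger.debug("processed %d events in %.2fs" % (n, elapsed))

# Lazy: logging formats the message only if the record is actually emitted.
logger.debug("processed %d events in %.2fs", n, elapsed)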

Vivienviviene answered 20/2, 2016 at 8:19 Comment(1)
Thanks, your 2nd tip was helpful. As for the specific problem, I'm simply looking for the best-performing logging option in Python for a latency-sensitive application. I understand the code I pasted isn't practical, but its sole purpose is benchmarking. – Trifling

Python is not truly multi-threaded in the traditional sense. Whenever a thread is executing, it has to own the GIL (global interpreter lock). "Threads" yield whenever they call into the system or have to wait on I/O, which allows the interpreter to run other Python "threads". This equates to asynchronous I/O.

Regardless of whether the result of the logging message is used or dropped, all of the work to evaluate the arguments for the logging call is done, as mentioned in other responses. What is missed, however (and where the multi-threaded part of your question comes in), is that while writing a large amount to disk may be slow, modern computers have many cores, so the process of writing the output to the file is farmed out to another core while the interpreter moves on to another Python "thread". The operating system completes the asynchronous disk write, and little to no time is lost to the actual disk write.

As long as the interpreter always has another thread to switch to, virtually no time is lost to the writes. The interpreter only actually loses time if all Python "threads" are blocked on I/O, which is unlikely unless you are really swamping your disk.

Bathyal answered 20/9, 2019 at 20:29 Comment(0)
