Ensuring Python logging in multiple threads is thread-safe

I have a log.py module, that is used in at least two other modules (server.py and device.py).

It has these globals:

fileLogger = logging.getLogger()
fileLogger.setLevel(logging.DEBUG)
consoleLogger = logging.getLogger()
consoleLogger.setLevel(logging.DEBUG)

file_logging_level_switch = {
    'debug':    fileLogger.debug,
    'info':     fileLogger.info,
    'warning':  fileLogger.warning,
    'error':    fileLogger.error,
    'critical': fileLogger.critical
}

console_logging_level_switch = {
    'debug':    consoleLogger.debug,
    'info':     consoleLogger.info,
    'warning':  consoleLogger.warning,
    'error':    consoleLogger.error,
    'critical': consoleLogger.critical
}

It has two functions:

def LoggingInit( logPath, logFile, html=True ):
    global fileLogger
    global consoleLogger

    logFormatStr = "[%(asctime)s %(threadName)s, %(levelname)s] %(message)s"
    consoleFormatStr = "[%(threadName)s, %(levelname)s] %(message)s"

    if html:
        logFormatStr = "<p>" + logFormatStr + "</p>"

    # File Handler for log file
    logFormatter = logging.Formatter(logFormatStr)
    fileHandler = logging.FileHandler( 
        "{0}{1}.html".format( logPath, logFile ))
    fileHandler.setFormatter( logFormatter )
    fileLogger.addHandler( fileHandler )

    # Stream Handler for stdout, stderr
    consoleFormatter = logging.Formatter(consoleFormatStr)
    consoleHandler = logging.StreamHandler() 
    consoleHandler.setFormatter( consoleFormatter )
    consoleLogger.addHandler( consoleHandler )

And:

def WriteLog( string, print_screen=True, remove_newlines=True, 
        level='debug' ):

    if remove_newlines:
        string = string.replace('\r', '').replace('\n', ' ')

    if print_screen:
        console_logging_level_switch[level](string)

    file_logging_level_switch[level](string)

I call LoggingInit from server.py, which initializes the file and console loggers. I then call WriteLog from all over the place, so multiple threads are accessing fileLogger and consoleLogger.

Do I need any further protection for my log file? The documentation states that thread locks are handled by the handler.

Override asked 5/6, 2013 at 0:25. Comments (10):
If you don't take a lock when you write, your log records might get mixed together and make the log unreadable. – Toxinantitoxin
@Toxinantitoxin - what log records? At the logging level? Aren't they covered by the Python logging machinery? #2974400 – Insupportable
When you use "fileLogger.info('1 2 3 4')" in two different threads, your final log could be a mix of them, something like "1 2 1 2 3 3 4 4". – Toxinantitoxin
I faced the same problem today; you should take a lock around logging to prevent this. – Toxinantitoxin
Doesn't that directly contradict the Python docs (see my answer)? Have you raised a bug? Or am I misunderstanding? – Insupportable
@AliBZ: No, any handler that implements createLock, acquire, and release—which FileHandler and StreamHandler both do—cannot do that. You will get 1 2 3 4 on one line, and 1 2 3 4 on another. – Trishatriskelion
In this case I think my problem was something else, but I did face this problem and fixed it with locking. – Toxinantitoxin
Side note: it's generally a bad idea to give variables the same name as stdlib modules or other top-level objects, like string. (Notice that SO highlighted string as if it were a constant. A fancy editor, linter, etc. may be similarly confused—and, more importantly, so may a human reader.) – Trishatriskelion
@AliBZ: Were you actually using FileHandler? Obviously you can write your own Handler subclass that stubs out those methods, or that forgets to call super().__init__, or breaks things in some other way… – Trishatriskelion
Possible duplicate of Is Python's logging module thread safe? – Topotype

The good news is that you don't need to do anything extra for thread safety, and you either need nothing extra or something almost trivial for clean shutdown. I'll get to the details later.

The bad news is that your code has a serious problem even before you get to that point: fileLogger and consoleLogger are the same object. From the documentation for getLogger():

Return a logger with the specified name or, if no name is specified, return a logger which is the root logger of the hierarchy.

So, you're getting the root logger and storing it as fileLogger, and then you're getting the root logger and storing it as consoleLogger. So, in LoggingInit, you initialize fileLogger, then re-initialize the same object under a different name with different values.

You can add multiple handlers to the same logger—and, since the only initialization you actually do for each is addHandler, your code will sort of work as intended, but only by accident, and only sort of: you will get two copies of each message, both in the file and on the console, when you pass print_screen=True, and you will still get console output even when you pass print_screen=False.

There's actually no reason for global variables at all; the whole point of getLogger() is that you can call it every time you need it and get the global root logger, so you don't need to store it anywhere.
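
A quick way to see the problem (this is a standalone demonstration, not code from your modules):

import logging

fileLogger = logging.getLogger()
consoleLogger = logging.getLogger()
print(fileLogger is consoleLogger)    # True: both names point at the root logger

fileLogger.addHandler(logging.StreamHandler())
consoleLogger.addHandler(logging.StreamHandler())
print(len(fileLogger.handlers))       # 2: "its" handlers include the console handler too

So when WriteLog calls both dispatch dicts, the same root logger emits the record twice, once per call, and every handler sees each emission.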


A more minor problem is that you're not escaping the text you insert into HTML. At some point you're going to try to log the string "a < b" and end up in trouble.
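
One way to fix that without touching WriteLog is to escape only the message text in a custom formatter, so the <p> wrapper in the format string survives. A sketch (the class name HtmlEscapingFormatter is mine, not anything standard):

import html
import logging

class HtmlEscapingFormatter(logging.Formatter):
    """Escape the rendered message so text like "a < b" is safe in the HTML log."""
    def format(self, record):
        # Work on a copy so other handlers still see the original record.
        record = logging.makeLogRecord(record.__dict__)
        record.msg = html.escape(record.getMessage())
        record.args = None
        return super().format(record)

You would then pass it wherever the plain Formatter is created, e.g. fileHandler.setFormatter(HtmlEscapingFormatter(logFormatStr)).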

Less seriously, a sequence of <p> tags that isn't inside a <body> inside an <html> is not a valid HTML document. Plenty of viewers will take care of that automatically, or you can post-process your logs trivially before displaying them. But if you really want this to be correct, you need to subclass FileHandler: have your __init__ write a header when given an empty file (and strip an existing footer when appending), and have your close write the footer.
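
A rough sketch of that subclass (the name is mine, and for brevity it only appends a footer on close rather than stripping an old one when reopening an existing file):

import logging
import os

class HtmlFileHandler(logging.FileHandler):
    HEADER = "<!DOCTYPE html>\n<html><body>\n"
    FOOTER = "</body></html>\n"

    def __init__(self, filename, mode="a", encoding=None):
        # Remember whether the file is new/empty before FileHandler opens it.
        needs_header = not os.path.exists(filename) or os.path.getsize(filename) == 0
        super().__init__(filename, mode=mode, encoding=encoding)
        if needs_header:
            self.stream.write(self.HEADER)

    def close(self):
        # Write the footer before the base class flushes and closes the stream.
        if self.stream and not self.stream.closed:
            self.stream.write(self.FOOTER)
        super().close()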


Getting back to your actual question:

You do not need any additional locking. If a handler correctly implements createLock, acquire, and release (and it's called on a platform with threads), the logging machinery will automatically make sure to acquire the lock when needed to make sure each message is logged atomically.

As far as I know, the documentation doesn't directly say that StreamHandler and FileHandler implement these methods, but it does strongly imply it (the text you mentioned in the question says "The logging module is intended to be thread-safe without any special work needing to be done by its clients", etc.). And you can look at the source for your implementation (e.g., CPython 3.3) and see that they both inherit correctly implemented methods from logging.Handler.
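
You can also see the guarantee in action with a small throwaway check (the class name is my invention). Because the handler's lock is an RLock, a non-blocking re-acquire by the thread that already holds it succeeds immediately, so this assert passing shows that emit() really does run with the lock held:

import logging

class LockCheckingHandler(logging.StreamHandler):
    """Demo only: verify that emit() runs while the handler's own lock is held."""
    def emit(self, record):
        # handle() has already acquired self.lock (an RLock), so this
        # non-blocking re-acquire by the same thread succeeds immediately.
        assert self.lock.acquire(False)
        self.lock.release()
        super().emit(record)

logging.getLogger().addHandler(LockCheckingHandler())
logging.getLogger().warning("emitted while holding the handler lock")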


Likewise, if a handler correctly implements flush and close, the logging machinery will make sure it's finalized correctly during normal shutdown.

Here, the documentation does explain what StreamHandler.flush(), FileHandler.flush(), and FileHandler.close() do. They're mostly what you'd expect, except that StreamHandler's close() does no output, which means final log messages to the console may get lost. From the docs:

Note that the close() method is inherited from Handler and so does no output, so an explicit flush() call may be needed at times.

If this matters to you, and you want to fix it, you need to do something like this:

class ClosingStreamHandler(logging.StreamHandler):
    def close(self):
        # Flush buffered output before the inherited close(), so the final
        # console messages aren't lost at shutdown.
        self.flush()
        super().close()

And then use ClosingStreamHandler() instead of StreamHandler().

FileHandler has no such problem.


The normal way to send logs to two places is to just use the root logger with two handlers, each with their own formatter.
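
Applied to your LoggingInit, that could look something like this (same format strings as yours; treat it as a sketch rather than a tested drop-in replacement):

import logging

def LoggingInit(logPath, logFile, html=True):
    logFormatStr = "[%(asctime)s %(threadName)s, %(levelname)s] %(message)s"
    consoleFormatStr = "[%(threadName)s, %(levelname)s] %(message)s"
    if html:
        logFormatStr = "<p>" + logFormatStr + "</p>"

    root = logging.getLogger()          # one logger; no globals needed
    root.setLevel(logging.DEBUG)

    fileHandler = logging.FileHandler("{0}{1}.html".format(logPath, logFile))
    fileHandler.setFormatter(logging.Formatter(logFormatStr))
    root.addHandler(fileHandler)

    consoleHandler = logging.StreamHandler()
    consoleHandler.setFormatter(logging.Formatter(consoleFormatStr))
    root.addHandler(consoleHandler)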

Also, even if you do want two loggers, you don't need the separate console_logging_level_switch and file_logging_level_switch maps; calling Logger.debug(msg) is exactly the same thing as calling Logger.log(DEBUG, msg). You'll still need some way to map your custom level names debug, etc. to the standard names DEBUG, etc., but you can just do one lookup, instead of doing it once per logger (plus, if your names are just the standard names in a different case, you can cheat).
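
For example, since your level names are just the standard names lower-cased, both dispatch dicts collapse into a single lookup (log_at is a hypothetical helper name):

import logging

def log_at(level_name, message):
    # 'debug' -> logging.DEBUG, 'info' -> logging.INFO, etc.
    level = getattr(logging, level_name.upper())
    logging.getLogger().log(level, message)

log_at('warning', 'disk almost full')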

This is all described pretty well in the Multiple handlers and formatters section, and the rest of the logging cookbook.

The only problem with the standard way of doing this is that you can't easily turn off console logging on a message-by-message basis. That's because it's not a normal thing to do. Usually, you just log by levels, and set the log level higher on the file log.

But, if you want more control, you can use filters. For example, give your FileHandler a filter that accepts everything, and your console StreamHandler a filter that only accepts records from loggers whose names start with console; then log through the logger named 'console' if print_screen else ''. That reduces WriteLog to almost a one-liner.

You still need the extra two lines to remove newlines—but you can even do that in the filter, or via an adapter, if you want. (Again, see the cookbook.) And then WriteLog really is a one-liner.
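
Putting the filter idea together (StripNewlinesFilter and the 'console' logger name are my choices, not anything built in), WriteLog shrinks to a single logging call:

import logging

class StripNewlinesFilter(logging.Filter):
    """Flatten newlines in the message, as WriteLog did by hand."""
    def filter(self, record):
        record.msg = str(record.msg).replace('\r', '').replace('\n', ' ')
        return True

def LoggingInit(logPath, logFile):
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    newline_filter = StripNewlinesFilter()

    fileHandler = logging.FileHandler("{0}{1}.html".format(logPath, logFile))
    fileHandler.addFilter(newline_filter)                # accepts every record
    root.addHandler(fileHandler)

    consoleHandler = logging.StreamHandler()
    consoleHandler.addFilter(newline_filter)
    consoleHandler.addFilter(logging.Filter('console'))  # only records from the 'console' logger
    root.addHandler(consoleHandler)

def WriteLog(message, print_screen=True, level='debug'):
    # Records from getLogger('console') pass the console handler's filter;
    # records from the root logger only reach the file handler.
    logging.getLogger('console' if print_screen else '').log(
        getattr(logging, level.upper()), message)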

Trishatriskelion answered 5/6, 2013 at 0:36. Comments (12):
Thanks for the additional tips. Does this mean all I need to do is specify names in getLogger() and it will work as intended? – Override
And, @Trishatriskelion, if I do want to continue logging to two different places (by specifying names in getLogger()), I will still need the globals, correct? – Override
@Raskol: Well, it depends on what "intended" means, but that likely isn't what you really want. The idea behind getLogger is that you're supposed to use it for some kind of hierarchical log structuring; notice that the documentation for "Logger Objects" talks about "the recommended construction logging.getLogger(__name__)" to make your logging hierarchy match your module hierarchy. You can subvert that to have two loggers side by side, but it's a weird thing to do. – Trishatriskelion
@Raskol: No, you do not need the globals. If you call getLogger('name'), it will return the exact same object as the last call to getLogger('name'). This is guaranteed by the docs (and guaranteed to be thread-safe, too). – Trishatriskelion
Also, I just import the log file into a PHP document, which takes care of the header and footer. I will have to escape the HTML, though. Again, thanks for the help. – Override
Let me edit the answer to show you the normal way to do this kind of thing. – Trishatriskelion
I was about to post a similar question because my gunicorn-based development server is not writing logging.exception() atomically: when a bunch of threads fail simultaneously, the stack traces are interleaved in the logfile. So it may be working as intended, but it's not the behavior I expected? Have I missed something? – Lanta
@rbp: Post your separate question (and say which platform you're using, and how you're configuring logging and gunicorn). Most likely, the problem is that gunicorn is a forked server, meaning you have multiple separate processes, so a threading.RLock doesn't do any good. But the greenlets could also be related. – Trishatriskelion
@Trishatriskelion you just answered my question! – Lanta
@rbp: For the benefit of anyone coming by in the future who doesn't understand why that was enough to solve your problem: I assume you just subclassed your Handler and overrode its createLock to set self.lock = multiprocessing.RLock(), right? – Trishatriskelion
@abarnat Now that I understood what the problem was, I was working on a solution. Thanks for posting it here. I had just written up a question and came back here to cross-post it for people to read/respond. If you cross-post the answer over there, I'll upvote you and mark it as correct. #16973462 – Lanta
I wish I could explain this way! If there is a book you've written, I'd buy it :) – Toulon

Python logging is thread safe; as the documentation puts it, "The logging module is intended to be thread-safe without any special work needing to be done by its clients."

So you have no problem in the Python (library) code.

The routine that you call from multiple threads (WriteLog) does not write to any shared state. So you have no problem in your code.

So you are OK.

Insupportable answered 5/6, 2013 at 0:36.