Configuring the logging of a third party script
Asked Answered
C

5

22

I have a third party python console script, which source I don't want to modify.

But I want to configure the logging which is done by the script and its libraries. The script uses the standard python logging, but does not support configuration of it.

The script uses this pattern:

import logging
logger=logging.getLogger(__name__)

Use cases:

  • I want INFO messages of file foo.py to be ignored.
  • I want to include the PID in the loggings messages.

How can I configure the logging, if I don't want to modify the sources of the console script?

The script gets called via cron.

How can I configure the logging if this script?

Important

Creating a wrapper script like in this answer is not a solution for me.

The linux process hierarchy looks like this:

Cron -> third_party_script

There should be any "glue", "wrapping" or "dirty-hack" script between cron and third_party_script.

Why obtrusive/netpicking?

I want to practice "separation of concerns". I want to be able to configure logging one time and in one place. This configuration should get used by all python code of a virtualenv. Writing a wrapper would be a work-around. I want a solution.

Update

Several months later I think a pth file would an simple answer.

Craving answered 30/4, 2015 at 8:17 Comment(11)
You have 2 "topics" here: one is to modify the logging from that third party (which I believe is addressed in one of the solutions proposed); the other one is how (what you refer as strategy) you apply that solution. Can you confirm if one of the proposed solution works (or none works and why)?Profusive
@Profusive I updated the question to emphasis what is important in this question. Up to now I see only answer which works. But I would like to see a better one.Craving
In that case wouldn't it be better to re-qualify that questions as "Configure the logging of a running python service or script"? (I assume the script or service is already running, is it correct?)Profusive
@Profusive it is a script which you can call from the shell or via cron. There is no service involved in this context.Craving
@Craving if you do not want to edit third party code and if you can not modify the cron table I would argue that you are choosing already for the kacky way. With that in mind, we still need to know if the third party code is running at the moment you are trying to change the log format. If it is not, there is no way you can modify a logger that was not even created.Profusive
@Craving : is a dynamic code edit (patch lib code before it get called by the cron job, and revert the change after execution) an option?Profusive
@Profusive How could dynamic code edit be done? I have no clue.Craving
@guetlli by the way there was a proposal in the Python bug tracker for a $PYTHONRUNFIRST but it never really took off: bugs.python.org/issue14803Caliber
@Caliber thank you for this link!Craving
@Craving there's a new PEP proposal for a new way to handle this. python.org/dev/peps/pep-0648 so far it seems to be favorably viewed by the steering council mail.python.org/archives/list/[email protected]/message/…Caliber
@Caliber PEP648 looks good. Thank you for the link.Craving
C
7

I asked this question several months ago. Unfortunately I got no answer which satisfied me.

The distinction between using logging and setting it up is important for me.

This is my solution: In our context we set up logging in a method which gets called in usercustomize.py.

This way the optional plugins can use the logging without the need to set it up.

This almost solved all my needs.

Up to now I found no better way than usercustomize.py. My perfect solution would be something I would call virtualenvcustomize.py: Some initialization code which gets run if the interpreter loads virtualenv. Up to now I could not find such a hook. Please let me know if you have a solution.

Craving answered 11/4, 2016 at 12:10 Comment(1)
In order to use a virtualenv you need to either use sitecustomize.py or to use --system-site-packages when you create the virtualenv. On "normal" virtualenvs, usercustomize.py is not being imported. For details check my answer.Caliber
C
8

A library isn't supposed to configure logging - that's up to the application developer. Inbar Rose's answer isn't quite right. If the module you're referring to is called foo, then the reference to __name__ in its getLogger call will be passing in foo. So in your configuration code, you would need to do the equivalent of

logging.getLogger('foo').setLevel(logging.WARNING)

To include the PID in the logs, just ensure that you use an appropriate format string for your Formatters, i.e. one which includes %(process)d. A simple example would be:

logging.basicConfig(format='%(process)d %(message)s')

Note that you can't write to the same log file from multiple processes concurrently - you may need to consider an alternative approach if you want to do this.

Update: An application developer is someone who writes Python code which is not the library, but is invoked by e.g. a user or another script via a command line or other means of creating a Python process.

To use the code I posted above, there is no need to wrap or modify the third-party code, as long as it's a library. For example, in the main script which invokes the third-party library:

if __name__ == '__main__':
    # configure logging here
    # sets the third party's logger to do WARNING or greater
    # replace 'foo' with whatever the top-level package name your
    # third party package uses
    logging.getLogger('foo').setLevel(logging.WARNING)
    # set any other loggers to use INFO or greater,
    # unless otherwise configured explicitly
    logging.basicConfig(level=logging.INFO, format='%(process)d %(message)s')
    # now call the main function (or else inline code here)
    main()

If the third party code runs via cron, it's not library code - it's an application, and you are probably out of luck.

Carleencarlen answered 30/4, 2015 at 8:35 Comment(3)
That is supposing the file is actually called "foo.py" I assume the OP just doesn't want to share the name of his module/file. And is therefore using the Anonymous __name__. Clearly he must replace that with the correct log handler.Shout
Quoting from the answer: "A library isn't supposed to configure logging - that's up to the application developer". What's an "application developer"? I think configuring the logging needs to be done by an operator (the "op" part of "devop"). But maybe I am wrong.Craving
How can I execute logging.getLogger('foo').setLevel(logging.WARNING) without modifying or wrapping the third-party code?Craving
C
7

I asked this question several months ago. Unfortunately I got no answer which satisfied me.

The distinction between using logging and setting it up is important for me.

This is my solution: In our context we set up logging in a method which gets called in usercustomize.py.

This way the optional plugins can use the logging without the need to set it up.

This almost solved all my needs.

Up to now I found no better way than usercustomize.py. My perfect solution would be something I would call virtualenvcustomize.py: Some initialization code which gets run if the interpreter loads virtualenv. Up to now I could not find such a hook. Please let me know if you have a solution.

Craving answered 11/4, 2016 at 12:10 Comment(1)
In order to use a virtualenv you need to either use sitecustomize.py or to use --system-site-packages when you create the virtualenv. On "normal" virtualenvs, usercustomize.py is not being imported. For details check my answer.Caliber
P
5

Several possibilities:

Wrapper

If you can edit your cron table you could create a small script in python that get the lib logger, remove the existing log handler and hook your custom handler on it:

# Assumes the lib defines a logger object
from third_party_lib import *

# Note this assumes only one handler was defined by the lib
logger.removeHandler(logger.handlers[0])

# Then we can hook our custom format handler
custom_handler = logging.StreamHandler(sys.stdout)
custom_handler.setFormatter(logging.Formatter(format = '%(asctime)s %(levelname)s %(name)s %(process)d: %(message)s', None))
logger.addHandler(custom_handler)
logger.setLevel(logging.WARNING)

Also keep in mind this suppose the lib does not re-declare the logger on the way.

Dynamic code edit

If you do not have the possibility to modify the cron call, you might be able to do dynamic code edit but that is equivalent to editing the file by hand (hacky):

  • get the third party file containing the logger configuration
  • modify that and save back the modified version
  • the cron job starts the task using third party code
  • after execution of the cron job, you restore the file to their original state.
Profusive answered 22/2, 2017 at 8:27 Comment(7)
I would like to avoid to write wrapper.Craving
Normally your wrapper is as simple as the code above where you import "*" form the third party lib. But you could also include it in your own python code (you would not need a wrapper then). But I think we are missing some element in your description: Are you trying to modify the module logging after this is started via the cron job?Profusive
you said "Are you trying to modify the module logging after this is started via the cron job?". What you said is a "strategy". Up to now I have only a goal, and I don't care which strategy gets used to get to the goal. My goal: I don't want to write a wrapper, and I want to configure the logging of code which I won't modify.Craving
2 questions still: - Although this should not matter, I would like to know which third party lib are we talking about? - Is it ok to change the call command from your cron table?Profusive
why do you want to which third party script I am talking about? For me it does not matter at all. Call it "say_om", "have_fun" or "some_fundamentals_are_not_solved_yet".Craving
No indeed it should not matter but your question is twofold: logging configuration and how to apply it. It is tightly linked to your context thus knowing what lib you use would clarify a bit more how that lib is running / triggered / what access we can expect or not... If you do not want to disclose that info, that is fine but you need to provide some details for instance: Is the lib already running at the time you want to configure the logging? Or do you have control over the cron trigger (meaning you could do some things before the lib get started)?Profusive
The question contains "The script uses this pattern: ..." I don't think more is needed. The script and all libraries just use the two line pattern from the question. Please tell me what more information you need. What do you mean with "is the lib already running"? I am confused. I use the term "running" for servers, not for libraries. You said "meaning you could do some things before the lib get started". Yes, this would be a solution. But writing a wrapper is no solution for me.Craving
C
5

tl;dr

In a nutshell, what we want to do is to inject code that gets executed by the python interpreter before our main code gets executed.

The best way to achieve this is to create a virtualenv and add sitecustomize.py in the virtualenv's site-packages.

Demonstration

Let's assume that the application we want to run is called my_app.py and that its logger has the same name.

$ cat my_app.py

import logging
logger = logging.getLogger("my_app")

logger.debug("A debug message")
logger.info("An info message")
logger.warning("A warning message")
logger.error("An error message")

Running my_app.py should only show the messages whose severity is > WARNING (which is the default behavior in python's logging).

$ python my_app.py

A warning message
An error message

Now let's create a virtual environment

python3 -m venv my_venv

And let's add sitecustomize.py to the virtualenv's site packages.

$ cat my_venv/lib/python3.7/site-packages/sitecustomize.py

import logging

# Setup logging for my_app
# We will only setup a console handler
logger = logging.getLogger("my_app")
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setFormatter(
    logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
)
logger.addHandler(ch)

Now let's try to run my_app.py using the virtualenv:

$ ./my_venv/bin/python my_app.py

2019-01-25 16:03:16,815 - my_app - DEBUG - A debug message
2019-01-25 16:03:16,815 - my_app - INFO - An info message
2019-01-25 16:03:16,815 - my_app - WARNING - A warning message
2019-01-25 16:03:16,815 - my_app - ERROR - An error message

And that was all) We got proper logging without having to modify my_app.py or write a wrapper!

Now if you want to know why this is the optimal approach, keep on reading.

(Really) long answer

Before understanding why using a virtualenv + sitecustomize.py is the correct approach to this problem, we need to make a not so short introduction.

Note: I am going to assume that you create virtual environments using the venv module which uses stdlib's site.py. The virtuaelnv library, uses its own site.py and might be doing things slightly differently. Nevertheless, after reading this answer you should be able to examine if there are any differences between venv and vitualenv and understand how to handle them.

What are site-packages

The short answer is that site-packages is the place where python installs 3rd party code (as in non-stdlib code). For more info read this and the provided links.

How to inject code?

Python does allow you to customize the python interpreter while it starts up, i.e. before our main code/script/whatever gets executed. This can be useful for e.g.:

  • code-coverage
  • profilers
  • and to generally inject code, as we need to do for this question.

The way you achieve the injection is by creating/modyfing either sitecustomize.py or usercustomize.py. You can also use a "path configuration file" (i.e. *.pth) file with an import statement but I will not cover this case here since:

  • These import statements feel like a terrible hack
  • I am not convinced that it offers any real advantage compared to sitecustomize/usercustomize.
  • I would like to keep things simple

Anyway, if you need more info WRT to path configuration files you can check PyMOTW and if you want an example of using them with import statement check this blog post.

sitecustomize & usercustomize

So, sitecustomize and usercustomize are special files that don't exist by default, but if we create them python will automatically import them before it starts to execute our code. We can create these files:

  • either on the global site-packages (e.g. /usr/lib/python3.7/site-packages/)
  • or on the user site-packages (e.g. ~/.local/lib/python3.7/site-packages/)

sitecustomize is always imported before usercustomize. If either file is missing, the ImportError is silently ignored.

As a security precaution, if there is a mismatch between user or group id and the effective id, then user site-packages are disabled (source). Moreover, the python interpreter has CLI arguments that either completely disable site-packages (both system and user ones) or disable user site-packages. Assuming that we don't have an id mismatch and that we don't use any CLI flags, then user site-packages have higher priority than system site-packages. So if we have both:

~/.local/lib/python3.7/site-packages/sitecustomize.py
/usr/lib/python3.7/site-packages/sitecustomize.py

the first one is the one that will be imported. We can actually check the sys.path priority by executing the site.py module:

$ python3 -msite

sys.path = [
    '/tmp/test',
    '/usr/lib/python37.zip',
    '/usr/lib/python3.7',
    '/usr/lib/python3.7/lib-dynload',
    '/home/username/.local/lib/python3.7/site-packages',      # user site-packages
    '/usr/lib/python3.7/site-packages',                       # system site-packages
]
USER_BASE: '/home/username/.local' (exists)
USER_SITE: '/home/username/.local/lib/python3.7/site-packages' (exists)
ENABLE_USER_SITE: True

The important information here is the value of ENABLE_USER_SITE. If it is True then user site-packages are enabled. If it False then we can only use global site-packages. E.g. if we use python -s:

$ python3 -s -msite

sys.path = [
    '/tmp/test',
    '/usr/lib/python37.zip',
    '/usr/lib/python3.7',
    '/usr/lib/python3.7/lib-dynload',
    '/usr/lib/python3.7/site-packages',
]
USER_BASE: '/home/username/.local' (exists)
USER_SITE: '/home/username/.local/lib/python3.7/site-packages' (exists)
ENABLE_USER_SITE: False

Note that in this case ENABLE_USER_SITE is False.

Just for completeness, let's completely disable the site-packages:

$ python3 -S -msite

sys.path = [
    '/tmp/test',
    '/usr/lib/python37.zip',
    '/usr/lib/python3.7',
    '/usr/lib/python3.7/lib-dynload',
]
USER_BASE: '/home/username/.local' (exists)
USER_SITE: '/home/username/.local/lib/python3.7/site-packages' (exists)
ENABLE_USER_SITE: None

Experiment

In order to better understand this, let's do an experiment. First let's create usercustomize sitecustomize modules in both system and user site-packages.

WARNING: We will be creating files in system site-packages. This will be interfering with your python distribution. Be careful and REMEMBER to remove them when we are done.

# system site packages
echo 'print(f"-> {__file__}")' | sudo tee /usr/lib/python3.7/site-packages/usercustomize.py
echo 'print(f"-> {__file__}")' | sudo tee /usr/lib/python3.7/site-packages/sitecustomize.py

# user site packages
echo 'print(f"-> {__file__}")' | tee ~/.local/lib/python3.7/site-packages/usercustomize.py
echo 'print(f"-> {__file__}")' | tee ~/.local/lib/python3.7/site-packages/sitecustomize.py

Let's also create a python module:

echo 'print("Inside foo")' | tee foo.py

Now let's execute foo.py:

$ python3 foo.py

-> /home/username/.local/lib/python3.7/site-packages/sitecustomize.py
-> /home/username/.local/lib/python3.7/site-packages/usercustomize.py
Inside foo

As we can see:

  • both sitecustomize and usercustomize are being imported
  • they are being imported from user site-packages

Now what will happen if we disable user site-packages?

$ python3 -s foo.py

-> /usr/lib/python3.7/site-packages/sitecustomize.py
Inside foo

This time we see that we:

  • only sitecustomize is being imported. Even though usercustomize exists in system site-packages python does not import it! This is important! Keep it mind for when we discuss virtualenvs! (TIP: this has to do with ENABLE_USER_SITE; do you remember what value does it have in this case?)
  • sitecustomize is being imported from system site-packages

Finally, if we completely disable site-packages, obviously usercustomize and sitecustomize will be ignored:

$ python3 -S foo.py

Inside foo

What about virtualenvs?

OK, now let's bring virtualenv's into the game too. There are two types of virtualenvs:

  • the normal ones
  • those created with --system-site-packages.

Let's create virtualenvs of both types

python3 -mvenv venv_no_system
python3 -mvenv --system-site-packages venv_system 

Let's also create sitecustomize.py and usercustomize.py modules into the virtualenv's site-packages:

echo 'print(f"-> {__file__}")' | tee ./venv_no_system/lib/python3.7/site-packages/usercustomize.py
echo 'print(f"-> {__file__}")' | tee ./venv_no_system/lib/python3.7/site-packages/sitecustomize.py

echo 'print(f"-> {__file__}")' | tee ./venv_system/lib/python3.7/site-packages/usercustomize.py
echo 'print(f"-> {__file__}")' | tee ./venv_system/lib/python3.7/site-packages/sitecustomize.py

and let's see the differences:

$ ./venv_no_system/bin/python -msite

/tmp/test/venv_no_system/lib/python3.7/site-packages/sitecustomize.py
sys.path = [
    '/tmp/test',
    '/usr/lib/python37.zip',
    '/usr/lib/python3.7',
    '/usr/lib/python3.7/lib-dynload',
    '/tmp/test/venv_no_system/lib/python3.7/site-packages',
]
USER_BASE: '/home/username/.local' (exists)
USER_SITE: '/home/username/.local/lib/python3.7/site-packages' (exists)
ENABLE_USER_SITE: False

What do we see here? That on normal virtualenv's ENABLE_USER_SITE is False, which means that:

  1. user site-packages are being ignored
  2. only sitecustomize is being imported! I.e. we can't use usercustomize to inject code!!!

We can also see that instead of our global site-packages, the virtualenv is using its own one (i.e. /tmp/test/venv_no_system/lib/python3.7/site-packages).

Now let's repeat this but this time with the virtualenv that uses the system site-packages:

$ ./venv_system/bin/python -msite

-> /home/username/.local/lib/python3.7/site-packages/sitecustomize.py
-> /home/username/.local/lib/python3.7/site-packages/usercustomize.py
sys.path = [
    '/tmp/test',
    '/usr/lib/python37.zip',
    '/usr/lib/python3.7',
    '/usr/lib/python3.7/lib-dynload',
    '/tmp/test/venv_system/lib/python3.7/site-packages',
    '/home/username/.local/lib/python3.7/site-packages',
    '/usr/lib/python3.7/site-packages',
]
USER_BASE: '/home/username/.local' (exists)
USER_SITE: '/home/username/.local/lib/python3.7/site-packages' (exists)
ENABLE_USER_SITE: True

In this case, the behavior is different...

ENABLE_USER_SITE is True which means:

  1. that user site-packages are being enabled
  2. that usercustomize is being imported normally.

But there is also one more difference. In this case we have 3 site-packages directories. The virtualenv one is the one with the higher priority followed by the user site-packages while the system site-packages is the last one.

So what to use?

I think that there are three options:

  1. use the system python installation
  2. use a normal virtualenv
  3. use a virtualenv with system site-packages

I think that in the majority of the cases, e.g. on normal servers/desktops, modifying the system python installation should generally be avoided. At least on *nix, too many things depend on Python. I would be really reluctant to change its behavior. Possible exceptions are ephemeral or static "systems" (e.g. inside a container).

As far as virtual environments go, unless we know we are going to need system site-packages, I think it makes sense to stick with the current practise and use normal ones. If we adhere to this, then in order to inject code before our script gets executed we only have one option:

To add a sitecustomize.py in the virtuaelenv's site packages.

Cleaning up

# remove the virtualenvs
rm -rf my_venv
rm -rf venv_system
rm -rf venv_no_system

# remove our usercustomize.py and sitecustomize.py
sudo rm /usr/lib/python3.7/site-packages/sitecustomize.py
sudo rm /usr/lib/python3.7/site-packages/usercustomize.py
rm ~/.local/lib/python3.7/site-packages/sitecustomize.py
rm ~/.local/lib/python3.7/site-packages/usercustomize.py

# remove the modules
rm foo.py
rm my_app.py
Caliber answered 25/1, 2019 at 15:1 Comment(6)
I am writing a 3rd-party package tool where I need to add a module (*customize.py) to alter Python's builtin behavior and I was thinking that installing it as a user-site package would be safer than a system-site package, even if the tool has to create the user dir itself (ENABLE_USER_SITE: False). That was prolly not gonna go well... Thx! for heads-up.Tainataint
@Tainataint I guess that when you create a virtual environment the user site-packages don't make much sense. The virtual environment is an isolated python distribution. You can do whatever you want in that distribution so why not modify the sitecustomize.py? Moreover by ignoring user-site-packages you help in keeping the virtualenv immutable.Caliber
That totally makes sense to me and a wise, defacto position to take... That in my particular case, I would prefer to have lazy and only added it once in my user-site is something that would, in all likelihood... bit me. But, I sure wish it was obvious somewhere that I have been unable to find (thank gawd I at least found your answer here!) documented.Tainataint
.pth files have the advantage that multiple projects can install their own such files and so minimise clashing. Like multiple projects registering custom code transformation hooks.Frydman
@MartijnPieters thank you, this is really interesting. I agree that *.pth files do indeed make a lot of sense for libraries. Nevertheless, when you as the "end-user" want to control what happens before your code gets imported, then it feels likesitecustomize.py is perhaps a better option. If nothing else, why create two files instead of one (*.pth + the module that get's imported)? And if you have several such changes that you need to make why not keep them in one place instead of searching for *.pth files?Caliber
Sorry, I should have been less terse there; I was pointing out a specific use case for .pth files, why you'd want to use them instead of sitecustomize.py. You stated in your answer you didn't see any reason why you'd use them, that's where they'd be helpful. Also, gkedge commented on an answer of mine, which is what drew me here, and their usecase may well fit pth files better.Frydman
S
2

You can change the minimum log level of that logger.

logging.getLogger(__name__).setLevel(logging.WARNING)

Now, only WARNING and above will be shown. No INFO and no DEBUG.

Addionally, you could also change the format. %(process)d being the PID.

log_format = logging.Formatter('%(asctime)s %(levelname)s %(name)s %(process)d: %(message)s', '%H:%M:%S')
logging.getLogger(__name__).setFormatter(log_format)

All together:

log_format = logging.Formatter('%(asctime)s %(levelname)s %(name)s %(process)d: %(message)s', '%H:%M:%S')
log_handle = logging.getLogger(__name__)
log_handle.setLevel(logging.WARNING)
log_handle.setFormatter(log_format)

Note: You should replace __name__ in your code with the relevant log handler.

Shout answered 30/4, 2015 at 8:22 Comment(7)
How can I execute logging.Formatter() without modifying or wrapping the third-party code?Craving
Do it from your script. The one that calls the other.Shout
The third party script gets called via cron.Craving
There is no way to modify the logging behavior of a script without being able to access it. you must wrap/embed the "third-party-script" with your own code. You can have the cron activate your own script which wraps the third-party one.Shout
is there no way to hook into python, to configure logging before the first line of the third party python script gets called?Craving
You need to hook into the main Python process that is running the third-party script in order to access its logging module.Shout
you said "you must wrap/embed the third-party-script". Yes, you can wrap it. But I don't think the word "must" fits here. I found a solution. We use usercustomize.py. See my answer below.Craving

© 2022 - 2024 — McMap. All rights reserved.