How to modify imported source code on-the-fly?
T

6

25

Suppose I have a module file like this:

# my_module.py
print("hello")

Then I have a simple script:

# my_script.py
import my_module

This will print "hello".

Let's say I want to "override" the print() function so it returns "world" instead. How could I do this programmatically (without manually modifying my_module.py)?


What I thought is that I need to somehow modify the source code of my_module before or while importing it. Obviously, I cannot do this after importing it, so solutions using unittest.mock are impossible.

I also thought I could read the file my_module.py, perform modification, then load it. But this is ugly, as it will not work if the module is located somewhere else.

The good solution, I think, is to make use of importlib.

I read the docs and found a very interesting method: get_source(fullname). I thought I could just override it:

def get_source(fullname):
    source = super().get_source(fullname)
    source = source.replace("hello", "world")
    return source

Unfortunately, I am a bit lost with all these abstract classes and I do not know how to perform this properly.

I tried this, in vain:

spec = importlib.util.find_spec("my_module")
spec.loader.get_source = mocked_get_source
module = importlib.util.module_from_spec(spec)

Any help would be welcome, please.

Throaty answered 25/1, 2017 at 17:42 Comment(5)
my_module does not define print() which is a built-in function in Python 3.x.Burier
@Burier I do not understand what your point is. I use Python 3 so there is no problem using print() without defining it.Throaty
You said you wanted to override the print() function, and I was just pointing out that it's not defined in the module you're importing.Burier
@Burier I see, thank you, indeed I cannot properly "override" the print function, I should rather say that I want to monkey-patch it.Throaty
Also note that doing it for print() might be different than just a general function because it's a built-in.Burier
O
25

Here's a solution based on the content of this great talk. It allows any arbitrary modifications to be made to the source before importing the specified module. It should be reasonably correct as long as the slides did not omit anything important. This will only work on Python 3.5+.

import importlib.util
import sys

def modify_and_import(module_name, package, modification_func):
    # Locate the module without executing it
    spec = importlib.util.find_spec(module_name, package)
    source = spec.loader.get_source(module_name)
    new_source = modification_func(source)
    # Create an empty module object and run the modified source in it
    module = importlib.util.module_from_spec(spec)
    codeobj = compile(new_source, module.__spec__.origin, 'exec')
    exec(codeobj, module.__dict__)
    # Register the result so later imports reuse the patched module
    sys.modules[module_name] = module
    return module

So, using this you can do

my_module = modify_and_import("my_module", None, lambda src: src.replace("hello", "world"))
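
To check that the patched module really is what later imports see, here is a minimal end-to-end sketch. The module name demo_greeter and its GREETING variable are hypothetical, created in a temporary directory purely for illustration; the function is the same one as above, repeated so the snippet runs on its own.

```python
import importlib.util
import os
import sys
import tempfile


def modify_and_import(module_name, package, modification_func):
    spec = importlib.util.find_spec(module_name, package)
    source = spec.loader.get_source(module_name)
    new_source = modification_func(source)
    module = importlib.util.module_from_spec(spec)
    codeobj = compile(new_source, module.__spec__.origin, 'exec')
    exec(codeobj, module.__dict__)
    sys.modules[module_name] = module
    return module


# Hypothetical demo module written to a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "demo_greeter.py"), "w") as f:
    f.write('GREETING = "hello"\n')
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()

patched = modify_and_import(
    "demo_greeter", None, lambda src: src.replace("hello", "world")
)

# A later plain import finds the entry in sys.modules and does not
# execute the original file again.
import demo_greeter

print(patched.GREETING)         # world
print(demo_greeter is patched)  # True
```

Note that this only holds as long as nothing imported the module before the call; an earlier import would already have executed the unmodified source.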
Overcasting answered 25/1, 2017 at 23:43 Comment(5)
Thank you for taking the time to help me! Your solution is probably the best way to go.Throaty
For Python 3 (I thought Python 2 as well), you need to get rid of the =None part of package=None. Otherwise, you will get SyntaxError: non-default argument follows default argumentMichele
There's also a video of David Beazley's Modules and Packages presentation on youtube.Burier
I have tested this solution using python 3.6. new_source contains the code of the modified module; however, the returned module contains the original code. Any idea on how to make it work?Misshapen
This ModulePackage.pdf is very informative and helpful, thanks. BTW, according to the documentation of importlib.util.find_spec, "If name is for a submodule (contains a dot), the parent module is automatically imported", which is probably not what we want. If module_name='a.b', then we should change the second-to-last line of the function to sys.modules['a'].b = module; sys.modules['a.b'] = module.Flybynight
B
5

This doesn't answer the general question of dynamically modifying the source code of an imported module, but "overriding" or "monkey-patching" its use of the print() function can be done (since it's a built-in function in Python 3.x). Here's how:

#!/usr/bin/env python3
# my_script.py

import builtins

_print = builtins.print

def my_print(*args, **kwargs):
    _print('In my_print: ', end='')
    return _print(*args, **kwargs)

builtins.print = my_print

import my_module  # -> In my_print: hello
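
A variation on the same idea, as a sketch rather than a definitive recipe: unittest.mock.patch can replace builtins.print only for the duration of the import and restore it afterwards. The module demo_hello_mod and the helper fake_print are hypothetical stand-ins for illustration.

```python
import importlib
import os
import sys
import tempfile
from unittest import mock

# Hypothetical stand-in for my_module.py, written to a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "demo_hello_mod.py"), "w") as f:
    f.write('print("hello")\n')
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()

captured = []


def fake_print(*args, **kwargs):
    # Record the call instead of writing to stdout.
    captured.append(args)


# The patch is active only while the module body runs.
with mock.patch("builtins.print", fake_print):
    demo_hello_mod = importlib.import_module("demo_hello_mod")

print(captured)  # [('hello',)]
```

Because the patch is applied before the import statement runs, it also works for code executed at import time, as long as the module has not been imported earlier.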
Burier answered 26/1, 2017 at 0:32 Comment(0)
T
5

I first needed to better understand the import operation. Fortunately, this is well explained in the importlib documentation, and digging through the source code helped too.

The import process is actually split into two parts. First, a finder is in charge of parsing the module name (including dot-separated packages) and instantiating an appropriate loader. Built-in modules, for example, are not imported the same way as local modules. Then, the loader is called based on what the finder returned. This loader gets the source from a file or from a cache, and executes the code if the module was not previously loaded.
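
The finder/loader split can be observed directly with importlib.util.find_spec(), which runs the finder step and reports which loader would be used. This is only an illustrative sketch; the exact loader shown for json depends on how the interpreter is installed.

```python
import importlib.util

# find_spec() returns a ModuleSpec describing where a module comes from
# and which loader is responsible for it.
specs = {name: importlib.util.find_spec(name) for name in ("json", "sys")}

for name, spec in specs.items():
    print(name, "->", spec.loader, "| origin:", spec.origin)
```

On a typical CPython installation, json is handled by a source file loader with a file path as its origin, while sys reports the built-in importer.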

This is very simple. This explains why I actually did not need to use abstract classes from importlib.abc: I do not want to provide my own import process. Instead, I could create a subclass of one of the classes from importlib.machinery and override get_source() from SourceFileLoader, for example. However, this is not the way to go because the loader is instantiated by the finder, so I have no control over which class is used. I cannot specify that my subclass should be used.

So, the best solution is to let the finder do its job, and then replace the get_source() method of whatever Loader has been instantiated.

Unfortunately, by looking through the source code I saw that the basic loaders do not use get_source() (which is only used by the inspect module). So my whole idea could not work.

In the end, I guess get_source() should be called manually, then the returned source should be modified, and finally the code should be executed. This is what Martin Valgur detailed in his answer.
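
For completeness, here is a sketch of how a loader subclass could still be used if one builds the spec by hand instead of relying on the loader chosen by the finder. RewritingLoader, import_rewritten and the demo module demo_rewrite_target are hypothetical names introduced for illustration. The subclass overrides get_code() rather than get_source(), since get_code() is what exec_module() actually calls.

```python
import importlib.machinery
import importlib.util
import os
import sys
import tempfile


class RewritingLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader that rewrites the source text before compiling."""

    def __init__(self, fullname, path, transform):
        super().__init__(fullname, path)
        self._transform = transform

    def get_code(self, fullname):
        # Always compile from the transformed source, so a cached .pyc of
        # the unmodified module is neither used nor overwritten.
        source = self._transform(self.get_source(fullname))
        return compile(source, self.get_filename(fullname), "exec")


def import_rewritten(module_name, transform):
    # Let the regular finders locate the file, then swap in our loader.
    found = importlib.util.find_spec(module_name)
    loader = RewritingLoader(module_name, found.origin, transform)
    spec = importlib.util.spec_from_file_location(
        module_name, found.origin, loader=loader
    )
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    # Register only after the module body ran successfully.
    sys.modules[module_name] = module
    return module


# Hypothetical demo module written to a temporary directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "demo_rewrite_target.py"), "w") as f:
    f.write('GREETING = "hello"\n')
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()

rewritten = import_rewritten(
    "demo_rewrite_target", lambda src: src.replace("hello", "world")
)
print(rewritten.GREETING)  # world
```

This sketch assumes a plain top-level source module; packages and dotted names would need extra handling.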

If compatibility with Python 2 is needed, I see no other way than reading the source file:

import imp
import sys
import types

module_name = "my_module"

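# For a plain source module, find_module() returns an already-open file object
source_file, pathname, description = imp.find_module(module_name)

try:
    source = source_file.read()
finally:
    source_file.close()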

source = source.replace('hello', 'world')

module = types.ModuleType(module_name)
exec(source, module.__dict__)

sys.modules[module_name] = module
Throaty answered 26/1, 2017 at 10:6 Comment(0)
O
4

If importing the module before patching it is okay, then a possible solution would be

import inspect

import my_module

source = inspect.getsource(my_module)
new_source = source.replace('"hello"', '"world"')
exec(new_source, my_module.__dict__)

If you're after a more general solution, then you can also take a look at the approach I used in another answer a while ago.

Overcasting answered 25/1, 2017 at 17:57 Comment(4)
How could this be useful to me? How would you change the print value using your workaround?Throaty
Sorry, I assumed you wanted a generic method to monkey patch any part of a module. Reading your question again it seems that you wish to avoid importing the module first, in which case I agree, my solution would not be relevant here.Overcasting
I rewrote my answer entirely. Is this useful to you? If not, I'll delete it.Overcasting
Thank you. This is not useful to me but this could help someone else (who would not have the problem of mocking before importing) so please do not delete your answer. ;)Throaty
P
1

My solution updates the source file, which works for the inner-import situation. An inner import means that transformers.models.albert itself imports modeling_albert from the source file. In such a case, even the solution from Martin Valgur won't work, so I update the source file instead. I hope it helps people who have the same trouble as me.

import inspect
from transformers.models.albert import modeling_albert

# Get source
source = inspect.getsource(modeling_albert)
source_before = "AlbertModel(config, add_pooling_layer=False)"
source_after = "AlbertModel(config, add_pooling_layer=True)"
new_source = source.replace(source_before, source_after)

# Update file
file_path = modeling_albert.__spec__.origin
with open(file_path, 'w') as f:
    f.write(new_source)
Pritchard answered 21/2, 2023 at 12:59 Comment(0)
F
0

Not elegant, but works for me (may have to add a path):

with open ('my_module.py') as aFile:
    exec (aFile.read () .replace (<something>, <something else>))
Farmer answered 25/1, 2017 at 17:48 Comment(1)
I specified that I would like to avoid having to specify the module path. Moreover, as you said, exec() is not elegant at all; there should be a better solution.Throaty
