How to check whether a file imports from another file in Python

Suppose I have a project structure like this:

src
└── app
    ├── main.py
    ├── db
    │   └── database.py
    ├── models
    │   ├── model_a.py
    │   └── model_b.py
    └── tests
        ├── test_x.py
        └── test_y.py

I want to check which files use a class or a function from another file. I have a class called Test in main.py:

class Test:
    pass

I used that class in model_a:

from ..main import Test

But in model_b I used:

from ..main import Test
from ..db.database import Data

I want to check which file uses another file, similar to the output of the tree command; just a folder name is enough. I tried an old method, but it was inefficient and messy, and it did not give me what I expected. I created a file named check.py in src and imported all the packages:

from app.db import database
from app.models import model_a, model_b
from app.tests import test_x, test_y
from app import main 

print('__file__={0:<35} | __name__={1:<20} | __package__={2:<20}'.format(__file__,__name__,str(__package__)))

And I added this line at the bottom of every file:

print('__file__={0:<35} | __name__={1:<20} | __package__={2:<20}'.format(__file__,__name__,str(__package__)))

So when I run check.py I get this result:

__file__=/home/yagiz/Desktop/struct/src/app/main.py | __name__=app.main             | __package__=app                 
__file__=/home/yagiz/Desktop/struct/src/app/db/database.py | __name__=app.db.database      | __package__=app.db              
__file__=/home/yagiz/Desktop/struct/src/app/models/model_a.py | __name__=app.models.model_a   | __package__=app.models          
__file__=/home/yagiz/Desktop/struct/src/app/models/model_b.py | __name__=app.models.model_b   | __package__=app.models          
__file__=/home/yagiz/Desktop/struct/src/app/tests/test_x.py | __name__=app.tests.test_x     | __package__=app.tests           
__file__=/home/yagiz/Desktop/struct/src/app/tests/test_y.py | __name__=app.tests.test_y     | __package__=app.tests           
__file__=/home/yagiz/Desktop/struct/src/check.py | __name__=__main__             | __package__=None   

The result is messy and doesn't meet my expectations. Is there a way to get output like this?

main.py = app/models/model_a, app/models/model_b   # These files import something from main.py
model_b.py = None                                  # No file imports from model_b.py

Update: I tried @Hessam Korki's suggestion, but it doesn't work.

I looked at the source code of modulefinder and found that it adds a badmodule for every import statement, which is not useful for me.

Here is how it went. First I created a function, and I also created another project structure:

src
├── empty.py
├── __init__.py
├── main.py
├── module_finder.py
├── other
│   └── other.py
├── test
│   └── some_app.py
└── this_imports.py

Here is module_finder.py, which contains my function:

from modulefinder import ModuleFinder
file_names = ["this_imports.py", "main.py", "test/some_app.py", "other/other.py", "empty.py"]

def check_imports(file_names):
    finder = ModuleFinder()
    for file in file_names:
        finder.run_script(file)
        print("\n Filename: ", file)
        for name, mod in finder.modules.items():
            print('%s: ' % name, end='')
            print(','.join(list(mod.globalnames.keys())[:3]))

            print('\n'.join(finder.badmodules.keys()))

empty.py is empty (as expected). In main.py I have:

class Test:
    pass

In this_imports.py I only have:

from src.main import Test

In other/other.py I have:

from src.main import Test 
from src.test import DifferentTest

And finally, in test/some_app.py I have:

from src.main import Test

class DifferentTest:
    pass

So the result should be:

empty.py = None
main.py = None
other/other.py = src.main , src.test
test/some_app.py = src.main
this_imports.py = src.main 

But the function gives the wrong result. Here is the output:

 Filename:  this_imports.py
__main__: Test
src.main

 Filename:  main.py
__main__: Test,__module__,__qualname__
src.main

 Filename:  test/some_app.py
__main__: Test,__module__,__qualname__
src.main

 Filename:  other/other.py
__main__: Test,__module__,__qualname__
src.main
src.test

 Filename:  empty.py
__main__: Test,__module__,__qualname__
src.main
src.test
Fief answered 4/8, 2020 at 13:50 Comment(2)
Why not use a simple grep with regex? grep -rnw . -e "from\s\.*main\simport\sTest"Tying
@RamonMoraes I have no problem finding import statements. My question is how to tell which file imports anything from another file.Fief

What you are looking for is to find import dependencies in your package modules. You can run a static analysis on your package directory and parse the import nodes in the syntax trees (ast), and build a dependency graph. Something like below:

import os
from ast import NodeVisitor, parse
import networkx as nx

class Dependency():
    def __init__(self, root):
        self.root = os.path.abspath(root)
        self.base = os.path.basename(self.root)
        self.dependency = nx.DiGraph()
        self.visitor = NodeVisitor()
        self.visitor.visit_ImportFrom = self.visit_ImportFrom

        self.current_node = None
        # call add_node; assigning to it would overwrite the method
        self.dependency.add_node(self.base)

    def visit_ImportFrom(self, node):
        # node.module is None for "from . import x"; relative imports are not resolved here
        if node.module is not None:
            # edge direction: imported module -> importing module
            self.dependency.add_edge(node.module, self.current_node)
        self.visitor.generic_visit(node)

    def run(self):
        # assumes every file under root is a .py source file
        for root, dirs, files in os.walk(self.root):
            for file in files:
                full_path = os.path.join(root, file)
                # dotted module name relative to the package root, without the extension
                loc = os.path.splitext(os.path.relpath(full_path, self.root))[0].replace(os.sep, '.')
                self.current_node = self.base + '.' + loc
                with open(full_path) as fp:
                    src = fp.read()
                tree = parse(src)
                self.visitor.generic_visit(tree)

        dependency = {}
        # iterate over every edge; nx.dfs_edges only yields DFS tree edges and can skip some
        for src, target in self.dependency.edges():
            dependency.setdefault(src, set()).add(target)

        return dependency

To map the import dependencies of a package, pass its root location:

root = "path/to/your/src"
d = Dependency(root)
d.run()

This returns the dependency tree as a dict: each key is an imported module and each value is the set of modules that import from it. Note that only ImportFrom nodes are parsed; you need to handle Import as well to make it complete. Also, all imports are assumed to be absolute here (i.e. no .. etc.). If required, you can support relative imports too (check the level field of the ImportFrom node).
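The two gaps noted above (plain Import nodes and relative imports) could be covered roughly as follows. This is only a sketch: the function name imported_modules and its parameters are hypothetical, and it assumes you already know the dotted name of the module being parsed.

```python
import ast


def imported_modules(source, current_module, is_package=False):
    """Return the set of absolute module names imported by `source`.

    `current_module` is the dotted name of the module being parsed,
    e.g. "app.models.model_b" (hypothetical helper, not part of the answer).
    """
    # Relative imports are resolved against the containing package.
    package_parts = current_module.split(".")
    if not is_package:
        package_parts = package_parts[:-1]

    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b, c" gives one alias per imported name
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level == 0:
                found.add(node.module)
                continue
            # level 1 is ".", level 2 is "..", and so on
            keep = len(package_parts) - (node.level - 1)
            if keep < 1:
                continue  # would reach beyond the top-level package
            parts = package_parts[:keep]
            if node.module:
                parts = parts + node.module.split(".")
            found.add(".".join(parts))
    return found
```

For model_b from the question, imported_modules(source, "app.models.model_b") would turn from ..main import Test into app.main. Filtering the result down to names that exist in your own project is left out here.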

Uremia answered 14/8, 2020 at 17:20 Comment(7)
First of all, thanks. Secondly, it throws an error on fp.read() in line 29. I tried opening the file with rb and with an explicit utf-8 encoding, but neither worked. I would be grateful if you could take a look.Fief
Also, I changed the import statements to use absolute imports (src.main instead of ..main, etc.).Fief
I ran this on Python 3.7. What error do you get here? Btw, it is reading a Python source file, so it should not be read as binary, I suppose.Uremia
I'm getting UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 16: invalid continuation byteFief
I see, it seems to be an unrelated error. Note the loop in the snippet assumes you have only .py files in your package directory. If that is not true, you can add a check to read only the Python source files. Otherwise, I suggest you try to open and read the file directly (from a different script) to debug the issue.Uremia
I think you have .pyc files in there as well. Add a check such as _, ext = os.path.splitext(file) to get the file extension; if it is not .py, simply continue in the file read loop.Uremia
It worked like a charm when I deleted the __pycache__ directory, thank you!Fief
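The extension check suggested in the comments can be kept in a small generator so that __pycache__ does not have to be deleted by hand. A minimal sketch, assuming a hypothetical helper name iter_python_files:

```python
import os


def iter_python_files(root):
    """Yield paths of .py files under `root`, skipping __pycache__ directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for filename in sorted(filenames):
            _, ext = os.path.splitext(filename)
            if ext == ".py":
                yield os.path.join(dirpath, filename)
```

The file loop in run() could then iterate over iter_python_files(self.root) instead of walking the tree itself.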

I believe Python's modulefinder.ModuleFinder will solve your problem. After run_script(), the finder's modules dictionary (iterate it with .items()) contains a key named '__main__' for the analyzed file, next to an entry for each module it managed to import. After running each file of your project through it and storing the data in a way that suits your purpose, you should be good to go.
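One way to use ModuleFinder per file is sketched below. The helper name imports_of is hypothetical, and the sketch assumes the project root is passed as the search path so that project imports resolve. A fresh ModuleFinder is created for each script, because a reused instance keeps the modules and badmodules collected from earlier scripts.

```python
from modulefinder import ModuleFinder


def imports_of(script_path, search_path):
    """Return (resolved, unresolved) module names for a single script.

    `search_path` is a list of directories to resolve imports against,
    for example the project root (an assumption of this sketch).
    """
    finder = ModuleFinder(path=search_path)  # fresh instance per script
    finder.run_script(script_path)
    # '__main__' stands for the script itself, so drop it from the result.
    resolved = set(finder.modules) - {"__main__"}
    unresolved = set(finder.badmodules)
    return resolved, unresolved
```

Note that ModuleFinder follows imports transitively, so resolved can also contain modules that the script only reaches through other modules.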

Crowe answered 8/8, 2020 at 0:44 Comment(11)
Thanks for your answer. I tried your approach and it gave a wrong result. I updated the question, please check it out.Fief
I checked your exact example on my machine. The reason you're getting wrong results is import errors; try running 'this_imports.py' directly. That can be solved by adding the 'src' path to your sys.path. However, you can already find the modules with import errors in the finder.badmodules dictionary (check the last line of each file in the module report you printed).Crowe
But why am I getting two different modules in badmodules even for an empty file (see the result of empty.py)? Also, when I used from main instead of from src.main, the badmodules dict is empty, so it doesn't return any imported modules even when the import succeeds.Fief
The reason “empty.py” returns badmodules has something to do with the content of your “__init__.py”Crowe
When run_script() processes your modules cleanly, the finder.modules dictionary contains a key-value pair for each module that has been imported into that file. The key is the name of the imported module and the value is a Module object that carries the path of that module (its __file__ attribute), which I believe is the thing you're looking for. Iterate through the dictionary and save the path of each one in a way that you are comfortable with.Crowe
Just asking for clarification: how does __init__.py affect the other files, since it's empty? Also, I looked at the source code of modulefinder and found that _add_badmodule has an if and an else branch, and it adds a badmodule in either case when a file is imported. Check it out.Fief
@YagizcanDegirmenci once a module is in the system you will see it when analyzing the next file, even if it's not imported by that file. Do you need this for an external check, or do you need it to run within the project? Why not just grep for imports? And if you need it in Python, do an os.walk first to get the list of all packages below src. Ping me if you want me to implement it in Python or bash.Toxicosis
@Toxicosis I need to check this externally. Q: Why not just grep for imports? It's going to be part of a library, so it needs to be OS independent (maybe subprocess can help). Also, I think I didn't explain the question well: I only need file names and which file imports anything from another file. I only need the imports to tell whether a file uses a user-created function or class; I don't need all the other imports from the standard library, installed packages, etc.Fief
@YagizcanDegirmenci I understand, but if you store all filenames in a list then you only check for those imports. Also, I am not sure, but I believe that os.walk is independent of the operating system.Toxicosis
Oh, that's a great idea, but as I remember os.walk() is slow, and this library is going to be used on deeply nested projects to understand how a project works, kind of like getting the project hierarchy. os.walk() calls os.listdir() on each directory, and it executes the stat() system call or GetFileAttributes() on each file to determine whether the entry is a directory or not. Maybe I can go for a workaround with some bash command like find . -name "*.py" -exec grep import {} \;Fief
Check how slow os.walk is with time. I don't think it's going to be noticeably slower than any other method unless you keep millions of files in the path of a project, which is not advised. Give it a try: os.walk and then re.search for "^import" or "^from\ [A-Z,a-z]*\ import", etc.Toxicosis
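The os.walk plus regex idea from these comments could look roughly like the sketch below. The function name importers_by_module and the pattern are hypothetical. It assumes imports are written as absolute dotted names relative to the scanned root and that source files are UTF-8. The pattern only catches the first name in import a, b and ignores relative imports.

```python
import os
import re

# "import a.b" or "from a.b import c" at the start of a line (absolute imports only)
IMPORT_RE = re.compile(
    r"^(?:from\s+([A-Za-z_][\w.]*)\s+import\b|import\s+([A-Za-z_][\w.]*))",
    re.MULTILINE,
)


def importers_by_module(root):
    """Map each project module to the sorted list of project files importing it."""
    root = os.path.abspath(root)
    modules = {}  # dotted module name -> file path relative to root
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != "__pycache__"]
        for filename in filenames:
            if filename.endswith(".py"):
                rel = os.path.relpath(os.path.join(dirpath, filename), root)
                modules[rel[:-3].replace(os.sep, ".")] = rel

    importers = {name: [] for name in modules}
    for rel in modules.values():
        with open(os.path.join(root, rel), encoding="utf-8") as fp:
            source = fp.read()
        for match in IMPORT_RE.finditer(source):
            imported = match.group(1) or match.group(2)
            # Only names that belong to the project are recorded.
            if imported in importers and rel not in importers[imported]:
                importers[imported].append(rel)
    return {name: sorted(files) for name, files in importers.items()}
```

Modules that nothing imports map to an empty list, which corresponds to the None lines in the output format asked for in the question. Because only names found under root are recorded, standard library and third-party imports are ignored.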