How to refer to relative paths of resources when working with a code repository
W

10

220

We are working with a code repository which is deployed to both Windows and Linux - sometimes in different directories. How should one of the modules inside the project refer to one of the non-Python resources in the project (CSV files, etc.)?

If we do something like:

thefile = open('test.csv')

or:

thefile = open('../somedirectory/test.csv')

It will work only when the script is run from one specific directory, or a subset of the directories.

What I would like to do is something like:

path = getBasePathOfProject() + '/somedirectory/test.csv'
thefile = open(path)

Is it possible?

Waltner answered 13/8, 2009 at 9:22 Comment(0)
I
298

Try to use a filename relative to the current file's path. Example for './my_file':

import os
fn = os.path.join(os.path.dirname(__file__), 'my_file')

In Python 3.4+ you can also use pathlib:

import pathlib
fn = pathlib.Path(__file__).parent / 'my_file'
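
For a resource that lives outside the module's own directory (the layout raised in the comments), one option is to climb from the module file to a shared ancestor. A minimal sketch of that idea; resource_path and levels_up are hypothetical names for illustration, not part of pathlib:

```python
from pathlib import Path


def resource_path(module_file, *parts, levels_up=0):
    """Build a path to a resource, starting from the directory of module_file.

    levels_up=0 means the module's own directory; each extra level
    climbs one directory towards the filesystem root.
    """
    base = Path(module_file).resolve().parent
    for _ in range(levels_up):
        base = base.parent
    return base.joinpath(*parts)


# Inside /Project_Root_dir/python_files_dir/some_subdirs/py_file.py one would call:
#   resource_path(__file__, "resources", "some_subdirs", "resource_file.csv", levels_up=2)
```

Because the starting point is the module file rather than the current working directory, the result should not depend on where the program is launched from.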
Inhibition answered 13/8, 2009 at 9:27 Comment(9)
I think this solution will only work if the resource is in the same directory of the python file, or in a sub directory of it. How do you solve it when you have the following tree structure: /Project_Root_dir /python_files_dir /Some more subdirs here py_file.py /resources /some subdirs here resource_file.csvWaltner
Sorry, the file tree got garbled on that last message... second try: you have your file at /Project_Root_dir/python_files_dir/some_subdirs/py_file.py and you have your resource file at /Project_Root_dir/resources/some_subdirs/resource_file.csvWaltner
You should be able to get to the parent directory using join(foo, '..'). So from /root/python_files/module/myfile, use os.path.join(os.path.dirname(__file__), '..', '..', 'resources')Inhibition
os.pardir is slightly better than '..', though the two are equivalent on both POSIX and Windows.Kanter
@Kanter is it equivalent, or is it better, in the end?Agneta
@cedbeu: It is equivalent on every system I ever came across and I think every system Python runs on today (please correct me if I'm wrong here). However, if you expect Python to be ported to a system using a different path separator in the future and want your code to be ready for it, os.pardir will be more portable. I'd make the case that every programmer, even one who never read any Python, knows the meaning of "..", while "os.pardir" is a level of indirection one would have to look up in the documentation, so personally I'd stick to "..".Inhibition
Will this work on a shared folder? Like if I have files and data on my PC in a shared folder, and then my coworker on the network runs the .py file, I need the code to reference the correct files.Unpolitic
Does this have any security risks if I'm using it to expose files to a user via Flask? (In other words, will the user be able to see my entire directory structure, and if so, is that a bad thing?)Endowment
With the pathlib solution I get AttributeError: 'PosixPath' object has no attribute 'split' Python 3.7.5Anguish
R
52

If you are using setuptools or distribute (a setup.py install), then the "right" way to access these packaged resources seems to be pkg_resources.

In your case the example would be

import pkg_resources
my_data = pkg_resources.resource_string(__name__, "foo.dat")

This reads the resource; my_data then holds its contents as binary data.

If you just need the filename you could also use

resource_filename(package_or_requirement, resource_name)

Example:

pkg_resources.resource_filename("MyPackage", "foo.dat")

The advantage is that it's guaranteed to work even if it is an archive distribution like an egg.

See http://packages.python.org/distribute/pkg_resources.html#resourcemanager-api

Relay answered 7/2, 2012 at 14:24 Comment(3)
I know this is an old answer, my preferred way is(/was maybe?) to use pkg_resources, but with the disappearance of zipped eggs, is there any harm in just using __file__ like the good old days?Workaday
This is a solid approach. Even if the egg convention is going away, setuptools isn't and many are still installing deps against git repos where the egg is built at runtimeScapolite
In Python 3.7+ you should prefer importlib.resources instead. The same stuff, but standard library and better performance.Ivaivah
G
22

In Python, relative paths are resolved against the current working directory, which in most cases is the directory from which you run your program. The current working directory is very likely not the same as the directory of your module file, so a bare relative path that is meant to be relative to your module file is an unreliable choice.

Using an absolute path should be the best solution:

import os
package_dir = os.path.dirname(os.path.abspath(__file__))
thefile = os.path.join(package_dir, 'test.csv')
Galvanotropism answered 25/3, 2017 at 3:4 Comment(0)
K
15

I often use something similar to this:

import os
DATA_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), 'datadir'))

# if you have more paths to set, you might want to shorten this as
here = lambda x: os.path.abspath(os.path.join(os.path.dirname(__file__), x))
DATA_DIR = here('datadir') 

pathjoin = os.path.join
# ...
# later in script
for fn in os.listdir(DATA_DIR):
    f = open(pathjoin(DATA_DIR, fn))
    # ...

The variable

__file__

holds the file name of the script you write that code in, so you can make paths relative to script, but still written with absolute paths. It works quite well for several reasons:

  • the path is absolute at runtime, yet defined relative to the script's location
  • the project can still be deployed in a relative container

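But you need to watch for platform compatibility: the directory separator (os.sep) on Windows differs from the one on UNIX, so join components with os.path.join instead of hard-coding a separator.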
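
To illustrate the separator point: os.path.join inserts the directory separator (os.sep) of the platform it runs on, while os.pathsep is the character that separates entries in PATH-style lists. The sketch below uses the standard ntpath and posixpath modules (the Windows and POSIX flavours of os.path) so both behaviours can be compared on one machine; the directory and file names are only placeholders:

```python
import ntpath      # the Windows flavour of os.path
import posixpath   # the POSIX flavour of os.path

parts = ("somedirectory", "test.csv")

# The same components, joined with each platform's rules.
windows_path = ntpath.join(*parts)
posix_path = posixpath.join(*parts)

print(windows_path)                        # somedirectory\test.csv
print(posix_path)                          # somedirectory/test.csv
print(ntpath.sep, posixpath.sep)           # \ /
print(ntpath.pathsep, posixpath.pathsep)   # ; :
```

In ordinary code you would just call os.path.join, which picks the right flavour for the running platform.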

Kinata answered 13/8, 2009 at 12:11 Comment(0)
G
5
import os
cwd = os.getcwd()
path = os.path.join(cwd, "my_file")
f = open(path)

You can also try to normalize your cwd using os.path.abspath(os.getcwd()). More info here.

Gut answered 13/8, 2009 at 9:40 Comment(3)
very few use-cases where the cwd is the path of a module, thoughAgneta
it doesn't work inside a package, just from the same dir (or the working dir) set by the script.Eggleston
This won't work if user runs program using absolute path from different directory. e.g. python3 /usr/someone/test.pyTungstate
O
2

You can use the built-in __file__ variable. It contains the path of the current file. I would implement getBasePathOfProject in a module in the root of your project. There I would take the directory part of __file__ and return it. This function can then be used everywhere in your project.
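
A rough sketch of what such a root-level module might look like. The module name project_paths and the helper project_path are hypothetical, and the optional anchor_file parameter is added here only so the functions can be exercised without relying on where this snippet itself is saved:

```python
import os


def get_base_path_of_project(anchor_file=None):
    """Return the directory that contains anchor_file.

    With no argument, the anchor is this module itself, so the result is
    the project root as long as this module is saved there.
    """
    if anchor_file is None:
        anchor_file = __file__
    return os.path.dirname(os.path.abspath(anchor_file))


def project_path(*parts, anchor_file=None):
    """Join path components onto the project base directory."""
    return os.path.join(get_base_path_of_project(anchor_file), *parts)


# From any other module (project_paths is a hypothetical module name):
#   from project_paths import project_path
#   thefile = open(project_path("somedirectory", "test.csv"))
```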

Oneself answered 13/8, 2009 at 9:28 Comment(0)
B
1

I got stumped here a bit. I wanted to package some resource files into a wheel file and access them. I did the packaging using a manifest file, but pip install was not installing the resources unless they were in a subdirectory of the package. Hoping these screenshots will help.

├── cnn_client
│   ├── image_preprocessor.py
│   ├── __init__.py
│   ├── resources
│   │   ├── mscoco_complete_label_map.pbtxt
│   │   ├── retinanet_complete_label_map.pbtxt
│   │   └── retinanet_label_map.py
│   ├── tf_client.py

MANIFEST.in

recursive-include cnn_client/resources *

Created a wheel using a standard setup.py and pip-installed the wheel file. After installation, checked whether the resources were installed. They are:

ls /usr/local/lib/python2.7/dist-packages/cnn_client/resources

mscoco_complete_label_map.pbtxt
retinanet_complete_label_map.pbtxt
retinanet_label_map.py

In tf_client.py, access these files like this:

import os

# Locate the resources directory next to this module.
templates_dir = os.path.join(os.path.dirname(__file__), 'resources')
file_path = os.path.join(templates_dir, 'mscoco_complete_label_map.pbtxt')
with open(file_path, 'r') as f:
    s = f.read()

And it works.

Balladeer answered 13/5, 2019 at 9:24 Comment(0)
I
1

Since you say you have some code that you deploy to various places, you should use Python's packaging tools to distribute resources. They are not limited to plain files on disk: they can also read resources from inside zip archives, so your code does not have to handle that case itself.

Previously, this was handled with pkg_resources from setuptools, but with more and more tools popping up, the ecosystem has shifted. Since Python 3.7, you should use importlib.resources:

import importlib.resources
with importlib.resources.open_text('mypackage.somedirectory','text.csv') as f:
    print(f.read()) # or whatever
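
On Python 3.9 and later, importlib.resources also provides files(), which returns a traversable object for a package and does not require the data subdirectory to be a package itself. A minimal sketch along those lines; read_package_text is a hypothetical helper, and the package and file names in the comment are assumptions:

```python
from importlib.resources import files  # available since Python 3.9


def read_package_text(package, *parts, encoding="utf-8"):
    """Return the text of a data file shipped inside an importable package."""
    resource = files(package)
    for part in parts:
        # "/" descends one level, whether the package is a directory or an archive.
        resource = resource / part
    return resource.read_text(encoding=encoding)


# Hypothetical call, assuming mypackage/somedirectory/test.csv is installed:
#   text = read_package_text("mypackage", "somedirectory", "test.csv")
```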

But you must also instruct your installer to include package resources. Otherwise, a pip install mypackage would not bundle the data files.

There are many ways to do that, but one way to do it is to add

[options.package_data]
mypackage =
    somedirectory/*.csv

into your setup.cfg. There are equivalent approaches when using setup.py or pyproject.toml. A more complete account is available on the setuptools homepage.

Ivaivah answered 20/10, 2021 at 13:41 Comment(0)
E
0

If you want to later compile your script to .exe then __file__ won't give you the path of the .exe file. In this case you should

Use sys.argv[0]

sys.argv[0] gives you the path of the file when it's a .exe and when you run the script like python script.py

This is how I currently reference things

os.path.join(os.path.dirname(os.path.abspath(sys.argv[0])), 'Resources')
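
Wrapped in a small helper, the same expression might look like the sketch below. The function name resources_dir is hypothetical, and the argv0 parameter exists only so the logic can be tried with an explicit path; by default it falls back to sys.argv[0] as described above:

```python
import os
import sys


def resources_dir(argv0=None, name="Resources"):
    """Return the directory called `name` next to the launched script or .exe."""
    if argv0 is None:
        argv0 = sys.argv[0]  # path used to start the program
    return os.path.join(os.path.dirname(os.path.abspath(argv0)), name)


# Typical use inside the application:
#   icon_path = os.path.join(resources_dir(), "icon.png")  # file name is a placeholder
```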

There's more detail on why that's a thing here

Economist answered 2/11, 2022 at 18:2 Comment(0)
P
-5

I spent a long time figuring out the answer to this, but I finally got it (and it's actually really simple):

import sys
import os
sys.path.append(os.getcwd() + '/your/subfolder/of/choice')

# now import whatever other modules you want, both the standard ones,
# as the ones supplied in your subfolders

This appends your subfolder, resolved against the current working directory, to the list of directories Python searches for modules. It's pretty quick and dirty, but it works like a charm :)

Presber answered 2/2, 2011 at 12:43 Comment(2)
This will only work if you're running the Python program from the same directory as the .py file in question. And in that case, you could just do open('your/subfolder/of/choice') anyway.Moffitt
and the OP mentioned that the code needs to work on both Windows and Linux. This will not.Guthrun

© 2022 - 2024 — McMap. All rights reserved.