A packaging prelude:
Before worrying about how to read resource files, the first step is to make sure the data files are getting packaged into your distribution in the first place. It is easy to read them directly from the source tree, but the important part is making sure these resource files are accessible from code within an installed package.
Structure your project like this, putting data files into a subdirectory within the package:
.
├── package
│   ├── __init__.py
│   ├── templates
│   │   └── temp_file
│   ├── mymodule1.py
│   └── mymodule2.py
├── README.rst
├── MANIFEST.in
└── setup.py
You should pass include_package_data=True in the setup() call. The manifest file is only needed if you want to use setuptools/distutils and build source distributions. To make sure templates/temp_file gets packaged for this example project structure, add a line like this into the manifest file:

recursive-include package *
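Putting those two pieces together, a minimal setup.py for this layout might look like the following (a sketch assuming setuptools; the name and version are hypothetical placeholders for your own project):

```python
from setuptools import setup, find_packages

setup(
    name="package",             # hypothetical project name
    version="0.1",
    packages=find_packages(),
    include_package_data=True,  # honor the data files listed via MANIFEST.in
)
```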
Historical cruft note: Using a manifest file is not needed for modern build backends such as flit and poetry, which include the package data files by default. So, if you're using pyproject.toml and you don't have a setup.py file, then you can ignore all the stuff about MANIFEST.in.
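For example, a minimal pyproject.toml using flit might look like this (a sketch; flit_core picks up files inside the package directory by default, and the name/version here are placeholders):

```toml
[build-system]
requires = ["flit_core >=3.2,<4"]
build-backend = "flit_core.buildapi"

[project]
name = "package"
version = "0.1"
description = "Example package with data files"
```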
Now, with packaging out of the way, onto the reading part...
Recommendation:
Use the standard library pkgutil APIs. It's going to look like this in library code:
# within package/mymodule1.py, for example
import pkgutil
data = pkgutil.get_data(__name__, "templates/temp_file")
It works in zips. It works on Python 2 and Python 3. It doesn't require third-party dependencies. I'm not really aware of any downsides (if you are, then please comment on the answer).
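As a runnable illustration, here is a self-contained sketch that builds a throwaway demo_pkg package (a hypothetical name, created in a temporary directory purely for demonstration) and reads a resource from it. Note that the data comes back as bytes:

```python
import os
import sys
import tempfile

import pkgutil

# Build a throwaway package just to demonstrate the call; in real code,
# the package would be your installed distribution.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "demo_pkg", "templates"))
open(os.path.join(root, "demo_pkg", "__init__.py"), "w").close()
with open(os.path.join(root, "demo_pkg", "templates", "temp_file"), "w") as f:
    f.write("hello")

sys.path.insert(0, root)

# get_data returns bytes (or None if the loader doesn't support data files)
data = pkgutil.get_data("demo_pkg", "templates/temp_file")
print(data)  # b'hello' -- decode it if you need text
```

Decode with `data.decode("utf-8")` or similar when the resource is text rather than binary.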
Bad ways to avoid:
Bad way #1: using relative paths from a source file
This was previously described in the accepted answer. At best, it looks something like this:
from pathlib import Path
resource_path = Path(__file__).parent / "templates"
data = resource_path.joinpath("temp_file").read_bytes()
What's wrong with that? The assumption that you have files and subdirectories available on the filesystem is not correct. This approach doesn't work when executing code that is packed in a zip or a wheel, and it may be entirely out of the user's control whether or not your package gets extracted to a filesystem at all.
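To see the failure mode concretely, here is a sketch that packs a hypothetical zpkg package into a zip archive: the path derived from __file__ points inside the zip and does not exist on disk, while pkgutil still reads the resource via the zip importer:

```python
import pkgutil
import sys
import tempfile
import zipfile
from pathlib import Path

# Pack a tiny package into a zip (zpkg is a hypothetical name, for illustration)
zip_path = Path(tempfile.mkdtemp()) / "bundle.zip"
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("zpkg/__init__.py", "")
    zf.writestr("zpkg/templates/temp_file", "hello")

sys.path.insert(0, str(zip_path))
import zpkg  # imported by zipimport, straight out of the archive

# __file__ points *inside* the zip, so no such path exists on disk...
fs_path = Path(zpkg.__file__).parent / "templates" / "temp_file"
print(fs_path.exists())  # False

# ...whereas pkgutil delegates to the zip importer's get_data and still works:
print(pkgutil.get_data("zpkg", "templates/temp_file"))  # b'hello'
```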
Bad way #2: using pkg_resources APIs
This is described in the top-voted answer. It looks something like this:
from pkg_resources import resource_string
data = resource_string(__name__, "templates/temp_file")
What's wrong with that? It adds a runtime dependency on setuptools, which should preferably be an install-time dependency only. Importing and using pkg_resources can become really slow, as the code builds up a working set of all installed packages, even though you were only interested in your own package's resources. That's not a big deal at install time (since installation is a one-off), but it's ugly at runtime.
Bad way #3: using legacy importlib.resources APIs
This was previously the recommendation of the top-voted answer. It has been in the standard library since Python 3.7. It looks like this:
from importlib.resources import read_binary
data = read_binary("package.templates", "temp_file")
What's wrong with that? Well, unfortunately, the implementation left some things to be desired, and it was deprecated in Python 3.11. Using importlib.resources.read_binary, importlib.resources.read_text and friends requires you to add an empty file templates/__init__.py so that data files reside within a sub-package rather than in a subdirectory. It also exposes the package/templates subdirectory as an importable package.templates sub-package in its own right. This won't work with many existing packages which are already published using resource subdirectories instead of resource sub-packages, and it's inconvenient to add the __init__.py files everywhere, muddying the boundary between data and code.
This approach was deprecated in upstream importlib_resources in 2021, and was deprecated in the stdlib from Python 3.11. bpo-45514 tracked the deprecation, and the upstream project offers _legacy.py wrappers to aid with the transition.
Honorable mention: using the traversable importlib resources API
This had not been mentioned in the top-voted answer when I posted about it (2020), but the author has subsequently edited it into their answer (2023). importlib_resources is more than a simple backport of the Python 3.7+ importlib.resources code. It has traversable APIs for accessing resources, with usage similar to pathlib:
import importlib_resources
my_resources = importlib_resources.files("package")
data = my_resources.joinpath("templates", "temp_file").read_bytes()
This works on Python 2 and 3, it works in zips, and it doesn't require spurious __init__.py files to be added in resource subdirectories. The only downside vs pkgutil that I can see is that the traversable APIs are only available in the stdlib importlib.resources from Python 3.9+, so a third-party dependency is still needed to support older Python versions. If you only need to run on Python 3.9+, then use this approach; otherwise you can add a compatibility layer and a conditional dependency on the backport for older Python versions:
# in your library code:
try:
    from importlib.resources import files
except ImportError:
    from importlib_resources import files

# in your setup.py or similar:
from setuptools import setup
setup(
    ...
    install_requires=[
        'importlib_resources; python_version < "3.9"',
    ]
)
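On Python 3.9+ the same traversable API works straight from the stdlib. A runnable sketch against a throwaway demo_pkg package (a hypothetical name, built in a temporary directory just for demonstration):

```python
import os
import sys
import tempfile

from importlib.resources import files  # stdlib on Python 3.9+

# Throwaway package for demonstration purposes only.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "demo_pkg", "templates"))
open(os.path.join(root, "demo_pkg", "__init__.py"), "w").close()
with open(os.path.join(root, "demo_pkg", "templates", "temp_file"), "w") as f:
    f.write("hello")
sys.path.insert(0, root)

# files() returns a Traversable; navigate with "/" like a pathlib path
data = (files("demo_pkg") / "templates" / "temp_file").read_bytes()
print(data)  # b'hello'
```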
Until Python 3.8 is end-of-life, my recommendation remains with stdlib pkgutil, to avoid the extra complexity of a conditional dependency.
Example project:
I've created an example project on GitHub and uploaded it to PyPI, demonstrating all five approaches discussed above. Try it out with:
$ pip install resources-example
$ resources-example
See https://github.com/wimglenn/resources-example for more info.