Avoiding module namespace pollution in Python

C

2

14

TL;DR: What's the cleanest way to keep implementation details out of a module's namespace?

There are a number of similar questions on this topic already, but none seems to have a satisfactory answer relative to modern tools and language features.

I'm designing a Python package, and I'd like to keep each module's public interface clean, exposing only what's intended, keeping implementation details (especially imports) hidden.

Over the years, I've seen a number of techniques:

Don't worry about it. Just document how to use your package and let consumers of it just ignore the implementation details.

This is just horrible, in my opinion. A well-designed interface should be easily discoverable. Having the implementation details publicly visible makes the interface much more confusing. Even as the author of a package, I don't want to use it when it exposes too much, as it makes autocompletion less useful.

Add an underscore to the beginning of all implementation details.

This is a well-understood convention, and most development tools are smart enough to at least sort underscore-prefixed names to the bottom of autocomplete lists. It works fine if you have a small number of names to treat this way, but as the number of names grows, it becomes more and more tedious and ugly.

Take for example this relatively simple list of imports:

import struct

from abc    import abstractmethod, ABC
from enum   import Enum
from typing import BinaryIO, Dict, Iterator, List, Optional, Type, Union

Applying the underscore technique, this relatively small list of imports becomes this monstrosity:

import struct as _struct

from abc    import abstractmethod as _abstractmethod, ABC as _ABC
from enum   import Enum as _Enum
from typing import (
    BinaryIO as _BinaryIO,
    Dict     as _Dict,
    Iterator as _Iterator,
    List     as _List,
    Optional as _Optional,
    Type     as _Type,
    Union    as _Union
)

Now, I know this problem can be partially mitigated by never doing from imports, and just importing the entire package, and package-qualifying everything. While that does help this situation, and I realize that some people prefer to do this anyway, it doesn't eliminate the problem, and it's not my preference. There are some packages I prefer to import directly, but I usually prefer to import type names and decorators explicitly so that I can use them unqualified.

There's an additional small problem with the underscore prefix. Take the following publicly exposed class:

class Widget(_ABC):
    @_abstractmethod
    def implement_me(self, input: _List[int]) -> _Dict[str, object]:
        ...

A consumer of this package implementing his own Widget implementation will see that he needs to implement the implement_me method, and it needs to take a _List and return a _Dict. Those aren't actual type names, and now the implementation-hiding mechanism has leaked into my public interface. It's not a big problem, but it does contribute to the ugliness of this solution.

Hide the implementation details inside a function.

This one's definitely hacky, and it doesn't play well with most development tools.

Here's an example:

def module():
    import struct

    from abc    import abstractmethod, ABC
    from typing import BinaryIO, Dict, List

    def fill_list(r: BinaryIO, count: int, lst: List[int]) -> None:
        while count > 16:
            lst.extend(struct.unpack("<16i", r.read(16 * 4)))
            count -= 16
        while count > 4:
            lst.extend(struct.unpack("<4i", r.read(4 * 4)))
            count -= 4
        for _ in range(count):
            lst.append(struct.unpack("<i", r.read(4))[0])

    def parse_ints(r: BinaryIO) -> List[int]:
        count = struct.unpack("<i", r.read(4))[0]
        rtn: List[int] = []
        fill_list(r, count, rtn)
        return rtn

    class Widget(ABC):
        @abstractmethod
        def implement_me(self, input: List[int]) -> Dict[str, object]:
            ...

    return (parse_ints, Widget)

parse_ints, Widget = module()
del module

This works, but it's super hacky, and I don't expect it to operate cleanly in all development environments. ptpython, for example, fails to provide method signature information for the parse_ints function. Also, the type of Widget becomes my_package.module.<locals>.Widget instead of my_package.Widget, which is weird and confusing to consumers.

Use `all`.

This is a commonly given solution to this problem: list the "public" members in the global __all__ variable:

import struct

from abc    import abstractmethod, ABC
from typing import BinaryIO, Dict, List

__all__ = ["parse_ints", "Widget"]

def fill_list(r: BinaryIO, count: int, lst: List[int]) -> None:
    ...  # You've seen this.

def parse_ints(r: BinaryIO) -> List[int]:
    ...  # This, too.

class Widget(ABC):
    ...  # And this.

This looks nice and clean, but unfortunately, the only thing __all__ affects is what happens when you use wildcard imports from my_package import *, which most people don't do, anyway.

Convert the module to a subpackage, and expose the public interface in `init.py`.

This is what I'm currently doing, and it's pretty clean for most cases, but it can get ugly if I'm exposing multiple modules instead of flattening everything:

my_package/
+--__init__.py
+--_widget.py
+--shapes/
   +--__init__.py
   +--circle/
   |  +--__init__.py
   |  +--_circle.py
   +--square/
   |  +--__init__.py
   |  +--_square.py
   +--triangle/
      +--__init__.py
      +--_triangle.py

Then my __init__.py files look kind of like this:

# my_package.__init__.py

from my_package._widget.py import parse_ints, Widget

# my_package.shapes.circle.__init__.py

from my_package.shapes.circle._circle.py import Circle, Sphere

# my_package.shapes.square.__init__.py

from my_package.shapes.square._square.py import Square, Cube

# my_package.shapes.triangle.__init__.py

from my_package.shapes.triangle._triangle.py import Triangle, Pyramid

This makes my interface clean, and works well with development tools, but it makes my directory structure pretty messy if my package isn't completely flat.

Is there a better technique?

Cayman answered 30/8, 2019 at 14:43 Comment(4)

I would stick to the packaging and reorganise your directory structure instead. If I am not mistaken, you could get rid of most of the __init__.py files within the sub-directories and instead ensure you add the root folder path to the module-lookup path (I think it was sys.path). Then you could store the python modules within the "shapes" directory and make it a sub-package (and then simply call from shapes import * or whichever import version you prefer) in the my_package package. – Nanon 30/8, 2019 at 23:9

People usually recommend to look at some of the existing Python packages like requests - perhaps you will find something valuable there. – Nanon 30/8, 2019 at 23:9

@KacperFloriański: I think you missed the point of what I was trying to do with the shapes directory. I'm trying not to flatten the namespace there. The package will expose my_package.shapes.circle.Circle, my_package.shapes.circle.Sphere, my_package.shapes.square.Square, etc. Generally, I do prefer to flatten my namespaces, but there are some cases where it makes sense to expose a hierarchy, and I wanted to make sure to cover that case, especially because that's where my technique gets particularly messy. – Cayman 31/8, 2019 at 4:48

I don't think there is a better way to do it in Python than your last approach if you don't want to flatten your structure no matter what. But if I was a developer using your tool I could potentially get upset that I need to type 4 levels of imports - at least in your example it doesn't make much sense to use the circle, square etc. directories. To my knowledge, the Pythonic way would be to expose the modules directly within the shapes package. But I hope someone more versed in professional packaging will save you! – Nanon 31/8, 2019 at 10:42

Y

1

Convert to subpackages to limit the number of classes in a place and to separate concerns. If a class or constant is not needed outside of its module, prefix it with a double underscore. Import the module name if you do not want to explicitly import many classes from it. You have laid out all the solutions.

Ytterbia answered 20/10, 2021 at 15:46 Comment(0)

I

0

Not sure if this breaks anything, but one can do

"""Module Docstring"""

__all__ = [
    # Classes
    "Foo",
    # Functions
    "bar",
]
__ALL__ = dir() + __all__  # catch default module attributes.

# Imports go here

def __dir__() -> list[str]:
    return __ALL__

Explanation: dir(obj) tries to call obj.__dir__(). Modules are objects as well, and we can add a custom __dir__ method. Using this setup, you should get

dir(module) = ['__all__', '__builtins__', '__cached__',
               '__doc__', '__file__', '__name__', 
               '__package__', '__spec__', 'Foo', 'bar']

Reference: PEP 562

Inbred answered 3/9, 2022 at 21:46 Comment(0)

Don't worry about it. Just document how to use your package and let consumers of it just ignore the implementation details.

Add an underscore to the beginning of all implementation details.

Hide the implementation details inside a function.

Use `all`.

Convert the module to a subpackage, and expose the public interface in `init.py`.

Recommended topics

Hot tags

Don't worry about it. Just document how to use your package and let consumers of it just ignore the implementation details.

Add an underscore to the beginning of all implementation details.

Hide the implementation details inside a function.

Use __all__.

Convert the module to a subpackage, and expose the public interface in __init__.py.

Recommended topics

Hot tags

Use `all`.

Convert the module to a subpackage, and expose the public interface in `init.py`.