First, I don't think this is at all useful. It's very common for modules to be pure-Python wrappers around a C extension module—or, in some cases, pure-Python wrappers around a C extension module if it's available, or a pure Python implementation if not.
For some popular third-party examples: numpy is pure Python, even though everything important is implemented in C; bintrees is pure Python, even though its classes may all be implemented either in C or in Python depending on how you build it; etc.
And this is true of most of the stdlib from 3.2 on. For example, if you just import pickle, the implementation classes will be built in C (what you used to get from cPickle in 2.7) in CPython, while they'll be pure-Python versions in PyPy, but either way pickle itself is pure Python.
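To see this split on a particular interpreter, you can compare where pickle itself lives with where its Pickler class comes from. This is only a sketch for 3.x: pickler_backend is a hypothetical helper name, and it assumes pickle keeps exposing its pure-Python class as the private name pickle._Pickler, which is an implementation detail that could change.

```python
import pickle


def pickler_backend():
    """Report which implementation backs pickle.Pickler on this interpreter."""
    # pickle.py always defines the pure-Python class as _Pickler; Pickler is
    # bound to the C class instead when the optional accelerator imports.
    if pickle.Pickler is pickle._Pickler:
        return "pure Python"
    return "C accelerator"


print("pickle module file:", pickle.__file__)
print("Pickler defined in:", pickle.Pickler.__module__)
print("backend:", pickler_backend())
```

Either way, the module you imported is the same pure-Python pickle; only the classes behind it differ.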
But if you do want to do this, you actually need to distinguish three things:
- Built-in modules, like sys.
- C extension modules, like 2.x's cPickle.
- Pure Python modules, like 2.x's pickle.
And that's assuming you only care about CPython; if your code runs in, say, Jython, or IronPython, the implementation could be JVM or .NET rather than native code.
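Of the three categories, the built-in one is the easiest to pin down on CPython, because the interpreter lists the modules compiled into it in sys.builtin_module_names. A minimal sketch (is_builtin is a hypothetical helper, and the list differs between builds and platforms):

```python
import sys


def is_builtin(name):
    """Return True if the named module is compiled into this interpreter."""
    return name in sys.builtin_module_names


print("sys:", is_builtin("sys"))
print("json:", is_builtin("json"))
```

This says nothing about whether a non-built-in module is a C extension or pure Python; that is the harder half of the problem.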
You can't distinguish perfectly based on __file__, for a number of reasons:
- Built-in modules have no __file__ at all. (This is documented in a few places, e.g., the Types and members table in the inspect docs.) Note that if you're using something like py2app or cx_Freeze, what counts as "built-in" may be different from a standalone installation.
- A pure-Python module may have a .pyc/.pyo file without having a .py file in a distributed app.
- A module in a package installed as a single-file egg (which is common with easy_install, less so with pip) will have either a blank or useless __file__.
- If you build a binary distribution, there's a good chance your whole library will be packed in a zip file, causing the same problem as single-file eggs.
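Because __file__ may be missing entirely, any code that inspects it should read it defensively. A small sketch, with file_of as a hypothetical helper name; the output for json depends on how your Python was installed or packaged:

```python
import json
import sys


def file_of(module):
    """Return the module's __file__, or None if it has no such attribute."""
    return getattr(module, "__file__", None)


print("sys :", file_of(sys))   # a built-in module: no __file__
print("json:", file_of(json))  # usually a path; may be unhelpful in eggs/zips
```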
importlib was added in 3.1, and since 3.3 the import process has been massively cleaned up, mostly rewritten in Python on top of importlib, and mostly exposed to the Python layer.
So, you can use the importlib module to see the chain of loaders used to load a module, and ultimately you'll get to BuiltinImporter (builtins), ExtensionFileLoader (.so/.pyd/etc.), SourceFileLoader (.py), or SourcelessFileLoader (.pyc/.pyo).
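One way to act on this is to look at the loader recorded in a module's __spec__ and map it to one of those four classes. The following is a sketch for 3.4+, not a complete solution: classify_by_loader and its labels are made up for illustration, and anything loaded another way (zip/egg importers, frozen modules, custom loaders) falls through to "other".

```python
import importlib.machinery as machinery
import json
import sys


def _matches(loader, cls):
    # BuiltinImporter is used as a class, file loaders as instances.
    return loader is cls or isinstance(loader, cls)


def classify_by_loader(module):
    """Label a module by the kind of loader that imported it."""
    spec = getattr(module, "__spec__", None)
    loader = getattr(spec, "loader", None)
    if loader is None:
        return "unknown"
    if _matches(loader, machinery.BuiltinImporter):
        return "built-in"
    if _matches(loader, machinery.ExtensionFileLoader):
        return "extension"
    if _matches(loader, machinery.SourceFileLoader):
        return "source"
    if _matches(loader, machinery.SourcelessFileLoader):
        return "bytecode-only"
    return "other"


for mod in (sys, json):
    print(mod.__name__, "->", classify_by_loader(mod))
```

Note that this still only describes the module object you pass in; it will report pickle as "source" even when its classes come from a C accelerator.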
You can also see the suffixes assigned to each of the file-based loaders, on the current target platform, as constants in importlib.machinery. So, you could check any(pathname.endswith(suffix) for suffix in importlib.machinery.EXTENSION_SUFFIXES), but that won't actually help in, e.g., the egg/zip case unless you've already traveled up the chain anyway.
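Written out as a function, the suffix check looks like this. It is a sketch only: looks_like_extension is a hypothetical name, the example paths are invented, and the check is purely textual, so it inherits every __file__ caveat listed earlier.

```python
import importlib.machinery


def looks_like_extension(pathname):
    """Return True if pathname ends with one of this platform's extension suffixes."""
    return any(
        pathname.endswith(suffix)
        for suffix in importlib.machinery.EXTENSION_SUFFIXES
    )


print(importlib.machinery.EXTENSION_SUFFIXES)
print(looks_like_extension("example_module.py"))
```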
The best heuristics anyone has come up with for this are the ones implemented in the inspect module, so the best thing to do is to use that.
The best choice will be one or more of getsource, getsourcefile, and getfile; which is best depends on which heuristics you want.
A built-in module will raise a TypeError for any of them.
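That TypeError can be turned into a simple test. This is a sketch with a hypothetical helper name; be aware that inspect.getfile raises the same TypeError for any module without a usable __file__ (for example, a namespace package on recent 3.x), so a True result means "no file", not strictly "compiled into the interpreter".

```python
import inspect
import sys


def has_no_file_per_inspect(module):
    """Return True if inspect.getfile rejects the module with TypeError."""
    try:
        inspect.getfile(module)
    except TypeError:
        return True
    return False


print("sys:", has_no_file_per_inspect(sys))
print("inspect:", has_no_file_per_inspect(inspect))
```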
An extension module ought to return None for getsourcefile. This seems to work in all the 2.5-3.4 versions I have, but I don't have 2.4 around. For getsource, at least in some versions, it returns the actual bytes of the .so file, even though it should be returning an empty string or raising an IOError. (In 3.x, you will almost certainly get a UnicodeError or SyntaxError, but you probably don't want to rely on that…)
Pure Python modules may return None for getsourcefile if in an egg/zip/etc. They should always return a non-empty string for getsource if source is available, even inside an egg/zip/etc., but if they're sourceless bytecode (.pyc/etc.) they will return an empty string or raise an IOError (an alias of OSError since 3.3).
The best bet is to experiment with the version you care about on the platform(s) you care about in the distribution/setup(s) you care about.
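A small harness makes that experiment repeatable. The sketch below runs the three inspect functions against whatever modules you give it and records either the result or the exception type; probe is a hypothetical helper, and the broad except is deliberate, since which exception you get is exactly what you are trying to find out.

```python
import inspect
import json
import sys


def probe(module):
    """Run each inspect heuristic and record its result or its exception type."""
    results = {}
    for func in (inspect.getfile, inspect.getsourcefile, inspect.getsource):
        try:
            value = func(module)
        except Exception as exc:  # broad on purpose: the failure type is the data
            results[func.__name__] = ("error", type(exc).__name__)
        else:
            # Truncate long strings (source text, long paths) for readability.
            if isinstance(value, str) and len(value) > 60:
                value = value[:57] + "..."
            results[func.__name__] = ("ok", value)
    return results


for mod in (sys, json):
    print(mod.__name__, probe(mod))
```

Swap in the extension and pure-Python modules you actually care about, and run it under each interpreter and packaging setup you ship.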
… dir/check the docs to find out more. – Slowly
numpy is a pure Python module, and pickle is pure Python whether _Pickle and friends come from the C accelerator or from pure Python. – Pensile
… repr of, say, 2.7's cPickle, it has a pathname, not the string built-in. And the only official heuristic for distinguishing built-in modules is that __file__ is missing, which again is not true for cPickle. – Pensile
… __file__ attribute of the extension module ends with .so when the C implementation is being used, but I don't know if that is always or usually the case. – Presumptuous