It's time to come around to a point that's been bugging me for a long time: why is the Python import routine so, well, so darned convoluted? The answer is "history," basically the history of Python and the attempt to turn import into a tool that's incredibly easy to use and understand for the common programmer, yet flexible enough to give the advanced programmer the power to redefine it to mean whatever else it needs to mean.

We've talked about how the system has two different loading mechanisms, sys.meta_path and sys.path_hooks, and how the latter is just as arbitrary as the former: the last path hook is the filesystem one, so it runs os.path.isdir() on every entry in sys.path and only offers to handle the ones that return true, and it runs only after everything else has been tried, so:

  * If a meta_path finder interpreted an import fullname with respect to a path that's a directory, the default won't get it,
  * If an earlier path_hook said it could handle that entry, the default won't get it,

... and so on. The whole point of first-one-wins priority is to leave the responsibility for not failing up to the developer. The default really is the fallback position, and it uses only a subset of sys.path. The formal type of a sys.path entry is... no type at all. It could be a string, a filesystem directory iterator, or an object that interacts with a path_hook. It could be anything at all. The only consideration is that, if it can't be coerced into a string that os.path.isdir() can reject, you had better handle it before it falls through to the default.
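To make that fall-through concrete, here's a sketch of a path hook that claims a made-up non-string entry type before it can reach the default filesystem hook. TaggedEntry, TaggedFinder, and tagged_hook are all hypothetical names invented for this illustration, not part of any library:

```python
import sys
from importlib.abc import PathEntryFinder


class TaggedEntry:
    """Hypothetical: a non-string sys.path entry."""
    def __init__(self, tag):
        self.tag = tag


class TaggedFinder(PathEntryFinder):
    def __init__(self, entry):
        self.entry = entry

    def find_spec(self, fullname, target=None):
        # A real finder would locate a resource here; this sketch
        # declines everything so imports fall through normally.
        return None


def tagged_hook(entry):
    # Claim only our marker objects. Raising ImportError passes the
    # entry along to the next hook in sys.path_hooks.
    if isinstance(entry, TaggedEntry):
        return TaggedFinder(entry)
    raise ImportError


# Hooks run in order, so prepending makes ours win for TaggedEntry
# items before the filesystem default ever sees them.
sys.path_hooks.insert(0, tagged_hook)
sys.path.append(TaggedEntry("demo"))
```

If you skipped the hook, the TaggedEntry would eventually reach the default filesystem hook, which can't coerce it into a path string, and the import machinery would choke on it, which is exactly the "handle it before it falls through" rule above.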

It's really time to call it like it is: sys.path and sys.path_hooks are a special case for loading. They're the original special case, but that's what they are. They lead to weird results like one finder claiming the leading elements of a fullname while another handles the rest, turning those leading elements into arbitrary and meaningless tokens.
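Here's a minimal sketch of how a meta_path finder makes the fullname's tokens mean whatever it wants, bypassing sys.path entirely. The "virtualpkg" prefix and the `answer` attribute are invented for illustration:

```python
import sys
from importlib.abc import Loader, MetaPathFinder
from importlib.machinery import ModuleSpec


class VirtualLoader(Loader):
    def create_module(self, spec):
        return None  # defer to the default module creation

    def exec_module(self, module):
        module.answer = 42  # hypothetical module content


class VirtualFinder(MetaPathFinder):
    # Claims every fullname under the made-up "virtualpkg" prefix,
    # regardless of what sys.path says; the dotted tokens after the
    # prefix mean only what this finder decides they mean.
    def find_spec(self, fullname, path=None, target=None):
        if fullname == "virtualpkg" or fullname.startswith("virtualpkg."):
            return ModuleSpec(fullname, VirtualLoader(), is_package=True)
        return None


# meta_path finders run before any path-based machinery.
sys.meta_path.insert(0, VirtualFinder())

import virtualpkg.anything.at.all  # never touches the filesystem
```

Every dotted segment of `virtualpkg.anything.at.all` is resolved by the finder alone; the filesystem default never gets a look.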

I wish I could call for a more rational import system, one in which we talked only about resource managers, which could access resource archives, iterate through their contents, identify corresponding resources, and load a resource's contents; and compilers, which could identify the text that had just been accessed (via whatever metadata was available) and turn it into a Python module.
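Nothing like this exists in the standard library; the shape of that wish might be sketched as a pair of hypothetical protocols, plus a trivial in-memory implementation. Every name here is invented:

```python
from typing import Iterator, Protocol


class ResourceManager(Protocol):
    """Hypothetical: accesses an archive of named resources."""

    def iterate(self) -> Iterator[str]:
        """List the resource names in the archive."""
        ...

    def load(self, name: str) -> bytes:
        """Return the raw contents of one resource."""
        ...


class Compiler(Protocol):
    """Hypothetical: turns raw resource bytes into a Python module."""

    def claims(self, name: str, metadata: dict) -> bool:
        """Decide, from metadata, whether this compiler handles it."""
        ...

    def compile(self, name: str, data: bytes) -> object:
        """Produce a module object from the resource contents."""
        ...


class DictResourceManager:
    """Minimal in-memory implementation of the sketch above."""

    def __init__(self, resources: dict):
        self._resources = resources

    def iterate(self):
        return iter(self._resources)

    def load(self, name):
        return self._resources[name]
```

The point of the split is that archive access (filesystem, zip, database, object store) and module construction (source, bytecode, extension) would vary independently, instead of being welded together per finder/loader pair.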

But we can't. Python is too well-established to put up with such rationalizing shenanigans, and too many people depend upon the existing behavior for it to happen. Python was born when NFS was the thing, when there were no real open-source databases, no object stores. Python was released two years before the Mosaic web browser! It would be far too disruptive. So we'll keep getting PEPs forever, trying to rationalize the irrational.

That's okay. It gives me something to get paid for.

But it does point out one major flaw: because finders and loaders are so intimately linked, even if we manage to rationalize FileFinder and SourceFileLoader, that's only with respect to the filesystem. We'll have to make equivalent finder/loader pairs for every other sort of accessor, be it zipfiles or any of the other wacky resource pools that people have come up with.
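That coupling is visible in the real API: FileFinder.path_hook is constructed directly from (loader, suffixes) pairs, which is roughly how the stock filesystem hook at the end of sys.path_hooks comes to exist in the first place:

```python
from importlib.machinery import FileFinder, SourceFileLoader

# Build a path hook closed over its loader details -- the finder
# cannot be described without naming the loaders it will hand out.
hook = FileFinder.path_hook((SourceFileLoader, [".py"]))

# The resulting callable accepts a directory and returns a FileFinder,
# or raises ImportError for anything that isn't a directory.
finder = hook(".")
```

Any non-filesystem accessor needs its own such pairing from scratch: a finder that understands the archive layout and loaders that understand its contents.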

Unfortunately, I don't have a good plan for those. Fortunately, filesystems are still the most common way of storing and loading libraries, so concentrating on those gets us 99% of the way there.