The Semantics of Python Import, Part 3: Loaders

In the last post we discussed Finders. The whole point of a Finder is to find a resource stored somewhere (usually a file on a filesystem, but it could be anything-- a row in a database, a webpage, a range in a zip file) and supply the appropriate loader for it.

More accurately, there is a "FinderFinder" mechanism by which sys.meta_path and sys.path are searched to find the best Finder to run against a resource, and then the Finder is invoked to find the loader to load the resource. This lets Python differentiate between the archive (the resource type: folder, database, zipfile, etc.), the resource itself (file, row/column, zipfile index), and the type of that resource: source code (.py), compiled Python bytecode (.pyc or .pyo), or a compiled binary (.so or .dll) file that conforms to the Python ABI.
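The search described above can be watched from the interpreter. This is a minimal sketch, assuming a standard CPython install where the json module is importable; the variable names are only for illustration.

```python
import importlib.util
import sys

# The finders consulted first, in order. On CPython the last of these is
# normally PathFinder, which walks sys.path and uses sys.path_hooks to pick
# a finder for each path entry.
meta_finder_names = [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]

# find_spec() runs the finder search without executing the module's code.
spec = importlib.util.find_spec("json")

print(meta_finder_names)
print(spec.name)                    # the module that was asked for
print(spec.origin)                  # where the resource was found
print(type(spec.loader).__name__)   # the loader the finder chose for it
```

The exact finder names, origin path, and loader class depend on the Python version and on how the interpreter was installed, so no output is shown here.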

The point of the Loader is to take what the Finder has found and convert that resource into a stream of characters, which it then turns into Python executable code. Compared to the Finder, the Loader is pretty simple.

Typically, the Loader does whatever work is necessary to read in and convert (for example, to uncompress) the resource, compile it, attach the resulting compiled code as the executable to a new Module object, decorate the object with metadata, and then attach that new module object to the calling context, as well as caching a copy in sys.modules.
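Those steps can be done by hand in a few lines. The helper below, load_from_source, is a hypothetical name used only for this sketch; it skips the reading and decompressing step by taking the source as a string, and it is not how importlib's own loaders are written.

```python
import sys
import types


def load_from_source(name, source, origin="<string>"):
    """Hypothetical helper: the loader's steps, performed manually."""
    code = compile(source, origin, "exec")   # turn the text into executable code
    module = types.ModuleType(name)          # a fresh, empty module object
    module.__file__ = origin                 # decorate it with metadata
    sys.modules[name] = module               # cache it before running the code
    try:
        exec(code, module.__dict__)          # run the code in the module's namespace
    except BaseException:
        del sys.modules[name]                # do not leave a half-built module cached
        raise
    return module


demo = load_from_source("loader_demo_mod", "ANSWER = 6 * 7\n")
print(demo.ANSWER)  # 42

# Because the module is cached, a normal import statement now finds it.
import loader_demo_mod
print(loader_demo_mod is demo)  # True
```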

That's more or less it.

Python 3.4 introduces the idea of a ModuleSpec, which describes the relationship between a module and its loader, in much the same way that the ModuleType describes a relationship between a module and the modules that import it.
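To make the spec concrete, here is a small sketch (Python 3.5 or later, since it uses importlib.util.module_from_spec) that builds a ModuleSpec for a throwaway file and loads it. The file name greeting.py and its contents are invented for the example.

```python
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "greeting.py")
    with open(path, "w") as fh:
        fh.write("MESSAGE = 'hello'\n")

    # The spec records the module's name, where it lives, and which loader
    # will be used to load it.
    spec = importlib.util.spec_from_file_location("greeting", path)

    # Create an empty module from the spec, then let the loader fill it in.
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

print(spec.name)                    # greeting
print(type(spec.loader).__name__)   # SourceFileLoader
print(module.MESSAGE)               # hello
print(module.__spec__ is spec)      # True
```

Note that this does not register the module in sys.modules; it only shows the spec and loader working together.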

Unfortunately for my needs, ModuleSpec doesn't address several critical issues that we care about for the Heterogeneous Python project. It doesn't really address the disconnect between Finders, Loaders, and the navigation of archives; Finders and Loaders are still very much related to each other with respect to the way a resource is identified and incorporated into the Python running instance.

Typical import tutorials focus on one of two different issues: loading Python source out of alternative resource types (like databases or websites), or loading alternative source code that cannot ever be confused with or treated as Python source. An example of the latter would be to have a path hook early in sys.path_hooks that says, "That path there belongs to me, and it contains CSV files, and when you import from it, the end result is an array of processed CSV rows." Putting it before all other path hooks prevents Python from Finding inside that path and rejecting its contents for not having any .py files.
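The CSV case can be sketched in a few dozen lines. Everything named here (CsvLoader, CsvFinder, make_csv_hook, the csvdemo_prices file) is hypothetical and exists only for this illustration; it claims a single directory and exposes the parsed rows as a rows attribute on the module.

```python
import csv
import importlib
import importlib.abc
import importlib.util
import os
import sys
import tempfile


class CsvLoader(importlib.abc.Loader):
    """Hypothetical loader: the 'module' is just the parsed rows of a CSV file."""

    def __init__(self, path):
        self.path = path

    def create_module(self, spec):
        return None  # let the import system create an ordinary module object

    def exec_module(self, module):
        with open(self.path, newline="") as fh:
            module.rows = list(csv.reader(fh))


class CsvFinder(importlib.abc.PathEntryFinder):
    """Hypothetical finder for one directory that holds only CSV files."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, fullname, target=None):
        path = os.path.join(self.directory, fullname.rpartition(".")[2] + ".csv")
        if not os.path.isfile(path):
            return None
        return importlib.util.spec_from_file_location(
            fullname, path, loader=CsvLoader(path))


def make_csv_hook(owned_directory):
    """Build a path hook that claims exactly one directory and no other."""
    owned = os.path.realpath(owned_directory)

    def hook(path_entry):
        if os.path.realpath(path_entry) != owned:
            raise ImportError("not a CSV directory")  # let later hooks try
        return CsvFinder(owned)

    return hook


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "csvdemo_prices.csv"), "w", newline="") as fh:
        csv.writer(fh).writerows([["item", "price"], ["tea", "3"]])

    csv_hook = make_csv_hook(tmp)
    sys.path_hooks.insert(0, csv_hook)   # ahead of Python's own hooks
    sys.path.insert(0, tmp)
    try:
        prices = importlib.import_module("csvdemo_prices")
    finally:
        # Undo the global changes so the demo leaves no trace.
        sys.path.remove(tmp)
        sys.path_hooks.remove(csv_hook)
        sys.path_importer_cache.pop(tmp, None)

print(prices.rows)  # [['item', 'price'], ['tea', '3']]
```

Because the hook raises ImportError for every path entry it does not own, the rest of sys.path is still handled by Python's normal finders.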

Our goals are different. A directory in sys.path should be able to hold a mix of code: CSV files, Hy (Lisp) files, regular Python files, and byte-compiled Python files. The loader/finder pair should be able to understand and interpret all of them correctly.

To do that, the loader has to be able to find the right compiler at load time. But there's a problem: Python 2 hard-codes what suffixes (filename extensions) it recognizes and compiles as Python in the imp builtin module; in Python 3 these suffixes are constants defined in a private section of importlib; in either case, they are unavailable for modification. This lack of access to the extensions list prevents the discovery of heterogeneous source code packages.

We have to get in front of Python's native handlers and supply our own Finder, one that recognizes all of our code-like suffixes and provides a source code loader that applies our compilers to our own suffixes and falls back on Python's native loader behavior when it encounters native suffixes.
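One way to sketch that idea on Python 3 is to build a single importlib.machinery.FileFinder path hook that lists the native loaders alongside a custom one. This is an illustration under assumptions, not Polyloader's actual implementation: the .shout suffix, compile_shout, and ShoutLoader are all invented here as a stand-in for a real alternative-language compiler.

```python
import importlib
import importlib.machinery as machinery
import os
import sys
import tempfile


def compile_shout(text):
    """Toy 'compiler' (hypothetical): 'name some words' -> NAME = 'SOME WORDS'."""
    lines = []
    for line in text.splitlines():
        if line.strip():
            name, _, value = line.strip().partition(" ")
            lines.append("{} = {!r}".format(name.upper(), value.strip().upper()))
    return "\n".join(lines) + "\n"


class ShoutLoader(machinery.SourceFileLoader):
    """Reuses the stock file loader but swaps in our own compiler."""

    def source_to_code(self, data, path, *, _optimize=-1):
        python_source = compile_shout(data.decode("utf-8"))
        return compile(python_source, path, "exec", dont_inherit=True)


# One FileFinder hook that knows the native suffixes and ours.
mixed_hook = machinery.FileFinder.path_hook(
    (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
    (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
    (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
    (ShoutLoader, [".shout"]),
)

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "mixdemo_shout.shout"), "w") as fh:
        fh.write("greeting hello world\n")
    with open(os.path.join(tmp, "mixdemo_plain.py"), "w") as fh:
        fh.write("VALUE = 1\n")

    sys.path_hooks.insert(0, mixed_hook)   # get in front of the native hook
    sys.path.insert(0, tmp)
    try:
        shout_mod = importlib.import_module("mixdemo_shout")
        plain_mod = importlib.import_module("mixdemo_plain")
    finally:
        # Undo the global changes so the demo leaves no trace.
        sys.path.remove(tmp)
        sys.path_hooks.remove(mixed_hook)
        sys.path_importer_cache.pop(tmp, None)

print(shout_mod.GREETING)                   # HELLO WORLD
print(plain_mod.VALUE)                      # 1
print(type(shout_mod.__loader__).__name__)  # ShoutLoader
print(type(plain_mod.__loader__).__name__)  # SourceFileLoader
```

In a real installation, directories that Python has already scanned keep their original finders in sys.path_importer_cache, so that cache would also need to be cleared after inserting the hook for the new suffixes to be noticed there.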

I can now announce that Polyloader accomplishes this. After you import polyloader, you call polyloader.install(compiler, [extensions]) with the extensions that the compiler can handle, and it... works.

It works well with Hy. And it works performantly and without breakage on a modern Django application, allowing you to write Django models, views, urls, management commands, even manage.hy and settings.hy, in Hy.

There are three more posts in this series: Python Package Iterators, the resource-vs-compiler problem, and a really crazy idea that may break Python-- or may finally get around all the other code that hard-codes ".py" problematically (I'm looking at you, django.db.migrations.loader, and you, modulefinder).