17Jul

The Semantics of Python Import, part 4: Iterators

Posted by Elf Sternberg as Uncategorized

Module Iterators, as defined in pkgutil.py, aren’t really part of the mess that has been imposed on us by PEP-302 and its follow-on attempts to rationalize the loading process, but they’re used by so many different libraries that when we talk about creating a new general class of importers, we have to talk about iterators.

Iterators, after all, are why I started down this project in the first place. It was Django’s inability to find heterogeneously defined modules that I set out to fix.

Iterators are defined in the pkgutil module; their entire purpose is, given some kind of reference to an archive, to be able to list the contents of that archive, and to recursively descend into that archive if it happens to be a tree-like structure.

When you call pkgutil.iter_modules(path, prefix), you get back an iterator over all the modules found in the given list of paths or, if no path is supplied, in all the paths in sys.path. As I pointed out in my last post, the paths in sys.path aren’t necessarily paths on the filesystem or, if they are, they’re not necessarily directory paths. All that matters is that for each path, a path_hook exists that can return a Finder, and that there is a way to list the contents of the path that Finder represents.
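A minimal sketch of that call, assuming Python 3 and a throwaway directory standing in for one entry of sys.path. The file names and the "demo." prefix are made up for illustration.

```python
import os
import pkgutil
import tempfile

# Build a temporary directory holding one plain module and one package.
with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "alpha.py"), "w").close()
    os.mkdir(os.path.join(root, "beta"))
    open(os.path.join(root, "beta", "__init__.py"), "w").close()

    # path must be a list of paths, not a bare string.
    # Each item yielded is a (finder, name, ispkg) triple.
    found = sorted(
        (name, ispkg)
        for _finder, name, ispkg in pkgutil.iter_modules([root], prefix="demo.")
    )

print(found)
```

The prefix is simply prepended to every name that comes back; it does not change where pkgutil looks.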

In Python 2, pkgutil depends upon Finders (those things we said were attached to meta_path and path_hooks) to have a special function called iter_modules; if a Finder has one, that function is used to list the contents of the “path”.
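Here is a rough sketch of that protocol, written for Python 3, where a Finder with its own iter_modules method is still picked up. ArchiveFinder, its CONTENTS table, and the "demo://archive" path are all hypothetical; the finder can list modules but makes no attempt to import them.

```python
import pkgutil
import sys


class ArchiveFinder:
    """Hypothetical finder for a made-up 'demo://' resource. It only lists."""

    # Pretend contents of the archive: (module name, is it a package?)
    CONTENTS = {"demo://archive": [("gamma", False), ("delta", True)]}

    def __init__(self, path):
        # A path hook signals "not mine" by raising ImportError.
        if path not in self.CONTENTS:
            raise ImportError("not a demo archive: %r" % (path,))
        self.path = path

    def iter_modules(self, prefix=""):
        # pkgutil expects (name, ispkg) pairs from this method.
        for name, ispkg in self.CONTENTS[self.path]:
            yield prefix + name, ispkg


# The class itself serves as the path hook: call it with a path, get a finder.
sys.path_hooks.insert(0, ArchiveFinder)
try:
    listed = [
        (name, ispkg)
        for _finder, name, ispkg in pkgutil.iter_modules(["demo://archive"])
    ]
finally:
    # Undo the changes to interpreter-wide state.
    sys.path_hooks.remove(ArchiveFinder)
    sys.path_importer_cache.pop("demo://archive", None)

print(listed)
```

Note that the “path” here is not a filesystem path at all; the only requirement is that some path hook accepts it.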

In Python 3, the functools.singledispatch tool is used to differentiate between different Finders; once a Finder has been identified by path_hooks, singledispatch is used to find a corresponding resource iterator for that Finder. It doesn’t necessarily have to be a method on the Finder, although the default implementation falls back to calling the Finder’s own iter_modules method if it has one.
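As a sketch of the dispatch step: in the CPython versions I have looked at, pkgutil.iter_importer_modules is the single-dispatch function in question, so an iterator can be registered for a Finder class that has no iter_modules method of its own. TableFinder and iter_table_modules are hypothetical names, and the registration is global and stays in place for the life of the process.

```python
import pkgutil


class TableFinder:
    """Hypothetical finder that carries a list of names and nothing else."""

    def __init__(self, names):
        self.names = names


def iter_table_modules(finder, prefix=""):
    # A free function, not a method: yields (name, ispkg) pairs.
    for name in finder.names:
        yield prefix + name, False


# Tell pkgutil which iterator to use for TableFinder instances.
pkgutil.iter_importer_modules.register(TableFinder, iter_table_modules)

registered = list(
    pkgutil.iter_importer_modules(TableFinder(["one", "two"]), "pkg.")
)

# An object with no registration and no iter_modules method lists nothing.
unregistered = list(pkgutil.iter_importer_modules(object()))

print(registered, unregistered)
```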

An iterator is pretty straightforward; once you know the “path” (resource identifier) and the Finder for that path, you can call a function that checks for the presence of modules. In the case of FileFinder, that function is a combination of listdir, isfile, and isdir to check for dir/__init__ pairs indicating a submodule.
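The listdir/isfile/isdir combination can be sketched roughly as follows. This is a simplified stand-in, not pkgutil’s actual implementation: list_modules is a hypothetical helper and it only knows about a single suffix.

```python
import os
import tempfile


def list_modules(directory, suffix=".py"):
    """Return (name, ispkg) pairs for modules found directly in directory."""
    results = []
    for entry in sorted(os.listdir(directory)):
        full = os.path.join(directory, entry)
        if os.path.isfile(full) and entry.endswith(suffix):
            name = entry[: -len(suffix)]
            if name != "__init__":
                results.append((name, False))
        elif os.path.isdir(full):
            # A directory only counts if it holds an __init__ file.
            if os.path.isfile(os.path.join(full, "__init__" + suffix)):
                results.append((entry, True))
    return results


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "alpha.py"), "w").close()
    open(os.path.join(root, "notes.txt"), "w").close()
    os.mkdir(os.path.join(root, "beta"))
    open(os.path.join(root, "beta", "__init__.py"), "w").close()
    os.mkdir(os.path.join(root, "gamma"))  # no __init__, so it is skipped

    modules = list_modules(root)

print(modules)
```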

For our purposes, of course, we had to provide a path_hook that eclipses the existing path_hook, and we had to provide a Finder that was more precisely ours than the inherited base FileFinder, so that single dispatch would find ours before it found FileFinder’s and still work correctly.
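A sketch of how those two pieces might fit together, under some loud assumptions: SuffixFinder, iter_suffix_modules, and the ".demo" suffix are invented for illustration and are not the code from this project. The hook goes at the front of sys.path_hooks so it is asked first, and the iterator is registered on the subclass so single dispatch prefers it over the one registered for FileFinder.

```python
import importlib.machinery
import os
import pkgutil
import sys
import tempfile

SUFFIXES = [".py", ".demo"]  # ".demo" is a made-up second suffix


class SuffixFinder(importlib.machinery.FileFinder):
    """Hypothetical FileFinder subclass, so dispatch can tell it apart."""


def iter_suffix_modules(finder, prefix=""):
    # List every file in the finder's directory with one of our suffixes.
    for entry in sorted(os.listdir(finder.path)):
        stem, ext = os.path.splitext(entry)
        if ext in SUFFIXES and stem != "__init__":
            yield prefix + stem, False


# The most specific registered class wins, so this beats FileFinder's entry.
pkgutil.iter_importer_modules.register(SuffixFinder, iter_suffix_modules)

# FileFinder.path_hook is a classmethod, so the hook builds SuffixFinders.
# SourceFileLoader is only a placeholder loader for the extra suffix.
hook = SuffixFinder.path_hook((importlib.machinery.SourceFileLoader, SUFFIXES))

with tempfile.TemporaryDirectory() as root:
    for filename in ("alpha.py", "beta.demo"):
        open(os.path.join(root, filename), "w").close()

    sys.path_hooks.insert(0, hook)  # eclipse the stock hooks
    try:
        names = [name for _f, name, _p in pkgutil.iter_modules([root])]
    finally:
        sys.path_hooks.remove(hook)
        sys.path_importer_cache.pop(root, None)

chosen = pkgutil.iter_importer_modules.dispatch(SuffixFinder)
print(names, chosen.__name__)
```

One caveat with this approach: paths already present in sys.path_importer_cache keep whatever Finder they were given earlier, so a new hook only affects paths that have not been looked up yet.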


There is one other module I have to worry about: modulefinder. It’s not used often, it’s not used by Django or any of the other major tools that I usually use, and it’s never been covered by Python Module of the Week. That doesn’t mean that its hard-coding of the ‘.py’ suffix isn’t problematic. I’m just not sure what to do about it at this point.
