Development log for

Date: 02/03/2026 18:00

This Week

Unfortunately, this week I wasn't able to pick the pypackaging-native/pkgconf-pypi work back up, due to my ADHD shifting my focus to the CPython work. I am happy with what I was able to accomplish there — I think the UX improvements are significant — but since none of that was funded, I have no billable hours for this entire week, which is not sustainable.

I'll link my GitHub sponsors page here, in case anyone wants to make a contribution for this work.

CPython

This week I worked on a couple of improvements to the interpreter initialization.

The initial motivation was to tackle the obscure/unclear "unraisable" errors that can occur while setting up the interpreter. A common example is the No module named 'encodings' error resulting from the standard library being missing from the module search path.

$ PYTHONHOME=nonsense ./python -c 'print("foo!")'
Fatal Python error: Failed to import encodings module
Python runtime state: core initialized
Exception ignored in the internal traceback machinery:
ModuleNotFoundError: No module named 'traceback'
ModuleNotFoundError: No module named 'encodings'

Stack (most recent call first):

It occurs because during the initialization process, the interpreter tries to import the encodings module, required by builtins.open.
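This dependency is easy to observe: by the time any user code runs, the encodings package has already been imported as part of startup.

```python
import sys

# open() resolves text encodings through the 'encodings' package, which is
# why the interpreter imports it during startup — it is present in
# sys.modules before any user code runs.
print('encodings' in sys.modules)  # True
```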

The two main changes I made here were adding a warning when we fail to find the standard library, and freezing the encodings module, resulting in the following:

$ PYTHONHOME=nonsense ./python -c 'print("foo!")'
WARN: Could not find the standard library directory! The Python 'home' directory was set to 'nonsense', is this correct?
foo!

Alongside this warning, I also added the -X pathconfig_warnings CLI option and the PYTHON_PATHCONFIG_WARNINGS environment variable, which allow disabling warnings issued by the module search path initialization.

You can read a more in-depth write-up on the technical details in the dedicated section below.

Finally, also related to the UX during the Python interpreter initialization, I'd like to highlight Steve Dower's PR to remove the hard dependency on the __pyrepl module (python/cpython#145159) from several places. This makes it so that the interpreter automatically falls back to the basic REPL when it can't find the standard library, without needing to set the PYTHON_BASIC_REPL environment variable, which most users don't know about.

$ PYTHONHOME=nonsense ./python
WARN: Could not find the standard library directory! The Python 'home' directory was set to 'nonsense', is this correct?
Python 3.15.0a6+ free-threading build (heads/optional-pyrepl:3b745a340dd, Mar  9 2026, 14:50:41) [GCC 15.2.1 20260209] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/local/lib/python315t.zip', '/home/anubis/git/cpython/nonsense/lib/python3.15t', '/home/anubisgit/cpython/build/lib.linux-x86_64-3.15', '/home/anubis/.local/lib/python3.15t/site-packages']
>>>

After implementing this, and seeing Steve's PR, I realized this pretty much solves the issue around ._pth files being incredibly difficult to debug, so much so that there has been extensive discussion around deprecating them (python/cpython#78125).

With these improvements, if you happen to have a misconfigured ._pth file, instead of the interpreter failing miserably to launch with an obscure error, you can now drop into a REPL and inspect sys.path to understand what went wrong.

Running local builds

When running Python from the build directory, frozen modules (-X frozen_modules/PYTHON_FROZEN_MODULES) are disabled by default. To get the behavior described here, that option needs to be enabled.

The Python module search path initialization

At a certain point during the interpreter initialization, a private "module" called getpath is run to compute the path configuration, finalizing PyConfig. More importantly, it sets module_search_paths, which becomes sys.path.

While we call it a "module", that's not really accurate. It is a piece of Python code that runs with a custom globals/locals dictionary, which is used to pass the input configuration, as well as retrieve the computed output parameters.
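The pattern is simple to illustrate with a toy sketch (this is not CPython's actual getpath code; the variable names and path layout are illustrative): run a piece of Python source against a custom namespace, using that namespace both to pass inputs and to read back the computed outputs.

```python
# Illustrative sketch of the getpath execution pattern: the "module" is a
# code string run with a custom namespace dictionary.
code = """
if home:  # 'home' is an input supplied through the namespace
    module_search_paths = [home + '/lib/python3.15']
else:
    module_search_paths = []
"""

ns = {'home': '/opt/python'}      # input configuration
exec(code, ns)                    # run the "module"
print(ns['module_search_paths'])  # read back the computed output
```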

As documented in the link above, it computes several PyConfig fields, with module_search_paths being the one of most interest to us.

Most folks know of sys.path as the list used by the import machinery to find user modules, but more importantly, it also provides the locations where several key parts of the standard library can be found.

I said several, not all, because sys.path specifies file-system paths where Python modules can be found, and not all of the standard library implementation exists on the file-system — at least not as "regular" modules.

The standard library is composed of three parts:

Built-in modules
These are native modules, which are directly embedded (statically linked) in the interpreter. As such, they do not exist on their own in the file-system.
Extension modules
These are native modules, provided as a loadable file (shared/dynamic library). They exist in the "platform-specific standard library directory", usually referred to as platstdlib_dir.
Pure-Python modules
These are modules written in Python (*.py), and are platform-independent. They usually exist in the "platform-independent standard library directory", usually referred to as stdlib_dir, but in certain scenarios, they can also exist as a ZIP file.

Only the latter two exist on the file-system, so only they are in the purview of sys.path. Built-in modules, on the other hand, require a custom non path-based importer — BuiltinImporter — which exists on sys.meta_path.

Without going into much detail, sys.meta_path exists upstream of sys.path, and defines the base machinery for the import system. It contains a list of importer objects, which know how to find and load modules. During the initialization of the import system, it gets populated with BuiltinImporter, FrozenImporter, and PathFinder. We already discussed BuiltinImporter, leaving only FrozenImporter and PathFinder.
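You can inspect this setup directly from a running interpreter:

```python
import sys

# The default meta-path finders in a stock CPython are BuiltinImporter,
# FrozenImporter, and PathFinder, tried in that order. (Tools like pytest
# may prepend their own hooks to this list.)
names = [getattr(f, '__name__', type(f).__name__) for f in sys.meta_path]
print(names)

# Built-in modules are statically linked into the binary and listed here,
# rather than being found through sys.path:
print('sys' in sys.builtin_module_names)  # True
```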

PathFinder is fairly straightforward: it implements the regular path-based module search that most users are familiar with, meaning it is responsible for searching sys.path. However, it also provides a mechanism where users can register hooks in sys.path_hooks that produce a finder for a given path. This is how support for importing pure-Python modules from ZIP files is implemented (see zipimport).

Per its name, FrozenImporter is responsible for supporting "frozen modules", which is something we have mentioned, but haven't explained yet. In summary, frozen modules are pure-Python modules which have been compiled to bytecode and embedded in the interpreter (as data blobs). This means these modules are available to import even if we can't locate the standard library — hence why freezing the encodings module allows the interpreter to initialize. They are also key to being able to bootstrap the import machinery, which is mostly implemented in pure Python as importlib. They can be disabled via the -X frozen_modules/PYTHON_FROZEN_MODULES options.
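A quick way to see FrozenImporter in action is to look up __hello__, a tiny module that is frozen into every CPython build — no file-system lookup is involved:

```python
from importlib.machinery import FrozenImporter

# FrozenImporter locates modules embedded in the interpreter binary as
# bytecode blobs; their spec reports 'frozen' as the origin.
spec = FrozenImporter.find_spec('__hello__')
print(spec.origin)  # 'frozen'
```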

Coming back to getpath, it needs to locate both the platform-specific and independent parts of the standard library. The search logic is incredibly complicated, but in a normal scenario, it roughly goes as follows:

  1. Find the installation prefixes (sys.prefix and sys.exec_prefix)
    • Manually set via PYTHONHOME or PyConfig.home
    • Set via the home key of pyvenv.cfg, whose presence indicates we are in a virtual environment
    • Derived from the location of libpython, if dynamically linked
    • Derived from the location of the interpreter
    • Falls back to the hardcoded installation prefix calculated at build time
  2. Search for the standard library ZIP file in sys.prefix
  3. Search for the platform-independent standard library directory (stdlib_dir) in sys.prefix
  4. Search for the platform-dependent standard library directory (platstdlib_dir) in sys.exec_prefix
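The prefix-finding step can be sketched as a first-match-wins cascade (a hedged illustration of the order described above; the function and parameter names are mine, not CPython's actual getpath variables):

```python
# Illustrative sketch of the prefix search order; each candidate is tried
# in order, and the first one that is set wins.
def find_prefix(home, venv_home, libpython_dir, executable_dir, build_prefix):
    for candidate in (home, venv_home, libpython_dir, executable_dir):
        if candidate:
            return candidate
    return build_prefix  # hardcoded installation prefix from build time

# A pyvenv.cfg 'home' wins over everything except an explicit PYTHONHOME:
print(find_prefix(None, '/opt/venv', None, '/usr/bin', '/usr/local'))
```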

Following this, with the new changes, we show a warning referring to the "standard library" — omitting the "platform-independent" part for simplicity — if we were unable to locate either the ZIP file or stdlib_dir. If one of these was found, we then check platstdlib_dir, and show a warning referring to the "platform-dependent standard library".

When issuing these warnings, we also try to hint at the possible cause — either a bad home path (e.g. setting PYTHONHOME), or a miscalculated sys.prefix or sys.exec_prefix.

There is a limitation, however: when the user overrides the path calculation functionality — by setting module_search_paths and module_search_paths_set=1 — these warnings do not apply. This is currently the case for sub-interpreters.

There's also a possibility that embedding users might trigger this warning, even though they shouldn't be able to as long as they are using the initialization APIs correctly. Similarly, in some cases, users might set a bad home (e.g. setting PYTHONHOME incorrectly), or might run into an edge case where getpath fails to calculate the prefixes correctly, and instead rely on PYTHONPATH to put the standard library directories in sys.path.

In such cases, -X pathconfig_warnings=0 or PYTHON_PATHCONFIG_WARNINGS=0 can be set to suppress the warning.

Reading /proc/self/exe for a more accurate sys.executable

In addition to the above-mentioned work on getpath, I looked into the possibility of reading the /proc/self/exe symlink on Linux systems to be able to determine the interpreter executable more accurately.

Currently, on POSIX platforms, we either derive sys.executable from PyConfig.program_name, or argv. The latter can be problematic, as exemplified in python/cpython#124241.
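The mechanism itself is straightforward to demonstrate (Linux-only; the fallback branch below is just for portability of the example):

```python
import os
import sys

# On Linux, /proc/self/exe is a symlink to the process's real executable,
# independent of whatever was passed as argv[0].
if os.path.exists('/proc/self/exe'):
    real = os.readlink('/proc/self/exe')
else:
    real = sys.executable  # fallback for non-Linux platforms
print(real)
```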

I opened python/cpython#145486 to attempt to solve this issue by reading /proc/self/exe. It sets the real_executable input variable of getpath to the real interpreter path, but contrary to its name, it seems we actually rely on being able to set real_executable to a fake path. In fact, macOS, which was already setting real_executable based on _NSGetExecutablePath, has a specific workaround to replace its value with the value of executable.

...

if not executable and SEP in program_name:
    # Resolve partial path program_name against current directory
    executable = abspath(program_name)

if not executable:
    # All platforms default to real_executable if known at this
    # stage. POSIX does not set this value.
    executable = real_executable
elif os_name == 'darwin':
    # QUIRK: On macOS we may know the real executable path, but
    # if our caller has lied to us about it (e.g. most of
    # test_embed), we need to use their path in order to detect
    # whether we are in a build tree. This is true even if the
    # executable path was provided in the config.
    real_executable = executable

...

The getpath code is extremely complicated and fragile, so solving this is not trivial. Even after working on it for a couple of years, it's still difficult to understand all the quirks and custom behaviors that downstream users are relying on. Aggravating this, there is a lot of behavior the tests rely on that is not representative of real-world use — the tests rely on it to make their implementation easier.

There's a lot of technical debt, and we are stuck with a lot of decisions resulting from a lack of foresight. I am not blaming anyone here — this is just extremely old code in a very big project, so this is completely normal.

I spent a fairly large amount of time this week looking at this code, and debugging under a couple of different scenarios, in order to better understand what it was doing, and which parts of it were actually required. This was motivated by the issue we are discussing, but it wasn't restricted only to the apparently affected code areas.

It would also make sense to point out that this code supports very different scenarios. The main one is the regular interpreter initialization — meaning the process the user spawned was the actual interpreter (e.g. /usr/bin/python). Within the context of this discussion, this scenario isn't all that complex. Where it starts becoming complex is the other scenario: embedding.

The issues caused by my change seem to mainly have to do with embedding applications, which set PyConfig.program_name and expect real_executable to be derived from it. This is because real_executable is used to determine the installation prefixes, and then to locate the standard library directories.

I still need to look a bit more into the actual needs of embedding users, so that I can better understand if there should be a different field, in addition to program_name, to specify the actual Python interpreter binary.

The more apparent solution would be to have embedding users set PyConfig.home, which should work if they're either working within an existing Python installation, or if they are providing the installation themselves. However, setting home bypasses the virtual environment detection. This is something we could change — we could try looking for pyvenv.cfg in the specified home — however it is not clear to me that this is the best solution.

I think the more short-term fix would be to set real_executable to program_name if it was provided by the user. This should preserve the current behavior of searching for the installation prefixes in the path the embedding users provided. When running the Python interpreter itself, program_name wouldn't be set, so we would use the real real_executable value.
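The proposed fallback amounts to a one-line decision, sketched here for clarity (illustrative only — the function name is mine, and the real logic would live in getpath, not in a helper like this):

```python
# Hedged sketch of the proposed short-term fix: embedders that set
# program_name keep the current behavior of searching for installation
# prefixes relative to it; the plain interpreter leaves program_name
# unset and uses the path detected from /proc/self/exe.
def resolve_real_executable(program_name, detected_real_executable):
    return program_name if program_name else detected_real_executable

print(resolve_real_executable('/opt/app/bin/app', '/usr/bin/python3'))
print(resolve_real_executable(None, '/usr/bin/python3'))
```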

That said, that needs proper testing, and since I am not getting paid for this work, I think I'll probably have to postpone it — that is assuming my ADHD doesn't pull me back into it 😟.

Next Week

Well, this will be basically the same as last week, minus the CPython work.

While waiting for a review on the pkgconf-pypi work, I'd like to start on the following:

  • Document the technical aspects of FFY00/dynamic-library
  • Add text to PEP 739 specifying what happens on conflicts

Some other lower priority work I'd like to pick back up the next couple weeks: