perf: avoid repeated scan of entire venv via packages_distributions() at import time#16579
bonauer-pf wants to merge 1 commit into googleapis:main
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Code Review
This pull request optimizes the _get_pypi_package_name function by replacing the slow packages_distributions() scan with targeted metadata.distribution() lookups. It also introduces lazy resolution for package labels to improve performance on supported Python versions. However, the top-level import of importlib.metadata breaks compatibility with Python 3.7. Feedback was provided to handle this as an optional import and to ensure _get_pypi_package_name handles cases where the metadata module is unavailable.
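The review's two suggestions (make the `importlib.metadata` import optional, and resolve a name with targeted `metadata.distribution()` lookups) could look roughly like the sketch below. It is not the PR's code: `get_pypi_package_name` is a hypothetical stand-in for `_get_pypi_package_name`, and the candidate-name heuristic (module path with `.`/`_` turned into `-`) is an assumption made for illustration.

```python
from typing import Optional

try:
    # importlib.metadata is in the standard library from Python 3.8 onward.
    from importlib import metadata as _metadata
except ImportError:  # Python 3.7: treat metadata lookups as unavailable.
    _metadata = None


def get_pypi_package_name(module_name: str) -> Optional[str]:
    """Hypothetical helper: resolve a module name to its distribution name.

    Probes only a few candidate distribution names with targeted
    metadata.distribution() lookups instead of scanning every installed
    distribution. Returns None when nothing matches or when
    importlib.metadata is unavailable.
    """
    if _metadata is None:
        return None
    # Assumption: the distribution name is the module path with "." or "_"
    # replaced by "-", e.g. "google.api_core" -> "google-api-core".
    candidates = (module_name, module_name.replace(".", "-").replace("_", "-"))
    for candidate in candidates:
        try:
            dist = _metadata.distribution(candidate)
        except _metadata.PackageNotFoundError:
            continue
        return dist.metadata["Name"]
    return None
```

Each lookup touches only the metadata of the named distribution, so its cost does not grow with the size of the virtual environment.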
packages/google-api-core/google/api_core/_python_version_support.py: 2 review threads (outdated, resolved)
packages_distributions() scans every installed package in the environment to build a complete module-to-distribution mapping. In large venvs (500+ packages, common with many google-cloud-* libs), this causes multi-second import delays for google.api_core and every library that depends on it.

Two changes:
- Wrap packages_distributions() with functools.cache so the expensive O(n) scan happens at most once per process.
- Defer the package label resolution in check_python_version() so it only runs when a warning is actually emitted, not on the common happy path of a supported Python version.
Updated from e63555e to 6425506.
Thanks for your PR! It looks very sensible. We'll approve once it passes the presubmits.
packages_distributions() scans every installed package in the environment to build a complete module-to-distribution mapping. In large venvs (500+ packages, common with many google-cloud-* libs), this causes multi-second import delays for google.api_core and every library that depends on it.
This PR contains 2 changes:
- Wrap packages_distributions() with functools.cache so the expensive O(n) scan happens at most once per process.
- Defer the package label resolution in check_python_version() so it only runs when a warning is actually emitted, not on the common happy path of a supported Python version.
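One way to express the second change is to pass the label as a zero-argument callable and call it only on the warning branch. The sketch below is illustrative, not the library's code: the signature, the `(3, 9)` minimum, the message text, and the `FutureWarning` category are all assumptions.

```python
import sys
import warnings
from typing import Callable, Tuple


def check_python_version(
    get_package_label: Callable[[], str],
    minimum: Tuple[int, int] = (3, 9),  # hypothetical minimum version
    current: Tuple[int, int] = tuple(sys.version_info[:2]),
) -> bool:
    """Hypothetical sketch: warn about an unsupported Python version.

    The package label is passed as a callable so the potentially expensive
    lookup only runs when a warning is actually emitted.
    """
    if current >= minimum:
        # Happy path: the label resolver is never called.
        return True
    label = get_package_label()
    warnings.warn(
        f"{label} no longer supports Python {current[0]}.{current[1]}; "
        f"please upgrade to Python {minimum[0]}.{minimum[1]} or later.",
        FutureWarning,  # category chosen for illustration
    )
    return False
```

On a supported interpreter the resolver is never invoked, so the import-time cost disappears. On the unsupported path, the functools.cache change still limits the scan to once per process.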
Fixes #15015 and #16552.