# Python extraction

Python extraction happens in two phases:

1. [Setup](#1-Setup-Phase)
    - determining which version to analyze the project as
    - creating a virtual environment (only LGTM.com)
    - determining the Python import path
    - invoking the actual Python extractor
2. [The actual Python extractor](#2-The-actual-Python-extractor)
    - walks files and folders, and performs extraction

The rule for `pack_zip('python-extractor')` in `build` defines what files are included in a distribution and in the CodeQL CLI. After building the CodeQL CLI locally, the files are in `target/intree/codeql/python/tools`.

## Local development

This project uses

- [poetry](https://python-poetry.org/) as the package manager
- [tox](https://tox.wiki/en) together with [pytest](https://docs.pytest.org/en/) to run tests across multiple Python versions

You can install both tools with [`pipx`](https://pypa.github.io/pipx/), like so

```sh
pipx install poetry
pipx inject poetry virtualenv-pyenv # to allow poetry to find python versions from pyenv
pipx install tox
pipx inject tox virtualenv-pyenv # to allow tox to find python versions from pyenv
```

Once you've installed poetry, you can do this:

```sh
# install required packages
$ poetry install

# to run tests against python version used by poetry
$ poetry run pytest

# or
$ poetry shell # activate poetry environment
$ pytest # so now pytest is available

# to run tests against all supported python versions
$ tox

# to run against specific version (Python 3.9)
$ tox -e py39
```

To install multiple Python versions locally, we recommend you use [`pyenv`](https://github.com/pyenv/pyenv).

_(don't try to use `tox run-parallel`, our tests are not set up for this to work 😅)_

### Zip files

Currently we distribute our code in an obfuscated way, by including the code in the subfolders in a zip file that is imported at run-time (by the Python files in the top level of this directory).
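
As a rough illustration of how code inside a zip file can be imported at run-time, here is a minimal sketch using Python's standard zip import support (putting the archive on `sys.path`). The names `import_from_zip`, `demo_pkg` and `demo_extractor.zip` are hypothetical and exist only for this example; the top-level files in this directory may load the real zips differently.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path


def import_from_zip(zip_path, module_name):
    """Put a zip archive on sys.path and import a module from inside it."""
    sys.path.insert(0, str(zip_path))
    # Make sure the import system notices the newly created archive.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build a throwaway archive containing one hypothetical package.
workdir = Path(tempfile.mkdtemp())
archive = workdir / "demo_extractor.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_pkg/__init__.py", "VERSION = '0.0-demo'\n")

# The package is loaded straight from the archive, without unpacking it.
demo_pkg = import_from_zip(archive, "demo_pkg")
print(demo_pkg.VERSION)
print(demo_pkg.__file__)  # the path points inside the zip archive
```
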
The one exception is the `data` directory (used for stubs) which is included directly in the `tools` folder.

The zip creation is managed by [`make_zips.py`](./make_zips.py), and currently we make one zipfile for Python 2 (which is byte compiled), and one for Python 3 (which has source files, but they are stripped of comments and docstrings).

### A note about Python versions

We expect to be able to run our tools (setup phase) with either Python 2 or Python 3, and after determining which version to analyze the code as, we run the extractor with that version. So we must support:

- Setup tools run using Python 2:
    - Extracting code using Python 2
    - Extracting code using Python 3
- Setup tools run using Python 3:
    - Extracting code using Python 2
    - Extracting code using Python 3

# 1. Setup phase

**For extraction with the CodeQL CLI locally** (`codeql database create --language python`)

- Runs [`language-packs/python/tools/autobuild.sh`](/language-packs/python/tools/autobuild.sh) and this script runs [`index.py`](./index.py)

### Overview of control flow for [`setup.py`](./setup.py)

The representation of the code in the figure below has in some cases been altered slightly, but is accurate as of 2020-03-20.

![python extraction overview](./docs/extractor-python-setup.svg)

### Overview of control flow for [`index.py`](./index.py)

The representation of the code in the figure below has in some cases been altered slightly, but is accurate as of 2020-03-20.

![python extraction overview](./docs/extractor-python-index.svg)

# 2. The actual Python extractor

## Overview

The entrypoint of the actual Python extractor is [`python_tracer.py`](./python_tracer.py).

The usual way to invoke the extractor is to pass a directory of Python files to the launcher. The extractor extracts code from those files a