# Python extraction
Python extraction happens in two phases:
1. [Setup](#1-Setup-Phase)
   - determine which version to analyze the project as
   - create a virtual environment (only LGTM.com)
   - determine the Python import path
   - invoke the actual Python extractor
2. [The actual Python extractor](#2-The-actual-Python-extractor)
   - walks files and folders, and performs extraction
The rule for `pack_zip('python-extractor')` in `build` defines what files are included in a distribution and in the CodeQL CLI. After building the CodeQL CLI locally, the files are in `target/intree/codeql/python/tools`.
## Local development
This project uses:
- [poetry](https://python-poetry.org/) as the package manager
- [tox](https://tox.wiki/en) together with [pytest](https://docs.pytest.org/en/) to run tests across multiple versions
You can install both tools with [`pipx`](https://pypa.github.io/pipx/), like so:
```sh
pipx install poetry
pipx inject poetry virtualenv-pyenv # to allow poetry to find python versions from pyenv
pipx install tox
pipx inject tox virtualenv-pyenv # to allow tox to find python versions from pyenv
```
Once you've installed poetry, you can do this:
```sh
# install required packages
$ poetry install
# to run tests against python version used by poetry
$ poetry run pytest
# or
$ poetry shell # activate poetry environment
$ pytest # so now pytest is available
# to run tests against all supported python versions
$ tox
# to run against specific version (Python 3.9)
$ tox -e py39
```
To install multiple Python versions locally, we recommend you use [`pyenv`](https://github.com/pyenv/pyenv).
_(Don't try to use `tox run-parallel`; our tests are not set up for this to work 😅)_
### Zip files
We currently distribute our code in an obfuscated form: the code in the subfolders is packed into a zip file, which the Python files at the top level of this directory import at run-time.
The one exception is the `data` directory (used for stubs), which is included directly in the `tools` folder.
The zip creation is managed by [`make_zips.py`](./make_zips.py). Currently we make one zip file for Python 2 (which is byte-compiled) and one for Python 3 (which has source files, but they are stripped of comments and docstrings).
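The general mechanism of importing code from a zip file at run-time can be sketched as follows. This is a minimal illustration of Python's standard zip import support, not the code the top-level files actually use (the source does not describe how they load the zip). The names `demo.zip`, `demo_pkg`, `build_demo_zip` and `import_from_zip` are hypothetical and do not exist in this repository.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def build_demo_zip(zip_path):
    """Write a tiny package into a zip file (a stand-in for a real code zip)."""
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("demo_pkg/__init__.py", "GREETING = 'hello from the zip'\n")


def import_from_zip(zip_path, module_name):
    """Put the zip on sys.path so the standard import machinery can load from it."""
    if zip_path not in sys.path:
        sys.path.insert(0, zip_path)
    # Make sure the import system notices the newly created archive.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Hypothetical usage: build a throwaway zip and import the package inside it.
tmp_dir = tempfile.mkdtemp()
demo_zip = os.path.join(tmp_dir, "demo.zip")
build_demo_zip(demo_zip)
demo_pkg = import_from_zip(demo_zip, "demo_pkg")
print(demo_pkg.GREETING)
```

Once the archive is on `sys.path`, modules inside it are imported by name like any other module, which is why only a thin layer of top-level Python files needs to sit outside the zip.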
### A note about Python versions
We expect to be able to run our tools (setup phase) with either Python 2 or Python 3, and after determining which version to analyze the code as, we run the extractor with that version. So we must support:
- Setup tools run using Python 2:
  - Extracting code using Python 2
  - Extracting code using Python 3
- Setup tools run using Python 3:
  - Extracting code using Python 2
  - Extracting code using Python 3
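The four combinations above can be enumerated programmatically, for example to drive a test matrix. This is only an illustrative sketch: `SETUP_VERSIONS`, `EXTRACT_VERSIONS` and `supported_combinations` are hypothetical names, not part of this repository.

```python
from itertools import product

# Major Python versions that may run the setup tools (hypothetical constant).
SETUP_VERSIONS = (2, 3)
# Major Python versions the code may be extracted as (hypothetical constant).
EXTRACT_VERSIONS = (2, 3)


def supported_combinations():
    """Return every (setup_version, extract_version) pair that must work."""
    return list(product(SETUP_VERSIONS, EXTRACT_VERSIONS))


for setup_version, extract_version in supported_combinations():
    print("setup: Python %d, extraction: Python %d" % (setup_version, extract_version))
```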
# 1. Setup phase
**For extraction with the CodeQL CLI locally** (`codeql database create --language python`)
- Runs [`language-packs/python/tools/autobuild.sh`](/language-packs/python/tools/autobuild.sh), which in turn runs [`index.py`](./index.py)
### Overview of control flow for [`setup.py`](./setup.py)
The representation of the code in the figure below has in some cases been altered slightly, but is accurate as of 2020-03-20.

### Overview of control flow for [`index.py`](./index.py)
The representation of the code in the figure below has in some cases been altered slightly, but is accurate as of 2020-03-20.

# 2. The actual Python extractor
## Overview
The entrypoint of the actual Python extractor is [`python_tracer.py`](./python_tracer.py).
The usual way to invoke the extractor is to pass a directory of Python files to the launcher. The extractor extracts code from those files a