gh-146073: Add fitness/exit quality mechanism for JIT trace frontend by cocolato · Pull Request #147966 · python/cpython

cocolato · 2026-04-01T13:24:31Z

Background

This PR introduces a preliminary fitness/exit quality mechanism for the JIT trace frontend, enabling the tracer to:

Stop proactively: Stop at the appropriate time without waiting for the buffer to fill up or hitting a dead end
Stop at good locations: Prioritize stopping at the entry points of existing executors (ENTER_EXECUTOR) to avoid stopping on instructions that can be optimized
Control frame depth: Apply a penalty that increases with depth for function call inlining

Issue: Improving trace quality by tracking "fitness" and "exit quality" #146073

cocolato · 2026-04-01T13:26:07Z

Include/internal/pycore_optimizer.h

+/* Default fitness configuration values for trace quality control.
+ * These can be overridden via PYTHON_JIT_FITNESS_* environment variables. */
+#define FITNESS_INITIAL             1000
+#define FITNESS_INITIAL_SIDE         800
+#define FITNESS_PER_INSTRUCTION        2
+#define FITNESS_BRANCH_BIASED          5
+#define FITNESS_BRANCH_UNBIASED       25
+#define FITNESS_BACKWARD_EDGE         80
+#define FITNESS_FRAME_ENTRY           10
+
+/* Default exit quality constants for fitness-based trace termination.
+ * Higher values mean better places to stop the trace.
+ * These can be overridden via PYTHON_JIT_EXIT_QUALITY_* environment variables. */
+#define EXIT_QUALITY_ENTER_EXECUTOR  500
+#define EXIT_QUALITY_DEFAULT         200
+#define EXIT_QUALITY_SPECIALIZABLE    50


I ran some benchmarks with the current configuration:

Performance version: 1.14.0
Python version: 3.15.0a7+ (64-bit) revision 2f9438a25f
Report on Linux-6.6.87.2-microsoft-standard-WSL2-x86_64-with-glibc2.41
Number of logical CPUs: 12
Start date: 2026-04-01 21:04:33.548904
End date: 2026-04-01 21:09:52.782976

+----------------------+---------------+--------------+--------------+------------------------+
| Benchmark            | baseline.json | fitness.json | Change       | Significance           |
+======================+===============+==============+==============+========================+
| chaos                | 44.2 ms       | 45.1 ms      | 1.02x slower | Not significant        |
+----------------------+---------------+--------------+--------------+------------------------+
| deltablue            | 2.14 ms       | 2.23 ms      | 1.04x slower | Significant (t=-9.76)  |
+----------------------+---------------+--------------+--------------+------------------------+
| fannkuch             | 246 ms        | 256 ms       | 1.04x slower | Significant (t=-7.84)  |
+----------------------+---------------+--------------+--------------+------------------------+
| float                | 46.1 ms       | 49.6 ms      | 1.07x slower | Significant (t=-6.55)  |
+----------------------+---------------+--------------+--------------+------------------------+
| go                   | 82.8 ms       | 91.2 ms      | 1.10x slower | Significant (t=-11.45) |
+----------------------+---------------+--------------+--------------+------------------------+
| json_dumps           | 7.59 ms       | 8.32 ms      | 1.10x slower | Significant (t=-19.49) |
+----------------------+---------------+--------------+--------------+------------------------+
| json_loads           | 18.4 us       | 18.9 us      | 1.03x slower | Significant (t=-6.08)  |
+----------------------+---------------+--------------+--------------+------------------------+
| nbody                | 50.2 ms       | 50.7 ms      | 1.01x slower | Not significant        |
+----------------------+---------------+--------------+--------------+------------------------+
| pickle_pure_python   | 279 us        | 262 us       | 1.07x faster | Significant (t=13.59)  |
+----------------------+---------------+--------------+--------------+------------------------+
| pidigits             | 137 ms        | 139 ms       | 1.02x slower | Not significant        |
+----------------------+---------------+--------------+--------------+------------------------+
| pyflate              | 264 ms        | 274 ms       | 1.04x slower | Significant (t=-8.33)  |
+----------------------+---------------+--------------+--------------+------------------------+
| raytrace             | 252 ms        | 211 ms       | 1.19x faster | Significant (t=40.18)  |
+----------------------+---------------+--------------+--------------+------------------------+
| regex_compile        | 102 ms        | 103 ms       | 1.01x slower | Not significant        |
+----------------------+---------------+--------------+--------------+------------------------+
| regex_effbot         | 2.16 ms       | 2.17 ms      | 1.01x slower | Not significant        |
+----------------------+---------------+--------------+--------------+------------------------+
| richards             | 16.0 ms       | 16.0 ms      | 1.00x faster | Not significant        |
+----------------------+---------------+--------------+--------------+------------------------+
| spectral_norm        | 60.8 ms       | 57.3 ms      | 1.06x faster | Significant (t=2.43)   |
+----------------------+---------------+--------------+--------------+------------------------+
| telco                | 6.02 ms       | 5.80 ms      | 1.04x faster | Significant (t=5.54)   |
+----------------------+---------------+--------------+--------------+------------------------+
| unpickle_pure_python | 177 us        | 170 us       | 1.04x faster | Significant (t=5.28)   |
+----------------------+---------------+--------------+--------------+------------------------+
| xml_etree_generate   | 78.5 ms       | 74.4 ms      | 1.06x faster | Significant (t=5.83)   |
+----------------------+---------------+--------------+--------------+------------------------+
| xml_etree_iterparse  | 75.6 ms       | 71.5 ms      | 1.06x faster | Significant (t=4.14)   |
+----------------------+---------------+--------------+--------------+------------------------+
| xml_etree_parse      | 134 ms        | 120 ms       | 1.12x faster | Significant (t=6.42)   |
+----------------------+---------------+--------------+--------------+------------------------+
| xml_etree_process    | 48.7 ms       | 49.4 ms      | 1.02x slower | Not significant        |
+----------------------+---------------+--------------+--------------+------------------------+

Fidget-Spinner

Great work. Thanks for doing this!

Fidget-Spinner · 2026-04-01T14:08:40Z

I also forgot to mention that we should adjust for the following:

Penalize frame depth underflow more so than normal frame push/pop.
Stop tracing when we hit MAX_ABSTRACT_FRAME_DEPTH, as we can't optimize it anyways.

markshannon

This looks good, and thanks for doing this.

Overall, the approach looks good.

I originally had in mind that the fitness would reduce geometrically, not arithmetically, but your arithmetic approach looks easier to reason about and at least as good in terms of trace quality.
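
To make the geometric versus arithmetic distinction concrete, here is a small sketch. Both helpers are hypothetical, and the 9/10 decay ratio is an arbitrary assumption; only the starting fitness of 1000 and the penalty of 25 come from the defaults in the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic decay: subtract a fixed penalty at every step. */
static int32_t
fitness_arithmetic(int32_t fitness, int32_t penalty, int steps)
{
    for (int i = 0; i < steps; i++) {
        fitness -= penalty;
    }
    return fitness;
}

/* Geometric decay: scale by num/den at every step (integer math,
 * rounding toward zero). The ratio is an assumed example value. */
static int32_t
fitness_geometric(int32_t fitness, int32_t num, int32_t den, int steps)
{
    assert(den > 0);
    for (int i = 0; i < steps; i++) {
        fitness = (int32_t)(((int64_t)fitness * num) / den);
    }
    return fitness;
}
```

Starting from 1000, four unbiased branches at 25 each leave 900 under the arithmetic rule, while three steps at 9/10 leave 729 under the geometric one. With the arithmetic rule, the number of steps needed to reach a given threshold can be computed directly from the penalty.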

We need to reduce the number of configurable parameters to just one or two.
Then we can make sure that fitness and penalties are set such that we are guaranteed not to overflow the trace or optimizer buffers.

I'm not sure that rewinding is worth it. As long as "good" exits have a much higher score than "bad" exits, then we should (almost) always end up at a good exit.

markshannon · 2026-04-01T18:45:47Z

Python/optimizer.c

+    // Check if fitness is depleted — should we stop the trace?
+    if (ts->fitness < eq &&
+        !(progress_needed && uop_buffer_length(trace) < CODE_SIZE_NO_PROGRESS)) {
+        // Prefer stopping at the best recorded exit point


Just stop here. Backing up could give us a better exit, but it might give us a worse trace. And it is more complex to implement.

markshannon · 2026-04-01T18:46:35Z

Python/optimizer.c

+        else {
+            // No valid best exit — stop at current position
+            ADD_TO_TRACE(_EXIT_TRACE, 0, 0, target);
+            uop_buffer_last(trace)->operand1 = true; // is_control_flow


This doesn't count as control flow. It is a terminator, not a branch.

markshannon · 2026-04-01T18:48:13Z

Python/optimizer.c

    trace->end -= needs_guard_ip;

    int space_needed = expansion->nuops + needs_guard_ip + 2 + (!OPCODE_HAS_NO_SAVE_IP(opcode));
    if (uop_buffer_remaining_space(trace) < space_needed) {


Suggested change

assert (uop_buffer_remaining_space(trace) > space_needed);

If we choose the fitness and exit values correctly, we can't run out of space.
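
One way to check that claim is to bound the trace length from the parameters. This is a sketch under stated assumptions: every traced instruction costs at least `min_penalty` fitness, tracing stops once fitness falls below the smallest exit quality, and each instruction expands to at most `max_uops_per_instruction` uops. Both helper names are hypothetical, and the figure of 12 uops per instruction and the buffer capacities in the usage note are invented for illustration, not taken from CPython.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound on the number of instructions a trace can contain.
 * Instruction n+1 is traced only while
 *   initial - n * min_penalty >= min_exit_quality,
 * so n can be at most (initial - min_exit_quality) / min_penalty. */
static int32_t
max_traced_instructions(int32_t initial, int32_t min_penalty,
                        int32_t min_exit_quality)
{
    assert(min_penalty > 0);
    assert(initial >= min_exit_quality);
    return (initial - min_exit_quality) / min_penalty + 1;
}

/* Nonzero if the worst-case uop count fits in a buffer of the given
 * capacity (all inputs are assumptions supplied by the caller). */
static int
fits_in_buffer(int32_t initial, int32_t min_penalty,
               int32_t min_exit_quality,
               int32_t max_uops_per_instruction,
               int32_t buffer_capacity)
{
    int64_t worst = (int64_t)max_traced_instructions(
        initial, min_penalty, min_exit_quality) * max_uops_per_instruction;
    return worst <= buffer_capacity;
}
```

With the defaults from the diff (initial fitness 1000, per-instruction cost 2, lowest exit quality 50), the bound is 476 instructions. At an assumed 12 uops per instruction that is 5712 uops, which would fit a hypothetical 6000-entry buffer but not a 5000-entry one.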

markshannon · 2026-04-01T18:53:39Z

Python/optimizer.c

+                    _PyJitTracerTranslatorState *ts_depth = &tracer->translator_state;
+                    if (ts_depth->frame_depth <= 0) {
+                        // Underflow
+                        ts_depth->fitness -= (int32_t)tstate->interp->opt_config.fitness_frame_entry * 2;


I think this is fundamentally different to making a call, so should have its own distinct (and probably larger) penalty.

markshannon · 2026-04-01T18:53:44Z

Python/optimizer.c

+                        // Underflow
+                        ts_depth->fitness -= (int32_t)tstate->interp->opt_config.fitness_frame_entry * 2;
+                    }
+                    ts_depth->frame_depth = ts_depth->frame_depth <= 0 ? 0 : ts_depth->frame_depth - 1;


Suggested change

-ts_depth->frame_depth = ts_depth->frame_depth <= 0 ? 0 : ts_depth->frame_depth - 1;
+ts_depth->frame_depth = ts_depth->frame_depth <= 0 ? 0 : ts_depth->frame_depth - 1;
+ts_depth->fitness -= penalty;

We should increase the fitness on returning: we want to inline calls to small functions, so we shouldn't penalize it.

bedevere-app · 2026-04-01T19:04:39Z

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

cocolato and others added 2 commits April 1, 2026 00:57

add fitness && exit quality mechanism

1bfa176

Rewrite the code structure

2f9438a

cocolato requested review from FFY00, Fidget-Spinner, ZeroIntensity, ericsnowcurrently and markshannon as code owners April 1, 2026 13:24

bedevere-app bot added the awaiting review label Apr 1, 2026

cocolato added the skip news label Apr 1, 2026

Fidget-Spinner reviewed Apr 1, 2026

address review

709c0a1

markshannon requested changes Apr 1, 2026

bedevere-app bot added awaiting changes and removed awaiting review labels Apr 1, 2026

address many reviews

ef6ac24
