
gh-127958: Trace from RESUME in the JIT #145905

Open
Fidget-Spinner wants to merge 42 commits into python:main from Fidget-Spinner:resume_tracing

Conversation

@Fidget-Spinner
Member

@Fidget-Spinner Fidget-Spinner commented Mar 13, 2026

This adds a CACHE entry to RESUME and a RESUME_CHECK_JIT opcode.

Performance numbers are rather promising:

@markshannon
Member

Do you know why the comprehensions benchmark is so much slower, and do you have a link to the x86 results?

The code looks good, but I'm a bit concerned about our increasingly ad-hoc approach to when to stop tracing.
Could you factor out the decision about whether to continue tracing when we hit ENTER_EXECUTOR into a helper function to make it easier to fix up later?
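A minimal Python sketch of the requested refactor: the stop/continue policy lives in one named helper instead of being spread through the projection loop. All the names here (should_continue_tracing, TraceState and its fields) are hypothetical illustrations, not the real C code, which lives in the interpreter:

```python
from dataclasses import dataclass

@dataclass
class TraceState:
    # Hypothetical bookkeeping for an in-progress trace.
    length: int                 # uops emitted so far
    max_length: int             # hard cap on trace length
    hit_enter_executor: bool    # did we just reach ENTER_EXECUTOR?
    executor_valid: bool        # is that executor still valid?

def should_continue_tracing(state: TraceState) -> bool:
    """One place that encodes the whole stop-tracing decision."""
    if state.length >= state.max_length:
        return False
    # The ENTER_EXECUTOR policy is centralized here, so it is easy
    # to revisit later without hunting through the projection loop.
    if state.hit_enter_executor and not state.executor_valid:
        return False
    return True
```

The projection loop then reduces to a single `if not should_continue_tracing(state): break`, and changing the ENTER_EXECUTOR policy later means touching one function.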

@Fidget-Spinner
Member Author

Do you know why the comprehensions benchmark is so much slower, and do you have a link to the x86 results?

Updated the original description with the x86 results.
The comprehensions benchmark is slower because RESUME_CHECK_JIT is that much slower than RESUME_CHECK, last I checked.

The code looks good, but I'm a bit concerned about our increasingly ad-hoc approach to when to stop tracing. Could you factor out the decision about whether to continue tracing when we hit ENTER_EXECUTOR into a helper function to make it easier to fix up later?

Ok

Member

@markshannon markshannon left a comment


The comprehensions benchmark is so much slower due to RESUME_CHECK_JIT being that much slower than RESUME_CHECK last I checked.

I think that will need to be addressed before we can merge this. I've one suggestion that might help.

continue
executors.append(exec)
seen.add(id(exec))
# Add the side exit executors as well.
Member


I'm a bit surprised this doesn't break other tests.
Maybe change the signature of this function to get_all_executors(func, side_exits=False) and only get the side exits for the tests that need them?
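The suggested signature could look like the sketch below. FakeExecutor and its side_exit_executors attribute are made up for illustration; the point is only the default-off keyword pattern, which keeps existing callers on the old behaviour while letting specific tests opt in:

```python
class FakeExecutor:
    """Stand-in for a JIT executor, purely for illustration."""
    def __init__(self, name, side_exits=()):
        self.name = name
        self.side_exit_executors = list(side_exits)

def get_all_executors(executors_of_func, side_exits=False):
    """Collect executors, deduplicating by identity.

    Side-exit executors are only walked when side_exits=True,
    so the default behaviour matches the old function.
    """
    out, seen = [], set()
    for ex in executors_of_func:
        if id(ex) in seen:
            continue
        out.append(ex)
        seen.add(id(ex))
        if side_exits:
            for side in ex.side_exit_executors:
                if id(side) not in seen:
                    out.append(side)
                    seen.add(id(side))
    return out

side = FakeExecutor("side")
top = FakeExecutor("top", side_exits=[side])
print(len(get_all_executors([top])))                   # 1
print(len(get_all_executors([top], side_exits=True)))  # 2
```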

return testfunc(x-1)

sys.setrecursionlimit(TIER2_RESUME_THRESHOLD * 2)
testfunc(TIER2_RESUME_THRESHOLD)
Member


Suggested change
testfunc(TIER2_RESUME_THRESHOLD)
for _ in range((TIER2_RESUME_THRESHOLD + 99) // 100):
    testfunc(101)

Recursing too deep might have some odd interactions with stack chunking, causing a stack chunk overflow to deopt at the wrong point.

self.assertEqual(opnames[:load_attr_top].count("_GUARD_TYPE_VERSION"), 1)
self.assertEqual(opnames[call:load_attr_bottom].count("_CHECK_VALIDITY"), 2)
if ex is not None:
opnames = list(iter_opnames(ex))
Member


Why is the assert no longer valid?

start_line, end_line, start_col, end_col = pos
if i == 0 and start_col == end_col == 0:
# ignore the RESUME in the beginning
if i == 0 or i == 1:
Member


Suggested change
if i == 0 or i == 1:
if i <= 1:


long resume_threshold = interp->opt_config.resume_initial_value + 2;
if (PyModule_Add(module, "TIER2_RESUME_THRESHOLD",
// + 1 more due to one loop spent on tracing.
Member


The +2 above and this comment don't match up

  PAUSE_ADAPTIVE_COUNTER(this_instr[1].counter);
  }
- DISPATCH_GOTO();
+ DISPATCH_GOTO_NON_TRACING();
Member


Why make this change? They should be equivalent at this point, but DISPATCH_GOTO is generally faster.

  #if _Py_TIER2
  // 0 for success, -1 for error.
- static int
+ static Py_NO_INLINE int
Member


Why?


// We want to trace over RESUME traces. Otherwise, functions with lots of RESUME
// end up with many fragmented traces which perform badly.
// See for example, the richards benchmark in pyperformance.
Member


I don't think this comment belongs here.
This function is for translating a single bytecode; it is not concerned with trace formation

void
_Py_Specialize_Resume(_Py_CODEUNIT *instr, PyThreadState *tstate)
{
if (tstate->tracing == 0 && instr->op.code == RESUME) {
Member


Why are we excluding (sys.monitoring) tracing functions? Those are likely to be hot.

if (!IS_JIT_TRACING() && backoff_counter_triggers(counter) &&
this_instr->op.code == JUMP_BACKWARD_JIT &&
if (!IS_JIT_TRACING() &&
(backoff_counter_triggers(counter) &&
Member


!IS_JIT_TRACING() is mostly true and backoff_counter_triggers(counter) is mostly false, so this would be faster as backoff_counter_triggers(counter) && !IS_JIT_TRACING() && ...
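The point about operand order is plain short-circuit evaluation: putting the cheap, mostly-false test first skips the remaining predicates on the common path. A Python illustration with stand-in predicates (the names mirror the C identifiers, but the bodies and counts are made up):

```python
checks = []

def backoff_triggers():
    # Stand-in for backoff_counter_triggers(counter): mostly False.
    checks.append("backoff")
    return False

def not_tracing():
    # Stand-in for !IS_JIT_TRACING(): mostly True.
    checks.append("tracing")
    return True

# Ordering 1: mostly-true test first -- both predicates run each time.
for _ in range(100):
    if not_tracing() and backoff_triggers():
        pass
n_order1 = len(checks)

# Ordering 2: mostly-false test first -- short-circuiting skips the rest.
checks.clear()
for _ in range(100):
    if backoff_triggers() and not_tracing():
        pass
n_order2 = len(checks)

print(n_order1, n_order2)  # 200 100
```

With realistic probabilities the saving is the cost of every predicate after the first on nearly every dispatch, which is why the review asks for backoff_counter_triggers(counter) to lead the chain.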

@bedevere-app

bedevere-app bot commented Mar 13, 2026

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

