Skip to content

Conversation

@ShaharNaveh
Copy link
Collaborator

@ShaharNaveh ShaharNaveh commented Jan 3, 2026

Summary by CodeRabbit

  • Refactor

    • Internal bytecode instruction names and encodings were reorganized for clearer, strongly-discriminated layouts; this improves consistency in disassembly, error messages, and runtime handling without changing observable behavior.
  • Chores

    • Opcode metadata generation was automated from source to keep tooling and printed opcode names synchronized.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 3, 2026

📝 Walkthrough

Walkthrough

Refactors the bytecode Instruction enum to use explicit numeric discriminants, renames and reshapes many opcodes (including argument-bearing variants), and propagates those renames across compiler, codegen, VM, JIT, stdlib metadata, tests, and the opcode-generation script.

Changes

Cohort / File(s) Summary
Bytecode enum
crates/compiler-core/src/bytecode.rs
Introduced explicit numeric discriminants and new/renamed variants (e.g., BinaryOp { op }, Call, CallKw, RaiseVarargs, LoadName, StoreName, BinarySubscr, DeleteSubscr, PopExcept, SetupAnnotations, Jump { target }, etc.). Adjusted TryFrom, display/serialization, label_arg, and stack-effect logic to match the new discriminants and shapes.
Code generation
crates/codegen/src/compile.rs
Replaced emission sites to use renamed instructions (Call/CallKw, LoadName, StoreName, StoreSubscr/DeleteSubscr, BinarySubscr, RaiseVarargs, CompareOp, PopExcept, SetupAnnotations, etc.). Many call/raise/name handling sites updated.
VM / Frame execution
crates/vm/src/frame.rs
Updated interpreter match arms to new variants (Call/CallKw, LoadName, StoreName, BinarySubscr, DeleteSubscr, RaiseVarargs, PopExcept, CompareOp, SetupAnnotations, etc.) and consolidated unreachable/default handling for removed placeholders.
JIT & runtime helpers
crates/jit/src/instructions.rs, crates/jit/tests/common.rs
Adjusted internal dispatch and tests to new variant names (Call, CompareOp, LoadName, StoreName), preserving argument shapes where applicable.
Stdlib opcode utilities
crates/stdlib/src/opcode.rs
Updated Opcode::has_name/has_jump checks to reflect renames (LoadName, StoreName, DeleteName) and removed some jump variants from the jump set.
Opcode metadata generator
scripts/generate_opcode_metadata.py
Rewrote generator to parse the Rust Instruction enum directly, derive opcode mapping dynamically, add Opcode NamedTuple with cpython_name, and emit Lib/_opcode_metadata.py instead of relying on hard-coded maps.
Tests / misc
crates/jit/tests/*, various call sites across crates
Propagated many variant renames in tests and call sites; ensure alignment with new discriminants and argument names.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • youknowone
  • arihant2math

Poem

🐇 Hop, hop — bytes align in rows,
Names and numbers find their pose,
Calls and raises, tidy, neat,
Opcodes march with steady feet,
A rabbit cheers: the bytecode grows! 🎉

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Assign opcode ids' directly and clearly describes the main objective of the changeset: assigning explicit discriminant values to instruction opcodes throughout the codebase.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/compiler-core/src/bytecode.rs (2)

592-871: Opcode discriminant assignments and TryFrom mapping look sound

The explicit repr(u8) values and the whitelist in Instruction::try_from are internally consistent: every value admitted by the match corresponds to a real variant (including the 222–255 range), so the mem::transmute::<u8, Instruction> call is sound and there are no “hole” values that would cause UB. From a maintainability perspective, if you expect to churn opcodes often, you might eventually consider an explicit match value { ... => Ok(Variant), _ => Err(..) } mapping instead of a broad range + transmute, but the current approach is fine correctness‑wise.

Also applies to: 875-915


656-662: Align CheckExcMatch documentation with its current behavior

The doc comment for Instruction::CheckExcMatch still describes it as pushing a boolean result (STACK[-2] vs STACK[-1]), but the variant now carries a Label arg, is included in label_arg, and its stack_effect is -2, matching the “duplicate exc + type, then consume both and keep the original exc” pattern used in compile_try_statement. It would be less confusing for future readers if the comment were updated to describe the jump‑on‑no‑match semantics and the fact that it no longer produces a standalone bool on the stack.

Also applies to: 1643-1656, 1761-1762

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0526b9 and f4b8be8.

📒 Files selected for processing (3)
  • crates/codegen/src/compile.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/frame.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/frame.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/codegen/src/compile.rs
🧬 Code graph analysis (2)
crates/vm/src/frame.rs (2)
crates/vm/src/vm/mod.rs (2)
  • value (675-675)
  • value (677-677)
crates/vm/src/exceptions.rs (2)
  • value (426-426)
  • value (590-590)
crates/compiler-core/src/bytecode.rs (1)
crates/compiler-core/src/marshal.rs (1)
  • try_from (79-111)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run tests under miri
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (19)
crates/vm/src/frame.rs (10)

557-564: LGTM: Instruction rename is consistent.

The rename from BinarySubscript to BinarySubscr aligns with the PR's opcode standardization objective. The implementation logic remains unchanged.


791-803: LGTM: Semantic clarification with DeleteName.

The rename from DeleteLocal to DeleteName better reflects that this instruction deletes by name from the locals mapping (using ass_subscript), distinguishing it from DeleteFast which operates on the fastlocals array by index. This improves code clarity.


804-804: LGTM: Instruction rename is consistent.

The rename from DeleteSubscript to DeleteSubscr aligns with the opcode standardization.


1350-1358: LGTM: Instruction rename is consistent.

The rename from PopException to PopExcept follows the abbreviated naming convention used elsewhere in the bytecode.


1370-1370: LGTM: Instruction rename is consistent.

The rename from Raise to RaiseVarargs better communicates that this instruction handles variable arguments for raise statements (reraise, raise with cause, etc.).


1418-1418: LGTM: Instruction rename is consistent.

The rename from SetupAnnotation (singular) to SetupAnnotations (plural) is more accurate since the instruction sets up the __annotations__ dictionary which holds multiple annotations.


1487-1492: LGTM: Semantic clarification with StoreName.

The rename from StoreLocal to StoreName better reflects that this instruction stores by name in the locals mapping (using ass_subscript), distinguishing it from StoreFast which stores by index in the fastlocals array. This improves code clarity.


1493-1493: LGTM: Instruction rename is consistent.

The rename from StoreSubscript to StoreSubscr aligns with the opcode standardization and abbreviated naming convention.


530-1578: Ensure code passes cargo fmt and cargo clippy before merging.

Per the coding guidelines for Rust files, run cargo fmt to format the code and cargo clippy to check for lints. Fix any warnings introduced by these changes before final submission.


700-722: CheckExcMatch implementation is correct.

The implementation properly validates that exception types inherit from BaseException, handles both single types and tuples, and leverages is_instance which correctly delegates tuple matching to object_recursive_isinstance. The bytecode instruction is correctly emitted during exception handler compilation, and the logic accurately mirrors Python's exception matching behavior.

crates/compiler-core/src/bytecode.rs (2)

1696-1823: Stack effects for new/renamed opcodes match codegen usage

The updated stack_effect arms for BinarySubscr, Subscript, StoreSubscr, DeleteSubscr, StoreName/DeleteName, CheckEgMatch, CheckExcMatch, RaiseVarargs, EndAsyncFor, SetupAnnotations, etc., line up with how they’re emitted in compile.rs:

  • StoreSubscr => -3 and DeleteSubscr => -2 match the [value, container, index] and [container, index] stack shapes produced by compile_subscript, compile_store, and compile_augassign.
  • BinarySubscr => -1 is used where we duplicate the subject and then index into it in pattern helpers.
  • RaiseVarargs’s -(kind.get(arg) as u8 as i32) matches the number of values the compiler pushes for each RaiseKind (0/1/2).
  • CheckEgMatch => 0 (pop 2, push 2) is consistent with the [orig, list, rest, type] -> [orig, list, new_rest, match] comment in compile_try_star_except.
  • CheckExcMatch(_) => -2 matches the CopyItem + exc_type pattern in compile_try_statement (3 stack entries down to the original exc).

I don’t see any inconsistency between these definitions and the codegen sites.


875-915: Confirm bytecode compatibility expectations after changing opcode values

By pinning explicit repr(u8) discriminants and updating Instruction::try_from accordingly, the binary encoding of bytecode has changed. Any previously serialized CodeUnit/CodeObject (e.g., cached .pyc-like artifacts) using the old opcode layout will now either fail to decode (InvalidBytecode) or, if produced by older code that also used transmute, could be misinterpreted if reused across versions. If RustPython ever persists bytecode across versions, it would be good to confirm that:

  • Either the magic/marshal version is bumped so old caches are rejected explicitly, or
  • You guarantee such caches are never reused across a change like this.

Otherwise, you may get hard‑to‑diagnose failures when loading stale compiled artifacts.

crates/codegen/src/compile.rs (7)

445-502: Subscript codegen matches new BinarySubscr/StoreSubscr/DeleteSubscr semantics

The various subscript paths are internally consistent with the new opcodes and their stack effects:

  • compile_subscript:

    • Load: evaluates value, then slice, and calls Subscript, giving [obj, index] -> result as expected.
    • Store: when used from compile_store, we start with the assigned value on the stack; after compiling value and slice (and BuildSlice when optimizing 2‑element slices) we end up with [value, container, sub], which matches StoreSubscr => -3.
    • Delete: compiles value then slice and calls DeleteSubscr, with [container, sub] matching DeleteSubscr => -2.
  • pattern_helper_sequence_subscr duplicates the subject and uses BinarySubscr with [subject, subject, idx] so the helper keeps the original subject while extracting an element, consistent with BinarySubscr => -1.

  • compile_annotated_assign builds [annotation, __annotations__, name] before StoreSubscr, so __annotations__[name] = annotation is encoded correctly.

  • compile_augassign’s subscript case carefully arranges the stack to [result, container, index] before StoreSubscr, again matching the [value, container, sub] convention.

Given the stack_effect definitions in bytecode.rs, all of these sites look correct.

Also applies to: 1801-1827, 3621-3622, 4484-4511, 4588-4648


1180-1302: NameOp::Name migration to LoadNameAny/StoreName/DeleteName is coherent

Routing NameOp::Name through LoadNameAny / StoreName / DeleteName is consistent with how indices are allocated:

  • name() and get_global_name_index() both index into CodeInfo.metadata.names, which is exactly what these opcodes consume.
  • Module/class bodies (compile_program, compile_program_single, compile_class_body) and class metadata setup (__module__, __qualname__, __doc__, __firstlineno__, __type_params__, __classcell__) all use name() and StoreName, so they’re aligned with the bytecode enum.

As long as the VM side now treats LOAD_NAME_ANY/STORE_NAME/DELETE_NAME as the generic name path (mirroring the old local/name opcodes), behavior should be unchanged.

Also applies to: 2959-3042


1501-1518: RaiseVarargs and exception-matching opcodes integrate cleanly in try/except and assert paths

The switch from Raise to RaiseVarargs and from JumpIfNotExcMatch to CheckExcMatch is wired consistently:

  • Stmt::Raise:

    • RaiseKind::Raise is emitted after compiling one expression (stack_effect -1).
    • RaiseKind::RaiseCause is emitted after compiling two expressions (stack_effect -2).
    • Bare raise uses RaiseKind::Reraise (stack_effect 0), matching the absence of new stack values.
  • In assert lowering, the explicit RaiseVarargs { kind: Raise } matches compiling exactly one exception instance.

  • Classic try/except:

    • For typed handlers, the CopyItem + exc_type + CheckExcMatch(next_handler) pattern creates [exc, exc, type] so that CheckExcMatch can consume the duplicate and type and still leave the original exc for the bound name or PopTop.
    • Unhandled cases jump to the final next_handler block, where RaiseVarargs { kind: Reraise } is emitted, so unmatched exceptions are re‑raised.
  • try/except*:

    • Uses CheckEgMatch with the documented [orig, list, rest, type] -> [orig, list, new_rest, match] shape.
    • SetExcInfo is used before running the handler body so a bare raise picks up the right match.
    • The “no more to reraise” path does PopTop (drop the None result) and PopExcept, while the re‑raise path deliberately omits PopExcept and just executes RaiseVarargs { kind: Raise }, leaving block unwinding to the VM.

These all line up with the stack_effect definitions in bytecode.rs; I don’t see a mismatch.

Also applies to: 1586-1591, 2053-2147, 2178-2420


2923-2957: SetupAnnotations usage remains consistent with annotation detection

The rename to Instruction::SetupAnnotations has been plumbed through all relevant sites:

  • Module and REPL compilation (compile_program, compile_program_single) call find_ann and emit SetupAnnotations only when needed.
  • Class bodies (compile_class_body) also call find_ann and emit SetupAnnotations once at the top when annotations are present.

Since compile_annotated_assign is already gated on !ctx.in_func() for runtime annotation storage, this preserves the prior behavior with the new opcode name.

Also applies to: 1034-1053, 1063-1065, 3019-3022


3821-4022: CompareOp/ComparisonOperator wiring for patterns and comparisons looks correct

The migration to ComparisonOperator + Instruction::CompareOp is coherent:

  • Mapping in compile_addcompare:
    • CmpOp::Eq/NotEq/Lt/LtE/Gt/GtE map to Equal, NotEqual, Less, LessOrEqual, Greater, GreaterOrEqual respectively.
  • Structural pattern matching:
    • Mapping patterns use CompareOp { op: GreaterOrEqual } for the len(subject) >= size and len(subject) >= size-1 checks.
    • Sequence patterns use CompareOp { op: Equal } for len(subject) == size and GreaterOrEqual where a *rest is present.
    • Value patterns use CompareOp { op: Equal } for subject == value.
  • Other relational/inclusion ops (In/NotIn, Is/IsNot) delegate to ContainsOp and IsOp with the right Invert flag.

Given CompareOp has stack effect -1 and the surrounding code always leaves the subject underneath the comparison, these changes look correct.

Also applies to: 4132-4209, 4211-4223, 4361-4376


4484-4517: Annotation storage and augmented assignment now use StoreSubscr consistently

Two key sites now rely on StoreSubscr and are consistent with its stack contract:

  • compile_annotated_assign:

    • For simple name: T outside functions, after compile_annotation the stack is [annotation], then LoadNameAny(__annotations__), then the mangled name string, yielding [annotation, __annotations__, name] before StoreSubscr, which matches the [value, container, index] convention.
  • compile_augassign:

    • The subscript case builds [container, slice], duplicates both to get a copy for loading, uses Subscript to obtain the original value, then performs the inplace binary op with the RHS and finally reorders to [result, container, slice] just before StoreSubscr.

Both cases align with the StoreSubscr stack_effect and with how simple subscript stores are compiled via compile_subscript.

Also applies to: 4588-4648


5443-5630: Async for/comprehensions correctly use EndAsyncFor under the new opcode layout

In both explicit async for loops and comprehensions:

  • compile_for emits EndAsyncFor on the async path after the else block, matching EndAsyncFor => -2 (it cleans up the async iterator state).
  • compile_comprehension emits EndAsyncFor when any generator is async, and uses GetAIter/GetANext + Resume { AfterAwait }, consistent with the updated async for bytecodes.

Given the updated numeric assignment for EndAsyncFor and the unchanged stack_effect, the codegen here still looks correct.

Also applies to: 3331-3377

@youknowone
Copy link
Member

#6638 will change instruction a lot. Can this be aligned to that patch?

@ShaharNaveh ShaharNaveh marked this pull request as draft January 6, 2026 08:22
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/compiler-core/src/bytecode.rs (1)

1063-1074: Verify transmute safety with non-contiguous discriminants.

The TryFrom<u8> implementation uses unsafe { core::mem::transmute::<u8, Self>(value) } after checking value <= u8::from(LAST_INSTRUCTION). With explicit discriminants that have gaps (e.g., no opcodes at 0, 3-4, 11, 13, etc.), transmuting these gap values could produce invalid Instruction enum variants, which is undefined behavior.

Consider either:

  1. Filling all gaps with Reserved variants (e.g., Reserved0, Reserved3, Reserved4, etc.)
  2. Using an explicit match instead of transmute to validate each discriminant
🔎 Example of safer implementation using match
 impl TryFrom<u8> for Instruction {
     type Error = MarshalError;
 
     #[inline]
     fn try_from(value: u8) -> Result<Self, MarshalError> {
-        if value <= u8::from(LAST_INSTRUCTION) {
-            Ok(unsafe { core::mem::transmute::<u8, Self>(value) })
-        } else {
-            Err(MarshalError::InvalidBytecode)
-        }
+        // Safe alternative: explicit validation
+        match value {
+            1 => Ok(Self::BeforeAsyncWith),
+            2 => Ok(Self::BeforeWith),
+            5 => Ok(Self::BinarySubscr),
+            // ... other valid opcodes
+            149 => Ok(Self::Resume { arg: Arg::marker() }),
+            _ => Err(MarshalError::InvalidBytecode),
+        }
     }
 }

Note: This requires #[repr(u8)] on the enum and careful maintenance, but is memory-safe.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 153a03a and 5369dab.

📒 Files selected for processing (1)
  • crates/compiler-core/src/bytecode.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/compiler-core/src/bytecode.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (2)
crates/compiler-core/src/bytecode.rs (2)

679-712: Coordinate with PR #6638 before merging.

Based on the PR comments, PR #6638 will significantly change the instruction structure. Since this PR assigns explicit discriminants to the Instruction enum, merging both could create conflicts or require substantial rework.

Recommend coordinating with the author of PR #6638 to determine:

  1. Which PR should merge first
  2. Whether discriminant assignments should be deferred until after #6638
  3. How to handle potential conflicts in the instruction enum structure

679-712: Cache (opcode 0) and Reserved3 (opcode 3) variants appear to be referenced but not defined in the Instruction enum.

The review comment correctly identifies that opcode 0 is missing from the visible enum definition (BeforeAsyncWith starts at 1). However, the code references Cache and Reserved3 in match statements within stack_effect() (line 1846) and the display logic (line 2084), but these variants cannot be located in the enum definition. If not properly defined, this would cause compilation errors.

This discrepancy needs clarification:

  • If Cache and Reserved3 are intentionally omitted per CPython 3.13 spec (which removed CACHE as a separate opcode), they should be removed from match statements.
  • If they should be included, they must be added to the enum with appropriate discriminants (Cache = 0, Reserved3 = 3).

LoadClosure(Arg<NameIdx>) = 253, // CPython uses pseudo-op 258
LoadMethod {
idx: Arg<NameIdx>,
} = 254, // CPython uses pseudo-op 259
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will actually put these ops on pseudo ops once we get them.
Planning to do it for next task.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That, will be great:)
If you already have a local branch with the changes, can you please open a draft PR? I might have some inputs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not yet. I will start one during weekend

} else {
Err(MarshalError::InvalidBytecode)
match value {
1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not the opposite way? listing reserved may be easier. Reserved ops doesn't have args, so using names is also easy like Self::Reserved3 as u8

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. What is the purposed code change?


I created this match arm with a simple script

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid using literal numbers in the code. That's not easy to read, and maintainers must not forget to run the 'simple script'. Using name prevent to run the script.

if (1..=Self::Resume as u8).contains(value) && ![Self::Reserved3 as u8, Self::Reserved17 as u8, Self::Reserved141 as u8...].contains(value) {
    Ok(core::mem::transmute::<u8, Self>(value))
} else {
    Err(InvalidBytecode)
}

This code will automatically error-catachable by breaking build if we change reserved141 to something other. (Though not working the inverse way)

Copy link
Collaborator Author

@ShaharNaveh ShaharNaveh Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we have "holes" in our IDs, this can lead to undefined behavior.

I would love to use num_enum crate, but it doesn't support enum variants that holds a value:/

I do think we should rename the Instruction enum to Opcode, and have an Instruction struct that holds:

struct Instruction {
  opcode: Opcode,
  oparg: Oparg // Oparg enum
}

Like a typed CodeUnit.
But that's out of scope for this PR, and needs a further dissection

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we had that form originally. The commits of that change or comments of the PR might have note that why we moved here.
cc @coolreader18

| Continue { target: l }
| Send { target: l }
// Dummy jump instructions (placeholders)
| JumpBackward { target: l }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to implement these to replace our own old ones, I hesitate to remove it now to recover it back later.
We will not implement any other instruction in this slot anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My philosophy is that we can always add new code, but to have a placeholder code (that can be deleted) is just clutter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit to delete them? They are deletable, but no reason to do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO they are just noise, but I can re-add them, that's not an issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that could be. At the same time, it is also like TODO

def test_boundaries(self):
self.assertEqual(dis.opmap["EXTENDED_ARG"], dis.EXTENDED_ARG)

@unittest.expectedFailure # TODO: RUSTPYTHON; AssertionError: 29 not less than or equal to 20
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because BINARY_OP_INPLACE_ADD_UNICODE opcode was added, we are missing some truncation logic when formatting the dis output

@ShaharNaveh ShaharNaveh marked this pull request as ready for review January 9, 2026 12:50
@ShaharNaveh ShaharNaveh requested a review from youknowone January 9, 2026 12:50
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In @crates/codegen/src/compile.rs:
- Around line 1016-1025: The stack effect for PopExcept is incorrectly
documented as 0 but the VM's frame implementation (where PopExcept does let
prev_exc = self.pop_value()) consumes one value; update the stack_effect mapping
for the PopExcept opcode (the entry for PopExcept in your
stack_effect/stack_effects function or match table) from 0 to -1 so the
documented contract matches the VM behavior and downstream stack analysis.
- Around line 1881-1885: The generic-class keyword handling is wrong: instead of
emitting individual STRING constants for each kw name and increasing nargs with
keywords, build a single tuple constant of keyword name strings (use
ConstantData::Tuple) and emit that one tuple onto the stack, and emit the call
with nargs equal to the number of positional arguments only (do not add keywords
to nargs); mirror the pattern in compile_call_inner so the VM's
collect_keyword_args receives a PyTuple at the top of the stack and the
Call/CallKw ABI is satisfied.

In @crates/compiler-core/src/bytecode.rs:
- Around line 679-958: The VM currently ends bytecode dispatch with a catch-all
that calls unreachable!(), so placeholder enum variants in Bytecode (e.g.,
BinaryOpInplaceAddUnicode, BinarySlice, EndFor, LoadLocals, PushNull) can cause
a panic if accidentally executed; change the dispatch default arm to return a
descriptive runtime error (or Err) indicating an invalid/placeholder opcode
rather than panicking, and add an explicit match arm or helper check (e.g.,
is_placeholder_opcode) that logs or maps placeholder discriminants to that
error; additionally, add a validation step in bytecode loading or TryFrom<u8>
conversion for Bytecode to reject placeholder discriminants early (or emit
warnings) and document the retained placeholder variants in the Bytecode enum
comment to clarify they must never be emitted by codegen.

In @crates/vm/src/frame.rs:
- Around line 1782-1784: The match currently uses a catch-all arm with
unreachable!() for the Instruction enum which defeats compile-time
exhaustiveness checking; replace the wildcard arm by explicitly matching every
Instruction variant used in this match so the compiler will error if new
variants are added (i.e., remove the `_ => unreachable!()` arm in the match on
`instruction` and add explicit arms for each `Instruction::<Variant>` handled
here), or if the enum is intended to be extensible annotate the enum definition
with `#[non_exhaustive]` and then handle the `_` case with a documented
conversion/error path; ensure you update the match in the function/method in
frame.rs where `instruction` is matched so future additions to `Instruction`
cause a compile-time failure instead of a runtime panic.

In @scripts/generate_opcode_metadata.py:
- Around line 21-29: The cpython_name property contains multiple return
statements after the intended two-step transformation, making the last two
returns unreachable; edit the cpython_name method to keep the special-case for
"CopyItem" and the intended conversion (assign rust_name to name, apply the two
regex substitutions and return name.upper()), and remove the extraneous return
lines (the duplicate re.sub returns) so only the correct transformation and
single return remain in the cpython_name property.
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode.rs (1)

975-999: Document intentional gaps in opcode ranges.

The TryFrom<u8> implementation validates four separate ranges:

  • CPython opcodes: Cache (0) to YieldValue (118)
  • Resume: 149
  • RustPython-only: Break (119) to Subscript (135)
  • Pseudo opcodes: Jump (252) to PopBlock (255)

This leaves gaps (136-148, 150-251) that will return InvalidBytecode errors. Consider:

  1. Adding a comment in the TryFrom implementation documenting these intentional gaps.
  2. Verifying that deserialization of legacy bytecode won't encounter these gap values.
📝 Add documentation for opcode gaps
     #[inline]
     fn try_from(value: u8) -> Result<Self, MarshalError> {
+        // Valid opcode ranges:
+        // - 0..=118: CPython 3.13 compatible opcodes
+        // - 149: Resume instruction
+        // - 119..=135: RustPython-specific opcodes
+        // - 252..=255: Pseudo opcodes (not emitted in bytecode)
+        // Gaps (136-148, 150-251) are reserved for future use
         let cpython_start = u8::from(Self::Cache);
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5369dab and 258f61c.

⛔ Files ignored due to path filters (3)
  • Lib/_opcode_metadata.py is excluded by !Lib/**
  • Lib/test/test__opcode.py is excluded by !Lib/**
  • Lib/test/test_dis.py is excluded by !Lib/**
📒 Files selected for processing (7)
  • crates/codegen/src/compile.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/jit/src/instructions.rs
  • crates/jit/tests/common.rs
  • crates/stdlib/src/opcode.rs
  • crates/vm/src/frame.rs
  • scripts/generate_opcode_metadata.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/opcode.rs
  • crates/jit/src/instructions.rs
  • crates/jit/tests/common.rs
  • crates/vm/src/frame.rs
  • crates/codegen/src/compile.rs
  • crates/compiler-core/src/bytecode.rs
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_opcode_metadata.py
🧠 Learnings (1)
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`

Applied to files:

  • scripts/generate_opcode_metadata.py
🧬 Code graph analysis (2)
crates/codegen/src/compile.rs (1)
crates/compiler-core/src/bytecode.rs (8)
  • from (394-396)
  • from (437-439)
  • from (964-967)
  • from (1044-1046)
  • from (1050-1052)
  • try_from (974-999)
  • try_from (1020-1025)
  • try_from (1034-1040)
crates/compiler-core/src/bytecode.rs (3)
crates/vm/src/exceptions.rs (4)
  • value (438-438)
  • value (602-602)
  • w (1775-1775)
  • w (1776-1776)
crates/vm/src/frame.rs (3)
  • arg (2646-2647)
  • name (1322-1322)
  • name (2638-2638)
crates/codegen/src/compile.rs (1)
  • name (1210-1212)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
🔇 Additional comments (16)
crates/compiler-core/src/bytecode.rs (2)

1763-1914: LGTM! Stack effect calculations properly updated.

The stack effect calculations have been correctly updated to use the renamed instruction variants (LoadName, StoreName, DeleteName, Call, CallKw, RaiseVarargs, etc.). The logic for each instruction's stack manipulation appears correct.


1982-2104: LGTM! Disassembly formatting correctly updated.

The fmt_dis method has been properly updated to display the renamed instruction variants with their appropriate names (e.g., BINARY_SUBSCR, DELETE_SUBSCR, LOAD_NAME, STORE_NAME, CALL, CALL_KW, RAISE_VARARGS, etc.).

scripts/generate_opcode_metadata.py (2)

55-81: LGTM! Generation logic correctly parses and outputs opcode metadata.

The script properly:

  1. Reads the Rust source file
  2. Extracts the Instruction enum body
  3. Parses opcodes with their discriminants
  4. Generates sorted output in CPython 3.13 compatible format

The dynamic parsing approach eliminates the need for manual updates when opcodes change.


32-37: Python version requirement is acceptable for this development script.

This is a development-only script located in the scripts/ directory used to generate opcode metadata. The script already indicates compatibility with CPython 3.13 in its header comment, and the project's whats_left.py explicitly requires Python 3.13+. Since itertools.batched was introduced in Python 3.12, its use is fully compatible with the project's stated Python version requirements.

No action needed.

crates/jit/tests/common.rs (1)

99-105: LGTM! Test instruction handling correctly updated.

The test stack machine has been properly updated to handle the renamed instruction variants:

  • LoadNameAnyLoadName
  • StoreLocalStoreName

The logic remains the same, only the variant names changed.

crates/jit/src/instructions.rs (2)

457-476: LGTM! JIT call instruction updated.

The JIT compiler correctly handles the renamed Call instruction (previously CallFunctionPositional). The implementation logic remains unchanged.


492-545: LGTM! Comparison instruction renamed.

The CompareOp instruction (previously CompareOperation) is correctly handled in the JIT with proper type conversions and condition code mappings for both integer and float comparisons.

crates/stdlib/src/opcode.rs (1)

66-83: LGTM! Opcode name classification correctly updated.

The has_name function has been properly updated to recognize the renamed instruction variants:

  • DeleteLocalDeleteName
  • LoadNameAnyLoadName
  • StoreLocalStoreName

All three renamed variants correctly appear in the pattern match for instructions that use co_names.

crates/vm/src/frame.rs (1)

394-394: LGTM! Instruction variant renames are consistent.

The renamed instruction variants align well with standard Python bytecode naming conventions:

  • RaiseVarargs, BinarySubscr, CallKw, Call, CompareOp, DeleteName, DeleteSubscr, LoadName, PopExcept, SetupAnnotations, StoreName, and StoreSubscr are all clearer and more conventional names.

The logic remains unchanged—these are purely API surface updates.

Also applies to: 613-613, 714-718, 756-756, 825-825, 838-838, 1195-1195, 1431-1431, 1460-1460, 1562-1562, 1624-1624, 1630-1630

crates/codegen/src/compile.rs (7)

1818-1819: The kind mapping correctly covers all three raise forms (bare raise, raise exc, raise exc from cause), and the encoding aligns with the RaiseKind enum and opcode metadata. No issues found.


5222-5227: Annotation assignment, augassign, and normal call paths updated to new ops—monitor for stack effect regressions

All three locations (lines 5222–5227, 5363–5364, 6111–6115) correctly use the updated instructions (LoadName, StoreSubscr, Call, CallKw). Run the full test suite plus targeted cases (subscripting stores, __annotations__ in class/module, kwargs calls, exception handling) to verify stack effects before merge.


1345-1346: SetupAnnotations instruction verified: correctly gated and scoped

SetupAnnotations is properly emitted in all three entry points (module, function, class) using consistent gating with Self::find_ann(), which recursively detects annotated assignments. The runtime implementation correctly initializes the __annotations__ dict only when needed. No silent breakage risk identified—scopes align with Python semantics and the instruction behavior is sound.


1597-1602: LoadName/StoreName/DeleteName scope semantics are correct

LoadNameAny does not exist in the codebase. The current implementation correctly emits LoadName for NameOp::Name, which handles the proper LEGB lookup (Locals → Enclosing → Globals → Builtins): it first tries self.locals.mapping().subscript(name, vm), and on KeyError falls back to self.load_global_or_builtin(name, vm). This is the correct behavior for module and class scopes. StoreName and DeleteName similarly use locals.mapping() for storage/deletion, which is appropriate for these contexts. No behavior regression exists.


4338-4339: Pattern-matching opcode renames (BinarySubscr / CompareOp / DeleteSubscr) are properly defined and handled
All three opcodes are correctly defined in the Instruction enum (BinarySubscr = 5, DeleteSubscr = 9, CompareOp with op field), with proper VM dispatch handlers in frame.rs, correct stack effect tracking, and consistent bytecode serialization. All usages in compile.rs align with the defined opcode metadata.


497-510: Subscript opcodes are properly defined with documented stack semantics

The StoreSubscr (opcode 39, -3 stack effect), DeleteSubscr (opcode 9, -2 stack effect), and BinarySubscr (opcode 5, -1 stack effect) instructions are well-defined in crates/compiler-core/src/bytecode.rs with clear stack effect documentation. The usage patterns in compile.rs are correct: for store operations, the container and index are on the stack followed by the value; for delete operations, the container and index are consumed. No stack order issues detected.


3564-3596: Verify LoadName semantics for .type_params access in PEP 695 generic class contexts

While .type_params is registered with the TYPE_PARAM flag (making it a bound symbol), the code uses LoadName instructions to access it rather than LoadDeref. This appears to work because:

  1. In builtins.rs (line 1040), .type_params is explicitly set in the class function's namespace dictionary before execution
  2. The compile_name function has special handling for TypeParams scope that allows fallback to parent scope lookup
  3. At runtime, LoadName succeeds by finding .type_params in the namespace dict

However, this relies on runtime namespace provisioning rather than the symbol scope mechanism. Confirm that:

  • .type_params is guaranteed to be in the namespace dict in all PEP 695 execution paths
  • No optimization or refactoring removes the .type_params namespace entry before the class code executes
  • The fallback logic in compile_name consistently resolves .type_params correctly across all TypeParams scope access patterns (lines 3589-3590, 3679-3680, 3690, and others)

Comment on lines +1016 to 1025
// We need to pop: exc, prev_exc (via PopExcept)
if preserve_tos {
emit!(self, Instruction::Swap { index: 2 });
}
emit!(self, Instruction::PopTop); // exc
if preserve_tos {
emit!(self, Instruction::Swap { index: 2 });
}
emit!(self, Instruction::PopException); // prev_exc is restored
emit!(self, Instruction::PopExcept); // prev_exc is restored
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find PopExcept / PushExcInfo / RaiseVarargs implementations + stack comments
rg -n --hidden --no-messages "PopExcept|PushExcInfo|RaiseVarargs|ReraiseFromStack" crates | head -n 200

Repository: RustPython/RustPython

Length of output: 5787


🏁 Script executed:

#!/bin/bash
# Inspect Instruction definition in compiler-core bytecode
rg -n --hidden --no-messages "enum Instruction" crates/compiler-core/src/bytecode.rs | head -n 20

Repository: RustPython/RustPython

Length of output: 91


🏁 Script executed:

#!/bin/bash
# Look for bytecode.rs file to understand Instruction definition
find crates -name "bytecode.rs" -type f

Repository: RustPython/RustPython

Length of output: 101


🏁 Script executed:

#!/bin/bash
# Read bytecode.rs around the Instruction enum and stack effect docs
sed -n '1835,1910p' crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 3439


🏁 Script executed:

#!/bin/bash
# Also read the Instruction enum definition
sed -n '677,750p' crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 1937


🏁 Script executed:

#!/bin/bash
# Check the PopExcept and PushExcInfo implementations in frame.rs
sed -n '1425,1470p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 2441


🏁 Script executed:

#!/bin/bash
# Check PushExcInfo implementation
sed -n '1500,1520p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 893


🏁 Script executed:

#!/bin/bash
# Read more context around the stack_effect function to understand the contract
sed -n '1830,1905p' crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 3463


🏁 Script executed:

#!/bin/bash
# Check if there are any test cases or comments explaining the expected behavior
rg -n "PopExcept" crates --context=3 -A 2 -B 2

Repository: RustPython/RustPython

Length of output: 7469


🏁 Script executed:

#!/bin/bash
# Look at the compile.rs code around all mentioned line ranges to check for consistency
sed -n '1014,1030p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 877


🏁 Script executed:

#!/bin/bash
# Check other PopExcept usages in compile.rs
sed -n '1058,1070p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 582


🏁 Script executed:

#!/bin/bash
# Look at lines 2382-2400 mentioned in the review to check expected stack state
sed -n '2375,2410p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 1437


🏁 Script executed:

#!/bin/bash
# Check if there are any recent comments or issues about PopExcept stack effect
rg -n "stack.effect|stack_effect" crates/compiler-core/src/bytecode.rs | grep -A 2 -B 2 "PopExcept"

Repository: RustPython/RustPython

Length of output: 47


PopExcept's documented stack effect is incorrect and must be fixed

The bytecode stack effect documentation states PopExcept => 0, (bytecode.rs:1897), but the VM implementation (frame.rs:1431) actually pops one value: let prev_exc = self.pop_value();. This is a -1 stack effect, not 0.

All usage sites in compile.rs correctly assume PopExcept has a -1 effect (e.g., lines 1016-1024 comment "prev_exc is restored"), and the comments match the actual behavior. However, the incorrect documented contract could cause issues with stack analysis tools or future verification passes that rely on the stack_effect function.

Fix the stack_effect documentation in bytecode.rs:1897 from PopExcept => 0, to PopExcept => -1, to match the actual VM implementation.

🤖 Prompt for AI Agents
In @crates/codegen/src/compile.rs around lines 1016 - 1025, The stack effect for
PopExcept is incorrectly documented as 0 but the VM's frame implementation
(where PopExcept does let prev_exc = self.pop_value()) consumes one value;
update the stack_effect mapping for the PopExcept opcode (the entry for
PopExcept in your stack_effect/stack_effects function or match table) from 0 to
-1 so the documented contract matches the VM behavior and downstream stack
analysis.

Comment on lines +1881 to 1885
emit!(self, Instruction::Call { nargs: 1 });
}
None => {
emit!(self, Instruction::CallFunctionPositional { nargs: 0 });
emit!(self, Instruction::Call { nargs: 0 });
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the compile.rs file at the mentioned lines
head -c 1000000 crates/codegen/src/compile.rs | wc -l

Repository: RustPython/RustPython

Length of output: 69


🏁 Script executed:

# Look at the code around the specified lines
sed -n '1875,1890p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 762


🏁 Script executed:

# Find all occurrences of CallKw and Call instructions in the codebase
rg -n "CallKw|Instruction::Call\b" crates/codegen/src/compile.rs | head -50

Repository: RustPython/RustPython

Length of output: 1029


🏁 Script executed:

# Find the compile_call_inner function
rg -n "fn compile_call_inner" crates

Repository: RustPython/RustPython

Length of output: 126


🏁 Script executed:

# Search for CALL and CALL_KW instruction implementations in the VM
rg -n "CALL_KW|CALL\b" crates --type rust | grep -i "opcode\|instruction\|match" | head -40

Repository: RustPython/RustPython

Length of output: 158


🏁 Script executed:

# Read the compile_call_inner function (around line 6127)
sed -n '6120,6200p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 3137


🏁 Script executed:

# Look for CallKw enum variant definition
rg -n "CallKw" crates/codegen/src/compile.rs -B2 -A2

Repository: RustPython/RustPython

Length of output: 583


🏁 Script executed:

# Find instruction definitions - look for CallKw struct/variant
fd "bytecode.rs\|instruction" crates --type f | head -10

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Search for where Call/CallKw instructions are defined
rg -n "pub enum Instruction|struct Call|struct CallKw" crates --type rust | head -20

Repository: RustPython/RustPython

Length of output: 249


🏁 Script executed:

# Read the Instruction enum definition in bytecode.rs around line 677
sed -n '677,750p' crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 1937


🏁 Script executed:

# Search for Call and CallKw variants in the enum
rg -n "Call\s*\{|CallKw\s*\{" crates/compiler-core/src/bytecode.rs -A2

Repository: RustPython/RustPython

Length of output: 1373


🏁 Script executed:

# Find VM implementation of CALL and CALL_KW instructions
rg -n "Instruction::Call\s*\{|Instruction::CallKw\s*\{" crates/vm/src --type rust -B2 -A5 | head -80

Repository: RustPython/RustPython

Length of output: 927


🏁 Script executed:

# Look for the execute function that handles instructions in the VM
fd "interpreter.rs\|execute.rs" crates/vm/src --type f

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Find collect_keyword_args and collect_positional_args implementations
rg -n "fn collect_keyword_args|fn collect_positional_args" crates/vm/src -A10

Repository: RustPython/RustPython

Length of output: 1214


🏁 Script executed:

# Look at the context around CallKw in frame.rs to understand the full call flow
sed -n '700,730p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 1505


🏁 Script executed:

# Search for stack effect calculations to understand the contract better
rg -n "CallKw.*stack|collect_keyword" crates --type rust | head -20

Repository: RustPython/RustPython

Length of output: 351


🏁 Script executed:

# Look at the context around line 3733-3735 where CallKw is emitted differently
sed -n '3720,3760p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 1713


🏁 Script executed:

# Find what function this is in
sed -n '3700,3750p' crates/codegen/src/compile.rs | head -60

Repository: RustPython/RustPython

Length of output: 2052


🏁 Script executed:

# Check if there are multiple compile functions handling different call contexts
rg -n "fn compile.*call" crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 304


🏁 Script executed:

# Look for the function containing the class definition code at line 3733
sed -n '3600,3650p' crates/codegen/src/compile.rs | head -60

Repository: RustPython/RustPython

Length of output: 1745


🏁 Script executed:

# Search for tests of generic class definitions to verify this code path executes
rg -n "generic.*class|TypeParam" crates --type rust | grep test | head -20

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Get the full function signature where line 3733 is located
sed -n '3600,3650p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 1745


🏁 Script executed:

# Look at the full context of the compile_class_def function, especially the CallKw usage
sed -n '3630,3780p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 5811


🏁 Script executed:

# Check if this is within an if statement checking for generics/type params
sed -n '3650,3770p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 4735


🏁 Script executed:

# Let's verify how the generic class CallKw differs from the normal call path
# by examining what should be on the stack
rg -n "emit_load_const.*Tuple|BuildTuple" crates/codegen/src/compile.rs | grep -A2 -B2 "keyword\|kw"

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Check the implementation of compile_keywords to see the correct pattern
rg -n "fn compile_keywords" crates/codegen/src/compile.rs -A20

Repository: RustPython/RustPython

Length of output: 1112


🏁 Script executed:

# Look at how non-generic class handles keywords via compile_call_inner
sed -n '6160,6175p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 663


🏁 Script executed:

# Verify the emit_load_const pattern used in the generic class case (line 3728)
sed -n '3720,3745p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 1017


🏁 Script executed:

# Let's trace through what stack state emerges from the generic class keyword code
# Look at the exact sequence of stack operations
sed -n '3721,3743p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 931


🏁 Script executed:

# Search for tests or examples of generic classes with keyword arguments
rg -n "class.*\[.*\].*:" tests --type py | head -10

Repository: RustPython/RustPython

Length of output: 110


🏁 Script executed:

# Check if there are any Python test files with generic classes
fd "test.*\.py" tests --type f | xargs grep -l "class.*\[.*\]" | head -5

Repository: RustPython/RustPython

Length of output: 157


🏁 Script executed:

# Find test directories
find . -type d -name "*test*" | head -10

Repository: RustPython/RustPython

Length of output: 136


🏁 Script executed:

# Look for Python files with generics
rg -n "TypeVar|class.*\[" --type py | head -20

Repository: RustPython/RustPython

Length of output: 1579


🏁 Script executed:

# Let's check if there are any tests that would exercise generic class definitions
rg -n "class.*Generic\|TypeParams" --type py | head -10

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Let's verify the stack state at the generic class CallKw point
# by checking if the pattern matches what compile_keywords produces
sed -n '6059,6085p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 1154


🏁 Script executed:

# Let's verify the exact stack layout difference
# For the generic class case, trace what's on stack before CallKw
# Line 3721: first load function
# Line 3724: load class name string
# Line 3727-3730: load bases
# Line 3732: load generic_base
# Then the keyword loop at 3734-3742

# Now let's check what compile_keywords produces vs what generic class produces
sed -n '6059,6085p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 1154


🏁 Script executed:

# Check the exact difference: does compile_keywords build a map or tuple?
sed -n '6079,6088p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 304


🏁 Script executed:

# Look at CallMethodKeyword to see if there's another CallKw pattern
rg -n "CallMethodKeyword" crates/codegen/src/compile.rs -B3 -A3

Repository: RustPython/RustPython

Length of output: 446


🏁 Script executed:

# Check the stack effect difference between the two patterns
# Look at line 6165 in compile_call_inner
sed -n '6163,6167p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 258


🏁 Script executed:

# Let me verify the critical difference in stack patterns
# Generic class (line 3734-3742): for each keyword: emit STRING, then compile_expression for value
# compile_call_inner (line 6161-6167): builds a TUPLE of all strings, then compiles expressions

# Let's confirm this difference by looking at what's built
echo "=== Generic class pattern ===" 
sed -n '3734,3742p' crates/codegen/src/compile.rs

echo ""
echo "=== compile_call_inner pattern ==="
sed -n '6161,6167p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 744


🏁 Script executed:

# Verify if compile_keywords (used by compile_call_inner) creates a tuple
# and compile_keywords is called from where
rg -n "compile_keywords" crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 215


🏁 Script executed:

# The key question: does the generic class CallKw ever get executed?
# Look for any tests or examples that would use generic classes with kwargs
rg -n "class.*\[.*\].*:" extra_tests --type py | head -20

Repository: RustPython/RustPython

Length of output: 176


🏁 Script executed:

# Let me trace the exact stack layout for the generic class CallKw case
# At line 3734-3742, what's the exact sequence?

echo "=== Full generic class keyword handling (lines 3721-3743) ==="
sed -n '3721,3743p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 993


🏁 Script executed:

# Now compare with the correct compile_call_inner pattern (lines 6161-6168)
echo "=== Correct compile_call_inner CallKw pattern (lines 6161-6168) ==="
sed -n '6161,6168p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 452


🏁 Script executed:

# Let's verify the critical difference:
# In generic class: individual STRING constants are emitted
# In compile_call_inner: a TUPLE of strings is built and emitted as a single constant

# Check the exact bytecode produced
echo "=== What does emit_load_const produce for each? ==="
rg -n "fn emit_load_const" crates/codegen/src/compile.rs -A5

Repository: RustPython/RustPython

Length of output: 387


🏁 Script executed:

# Let me understand the exact stack layout difference
# For generic class, we need to see the full picture

echo "=== Stack analysis ==="
echo "Generic class with kwargs: emits individual STRINGS in loop, then CallKw"
echo "compile_call_inner with kwargs: builds TUPLE of strings, emits LoadConst with tuple, then CallKw"
echo ""

# Let's look at what the VM actually does with CallKw
sed -n '2072,2090p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 986


🏁 Script executed:

# The key insight: generic class emits strings individually in a loop
# But collect_keyword_args pops a TUPLE first!
# This is a stack mismatch!

# Let's verify by checking if generic class CallKw is even reachable
# Search for any test that would hit the is_generic && !arguments.keywords.is_empty() path

rg -n "TypeParam\|generic" extra_tests --type py | head -20

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Search for Python 3.12+ generic class tests
fd "*.py" extra_tests | xargs grep -l "class.*\[.*\]:" 2>/dev/null | head -10

Repository: RustPython/RustPython

Length of output: 348


🏁 Script executed:

# Final confirmation: check if there's any comment or note about this being a known issue
rg -n "TODO\|FIXME\|BUG\|HACK" crates/codegen/src/compile.rs | grep -A2 -B2 "373[0-5]\|CallKw"

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Verify the nargs calculation in the generic class case one more time
echo "=== Generic class nargs calculation ==="
sed -n '3739,3746p' crates/codegen/src/compile.rs

echo ""
echo "=== compile_call_inner nargs calculation ==="
sed -n '6170,6172p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 452


🏁 Script executed:

# Let me get the exact nargs value at line 3733-3735
sed -n '3720,3747p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 1130


🏁 Script executed:

# Understand what nargs contains before the keyword handling
sed -n '3718,3735p' crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 774


Critical: CallKw nargs ABI violation and stack layout mismatch in generic class handler

The generic class keyword argument handling at lines 3721–3747 contains two critical bugs that contradict the compile_call_inner pattern (lines 6161–6168):

  1. Stack Layout Mismatch: Generic class emits individual STRING constants in a loop for each keyword name, resulting in stack layout [string_1, value_1, string_2, value_2, ...]. The VM's collect_keyword_args (frame.rs:2072–2090) expects a single PyTuple at the top of the stack, and will panic when attempting to downcast the final keyword value (not a tuple) to PyTuple.

  2. nargs ABI Violation: Generic class calculates nargs: nargs + arguments.keywords.len(), including keyword count in nargs. This contradicts the VM contract where nargs represents positional argument count only. The correct pattern in compile_call_inner emits a tuple constant for keyword names and passes only positional count as nargs.

The non-generic class path correctly uses compile_call_inner (line 3777), which follows the proper ABI. Fix the generic class code to build a tuple of keyword name strings (via ConstantData::Tuple) and pass only the positional argument count as nargs, matching the compile_call_inner implementation.

🤖 Prompt for AI Agents
In @crates/codegen/src/compile.rs around lines 1881 - 1885, The generic-class
keyword handling is wrong: instead of emitting individual STRING constants for
each kw name and increasing nargs with keywords, build a single tuple constant
of keyword name strings (use ConstantData::Tuple) and emit that one tuple onto
the stack, and emit the call with nargs equal to the number of positional
arguments only (do not add keywords to nargs); mirror the pattern in
compile_call_inner so the VM's collect_keyword_args receives a PyTuple at the
top of the stack and the Call/CallKw ABI is satisfied.

Comment on lines +679 to 958
Cache = 0, // Placeholder
BeforeAsyncWith = 1,
BeforeWith = 2,
BinaryOpInplaceAddUnicode = 3, // Placeholder
BinarySlice = 4, // Placeholder
BinarySubscr = 5,
CheckEgMatch = 6,
CheckExcMatch = 7,
CleanupThrow = 8,
DeleteSubscr = 9,
EndAsyncFor = 10,
EndFor = 11, // Placeholder
EndSend = 12,
ExitInitCheck = 13, // Placeholder
FormatSimple = 14,
FormatWithSpec = 15,
GetAIter = 16,
Reserved = 17,
GetANext = 18,
GetIter = 19,
GetLen = 20,
GetYieldFromIter = 21,
InterpreterExit = 22, // Placeholder
LoadAssertionError = 23, // Placeholder
LoadBuildClass = 24,
LoadLocals = 25, // Placeholder
MakeFunction = 26,
MatchKeys = 27,
MatchMapping = 28,
MatchSequence = 29,
Nop = 30,
PopExcept = 31,
PopTop = 32,
PushExcInfo = 33,
PushNull = 34, // Placeholder
ReturnGenerator = 35, // Placeholder
ReturnValue = 36,
SetupAnnotations = 37,
StoreSlice = 38, // Placeholder
StoreSubscr = 39,
ToBool = 40,
UnaryInvert = 41,
UnaryNegative = 42,
UnaryNot = 43,
WithExceptStart = 44,
// ==================== With-argument instructions (opcode > 44) ====================
BinaryOp {
op: Arg<BinaryOperator>,
},
// 46: BUILD_CONST_KEY_MAP - not implemented, placeholder
} = 45,
BuildConstKeyMap {
size: Arg<u32>,
},
// 47: BUILD_LIST
} = 46, // Placeholder
BuildList {
size: Arg<u32>,
},
// 48: BUILD_MAP
} = 47,
BuildMap {
size: Arg<u32>,
},
// 49: BUILD_SET
} = 48,
BuildSet {
size: Arg<u32>,
},
// 50: BUILD_SLICE
} = 49,
BuildSlice {
argc: Arg<BuildSliceArgCount>,
},
// 51: BUILD_STRING
} = 50,
BuildString {
size: Arg<u32>,
},
// 52: BUILD_TUPLE
} = 51,
BuildTuple {
size: Arg<u32>,
},
// 53: CALL
CallFunctionPositional {
} = 52,
Call {
nargs: Arg<u32>,
},
// 54: CALL_FUNCTION_EX
} = 53,
CallFunctionEx {
has_kwargs: Arg<bool>,
},
// 55: CALL_INTRINSIC_1
} = 54,
CallIntrinsic1 {
func: Arg<IntrinsicFunction1>,
},
// 56: CALL_INTRINSIC_2
} = 55,
CallIntrinsic2 {
func: Arg<IntrinsicFunction2>,
},
// 57: CALL_KW
CallFunctionKeyword {
} = 56,
CallKw {
nargs: Arg<u32>,
},
// 58: COMPARE_OP
CompareOperation {
} = 57,
CompareOp {
op: Arg<ComparisonOperator>,
},
// 59: CONTAINS_OP
ContainsOp(Arg<Invert>),
// 60: CONVERT_VALUE
} = 58,
ContainsOp(Arg<Invert>) = 59,
ConvertValue {
oparg: Arg<ConvertValueOparg>,
},
// 61: COPY
} = 60,
CopyItem {
index: Arg<u32>,
},
// 62: COPY_FREE_VARS - not implemented, placeholder
} = 61,
CopyFreeVars {
count: Arg<u32>,
},
// 63: DELETE_ATTR
} = 62, // Placeholder
DeleteAttr {
idx: Arg<NameIdx>,
},
// 64: DELETE_DEREF
DeleteDeref(Arg<NameIdx>),
// 65: DELETE_FAST
DeleteFast(Arg<NameIdx>),
// 66: DELETE_GLOBAL
DeleteGlobal(Arg<NameIdx>),
// 67: DELETE_NAME
DeleteLocal(Arg<NameIdx>),
// 68: DICT_MERGE - not implemented, placeholder
} = 63,
DeleteDeref(Arg<NameIdx>) = 64,
DeleteFast(Arg<NameIdx>) = 65,
DeleteGlobal(Arg<NameIdx>) = 66,
DeleteName(Arg<NameIdx>) = 67,
DictMerge {
index: Arg<u32>,
},
// 69: DICT_UPDATE
} = 68, // Placeholder
DictUpdate {
index: Arg<u32>,
},
// 70: ENTER_EXECUTOR - not implemented, placeholder
EnterExecutor {
index: Arg<u32>,
},
// 71: EXTENDED_ARG
ExtendedArg,
// 72: FOR_ITER
} = 69,
EnterExecutor = 70, // Placeholder
ExtendedArg = 71,
ForIter {
target: Arg<Label>,
},
// 73: GET_AWAITABLE
GetAwaitable,
// 74: IMPORT_FROM
} = 72,
GetAwaitable = 73, // TODO: Make this instruction to hold an oparg
ImportFrom {
idx: Arg<NameIdx>,
},
// 75: IMPORT_NAME
} = 74,
ImportName {
idx: Arg<NameIdx>,
},
// 76: IS_OP
IsOp(Arg<Invert>),
// 77: JUMP_BACKWARD - not implemented, placeholder
} = 75,
IsOp(Arg<Invert>) = 76,
JumpBackward {
target: Arg<Label>,
},
// 78: JUMP_BACKWARD_NO_INTERRUPT - not implemented, placeholder
} = 77, // Placeholder
JumpBackwardNoInterrupt {
target: Arg<Label>,
},
// 79: JUMP_FORWARD - not implemented, placeholder
} = 78, // Placeholder
JumpForward {
target: Arg<Label>,
},
// 80: LIST_APPEND
} = 79, // Placeholder
ListAppend {
i: Arg<u32>,
},
// 81: LIST_EXTEND - not implemented, placeholder
} = 80,
ListExtend {
i: Arg<u32>,
},
// 82: LOAD_ATTR
} = 81, // Placeholder
LoadAttr {
idx: Arg<NameIdx>,
},
// 83: LOAD_CONST
} = 82,
LoadConst {
idx: Arg<u32>,
},
// 84: LOAD_DEREF
LoadDeref(Arg<NameIdx>),
// 85: LOAD_FAST
LoadFast(Arg<NameIdx>),
// 86: LOAD_FAST_AND_CLEAR
LoadFastAndClear(Arg<NameIdx>),
// 87: LOAD_FAST_CHECK - not implemented, placeholder
LoadFastCheck(Arg<NameIdx>),
// 88: LOAD_FAST_LOAD_FAST - not implemented, placeholder
} = 83,
LoadDeref(Arg<NameIdx>) = 84,
LoadFast(Arg<NameIdx>) = 85,
LoadFastAndClear(Arg<NameIdx>) = 86,
LoadFastCheck(Arg<NameIdx>) = 87, // Placeholder
LoadFastLoadFast {
arg: Arg<u32>,
},
// 89: LOAD_FROM_DICT_OR_DEREF - not implemented, placeholder
LoadFromDictOrDeref(Arg<NameIdx>),
// 90: LOAD_FROM_DICT_OR_GLOBALS - not implemented, placeholder
LoadFromDictOrGlobals(Arg<NameIdx>),
// 91: LOAD_GLOBAL
LoadGlobal(Arg<NameIdx>),
// 92: LOAD_NAME
LoadNameAny(Arg<NameIdx>),
// 93: LOAD_SUPER_ATTR - not implemented, placeholder
} = 88, // Placeholder
LoadFromDictOrDeref(Arg<NameIdx>) = 89, // Placeholder
LoadFromDictOrGlobals(Arg<NameIdx>) = 90, // Placeholder
LoadGlobal(Arg<NameIdx>) = 91,
LoadName(Arg<NameIdx>) = 92,
LoadSuperAttr {
arg: Arg<u32>,
},
// 94: MAKE_CELL - not implemented, placeholder
MakeCell(Arg<NameIdx>),
// 95: MAP_ADD
} = 93, // Placeholder
MakeCell(Arg<NameIdx>) = 94, // Placeholder
MapAdd {
i: Arg<u32>,
},
// 96: MATCH_CLASS
MatchClass(Arg<u32>),
// 97: POP_JUMP_IF_FALSE
} = 95,
MatchClass(Arg<u32>) = 96,
PopJumpIfFalse {
target: Arg<Label>,
},
// 98: POP_JUMP_IF_NONE - not implemented, placeholder
} = 97,
PopJumpIfNone {
target: Arg<Label>,
},
// 99: POP_JUMP_IF_NOT_NONE - not implemented, placeholder
} = 98, // Placeholder
PopJumpIfNotNone {
target: Arg<Label>,
},
// 100: POP_JUMP_IF_TRUE
} = 99, // Placeholder
PopJumpIfTrue {
target: Arg<Label>,
},
// 101: RAISE_VARARGS
Raise {
} = 100,
RaiseVarargs {
kind: Arg<RaiseKind>,
},
// 102: RERAISE
} = 101,
Reraise {
depth: Arg<u32>,
},
// 103: RETURN_CONST
} = 102,
ReturnConst {
idx: Arg<u32>,
},
// 104: SEND
} = 103,
Send {
target: Arg<Label>,
},
// 105: SET_ADD
} = 104,
SetAdd {
i: Arg<u32>,
},
// 106: SET_FUNCTION_ATTRIBUTE
} = 105,
SetFunctionAttribute {
attr: Arg<MakeFunctionFlags>,
},
// 107: SET_UPDATE - not implemented, placeholder
} = 106,
SetUpdate {
i: Arg<u32>,
},
// 108: STORE_ATTR
} = 107, // Placeholder
StoreAttr {
idx: Arg<NameIdx>,
},
// 109: STORE_DEREF
StoreDeref(Arg<NameIdx>),
// 110: STORE_FAST
StoreFast(Arg<NameIdx>),
// 111: STORE_FAST_LOAD_FAST
} = 108,
StoreDeref(Arg<NameIdx>) = 109,
StoreFast(Arg<NameIdx>) = 110,
StoreFastLoadFast {
store_idx: Arg<NameIdx>,
load_idx: Arg<NameIdx>,
},
// 112: STORE_FAST_STORE_FAST - not implemented, placeholder
} = 111,
StoreFastStoreFast {
arg: Arg<u32>,
},
// 113: STORE_GLOBAL
StoreGlobal(Arg<NameIdx>),
// 114: STORE_NAME
StoreLocal(Arg<NameIdx>),
// 115: SWAP
} = 112, // Placeholder
StoreGlobal(Arg<NameIdx>) = 113,
StoreName(Arg<NameIdx>) = 114,
Swap {
index: Arg<u32>,
},
// 116: UNPACK_EX
} = 115,
UnpackEx {
args: Arg<UnpackExArgs>,
},
// 117: UNPACK_SEQUENCE
} = 116,
UnpackSequence {
size: Arg<u32>,
},
// 118: YIELD_VALUE
} = 117,
YieldValue {
arg: Arg<u32>,
},
// ==================== RustPython-only instructions (119+) ====================
// 119: BREAK
} = 118,
Resume {
arg: Arg<u32>,
} = 149,
// ==================== RustPython-only instructions (119-135) ====================
// Ideally, we want to be fully aligned with CPython opcodes, but we still have some leftovers.
// So we assign random IDs to these opcodes.
Break {
target: Arg<Label>,
},
// 120: BUILD_LIST_FROM_TUPLES
} = 119,
BuildListFromTuples {
size: Arg<u32>,
},
// 121: BUILD_MAP_FOR_CALL
} = 120,
BuildMapForCall {
size: Arg<u32>,
},
// 122: BUILD_SET_FROM_TUPLES
} = 121,
BuildSetFromTuples {
size: Arg<u32>,
},
// 123: BUILD_TUPLE_FROM_ITER
BuildTupleFromIter,
// 124: BUILD_TUPLE_FROM_TUPLES
} = 122,
BuildTupleFromIter = 123,
BuildTupleFromTuples {
size: Arg<u32>,
},
// 125: CALL_METHOD
} = 124,
CallMethodPositional {
nargs: Arg<u32>,
},
// 126: CALL_METHOD_KW
} = 125,
CallMethodKeyword {
nargs: Arg<u32>,
},
// 127: CALL_METHOD_EX
} = 126,
CallMethodEx {
has_kwargs: Arg<bool>,
},
// 128: CONTINUE
} = 127,
Continue {
target: Arg<Label>,
},
// 129: JUMP (CPython uses pseudo-op 256)
Jump {
target: Arg<Label>,
},
// 130: JUMP_IF_FALSE_OR_POP
} = 128,
JumpIfFalseOrPop {
target: Arg<Label>,
},
// 131: JUMP_IF_TRUE_OR_POP
} = 129,
JumpIfTrueOrPop {
target: Arg<Label>,
},
// 132: JUMP_IF_NOT_EXC_MATCH
JumpIfNotExcMatch(Arg<Label>),
// 133: LOAD_CLASSDEREF
LoadClassDeref(Arg<NameIdx>),
// 134: LOAD_CLOSURE (CPython uses pseudo-op 258)
LoadClosure(Arg<NameIdx>),
// 135: LOAD_METHOD (CPython uses pseudo-op 259)
LoadMethod {
idx: Arg<NameIdx>,
},
// 136: POP_BLOCK (CPython uses pseudo-op 263)
PopBlock,
// 137: REVERSE
} = 130,
JumpIfNotExcMatch(Arg<Label>) = 131,
LoadClassDeref(Arg<NameIdx>) = 132,
Reverse {
amount: Arg<u32>,
},
// 138: SET_EXC_INFO
SetExcInfo,
// 139: SUBSCRIPT
Subscript,
// 140-148: Reserved (padding to keep RESUME at 149)
Reserved140,
Reserved141,
Reserved142,
Reserved143,
Reserved144,
Reserved145,
Reserved146,
Reserved147,
Reserved148,
// 149: RESUME
Resume {
arg: Arg<u32>,
},
// If you add a new instruction here, be sure to keep LAST_INSTRUCTION updated
} = 133,
SetExcInfo = 134,
Subscript = 135,
// ===== Pseudo Opcodes (252+) ======
Jump {
target: Arg<Label>,
} = 252, // CPython uses pseudo-op 256
LoadClosure(Arg<NameIdx>) = 253, // CPython uses pseudo-op 258
LoadMethod {
idx: Arg<NameIdx>,
} = 254, // CPython uses pseudo-op 259
PopBlock = 255, // CPython uses pseudo-op 263
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check placeholder opcode handling in VM and JIT

echo "=== Checking VM frame dispatch for placeholder handling ==="
rg -n -C3 "BinarySlice|EndFor|LoadLocals|PushNull" crates/vm/src/frame.rs

echo "=== Checking JIT instruction handling ==="
rg -n -C3 "BinarySlice|EndFor|LoadLocals|PushNull" crates/jit/src/instructions.rs

echo "=== Checking compiler emission ==="
rg -n -C3 "BinarySlice|EndFor|LoadLocals|PushNull" crates/codegen/src/compile.rs

Repository: RustPython/RustPython

Length of output: 201


🏁 Script executed:

# Also check if there's a TryFrom or match statement that handles these opcodes
echo "=== Checking for opcode dispatch mechanisms ==="
rg -n "impl.*TryFrom.*Instruction\|match.*opcode\|Instruction::" crates/compiler-core/src/bytecode.rs | head -20

Repository: RustPython/RustPython

Length of output: 112


🏁 Script executed:

# Look for panic or unreachable patterns on instruction handling
echo "=== Checking for error handling on unsupported opcodes ==="
rg -n -C2 "panic\!|unreachable\!|unimplemented\!" crates/vm/src/frame.rs | grep -A2 -B2 "opcode\|instruction"

Repository: RustPython/RustPython

Length of output: 249


🏁 Script executed:

# Find the actual structure
echo "=== Repository structure ==="
git ls-files | grep -E "(vm|jit|interpreter)" | head -20

echo "=== Looking for instruction-related files ==="
git ls-files | grep -E "instruction|opcode|dispatch|frame" | head -20

echo "=== Check if crates/vm/src/frame.rs exists ==="
ls -la crates/vm/src/ 2>/dev/null || echo "crates/vm/src not found"

echo "=== List all crates ==="
ls -la crates/ | grep "^d"

Repository: RustPython/RustPython

Length of output: 5171


🏁 Script executed:

# Search for instruction enum usage
echo "=== Finding where Instruction enum is used ==="
rg -l "Instruction::" | head -10

echo "=== Search for match statements on Instruction ==="
rg -n "match.*instruction\|match.*opcode" -i | head -20

Repository: RustPython/RustPython

Length of output: 363


🏁 Script executed:

# Check the bytecode.rs file itself for usage patterns
echo "=== Check bytecode.rs for impl blocks ==="
rg -n "^impl" crates/compiler-core/src/bytecode.rs

echo "=== Look for execute or dispatch methods ==="
rg -n "fn.*execute\|fn.*dispatch" crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 2145


🏁 Script executed:

# Check the TryFrom implementation for Instruction
echo "=== TryFrom<u8> for Instruction (around line 970) ==="
sed -n '970,1010p' crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 1293


🏁 Script executed:

# Check the Instruction impl block at line 1711
echo "=== Instruction impl block (around line 1711) ==="
sed -n '1711,1750p' crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 1360


🏁 Script executed:

# Look for instruction dispatch in frame.rs
echo "=== Looking for instruction dispatch in frame.rs ==="
rg -n "Instruction::" crates/vm/src/frame.rs | head -30

Repository: RustPython/RustPython

Length of output: 2292


🏁 Script executed:

# Check what happens with unsupported instructions
echo "=== Looking for dispatch patterns in frame.rs ==="
sed -n '1,100p' crates/vm/src/frame.rs | head -50

Repository: RustPython/RustPython

Length of output: 1863


🏁 Script executed:

# Find the execute_instruction method or similar
echo "=== Find main execution loop ==="
rg -n "fn.*execute\|fn.*dispatch\|match.*instruction" crates/vm/src/frame.rs | head -15

Repository: RustPython/RustPython

Length of output: 97


🏁 Script executed:

# Look for the large match statement that handles all instructions
echo "=== Search for instruction execution pattern ==="
rg -n "bytecode::Instruction::" crates/vm/src/frame.rs | wc -l

# Find where the instruction execution happens
echo "=== Looking for instruction match arms ==="
sed -n '550,800p' crates/vm/src/frame.rs | grep -n "Instruction::"

Repository: RustPython/RustPython

Length of output: 2474


🏁 Script executed:

# Get a larger view to see the dispatch pattern
echo "=== Finding the execute instruction method ==="
rg -n "fn execute\|fn apply\|match.*instruction" crates/vm/src/frame.rs -A5 | head -40

Repository: RustPython/RustPython

Length of output: 111


🏁 Script executed:

# Search for all variants mentioned in the review as placeholders
echo "=== Check if placeholder instructions are handled ==="
rg -n "BinaryOpInplaceAddUnicode|BinarySlice|EndFor|LoadLocals|PushNull|BuildConstKeyMap" crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 118


🏁 Script executed:

# See the complete instruction match statement structure
wc -l crates/vm/src/frame.rs
sed -n '560,900p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 14993


🏁 Script executed:

# Let's see if the match statement has a catch-all or default case
echo "=== Looking for match statement structure and completeness ==="
sed -n '560,2700p' crates/vm/src/frame.rs | tail -200

Repository: RustPython/RustPython

Length of output: 8160


🏁 Script executed:

# Search for the end of the match statement
echo "=== Find where the match statement ends ==="
rg -n "^        }" crates/vm/src/frame.rs | tail -5

Repository: RustPython/RustPython

Length of output: 185


🏁 Script executed:

# Check if there's a wildcard or default pattern in the match
echo "=== Check for default/wildcard patterns in match ==="
rg -n "_ =>" crates/vm/src/frame.rs | grep -A2 -B2 "instruction\|opcode"

Repository: RustPython/RustPython

Length of output: 117


🏁 Script executed:

# Get context around line 2651 to see if there's a wildcard pattern
echo "=== Context around match closing brace ==="
sed -n '2600,2660p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 2614


🏁 Script executed:

# Count all instruction variants in the enum to see if all are handled
echo "=== Count Instruction enum variants ==="
grep -E "^\s+\w+\s*(\{|=)" crates/compiler-core/src/bytecode.rs | grep -v "impl\|^--" | wc -l

echo "=== Count instruction match arms in frame.rs ==="
rg "bytecode::Instruction::" crates/vm/src/frame.rs | wc -l

Repository: RustPython/RustPython

Length of output: 161


🏁 Script executed:

# Look for any unreachable or panic patterns related to instructions
echo "=== Search for instruction validation ==="
rg -n "is_valid\|panic\|unreachable\|match.*instruction" crates/compiler-core/src/bytecode.rs | head -20

Repository: RustPython/RustPython

Length of output: 106


🏁 Script executed:

# Check if the match is marked as exhaustive
echo "=== Check the actual match structure ==="
sed -n '585,595p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 502


🏁 Script executed:

# This is critical - 437 variants but only 117 match arms
# Let's find the complete match statement in frame.rs
echo "=== Find the main instruction match statement ==="
sed -n '585,2500p' crates/vm/src/frame.rs | grep -n "^        }" | head -1

Repository: RustPython/RustPython

Length of output: 129


🏁 Script executed:

# Let's see what's around line 585 more broadly to see if the match closes
echo "=== Checking match statement closure ==="
sed -n '1200,1250p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 2885


🏁 Script executed:

# Search for where the big instruction match ends
echo "=== Finding match arms count more accurately ==="
rg -n "bytecode::Instruction::" crates/vm/src/frame.rs | tail -20

Repository: RustPython/RustPython

Length of output: 1398


🏁 Script executed:

# Check if there's any wildcard or default handling
echo "=== Check the last match arms ==="
sed -n '1100,1200p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 4826


🏁 Script executed:

# Look for any code that might catch unhandled instructions
echo "=== Search for error handling ==="
rg -n "unreachable\|panic\|todo\|unimplemented" crates/vm/src/frame.rs | head -20

Repository: RustPython/RustPython

Length of output: 98


🏁 Script executed:

# Let's get the actual size of the match statement by looking more carefully
echo "=== Get the full match statement in execute_instruction ==="
sed -n '540,1800p' crates/vm/src/frame.rs | tail -50

Repository: RustPython/RustPython

Length of output: 1974


🏁 Script executed:

# Look for where the match statement actually ends
echo "=== Search for the end of the instruction match ==="
rg -n "^        \}" crates/vm/src/frame.rs | head -10

Repository: RustPython/RustPython

Length of output: 260


🏁 Script executed:

# Check if there's a wildcard pattern at the end
echo "=== Last 20 lines of instruction match arms ==="
sed -n '1770,1800p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 1252


🏁 Script executed:

# Let's verify by finding all unhandled instruction variants
echo "=== All Instruction variants ==="
sed -n '679,958p' crates/compiler-core/src/bytecode.rs | grep -E "^\s+\w+(\s|{)" | head -60

Repository: RustPython/RustPython

Length of output: 138


🏁 Script executed:

# Check if compiler validation prevents emitting placeholder opcodes
echo "=== Search for placeholder validation in codegen ==="
rg -n "Placeholder\|BinarySlice\|EndFor" crates/codegen/src/compile.rs | head -20

Repository: RustPython/RustPython

Length of output: 117


🏁 Script executed:

# Check all Instruction variants more carefully
echo "=== List instruction variants from bytecode.rs ==="
sed -n '679,958p' crates/compiler-core/src/bytecode.rs | grep "=" | grep -v "impl\|//" | head -70

Repository: RustPython/RustPython

Length of output: 1544


🏁 Script executed:

# Check codegen to see if it emits placeholder opcodes
echo "=== Check if codegen emits these opcodes ==="
rg -n "emit\|Instruction::" crates/codegen/src/compile.rs | grep -E "Binary|EndFor|LoadLocals|PushNull" | head -20

Repository: RustPython/RustPython

Length of output: 109


🏁 Script executed:

# Check if there's validation before bytecode generation
echo "=== Look for any bytecode validation ==="
rg -n "validate\|check.*instruction\|verify" crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 105


🏁 Script executed:

# Verify if the TryFrom implementation actually accepts placeholder discriminants
echo "=== Re-check the TryFrom ranges ==="
sed -n '970,1010p' crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 1275


🏁 Script executed:

# Check what the ranges actually are - look at the enum values
echo "=== Extract discriminant values for placeholders ==="
sed -n '679,958p' crates/compiler-core/src/bytecode.rs | grep -E "BinarySlice|EndFor|LoadLocals|PushNull|BinaryOpInplaceAddUnicode"

Repository: RustPython/RustPython

Length of output: 326


🏁 Script executed:

# Confirm the actual enum discriminant values
echo "=== Get exact discriminant values from enum ==="
sed -n '679,958p' crates/compiler-core/src/bytecode.rs | grep -E "= [0-9]+" | head -80

Repository: RustPython/RustPython

Length of output: 2024


🏁 Script executed:

# Check if YieldValue is indeed 118
echo "=== Find YieldValue discriminant ==="
rg -n "YieldValue.*=" crates/compiler-core/src/bytecode.rs

Repository: RustPython/RustPython

Length of output: 203


🏁 Script executed:

# Find YieldValue in the enum definition
echo "=== Find YieldValue enum variant ==="
sed -n '679,958p' crates/compiler-core/src/bytecode.rs | grep -n "YieldValue"

Repository: RustPython/RustPython

Length of output: 122


🏁 Script executed:

# Get context around line 118 to see what's there
echo "=== Get all discriminants with their values ==="
sed -n '679,960p' crates/compiler-core/src/bytecode.rs | tail -100

Repository: RustPython/RustPython

Length of output: 2580


VM dispatch for placeholder opcodes relies on panic rather than graceful error handling.

Placeholder instruction variants (e.g., BinaryOpInplaceAddUnicode, BinarySlice, EndFor, LoadLocals, PushNull) are defined in the enum and pass TryFrom<u8> validation since their discriminants fall within accepted ranges (0–118, 149, 119–135, 252–255). However, the VM's instruction dispatch ends with a catch-all pattern that calls unreachable!(), which will panic if any unhandled instruction reaches execution.

While this prevents silent failures, it's fragile: if codegen or bytecode loading ever accidentally emits a placeholder opcode, the VM will crash instead of failing gracefully. Consider:

  1. Adding explicit error handling or a warning for placeholders in the catch-all pattern.
  2. Validating at compile time that codegen never emits these opcodes.
  3. Documenting why these opcodes exist in the enum if they're never used.
🤖 Prompt for AI Agents
In @crates/compiler-core/src/bytecode.rs around lines 679 - 958, The VM
currently ends bytecode dispatch with a catch-all that calls unreachable!(), so
placeholder enum variants in Bytecode (e.g., BinaryOpInplaceAddUnicode,
BinarySlice, EndFor, LoadLocals, PushNull) can cause a panic if accidentally
executed; change the dispatch default arm to return a descriptive runtime error
(or Err) indicating an invalid/placeholder opcode rather than panicking, and add
an explicit match arm or helper check (e.g., is_placeholder_opcode) that logs or
maps placeholder discriminants to that error; additionally, add a validation
step in bytecode loading or TryFrom<u8> conversion for Bytecode to reject
placeholder discriminants early (or emit warnings) and document the retained
placeholder variants in the Bytecode enum comment to clarify they must never be
emitted by codegen.

Comment on lines +1782 to 1784
_ => {
unreachable!("{instruction:?} instruction should not be executed")
}
Copy link
Contributor

@coderabbitai coderabbitai bot Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's see the context around lines 1782-1784 in frame.rs
head -1790 crates/vm/src/frame.rs | tail -20

Repository: RustPython/RustPython

Length of output: 732


🏁 Script executed:

# Find the Instruction enum definition
fd -t f "bytecode.rs" | head -5

Repository: RustPython/RustPython

Length of output: 144


🏁 Script executed:

# Let's also check if there's an instruction definition in frame.rs itself
rg "enum Instruction" crates/vm/src/ -A 20

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Find and read the Instruction enum definition
rg "pub enum Instruction" crates/compiler-core/src/bytecode.rs -A 100

Repository: RustPython/RustPython

Length of output: 2516


🏁 Script executed:

# Get more context around lines 1782-1784 to see the full match statement
sed -n '1700,1790p' crates/vm/src/frame.rs | head -100

Repository: RustPython/RustPython

Length of output: 4048


🏁 Script executed:

# Find the beginning of the match statement that contains lines 1782-1784
# Start from line 1782 and go backwards to find the match statement
sed -n '1600,1782p' crates/vm/src/frame.rs | tail -200

Repository: RustPython/RustPython

Length of output: 8457


🏁 Script executed:

# Search for the function name that contains this code
sed -n '1650,1782p' crates/vm/src/frame.rs | grep -E "fn |match " | head -20

Repository: RustPython/RustPython

Length of output: 119


🏁 Script executed:

# Find the function definition containing this match statement
sed -n '1400,1700p' crates/vm/src/frame.rs | grep -B 50 "match.*instruction" | head -60

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Let's search for the execute or dispatch function that contains this match
rg "fn execute" crates/vm/src/frame.rs -A 5 | head -40

Repository: RustPython/RustPython

Length of output: 1542


🏁 Script executed:

# Find the beginning of execute_instruction function
rg "fn execute_instruction" crates/vm/src/frame.rs -A 500 | head -100

Repository: RustPython/RustPython

Length of output: 4067


🏁 Script executed:

# Count the number of instruction variants and see if the match is exhaustive
rg "bytecode::Instruction::" crates/vm/src/frame.rs | wc -l

Repository: RustPython/RustPython

Length of output: 68


🏁 Script executed:

# Check if there are placeholder instructions and how they're handled
rg "Placeholder" crates/compiler-core/src/bytecode.rs | head -20

Repository: RustPython/RustPython

Length of output: 788


🏁 Script executed:

# Look for comments around the catch-all unreachable to understand its intent
sed -n '1770,1790p' crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 786


🏁 Script executed:

# Get the full list of instruction variants from the enum
rg "^\s+([\w:]+)\s*=" crates/compiler-core/src/bytecode.rs | grep -E "pub enum Instruction" -A 200 | grep "=" | wc -l

Repository: RustPython/RustPython

Length of output: 66


🏁 Script executed:

# Check if Rust compiler would complain about non-exhaustive match
# Look for warnings about unhandled variants
rg "unreachable" crates/vm/src/frame.rs -B 5 -A 2

Repository: RustPython/RustPython

Length of output: 344


🏁 Script executed:

# Let's verify if the Instruction enum is marked as exhaustive
rg "enum Instruction" crates/compiler-core/src/bytecode.rs -B 2 -A 1

Repository: RustPython/RustPython

Length of output: 232


🏁 Script executed:

# Get the complete list of all instruction variants from the enum
rg "pub enum Instruction" crates/compiler-core/src/bytecode.rs -A 500 | grep -E "^\s+\w+.*=" | sed 's/.*Instruction:://' | sed 's/\s*=.*//' | sort > /tmp/enum_variants.txt

# Get all the instruction variants handled in execute_instruction
rg "bytecode::Instruction::" crates/vm/src/frame.rs | sed 's/.*bytecode::Instruction:://' | sed 's/[^a-zA-Z_].*//' | sort | uniq > /tmp/handled_variants.txt

# Compare the two
echo "=== Variants in enum but NOT handled (should be empty or only placeholders) ==="
comm -23 /tmp/enum_variants.txt /tmp/handled_variants.txt

echo ""
echo "=== Total enum variants ==="
wc -l /tmp/enum_variants.txt

echo ""
echo "=== Total handled variants ==="
wc -l /tmp/handled_variants.txt

Repository: RustPython/RustPython

Length of output: 2554


🏁 Script executed:

# Let me get a cleaner list of just the variant names from the enum
rg "pub enum Instruction" crates/compiler-core/src/bytecode.rs -A 500 | grep -E "^\s+[A-Z][a-zA-Z0-9]*(\s*{|$|\()" | awk '{print $1}' | sort | uniq > /tmp/enum_clean.txt

# Get handled variants more carefully
rg "Instruction::[A-Z]" crates/vm/src/frame.rs -o | sed 's/Instruction:://' | sort | uniq > /tmp/handled_clean.txt

echo "=== Unhandled instruction variants ==="
comm -23 /tmp/enum_clean.txt /tmp/handled_clean.txt

echo ""
echo "=== Count ==="
echo "Enum variants: $(wc -l < /tmp/enum_clean.txt)"
echo "Handled variants: $(wc -l < /tmp/handled_clean.txt)"

Repository: RustPython/RustPython

Length of output: 1752


🏁 Script executed:

# Let's also check if there are any placeholder instructions that don't have handlers
rg "// Placeholder" crates/compiler-core/src/bytecode.rs | grep -oE "[A-Z][a-zA-Z0-9]*.*Placeholder" | sed 's/ .*//' > /tmp/placeholders.txt

echo "=== Placeholder instructions ==="
cat /tmp/placeholders.txt

Repository: RustPython/RustPython

Length of output: 379


🏁 Script executed:

# Let me extract instruction handlers more carefully by looking at the actual match arms
rg "bytecode::Instruction::" crates/vm/src/frame.rs | head -100 | sed 's/.*Instruction:://' | sed 's/\s*=>.*//' | sed 's/\s*{.*//' | awk '{print $1}' | sort | uniq

Repository: RustPython/RustPython

Length of output: 1286


🏁 Script executed:

# Count how many match arms are in execute_instruction
rg "bytecode::Instruction::" crates/vm/src/frame.rs | wc -l

Repository: RustPython/RustPython

Length of output: 68


🏁 Script executed:

# Check if there are explicit handlers for the placeholder instructions
rg "Instruction::(Cache|BinaryOpInplaceAddUnicode|BinarySlice|EndFor|ExitInitCheck|InterpreterExit|LoadAssertionError|LoadLocals|PushNull|ReturnGenerator|StoreSlice)" crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Let's verify the total count of instruction variants more accurately
# by extracting just the variant names from the enum definition
sed -n '/pub enum Instruction/,/^}/p' crates/compiler-core/src/bytecode.rs | grep -E '^\s+[A-Z][a-zA-Z0-9]*\s*(\{|=|\()' | wc -l

Repository: RustPython/RustPython

Length of output: 68


🏁 Script executed:

# Check the structure of the enum more carefully
sed -n '/pub enum Instruction/,/^}/p' crates/compiler-core/src/bytecode.rs | tail -50

Repository: RustPython/RustPython

Length of output: 1280


🏁 Script executed:

# Let's verify if Rust's exhaustiveness checking would work here
# by checking if the enum has any special attributes
rg "#\[.*\]" crates/compiler-core/src/bytecode.rs | grep -B 5 "pub enum Instruction" | head -20

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Let's get the exact list of unhandled variants by comparing the handled list with enum
sed -n '/pub enum Instruction/,/^}/p' crates/compiler-core/src/bytecode.rs | grep -E '^\s+[A-Z]' | awk '{print $1}' | sort > /tmp/all_variants.txt

# Get all handled variants from execute_instruction
rg "bytecode::Instruction::" crates/vm/src/frame.rs | sed 's/.*Instruction:://' | sed 's/\s*=>.*//' | sed 's/\s*{.*//' | sed 's/\s*(.*//' | awk '{print $1}' | sort | uniq > /tmp/handled.txt

# Find variants that are not handled
echo "=== Variants NOT explicitly handled ==="
comm -23 /tmp/all_variants.txt /tmp/handled.txt

echo ""
echo "=== Are these all placeholders? ==="
# Check which of the unhandled are marked as placeholders
sed -n '/pub enum Instruction/,/^}/p' crates/compiler-core/src/bytecode.rs | grep -B 1 "Placeholder" | grep -E "^\s+[A-Z]" | awk '{print $1}' | sort > /tmp/placeholders_list.txt
cat /tmp/placeholders_list.txt

Repository: RustPython/RustPython

Length of output: 1504


🏁 Script executed:

# Let's verify that these "unhandled" variants are actually being rejected
# by checking if there's any handling outside of execute_instruction
rg "LoadFast|LoadDeref|StoreFast|StoreDeref|DeleteFast" crates/vm/src/frame.rs | head -20

Repository: RustPython/RustPython

Length of output: 544


🏁 Script executed:

# Check if there's pattern matching on the argument or other routing logic
# that might handle these variants separately
sed -n '1200,1400p' crates/vm/src/frame.rs | grep -E "match|fn " | head -30

Repository: RustPython/RustPython

Length of output: 2371


🏁 Script executed:

# Let's look at the actual match statement more carefully
# and search for specific instruction handlers
sed -n '1300,1500p' crates/vm/src/frame.rs | grep -A 3 "LoadFast\|StoreFast\|DeleteFast"

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

# Let's see if the unhandled variants are indeed in the match statement
# by checking the line count and structure more carefully
rg "bytecode::Instruction::" crates/vm/src/frame.rs | grep -c "=> {"

Repository: RustPython/RustPython

Length of output: 67


🏁 Script executed:

# Get a comprehensive list of what's actually being matched
sed -n '/fn execute_instruction/,/^    fn /p' crates/vm/src/frame.rs | grep "bytecode::Instruction::" | sed 's/.*Instruction:://' | sed 's/\s*=>.*//' | sed 's/\s*{.*//' | awk '{print $1}' | sort | uniq | wc -l

Repository: RustPython/RustPython

Length of output: 68


🏁 Script executed:

# Let's verify the exact list of handled variants with a more precise approach
sed -n '/fn execute_instruction/,/^    fn [a-z]/p' crates/vm/src/frame.rs | grep "bytecode::Instruction::" | sed 's/.*bytecode::Instruction:://' | sed 's/\s*[=({].*//' | sort | uniq > /tmp/actually_handled.txt

cat /tmp/actually_handled.txt | wc -l

echo ""
echo "=== Sample of handled variants ==="
head -20 /tmp/actually_handled.txt

Repository: RustPython/RustPython

Length of output: 366


🏁 Script executed:

# Check if these unhandled variants are truly all placeholders or only used variants
comm -23 /tmp/all_variants.txt /tmp/actually_handled.txt | while read var; do
  grep -q "$var.*Placeholder" <(sed -n '/pub enum Instruction/,/^}/p' crates/compiler-core/src/bytecode.rs) && echo "$var: PLACEHOLDER" || echo "$var: NOT PLACEHOLDER"
done

Repository: RustPython/RustPython

Length of output: 4028


🏁 Script executed:

# Let's verify if the key instructions like LoadFast are actually being handled
# by checking the actual match arms
rg "Instruction::LoadFast\(" crates/vm/src/frame.rs

Repository: RustPython/RustPython

Length of output: 118


🏁 Script executed:

# Let's count the total lines of the match statement to understand its scope
sed -n '/fn execute_instruction/,/^    fn /p' crates/vm/src/frame.rs | wc -l

Repository: RustPython/RustPython

Length of output: 69


🏁 Script executed:

# Let's check the end of execute_instruction to see the catch-all
sed -n '/fn execute_instruction/,/^    fn /p' crates/vm/src/frame.rs | tail -20

Repository: RustPython/RustPython

Length of output: 765


Add compile-time exhaustiveness checking for new instruction variants.

The catch-all pattern with unreachable!() disables Rust's exhaustiveness checking. If new Instruction variants are added to the enum without corresponding match arms here, they will panic at runtime rather than fail at compile time. Consider restructuring to leverage the compiler's exhaustiveness checking—for example, by removing the catch-all if all variants are already handled, or by using a #[non_exhaustive] attribute on the enum to make missing handlers a compile error.

🤖 Prompt for AI Agents
In @crates/vm/src/frame.rs around lines 1782 - 1784, The match currently uses a
catch-all arm with unreachable!() for the Instruction enum which defeats
compile-time exhaustiveness checking; replace the wildcard arm by explicitly
matching every Instruction variant used in this match so the compiler will error
if new variants are added (i.e., remove the `_ => unreachable!()` arm in the
match on `instruction` and add explicit arms for each `Instruction::<Variant>`
handled here), or if the enum is intended to be extensible annotate the enum
definition with `#[non_exhaustive]` and then handle the `_` case with a
documented conversion/error path; ensure you update the match in the
function/method in frame.rs where `instruction` is matched so future additions
to `Instruction` cause a compile-time failure instead of a runtime panic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ShaharNaveh uuuuh, this is totally right. I immediately broken build by forgetting change arm during merge. listing all is better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +21 to +29
@property
def cpython_name(self) -> str:
if self.rust_name == "CopyItem":
return "COPY"
name = self.rust_name
name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
return re.sub(r"(\D)(\d+)$", r"\1_\2", name).upper()
return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", self.rust_name).upper()
return re.sub(r"(?<!^)(?=[A-Z])", "_", self.rust_name).upper()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Fix unreachable code in cpython_name property.

Lines 28-29 are unreachable because line 27 always executes a return statement. This appears to be a copy-paste error during development.

🐛 Remove unreachable code
     @property
     def cpython_name(self) -> str:
         if self.rust_name == "CopyItem":
             return "COPY"
         name = self.rust_name
         name = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name)
         return re.sub(r"(\D)(\d+)$", r"\1_\2", name).upper()
-        return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", self.rust_name).upper()
-        return re.sub(r"(?<!^)(?=[A-Z])", "_", self.rust_name).upper()
🤖 Prompt for AI Agents
In @scripts/generate_opcode_metadata.py around lines 21 - 29, The cpython_name
property contains multiple return statements after the intended two-step
transformation, making the last two returns unreachable; edit the cpython_name
method to keep the special-case for "CopyItem" and the intended conversion
(assign rust_name to name, apply the two regex substitutions and return
name.upper()), and remove the extraneous return lines (the duplicate re.sub
returns) so only the correct transformation and single return remain in the
cpython_name property.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@youknowone youknowone merged commit c2bfdf3 into RustPython:main Jan 9, 2026
13 checks passed
terryluan12 pushed a commit to terryluan12/RustPython that referenced this pull request Jan 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants