-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Pseudo ops #6678
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pseudo ops #6678
Conversation
|
Important Review skippedReview was skipped due to path filters ⛔ Files ignored due to path filters (1)
CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including You can disable this status message by setting the 📝 WalkthroughWalkthroughThis PR unifies LOAD_ATTR_METHOD into LOAD_ATTR by encoding a method flag within the oparg, replaces pseudo Jump instructions with directional JumpForward/JumpBackward variants, and adds keyword argument support for class definitions via CallKw instruction. Changes span compilation, instruction encoding/decoding, runtime execution, and JIT handling. Changes
Sequence DiagramssequenceDiagram
participant Codegen as Codegen
participant IR as IR Transform
participant Bytecode as Bytecode
participant VM as Runtime VM
Codegen->>IR: Class definition with base + kwarg
IR->>IR: convert_pseudo_ops: encode method flag into LOAD_ATTR oparg
IR->>Bytecode: emit LOAD_ATTR with method-encoded oparg
Bytecode->>Bytecode: encode_load_attr_arg(name_idx, is_method)
Bytecode->>VM: Instruction::LoadAttr { idx: encoded_oparg }
VM->>VM: load_attr: decode_load_attr_arg(oparg) → (name_idx, is_method)
alt is_method == true
VM->>VM: push [method_value, self]
else
VM->>VM: push attribute_value
end
sequenceDiagram
participant Codegen as Codegen
participant IR as IR Transform
participant Bytecode as Bytecode
participant VM as Runtime VM
Codegen->>IR: Class call with keyword arguments
IR->>Bytecode: compute kwarg_names tuple from keyword.arg
Bytecode->>Bytecode: emit CallKw with nargs = positional + kwarg_count
Bytecode->>VM: Instruction::CallKw { nargs, kwnames }
VM->>VM: Execute Call with keyword arguments resolved
sequenceDiagram
participant Codegen as Codegen
participant IR as IR Transform
participant Bytecode as Bytecode
participant VM as Runtime VM
Codegen->>IR: Pseudo Jump instruction (direction unknown)
IR->>IR: convert_pseudo_ops: resolve target offset
alt target > current_offset
IR->>Bytecode: emit JumpForward { target }
else
IR->>Bytecode: emit JumpBackward { target }
end
Bytecode->>VM: Real directional jump instruction
VM->>VM: Execute jump to resolved target offset
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using git pull origin pseudo-ops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/compiler-core/src/bytecode.rs (1)
803-811: Add the three new jump instruction variants to thelabel_arg()method.The new
JumpBackward,JumpBackwardNoInterrupt, andJumpForwardvariants store labels in theirtargetfield but are missing from thelabel_arg()method (lines 1740-1754). The requirement at line 587 explicitly states: "if you add a new instruction that stores a Label, make sure to add it in Instruction::label_arg". Without this,label_targets()will not correctly identify these jump targets, breaking jump offset analysis.Add these three variants to the match statement in
label_arg():JumpBackward { target: l } | JumpBackwardNoInterrupt { target: l } | JumpForward { target: l } => Some(*l),crates/codegen/src/ir.rs (1)
153-218: Move pseudo-op normalization beforemax_stackdepth()computation.Currently,
max_stackdepth()(line 157) analyzes stack effects of unrewritten pseudo-ops (LoadAttrMethod,PopBlock), while the actual bytecode emits their normalized forms (LoadAttrwith encoded bit,Nop). Although these forms have equivalent stack effects, analysis should operate on the final instruction set to prevent subtle mismatches if instruction semantics change later.Suggested change: normalize pseudo-ops before
max_stackdepth()impl CodeInfo { pub fn finalize_code( mut self, opts: &crate::compile::CompileOpts, ) -> crate::InternalResult<CodeObject> { if opts.optimize > 0 { self.dce(); } + // Normalize pseudo-ops to their final forms before stack depth analysis. + for block in self + .blocks + .iter_mut() + .filter(|b| b.next != BlockIdx::NULL || !b.instructions.is_empty()) + { + for info in &mut block.instructions { + match info.instr { + Instruction::LoadAttrMethod { idx } => { + let encoded = OpArg((idx.get(info.arg) << 1) | 1); + info.arg = encoded; + info.instr = Instruction::LoadAttr { idx: Arg::marker() }; + } + Instruction::LoadAttr { idx } => { + let encoded = OpArg(idx.get(info.arg) << 1); + info.arg = encoded; + info.instr = Instruction::LoadAttr { idx: Arg::marker() }; + } + Instruction::PopBlock => info.instr = Instruction::Nop, + _ => {} + } + } + } + let max_stackdepth = self.max_stackdepth()?; let cell2arg = self.cell2arg();
🤖 Fix all issues with AI agents
In @crates/codegen/src/ir.rs:
- Around line 199-210: stack_effect() incorrectly assumes LoadAttr always has
method flag 0; when the encoded arg has the method bit set it should return 1
(pop obj, push method+None). Fix by adding centralized helpers in
rustpython_compiler_core::bytecode—e.g., encode_load_attr_arg(name_idx: usize,
is_method: bool) -> OpArg and decode_load_attr_arg(oparg: OpArg) -> (name_idx,
is_method)—then replace the manual encoding in Instruction::LoadAttrMethod /
Instruction::LoadAttr in ir.rs and the decoding in vm::frame.rs and
compiler-core::bytecode.rs to use those helpers, and update stack_effect() to
call decode_load_attr_arg(info.arg) (or equivalent) and return 1 when is_method
is true, otherwise 0.
🧹 Nitpick comments (1)
crates/codegen/src/compile.rs (1)
6215-6225: Nice invariant message; consider pre-allocatingkwarg_namescapacity.
This keeps the same behavior but makes internal failures easier to diagnose.Suggested tweak
- let mut kwarg_names = vec![]; + let mut kwarg_names = Vec::with_capacity(arguments.keywords.len()); for keyword in &arguments.keywords { let name = keyword .arg .as_ref() .expect("has_double_star is false, so arg must be Some"); kwarg_names.push(ConstantData::Str { value: name.as_str().into(), }); self.compile_expression(&keyword.value)?; }
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (4)
crates/codegen/src/compile.rscrates/codegen/src/ir.rscrates/compiler-core/src/bytecode.rscrates/vm/src/frame.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/frame.rscrates/codegen/src/ir.rscrates/compiler-core/src/bytecode.rscrates/codegen/src/compile.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`
Applied to files:
crates/codegen/src/ir.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
crates/codegen/src/ir.rs
🧬 Code graph analysis (3)
crates/codegen/src/ir.rs (1)
crates/compiler-core/src/bytecode.rs (6)
marker(535-537)core(957-957)core(1021-1021)new(34-42)new(540-542)new(1038-1040)
crates/compiler-core/src/bytecode.rs (2)
crates/codegen/src/ir.rs (1)
idx(47-49)crates/codegen/src/compile.rs (1)
name(1208-1210)
crates/codegen/src/compile.rs (2)
crates/vm/src/function/argument.rs (1)
kwarg_names(130-130)crates/vm/src/builtins/genericalias.rs (1)
arguments(66-66)
🔇 Additional comments (7)
crates/compiler-core/src/bytecode.rs (2)
2063-2065: LGTM!The display logic for the new jump instructions is consistent with the existing pattern and correctly uses the
w!macro to format the instruction name and target.
2070-2085: The oparg encoding incrates/codegen/src/ir.rs(lines 200-209) correctly implements the bitfield scheme:
- LoadAttrMethod encodes as
(idx.get(info.arg) << 1) | 1(bit 0 = 1, bits 1+ = name index)- LoadAttr encodes as
idx.get(info.arg) << 1(bit 0 = 0, bits 1+ = name index)This perfectly matches the decoding logic in the display function:
is_method = (encoded & 1) == 1andname_idx = encoded >> 1. The encoding is consistent and correct.crates/vm/src/frame.rs (4)
1085-1096: LGTM!The runtime handling for the new direction-aware jump instructions is straightforward and correct. All three variants simply update the program counter via
self.jump(target.get(arg)), which is consistent with the genericJumpinstruction.
2509-2535: LGTM!The
load_attrimplementation correctly handles the unified LoadAttr encoding:
Bitfield decoding (lines 2510-2513): Matches the display logic in
bytecode.rs- bit 0 for method flag, bits 1+ for name index.Method call path (lines 2516-2528): Properly uses
PyMethod::getand handles both cases:
Function: Pushes[func, parent]to enable method call semanticsAttribute: Pushes[val, NULL]to indicate non-method callRegular attribute path (lines 2529-2533): Simple
get_attrfor non-method access.The stack layout matches the calling convention in
execute_call(lines 2130-2152), which popsself_or_nullthencallableand prependsself_or_nullto arguments when present.
1416-1418: The code is correct as written.PopBlockis a pseudo-op that is reliably converted toNopduring IR generation incrates/codegen/src/ir.rs(lines 212-213), which occurs before bytecode reaches the frame executor. Theunreachable!()macro is appropriately justified and will not trigger at runtime.
1107-1110: Theunreachable!assertion is correct.LoadAttrMethodis indeed converted toLoadAttrduring the compilation's IR conversion phase (incrates/codegen/src/ir.rs), where the pseudo-op is replaced withLoadAttrand the method flag is encoded directly into the instruction argument via bit-shifting. By the time bytecode reaches the VM, allLoadAttrMethodinstructions have been transformed, making this handler safely unreachable.crates/codegen/src/ir.rs (1)
244-282: Verify instruction rewrite consistency; JumpBackwardNoInterrupt appears unused.The
current_offsettracking (line 245) and comparison withtarget_offset(lines 257-266) are correct for offsets measured in finalCodeUnits includingEXTENDED_ARG. The offset increment at line 281 properly tracks emission.Observations on two concerns:
- JumpBackwardNoInterrupt: This variant is defined in bytecode (marked as "Placeholder") but is never emitted during codegen—only
JumpForwardandJumpBackwardare selected. It exists only in the VM execution path (frame.rs). Unless this is a future feature, it appears unused.- info.instr reassignment: The code emits
opbut does not updateinfo.instr. However,generate_exception_table(called after emission) only usesinstr.except_handlerandinstr_size(), not theinstrfield itself. For consistency with the pattern shown in adjacent code (lines 203, 209, 213 whereinfo.instris reassigned), consider updatinginfo.instr = ophere as well, though it is not strictly required for correctness.
66cb68b to
3a318d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/compiler-core/src/bytecode.rs (1)
1755-1769: Control-flow helpers should recognize new jump variants as “label args” and unconditional branches.
label_arg()now includesJumpForward/JumpBackward/..., butunconditional_branch()(and any other CFG helpers) should also treat these as unconditional branches to keep analyses consistent post-rewrite.Proposed update
pub const fn unconditional_branch(&self) -> bool { matches!( self, Jump { .. } + | JumpForward { .. } + | JumpBackward { .. } + | JumpBackwardNoInterrupt { .. } | Continue { .. } | Break { .. } | ReturnValue | ReturnConst { .. } | RaiseVarargs { .. } | Reraise { .. } ) }crates/codegen/src/compile.rs (1)
6215-6233: Explicit nargs computation required forCALL_KWcorrectnessThe
CallKwinstruction nargs must equal the sum of positional and keyword argument counts to maintain stack protocol. Sincearguments.len()semantics for call expressions are ambiguous (it may only count positional args), compute nargs explicitly in this branch.Proposed fix
- let mut kwarg_names = vec![]; + let mut kwarg_names = Vec::with_capacity(arguments.keywords.len()); for keyword in &arguments.keywords { let name = keyword .arg .as_ref() .expect("has_double_star is false, so arg must be Some"); kwarg_names.push(ConstantData::Str { value: name.as_str().into(), }); self.compile_expression(&keyword.value)?; } self.emit_load_const(ConstantData::Tuple { elements: kwarg_names, }); - emit!(self, Instruction::CallKw { nargs: count }); + let positional = additional_positional + .checked_add(u32::try_from(arguments.args.len()).expect("too many positional args")) + .expect("too many positional args"); + let keyword = u32::try_from(arguments.keywords.len()).expect("too many keyword args"); + emit!( + self, + Instruction::CallKw { + nargs: positional.checked_add(keyword).expect("too many arguments") + } + );
🤖 Fix all issues with AI agents
In @crates/compiler-core/src/bytecode.rs:
- Around line 87-99: encode_load_attr_arg shifts name_idx left one bit which
will truncate the high bit for name_idx >= 2^31; add a debug-only guard in
encode_load_attr_arg to assert name_idx fits in 31 bits (e.g., assert name_idx <
(1<<31)) and return/handle error if not, and ensure decode_load_attr_arg remains
symmetric; additionally ensure any serialization/marshal of bytecode is
versioned or invalidated so old serialized code using the previous packing
cannot be mis-decoded (update the bytecode format version or add a compatibility
check when unmarshaling).
🧹 Nitpick comments (4)
crates/compiler-core/src/bytecode.rs (1)
2087-2100: LOAD_ATTR display: consider printing decoded index (optional).
Right now it prints the encoded oparg as the numeric field; that’s correct but a bit opaque when debugging. Optionally print(name_idx, ...)as the numeric part (or include both).crates/codegen/src/ir.rs (1)
244-282: Jump direction selection: clarify equality case + keep direction logic explicit.
Usingtarget_offset > current_offsetmakestarget==currentbecomeJumpBackward; probably fine, but it’s a surprising tie-break. Consider>=(or an explicit comment) to avoid ambiguity.crates/vm/src/frame.rs (2)
1085-1096: New jump opcodes handled uniformly (OK), but “NoInterrupt” currently doesn’t differ.
All three callself.jump(...), butJumpBackwardNoInterruptdoesn’t currently skipvm.check_signals()since that’s done unconditionally at instruction entry. If this opcode is meant to differ, it’ll need plumbing later (fine to defer if intentional).
2509-2533: LOAD_ATTR method-flag path looks correct for the CALL protocol; add bounds hardening (optional).
ThePyMethod::FunctionvsPyMethod::Attributesplit matches the “callable + self_or_null” stack contract. Optional: avoid panicking on malformed bytecode by checkingname_idxbounds and raising aSystemError/RuntimeError.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (6)
.github/workflows/ci.yamlcrates/codegen/src/compile.rscrates/codegen/src/ir.rscrates/compiler-core/src/bytecode.rscrates/jit/src/instructions.rscrates/vm/src/frame.rs
💤 Files with no reviewable changes (1)
- .github/workflows/ci.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/codegen/src/ir.rscrates/jit/src/instructions.rscrates/codegen/src/compile.rscrates/vm/src/frame.rscrates/compiler-core/src/bytecode.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
crates/codegen/src/ir.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
crates/codegen/src/ir.rs
🧬 Code graph analysis (2)
crates/codegen/src/compile.rs (1)
crates/vm/src/function/argument.rs (1)
kwarg_names(130-130)
crates/compiler-core/src/bytecode.rs (2)
crates/codegen/src/ir.rs (1)
idx(47-49)crates/codegen/src/compile.rs (1)
name(1208-1210)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (windows-2025)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run rust tests (windows-2025)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Check Rust code with clippy
- GitHub Check: Run tests under miri
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run snippets and cpython tests on wasm-wasi
🔇 Additional comments (7)
crates/jit/src/instructions.rs (2)
563-569: Jump/JIT emission path looks consistent for JumpForward/JumpBackward.
Routing the new variants through the sameget_or_create_block+jumpemission is the right alignment.
212-219: AddJumpBackwardNoInterruptto unreachable code handling to match other terminating jumps.
JumpBackwardNoInterruptis defined in the bytecode specification and treated as an unconditional terminator in the VM (likeJumpBackward), but it's missing from both the unreachable code match at line 212-219 and theadd_instructionhandler at line 563-569. If the compiler can emit this instruction in JIT-eligible code, the JIT will silently skip it or fail to handle it correctly. Either add it to both match statements alongside the otherJump*variants, or add an assertion/comment explaining why it cannot appear in JIT-compiled code.crates/compiler-core/src/bytecode.rs (1)
2079-2083: Disassembly formatting for new jump opcodes is fine.
This keeps the display path aligned withlabel_arg()and the new instruction set.crates/codegen/src/ir.rs (2)
4-12: Imports for new encoding utilities look appropriate.
No concerns with pullingencode_exception_table/encode_load_attr_arginto this layer.
192-218: No double-encoding risk; safe by design. Thefinalize_codemethod takesmut self, consuming theCodeInfoinstance. EachCodeInfois pushed ontocode_stackonce, popped once, and then destroyed afterfinalize_codereturns. Rust's ownership semantics make it impossible to callfinalize_codetwice on the same instance, preventing any reuse scenario entirely.crates/vm/src/frame.rs (2)
1415-1418: PopBlock unreachable aligns with codegen rewrite.
Assuming pseudo-ops never survive into executed code objects, this is consistent.
1107-1110: LoadAttr now expects encoded oparg — good runtime assertion.The
unreachable!is correct:LoadAttrMethodis a pseudo-op converted toLoadAttrduring bytecode finalization infinalize_code()(crates/codegen/src/ir.rs:199-204), encoding the method flag into the oparg. It cannot reach runtime execution in frame.rs.The primary risk—loading older code objects—is mitigated by magic number validation on deserialization (crates/vm/src/builtins/code.rs:360), which rejects bytecode with mismatched
PYC_MAGIC_NUMBER(currently 2997). Ensure any bytecode encoding changes bump the magic number.
ShaharNaveh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
I think a future improvement would be to extract pseudos to their own enum
3a318d8 to
3139355
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In @crates/compiler-core/src/bytecode.rs:
- Around line 87-99: The encode_load_attr_arg function can silently truncate
when name_idx has its top bit set; add an explicit size invariant check (e.g.,
assert or debug_assert) that name_idx fits in 31 bits before shifting, and
document this invariant in the function comment; keep decode_load_attr_arg
unchanged but ensure the pair of functions' contract/comments state the 31-bit
limit so callers cannot pass larger values.
In @crates/vm/src/frame.rs:
- Around line 1085-1096: The unconditional vm.check_signals() in
execute_instruction() defeats the JUMP_BACKWARD_NO_INTERRUPT semantics; change
the flow so signal checking is skipped for
bytecode::Instruction::JumpBackwardNoInterrupt. Concretely: before calling
vm.check_signals(), peek or match the current instruction (the same instruction
value used in the existing match) and only call vm.check_signals() when the
instruction is not JumpBackwardNoInterrupt; alternatively, move the
vm.check_signals() call into each instruction handler and omit it in the
JumpBackwardNoInterrupt branch (the branch handling
bytecode::Instruction::JumpBackwardNoInterrupt should call
self.jump(target.get(arg)) and return without invoking vm.check_signals()).
Ensure the next instruction that normally checks signals still runs that check
so interrupts are deferred correctly.
🧹 Nitpick comments (5)
crates/codegen/src/compile.rs (1)
6215-6242: ConfirmCALL_KWstack contract + add a compiler snapshot test for keyword calls.This looks consistent with CPython’s
CALL_KWshape (values evaluated left-to-right, then the names tuple, withnargs = positional + keyword_value_count). Two small tweaks would make it safer/clearer:
- Prefer
debug_assert!(keyword.arg.is_some())(or an internal error) overexpect(...)for the invariant on Line 6219-6223.- Consider reusing
sizefromgather_elements(...)as the positional count (instead of recomputing fromarguments.args.len()), since this branch is explicitly the non-unpack path.Also, please add/extend a dis-snapshot-style test to lock in the new emission (e.g.,
f(x=1),obj.f(x=1)) and ensureCALL_KW’snargsmatches runtime expectations.crates/jit/src/instructions.rs (1)
172-201: Potential MSRV break: Rustlet_chainsinif let ... && ....
if let Some(cur) = self.builder.current_block() && cur != target_blockrequireslet_chains(Rust 1.64+). If RustPython’s MSRV is lower, this will fail to compile.Proposed compatibility rewrite
- if let Some(cur) = self.builder.current_block() - && cur != target_block - { + if let Some(cur) = self.builder.current_block() { + if cur != target_block { // Check if the block needs a terminator by examining the last instruction let needs_terminator = match self.builder.func.layout.last_inst(cur) { None => true, // Empty block needs terminator Some(inst) => { // Check if the last instruction is a terminator !self.builder.func.dfg.insts[inst].opcode().is_terminator() } }; if needs_terminator { self.builder.ins().jump(target_block, &[]); } - } + } + }crates/compiler-core/src/bytecode.rs (1)
2082-2103: Disassembly change is OK, but consider printing decodedname_idxfor readability/stability.
Right now it prints the encoded arg (e.g.,name_idx<<1) as the numeric field; that’s a bit surprising in dumps.crates/codegen/src/ir.rs (1)
192-219: Pseudo-op lowering pass is clear; consider nullingarg/targetonPopBlock -> Nopdefensively.
Even ifPopBlockcurrently always hasarg=0/target=NULL, making it explicit avoids future regressions whereNOPaccidentally gets emitted with EXTENDED_ARGs.Defensive tweak
Instruction::PopBlock => { info.instr = Instruction::Nop; + info.arg = OpArg::null(); + info.target = BlockIdx::NULL; }crates/vm/src/frame.rs (1)
2509-2533:load_attr: avoid panics on badname_idx, and usePyMethod::Function.target.
Today this can panic onself.code.names[name_idx as usize], and it ignores thetargetreturned byPyMethod::get.Proposed hardening
fn load_attr(&mut self, vm: &VirtualMachine, oparg: u32) -> FrameResult { let (name_idx, is_method) = bytecode::decode_load_attr_arg(oparg); - let attr_name = self.code.names[name_idx as usize]; + let attr_name = *self + .code + .names + .get(name_idx as usize) + .ok_or_else(|| vm.new_runtime_error("LOAD_ATTR name_idx out of bounds".to_owned()))?; let parent = self.pop_value(); if is_method { // Method call: push [method, self_or_null] let method = PyMethod::get(parent.clone(), attr_name, vm)?; match method { - PyMethod::Function { target: _, func } => { + PyMethod::Function { target, func } => { self.push_value(func); - self.push_value(parent); + self.push_value(target); } PyMethod::Attribute(val) => { self.push_value(val); self.push_null(); } } } else { // Regular attribute access let obj = parent.get_attr(attr_name, vm)?; self.push_value(obj); } Ok(None) }
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (6)
.github/workflows/ci.yamlcrates/codegen/src/compile.rscrates/codegen/src/ir.rscrates/compiler-core/src/bytecode.rscrates/jit/src/instructions.rscrates/vm/src/frame.rs
💤 Files with no reviewable changes (1)
- .github/workflows/ci.yaml
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/codegen/src/compile.rscrates/vm/src/frame.rscrates/compiler-core/src/bytecode.rscrates/codegen/src/ir.rscrates/jit/src/instructions.rs
🧠 Learnings (4)
📓 Common learnings
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`
Applied to files:
crates/codegen/src/ir.rs
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
crates/codegen/src/ir.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
crates/codegen/src/ir.rs
🧬 Code graph analysis (4)
crates/codegen/src/compile.rs (1)
crates/vm/src/function/argument.rs (1)
kwarg_names(130-130)
crates/vm/src/frame.rs (2)
crates/compiler-core/src/bytecode.rs (3)
decode_load_attr_arg(95-99)get(464-470)get(567-569)crates/vm/src/vm/method.rs (1)
get(22-88)
crates/compiler-core/src/bytecode.rs (1)
crates/codegen/src/compile.rs (1)
name(1208-1210)
crates/codegen/src/ir.rs (1)
crates/compiler-core/src/bytecode.rs (7)
encode_load_attr_arg(89-91)marker(549-551)core(971-971)core(1035-1035)new(34-42)new(554-556)new(1052-1054)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: Run rust tests (windows-2025)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (windows-2025)
- GitHub Check: Run snippets and cpython tests on wasm-wasi
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Ensure compilation on various targets
🔇 Additional comments (9)
crates/jit/src/instructions.rs (2)
211-222: Unreachable tracking updated for new jump variants: looks right.
MarkingJumpForward/JumpBackward/JumpBackwardNoInterruptas unreachable block terminators matches the intent and avoids compiling dead code.
564-571: Unified jump emission for new variants: looks consistent.
Routing Jump-like variants through the samejump(target_block)path keeps JIT control-flow handling aligned with VM/codegen.crates/compiler-core/src/bytecode.rs (2)
1753-1796: Control-flow helpers updated for jump variants: good.
AddingJumpForward/JumpBackward/JumpBackwardNoInterrupttolabel_arg()andunconditional_branch()is the right plumbing for CFG-based logic.
1809-1830: No action needed—code is correct as-is.
LoadAttrandLoadAttrMethodremain separate enum variants with their own opcodes and correct stack effects. There is no lowering ofLoadAttrMethodintoLoadAttrwith an encoded method-flag; the two instructions are distinct in the current implementation. Thestack_effectmethod correctly returns0forLoadAttrand1forLoadAttrMethod.Likely an incorrect or invalid review comment.
crates/codegen/src/ir.rs (3)
6-10: Public export changes look fine.
Re-exportingArgandencode_load_attr_argmatches the new encoding-based pipeline.
244-282: Jump direction selection based oncurrent_offsetlooks consistent with “absolute Label” targets.
Nice touch updatingcurrent_offsetbyinstr_size()so EXTENDED_ARG width changes can flip jump direction on recompilation passes.
149-292: Pseudo op conversion is correctly implemented. The code properly convertsLoadAttrMethodtoLoadAttr(with method flag=1) andPopBlocktoNopin the pre-loop conversion block (lines 200–214). Additionally, theJumppseudo-op is converted toJumpForwardorJumpBackwardbased on target offset in the main instruction loop. The frame.rs unreachable!() assertion confirmsLoadAttrMethodshould never reach execution. The clean rebuild advice from learnings remains good practice after bytecode instruction changes, though the conversion logic itself is exhaustive and correct.crates/vm/src/frame.rs (2)
1107-1111: Good invariant assertion, but confirm no untrusted bytecode path can reach this.
LoadAttrMethodbeingunreachable!()is fine iff all bytecode sources (marshal/.pyc/etc.) go through the lowering pass.
1416-1418: Same concern forPopBlock: ok if compile-time eliminated everywhere.
If any legacy code object can still containPopBlock, this becomes a hard panic.
| /// Encode LOAD_ATTR oparg: bit 0 = method flag, bits 1+ = name index. | ||
| #[inline] | ||
| pub const fn encode_load_attr_arg(name_idx: u32, is_method: bool) -> u32 { | ||
| (name_idx << 1) | (is_method as u32) | ||
| } | ||
|
|
||
| /// Decode LOAD_ATTR oparg: returns (name_idx, is_method). | ||
| #[inline] | ||
| pub const fn decode_load_attr_arg(oparg: u32) -> (u32, bool) { | ||
| let is_method = (oparg & 1) == 1; | ||
| let name_idx = oparg >> 1; | ||
| (name_idx, is_method) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOAD_ATTR oparg encoding needs an explicit size invariant (avoid silent truncation).
(name_idx << 1) will discard the top bit if name_idx >= 2^31. If that’s impossible by construction, consider asserting/documenting it.
Possible defensive tweak
pub const fn encode_load_attr_arg(name_idx: u32, is_method: bool) -> u32 {
- (name_idx << 1) | (is_method as u32)
+ // name_idx uses bits 1..; bit0 is reserved for is_method.
+ debug_assert!(name_idx < (1 << 31));
+ (name_idx << 1) | (is_method as u32)
}🤖 Prompt for AI Agents
In @crates/compiler-core/src/bytecode.rs around lines 87 - 99, The
encode_load_attr_arg function can silently truncate when name_idx has its top
bit set; add an explicit size invariant check (e.g., assert or debug_assert)
that name_idx fits in 31 bits before shifting, and document this invariant in
the function comment; keep decode_load_attr_arg unchanged but ensure the pair of
functions' contract/comments state the 31-bit limit so callers cannot pass
larger values.
3139355 to
68a8060
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
crates/codegen/src/compile.rs (2)
3764-3807: Genericclassbuild_class path: ensure keyword validation + add coverage formetaclass=(and friends).In the generic “simple case” you manually emit keyword values + kw-name tuple and then
CallKw. This bypasses the keyword-name validation done incompile_call_helper(e.g.,check_forbidden_name), so generic class defs may behave differently from non-generic ones for edge-case keyword names.Proposed minimal consistency fix
@@ if let Some(arguments) = arguments && !arguments.keywords.is_empty() { let mut kwarg_names = vec![]; for keyword in &arguments.keywords { let name = keyword.arg.as_ref().expect( "keyword argument name must be set (no **kwargs in this branch)", ); + // Keep behavior consistent with compile_call_helper keyword validation. + self.check_forbidden_name(name.as_str(), NameUsage::Store)?; kwarg_names.push(ConstantData::Str { value: name.as_str().into(), }); self.compile_expression(&keyword.value)?; }Also worth adding a regression test that compiles/executes (or at least disassembles) something like:
class C[T](metaclass=Meta): ...class C[T](*bases, metaclass=Meta): ...
to ensure both theCallKwandCallFunctionExbranches behave correctly.
6215-6244:compile_call_helper: CallKw emission looks right; add a debug-assert to prevent future nargs drift.Nice:
Vec::with_capacity, and thechecked_addchain makes overflow failures explicit.One small guardrail: since this branch is only reachable when
unpack == falseandhas_double_star == false,gather_elements(...)’s returnedsizeshould matchadditional_positional + arguments.args.len(); adebug_assert_eq!would catch accidental divergence ifgather_elementsevolves.Suggested guardrail
@@ } else if !arguments.keywords.is_empty() { @@ - let positional = additional_positional + let positional = additional_positional .checked_add(u32::try_from(arguments.args.len()).expect("too many positional args")) .expect("too many positional args"); + debug_assert_eq!(positional, size, "gather_elements size drift (non-unpack path)"); let keyword_count = u32::try_from(arguments.keywords.len()).expect("too many keyword args");
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (5)
crates/codegen/src/compile.rscrates/codegen/src/ir.rscrates/compiler-core/src/bytecode.rscrates/jit/src/instructions.rscrates/vm/src/frame.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/jit/src/instructions.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/codegen/src/ir.rscrates/compiler-core/src/bytecode.rscrates/vm/src/frame.rscrates/codegen/src/compile.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.
📚 Learning: 2025-08-26T05:20:54.540Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6110
File: vm/src/frame.rs:1311-1316
Timestamp: 2025-08-26T05:20:54.540Z
Learning: In the RustPython codebase, only certain builtin types should be marked with the SEQUENCE flag for pattern matching. List and tuple are sequences, but bytes, bytearray, and range are not considered sequences in this context, even though they may implement sequence-like protocols.
Applied to files:
crates/codegen/src/ir.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
crates/codegen/src/ir.rs
🧬 Code graph analysis (3)
crates/codegen/src/ir.rs (1)
crates/compiler-core/src/bytecode.rs (8)
encode_exception_table(47-59)encode_load_attr_arg(89-91)marker(549-551)core(971-971)core(1035-1035)new(34-42)new(554-556)new(1052-1054)
crates/vm/src/frame.rs (3)
crates/compiler-core/src/bytecode.rs (3)
decode_load_attr_arg(95-99)get(464-470)get(567-569)crates/vm/src/vm/method.rs (1)
get(22-88)crates/vm/src/object/core.rs (1)
get(430-432)
crates/codegen/src/compile.rs (2)
crates/vm/src/function/argument.rs (1)
kwarg_names(130-130)crates/compiler-core/src/bytecode.rs (3)
try_from(979-1039)try_from(1060-1065)try_from(1074-1080)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: Run snippets and cpython tests (windows-2025)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Check the WASM package and demo
- GitHub Check: Run rust tests (windows-2025)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Check Rust code with clippy
🔇 Additional comments (12)
crates/compiler-core/src/bytecode.rs (5)
87-99: LGTM! Clean encode/decode helpers for LOAD_ATTR oparg.The encoding scheme is straightforward and correct: bit 0 stores the method flag, and the remaining bits store the name index. The
const fnannotation and#[inline]attributes are appropriate for these small, hot-path functions.
1756-1759: LGTM! New jump variants properly integrated into label_arg.The new
JumpBackward,JumpBackwardNoInterrupt, andJumpForwardinstructions are correctly added to the pattern matching forlabel_arg(), ensuring they are recognized as label-bearing instructions.
1786-1788: LGTM! New jump variants correctly marked as unconditional branches.Adding
JumpForward,JumpBackward, andJumpBackwardNoInterrupttounconditional_branch()ensures proper control flow analysis and dead code elimination.
2083-2085: LGTM! Display formatting added for new jump instructions.The disassembly output will now correctly show the new jump variants with their targets.
2090-2103: LGTM! LoadAttr display now shows decoded method flag.The display logic correctly decodes the oparg to show both the name and method flag, which improves disassembly readability for debugging.
crates/codegen/src/ir.rs (3)
7-9: LGTM! Required imports added.The
Argandencode_load_attr_argimports are necessary for the pseudo-op conversion logic.
192-218: LGTM! Clean pseudo-op conversion pass.The preprocessing correctly:
- Converts
LoadAttrMethodtoLoadAttrwith method flag encoded- Encodes regular
LoadAttrwith method flag = false- Converts
PopBlocktoNopThis ensures pseudo-ops are eliminated before final emission.
244-281: LGTM! Jump direction resolution based on offset comparison.The logic correctly determines jump direction by comparing
target_offsetwithcurrent_offset:
- Forward jump when
target_offset > current_offset- Backward jump otherwise
The
current_offsetis properly incremented after each instruction to track position accurately.crates/vm/src/frame.rs (4)
1085-1096: LGTM! New jump instruction handlers correctly delegate to jump().All three new jump variants (
JumpForward,JumpBackward,JumpBackwardNoInterrupt) correctly use the existingjump()method. The implementation is consistent with the existingJumphandler.Note:
JumpBackwardNoInterruptcurrently behaves identically toJumpBackward. If interrupt checking is needed in the future (e.g., for generator resumption without signal checks), this would need to be differentiated.
1108-1110: LGTM! LoadAttrMethod correctly marked as unreachable.Since
LoadAttrMethodis converted toLoadAttrduring compilation inir.rs, encountering it at runtime indicates a compiler bug. Theunreachable!with a descriptive message is appropriate.
1416-1418: LGTM! PopBlock correctly marked as unreachable.Since
PopBlockis converted toNopduring compilation, reaching this arm at runtime should never happen.
2509-2533: LGTM! Unified load_attr implementation with method flag decoding.The implementation correctly:
- Decodes
opargto extractname_idxandis_methodflag- For method calls (
is_method=true): pushes[func, self]forPyMethod::Function, or[attr, NULL]forPyMethod::Attribute- For regular attribute access: pushes just the attribute value
The stack effects match the bytecode definitions in
bytecode.rs(LoadAttr: 0, LoadAttrMethod: +1).
966e027 to
68a8060
Compare
* better expect * pseudo instruction * Fix CallKw narg * Auto-format: cargo fmt --all --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.