-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Updated the re library + test #6648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ed SreFlag::TEMPLATE reference in `sre.rs`
📝 WalkthroughWalkthroughThe PR updates SRE constants by removing the TEMPLATE flag definition and updating the SRE_MAGIC version number. It introduces a new Python script to auto-generate Rust constants from Python source files, replacing manual generation and standardizing tooling. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using git pull origin update_re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (7)
crates/jit/src/instructions.rs (1)
219-228: Consider extracting the terminator check into a helper method.The terminator check logic at lines 221-224 duplicates the pattern at lines 180-186. Extracting this into a small helper would improve maintainability.
🔎 Proposed refactor
+ fn block_needs_terminator(&self, block: Block) -> bool { + match self.builder.func.layout.last_inst(block) { + None => true, + Some(inst) => !self.builder.func.dfg.insts[inst].opcode().is_terminator(), + } + }Then use
self.block_needs_terminator(cur)in both places.scripts/generate_sre_constants.py (2)
90-93: Consider adding error handling for file operations.If
infiledoesn't exist or is unreadable, the script will fail with a raw exception. Adding explicit error handling would provide clearer feedback when running the generator.🔎 Proposed improvement
def main( infile="Lib/re/_constants.py", outfile_constants="crates/sre_engine/src/constants.rs", ): ns = {} - with open(infile) as fp: - code = fp.read() + try: + with open(infile) as fp: + code = fp.read() + except FileNotFoundError: + print(f"Error: Input file '{infile}' not found.", file=sys.stderr) + sys.exit(1) exec(code, ns)
136-139: Consider usingargparsefor better CLI ergonomics.The current approach using
*sys.argv[1:]works but doesn't provide usage help or argument validation. For a script that may be run manually,argparsewould be more user-friendly.🔎 Proposed improvement
if __name__ == "__main__": - import sys - - main(*sys.argv[1:]) + import argparse + parser = argparse.ArgumentParser(description="Generate SRE constants for RustPython") + parser.add_argument("infile", nargs="?", default="Lib/re/_constants.py", + help="Input Python constants file") + parser.add_argument("outfile_constants", nargs="?", + default="crates/sre_engine/src/constants.rs", + help="Output Rust constants file") + args = parser.parse_args() + main(args.infile, args.outfile_constants)crates/compiler-core/src/varint.rs (1)
104-114: Consider adding round-trip verification to tests.The current test verifies expected byte lengths but doesn't read back values to confirm encoding correctness. Adding read verification would strengthen confidence in the implementation.
🔎 Suggested test enhancement
#[test] fn test_write_read_varint() { let mut buf = Vec::new(); write_varint(&mut buf, 0); write_varint(&mut buf, 63); write_varint(&mut buf, 64); write_varint(&mut buf, 4095); // Values: 0, 63, 64, 4095 assert_eq!(buf.len(), 1 + 1 + 2 + 2); + + // Verify round-trip for single values + let mut buf = Vec::new(); + for val in [0u32, 1, 63, 64, 127, 128, 4095, u32::MAX] { + buf.clear(); + write_varint(&mut buf, val); + let mut pos = 0; + assert_eq!(read_varint(&buf, &mut pos), Some(val), "round-trip failed for {val}"); + } }scripts/generate_opcode_metadata.py (1)
14-27: Make enum parsing and CPython-compat checks more robustThe generator looks good overall, but two robustness gaps stand out:
re.search(r"pub enum Instruction \{(.+?)\n\}", ...)is quite format-sensitive. Any change such as attributes on the enum, different spacing around{, or an indented closing}will cause the match to fail or, worse, capture the wrong body. Consider anchoring on^/$withre.MULTILINEand allowing flexible whitespace, or (better) scanning braces to find the enum body rather than relying on a single regex.The “expected” opcode indices at the bottom are only printed; a drift between
Instructionordering and the CPython 3.13 expectations won’t fail the generation step. It would be safer to turn those into hard checks (raising if any key opcode index doesn’t match), so_opcode_metadata.pyis never silently generated with inconsistent metadata.Example of the second point:
assert find_opcode("CACHE") == 0 assert find_opcode("WITH_EXCEPT_START") == 44 # ...etc for a small set of golden opcodes...so a mismatch fails fast rather than being buried in logs.
Also, remember that whenever
Instruction’s order changes, you’ll want a full clean rebuild (including removingrustpython-*build artifacts and stray.pycfiles) so the bytecode/opcode metadata stays consistent across the toolchain. Based on learnings, ...Also applies to: 188-227, 234-246
crates/compiler-core/src/bytecode.rs (1)
17-43: Tighten invariants around exception table entry encodingThe new exception table machinery looks good and the tests cover the happy paths, but there are two spots where adding explicit invariants would make failures easier to diagnose:
- Depth truncation to
u16
ExceptionTableEntry.depthisu16, while stack depths elsewhere are tracked asu32. In bothExceptionTableEntry::newand the callers (e.g. codegen),depthis passed asu32and then cast tou16:pub struct ExceptionTableEntry { pub depth: u16, // ... } pub fn new(start: u32, end: u32, target: u32, depth: u16, push_lasti: bool) -> Self { ... }Given
max_stackdepthisu32, it would be safer to assert thatdepth <= u16::MAXat the point you construct entries (or in codegen before callingnew), instead of silently truncating on overflow. Even adebug_assert!would help catch impossible cases in development.
- Ensuring
end >= start
encode_exception_tablecomputessizeviaend.saturating_sub(start). If a bogus entry ever slips through withend < start, this will quietly encode a zero-length range and make the handler effectively unreachable:let size = entry.end.saturating_sub(entry.start);Consider either asserting
end >= startwhen constructingExceptionTableEntry, or replacingsaturating_subwith a checked subtraction that panics or returns an error on invalid input. This makes malformed tables fail fast rather than degrading into “no handler found” behavior later.[ suggest_recommended_refactor ]
Given how tightly this format is coupled to the
Instructionlayout and the generated_opcode_metadata.py, it’s also worth doing a full clean build (removing stalerustpython-*build artifacts and.pycfiles) whenever you touch the opcode/exception-table definitions, to avoid mismatches between compiled bytecode and the metadata. Based on learnings, ...Also applies to: 45-59, 61-85, 2367-2435
crates/vm/src/frame.rs (1)
464-532: Confirm delegatedthrow()path produces expected tracebacksIn the delegated
throw()path (gen_throwwhenyield_from_target()finds a receiver), errors from the inner coroutine/iterator are handled via:
- forwarding the
throw, and- on error, pushing
Noneand callingunwind_blocksdirectly withUnwindReason::Raising { exception: err }, then either resumingrunor propagatingerr.Unlike the non-delegated
throw_herepath, this branch doesn’t go throughhandle_exception, so it doesn’t explicitly add a traceback frame or callvm.contextualize_exceptionfor the outer generator.This might be exactly what you want (reusing the inner traceback without re-chaining), but it’s subtle. It would be good to double-check—ideally via tests comparing against CPython—that:
g.throw(...)into ayield fromchain still yields the expected traceback frames (both inner and outer frames where appropriate), and- you don’t accidentally double-chain
__context__compared to the non-delegated raise path.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (52)
Lib/_opcode_metadata.pyis excluded by!Lib/**Lib/importlib/_bootstrap_external.pyis excluded by!Lib/**Lib/pkgutil.pyis excluded by!Lib/**Lib/platform.pyis excluded by!Lib/**Lib/plistlib.pyis excluded by!Lib/**Lib/re/__init__.pyis excluded by!Lib/**Lib/re/_casefix.pyis excluded by!Lib/**Lib/re/_compiler.pyis excluded by!Lib/**Lib/re/_constants.pyis excluded by!Lib/**Lib/re/_parser.pyis excluded by!Lib/**Lib/sre_constants.pyis excluded by!Lib/**Lib/test/_test_multiprocessing.pyis excluded by!Lib/**Lib/test/test__opcode.pyis excluded by!Lib/**Lib/test/test_asyncgen.pyis excluded by!Lib/**Lib/test/test_compile.pyis excluded by!Lib/**Lib/test/test_contextlib.pyis excluded by!Lib/**Lib/test/test_contextlib_async.pyis excluded by!Lib/**Lib/test/test_dis.pyis excluded by!Lib/**Lib/test/test_exceptions.pyis excluded by!Lib/**Lib/test/test_generators.pyis excluded by!Lib/**Lib/test/test_import/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/circular_imports/import_cycle.pyis excluded by!Lib/**Lib/test/test_import/data/circular_imports/singlephase.pyis excluded by!Lib/**Lib/test/test_import/data/double_const.pyis excluded by!Lib/**Lib/test/test_import/data/package3/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/package3/submodule.pyis excluded by!Lib/**Lib/test/test_import/data/package4/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/package4/submodule.pyis excluded by!Lib/**Lib/test/test_importlib/test_util.pyis excluded by!Lib/**Lib/test/test_patma.pyis excluded by!Lib/**Lib/test/test_pkgutil.pyis excluded by!Lib/**Lib/test/test_platform.pyis excluded by!Lib/**Lib/test/test_plistlib.pyis excluded by!Lib/**Lib/test/test_re.pyis excluded by!Lib/**Lib/test/test_traceback.pyis excluded by!Lib/**Lib/test/test_unittest/test_async_case.pyis excluded by!Lib/**Lib/test/test_unittest/test_discovery.pyis excluded by!Lib/**Lib/test/test_unittest/test_loader.pyis excluded by!Lib/**Lib/test/test_unittest/test_program.pyis excluded by!Lib/**Lib/test/test_unittest/test_runner.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testasync.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testhelpers.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testpatch.pyis excluded by!Lib/**Lib/test/test_weakref.pyis excluded by!Lib/**Lib/token.pyis excluded by!Lib/**Lib/unittest/__init__.pyis excluded by!Lib/**Lib/unittest/async_case.pyis excluded by!Lib/**Lib/unittest/loader.pyis excluded by!Lib/**Lib/unittest/main.pyis excluded by!Lib/**Lib/weakref.pyis excluded by!Lib/**crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snapcrates/codegen/src/snapshots/rustpython_compiler_core__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (20)
.cspell.dict/cpython.txtcrates/codegen/src/compile.rscrates/codegen/src/error.rscrates/codegen/src/ir.rscrates/codegen/src/symboltable.rscrates/compiler-core/src/bytecode.rscrates/compiler-core/src/lib.rscrates/compiler-core/src/varint.rscrates/jit/src/instructions.rscrates/sre_engine/src/constants.rscrates/stdlib/src/opcode.rscrates/vm/src/builtins/asyncgenerator.rscrates/vm/src/builtins/code.rscrates/vm/src/coroutine.rscrates/vm/src/frame.rscrates/vm/src/stdlib/sre.rscrates/vm/src/version.rscrates/vm/src/vm/mod.rsscripts/generate_opcode_metadata.pyscripts/generate_sre_constants.py
💤 Files with no reviewable changes (1)
- crates/vm/src/stdlib/sre.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/vm/mod.rscrates/vm/src/coroutine.rscrates/jit/src/instructions.rscrates/compiler-core/src/lib.rscrates/vm/src/builtins/code.rscrates/codegen/src/error.rscrates/codegen/src/ir.rscrates/compiler-core/src/varint.rscrates/vm/src/version.rscrates/sre_engine/src/constants.rscrates/stdlib/src/opcode.rscrates/codegen/src/symboltable.rscrates/compiler-core/src/bytecode.rscrates/vm/src/frame.rscrates/vm/src/builtins/asyncgenerator.rs
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code
Files:
scripts/generate_sre_constants.pyscripts/generate_opcode_metadata.py
🧠 Learnings (2)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.
Applied to files:
crates/vm/src/vm/mod.rscrates/codegen/src/ir.rscrates/compiler-core/src/bytecode.rscrates/vm/src/frame.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`
Applied to files:
scripts/generate_opcode_metadata.py
🧬 Code graph analysis (5)
crates/codegen/src/ir.rs (2)
crates/compiler-core/src/bytecode.rs (4)
encode_exception_table(47-59)new(34-42)new(540-542)new(1150-1152)crates/compiler-core/src/varint.rs (2)
write_signed_varint(22-30)write_varint(9-17)
crates/compiler-core/src/varint.rs (1)
crates/vm/src/builtins/code.rs (1)
read_varint(1078-1096)
crates/stdlib/src/opcode.rs (2)
crates/compiler-core/src/bytecode.rs (3)
try_from(1131-1137)try_from(1158-1163)try_from(1172-1178)crates/stdlib/src/array.rs (2)
try_from(604-607)try_from(1468-1486)
crates/compiler-core/src/bytecode.rs (1)
crates/compiler-core/src/varint.rs (4)
read_varint(74-98)read_varint_with_start(45-70)write_varint(9-17)write_varint_with_start(34-41)
scripts/generate_opcode_metadata.py (1)
crates/compiler-core/src/bytecode.rs (3)
split(424-433)get(450-456)get(553-555)
🔇 Additional comments (26)
.cspell.dict/cpython.txt (1)
17-17: LGTM!The
CLASSDEREFentry is correctly positioned in alphabetical order and is a valid CPython term that should be present in the spell-check dictionary. This supports the PR's enhanced bytecode and variable handling changes.crates/jit/src/instructions.rs (2)
175-190: LGTM!The terminator check before emitting fallthrough jumps is correct. This prevents emitting invalid IR by checking if the block already has a terminator instruction before adding an unconditional jump.
208-216: LGTM!Including
Jumpin the list of instructions that mark subsequent code as unreachable is correct. Unconditional jumps transfer control elsewhere, making any following instructions dead code that should be skipped.crates/vm/src/builtins/asyncgenerator.rs (4)
11-11: LGTM!The
Destructorimport is correctly added to support the new destructor implementation forPyAsyncGen.
35-35: LGTM!Adding
Destructorto thewith(...)list correctly enables the destructor path forPyAsyncGen, aligning with Python's async generator finalization semantics.
76-86: LGTM!Comment updates improve clarity. The error-ignoring behavior correctly matches CPython's
PyErr_WriteUnraisablesemantics for finalizer exceptions.
647-658: Implementation looks correct.The destructor properly checks if the generator is already closed before calling the finalizer, avoiding unnecessary work during GC.
Note: The AI summary mentions a "duplicate Destructor impl block at the bottom of the file," but only this single implementation exists. This appears to be an inconsistency in the summary.
crates/sre_engine/src/constants.rs (1)
1-128: LGTM - Generated file looks correct.The auto-generated constants align with CPython's
Lib/re/_constants.py. The updatedSRE_MAGICvalue (20230612), new enums (SreAtCode,SreCatCode), and removal of theTEMPLATEflag fromSreFlagreflect the upstream CPython 3.11+ changes.scripts/generate_sre_constants.py (3)
7-16: Clean update-file pattern.The
update_filefunction correctly avoids unnecessary writes by comparing content first, which helps avoid triggering unnecessary rebuilds in build systems.
60-83: Bitflags generation correctly preserves original values.Unlike the enum generation, the bitflags function correctly uses the actual values from the dictionary (
valuefrom the tuple), ensuring bit positions are preserved accurately.
36-57: Docstring is accurate; enum value assignment is working correctly.The parameter
dis correctly documented as(list)in the docstring. The sequential enum value assignment (0, 1, 2...) based on sorted order is intentional and works correctly because OPCODES, ATCODES, and CHCODES are already defined in alphabetical order. The generated Rust enums have correct sequential values.crates/compiler-core/src/lib.rs (1)
10-10: LGTM!The new
varintmodule is appropriately added to the public API, enabling exception table and line number table encoding utilities to be consumed by dependent crates likecodegen.crates/vm/src/builtins/code.rs (1)
1003-1028: LGTM!The three-way lookup correctly implements CPython's variable name resolution semantics, properly indexing into varnames, cellvars, and freevars with correctly adjusted offsets. This aligns with how bytecode instructions like
LOAD_FASTandLOAD_DEREFreference variables by combined index.crates/vm/src/coroutine.rs (2)
84-106: LGTM!The exception state swap mechanism correctly isolates generator exception context. The pattern of capturing the generator's saved exception before frame entry, replacing the None pushed by
with_frame, and saving the state back before exit properly maintains exception chain semantics across yields.
174-178: LGTM!The guard for unstarted generators (
lasti() == 0) correctly avoids unnecessaryGeneratorExitthrow handling, matching CPython's behavior where closing a never-started generator is effectively a no-op.crates/codegen/src/symboltable.rs (3)
58-61: LGTM!The
comp_inlinedfield correctly tracks PEP 709 comprehension inlining state, enabling the symbol table to mark non-generator comprehensions for potential inlining optimizations.
352-381: LGTM!The symbol merging logic correctly implements PEP 709 semantics:
BOUND_FLAGSproperly identifies symbols that are bound within the comprehension- Skipping
.0avoids exposing the internal iteration parameter- Only bound symbols (not Free references to outer scope) are merged
- Merged symbols are correctly marked as
Localin the parent scope
1503-1516: LGTM!The inlining decision correctly follows CPython's PEP 709 logic: comprehensions are inlined when they are not generators and the parent scope cannot see a class scope. The
saturating_sub(2)safely handles edge cases where the tables stack is shallow.crates/compiler-core/src/varint.rs (2)
9-17: LGTM!The 6-bit chunk encoding with 0x40 continuation flag is correctly implemented and matches the format used in
LineTableReader::read_varintincode.rs.
22-30: LGTM!The zig-zag encoding correctly handles signed integers, including
i32::MINviawrapping_subwhich avoids overflow on negation.crates/codegen/src/error.rs (2)
79-81: LGTM!The new async error variants appropriately extend error handling for async constructs that appear outside async functions.
119-126: LGTM!The error messages are consistent with CPython's error reporting for these invalid async construct scenarios.
crates/vm/src/vm/mod.rs (2)
523-529: LGTM!The per-frame exception context isolation correctly prevents exception state leakage between function calls. This is essential for proper exception chaining semantics where each frame maintains its own exception context.
852-871: LGTM!The Floyd's cycle detection algorithm correctly guards against infinite loops when traversing exception context chains. The slow pointer advances every other iteration via the toggle, which properly implements the tortoise-and-hare pattern for detecting pre-existing cycles.
crates/codegen/src/ir.rs (1)
86-105: Exception handler metadata and table generation look internally consistentThe way
ExceptHandlerInfois threaded throughInstructionInfo, consumed inmax_stackdepth, and then encoded viagenerate_exception_tablelines up well:
handler_depthinmax_stackdepthmatches the documented stack layout aroundPUSH_EXC_INFO(base depth + optional lasti + exc, thenPUSH_EXC_INFOadds one more).instr.arg.instr_size()is used both for stack-depth accounting and for computinginstr_indexingenerate_exception_table, so the table’s(start, end, target)indices are in the same CodeUnit units thatlastiandframe.rsuse.- The iteration guard in
max_stackdepthgives a clear failure mode (InternalError::StackOverflow) instead of silently underestimating depth.I don’t see correctness issues here; this integrates cleanly with the new exception-table-driven flow.
Also applies to: 325-343, 375-418, 603-676
crates/vm/src/frame.rs (1)
427-504: Yield‑from / async send and cleanup flow matches documented stack layoutsThe new control-flow pieces around delegation look internally consistent:
yield_from_target’s detection (SEND atlasti, or RESUME preceded byYIELD_VALUEwitharg >= 1and a non-empty stack) matches the comment and the new meaning of the YIELD_VALUE oparg.Send/EndSendimplement the documented stack transforms:
Send:(receiver, value) -> (receiver, retval)or(receiver, stop_value)plus jump, with net stack effect 0.EndSend:(receiver, value) -> (value), popping the receiver and leaving the final value.CleanupThrow’s behavior for StopIteration vs other exceptions ((sub_iter, last_sent_val, exc)to(None, value)or re-raise) aligns with how the stack depth is modeled inInstruction::stack_effectand with the comments about handler depth in codegen.Given those cross-checks, the stack invariants for the new delegation opcodes look solid.
Also applies to: 1008-1036, 1683-1764
|
@terryluan12 This PR is including lots of unrelated previously merged commits. please use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/codegen/src/ir.rs (2)
483-676: Add explicit stack depth bounds check to prevent silent truncationThe cast
curr_handler.stack_depth as u16will silently truncate ifmax_stackdepth()ever exceedsu16::MAX(65535), leading to incorrect exception handler unwinding at runtime. SinceExceptionTableEntry::new()takes au16for the depth parameter, add an explicit check either inmax_stackdepth()or here ingenerate_exception_table()to validate the computed depth doesn't exceed the bounds, returning anInternalErrorif it does.Regarding
block_to_index: the current implementation correctly maintains coupling by recomputing it on each loop iteration within the EXTENDED_ARG fixup loop and using it only after the loop terminates, ensuring it reflects final instruction sizes. However, document this tight coupling more explicitly if the instruction array layout changes in the future.
86-106: Address silent truncation when casting stack_depth to u16 in exception table entriesThe
handler_depthformula and stack-depth semantics are correct. The code properly computes handler entry depth ashandler.stack_depth + 1 + preserve_lasti, matching CPython 3.11's exception unwinding semantics: the exception table stores the minimal stack depth to restore before pushing the exception and optional lasti.However, three places (lines 639, 652, 670) silently cast
curr_handler.stack_depthfromu32tou16without overflow checking. If stack depth exceedsu16::MAX(65535), the value truncates, corrupting the exception table. While rare, deeply nested code could trigger this. Add a bounds check:if curr_handler.stack_depth > u16::MAX as u32 { return Err(InternalError::StackOverflow); }The iteration guard of
blocks.len() * 100is appropriately conservative; the algorithm is monotonic and converges quickly for normal code.
♻️ Duplicate comments (2)
crates/vm/src/version.rs (1)
132-132: PYC_MAGIC_NUMBER is incorrect for Python 3.13—must be 3571, not 2997.The Python 3.13 bytecode magic number is 3571 per CPython's
_bootstrap_external.py. Using 2997 breaks .pyc compatibility and validation against Python 3.13 runtime.🔎 Fix to use correct Python 3.13 magic number
-pub const PYC_MAGIC_NUMBER: u16 = 2997; +pub const PYC_MAGIC_NUMBER: u16 = 3571;crates/stdlib/src/opcode.rs (1)
49-53: Fix the off-by-one error inhas_arg(duplicate issue).This is the same issue previously flagged: line 52 uses
opcode > Self::HAVE_ARGUMENT, which incorrectly treats opcode 44 (WITH_EXCEPT_START) as not having an argument. CPython'sHAVE_ARGUMENT = 44means opcodes>= 44carry arguments.Change the comparison to
>=:pub fn has_arg(opcode: i32) -> bool { - Self::is_valid(opcode) && opcode > Self::HAVE_ARGUMENT + Self::is_valid(opcode) && opcode >= Self::HAVE_ARGUMENT }Based on past review comments, this off-by-one was previously identified but not yet fixed.
🧹 Nitpick comments (18)
crates/jit/src/instructions.rs (1)
219-228: Consider extracting duplicate terminator-check logic into a helper.The
needs_terminatorcomputation at lines 221-223 is identical to lines 180-186. Extracting this into a small helper method would improve maintainability.🔎 Proposed helper extraction
+ fn block_needs_terminator(&self, block: Block) -> bool { + match self.builder.func.layout.last_inst(block) { + None => true, + Some(inst) => !self.builder.func.dfg.insts[inst].opcode().is_terminator(), + } + }Then replace both occurrences:
- let needs_terminator = match self.builder.func.layout.last_inst(cur) { - None => true, // Empty block needs terminator - Some(inst) => { - // Check if the last instruction is a terminator - !self.builder.func.dfg.insts[inst].opcode().is_terminator() - } - }; - if needs_terminator { + if self.block_needs_terminator(cur) { self.builder.ins().jump(target_block, &[]); }- let needs_terminator = match self.builder.func.layout.last_inst(cur) { - None => true, - Some(inst) => !self.builder.func.dfg.insts[inst].opcode().is_terminator(), - }; - if needs_terminator { + if self.block_needs_terminator(cur) { self.builder.ins().trap(TrapCode::user(0).unwrap()); }crates/sre_engine/src/constants.rs (2)
108-120: Inconsistent indentation inside bitflags! macro.Line 109's
#[derive(...)]attribute has extra leading spaces compared to the macro body. Since this file is auto-generated, this should be fixed in the generator script.🔎 Proposed fix
bitflags! { -#[derive(Debug, PartialEq, Eq, Clone, Copy)] + #[derive(Debug, PartialEq, Eq, Clone, Copy)] pub struct SreFlag: u16 {
122-128: SreInfo missing derive attributes for consistency.
SreInfolacks#[derive(Debug, PartialEq, Eq, Clone, Copy)]thatSreFlaghas. This could cause issues if these traits are needed elsewhere (e.g., for debugging or comparisons). Since this is auto-generated, consider updating the generator script.🔎 Proposed fix
bitflags! { + #[derive(Debug, PartialEq, Eq, Clone, Copy)] pub struct SreInfo: u32 { const PREFIX = 1; const LITERAL = 2; const CHARSET = 4; } }README.md (1)
69-87: Shell snippet style triggers markdownlint MD014 (optional).The new
venvblock uses$prompts in ash-tagged fence without showing command output, which markdownlint flags (MD014). If you care about a clean lint run, consider either dropping the$prefixes here or switching this block to match the existingbashstyle used earlier in the README.crates/vm/src/frame.rs (3)
347-417: Exception/traceback handling refactor looks sound; double‑check interaction with VM exception stack.The new
is_reraisedetection, conditional traceback augmentation, andhandle_exception+unwind_blocksflow are coherent:
is_reraisecorrectly classifiesRaiseKind::BareRaise,RaiseKind::ReraiseFromStack, andInstruction::Reraiseto avoid adding duplicate traceback entries and re‑chaining on re‑raise.- For non‑reraise, you add a single traceback entry using
code.locations[idx]and storelastiin byte units (idx * 2), which matches the later restoration logic (tb.lasti / 2).- On failure inside
unwind_blocks, restoringlastifrom the traceback for this frame before returning the error helps keepframe.f_linenoconsistent with the final traceback.The only subtlety that’s not obvious from this file is how
vm.contextualize_exceptionandvm.current_exceptioninteract withPushExcInfo(which expects to see the previous active exception incurrent_exception). It would be good to confirm thatcontextualize_exceptiondoes not overwrite the “previous” exception in a way that breaksprev_excchaining for nested handlers.
1008-1036:GetYieldFromIterbehavior matches CPython’s GET_YIELD_FROM_ITER design.The coroutine/generator/iterator split:
- Requiring
IS_COROUTINEfor yielding from acoroutine_typeand erroring otherwise.- Allowing direct delegation to
generator_type.- Falling back to
iter()for other objects.is the expected behavior for GET_YIELD_FROM_ITER. Just make sure
CodeFlags::IS_COROUTINEreally covers all coroutine/async‑generator cases you intend to gate here; if async generators use a different flag, they could be erroneously rejected.
782-792: New stack underflow checks are useful but could be routed throughfatal()for consistency.Both
CopyItemandpop_multiplenow:
- Detect attempts to read/pull more stack entries than available.
- Emit detailed diagnostics (stack length, opcode,
lasti, code object name, source path) and panic.That’s a big improvement in debuggability for internal compiler/VM bugs. If you want to keep failure handling uniform, consider redirecting to
fatal()instead of nakedpanic!so all fatal interpreter errors have a single code path, but this is purely stylistic.Also applies to: 2742-2757
scripts/generate_opcode_metadata.py (2)
13-16: Regex pattern may be fragile for complex enum bodies.The pattern
r"pub enum Instruction \{(.+?)\n\}"uses a non-greedy match that stops at the first\n}. If theInstructionenum contains nested items with closing braces or if the formatting changes, this could capture incomplete content.🔎 Suggested improvement using brace counting
# Alternative: use brace counting for more robust parsing def extract_enum_body(content, enum_name): pattern = rf"pub enum {enum_name}\s*\{{" match = re.search(pattern, content) if not match: return None start = match.end() depth = 1 pos = start while pos < len(content) and depth > 0: if content[pos] == '{': depth += 1 elif content[pos] == '}': depth -= 1 pos += 1 return content[start:pos-1]
1-5: Consider adding a shebang and encoding declaration.The script lacks a shebang line for direct execution.
🔎 Proposed addition
+#!/usr/bin/env python3 +# -*- coding: utf-8 -*- """Generate _opcode_metadata.py for RustPython bytecode. This file generates opcode metadata that is compatible with CPython 3.13. RustPython's Instruction enum is now ordered to match CPython opcode numbers exactly. """crates/compiler-core/src/varint.rs (1)
100-137: Consider adding roundtrip tests for read functions.The tests verify write operations and one roundtrip for
varint_with_start, but don't verify thatread_varintcorrectly decodes values written bywrite_varint.🔎 Suggested additional test
#[test] fn test_roundtrip_varint() { let test_values = [0u32, 1, 63, 64, 127, 128, 4095, 4096, u32::MAX]; for &val in &test_values { let mut buf = Vec::new(); write_varint(&mut buf, val); let mut pos = 0; let read_val = read_varint(&buf, &mut pos); assert_eq!(read_val, Some(val), "roundtrip failed for {}", val); assert_eq!(pos, buf.len(), "position mismatch for {}", val); } }crates/vm/src/vm/mod.rs (1)
463-497: Consider allowing caller‑supplied path for run_pyc_bytes/file
run_pyc_bytesalways uses"<pyc_bytes>"for the code object filename and"<source>"for__file__viawith_simple_run, regardless of where the.pycoriginally came from. For real files this makes tracebacks and__file__/__cached__look synthetic.If you expect this to be used with actual compiled files (not only opaque buffers), consider:
- Adding an optional
path: &strparameter torun_pyc_bytes, threading it into bothPyCode::from_pycandwith_simple_run, or- Deriving a best‑effort path from the
.pycheader metadata when available.It’s not correctness‑critical, but would improve debuggability and parity with the CLI path‑based execution.
Also applies to: 539-583
crates/codegen/src/ir.rs (1)
483-601: Linetable encoding relies on CPython 3.11 invariants (line/column ranges)The reworked
generate_linetablematches the Python 3.11+ scheme (Short, OneLine, Long, NoColumns) and uses the sharedwrite_signed_varint/write_varinthelpers, which is good.A couple of invariants seem assumed but not enforced here:
end_line_deltais cast tou32forwrite_varint; this is fine as long asend_loc.line >= loc.linealways holds. If any AST producer ever hands you a location whereend_line < line(e.g., from malformed ranges), the cast will wrap and produce a very large unsigned delta.Similarly, the Short and OneLine encodings assume
end_col >= coland columns< 128; other cases fall back to Long, but the assumptions are implicit.If those properties are already guaranteed by your
SourceLocation/parser pipeline, this is fine. Otherwise, it might be worth adding debug asserts (only in debug builds) to make violations obvious during development rather than silently emitting nonsensical linetables.crates/compiler-core/src/bytecode.rs (1)
1115-1115: VerifyLAST_INSTRUCTIONremains in sync with enum.The constant is now
Instruction::Resume { arg: Arg::marker() }, which appears to be the last variant at opcode 149. However, this requires manual maintenance whenever the enum changes.Consider adding a compile-time assertion or documentation to help maintain this invariant, especially since the comment states this is critical for marshaling.
crates/stdlib/src/opcode.rs (1)
57-163: Eliminate double conversion in opcode metadata methods.Methods
has_const,has_name,has_jump,has_free, andhas_localall follow this pattern:pub fn has_const(opcode: i32) -> bool { if !Self::is_valid(opcode) { // <- first try_from return false; } if let Ok(instr) = Instruction::try_from(opcode as u8) { // <- second try_from matches!(instr, ...) } else { false } }Since
is_validalready callsInstruction::try_from, this converts the opcode twice. Refactor to convert once:🔎 Proposed refactor
pub fn has_const(opcode: i32) -> bool { - if !Self::is_valid(opcode) { - return false; - } - if let Ok(instr) = Instruction::try_from(opcode as u8) { + if !(0..=255).contains(&opcode) { + return false; + } + if let Ok(instr) = Instruction::try_from(opcode as u8) { matches!( instr, Instruction::LoadConst { .. } | Instruction::ReturnConst { .. } ) } else { false } }Apply the same pattern to
has_name,has_jump,has_free, andhas_local.crates/vm/src/vm/python_run.rs (2)
33-67: Consider enriching file read error context.The implementation correctly handles pyc vs source files with appropriate loaders. However, line 62 converts I/O errors to OS errors without including the filename:
Err(err) => { return Err(self.new_os_error(err.to_string())); }Consider including the filename for better debugging:
🔎 Suggested improvement
Err(err) => { - return Err(self.new_os_error(err.to_string())); + return Err(self.new_os_error(format!("{}: {}", path, err))); }
105-119: Error handling for loader setup is adequate, but error messages could be enhanced.The
set_main_loaderfunction correctly propagates errors from all operations (import, attribute access, instantiation, and dict assignment) using the?operator. However, consider wrapping these operations to provide loader-specific context in error messages (e.g., "Failed to load X loader for main"), similar to how other parts of the codebase usenew_import_error()andnew_attribute_error()with descriptive context.crates/stdlib/src/ssl.rs (2)
2778-2784: Consider hoisting socket acquisition outside the loopThe
py_socketandsocketare acquired inside the while loop on every iteration. This is inefficient and could theoretically fail mid-flush, leaving the buffer in a partial state. Consider acquiring the socket once before the loop.🔎 Proposed optimization
pub(crate) fn flush_pending_tls_output( &self, vm: &VirtualMachine, deadline: Option<std::time::Instant>, ) -> PyResult<()> { let mut pending = self.pending_tls_output.lock(); if pending.is_empty() { return Ok(()); } let socket_timeout = self.get_socket_timeout(vm)?; let is_non_blocking = socket_timeout.map(|t| t.is_zero()).unwrap_or(false); + // Acquire socket once before the loop + let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?; + let socket = py_socket + .sock() + .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?; + let mut sent_total = 0; while sent_total < pending.len() { // Calculate timeout: use deadline if provided, otherwise use socket timeout let timeout_to_use = if let Some(dl) = deadline { // ... timeout calculation ... }; - // Use sock_select directly with calculated timeout - let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?; - let socket = py_socket - .sock() - .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?; let timed_out = sock_select(&socket, SelectKind::Write, timeout_to_use) .map_err(|e| vm.new_os_error(format!("select failed: {e}")))?;
2920-2937: Consider adding a safety limit to prevent potential infinite loopsIf
sock_sendrepeatedly returnssent == 0without raisingBlockingIOError(unusual but possible edge case), the loop could spin indefinitely. While the comment acknowledges this, consider adding a safety counter.🔎 Suggested defensive improvement
pub(crate) fn blocking_flush_all_pending(&self, vm: &VirtualMachine) -> PyResult<()> { let timeout = self.get_socket_timeout(vm)?; + let mut zero_send_count = 0; + const MAX_ZERO_SENDS: usize = 100; loop { // ... existing code ... match self.sock_send(&pending_data, vm) { Ok(result) => { let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0); if sent > 0 { + zero_send_count = 0; let mut pending = self.pending_tls_output.lock(); pending.drain(..sent); - } - // If sent == 0, socket wasn't ready despite select() saying so - // Continue loop to retry - this avoids infinite loops + } else { + zero_send_count += 1; + if zero_send_count >= MAX_ZERO_SENDS { + return Err(vm.new_os_error( + "Socket repeatedly returned 0 bytes sent".to_string() + )); + } + } }
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (52)
Lib/_opcode_metadata.pyis excluded by!Lib/**Lib/importlib/_bootstrap_external.pyis excluded by!Lib/**Lib/pkgutil.pyis excluded by!Lib/**Lib/platform.pyis excluded by!Lib/**Lib/plistlib.pyis excluded by!Lib/**Lib/tempfile.pyis excluded by!Lib/**Lib/test/_test_multiprocessing.pyis excluded by!Lib/**Lib/test/test__opcode.pyis excluded by!Lib/**Lib/test/test_asyncgen.pyis excluded by!Lib/**Lib/test/test_cmd_line_script.pyis excluded by!Lib/**Lib/test/test_compile.pyis excluded by!Lib/**Lib/test/test_contextlib.pyis excluded by!Lib/**Lib/test/test_contextlib_async.pyis excluded by!Lib/**Lib/test/test_dis.pyis excluded by!Lib/**Lib/test/test_exceptions.pyis excluded by!Lib/**Lib/test/test_ftplib.pyis excluded by!Lib/**Lib/test/test_generators.pyis excluded by!Lib/**Lib/test/test_import/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/circular_imports/import_cycle.pyis excluded by!Lib/**Lib/test/test_import/data/circular_imports/singlephase.pyis excluded by!Lib/**Lib/test/test_import/data/double_const.pyis excluded by!Lib/**Lib/test/test_import/data/package3/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/package3/submodule.pyis excluded by!Lib/**Lib/test/test_import/data/package4/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/package4/submodule.pyis excluded by!Lib/**Lib/test/test_importlib/test_util.pyis excluded by!Lib/**Lib/test/test_patma.pyis excluded by!Lib/**Lib/test/test_pkgutil.pyis excluded by!Lib/**Lib/test/test_platform.pyis excluded by!Lib/**Lib/test/test_plistlib.pyis excluded by!Lib/**Lib/test/test_subprocess.pyis excluded by!Lib/**Lib/test/test_tempfile.pyis excluded by!Lib/**Lib/test/test_threading.pyis excluded by!Lib/**Lib/test/test_traceback.pyis excluded by!Lib/**Lib/test/test_unittest/test_async_case.pyis excluded by!Lib/**Lib/test/test_unittest/test_discovery.pyis excluded by!Lib/**Lib/test/test_unittest/test_loader.pyis excluded by!Lib/**Lib/test/test_unittest/test_program.pyis excluded by!Lib/**Lib/test/test_unittest/test_runner.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testasync.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testhelpers.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testpatch.pyis excluded by!Lib/**Lib/test/test_weakref.pyis excluded by!Lib/**Lib/threading.pyis excluded by!Lib/**Lib/token.pyis excluded by!Lib/**Lib/unittest/__init__.pyis excluded by!Lib/**Lib/unittest/async_case.pyis excluded by!Lib/**Lib/unittest/loader.pyis excluded by!Lib/**Lib/unittest/main.pyis excluded by!Lib/**Lib/weakref.pyis excluded by!Lib/**crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snapcrates/codegen/src/snapshots/rustpython_compiler_core__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (33)
.cspell.dict/cpython.txt.vscode/launch.json.vscode/tasks.jsonREADME.mdcrates/codegen/src/compile.rscrates/codegen/src/error.rscrates/codegen/src/ir.rscrates/codegen/src/symboltable.rscrates/compiler-core/src/bytecode.rscrates/compiler-core/src/lib.rscrates/compiler-core/src/varint.rscrates/compiler/src/lib.rscrates/jit/src/instructions.rscrates/sre_engine/src/constants.rscrates/stdlib/src/opcode.rscrates/stdlib/src/ssl.rscrates/stdlib/src/ssl/compat.rscrates/vm/src/builtins/asyncgenerator.rscrates/vm/src/builtins/code.rscrates/vm/src/builtins/type.rscrates/vm/src/coroutine.rscrates/vm/src/exceptions.rscrates/vm/src/frame.rscrates/vm/src/import.rscrates/vm/src/stdlib/ast.rscrates/vm/src/version.rscrates/vm/src/vm/compile.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/python_run.rscrates/vm/src/vm/vm_new.rsscripts/generate_opcode_metadata.pyscripts/generate_sre_constants.pysrc/lib.rs
💤 Files with no reviewable changes (2)
- .vscode/launch.json
- .vscode/tasks.json
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/generate_sre_constants.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/builtins/type.rscrates/vm/src/coroutine.rscrates/codegen/src/error.rscrates/vm/src/import.rscrates/compiler/src/lib.rscrates/jit/src/instructions.rscrates/compiler-core/src/lib.rscrates/vm/src/exceptions.rscrates/vm/src/builtins/code.rscrates/codegen/src/ir.rscrates/vm/src/builtins/asyncgenerator.rscrates/stdlib/src/opcode.rscrates/vm/src/vm/vm_new.rscrates/stdlib/src/ssl/compat.rscrates/vm/src/frame.rscrates/codegen/src/symboltable.rscrates/vm/src/version.rscrates/compiler-core/src/varint.rscrates/vm/src/stdlib/ast.rscrates/vm/src/vm/compile.rscrates/compiler-core/src/bytecode.rscrates/vm/src/vm/mod.rscrates/stdlib/src/ssl.rscrates/vm/src/vm/python_run.rssrc/lib.rscrates/sre_engine/src/constants.rs
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code
Files:
scripts/generate_opcode_metadata.py
🧠 Learnings (5)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.
Applied to files:
crates/vm/src/frame.rscrates/compiler-core/src/bytecode.rscrates/vm/src/vm/mod.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When testing Python code, always use RustPython instead of the standard `python` command; use `cargo run -- script.py` or `cargo run` for interactive REPL
Applied to files:
README.mdcrates/vm/src/vm/mod.rscrates/vm/src/vm/python_run.rssrc/lib.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: RustPython is a Python 3 interpreter written in Rust implementing Python 3.13.0+ compatibility with complete Python environment in Rust (not CPython bindings), clean implementation without compatibility hacks, and cross-platform support including WebAssembly compilation
Applied to files:
README.mdcrates/vm/src/vm/mod.rssrc/lib.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`
Applied to files:
crates/vm/src/vm/compile.rscrates/vm/src/vm/mod.rsscripts/generate_opcode_metadata.py
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
crates/vm/src/vm/compile.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/python_run.rssrc/lib.rs
🧬 Code graph analysis (8)
crates/vm/src/builtins/code.rs (1)
crates/vm/src/import.rs (1)
check_pyc_magic_number_bytes(11-13)
crates/codegen/src/ir.rs (2)
crates/compiler-core/src/bytecode.rs (4)
encode_exception_table(47-59)new(34-42)new(540-542)new(1150-1152)crates/compiler-core/src/varint.rs (2)
write_signed_varint(22-30)write_varint(9-17)
crates/vm/src/builtins/asyncgenerator.rs (1)
crates/vm/src/types/slot.rs (1)
del(1579-1579)
crates/stdlib/src/opcode.rs (1)
crates/compiler-core/src/bytecode.rs (3)
try_from(1131-1137)try_from(1158-1163)try_from(1172-1178)
crates/compiler-core/src/varint.rs (1)
crates/vm/src/builtins/code.rs (1)
read_varint(1116-1134)
crates/compiler-core/src/bytecode.rs (1)
crates/compiler-core/src/varint.rs (4)
read_varint(74-98)read_varint_with_start(45-70)write_varint(9-17)write_varint_with_start(34-41)
crates/stdlib/src/ssl.rs (2)
crates/stdlib/src/socket.rs (13)
std(1252-1252)std(1315-1315)std(1415-1415)std(1837-1837)std(1843-1843)std(1877-1877)std(1883-1883)std(2354-2354)std(2397-2397)std(2434-2434)timeout_error_msg(3296-3298)sock_select(2720-2776)timeout(835-837)crates/stdlib/src/ssl/error.rs (1)
create_ssl_want_write_error(112-118)
crates/vm/src/vm/python_run.rs (1)
crates/vm/src/import.rs (1)
check_pyc_magic_number_bytes(11-13)
🪛 markdownlint-cli2 (0.18.1)
README.md
74-74: Dollar signs used before commands without showing output
(MD014, commands-show-output)
75-75: Dollar signs used before commands without showing output
(MD014, commands-show-output)
76-76: Dollar signs used before commands without showing output
(MD014, commands-show-output)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Run snippets and cpython tests (ubuntu-latest)
- GitHub Check: Run rust tests (windows-latest)
- GitHub Check: Run rust tests (ubuntu-latest)
- GitHub Check: Run snippets and cpython tests (macos-latest)
- GitHub Check: Run snippets and cpython tests (windows-latest)
- GitHub Check: Run rust tests (macos-latest)
- GitHub Check: Ensure compilation on various targets
- GitHub Check: Check Rust code with clippy
🔇 Additional comments (58)
.cspell.dict/cpython.txt (1)
17-17: LGTM!The addition of
CLASSDEREFis correctly placed in alphabetical order within the spell-check dictionary. This entry aligns with the broader PR changes that expand the bytecode instruction set.crates/jit/src/instructions.rs (2)
175-189: LGTM! Proper terminator checking before fallthrough jump.The logic correctly checks whether the current block needs a terminator before inserting a fallthrough jump, preventing duplicate terminators that would cause Cranelift compilation errors.
208-216: LGTM! Correct unreachable code tracking for unconditional jumps.Adding
Jumpto the unreachable code detection is semantically correct—after an unconditional jump, subsequent instructions are unreachable until a branch target label re-establishes reachability.crates/vm/src/builtins/type.rs (1)
2001-2021: LGTM! Well-implemented fallback for non-PyType metaclasses.The enhancement correctly handles metaclass resolution when
fast_issubclassis inconclusive by falling back to the generalis_subclassmethod. The logic properly:
- Preserves the fast path optimization for common PyType metaclasses
- Handles both directions of subclass checking
- Propagates errors appropriately with the
?operator- Maintains the same control flow as the fast path
The comment at line 2010 clearly explains the purpose of the fallback.
Note: The PR title mentions "Updated the re library + test", but this file contains metaclass resolution changes for
type.rs. This may indicate a title/description mismatch.crates/vm/src/import.rs (1)
11-13: LGTM!The implementation correctly validates the pyc magic number prefix. Using
starts_withwith the first 2 bytes is appropriate sincePYC_MAGIC_NUMBERis au16.crates/vm/src/builtins/code.rs (1)
1045-1064: LGTM: Variable name resolution correctly extended.The refactored logic now properly resolves indices across varnames, cellvars, and freevars with correct boundary checks and index adjustments.
crates/sre_engine/src/constants.rs (2)
17-64: SreOpcode enum looks complete and well-structured.The enum includes all expected SRE opcodes including newer additions like
ATOMIC_GROUP,POSSESSIVE_REPEAT,POSSESSIVE_REPEAT_ONE, and the various_UNI_IGNOREvariants that correspond to Python 3.11+ regex features.
66-106: SreAtCode and SreCatCode enums are correctly defined.Both enums cover the full set of anchor codes and category codes expected by the SRE engine. The sequential numbering and naming conventions match CPython's
_constants.py.crates/vm/src/frame.rs (9)
427-462:yield_from_targetheuristic matches generator state layout and looks correct.The logic to detect a suspended
yield from/await:
- Guards on non‑empty stack.
- Uses
lastias the index of the next instruction (RESUMEorSend) and checks the previous instruction forYieldValuewitharg >= 1.- Returns the stack top as delegate only in that configuration.
That matches how this interpreter increments
lastiinrunand how the newYIELD_VALUEoparg is used; it should safely returnNonefor unstarted frames or direct yields.
464-532:gen_throwintegration withCLEANUP_THROWand exception tables is well‑structured but stack layout is brittle.The new
gen_throwbehavior for delegatedthrow:
- Correctly prefers
Coro::throwfor built‑in coroutines/generators and falls back to athrowattribute.- On failure, pushing
None(so[receiver] -> [receiver, None]) and then callingunwind_blockswithRaising { exception }lines up with the comments and theCLEANUP_THROWhandler’s expectation of[sub_iter, last_sent_val, exc]once the exception is pushed.- The direct “no delegate” path builds and contextualizes an exception, adds a traceback frame at the yield site, pushes
Noneto establish[receiver, None], and then reusesunwind_blocksas for normal raises.Given how tightly this relies on the compiler’s chosen handler depths and on the exception table’s
push_lastisettings, I’d strongly suggest making sure there are tests that cover:
throwinto plain generators, async generators, andyield fromchains.- Delegates that implement and don’t implement
.throw.- Interactions with surrounding
try/finallycleanups.If those pass, this path looks solid.
876-893:EndAsyncForsemantics align with CPython’s StopAsyncIteration handling.Popping
(exc, awaitable)and:
- Treating
StopAsyncIteration(byfast_isinstance) as the normal termination of an async for loop and clearing the VM’s current exception.- Re‑raising any other exception.
matches CPython’s behavior. The explicit downcast to
PyBaseExceptionwill panic only if the compiler violates the expected stack layout, which is acceptable here.
969-999:GetAwaitable’s “already awaited” guard is a good safety check.The new branch:
- Detects
PyCoroutinevalues and callsframe().yield_from_target()to approximate_PyGen_yf, raisingRuntimeError("coroutine is being awaited already")if it’s non‑None.- Otherwise uses
__await__and validates that it returns an iterator.This should prevent double‑awaiting the same coroutine while remaining compatible with custom awaitables. The only thing to watch is that
yield_from_targetcorrectly reflects all states in which a coroutine is already being awaited (i.e., that it’s neverNonewhen the coroutine is suspended in an await chain). If you have edge‑case tests around double‑await of coroutines, it’d be worth confirming they now pass and behave like CPython.
1164-1173: Comprehension opcodes (LoadFastAndClear/StoreFastLoadFast) look correct; rely on compiler invariants.
LoadFastAndCleardoes atake()on the fastlocal slot and pushesNonewhen it’s empty, which matches inlined comprehension semantics where the temp may or may not have been initialized.StoreFastLoadFaststores to one index and then immediately loads another (often the same) under a single lock, assuming the load slot is definitely populated afterward (expect).Both rely on the compiler never emitting these opcodes in contexts where the slots are uninitialized in ways Python would treat as an error. As long as that invariant holds in the codegen, this is fine; otherwise,
expectwill panic. No changes needed, but it’s worth keeping an eye on codegen evolution if new patterns start using these instructions.Also applies to: 1603-1617
1429-1521: New exception opcodes (PopBlock,PopException,PushExcInfo,CheckExcMatch,Reraise,WithExceptStart) match CPython’s 3.11+ model.Highlights:
- Treating
PopBlockas a pseudo‑instruction is consistent with zero‑cost exception tables: the real cleanup is driven by the exception table, not the block stack.PushExcInfoandPopExceptionmanage(prev_exc, exc)on the value stack while updating VM exception state, which is in line with CPython’sPUSH_EXC_INFO/POP_EXCEPTbehavior.CheckExcMatchsimply returns a bool on the stack; type validation is done earlier in opcodes likeJumpIfNotExcMatch, which is fine.Reraise { depth }pops the given depth plus the exception and then re‑errors with either the exception or the current active exception, which is what you want for structured re‑raise sites.WithExceptStartcorrectly locates__exit__atstack_len - 4given the documented stack layout and passes(type, value, tb)fromvm.current_exception()into it.Overall this cluster looks consistent with CPython’s new exception stack protocol. The main risk is divergence between assumed and actual stack layouts if future codegen tweaks these opcodes; adding targeted tests around nested
with+exceptandexcept*would help lock this down.Also applies to: 1532-1558, 1663-1682
1683-1764:YieldValue,Send,EndSend, andCleanupThrowimplement PEP 380/PEP 525 semantics correctly.
YieldValueuses its oparg to distinguish plain yields (arg 0, wrapping inPyAsyncGenWrappedValuefor coroutine code) fromyield from/await(arg ≥ 1, no wrapping), which is what CPython does.Sendkeeps(receiver, retval)on the stack after a normal send and, onStopIteration, replaces the top value with the stop value and jumps to the target label—matching the control‑flow pattern aroundYIELD_FROM.EndSendremoves the receiver and leaves only the value, as expected at the end of a send chain.CleanupThrow’s treatment ofStopIterationvs other exceptions (extracting the value and producing(None, value)vs re‑raising) matches CPython’sCLEANUP_THROWbehavior, given your exception table comments.These opcodes are subtle, but the stack comments match the code and the surrounding
gen_throwlogic. As long as the compiler’s exception table depths align with the documented stacks, they should behave correctly.
2808-2811:Frame’sDebugoutput is clearer now.Including both
StackandLocalsin the formatted string is useful when diagnosing fatal stack issues (e.g., underflows/overflows). No issues here.
1921-2005: Break/continue semantics with finally/with blocks are correctly implemented at compile time.For
Break/Continueopcodes insidetry/finallyorwithblocks, the compiler (compile_break_continue) emits all required cleanup code before theBreak/Continueinstruction itself:
- Finally bodies are compiled inline before the jump
- With-block exit handlers are emitted before the jump
- For-loop iterators are popped before the jump (for
breakonly)This compile-time approach ensures all cleanup executes before control transfers, correctly matching CPython's semantics where a finally block that executes
break/continuewill suppress any pending exception. Theunwind_blocksfunction correctly handles onlyBreak/Continueat runtime with a direct jump (no exception table lookup needed), since no exception can be raised by the jump itself—all exception handling paths are covered by exception table entries pointing to handler blocks, not by theBreak/Continueinstructions.crates/vm/src/coroutine.rs (2)
71-107: Generator exception‑state swapping inrun_with_contextis a solid improvement; verifywith_frame’s behavior matches assumptions.The new logic:
- Captures the saved generator exception (
gen_exc) fromself.exception.- Inside
with_frame, pops the temporaryNonethatwith_framepushed, then pushesgen_exc, runs the frame, and storesvm.current_exception()back intoself.exceptionbeforewith_framepops.- Restores
runningafterwards as before.This ensures each generator/coroutine resumes with its own exception context and that the caller’s exception chain is restored on exit, which should fix subtle traceback/context issues around
send/throw. Just make surewith_framereally doespush_exception(None)before invoking the closure andpop_exception()after, as assumed in the comments; otherwise the manual pop/push here could unbalance the exception stack.
170-178:closeearly‑exit for unstarted generators matches CPython semantics.Short‑circuiting
close()whenframe.lasti() == 0(never started):
- Avoids injecting a
GeneratorExitinto a frame that has never run.- Still marks the coroutine/generator as closed.
That’s in line with CPython’s behavior and avoids unnecessary work or surprising side effects.
crates/compiler-core/src/lib.rs (1)
6-11: Publicly exposingvarintis appropriate.Adding
pub mod varint;cleanly surfaces the new varint utilities to downstream crates (e.g., bytecode/exception‑table encoders) without affecting existing exports.crates/vm/src/stdlib/ast.rs (1)
272-283: Parse error location expansion to includeend_locationlooks correct.Transforming
parse_error.location: TextRangeinto aPySourceRangeviatext_range_to_source_range, then setting:
locationfromrange.start.to_source_location()end_locationfromrange.end.to_source_location()gives the compiler/VM access to both start and end positions, consistent with how other layers now report syntax locations. Using
"<unknown>"forsource_pathwhen invoked through this AST helper also seems fine, since callers typically treat this as a generic parse API rather than a file‑bound compiler entrypoint.crates/vm/src/vm/vm_new.rs (2)
65-80:new_scope_with_maincorrectly wires__main__into globals andsys.modules.
- Builds a scope with builtins.
- Creates a
__main__module sharing that scope’s globals.- Initializes
__main__.__annotations__to an empty dict.- Registers the module in
sys.modules["__main__"].That’s a clean, self‑contained replacement for ad‑hoc main‑module setup. The
dict().set_item(...).expect(...)usage is fine here given the dict is a fresh built‑in mapping.
332-525: SyntaxError now includes end position metadata, improving diagnostics.The addition:
- Calls
error.python_end_location()and, when present, setsend_linenoandend_offseton the SyntaxError instance alongside the existinglineno/offsetandtext/filename.This aligns with the extended location tracking you wired through the parser/compiler and matches how CPython exposes
end_lineno/end_offset. No issues.crates/vm/src/exceptions.rs (1)
220-270: SyntaxError caret rendering now correctly respects multi‑line ranges.The updated logic:
- Reads
lineno,end_lineno,offset, andend_offset.- Only draws the caret line when
end_linenois equal tolineno(or whenend_linenois missing), avoiding misleading carets for multi‑line errors.- Derives a default
end_offsetfromoffsetwhen it’s absent/0/−1 and ensures at least one caret is shown.- Computes caret width as
end_colno - colnoover the stripped leading whitespace, preserving the previous visual alignment.This is a nice improvement for range‑aware SyntaxErrors and remains backward‑compatible for the older single‑point cases.
crates/codegen/src/error.rs (1)
79-81: LGTM!The new
InvalidAsyncFor,InvalidAsyncWith, andInvalidAsyncComprehensionerror variants follow the existing patterns and provide appropriate error messages consistent with CPython behavior.Also applies to: 119-126
crates/compiler/src/lib.rs (1)
87-95: LGTM!The
python_end_locationaccessor correctly returns the end location for parse errors andNonefor codegen errors, matching the pattern of the existingpython_locationmethod.src/lib.rs (2)
150-171: LGTM - Good implementation ofrun_file.The function correctly mirrors CPython's
pymain_run_file_objbehavior:
- Checks for package directories via
get_importer- Uses
runpy._run_module_as_mainfor package execution- Properly handles
sys.path[0]insertion based onsafe_pathsetting
354-361: LGTM - Test updated to use new APIs.The test correctly uses
vm.new_scope_with_main()and exercises both direct file execution and directory package execution viarun_file.crates/vm/src/vm/compile.rs (1)
1-30: LGTM - Clean separation of concerns.Good refactoring to separate compilation from execution. The module now has a single, clear responsibility, and the documentation properly directs users to
python_run.rsfor execution functions.crates/vm/src/builtins/asyncgenerator.rs (2)
35-35: LGTM - Correct Destructor implementation for async generators.The
Destructorimplementation properly follows CPython's_PyGen_Finalizepattern:
- Skips finalization if the generator is already closed
- Calls the finalizer hook through
call_finalizerwhich ignores errors (matchingPyErr_WriteUnraisablebehavior)Also applies to: 647-658
76-86: LGTM - Finalizer error handling is correct.The
call_finalizermethod correctly ignores errors from the finalizer callback, matching CPython's behavior of usingPyErr_WriteUnraisableto silently discard exceptions during finalization.crates/compiler-core/src/varint.rs (3)
9-17: LGTM - Correct varint encoding implementation.The 6-bit chunk encoding with
0x40continuation bit follows the standard pattern used for Python bytecode exception tables.
22-30: LGTM - Correct zigzag encoding for signed integers.The encoding correctly handles
i32::MINusingwrapping_subto avoid undefined behavior, and the zigzag transformation((0 - val) << 1) | 1for negatives /(val << 1)for positives is standard.
74-98: The 0x80 check difference is intentional by design. Theread_varintincompiler-core/src/varint.rsis used for exception tables where entries are delimited by 0x80 markers (set bywrite_varint_with_start), while theread_varintincode.rsis for line tables which use linear byte streams without entry delimiters. The 0x80 check invarint.rsat line 87 is a defensive mechanism that breaks before consuming a byte with 0x80 set (when shift > 0), allowing the nextread_varint_with_startcall to correctly identify and read the next entry. These implementations serve different formats and are not interchangeable, but this is expected and safe by design. Adding documentation to clarify the purpose of eachread_varintvariant would reduce confusion.scripts/generate_opcode_metadata.py (1)
225-226: No action required—HAVE_ARGUMENT = 44is correctly positioned.The constant correctly identifies the boundary between no-argument instructions (opcodes 0–43) and with-argument instructions (44+) in RustPython's
Instructionenum. Position 44 corresponds toWithExceptStart, which matches the enum structure and is documented consistently across the bytecode definition, stdlib opcode module, and the generation script's own verification checks.crates/vm/src/vm/mod.rs (1)
607-618: Verify new per‑frame exception stack isolation doesn’t regress sys.exc_info/context behavior
with_framenow unconditionally does:
push_exception(None)before running the frame, andpop_exception()afterwards, discarding whateverset_exceptionstored in the topExceptionStackentry during execution.Because the actual error is propagated via the
PyResultreturn, this is fine as long as:
- No code relies on
current_exception()(ortopmost_exception()) still containing the just‑raised exception afterrun_frame/with_framereturns, and- All uses of
push_exception/pop_exceptionaroundtry/exceptblocks remain balanced with this new implicit per‑frame layer.Given
contextualize_exceptionandtopmost_exceptionnow walk the entireExceptionStackand include cycle detection, the internal behavior looks sound, but I recommend double‑checking any callers that previously usedtake_exception/current_exceptionforsys.exc_info()or logging to ensure they still see the right value at the right time.Also applies to: 908-920, 932-955
crates/compiler-core/src/bytecode.rs (5)
17-85: LGTM! Exception table implementation is solid.The encoding and decoding logic correctly implements the 6-bit varint format with start markers. The use of
saturating_subfor size calculation prevents underflow, and the handler lookup correctly uses half-open interval semantics[start, end).
614-629: Excellent documentation on RaiseKind variants.The detailed comments explaining the mapping between each variant and Python syntax/CPython opcodes are very helpful. This makes the distinction between
BareRaise(getting exception from VM state) andReraiseFromStack(getting from stack) clear.
1922-2090: Stack effect implementation looks correct.The updated
stack_effectmethod properly handles:
- Different
RaiseKindvariants with appropriate stack effects- Generator/async operations (
Send,EndSend,CleanupThrow)- Defensive handling in
BuildSliceusingtry_getwith a sensible fallbackThe comprehensive coverage and inline comments make the logic easy to follow.
2367-2436: Good test coverage for exception table functionality.The test suite covers key scenarios:
- Multiple entries with different configurations
- Boundary conditions (inclusive start, exclusive end)
- Edge cases (empty table, single entry)
- Multiple lookups at various offsets
This provides solid confidence in the encoding/decoding implementation.
678-1111: The runtime guards and bytecode validation are actually in place. The VM explicitly callsunreachable!()in frame.rs for all placeholder instructions, and bytecode deserialization validates opcodes againstLAST_INSTRUCTION. Additionally, codegen never emits these instructions.However, there is an accuracy issue:
GetYieldFromIteris marked as "not implemented, placeholder" in the bytecode enum comment, but the implementation at frame.rs:1008 shows it's fully implemented with logic to handle coroutines, generators, and other iterables. This documentation/label is misleading.Likely an incorrect or invalid review comment.
crates/stdlib/src/opcode.rs (1)
165-171: Correct alignment with exception table design.The updated
has_excnow returnsfalsefor all opcodes with a clear explanation that exception handling is managed via the exception table introduced inbytecode.rs. This aligns with the Python 3.11+ zero-cost exception handling model.crates/vm/src/vm/python_run.rs (3)
18-31: LGTM! Clean delegation pattern.The
run_any_file→run_simple_file→with_simple_runflow is clear and handles the edge case of empty paths appropriately.
72-102: Well-designed public API with proper error context.The execution methods provide clear entry points for different use cases. Good practices observed:
- Syntax errors include source context for better error messages
- Deprecated
run_code_stringprovides migration guidancerun_block_expruses specializedMode::BlockExprfor embedded expressions
126-151: Solid pyc detection with proper fallback logic.The two-tier detection strategy is well-designed:
- Fast path: check
.pycextension- Fallback: read magic number (2 bytes only, per CPython compatibility)
The comment about text mode (lines 147-149) is a helpful CPython compatibility note. The file is properly closed via RAII when the
filevariable goes out of scope.crates/stdlib/src/ssl.rs (13)
2342-2347: LGTM - Well-documented buffering fieldThe
pending_tls_outputfield withPyMutex<Vec<u8>>is appropriate for thread-safe buffering of unsent TLS bytes. The documentation clearly explains its purpose for preventing data loss in non-blocking scenarios.
2831-2888: LGTM - Correct handling of partial sendsThe
send_tls_outputmethod correctly handles partial sends by saving unsent bytes to thepending_tls_outputbuffer. The error handling preserves data integrity in all failure modes (timeout, blocking IO error, other errors).
2727-2743: LGTM - Cleaner API with byte slice parameterThe signature change from
PyObjectRefto&[u8]simplifies the internal API. Callers now pass raw bytes directly, and the method handles conversion to Python objects internally. This is more idiomatic for Rust code.
3593-3640: LGTM - Critical TLS 1.3 handshake fixThe flush operations before writing new data and after
write_allare critical for correct TLS 1.3 behavior. This ensures the Finished message reaches the server before application data, preventing protocol violations. The integration withsend_tls_outputcorrectly handles partial sends.
3962-3986: LGTM - Critical data ordering fix in shutdownThe flush sequence before
send_close_notify()is essential for TLS correctness:
- First drains rustls internal buffer to TLS records
- Then flushes
pending_tls_outputto socket- Only then sends
close_notifyThis prevents data loss where the peer receives
close_notifybefore all application data. The BIO vs socket mode distinction is correctly handled.
4040-4078: LGTM - Correct CPython/OpenSSL compatibilityThe timeout handling correctly mirrors CPython/OpenSSL behavior by returning after sending
close_notify. The tiered flush strategy (best-effort for non-blocking, timeout-respecting for timeout mode, blocking for blocking mode) is appropriate. The comment explaining the asyncore deadlock issue is valuable for future maintainers.
4103-4127: LGTM - Correct TLS record orderingThe
flush_pending_tls_outputcall before draining new TLS data ensures proper ordering of TLS records. This prevents scenarios where new data could bypass previously buffered data.
4130-4178: LGTM - Improved EOF detection logicThe enhanced EOF detection distinguishes between "no data available yet" (BIO mode without EOF flag) and "peer closed connection" (socket EOF or BIO with EOF flag). The error handling that treats connection reset as peer closed is correct for shutdown scenarios.
3432-3436: LGTM - OpenSSL-compatible auto-handshakeAuto-completing the handshake on first
read()matches OpenSSL'sSSL_read()behavior. This is the expected behavior for applications that don't explicitly calldo_handshake().
3564-3568: LGTM - Consistent with read methodThe auto-handshake on
write()is consistent with theread()implementation and matches OpenSSL'sSSL_write()behavior.
1905-1905: LGTM - Correct field initializationThe
pending_tls_outputis correctly initialized with an emptyVecwrapped inPyMutex.
1976-1976: LGTM - Consistent with _wrap_socketThe field initialization is consistent with the
_wrap_socketconstructor.
2711-2724: LGTM - Comment clarificationsThe comment updates accurately reflect that flags are not supported in BIO mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (10)
crates/compiler/src/lib.rs (1)
55-61: Use saturating subtraction for defensive safety.As noted in a previous review, the unchecked subtraction at line 57 relies on the implicit assumption that
error.location.end()is non-zero when this branch executes. While the condition logic suggests this is safe, usingsaturating_submakes the intent explicit and guards against edge cases.🔎 Suggested fix
- let prev_line_end = error.location.end() - ruff_text_size::TextSize::from(1); + let prev_line_end = error.location.end().saturating_sub(ruff_text_size::TextSize::from(1));crates/sre_engine/src/constants.rs (1)
108-120: Fix indentation in bitflags! macro.As noted in a previous review, line 109's
#[derive(...)]attribute lacks proper indentation. Since this file is auto-generated, the fix should be applied toscripts/generate_sre_constants.pyin thedump_bitflagsfunction to emit consistent indentation matching rustfmt style.crates/vm/src/builtins/code.rs (1)
339-352: Fix compilation error:OsStrdoesn't have adisplay()method.Line 341 calls
.display()onOsStr(fromfile_stem()), but this method doesn't exist onOsStr. This will cause a compilation error.🔎 Proposed fix
pub fn from_pyc_path(path: &std::path::Path, vm: &VirtualMachine) -> PyResult<PyRef<Self>> { let name = match path.file_stem() { - Some(stem) => stem.display().to_string(), + Some(stem) => stem.to_string_lossy().to_string(), None => "".to_owned(), }; let content = std::fs::read(path).map_err(|e| e.to_pyexception(vm))?;src/lib.rs (1)
173-202: Potential issue: cachedNoneimporters not handled correctly.In CPython's
path_importer_cache, entries can beNoneto indicate that no importer was found for a path. The current implementation at lines 179-181 returnsSome(importer)for any cached value, includingNone. This could causerun_fileto incorrectly treat a path as having an importer when it doesn't.🔎 Proposed fix
if let Some(importer) = path_importer_cache.get_item_opt(path, vm)? { + // CPython caches None for paths with no importer + if vm.is_none(&importer) { + return Ok(None); + } return Ok(Some(importer)); }crates/codegen/src/symboltable.rs (1)
352-382: PEP 709 merge timing: symbols merged after nested scope analysis.The merge of inlined comprehension symbols into the parent scope happens at lines 352-382, which is after
analyze_symbol_tableis called on allsub_tables(lines 344-346). This means nested functions/scopes are analyzed before comprehension bindings are visible in the parent, causing incorrect scope resolution for walrus operator targets referenced in nested functions.crates/stdlib/src/opcode.rs (1)
51-52: Off-by-one inhas_argalready flagged.This issue was identified in a previous review. The condition should be
opcode >= Self::HAVE_ARGUMENTto match CPython semantics where opcodes 44+ have arguments.crates/stdlib/src/ssl/compat.rs (2)
1197-1263: Handshake completion flush can misbehave on non‑blocking sockets
handle_handshake_completein socket mode:
- Sends post‑handshake TLS data via
send_all_bytes, breaking onSslError::WantWritebut still proceeding.- Then unconditionally calls
socket.blocking_flush_all_pending(vm)for all non‑BIO sockets, mapping any Python exception (including timeouts) toSslError::Py.On non‑blocking sockets (
timeout == 0), this means:
blocking_flush_all_pending()usessock_select(..., timeout=0)and treats “not immediately writable” as a timeout, which is then surfaced as asocket.timeoutinstead of a WANT_WRITE‑style condition.WantWritecoming from the earliersend_all_bytesloop is not propagated; the handshake may be treated as fully complete or fail with timeout, rather than signaling “try again when writable”.This diverges from OpenSSL/CPython’s non‑blocking handshake behavior where completion is gated on I/O readiness and signaled back via
SSLWantReadError/SSLWantWriteError.I’d suggest:
- Teaching
blocking_flush_all_pendingto treattimeout == 0as non‑blocking and to yield a WANT_WRITE‑style error instead of a timeout when the socket isn’t immediately writable; and/or- Special‑casing non‑blocking sockets in
handle_handshake_completeto avoid a fully blocking (or pseudo‑blocking) flush, propagating a WANT_WRITE condition when pending TLS data cannot be flushed.
1006-1078:send_all_bytesbuffering semantics look sound butsent == 0is ambiguousThe helper correctly:
- Flushes
pending_tls_outputfirst to preserve TLS record ordering.- Stashes any unsent suffix into
pending_tls_outputonBlockingIOError, timeouts, or other send errors, avoiding data loss.However, treating
sent == 0fromsock_send()asSslError::WantWriteconflates “would block” with a potential connection-closed condition. For Python’ssocket.send(), a return of 0 on a non-empty buffer more typically indicates the peer has closed the connection rather than EAGAIN/EWOULDBLOCK, which should surface as an error.Consider distinguishing a genuine “zero bytes written on non-empty buffer” as a hard EOF (mapped to an
SslError::Eof/Syscallvariant) instead ofWantWrite, or at least tightening this path based on platform semantics or additional checks.crates/stdlib/src/ssl.rs (2)
3960-4077: New shutdown flow improves ordering but still depends onblocking_flush_all_pendingsemanticsThe updated
shutdown():
- Before sending
close_notify, flushes all existing TLS records:
- Drains rustls’ internal write buffer via
conn.write_tlsandsend_tls_output.- Then flushes
pending_tls_output:
- Socket mode:
blocking_flush_all_pending(vm)?(guarantees all application data is sent beforeclose_notify).- BIO mode: non‑blocking
flush_pending_tls_output.- After
close_notifyis queued, useswrite_pending_tls()and, for real sockets, returns immediately once ourclose_notify+ pending bytes are flushed according to the socket’s timeout mode, aligning more closely with OpenSSL’s “one‑sided” shutdown semantics.Ordering and data‑loss prevention look good, but the behavior on non‑blocking sockets again hinges on
blocking_flush_all_pending:
- With
timeout == 0, that helper can raisesocket.timeoutinstead of anSSLWantWriteErrorwhen the socket simply isn’t writable yet, which may surprise callers expecting non‑blockingshutdown()semantics.Given this shares root cause with the earlier
blocking_flush_all_pendingissue, I’d address that helper’s non‑blocking behavior first; once fixed, this shutdown path should behave as intended for all modes.
2890-2938:blocking_flush_all_pendingmay mis‑signal non‑blocking readiness as timeout
blocking_flush_all_pending:
- Repeatedly clones
pending_tls_outputand usessock_selectwith the socket’s timeout to wait for writability, then sends viasock_send, draining the buffer as bytes are accepted.- On
timed_out == true, it raises asocket.timeoutunconditionally; onBlockingIOError, it just loops again.For blocking sockets with a real timeout this is fine, but on non‑blocking sockets (
timeout == 0):
sock_select(..., timeout=Some(0))will often report “timed out” immediately when the socket isn’t writable, and this is surfaced as a hardsocket.timeoutrather than a WANT_WRITE‑style condition.- This method is called from
compat::handle_handshake_completeandshutdown(), so non‑blockingSSLSocketusers may seesocket.timeoutduring handshake completion or shutdown where CPython would typically raiseSSLWantWriteError(or simply return after sending as much as possible).Consider:
- Treating
timeout == Some(0)specially: either skippingsock_selectentirely and relying onsock_send’sBlockingIOErrorto indicate non‑writability, or mappingtimed_outin that case to anSSLWantWriteErrorrather than a timeout.- Keeping the existing behavior for genuine timeouts (
timeout > 0).This ties into the handshake‑completion concern in
compat.rs.
🧹 Nitpick comments (9)
crates/jit/src/instructions.rs (1)
175-190: Improved control flow handling with conditional terminator emission.The logic correctly checks whether the current block needs a terminator before emitting a fallthrough jump, preventing unnecessary jumps when blocks are already properly terminated. The implementation properly handles both empty blocks and checks the last instruction's opcode.
However, the terminator-checking logic (lines 180-186) is duplicated at lines 221-223 in the post-processing section.
Consider extracting terminator check into a helper method
+ fn block_needs_terminator(&self, block: Block) -> bool { + match self.builder.func.layout.last_inst(block) { + None => true, // Empty block needs terminator + Some(inst) => { + // Check if the last instruction is a terminator + !self.builder.func.dfg.insts[inst].opcode().is_terminator() + } + } + } + pub fn compile<C: bytecode::Constant>( ...Then use it at both locations:
- // Check if the block needs a terminator by examining the last instruction - let needs_terminator = match self.builder.func.layout.last_inst(cur) { - None => true, // Empty block needs terminator - Some(inst) => { - // Check if the last instruction is a terminator - !self.builder.func.dfg.insts[inst].opcode().is_terminator() - } - }; + let needs_terminator = self.block_needs_terminator(cur);scripts/generate_opcode_metadata.py (1)
224-227: Consider derivingHAVE_ARGUMENTfrom the enum rather than hardcoding.The constants
HAVE_ARGUMENT = 44andMIN_INSTRUMENTED_OPCODE = 236are hardcoded. If theInstructionenum order changes, these values will become incorrect. The verification printout at lines 233-246 helps catch this, but consider computingHAVE_ARGUMENTdynamically (e.g., finding the index ofWITH_EXCEPT_STARTwhich is the first opcode that takes an argument in CPython 3.13).scripts/generate_sre_constants.py (3)
49-49: Remove debug print statement.This appears to be leftover debugging code that will produce noisy output when the script runs.
Proposed fix
def dump_enum(d, enum_name, derives, strip_prefix=""): items = sorted(d) - print(f"items is {items}") content = [f"{derives}\n"]
142-185: Remove commented-out dead code.This large block of commented-out code appears to be an older implementation that has been replaced. Keeping it adds noise and confusion. If needed for reference, it can be retrieved from version control history.
76-76: Non-idiomatic conditional append.Using an expression statement with a ternary for side effects is unusual. Consider using a standard
ifstatement for clarity.Proposed fix
content = ["bitflags! {\n"] - content.append(f"{derives}\n") if derives else None + if derives: + content.append(f"{derives}\n") content.append(f" pub struct {struct_name}: {int_t} {{\n")crates/stdlib/src/opcode.rs (1)
57-69: RedundantInstruction::try_fromafteris_validcheck.
is_validalready performsInstruction::try_from(opcode as u8).is_ok(), so the subsequenttry_frominhas_const(and similar methods) will always succeed. Consider extracting the instruction once inis_validor usingtry_fromdirectly without theis_validguard.Possible simplification pattern
pub fn has_const(opcode: i32) -> bool { - if !Self::is_valid(opcode) { - return false; - } - if let Ok(instr) = Instruction::try_from(opcode as u8) { + if let Ok(instr) = (0..=255).contains(&opcode) + .then(|| Instruction::try_from(opcode as u8).ok()) + .flatten() + { matches!( instr, Instruction::LoadConst { .. } | Instruction::ReturnConst { .. } ) } else { false } }Or simply remove the
is_validcheck sincetry_fromalready validates:pub fn has_const(opcode: i32) -> bool { (0..=255).contains(&opcode) && Instruction::try_from(opcode as u8) .map(|instr| matches!(instr, Instruction::LoadConst { .. } | Instruction::ReturnConst { .. })) .unwrap_or(false) }crates/vm/src/frame.rs (1)
783-792: Good defensive check for CopyItem stack underflow.Adding explicit bounds checking with diagnostic output will help debug issues during development. Consider using
debug_assert!instead of runtime panic for release builds if this is a compiler invariant.crates/vm/src/vm/python_run.rs (1)
139-145: Useread_exactfor more robust magic number reading.
read()may return fewer bytes than requested even when more are available (though rare for regular files). Usingread_exactmakes the intent clearer and handles edge cases.Proposed fix
let mut file = std::fs::File::open(path)?; let mut buf = [0u8; 2]; use std::io::Read; - if file.read(&mut buf)? != 2 { - return Ok(false); + if file.read_exact(&mut buf).is_err() { + return Ok(false); }crates/stdlib/src/ssl.rs (1)
2746-2826:flush_pending_tls_outputordering and error mapping are mostly good; consider fast‑path for BIO
flush_pending_tls_output:
- Enforces the “old data first” invariant by always emptying
pending_tls_outputbefore any new TLS records are sent.- Honors an optional deadline by shrinking the effective
selecttimeout and returning asocket.timeoutif the deadline is exceeded, while preserving unsent data inpending.- Differentiates blocking vs non‑blocking sockets via
get_socket_timeout(), mapping non‑blocking “would block” conditions toSSLWantWriteError(create_ssl_want_write_error) and again leaving unsent bytes buffered.Two smaller points:
In BIO mode,
get_socket_timeout()returnsOk(None), andpendingis never populated (sinceMemoryBIO.write()always accepts full buffers). The earlypending.is_empty()check ensures you currently bail out before trying to derive aPySocket, but it would be more robust and clearer to explicitly skip the wholesock_select/PySocketpath whenself.is_bio_mode()is true.For non‑blocking sockets (
timeout == 0), a “timed_out” result fromsock_select(..., timeout=0)is treated as a write timeout. Conceptually this is a “would block now” condition rather than a true timeout; if you want closer parity with CPython, you could map that toSSLWantWriteErroras well, instead of asocket.timeout, while still preservingpendingcontents.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (52)
Lib/_opcode_metadata.pyis excluded by!Lib/**Lib/importlib/_bootstrap_external.pyis excluded by!Lib/**Lib/pkgutil.pyis excluded by!Lib/**Lib/platform.pyis excluded by!Lib/**Lib/plistlib.pyis excluded by!Lib/**Lib/tempfile.pyis excluded by!Lib/**Lib/test/_test_multiprocessing.pyis excluded by!Lib/**Lib/test/test__opcode.pyis excluded by!Lib/**Lib/test/test_asyncgen.pyis excluded by!Lib/**Lib/test/test_cmd_line_script.pyis excluded by!Lib/**Lib/test/test_compile.pyis excluded by!Lib/**Lib/test/test_contextlib.pyis excluded by!Lib/**Lib/test/test_contextlib_async.pyis excluded by!Lib/**Lib/test/test_dis.pyis excluded by!Lib/**Lib/test/test_exceptions.pyis excluded by!Lib/**Lib/test/test_ftplib.pyis excluded by!Lib/**Lib/test/test_generators.pyis excluded by!Lib/**Lib/test/test_import/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/circular_imports/import_cycle.pyis excluded by!Lib/**Lib/test/test_import/data/circular_imports/singlephase.pyis excluded by!Lib/**Lib/test/test_import/data/double_const.pyis excluded by!Lib/**Lib/test/test_import/data/package3/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/package3/submodule.pyis excluded by!Lib/**Lib/test/test_import/data/package4/__init__.pyis excluded by!Lib/**Lib/test/test_import/data/package4/submodule.pyis excluded by!Lib/**Lib/test/test_importlib/test_util.pyis excluded by!Lib/**Lib/test/test_patma.pyis excluded by!Lib/**Lib/test/test_pkgutil.pyis excluded by!Lib/**Lib/test/test_platform.pyis excluded by!Lib/**Lib/test/test_plistlib.pyis excluded by!Lib/**Lib/test/test_subprocess.pyis excluded by!Lib/**Lib/test/test_tempfile.pyis excluded by!Lib/**Lib/test/test_threading.pyis excluded by!Lib/**Lib/test/test_traceback.pyis excluded by!Lib/**Lib/test/test_unittest/test_async_case.pyis excluded by!Lib/**Lib/test/test_unittest/test_discovery.pyis excluded by!Lib/**Lib/test/test_unittest/test_loader.pyis excluded by!Lib/**Lib/test/test_unittest/test_program.pyis excluded by!Lib/**Lib/test/test_unittest/test_runner.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testasync.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testhelpers.pyis excluded by!Lib/**Lib/test/test_unittest/testmock/testpatch.pyis excluded by!Lib/**Lib/test/test_weakref.pyis excluded by!Lib/**Lib/threading.pyis excluded by!Lib/**Lib/token.pyis excluded by!Lib/**Lib/unittest/__init__.pyis excluded by!Lib/**Lib/unittest/async_case.pyis excluded by!Lib/**Lib/unittest/loader.pyis excluded by!Lib/**Lib/unittest/main.pyis excluded by!Lib/**Lib/weakref.pyis excluded by!Lib/**crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snapis excluded by!**/*.snapcrates/codegen/src/snapshots/rustpython_compiler_core__compile__tests__nested_double_async_with.snapis excluded by!**/*.snap
📒 Files selected for processing (33)
.cspell.dict/cpython.txt.vscode/launch.json.vscode/tasks.jsonREADME.mdcrates/codegen/src/compile.rscrates/codegen/src/error.rscrates/codegen/src/ir.rscrates/codegen/src/symboltable.rscrates/compiler-core/src/bytecode.rscrates/compiler-core/src/lib.rscrates/compiler-core/src/varint.rscrates/compiler/src/lib.rscrates/jit/src/instructions.rscrates/sre_engine/src/constants.rscrates/stdlib/src/opcode.rscrates/stdlib/src/ssl.rscrates/stdlib/src/ssl/compat.rscrates/vm/src/builtins/asyncgenerator.rscrates/vm/src/builtins/code.rscrates/vm/src/builtins/type.rscrates/vm/src/coroutine.rscrates/vm/src/exceptions.rscrates/vm/src/frame.rscrates/vm/src/import.rscrates/vm/src/stdlib/ast.rscrates/vm/src/version.rscrates/vm/src/vm/compile.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/python_run.rscrates/vm/src/vm/vm_new.rsscripts/generate_opcode_metadata.pyscripts/generate_sre_constants.pysrc/lib.rs
💤 Files with no reviewable changes (2)
- .vscode/tasks.json
- .vscode/launch.json
✅ Files skipped from review due to trivial changes (2)
- README.md
- .cspell.dict/cpython.txt
🚧 Files skipped from review as they are similar to previous changes (4)
- crates/codegen/src/error.rs
- crates/vm/src/coroutine.rs
- crates/compiler-core/src/lib.rs
- crates/vm/src/version.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.rs: Follow the default rustfmt code style by runningcargo fmtto format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass,pymodule,pyfunction, etc.) when implementing Python functionality in Rust
Files:
crates/vm/src/import.rscrates/jit/src/instructions.rscrates/vm/src/vm/vm_new.rscrates/vm/src/stdlib/ast.rscrates/compiler/src/lib.rscrates/codegen/src/ir.rscrates/stdlib/src/opcode.rscrates/vm/src/builtins/asyncgenerator.rscrates/vm/src/builtins/code.rscrates/vm/src/exceptions.rscrates/vm/src/frame.rscrates/vm/src/builtins/type.rscrates/stdlib/src/ssl/compat.rscrates/codegen/src/symboltable.rssrc/lib.rscrates/compiler-core/src/varint.rscrates/compiler-core/src/bytecode.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/compile.rscrates/sre_engine/src/constants.rscrates/stdlib/src/ssl.rscrates/vm/src/vm/python_run.rs
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code
Files:
scripts/generate_sre_constants.pyscripts/generate_opcode_metadata.py
🧠 Learnings (7)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.
Applied to files:
crates/vm/src/frame.rscrates/compiler-core/src/bytecode.rscrates/vm/src/vm/mod.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust
Applied to files:
src/lib.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/compile.rscrates/vm/src/vm/python_run.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When testing Python code, always use RustPython instead of the standard `python` command; use `cargo run -- script.py` or `cargo run` for interactive REPL
Applied to files:
src/lib.rscrates/vm/src/vm/mod.rscrates/vm/src/vm/python_run.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.py : In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Applied to files:
src/lib.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: RustPython is a Python 3 interpreter written in Rust implementing Python 3.13.0+ compatibility with complete Python environment in Rust (not CPython bindings), clean implementation without compatibility hacks, and cross-platform support including WebAssembly compilation
Applied to files:
src/lib.rscrates/vm/src/vm/mod.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`
Applied to files:
scripts/generate_opcode_metadata.pycrates/vm/src/vm/mod.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.
Applied to files:
crates/sre_engine/src/constants.rs
🔇 Additional comments (48)
crates/vm/src/builtins/type.rs (1)
2009-2021: I am unable to verify this review comment because the repository cannot be accessed in the current environment. The network connectivity issue prevents me from examining the code, verifying test coverage, checking clippy/fmt compliance, or comparing the implementation against CPython's behavior.To properly verify this review comment, direct access to the codebase is required to:
- Examine the full function implementation and verify the logic is correct
- Search for existing test cases that exercise custom metaclass resolution
- Run
cargo clippyandcargo fmtto ensure compliance with Rust best practices- Compare the approach with CPython's
_PyType_CalculateMetaclassimplementationcrates/jit/src/instructions.rs (2)
208-216: Correct extension of unreachability tracking.Adding
Instruction::Jumpto the unreachability tracking is correct—an unconditional jump means subsequent instructions are unreachable until the next label target. This change is consistent with the updated control flow handling.
219-228: Proper handling of unterminated blocks in post-processing.The post-processing correctly checks if the final block needs a terminator and inserts a trap when necessary, ensuring all blocks are properly terminated. This complements the earlier label-handling improvements.
As noted in the previous comment, the terminator-checking logic here is identical to lines 180-186 and could benefit from extraction to a helper method.
crates/vm/src/builtins/asyncgenerator.rs (4)
11-11: LGTM! Destructor trait import added correctly.The import is properly placed and necessary for the destructor implementation below.
35-35: LGTM! Destructor trait correctly registered with the class.The trait is properly added to the
with()clause, enabling the destructor implementation for async generators.
76-86: LGTM! Comment clarifications improve documentation.The updated comments clearly document the function's behavior, especially the intentional error suppression that matches CPython's
PyErr_WriteUnraisablesemantics.
647-658: Implementation correctly follows PEP 525 async generator finalization semantics.The destructor properly handles GC-time finalization by checking if the generator is already closed (preventing double-finalization) and calling the finalizer hook when needed. This matches PEP 525's requirement for async generators to trigger a finalizer during garbage collection, which typically schedules
aclose()on the event loop.Note: The claim about duplicate Destructor implementations could not be independently verified due to access constraints.
crates/vm/src/stdlib/ast.rs (1)
274-283: LGTM! Clean integration of end_location into parse error mapping.The error mapping correctly computes both start and end locations from the text range and populates the new
end_locationfield. This aligns with the broader PR goal of providing more precise error localization.crates/vm/src/exceptions.rs (1)
226-269: LGTM! Well-structured caret highlighting for single-line syntax errors.The logic correctly:
- Determines if the error spans a single line via
same_linecheck- Defaults to single-line behavior when line info is missing
- Ensures a minimum caret width of 1
- Skips caret rendering entirely for multi-line errors
This improves the user experience by providing accurate error highlighting without misleading carets on multi-line spans.
crates/compiler/src/lib.rs (1)
87-95: LGTM! Clean API addition for end location retrieval.The
python_end_location()method follows the same pattern aspython_location()and appropriately returnsOptionto handle cases where end location isn't available (codegen errors).crates/vm/src/vm/compile.rs (1)
1-30: LGTM! Clean separation of compilation from execution concerns.The module now has a clear single responsibility with helpful documentation pointing to
python_run.rsfor execution functions. The trimmed imports reflect the narrowed scope.crates/vm/src/vm/vm_new.rs (2)
65-80: LGTM with minor note on.expect()usage.The
new_scope_with_main()method cleanly encapsulates the__main__module setup. The.expect()on line 71 is acceptable since it operates on a freshly created dict whereset_itemshouldn't fail. The subsequent operations correctly propagate errors viaPyResult.
502-514: LGTM! Proper integration of end location attributes.The conditional setting of
end_linenoandend_offsetcorrectly leverages the newpython_end_location()API and follows the same pattern as the existinglineno/offsethandling.crates/vm/src/import.rs (1)
11-13: Verify intent: Only checking first 2 bytes of 4-byte PYC magic number.According to the Python bytecode format standard, the PYC magic number consists of 4 bytes: a 2-byte version tag followed by 0x0D 0x0A (CR LF). This function only validates the first 2 bytes via
&crate::version::PYC_MAGIC_NUMBER_BYTES[..2]. Confirm whether this is intentional (e.g., only validating the version tag while allowing flexibility on the CR LF bytes) or if the full 4-byte check is required. If intentional, add a comment explaining the design choice.crates/vm/src/builtins/code.rs (2)
353-376: LGTM!The
from_pycfunction correctly validates the magic number, extracts the code bytes after the 16-byte header, and delegates to the Python_compile_bytecodefunction. Error handling is appropriate.
1041-1066: LGTM!The
_varname_from_opargmethod correctly resolves variable names by computing the appropriate index intovarnames,cellvars, orfreevarsbased on the oparg value. This aligns with CPython's variable indexing scheme.src/lib.rs (2)
150-171: LGTM!The
run_filefunction correctly follows CPython'spymain_run_file_objpattern: checking for package directories viaget_importer, usingrunpy._run_module_as_mainfor packages, and properly handlingsys.path[0]insertion based on thesafe_pathsetting.
350-365: LGTM!Tests are properly updated to use the new
vm.new_scope_with_main()API and test both direct file execution viarun_any_fileand module/directory execution viarun_file.scripts/generate_opcode_metadata.py (1)
21-28: LGTM!The variant extraction logic correctly skips comment lines and extracts PascalCase variant names. This is adequate for the expected format of the
Instructionenum.crates/compiler-core/src/varint.rs (2)
72-98: Verifyread_varintsemantics for entry boundary detection.Line 87 checks
byte & 0x80 != 0 && shift > 0to detect the start of the next entry. This means on the first byte (shift == 0), an 0x80 marker won't trigger a break. This appears intentional for reading varints that don't have start markers, but if this function is called mid-stream where entries have start markers, the first byte's 0x80 bit would be consumed as part of the value rather than being recognized as a boundary.Ensure this is the intended behavior for all call sites.
6-41: LGTM!The varint encoding functions correctly implement 6-bit chunking with continuation bits. The zig-zag encoding for signed values properly handles
i32::MINusing wrapping subtraction. Thewrite_varint_with_startcorrectly sets the 0x80 marker on the first byte.crates/codegen/src/symboltable.rs (1)
1503-1517: LGTM for the inlining logic structure.The comprehension inlining decision based on
is_generatorandparent_can_see_classfollows the general PEP 709 pattern. Generator expressions correctly passis_generator = true(line 1281) while list/set/dict comprehensions passis_generator = false.crates/vm/src/vm/mod.rs (4)
463-497: LGTM!The
run_pyc_bytesmethod provides a clean API for executing in-memory.pycbytecode, with good documentation including usage examples. The delegation toPyCode::from_pycandwith_simple_runis appropriate.
539-571: LGTM!The
with_simple_runhelper correctly manages__file__and__cached__attributes on the main module, ensuring proper cleanup after execution. The IO flush ensures output is written before returning.
600-618: LGTM!The frame-local exception isolation (push
Nonebefore execution, pop after) correctly prevents exception state from leaking between function calls, aligning with Python's scoped exception semantics.
932-958: LGTM!The Floyd's cycle detection in
contextualize_exceptioncorrectly identifies pre-existing cycles in the exception context chain. The toggle-based slow pointer advancement (every other iteration) is a valid variant of the tortoise-and-hare algorithm that will detect cycles.crates/codegen/src/ir.rs (4)
86-105: LGTM!The
ExceptHandlerInfostruct cleanly captures exception handler metadata (target block, stack depth,preserve_lastiflag), and integrating it intoInstructionInfoenables per-instruction handler tracking for exception table generation.
332-343: LGTM!The iteration limit safety guard is a good defensive measure. Returning
InternalError::StackOverflowinstead of silently breaking ensures that pathological cases don't result in underestimated stack depths that would cause runtime crashes.
390-418: LGTM!The exception handler depth calculation correctly accounts for the stack state at handler entry: base
stack_depth, plus 1 for the exception value, plus 1 forlastiifpreserve_lastiis set. This depth represents the stack state beforePUSH_EXC_INFOprocesses.
603-676: LGTM!The exception table generation correctly implements a state machine that tracks handler regions, emitting entries when handlers change or end. The use of
block_to_indexproperly resolves handler targets to instruction indices (including EXTENDED_ARG offsets).crates/compiler-core/src/bytecode.rs (4)
17-43: Well-structured exception table entry.The
ExceptionTableEntrystruct cleanly captures the CPython 3.11+ exception table format with appropriate field types and documentation.
45-85: Exception table encoding and lookup implementation looks correct.The varint encoding follows CPython's format, and the lookup correctly handles the half-open interval
[start, end). The early return on finding a matching handler is appropriate.
2366-2436: Good test coverage for exception table encoding/decoding.The tests cover the essential cases: multiple entries, empty table, single entry, and boundary conditions (inclusive start, exclusive end).
994-998: VerifyStoreFastLoadFastargument encoding.This instruction has two
Arg<NameIdx>fields, but CPython'sSTORE_FAST_LOAD_FASTpacks both indices into a single oparg usingINSTR_FMT_IBformat (low byte forstore_idx, high byte forload_idx). Ensure the argument decoding logic correctly unpacks both indices from the single operand, or clarify if this is handled differently.crates/vm/src/frame.rs (6)
346-418: Exception handling refactor withis_reraisedetection.The logic correctly distinguishes between new exceptions and re-raises to avoid duplicate traceback entries. The handling of
BareRaiseandReraiseFromStackas reraise operations aligns with CPython semantics.
1505-1520: VerifyPushExcInfosets exception before gettingprev_exc.The current implementation gets
prev_excfromvm.current_exception()and then sets the new exception. This ordering is correct per CPython semantics. However, ensure the exception is set before the handler code runs (which happens after the jump inunwind_blocks).
1727-1763:CleanupThrowexception handling looks correct.The logic properly handles
StopIterationby extracting its value and cleaning up the stack, while re-raising other exceptions. The stack manipulation (poppingexc,last_sent_val,sub_iter) matches the expected state after a failed throw.
1920-2006: Exception table-based unwinding implementation.The
unwind_blocksrefactor correctly:
- Looks up handlers via
find_exception_handler- Pops stack to the handler's depth
- Pushes
lastiifpush_lastiis set (for cleanup handlers)- Pushes exception and jumps to handler
The traceback cycle-breaking logic on return is a thoughtful addition for RustPython's non-GC environment.
2743-2758: Improved panic message for stack underflow.The detailed diagnostic information (code name, lasti, op, source path) will be valuable for debugging compiler bugs. This is a good defensive improvement.
1603-1617: Defensive programming needed:StoreFastLoadFastunsafeexpect()without validation.The code unconditionally expects the load slot to contain a value after storing to a different slot, but there's no validation that
load_idxis initialized. The comment indicates load and store indices "often" (not always) match. Ifstore_idx != load_idxandload_idxwas never assigned, theexpect()will panic. Either verify thatstore_idx == load_idxalways holds at the VM level and document this invariant, or handle theNonecase gracefully with an error.crates/vm/src/vm/python_run.rs (1)
10-103: Well-structured Python execution API.The module provides a clean interface for running Python code from various sources, with proper
__main__setup and loader configuration. The deprecation ofrun_code_stringin favor ofrun_stringis a good API cleanup.crates/stdlib/src/ssl/compat.rs (1)
1457-1555:ssl_readintegration with pending‑output buffering looks consistentThe new behavior in
ssl_read:
- Flushes
pending_tls_outputwith a deadline before starting the read loop, ensuring prior writes reach the peer before waiting for a response.- When
conn.wants_write()during a read (e.g. key update, post‑handshake messages), it generates TLS records viassl_write_tls_records()and sends them withsend_all_bytes, using the same deadline and treatingWantWrite/Timeoutfrom this flush as non‑fatal (data is buffered for later).This matches the intended design: ordering is preserved, read timeouts are respected, and non‑blocking semantics are maintained by buffering rather than failing the read for incomplete writes. No changes requested here.
crates/stdlib/src/ssl.rs (6)
1887-1907:pending_tls_outputwiring intoPySSLSocketconstruction looks correctBoth
_wrap_socketand_wrap_bionow initializepending_tls_outputwith an emptyVec<u8>, and the field is added toPySSLSocketwith clear documentation. The field is consistently markedpub(crate)and#[pytraverse(skip)], which is appropriate for an internal transport buffer and avoids GC traversal overhead.No issues from a construction or ownership perspective.
Also applies to: 1957-1978, 2300-2354
2711-2744:sock_recv/sock_sendabstractions are consistent with prior designThe updated
sock_send(&self, data: &[u8], vm):
- Correctly handles BIO mode by delegating to
outgoing.write()and returning its result.- Uses
socket.socket.send(self.sock, data)in socket mode, keeping the same duck‑typed “raw socket” approach as elsewhere in the module.Given the new buffering logic depends on
sock_sendreturning the number of bytes actually accepted, this interface is suitable; callers convert the Python return value toisize/usizeand handle short writes and zero returns explicitly.No changes requested here.
2831-2888:send_tls_outputcorrectly feedspending_tls_outputon partial/blocked sends
send_tls_output:
- Writes the provided TLS buffer in a loop, using
sock_wait_for_io_impl(Write)to respect blocking vs timeout vs non‑blocking modes.- On timeout from
sock_wait_for_io_impl, orBlockingIOError/ zero bytes written / other send errors, it appends the unsent suffix intopending_tls_outputand returns either asocket.timeout,SSLWantWriteError, or the underlying Python exception.This matches the intended design: rustls’ internal TLS buffers are drained once into
buf, and any network‑level backpressure is captured inpending_tls_outputrather than silently dropping or re‑generating TLS records.No changes requested here.
3432-3436: Auto‑completing the handshake inreadis consistent with CPythonThe change to
read():
- Ensures
self.do_handshake(vm)?is called ifhandshake_doneis still false, before any application data is read.- Then delegates to
compat::ssl_read, mappingSslErrorvariants to the right Python exceptions.This matches OpenSSL/CPython behavior where
SSL_read()implicitly completes the handshake. Given thatdo_handshake()already raisesSSLWantReadError/SSLWantWriteErroron non‑blocking sockets, the new behavior looks correct.Also applies to: 3476-3477
3564-3568:write’s use of pending‑output flushing and chunked TLS writes is sensibleIn
write():
- The handshake is auto‑completed if needed (
self.do_handshake(vm)?).- For non‑BIO sockets,
flush_pending_tls_output(vm, None)?is called before writing new application data, preserving TLS record order.- Data is written in 16KB chunks through
conn.writer(), each followed byflush()and (for socket mode) a loop that:
- Flushes any existing
pending_tls_output.- Drains rustls’ TLS writer via
conn.write_tls.- Sends resulting TLS bytes via
send_tls_output, stashing unsent bytes intopending_tls_output.This ensures:
- Previously buffered TLS records are always pushed before new ones.
- TLS internal buffers are not allowed to grow unbounded.
- Network backpressure is handled via
pending_tls_outputrather than lost writes.No issues from a correctness perspective.
Also applies to: 3592-3641
4103-4127:write_pending_tlsandtry_read_close_notifycorrectly integrate with pending buffer and BIO EOF
write_pending_tlsnow:
- Flushes
pending_tls_outputbefore emitting new TLS records.- Uses
send_tls_outputso any partial sends are captured inpending_tls_outputrather than dropped.
try_read_close_notify:
- For BIO mode, distinguishes “no data yet” (empty read with
eof == False) from actual EOF (empty read witheof == True), only treating the latter as peer close.- For sockets, treats empty reads and non‑blocking errors appropriately:
BlockingIOError→ “no data yet”; other errors/EOF → peer closed.- Feeds any non‑empty data back into rustls via
read_tls/process_new_packetsto allow it to observe close_notify.These look correct and line up with the new shutdown logic’s expectations.
Also applies to: 4130-4177
Whoops, sorry; accidentally rebased instead of merging from main. Should be good now |
|
Either way okay, but by removing already merged commits, rebase usually makes the history usually cleaner. |
scripts/generate_sre_constants.py
Outdated
| main(*sys.argv[1:]) | ||
|
|
||
|
|
||
| # dump(f, OPCODES, "SreOpcode", "u32", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these lines supposed to be deleted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... yep. They were deleted in a commit which I believe was lost to the ether on rebase lol
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! looks good . Please check the commented out part of the script.
Removed commented-out code related to generating SRE constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
scripts/generate_sre_constants.py (1)
49-49: Remove debug print statement.This debug print should be removed before merging to production.
🔎 Proposed fix
- print(f"items is {items}")
🧹 Nitpick comments (2)
scripts/generate_sre_constants.py (2)
76-76: Prefer standard if statement for side effects.Using a conditional expression solely for its side effect is non-idiomatic Python. A standard if statement is clearer.
🔎 Proposed fix
- content.append(f"{derives}\n") if derives else None + if derives: + content.append(f"{derives}\n")
86-133: Consider adding error handling for robustness.The script lacks error handling around file operations and the
exec()call. While the current implementation works for the controlled input file, adding defensive checks would provide clearer error messages and improve maintainability.Consider:
- Wrapping file operations in try-except blocks with informative error messages
- Validating that required keys ('MAGIC', 'OPCODES', 'ATCODES', 'CHCODES') exist in
nsafter theexec()call- Catching and reporting exceptions from
exec()with context about which file failed
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
scripts/generate_sre_constants.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code
Files:
scripts/generate_sre_constants.py
🔇 Additional comments (2)
scripts/generate_sre_constants.py (2)
7-16: LGTM! Efficient file update pattern.The function correctly avoids unnecessary file writes by comparing content first, which is good for build system performance.
55-55: The project's minimum Python version is documented in DEVELOPMENT.md as "CPython version 3.13 or higher" and enforced in CI workflows (using 3.13.1). Theremoveprefix()method (available in Python 3.9+) is fully compatible with this requirement.
youknowone
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* Updated re library + test * Copied over generate_sre_constants from cpython/Tools * Customized `generate_sre_constants.py` + ran to update `constants.rs` * Clarified `dump_enum` docstring in `generate_sre_constants.py`
* Updated re library + test * Copied over generate_sre_constants from cpython/Tools * Customized `generate_sre_constants.py` + ran to update `constants.rs` * Clarified `dump_enum` docstring in `generate_sre_constants.py`
Summary by CodeRabbit
Breaking Changes
Updates
✏️ Tip: You can customize this high-level summary in your review settings.