Skip to content

Conversation

@terryluan12
Copy link
Contributor

@terryluan12 terryluan12 commented Jan 5, 2026

Summary by CodeRabbit

  • Breaking Changes

    • Removed TEMPLATE flag from regular expression patterns. Code relying on this flag will require updates.
  • Updates

    • Updated regular expression engine internal version identifier.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 5, 2026

📝 Walkthrough

Walkthrough

The PR updates SRE constants by removing the TEMPLATE flag definition and updating the SRE_MAGIC version number. It introduces a new Python script to auto-generate Rust constants from Python source files, replacing manual generation and standardizing tooling.

Changes

Cohort / File(s) Summary
Constants Update
crates/sre_engine/src/constants.rs
Updated SRE_MAGIC constant from 20221023 to 20230612; removed TEMPLATE bitflag variant from SreFlag; updated generator header comments
Flag Representation
crates/vm/src/stdlib/sre.rs
Removed TEMPLATE flag entry from Pattern::repr_str flag_names array
Code Generation Script
scripts/generate_sre_constants.py
New Python script to auto-generate Rust constants; exports update_file(), dump_enum(), dump_bitflags(), and main() functions to build constants from Lib/re/_constants.py

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A rabbit hops through constants clear,
Old TEMPLATE flags disappear,
New scripts now generate with care,
SRE_MAGIC updated fair,
Automation hops everywhere!

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'Updated the re library + test' is vague and generic, using non-specific language that doesn't convey meaningful information about the specific changes made. Provide a more specific title that describes the primary change, such as 'Update SRE constants generation and remove TEMPLATE flag' to clearly indicate the main modifications.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 5, 2026

Code has been automatically formatted

The code in this PR has been formatted using cargo fmt --all.
Please pull the latest changes before pushing again:

git pull origin update_re

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
crates/jit/src/instructions.rs (1)

219-228: Consider extracting the terminator check into a helper method.

The terminator check logic at lines 221-224 duplicates the pattern at lines 180-186. Extracting this into a small helper would improve maintainability.

🔎 Proposed refactor
+    fn block_needs_terminator(&self, block: Block) -> bool {
+        match self.builder.func.layout.last_inst(block) {
+            None => true,
+            Some(inst) => !self.builder.func.dfg.insts[inst].opcode().is_terminator(),
+        }
+    }

Then use self.block_needs_terminator(cur) in both places.

scripts/generate_sre_constants.py (2)

90-93: Consider adding error handling for file operations.

If infile doesn't exist or is unreadable, the script will fail with a raw exception. Adding explicit error handling would provide clearer feedback when running the generator.

🔎 Proposed improvement
 def main(
     infile="Lib/re/_constants.py",
     outfile_constants="crates/sre_engine/src/constants.rs",
 ):
     ns = {}
-    with open(infile) as fp:
-        code = fp.read()
+    try:
+        with open(infile) as fp:
+            code = fp.read()
+    except FileNotFoundError:
+        print(f"Error: Input file '{infile}' not found.", file=sys.stderr)
+        sys.exit(1)
     exec(code, ns)

136-139: Consider using argparse for better CLI ergonomics.

The current approach using *sys.argv[1:] works but doesn't provide usage help or argument validation. For a script that may be run manually, argparse would be more user-friendly.

🔎 Proposed improvement
 if __name__ == "__main__":
-    import sys
-
-    main(*sys.argv[1:])
+    import argparse
+    parser = argparse.ArgumentParser(description="Generate SRE constants for RustPython")
+    parser.add_argument("infile", nargs="?", default="Lib/re/_constants.py",
+                        help="Input Python constants file")
+    parser.add_argument("outfile_constants", nargs="?", 
+                        default="crates/sre_engine/src/constants.rs",
+                        help="Output Rust constants file")
+    args = parser.parse_args()
+    main(args.infile, args.outfile_constants)
crates/compiler-core/src/varint.rs (1)

104-114: Consider adding round-trip verification to tests.

The current test verifies expected byte lengths but doesn't read back values to confirm encoding correctness. Adding read verification would strengthen confidence in the implementation.

🔎 Suggested test enhancement
 #[test]
 fn test_write_read_varint() {
     let mut buf = Vec::new();
     write_varint(&mut buf, 0);
     write_varint(&mut buf, 63);
     write_varint(&mut buf, 64);
     write_varint(&mut buf, 4095);

     // Values: 0, 63, 64, 4095
     assert_eq!(buf.len(), 1 + 1 + 2 + 2);
+
+    // Verify round-trip for single values
+    let mut buf = Vec::new();
+    for val in [0u32, 1, 63, 64, 127, 128, 4095, u32::MAX] {
+        buf.clear();
+        write_varint(&mut buf, val);
+        let mut pos = 0;
+        assert_eq!(read_varint(&buf, &mut pos), Some(val), "round-trip failed for {val}");
+    }
 }
scripts/generate_opcode_metadata.py (1)

14-27: Make enum parsing and CPython-compat checks more robust

The generator looks good overall, but two robustness gaps stand out:

  1. re.search(r"pub enum Instruction \{(.+?)\n\}", ...) is quite format-sensitive. Any change such as attributes on the enum, different spacing around {, or an indented closing } will cause the match to fail or, worse, capture the wrong body. Consider anchoring on ^/$ with re.MULTILINE and allowing flexible whitespace, or (better) scanning braces to find the enum body rather than relying on a single regex.

  2. The “expected” opcode indices at the bottom are only printed; a drift between Instruction ordering and the CPython 3.13 expectations won’t fail the generation step. It would be safer to turn those into hard checks (raising if any key opcode index doesn’t match), so _opcode_metadata.py is never silently generated with inconsistent metadata.

Example of the second point:

assert find_opcode("CACHE") == 0
assert find_opcode("WITH_EXCEPT_START") == 44
# ...etc for a small set of golden opcodes...

so a mismatch fails fast rather than being buried in logs.

Also, remember that whenever Instruction’s order changes, you’ll want a full clean rebuild (including removing rustpython-* build artifacts and stray .pyc files) so the bytecode/opcode metadata stays consistent across the toolchain. Based on learnings, ...

Also applies to: 188-227, 234-246

crates/compiler-core/src/bytecode.rs (1)

17-43: Tighten invariants around exception table entry encoding

The new exception table machinery looks good and the tests cover the happy paths, but there are two spots where adding explicit invariants would make failures easier to diagnose:

  1. Depth truncation to u16

ExceptionTableEntry.depth is u16, while stack depths elsewhere are tracked as u32. In both ExceptionTableEntry::new and the callers (e.g. codegen), depth is passed as u32 and then cast to u16:

pub struct ExceptionTableEntry {
    pub depth: u16,
    // ...
}

pub fn new(start: u32, end: u32, target: u32, depth: u16, push_lasti: bool) -> Self { ... }

Given max_stackdepth is u32, it would be safer to assert that depth <= u16::MAX at the point you construct entries (or in codegen before calling new), instead of silently truncating on overflow. Even a debug_assert! would help catch impossible cases in development.

  1. Ensuring end >= start

encode_exception_table computes size via end.saturating_sub(start). If a bogus entry ever slips through with end < start, this will quietly encode a zero-length range and make the handler effectively unreachable:

let size = entry.end.saturating_sub(entry.start);

Consider either asserting end >= start when constructing ExceptionTableEntry, or replacing saturating_sub with a checked subtraction that panics or returns an error on invalid input. This makes malformed tables fail fast rather than degrading into “no handler found” behavior later.

[ suggest_recommended_refactor ]

Given how tightly this format is coupled to the Instruction layout and the generated _opcode_metadata.py, it’s also worth doing a full clean build (removing stale rustpython-* build artifacts and .pyc files) whenever you touch the opcode/exception-table definitions, to avoid mismatches between compiled bytecode and the metadata. Based on learnings, ...

Also applies to: 45-59, 61-85, 2367-2435

crates/vm/src/frame.rs (1)

464-532: Confirm delegated throw() path produces expected tracebacks

In the delegated throw() path (gen_throw when yield_from_target() finds a receiver), errors from the inner coroutine/iterator are handled via:

  • forwarding the throw, and
  • on error, pushing None and calling unwind_blocks directly with UnwindReason::Raising { exception: err }, then either resuming run or propagating err.

Unlike the non-delegated throw_here path, this branch doesn’t go through handle_exception, so it doesn’t explicitly add a traceback frame or call vm.contextualize_exception for the outer generator.

This might be exactly what you want (reusing the inner traceback without re-chaining), but it’s subtle. It would be good to double-check—ideally via tests comparing against CPython—that:

  • g.throw(...) into a yield from chain still yields the expected traceback frames (both inner and outer frames where appropriate), and
  • you don’t accidentally double-chain __context__ compared to the non-delegated raise path.
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bdf3b36 and 956cbfd.

⛔ Files ignored due to path filters (52)
  • Lib/_opcode_metadata.py is excluded by !Lib/**
  • Lib/importlib/_bootstrap_external.py is excluded by !Lib/**
  • Lib/pkgutil.py is excluded by !Lib/**
  • Lib/platform.py is excluded by !Lib/**
  • Lib/plistlib.py is excluded by !Lib/**
  • Lib/re/__init__.py is excluded by !Lib/**
  • Lib/re/_casefix.py is excluded by !Lib/**
  • Lib/re/_compiler.py is excluded by !Lib/**
  • Lib/re/_constants.py is excluded by !Lib/**
  • Lib/re/_parser.py is excluded by !Lib/**
  • Lib/sre_constants.py is excluded by !Lib/**
  • Lib/test/_test_multiprocessing.py is excluded by !Lib/**
  • Lib/test/test__opcode.py is excluded by !Lib/**
  • Lib/test/test_asyncgen.py is excluded by !Lib/**
  • Lib/test/test_compile.py is excluded by !Lib/**
  • Lib/test/test_contextlib.py is excluded by !Lib/**
  • Lib/test/test_contextlib_async.py is excluded by !Lib/**
  • Lib/test/test_dis.py is excluded by !Lib/**
  • Lib/test/test_exceptions.py is excluded by !Lib/**
  • Lib/test/test_generators.py is excluded by !Lib/**
  • Lib/test/test_import/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/circular_imports/import_cycle.py is excluded by !Lib/**
  • Lib/test/test_import/data/circular_imports/singlephase.py is excluded by !Lib/**
  • Lib/test/test_import/data/double_const.py is excluded by !Lib/**
  • Lib/test/test_import/data/package3/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/package3/submodule.py is excluded by !Lib/**
  • Lib/test/test_import/data/package4/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/package4/submodule.py is excluded by !Lib/**
  • Lib/test/test_importlib/test_util.py is excluded by !Lib/**
  • Lib/test/test_patma.py is excluded by !Lib/**
  • Lib/test/test_pkgutil.py is excluded by !Lib/**
  • Lib/test/test_platform.py is excluded by !Lib/**
  • Lib/test/test_plistlib.py is excluded by !Lib/**
  • Lib/test/test_re.py is excluded by !Lib/**
  • Lib/test/test_traceback.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_async_case.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_discovery.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_loader.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_program.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_runner.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testasync.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testhelpers.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testpatch.py is excluded by !Lib/**
  • Lib/test/test_weakref.py is excluded by !Lib/**
  • Lib/token.py is excluded by !Lib/**
  • Lib/unittest/__init__.py is excluded by !Lib/**
  • Lib/unittest/async_case.py is excluded by !Lib/**
  • Lib/unittest/loader.py is excluded by !Lib/**
  • Lib/unittest/main.py is excluded by !Lib/**
  • Lib/weakref.py is excluded by !Lib/**
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snap is excluded by !**/*.snap
  • crates/codegen/src/snapshots/rustpython_compiler_core__compile__tests__nested_double_async_with.snap is excluded by !**/*.snap
📒 Files selected for processing (20)
  • .cspell.dict/cpython.txt
  • crates/codegen/src/compile.rs
  • crates/codegen/src/error.rs
  • crates/codegen/src/ir.rs
  • crates/codegen/src/symboltable.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/lib.rs
  • crates/compiler-core/src/varint.rs
  • crates/jit/src/instructions.rs
  • crates/sre_engine/src/constants.rs
  • crates/stdlib/src/opcode.rs
  • crates/vm/src/builtins/asyncgenerator.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/coroutine.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/stdlib/sre.rs
  • crates/vm/src/version.rs
  • crates/vm/src/vm/mod.rs
  • scripts/generate_opcode_metadata.py
  • scripts/generate_sre_constants.py
💤 Files with no reviewable changes (1)
  • crates/vm/src/stdlib/sre.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/vm/mod.rs
  • crates/vm/src/coroutine.rs
  • crates/jit/src/instructions.rs
  • crates/compiler-core/src/lib.rs
  • crates/vm/src/builtins/code.rs
  • crates/codegen/src/error.rs
  • crates/codegen/src/ir.rs
  • crates/compiler-core/src/varint.rs
  • crates/vm/src/version.rs
  • crates/sre_engine/src/constants.rs
  • crates/stdlib/src/opcode.rs
  • crates/codegen/src/symboltable.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/builtins/asyncgenerator.rs
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_sre_constants.py
  • scripts/generate_opcode_metadata.py
🧠 Learnings (2)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.

Applied to files:

  • crates/vm/src/vm/mod.rs
  • crates/codegen/src/ir.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/frame.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`

Applied to files:

  • scripts/generate_opcode_metadata.py
🧬 Code graph analysis (5)
crates/codegen/src/ir.rs (2)
crates/compiler-core/src/bytecode.rs (4)
  • encode_exception_table (47-59)
  • new (34-42)
  • new (540-542)
  • new (1150-1152)
crates/compiler-core/src/varint.rs (2)
  • write_signed_varint (22-30)
  • write_varint (9-17)
crates/compiler-core/src/varint.rs (1)
crates/vm/src/builtins/code.rs (1)
  • read_varint (1078-1096)
crates/stdlib/src/opcode.rs (2)
crates/compiler-core/src/bytecode.rs (3)
  • try_from (1131-1137)
  • try_from (1158-1163)
  • try_from (1172-1178)
crates/stdlib/src/array.rs (2)
  • try_from (604-607)
  • try_from (1468-1486)
crates/compiler-core/src/bytecode.rs (1)
crates/compiler-core/src/varint.rs (4)
  • read_varint (74-98)
  • read_varint_with_start (45-70)
  • write_varint (9-17)
  • write_varint_with_start (34-41)
scripts/generate_opcode_metadata.py (1)
crates/compiler-core/src/bytecode.rs (3)
  • split (424-433)
  • get (450-456)
  • get (553-555)
🔇 Additional comments (26)
.cspell.dict/cpython.txt (1)

17-17: LGTM!

The CLASSDEREF entry is correctly positioned in alphabetical order and is a valid CPython term that should be present in the spell-check dictionary. This supports the PR's enhanced bytecode and variable handling changes.

crates/jit/src/instructions.rs (2)

175-190: LGTM!

The terminator check before emitting fallthrough jumps is correct. This prevents emitting invalid IR by checking if the block already has a terminator instruction before adding an unconditional jump.


208-216: LGTM!

Including Jump in the list of instructions that mark subsequent code as unreachable is correct. Unconditional jumps transfer control elsewhere, making any following instructions dead code that should be skipped.

crates/vm/src/builtins/asyncgenerator.rs (4)

11-11: LGTM!

The Destructor import is correctly added to support the new destructor implementation for PyAsyncGen.


35-35: LGTM!

Adding Destructor to the with(...) list correctly enables the destructor path for PyAsyncGen, aligning with Python's async generator finalization semantics.


76-86: LGTM!

Comment updates improve clarity. The error-ignoring behavior correctly matches CPython's PyErr_WriteUnraisable semantics for finalizer exceptions.


647-658: Implementation looks correct.

The destructor properly checks if the generator is already closed before calling the finalizer, avoiding unnecessary work during GC.

Note: The AI summary mentions a "duplicate Destructor impl block at the bottom of the file," but only this single implementation exists. This appears to be an inconsistency in the summary.

crates/sre_engine/src/constants.rs (1)

1-128: LGTM - Generated file looks correct.

The auto-generated constants align with CPython's Lib/re/_constants.py. The updated SRE_MAGIC value (20230612), new enums (SreAtCode, SreCatCode), and removal of the TEMPLATE flag from SreFlag reflect the upstream CPython 3.11+ changes.

scripts/generate_sre_constants.py (3)

7-16: Clean update-file pattern.

The update_file function correctly avoids unnecessary writes by comparing content first, which helps avoid triggering unnecessary rebuilds in build systems.


60-83: Bitflags generation correctly preserves original values.

Unlike the enum generation, the bitflags function correctly uses the actual values from the dictionary (value from the tuple), ensuring bit positions are preserved accurately.


36-57: Docstring is accurate; enum value assignment is working correctly.

The parameter d is correctly documented as (list) in the docstring. The sequential enum value assignment (0, 1, 2...) based on sorted order is intentional and works correctly because OPCODES, ATCODES, and CHCODES are already defined in alphabetical order. The generated Rust enums have correct sequential values.

crates/compiler-core/src/lib.rs (1)

10-10: LGTM!

The new varint module is appropriately added to the public API, enabling exception table and line number table encoding utilities to be consumed by dependent crates like codegen.

crates/vm/src/builtins/code.rs (1)

1003-1028: LGTM!

The three-way lookup correctly implements CPython's variable name resolution semantics, properly indexing into varnames, cellvars, and freevars with correctly adjusted offsets. This aligns with how bytecode instructions like LOAD_FAST and LOAD_DEREF reference variables by combined index.

crates/vm/src/coroutine.rs (2)

84-106: LGTM!

The exception state swap mechanism correctly isolates generator exception context. The pattern of capturing the generator's saved exception before frame entry, replacing the None pushed by with_frame, and saving the state back before exit properly maintains exception chain semantics across yields.


174-178: LGTM!

The guard for unstarted generators (lasti() == 0) correctly avoids unnecessary GeneratorExit throw handling, matching CPython's behavior where closing a never-started generator is effectively a no-op.

crates/codegen/src/symboltable.rs (3)

58-61: LGTM!

The comp_inlined field correctly tracks PEP 709 comprehension inlining state, enabling the symbol table to mark non-generator comprehensions for potential inlining optimizations.


352-381: LGTM!

The symbol merging logic correctly implements PEP 709 semantics:

  • BOUND_FLAGS properly identifies symbols that are bound within the comprehension
  • Skipping .0 avoids exposing the internal iteration parameter
  • Only bound symbols (not Free references to outer scope) are merged
  • Merged symbols are correctly marked as Local in the parent scope

1503-1516: LGTM!

The inlining decision correctly follows CPython's PEP 709 logic: comprehensions are inlined when they are not generators and the parent scope cannot see a class scope. The saturating_sub(2) safely handles edge cases where the tables stack is shallow.

crates/compiler-core/src/varint.rs (2)

9-17: LGTM!

The 6-bit chunk encoding with 0x40 continuation flag is correctly implemented and matches the format used in LineTableReader::read_varint in code.rs.


22-30: LGTM!

The zig-zag encoding correctly handles signed integers, including i32::MIN via wrapping_sub which avoids overflow on negation.

crates/codegen/src/error.rs (2)

79-81: LGTM!

The new async error variants appropriately extend error handling for async constructs that appear outside async functions.


119-126: LGTM!

The error messages are consistent with CPython's error reporting for these invalid async construct scenarios.

crates/vm/src/vm/mod.rs (2)

523-529: LGTM!

The per-frame exception context isolation correctly prevents exception state leakage between function calls. This is essential for proper exception chaining semantics where each frame maintains its own exception context.


852-871: LGTM!

The Floyd's cycle detection algorithm correctly guards against infinite loops when traversing exception context chains. The slow pointer advances every other iteration via the toggle, which properly implements the tortoise-and-hare pattern for detecting pre-existing cycles.

crates/codegen/src/ir.rs (1)

86-105: Exception handler metadata and table generation look internally consistent

The way ExceptHandlerInfo is threaded through InstructionInfo, consumed in max_stackdepth, and then encoded via generate_exception_table lines up well:

  • handler_depth in max_stackdepth matches the documented stack layout around PUSH_EXC_INFO (base depth + optional lasti + exc, then PUSH_EXC_INFO adds one more).
  • instr.arg.instr_size() is used both for stack-depth accounting and for computing instr_index in generate_exception_table, so the table’s (start, end, target) indices are in the same CodeUnit units that lasti and frame.rs use.
  • The iteration guard in max_stackdepth gives a clear failure mode (InternalError::StackOverflow) instead of silently underestimating depth.

I don’t see correctness issues here; this integrates cleanly with the new exception-table-driven flow.

Also applies to: 325-343, 375-418, 603-676

crates/vm/src/frame.rs (1)

427-504: Yield‑from / async send and cleanup flow matches documented stack layouts

The new control-flow pieces around delegation look internally consistent:

  • yield_from_target’s detection (SEND at lasti, or RESUME preceded by YIELD_VALUE with arg >= 1 and a non-empty stack) matches the comment and the new meaning of the YIELD_VALUE oparg.
  • Send/EndSend implement the documented stack transforms:
    • Send: (receiver, value) -> (receiver, retval) or (receiver, stop_value) plus jump, with net stack effect 0.
    • EndSend: (receiver, value) -> (value), popping the receiver and leaving the final value.
  • CleanupThrow’s behavior for StopIteration vs other exceptions ((sub_iter, last_sent_val, exc) to (None, value) or re-raise) aligns with how the stack depth is modeled in Instruction::stack_effect and with the comments about handler depth in codegen.

Given those cross-checks, the stack invariants for the new delegation opcodes look solid.

Also applies to: 1008-1036, 1683-1764

@youknowone
Copy link
Member

@terryluan12 This PR is including lots of unrelated previously merged commits. please use git rebase to clean them up and cut the own changes belong to this PR.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/codegen/src/ir.rs (2)

483-676: Add explicit stack depth bounds check to prevent silent truncation

The cast curr_handler.stack_depth as u16 will silently truncate if max_stackdepth() ever exceeds u16::MAX (65535), leading to incorrect exception handler unwinding at runtime. Since ExceptionTableEntry::new() takes a u16 for the depth parameter, add an explicit check either in max_stackdepth() or here in generate_exception_table() to validate the computed depth doesn't exceed the bounds, returning an InternalError if it does.

Regarding block_to_index: the current implementation correctly maintains coupling by recomputing it on each loop iteration within the EXTENDED_ARG fixup loop and using it only after the loop terminates, ensuring it reflects final instruction sizes. However, document this tight coupling more explicitly if the instruction array layout changes in the future.


86-106: Address silent truncation when casting stack_depth to u16 in exception table entries

The handler_depth formula and stack-depth semantics are correct. The code properly computes handler entry depth as handler.stack_depth + 1 + preserve_lasti, matching CPython 3.11's exception unwinding semantics: the exception table stores the minimal stack depth to restore before pushing the exception and optional lasti.

However, three places (lines 639, 652, 670) silently cast curr_handler.stack_depth from u32 to u16 without overflow checking. If stack depth exceeds u16::MAX (65535), the value truncates, corrupting the exception table. While rare, deeply nested code could trigger this. Add a bounds check:

if curr_handler.stack_depth > u16::MAX as u32 {
    return Err(InternalError::StackOverflow);
}

The iteration guard of blocks.len() * 100 is appropriately conservative; the algorithm is monotonic and converges quickly for normal code.

♻️ Duplicate comments (2)
crates/vm/src/version.rs (1)

132-132: PYC_MAGIC_NUMBER is incorrect for Python 3.13—must be 3571, not 2997.

The Python 3.13 bytecode magic number is 3571 per CPython's _bootstrap_external.py. Using 2997 breaks .pyc compatibility and validation against Python 3.13 runtime.

🔎 Fix to use correct Python 3.13 magic number
-pub const PYC_MAGIC_NUMBER: u16 = 2997;
+pub const PYC_MAGIC_NUMBER: u16 = 3571;
crates/stdlib/src/opcode.rs (1)

49-53: Fix the off-by-one error in has_arg (duplicate issue).

This is the same issue previously flagged: line 52 uses opcode > Self::HAVE_ARGUMENT, which incorrectly treats opcode 44 (WITH_EXCEPT_START) as not having an argument. CPython's HAVE_ARGUMENT = 44 means opcodes >= 44 carry arguments.

Change the comparison to >=:

 pub fn has_arg(opcode: i32) -> bool {
-    Self::is_valid(opcode) && opcode > Self::HAVE_ARGUMENT
+    Self::is_valid(opcode) && opcode >= Self::HAVE_ARGUMENT
 }

Based on past review comments, this off-by-one was previously identified but not yet fixed.

🧹 Nitpick comments (18)
crates/jit/src/instructions.rs (1)

219-228: Consider extracting duplicate terminator-check logic into a helper.

The needs_terminator computation at lines 221-223 is identical to lines 180-186. Extracting this into a small helper method would improve maintainability.

🔎 Proposed helper extraction
+    fn block_needs_terminator(&self, block: Block) -> bool {
+        match self.builder.func.layout.last_inst(block) {
+            None => true,
+            Some(inst) => !self.builder.func.dfg.insts[inst].opcode().is_terminator(),
+        }
+    }

Then replace both occurrences:

-                    let needs_terminator = match self.builder.func.layout.last_inst(cur) {
-                        None => true, // Empty block needs terminator
-                        Some(inst) => {
-                            // Check if the last instruction is a terminator
-                            !self.builder.func.dfg.insts[inst].opcode().is_terminator()
-                        }
-                    };
-                    if needs_terminator {
+                    if self.block_needs_terminator(cur) {
                         self.builder.ins().jump(target_block, &[]);
                     }
-            let needs_terminator = match self.builder.func.layout.last_inst(cur) {
-                None => true,
-                Some(inst) => !self.builder.func.dfg.insts[inst].opcode().is_terminator(),
-            };
-            if needs_terminator {
+            if self.block_needs_terminator(cur) {
                 self.builder.ins().trap(TrapCode::user(0).unwrap());
             }
crates/sre_engine/src/constants.rs (2)

108-120: Inconsistent indentation inside bitflags! macro.

Line 109's #[derive(...)] attribute has extra leading spaces compared to the macro body. Since this file is auto-generated, this should be fixed in the generator script.

🔎 Proposed fix
 bitflags! {
-#[derive(Debug, PartialEq, Eq, Clone, Copy)]
+    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
     pub struct SreFlag: u16 {

122-128: SreInfo missing derive attributes for consistency.

SreInfo lacks #[derive(Debug, PartialEq, Eq, Clone, Copy)] that SreFlag has. This could cause issues if these traits are needed elsewhere (e.g., for debugging or comparisons). Since this is auto-generated, consider updating the generator script.

🔎 Proposed fix
 bitflags! {
+    #[derive(Debug, PartialEq, Eq, Clone, Copy)]
     pub struct SreInfo: u32 {
         const PREFIX = 1;
         const LITERAL = 2;
         const CHARSET = 4;
     }
 }
README.md (1)

69-87: Shell snippet style triggers markdownlint MD014 (optional).

The new venv block uses $ prompts in a sh-tagged fence without showing command output, which markdownlint flags (MD014). If you care about a clean lint run, consider either dropping the $ prefixes here or switching this block to match the existing bash style used earlier in the README.

crates/vm/src/frame.rs (3)

347-417: Exception/traceback handling refactor looks sound; double‑check interaction with VM exception stack.

The new is_reraise detection, conditional traceback augmentation, and handle_exception + unwind_blocks flow are coherent:

  • is_reraise correctly classifies RaiseKind::BareRaise, RaiseKind::ReraiseFromStack, and Instruction::Reraise to avoid adding duplicate traceback entries and re‑chaining on re‑raise.
  • For non‑reraise, you add a single traceback entry using code.locations[idx] and store lasti in byte units (idx * 2), which matches the later restoration logic (tb.lasti / 2).
  • On failure inside unwind_blocks, restoring lasti from the traceback for this frame before returning the error helps keep frame.f_lineno consistent with the final traceback.

The only subtlety that’s not obvious from this file is how vm.contextualize_exception and vm.current_exception interact with PushExcInfo (which expects to see the previous active exception in current_exception). It would be good to confirm that contextualize_exception does not overwrite the “previous” exception in a way that breaks prev_exc chaining for nested handlers.


1008-1036: GetYieldFromIter behavior matches CPython’s GET_YIELD_FROM_ITER design.

The coroutine/generator/iterator split:

  • Requiring IS_COROUTINE for yielding from a coroutine_type and erroring otherwise.
  • Allowing direct delegation to generator_type.
  • Falling back to iter() for other objects.

is the expected behavior for GET_YIELD_FROM_ITER. Just make sure CodeFlags::IS_COROUTINE really covers all coroutine/async‑generator cases you intend to gate here; if async generators use a different flag, they could be erroneously rejected.


782-792: New stack underflow checks are useful but could be routed through fatal() for consistency.

Both CopyItem and pop_multiple now:

  • Detect attempts to read/pull more stack entries than available.
  • Emit detailed diagnostics (stack length, opcode, lasti, code object name, source path) and panic.

That’s a big improvement in debuggability for internal compiler/VM bugs. If you want to keep failure handling uniform, consider redirecting to fatal() instead of naked panic! so all fatal interpreter errors have a single code path, but this is purely stylistic.

Also applies to: 2742-2757

scripts/generate_opcode_metadata.py (2)

13-16: Regex pattern may be fragile for complex enum bodies.

The pattern r"pub enum Instruction \{(.+?)\n\}" uses a non-greedy match that stops at the first \n}. If the Instruction enum contains nested items with closing braces or if the formatting changes, this could capture incomplete content.

🔎 Suggested improvement using brace counting
# Alternative: use brace counting for more robust parsing
def extract_enum_body(content, enum_name):
    pattern = rf"pub enum {enum_name}\s*\{{"
    match = re.search(pattern, content)
    if not match:
        return None
    start = match.end()
    depth = 1
    pos = start
    while pos < len(content) and depth > 0:
        if content[pos] == '{':
            depth += 1
        elif content[pos] == '}':
            depth -= 1
        pos += 1
    return content[start:pos-1]

1-5: Consider adding a shebang and encoding declaration.

The script lacks a shebang line for direct execution.

🔎 Proposed addition
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
 """Generate _opcode_metadata.py for RustPython bytecode.

 This file generates opcode metadata that is compatible with CPython 3.13.
 RustPython's Instruction enum is now ordered to match CPython opcode numbers exactly.
 """
crates/compiler-core/src/varint.rs (1)

100-137: Consider adding roundtrip tests for read functions.

The tests verify write operations and one roundtrip for varint_with_start, but don't verify that read_varint correctly decodes values written by write_varint.

🔎 Suggested additional test
#[test]
fn test_roundtrip_varint() {
    let test_values = [0u32, 1, 63, 64, 127, 128, 4095, 4096, u32::MAX];
    for &val in &test_values {
        let mut buf = Vec::new();
        write_varint(&mut buf, val);
        let mut pos = 0;
        let read_val = read_varint(&buf, &mut pos);
        assert_eq!(read_val, Some(val), "roundtrip failed for {}", val);
        assert_eq!(pos, buf.len(), "position mismatch for {}", val);
    }
}
crates/vm/src/vm/mod.rs (1)

463-497: Consider allowing caller‑supplied path for run_pyc_bytes/file

run_pyc_bytes always uses "<pyc_bytes>" for the code object filename and "<source>" for __file__ via with_simple_run, regardless of where the .pyc originally came from. For real files this makes tracebacks and __file__/__cached__ look synthetic.

If you expect this to be used with actual compiled files (not only opaque buffers), consider:

  • Adding an optional path: &str parameter to run_pyc_bytes, threading it into both PyCode::from_pyc and with_simple_run, or
  • Deriving a best‑effort path from the .pyc header metadata when available.

It’s not correctness‑critical, but would improve debuggability and parity with the CLI path‑based execution.

Also applies to: 539-583

crates/codegen/src/ir.rs (1)

483-601: Linetable encoding relies on CPython 3.11 invariants (line/column ranges)

The reworked generate_linetable matches the Python 3.11+ scheme (Short, OneLine, Long, NoColumns) and uses the shared write_signed_varint/write_varint helpers, which is good.

A couple of invariants seem assumed but not enforced here:

  • end_line_delta is cast to u32 for write_varint; this is fine as long as end_loc.line >= loc.line always holds. If any AST producer ever hands you a location where end_line < line (e.g., from malformed ranges), the cast will wrap and produce a very large unsigned delta.

  • Similarly, the Short and OneLine encodings assume end_col >= col and columns < 128; other cases fall back to Long, but the assumptions are implicit.

If those properties are already guaranteed by your SourceLocation/parser pipeline, this is fine. Otherwise, it might be worth adding debug asserts (only in debug builds) to make violations obvious during development rather than silently emitting nonsensical linetables.

crates/compiler-core/src/bytecode.rs (1)

1115-1115: Verify LAST_INSTRUCTION remains in sync with enum.

The constant is now Instruction::Resume { arg: Arg::marker() }, which appears to be the last variant at opcode 149. However, this requires manual maintenance whenever the enum changes.

Consider adding a compile-time assertion or documentation to help maintain this invariant, especially since the comment states this is critical for marshaling.

crates/stdlib/src/opcode.rs (1)

57-163: Eliminate double conversion in opcode metadata methods.

Methods has_const, has_name, has_jump, has_free, and has_local all follow this pattern:

pub fn has_const(opcode: i32) -> bool {
    if !Self::is_valid(opcode) {  // <- first try_from
        return false;
    }
    if let Ok(instr) = Instruction::try_from(opcode as u8) {  // <- second try_from
        matches!(instr, ...)
    } else {
        false
    }
}

Since is_valid already calls Instruction::try_from, this converts the opcode twice. Refactor to convert once:

🔎 Proposed refactor
 pub fn has_const(opcode: i32) -> bool {
-    if !Self::is_valid(opcode) {
-        return false;
-    }
-    if let Ok(instr) = Instruction::try_from(opcode as u8) {
+    if !(0..=255).contains(&opcode) {
+        return false;
+    }
+    if let Ok(instr) = Instruction::try_from(opcode as u8) {
         matches!(
             instr,
             Instruction::LoadConst { .. } | Instruction::ReturnConst { .. }
         )
     } else {
         false
     }
 }

Apply the same pattern to has_name, has_jump, has_free, and has_local.

crates/vm/src/vm/python_run.rs (2)

33-67: Consider enriching file read error context.

The implementation correctly handles pyc vs source files with appropriate loaders. However, line 62 converts I/O errors to OS errors without including the filename:

Err(err) => {
    return Err(self.new_os_error(err.to_string()));
}

Consider including the filename for better debugging:

🔎 Suggested improvement
 Err(err) => {
-    return Err(self.new_os_error(err.to_string()));
+    return Err(self.new_os_error(format!("{}: {}", path, err)));
 }

105-119: Error handling for loader setup is adequate, but error messages could be enhanced.

The set_main_loader function correctly propagates errors from all operations (import, attribute access, instantiation, and dict assignment) using the ? operator. However, consider wrapping these operations to provide loader-specific context in error messages (e.g., "Failed to load X loader for main"), similar to how other parts of the codebase use new_import_error() and new_attribute_error() with descriptive context.

crates/stdlib/src/ssl.rs (2)

2778-2784: Consider hoisting socket acquisition outside the loop

The py_socket and socket are acquired inside the while loop on every iteration. This is inefficient and could theoretically fail mid-flush, leaving the buffer in a partial state. Consider acquiring the socket once before the loop.

🔎 Proposed optimization
     pub(crate) fn flush_pending_tls_output(
         &self,
         vm: &VirtualMachine,
         deadline: Option<std::time::Instant>,
     ) -> PyResult<()> {
         let mut pending = self.pending_tls_output.lock();
         if pending.is_empty() {
             return Ok(());
         }

         let socket_timeout = self.get_socket_timeout(vm)?;
         let is_non_blocking = socket_timeout.map(|t| t.is_zero()).unwrap_or(false);

+        // Acquire socket once before the loop
+        let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
+        let socket = py_socket
+            .sock()
+            .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
+
         let mut sent_total = 0;
         while sent_total < pending.len() {
             // Calculate timeout: use deadline if provided, otherwise use socket timeout
             let timeout_to_use = if let Some(dl) = deadline {
                 // ... timeout calculation ...
             };

-            // Use sock_select directly with calculated timeout
-            let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
-            let socket = py_socket
-                .sock()
-                .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
             let timed_out = sock_select(&socket, SelectKind::Write, timeout_to_use)
                 .map_err(|e| vm.new_os_error(format!("select failed: {e}")))?;

2920-2937: Consider adding a safety limit to prevent potential infinite loops

If sock_send repeatedly returns sent == 0 without raising BlockingIOError (unusual but possible edge case), the loop could spin indefinitely. While the comment acknowledges this, consider adding a safety counter.

🔎 Suggested defensive improvement
     pub(crate) fn blocking_flush_all_pending(&self, vm: &VirtualMachine) -> PyResult<()> {
         let timeout = self.get_socket_timeout(vm)?;
+        let mut zero_send_count = 0;
+        const MAX_ZERO_SENDS: usize = 100;

         loop {
             // ... existing code ...

             match self.sock_send(&pending_data, vm) {
                 Ok(result) => {
                     let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0);
                     if sent > 0 {
+                        zero_send_count = 0;
                         let mut pending = self.pending_tls_output.lock();
                         pending.drain(..sent);
-                    }
-                    // If sent == 0, socket wasn't ready despite select() saying so
-                    // Continue loop to retry - this avoids infinite loops
+                    } else {
+                        zero_send_count += 1;
+                        if zero_send_count >= MAX_ZERO_SENDS {
+                            return Err(vm.new_os_error(
+                                "Socket repeatedly returned 0 bytes sent".to_string()
+                            ));
+                        }
+                    }
                 }
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 956cbfd and b0859ae.

⛔ Files ignored due to path filters (52)
  • Lib/_opcode_metadata.py is excluded by !Lib/**
  • Lib/importlib/_bootstrap_external.py is excluded by !Lib/**
  • Lib/pkgutil.py is excluded by !Lib/**
  • Lib/platform.py is excluded by !Lib/**
  • Lib/plistlib.py is excluded by !Lib/**
  • Lib/tempfile.py is excluded by !Lib/**
  • Lib/test/_test_multiprocessing.py is excluded by !Lib/**
  • Lib/test/test__opcode.py is excluded by !Lib/**
  • Lib/test/test_asyncgen.py is excluded by !Lib/**
  • Lib/test/test_cmd_line_script.py is excluded by !Lib/**
  • Lib/test/test_compile.py is excluded by !Lib/**
  • Lib/test/test_contextlib.py is excluded by !Lib/**
  • Lib/test/test_contextlib_async.py is excluded by !Lib/**
  • Lib/test/test_dis.py is excluded by !Lib/**
  • Lib/test/test_exceptions.py is excluded by !Lib/**
  • Lib/test/test_ftplib.py is excluded by !Lib/**
  • Lib/test/test_generators.py is excluded by !Lib/**
  • Lib/test/test_import/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/circular_imports/import_cycle.py is excluded by !Lib/**
  • Lib/test/test_import/data/circular_imports/singlephase.py is excluded by !Lib/**
  • Lib/test/test_import/data/double_const.py is excluded by !Lib/**
  • Lib/test/test_import/data/package3/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/package3/submodule.py is excluded by !Lib/**
  • Lib/test/test_import/data/package4/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/package4/submodule.py is excluded by !Lib/**
  • Lib/test/test_importlib/test_util.py is excluded by !Lib/**
  • Lib/test/test_patma.py is excluded by !Lib/**
  • Lib/test/test_pkgutil.py is excluded by !Lib/**
  • Lib/test/test_platform.py is excluded by !Lib/**
  • Lib/test/test_plistlib.py is excluded by !Lib/**
  • Lib/test/test_subprocess.py is excluded by !Lib/**
  • Lib/test/test_tempfile.py is excluded by !Lib/**
  • Lib/test/test_threading.py is excluded by !Lib/**
  • Lib/test/test_traceback.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_async_case.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_discovery.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_loader.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_program.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_runner.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testasync.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testhelpers.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testpatch.py is excluded by !Lib/**
  • Lib/test/test_weakref.py is excluded by !Lib/**
  • Lib/threading.py is excluded by !Lib/**
  • Lib/token.py is excluded by !Lib/**
  • Lib/unittest/__init__.py is excluded by !Lib/**
  • Lib/unittest/async_case.py is excluded by !Lib/**
  • Lib/unittest/loader.py is excluded by !Lib/**
  • Lib/unittest/main.py is excluded by !Lib/**
  • Lib/weakref.py is excluded by !Lib/**
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snap is excluded by !**/*.snap
  • crates/codegen/src/snapshots/rustpython_compiler_core__compile__tests__nested_double_async_with.snap is excluded by !**/*.snap
📒 Files selected for processing (33)
  • .cspell.dict/cpython.txt
  • .vscode/launch.json
  • .vscode/tasks.json
  • README.md
  • crates/codegen/src/compile.rs
  • crates/codegen/src/error.rs
  • crates/codegen/src/ir.rs
  • crates/codegen/src/symboltable.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/lib.rs
  • crates/compiler-core/src/varint.rs
  • crates/compiler/src/lib.rs
  • crates/jit/src/instructions.rs
  • crates/sre_engine/src/constants.rs
  • crates/stdlib/src/opcode.rs
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
  • crates/vm/src/builtins/asyncgenerator.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/coroutine.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/import.rs
  • crates/vm/src/stdlib/ast.rs
  • crates/vm/src/version.rs
  • crates/vm/src/vm/compile.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/python_run.rs
  • crates/vm/src/vm/vm_new.rs
  • scripts/generate_opcode_metadata.py
  • scripts/generate_sre_constants.py
  • src/lib.rs
💤 Files with no reviewable changes (2)
  • .vscode/launch.json
  • .vscode/tasks.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/generate_sre_constants.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/builtins/type.rs
  • crates/vm/src/coroutine.rs
  • crates/codegen/src/error.rs
  • crates/vm/src/import.rs
  • crates/compiler/src/lib.rs
  • crates/jit/src/instructions.rs
  • crates/compiler-core/src/lib.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/builtins/code.rs
  • crates/codegen/src/ir.rs
  • crates/vm/src/builtins/asyncgenerator.rs
  • crates/stdlib/src/opcode.rs
  • crates/vm/src/vm/vm_new.rs
  • crates/stdlib/src/ssl/compat.rs
  • crates/vm/src/frame.rs
  • crates/codegen/src/symboltable.rs
  • crates/vm/src/version.rs
  • crates/compiler-core/src/varint.rs
  • crates/vm/src/stdlib/ast.rs
  • crates/vm/src/vm/compile.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/vm/mod.rs
  • crates/stdlib/src/ssl.rs
  • crates/vm/src/vm/python_run.rs
  • src/lib.rs
  • crates/sre_engine/src/constants.rs
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_opcode_metadata.py
🧠 Learnings (5)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.

Applied to files:

  • crates/vm/src/frame.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/vm/mod.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When testing Python code, always use RustPython instead of the standard `python` command; use `cargo run -- script.py` or `cargo run` for interactive REPL

Applied to files:

  • README.md
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/python_run.rs
  • src/lib.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: RustPython is a Python 3 interpreter written in Rust implementing Python 3.13.0+ compatibility with complete Python environment in Rust (not CPython bindings), clean implementation without compatibility hacks, and cross-platform support including WebAssembly compilation

Applied to files:

  • README.md
  • crates/vm/src/vm/mod.rs
  • src/lib.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`

Applied to files:

  • crates/vm/src/vm/compile.rs
  • crates/vm/src/vm/mod.rs
  • scripts/generate_opcode_metadata.py
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/vm/src/vm/compile.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/python_run.rs
  • src/lib.rs
🧬 Code graph analysis (8)
crates/vm/src/builtins/code.rs (1)
crates/vm/src/import.rs (1)
  • check_pyc_magic_number_bytes (11-13)
crates/codegen/src/ir.rs (2)
crates/compiler-core/src/bytecode.rs (4)
  • encode_exception_table (47-59)
  • new (34-42)
  • new (540-542)
  • new (1150-1152)
crates/compiler-core/src/varint.rs (2)
  • write_signed_varint (22-30)
  • write_varint (9-17)
crates/vm/src/builtins/asyncgenerator.rs (1)
crates/vm/src/types/slot.rs (1)
  • del (1579-1579)
crates/stdlib/src/opcode.rs (1)
crates/compiler-core/src/bytecode.rs (3)
  • try_from (1131-1137)
  • try_from (1158-1163)
  • try_from (1172-1178)
crates/compiler-core/src/varint.rs (1)
crates/vm/src/builtins/code.rs (1)
  • read_varint (1116-1134)
crates/compiler-core/src/bytecode.rs (1)
crates/compiler-core/src/varint.rs (4)
  • read_varint (74-98)
  • read_varint_with_start (45-70)
  • write_varint (9-17)
  • write_varint_with_start (34-41)
crates/stdlib/src/ssl.rs (2)
crates/stdlib/src/socket.rs (13)
  • std (1252-1252)
  • std (1315-1315)
  • std (1415-1415)
  • std (1837-1837)
  • std (1843-1843)
  • std (1877-1877)
  • std (1883-1883)
  • std (2354-2354)
  • std (2397-2397)
  • std (2434-2434)
  • timeout_error_msg (3296-3298)
  • sock_select (2720-2776)
  • timeout (835-837)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
crates/vm/src/vm/python_run.rs (1)
crates/vm/src/import.rs (1)
  • check_pyc_magic_number_bytes (11-13)
🪛 markdownlint-cli2 (0.18.1)
README.md

74-74: Dollar signs used before commands without showing output

(MD014, commands-show-output)


75-75: Dollar signs used before commands without showing output

(MD014, commands-show-output)


76-76: Dollar signs used before commands without showing output

(MD014, commands-show-output)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check Rust code with clippy
🔇 Additional comments (58)
.cspell.dict/cpython.txt (1)

17-17: LGTM!

The addition of CLASSDEREF is correctly placed in alphabetical order within the spell-check dictionary. This entry aligns with the broader PR changes that expand the bytecode instruction set.

crates/jit/src/instructions.rs (2)

175-189: LGTM! Proper terminator checking before fallthrough jump.

The logic correctly checks whether the current block needs a terminator before inserting a fallthrough jump, preventing duplicate terminators that would cause Cranelift compilation errors.


208-216: LGTM! Correct unreachable code tracking for unconditional jumps.

Adding Jump to the unreachable code detection is semantically correct—after an unconditional jump, subsequent instructions are unreachable until a branch target label re-establishes reachability.

crates/vm/src/builtins/type.rs (1)

2001-2021: LGTM! Well-implemented fallback for non-PyType metaclasses.

The enhancement correctly handles metaclass resolution when fast_issubclass is inconclusive by falling back to the general is_subclass method. The logic properly:

  • Preserves the fast path optimization for common PyType metaclasses
  • Handles both directions of subclass checking
  • Propagates errors appropriately with the ? operator
  • Maintains the same control flow as the fast path

The comment at line 2010 clearly explains the purpose of the fallback.

Note: The PR title mentions "Updated the re library + test", but this file contains metaclass resolution changes for type.rs. This may indicate a title/description mismatch.

crates/vm/src/import.rs (1)

11-13: LGTM!

The implementation correctly validates the pyc magic number prefix. Using starts_with with the first 2 bytes is appropriate since PYC_MAGIC_NUMBER is a u16.

crates/vm/src/builtins/code.rs (1)

1045-1064: LGTM: Variable name resolution correctly extended.

The refactored logic now properly resolves indices across varnames, cellvars, and freevars with correct boundary checks and index adjustments.

crates/sre_engine/src/constants.rs (2)

17-64: SreOpcode enum looks complete and well-structured.

The enum includes all expected SRE opcodes including newer additions like ATOMIC_GROUP, POSSESSIVE_REPEAT, POSSESSIVE_REPEAT_ONE, and the various _UNI_IGNORE variants that correspond to Python 3.11+ regex features.


66-106: SreAtCode and SreCatCode enums are correctly defined.

Both enums cover the full set of anchor codes and category codes expected by the SRE engine. The sequential numbering and naming conventions match CPython's _constants.py.

crates/vm/src/frame.rs (9)

427-462: yield_from_target heuristic matches generator state layout and looks correct.

The logic to detect a suspended yield from / await:

  • Guards on non‑empty stack.
  • Uses lasti as the index of the next instruction (RESUME or Send) and checks the previous instruction for YieldValue with arg >= 1.
  • Returns the stack top as delegate only in that configuration.

That matches how this interpreter increments lasti in run and how the new YIELD_VALUE oparg is used; it should safely return None for unstarted frames or direct yields.


464-532: gen_throw integration with CLEANUP_THROW and exception tables is well‑structured but stack layout is brittle.

The new gen_throw behavior for delegated throw:

  • Correctly prefers Coro::throw for built‑in coroutines/generators and falls back to a throw attribute.
  • On failure, pushing None (so [receiver] -> [receiver, None]) and then calling unwind_blocks with Raising { exception } lines up with the comments and the CLEANUP_THROW handler’s expectation of [sub_iter, last_sent_val, exc] once the exception is pushed.
  • The direct “no delegate” path builds and contextualizes an exception, adds a traceback frame at the yield site, pushes None to establish [receiver, None], and then reuses unwind_blocks as for normal raises.

Given how tightly this relies on the compiler’s chosen handler depths and on the exception table’s push_lasti settings, I’d strongly suggest making sure there are tests that cover:

  • throw into plain generators, async generators, and yield from chains.
  • Delegates that implement and don’t implement .throw.
  • Interactions with surrounding try/finally cleanups.

If those pass, this path looks solid.


876-893: EndAsyncFor semantics align with CPython’s StopAsyncIteration handling.

Popping (exc, awaitable) and:

  • Treating StopAsyncIteration (by fast_isinstance) as the normal termination of an async for loop and clearing the VM’s current exception.
  • Re‑raising any other exception.

matches CPython’s behavior. The explicit downcast to PyBaseException will panic only if the compiler violates the expected stack layout, which is acceptable here.


969-999: GetAwaitable’s “already awaited” guard is a good safety check.

The new branch:

  • Detects PyCoroutine values and calls frame().yield_from_target() to approximate _PyGen_yf, raising RuntimeError("coroutine is being awaited already") if it’s non‑None.
  • Otherwise uses __await__ and validates that it returns an iterator.

This should prevent double‑awaiting the same coroutine while remaining compatible with custom awaitables. The only thing to watch is that yield_from_target correctly reflects all states in which a coroutine is already being awaited (i.e., that it’s never None when the coroutine is suspended in an await chain). If you have edge‑case tests around double‑await of coroutines, it’d be worth confirming they now pass and behave like CPython.


1164-1173: Comprehension opcodes (LoadFastAndClear / StoreFastLoadFast) look correct; rely on compiler invariants.

  • LoadFastAndClear does a take() on the fastlocal slot and pushes None when it’s empty, which matches inlined comprehension semantics where the temp may or may not have been initialized.
  • StoreFastLoadFast stores to one index and then immediately loads another (often the same) under a single lock, assuming the load slot is definitely populated afterward (expect).

Both rely on the compiler never emitting these opcodes in contexts where the slots are uninitialized in ways Python would treat as an error. As long as that invariant holds in the codegen, this is fine; otherwise, expect will panic. No changes needed, but it’s worth keeping an eye on codegen evolution if new patterns start using these instructions.

Also applies to: 1603-1617


1429-1521: New exception opcodes (PopBlock, PopException, PushExcInfo, CheckExcMatch, Reraise, WithExceptStart) match CPython’s 3.11+ model.

Highlights:

  • Treating PopBlock as a pseudo‑instruction is consistent with zero‑cost exception tables: the real cleanup is driven by the exception table, not the block stack.
  • PushExcInfo and PopException manage (prev_exc, exc) on the value stack while updating VM exception state, which is in line with CPython’s PUSH_EXC_INFO / POP_EXCEPT behavior.
  • CheckExcMatch simply returns a bool on the stack; type validation is done earlier in opcodes like JumpIfNotExcMatch, which is fine.
  • Reraise { depth } pops the given depth plus the exception and then re‑errors with either the exception or the current active exception, which is what you want for structured re‑raise sites.
  • WithExceptStart correctly locates __exit__ at stack_len - 4 given the documented stack layout and passes (type, value, tb) from vm.current_exception() into it.

Overall this cluster looks consistent with CPython’s new exception stack protocol. The main risk is divergence between assumed and actual stack layouts if future codegen tweaks these opcodes; adding targeted tests around nested with + except and except* would help lock this down.

Also applies to: 1532-1558, 1663-1682


1683-1764: YieldValue, Send, EndSend, and CleanupThrow implement PEP 380/PEP 525 semantics correctly.

  • YieldValue uses its oparg to distinguish plain yields (arg 0, wrapping in PyAsyncGenWrappedValue for coroutine code) from yield from/await (arg ≥ 1, no wrapping), which is what CPython does.
  • Send keeps (receiver, retval) on the stack after a normal send and, on StopIteration, replaces the top value with the stop value and jumps to the target label—matching the control‑flow pattern around YIELD_FROM.
  • EndSend removes the receiver and leaves only the value, as expected at the end of a send chain.
  • CleanupThrow’s treatment of StopIteration vs other exceptions (extracting the value and producing (None, value) vs re‑raising) matches CPython’s CLEANUP_THROW behavior, given your exception table comments.

These opcodes are subtle, but the stack comments match the code and the surrounding gen_throw logic. As long as the compiler’s exception table depths align with the documented stacks, they should behave correctly.


2808-2811: Frame’s Debug output is clearer now.

Including both Stack and Locals in the formatted string is useful when diagnosing fatal stack issues (e.g., underflows/overflows). No issues here.


1921-2005: Break/continue semantics with finally/with blocks are correctly implemented at compile time.

For Break/Continue opcodes inside try/finally or with blocks, the compiler (compile_break_continue) emits all required cleanup code before the Break/Continue instruction itself:

  • Finally bodies are compiled inline before the jump
  • With-block exit handlers are emitted before the jump
  • For-loop iterators are popped before the jump (for break only)

This compile-time approach ensures all cleanup executes before control transfers, correctly matching CPython's semantics where a finally block that executes break/continue will suppress any pending exception. The unwind_blocks function correctly handles only Break/Continue at runtime with a direct jump (no exception table lookup needed), since no exception can be raised by the jump itself—all exception handling paths are covered by exception table entries pointing to handler blocks, not by the Break/Continue instructions.

crates/vm/src/coroutine.rs (2)

71-107: Generator exception‑state swapping in run_with_context is a solid improvement; verify with_frame’s behavior matches assumptions.

The new logic:

  • Captures the saved generator exception (gen_exc) from self.exception.
  • Inside with_frame, pops the temporary None that with_frame pushed, then pushes gen_exc, runs the frame, and stores vm.current_exception() back into self.exception before with_frame pops.
  • Restores running afterwards as before.

This ensures each generator/coroutine resumes with its own exception context and that the caller’s exception chain is restored on exit, which should fix subtle traceback/context issues around send/throw. Just make sure with_frame really does push_exception(None) before invoking the closure and pop_exception() after, as assumed in the comments; otherwise the manual pop/push here could unbalance the exception stack.


170-178: close early‑exit for unstarted generators matches CPython semantics.

Short‑circuiting close() when frame.lasti() == 0 (never started):

  • Avoids injecting a GeneratorExit into a frame that has never run.
  • Still marks the coroutine/generator as closed.

That’s in line with CPython’s behavior and avoids unnecessary work or surprising side effects.

crates/compiler-core/src/lib.rs (1)

6-11: Publicly exposing varint is appropriate.

Adding pub mod varint; cleanly surfaces the new varint utilities to downstream crates (e.g., bytecode/exception‑table encoders) without affecting existing exports.

crates/vm/src/stdlib/ast.rs (1)

272-283: Parse error location expansion to include end_location looks correct.

Transforming parse_error.location: TextRange into a PySourceRange via text_range_to_source_range, then setting:

  • location from range.start.to_source_location()
  • end_location from range.end.to_source_location()

gives the compiler/VM access to both start and end positions, consistent with how other layers now report syntax locations. Using "<unknown>" for source_path when invoked through this AST helper also seems fine, since callers typically treat this as a generic parse API rather than a file‑bound compiler entrypoint.

crates/vm/src/vm/vm_new.rs (2)

65-80: new_scope_with_main correctly wires __main__ into globals and sys.modules.

  • Builds a scope with builtins.
  • Creates a __main__ module sharing that scope’s globals.
  • Initializes __main__.__annotations__ to an empty dict.
  • Registers the module in sys.modules["__main__"].

That’s a clean, self‑contained replacement for ad‑hoc main‑module setup. The dict().set_item(...).expect(...) usage is fine here given the dict is a fresh built‑in mapping.


332-525: SyntaxError now includes end position metadata, improving diagnostics.

The addition:

  • Calls error.python_end_location() and, when present, sets end_lineno and end_offset on the SyntaxError instance alongside the existing lineno/offset and text/filename.

This aligns with the extended location tracking you wired through the parser/compiler and matches how CPython exposes end_lineno/end_offset. No issues.

crates/vm/src/exceptions.rs (1)

220-270: SyntaxError caret rendering now correctly respects multi‑line ranges.

The updated logic:

  • Reads lineno, end_lineno, offset, and end_offset.
  • Only draws the caret line when end_lineno is equal to lineno (or when end_lineno is missing), avoiding misleading carets for multi‑line errors.
  • Derives a default end_offset from offset when it’s absent/0/−1 and ensures at least one caret is shown.
  • Computes caret width as end_colno - colno over the stripped leading whitespace, preserving the previous visual alignment.

This is a nice improvement for range‑aware SyntaxErrors and remains backward‑compatible for the older single‑point cases.

crates/codegen/src/error.rs (1)

79-81: LGTM!

The new InvalidAsyncFor, InvalidAsyncWith, and InvalidAsyncComprehension error variants follow the existing patterns and provide appropriate error messages consistent with CPython behavior.

Also applies to: 119-126

crates/compiler/src/lib.rs (1)

87-95: LGTM!

The python_end_location accessor correctly returns the end location for parse errors and None for codegen errors, matching the pattern of the existing python_location method.

src/lib.rs (2)

150-171: LGTM - Good implementation of run_file.

The function correctly mirrors CPython's pymain_run_file_obj behavior:

  • Checks for package directories via get_importer
  • Uses runpy._run_module_as_main for package execution
  • Properly handles sys.path[0] insertion based on safe_path setting

354-361: LGTM - Test updated to use new APIs.

The test correctly uses vm.new_scope_with_main() and exercises both direct file execution and directory package execution via run_file.

crates/vm/src/vm/compile.rs (1)

1-30: LGTM - Clean separation of concerns.

Good refactoring to separate compilation from execution. The module now has a single, clear responsibility, and the documentation properly directs users to python_run.rs for execution functions.

crates/vm/src/builtins/asyncgenerator.rs (2)

35-35: LGTM - Correct Destructor implementation for async generators.

The Destructor implementation properly follows CPython's _PyGen_Finalize pattern:

  • Skips finalization if the generator is already closed
  • Calls the finalizer hook through call_finalizer which ignores errors (matching PyErr_WriteUnraisable behavior)

Also applies to: 647-658


76-86: LGTM - Finalizer error handling is correct.

The call_finalizer method correctly ignores errors from the finalizer callback, matching CPython's behavior of using PyErr_WriteUnraisable to silently discard exceptions during finalization.

crates/compiler-core/src/varint.rs (3)

9-17: LGTM - Correct varint encoding implementation.

The 6-bit chunk encoding with 0x40 continuation bit follows the standard pattern used for Python bytecode exception tables.


22-30: LGTM - Correct zigzag encoding for signed integers.

The encoding correctly handles i32::MIN using wrapping_sub to avoid undefined behavior, and the zigzag transformation ((0 - val) << 1) | 1 for negatives / (val << 1) for positives is standard.


74-98: The 0x80 check difference is intentional by design. The read_varint in compiler-core/src/varint.rs is used for exception tables where entries are delimited by 0x80 markers (set by write_varint_with_start), while the read_varint in code.rs is for line tables which use linear byte streams without entry delimiters. The 0x80 check in varint.rs at line 87 is a defensive mechanism that breaks before consuming a byte with 0x80 set (when shift > 0), allowing the next read_varint_with_start call to correctly identify and read the next entry. These implementations serve different formats and are not interchangeable, but this is expected and safe by design. Adding documentation to clarify the purpose of each read_varint variant would reduce confusion.

scripts/generate_opcode_metadata.py (1)

225-226: No action required—HAVE_ARGUMENT = 44 is correctly positioned.

The constant correctly identifies the boundary between no-argument instructions (opcodes 0–43) and with-argument instructions (44+) in RustPython's Instruction enum. Position 44 corresponds to WithExceptStart, which matches the enum structure and is documented consistently across the bytecode definition, stdlib opcode module, and the generation script's own verification checks.

crates/vm/src/vm/mod.rs (1)

607-618: Verify new per‑frame exception stack isolation doesn’t regress sys.exc_info/context behavior

with_frame now unconditionally does:

  • push_exception(None) before running the frame, and
  • pop_exception() afterwards, discarding whatever set_exception stored in the top ExceptionStack entry during execution.

Because the actual error is propagated via the PyResult return, this is fine as long as:

  • No code relies on current_exception() (or topmost_exception()) still containing the just‑raised exception after run_frame/with_frame returns, and
  • All uses of push_exception/pop_exception around try/except blocks remain balanced with this new implicit per‑frame layer.

Given contextualize_exception and topmost_exception now walk the entire ExceptionStack and include cycle detection, the internal behavior looks sound, but I recommend double‑checking any callers that previously used take_exception/current_exception for sys.exc_info() or logging to ensure they still see the right value at the right time.

Also applies to: 908-920, 932-955

crates/compiler-core/src/bytecode.rs (5)

17-85: LGTM! Exception table implementation is solid.

The encoding and decoding logic correctly implements the 6-bit varint format with start markers. The use of saturating_sub for size calculation prevents underflow, and the handler lookup correctly uses half-open interval semantics [start, end).


614-629: Excellent documentation on RaiseKind variants.

The detailed comments explaining the mapping between each variant and Python syntax/CPython opcodes are very helpful. This makes the distinction between BareRaise (getting exception from VM state) and ReraiseFromStack (getting from stack) clear.


1922-2090: Stack effect implementation looks correct.

The updated stack_effect method properly handles:

  • Different RaiseKind variants with appropriate stack effects
  • Generator/async operations (Send, EndSend, CleanupThrow)
  • Defensive handling in BuildSlice using try_get with a sensible fallback

The comprehensive coverage and inline comments make the logic easy to follow.


2367-2436: Good test coverage for exception table functionality.

The test suite covers key scenarios:

  • Multiple entries with different configurations
  • Boundary conditions (inclusive start, exclusive end)
  • Edge cases (empty table, single entry)
  • Multiple lookups at various offsets

This provides solid confidence in the encoding/decoding implementation.


678-1111: The runtime guards and bytecode validation are actually in place. The VM explicitly calls unreachable!() in frame.rs for all placeholder instructions, and bytecode deserialization validates opcodes against LAST_INSTRUCTION. Additionally, codegen never emits these instructions.

However, there is an accuracy issue: GetYieldFromIter is marked as "not implemented, placeholder" in the bytecode enum comment, but the implementation at frame.rs:1008 shows it's fully implemented with logic to handle coroutines, generators, and other iterables. This documentation/label is misleading.

Likely an incorrect or invalid review comment.

crates/stdlib/src/opcode.rs (1)

165-171: Correct alignment with exception table design.

The updated has_exc now returns false for all opcodes with a clear explanation that exception handling is managed via the exception table introduced in bytecode.rs. This aligns with the Python 3.11+ zero-cost exception handling model.

crates/vm/src/vm/python_run.rs (3)

18-31: LGTM! Clean delegation pattern.

The run_any_filerun_simple_filewith_simple_run flow is clear and handles the edge case of empty paths appropriately.


72-102: Well-designed public API with proper error context.

The execution methods provide clear entry points for different use cases. Good practices observed:

  • Syntax errors include source context for better error messages
  • Deprecated run_code_string provides migration guidance
  • run_block_expr uses specialized Mode::BlockExpr for embedded expressions

126-151: Solid pyc detection with proper fallback logic.

The two-tier detection strategy is well-designed:

  1. Fast path: check .pyc extension
  2. Fallback: read magic number (2 bytes only, per CPython compatibility)

The comment about text mode (lines 147-149) is a helpful CPython compatibility note. The file is properly closed via RAII when the file variable goes out of scope.

crates/stdlib/src/ssl.rs (13)

2342-2347: LGTM - Well-documented buffering field

The pending_tls_output field with PyMutex<Vec<u8>> is appropriate for thread-safe buffering of unsent TLS bytes. The documentation clearly explains its purpose for preventing data loss in non-blocking scenarios.


2831-2888: LGTM - Correct handling of partial sends

The send_tls_output method correctly handles partial sends by saving unsent bytes to the pending_tls_output buffer. The error handling preserves data integrity in all failure modes (timeout, blocking IO error, other errors).


2727-2743: LGTM - Cleaner API with byte slice parameter

The signature change from PyObjectRef to &[u8] simplifies the internal API. Callers now pass raw bytes directly, and the method handles conversion to Python objects internally. This is more idiomatic for Rust code.


3593-3640: LGTM - Critical TLS 1.3 handshake fix

The flush operations before writing new data and after write_all are critical for correct TLS 1.3 behavior. This ensures the Finished message reaches the server before application data, preventing protocol violations. The integration with send_tls_output correctly handles partial sends.


3962-3986: LGTM - Critical data ordering fix in shutdown

The flush sequence before send_close_notify() is essential for TLS correctness:

  1. First drains rustls internal buffer to TLS records
  2. Then flushes pending_tls_output to socket
  3. Only then sends close_notify

This prevents data loss where the peer receives close_notify before all application data. The BIO vs socket mode distinction is correctly handled.


4040-4078: LGTM - Correct CPython/OpenSSL compatibility

The timeout handling correctly mirrors CPython/OpenSSL behavior by returning after sending close_notify. The tiered flush strategy (best-effort for non-blocking, timeout-respecting for timeout mode, blocking for blocking mode) is appropriate. The comment explaining the asyncore deadlock issue is valuable for future maintainers.


4103-4127: LGTM - Correct TLS record ordering

The flush_pending_tls_output call before draining new TLS data ensures proper ordering of TLS records. This prevents scenarios where new data could bypass previously buffered data.


4130-4178: LGTM - Improved EOF detection logic

The enhanced EOF detection distinguishes between "no data available yet" (BIO mode without EOF flag) and "peer closed connection" (socket EOF or BIO with EOF flag). The error handling that treats connection reset as peer closed is correct for shutdown scenarios.


3432-3436: LGTM - OpenSSL-compatible auto-handshake

Auto-completing the handshake on first read() matches OpenSSL's SSL_read() behavior. This is the expected behavior for applications that don't explicitly call do_handshake().


3564-3568: LGTM - Consistent with read method

The auto-handshake on write() is consistent with the read() implementation and matches OpenSSL's SSL_write() behavior.


1905-1905: LGTM - Correct field initialization

The pending_tls_output is correctly initialized with an empty Vec wrapped in PyMutex.


1976-1976: LGTM - Consistent with _wrap_socket

The field initialization is consistent with the _wrap_socket constructor.


2711-2724: LGTM - Comment clarifications

The comment updates accurately reflect that flags are not supported in BIO mode.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (10)
crates/compiler/src/lib.rs (1)

55-61: Use saturating subtraction for defensive safety.

As noted in a previous review, the unchecked subtraction at line 57 relies on the implicit assumption that error.location.end() is non-zero when this branch executes. While the condition logic suggests this is safe, using saturating_sub makes the intent explicit and guards against edge cases.

🔎 Suggested fix
-            let prev_line_end = error.location.end() - ruff_text_size::TextSize::from(1);
+            let prev_line_end = error.location.end().saturating_sub(ruff_text_size::TextSize::from(1));
crates/sre_engine/src/constants.rs (1)

108-120: Fix indentation in bitflags! macro.

As noted in a previous review, line 109's #[derive(...)] attribute lacks proper indentation. Since this file is auto-generated, the fix should be applied to scripts/generate_sre_constants.py in the dump_bitflags function to emit consistent indentation matching rustfmt style.

crates/vm/src/builtins/code.rs (1)

339-352: Fix compilation error: OsStr doesn't have a display() method.

Line 341 calls .display() on OsStr (from file_stem()), but this method doesn't exist on OsStr. This will cause a compilation error.

🔎 Proposed fix
     pub fn from_pyc_path(path: &std::path::Path, vm: &VirtualMachine) -> PyResult<PyRef<Self>> {
         let name = match path.file_stem() {
-            Some(stem) => stem.display().to_string(),
+            Some(stem) => stem.to_string_lossy().to_string(),
             None => "".to_owned(),
         };
         let content = std::fs::read(path).map_err(|e| e.to_pyexception(vm))?;
src/lib.rs (1)

173-202: Potential issue: cached None importers not handled correctly.

In CPython's path_importer_cache, entries can be None to indicate that no importer was found for a path. The current implementation at lines 179-181 returns Some(importer) for any cached value, including None. This could cause run_file to incorrectly treat a path as having an importer when it doesn't.

🔎 Proposed fix
     if let Some(importer) = path_importer_cache.get_item_opt(path, vm)? {
+        // CPython caches None for paths with no importer
+        if vm.is_none(&importer) {
+            return Ok(None);
+        }
         return Ok(Some(importer));
     }
crates/codegen/src/symboltable.rs (1)

352-382: PEP 709 merge timing: symbols merged after nested scope analysis.

The merge of inlined comprehension symbols into the parent scope happens at lines 352-382, which is after analyze_symbol_table is called on all sub_tables (lines 344-346). This means nested functions/scopes are analyzed before comprehension bindings are visible in the parent, causing incorrect scope resolution for walrus operator targets referenced in nested functions.

crates/stdlib/src/opcode.rs (1)

51-52: Off-by-one in has_arg already flagged.

This issue was identified in a previous review. The condition should be opcode >= Self::HAVE_ARGUMENT to match CPython semantics where opcodes 44+ have arguments.

crates/stdlib/src/ssl/compat.rs (2)

1197-1263: Handshake completion flush can misbehave on non‑blocking sockets

handle_handshake_complete in socket mode:

  • Sends post‑handshake TLS data via send_all_bytes, breaking on SslError::WantWrite but still proceeding.
  • Then unconditionally calls socket.blocking_flush_all_pending(vm) for all non‑BIO sockets, mapping any Python exception (including timeouts) to SslError::Py.

On non‑blocking sockets (timeout == 0), this means:

  • blocking_flush_all_pending() uses sock_select(..., timeout=0) and treats “not immediately writable” as a timeout, which is then surfaced as a socket.timeout instead of a WANT_WRITE‑style condition.
  • WantWrite coming from the earlier send_all_bytes loop is not propagated; the handshake may be treated as fully complete or fail with timeout, rather than signaling “try again when writable”.

This diverges from OpenSSL/CPython’s non‑blocking handshake behavior where completion is gated on I/O readiness and signaled back via SSLWantReadError/SSLWantWriteError.

I’d suggest:

  • Teaching blocking_flush_all_pending to treat timeout == 0 as non‑blocking and to yield a WANT_WRITE‑style error instead of a timeout when the socket isn’t immediately writable; and/or
  • Special‑casing non‑blocking sockets in handle_handshake_complete to avoid a fully blocking (or pseudo‑blocking) flush, propagating a WANT_WRITE condition when pending TLS data cannot be flushed.

1006-1078: send_all_bytes buffering semantics look sound but sent == 0 is ambiguous

The helper correctly:

  • Flushes pending_tls_output first to preserve TLS record ordering.
  • Stashes any unsent suffix into pending_tls_output on BlockingIOError, timeouts, or other send errors, avoiding data loss.

However, treating sent == 0 from sock_send() as SslError::WantWrite conflates “would block” with a potential connection-closed condition. For Python’s socket.send(), a return of 0 on a non-empty buffer more typically indicates the peer has closed the connection rather than EAGAIN/EWOULDBLOCK, which should surface as an error.

Consider distinguishing a genuine “zero bytes written on non-empty buffer” as a hard EOF (mapped to an SslError::Eof/Syscall variant) instead of WantWrite, or at least tightening this path based on platform semantics or additional checks.

crates/stdlib/src/ssl.rs (2)

3960-4077: New shutdown flow improves ordering but still depends on blocking_flush_all_pending semantics

The updated shutdown():

  • Before sending close_notify, flushes all existing TLS records:
    • Drains rustls’ internal write buffer via conn.write_tls and send_tls_output.
    • Then flushes pending_tls_output:
      • Socket mode: blocking_flush_all_pending(vm)? (guarantees all application data is sent before close_notify).
      • BIO mode: non‑blocking flush_pending_tls_output.
  • After close_notify is queued, uses write_pending_tls() and, for real sockets, returns immediately once our close_notify + pending bytes are flushed according to the socket’s timeout mode, aligning more closely with OpenSSL’s “one‑sided” shutdown semantics.

Ordering and data‑loss prevention look good, but the behavior on non‑blocking sockets again hinges on blocking_flush_all_pending:

  • With timeout == 0, that helper can raise socket.timeout instead of an SSLWantWriteError when the socket simply isn’t writable yet, which may surprise callers expecting non‑blocking shutdown() semantics.

Given this shares root cause with the earlier blocking_flush_all_pending issue, I’d address that helper’s non‑blocking behavior first; once fixed, this shutdown path should behave as intended for all modes.


2890-2938: blocking_flush_all_pending may mis‑signal non‑blocking readiness as timeout

blocking_flush_all_pending:

  • Repeatedly clones pending_tls_output and uses sock_select with the socket’s timeout to wait for writability, then sends via sock_send, draining the buffer as bytes are accepted.
  • On timed_out == true, it raises a socket.timeout unconditionally; on BlockingIOError, it just loops again.

For blocking sockets with a real timeout this is fine, but on non‑blocking sockets (timeout == 0):

  • sock_select(..., timeout=Some(0)) will often report “timed out” immediately when the socket isn’t writable, and this is surfaced as a hard socket.timeout rather than a WANT_WRITE‑style condition.
  • This method is called from compat::handle_handshake_complete and shutdown(), so non‑blocking SSLSocket users may see socket.timeout during handshake completion or shutdown where CPython would typically raise SSLWantWriteError (or simply return after sending as much as possible).

Consider:

  • Treating timeout == Some(0) specially: either skipping sock_select entirely and relying on sock_send’s BlockingIOError to indicate non‑writability, or mapping timed_out in that case to an SSLWantWriteError rather than a timeout.
  • Keeping the existing behavior for genuine timeouts (timeout > 0).

This ties into the handshake‑completion concern in compat.rs.

🧹 Nitpick comments (9)
crates/jit/src/instructions.rs (1)

175-190: Improved control flow handling with conditional terminator emission.

The logic correctly checks whether the current block needs a terminator before emitting a fallthrough jump, preventing unnecessary jumps when blocks are already properly terminated. The implementation properly handles both empty blocks and checks the last instruction's opcode.

However, the terminator-checking logic (lines 180-186) is duplicated at lines 221-223 in the post-processing section.

Consider extracting terminator check into a helper method
+    fn block_needs_terminator(&self, block: Block) -> bool {
+        match self.builder.func.layout.last_inst(block) {
+            None => true, // Empty block needs terminator
+            Some(inst) => {
+                // Check if the last instruction is a terminator
+                !self.builder.func.dfg.insts[inst].opcode().is_terminator()
+            }
+        }
+    }
+
     pub fn compile<C: bytecode::Constant>(
         ...

Then use it at both locations:

-                    // Check if the block needs a terminator by examining the last instruction
-                    let needs_terminator = match self.builder.func.layout.last_inst(cur) {
-                        None => true, // Empty block needs terminator
-                        Some(inst) => {
-                            // Check if the last instruction is a terminator
-                            !self.builder.func.dfg.insts[inst].opcode().is_terminator()
-                        }
-                    };
+                    let needs_terminator = self.block_needs_terminator(cur);
scripts/generate_opcode_metadata.py (1)

224-227: Consider deriving HAVE_ARGUMENT from the enum rather than hardcoding.

The constants HAVE_ARGUMENT = 44 and MIN_INSTRUMENTED_OPCODE = 236 are hardcoded. If the Instruction enum order changes, these values will become incorrect. The verification printout at lines 233-246 helps catch this, but consider computing HAVE_ARGUMENT dynamically (e.g., finding the index of WITH_EXCEPT_START which is the first opcode that takes an argument in CPython 3.13).

scripts/generate_sre_constants.py (3)

49-49: Remove debug print statement.

This appears to be leftover debugging code that will produce noisy output when the script runs.

Proposed fix
 def dump_enum(d, enum_name, derives, strip_prefix=""):
     items = sorted(d)
-    print(f"items is {items}")
     content = [f"{derives}\n"]

142-185: Remove commented-out dead code.

This large block of commented-out code appears to be an older implementation that has been replaced. Keeping it adds noise and confusion. If needed for reference, it can be retrieved from version control history.


76-76: Non-idiomatic conditional append.

Using an expression statement with a ternary for side effects is unusual. Consider using a standard if statement for clarity.

Proposed fix
     content = ["bitflags! {\n"]
-    content.append(f"{derives}\n") if derives else None
+    if derives:
+        content.append(f"{derives}\n")
     content.append(f"    pub struct {struct_name}: {int_t} {{\n")
crates/stdlib/src/opcode.rs (1)

57-69: Redundant Instruction::try_from after is_valid check.

is_valid already performs Instruction::try_from(opcode as u8).is_ok(), so the subsequent try_from in has_const (and similar methods) will always succeed. Consider extracting the instruction once in is_valid or using try_from directly without the is_valid guard.

Possible simplification pattern
         pub fn has_const(opcode: i32) -> bool {
-            if !Self::is_valid(opcode) {
-                return false;
-            }
-            if let Ok(instr) = Instruction::try_from(opcode as u8) {
+            if let Ok(instr) = (0..=255).contains(&opcode)
+                .then(|| Instruction::try_from(opcode as u8).ok())
+                .flatten()
+            {
                 matches!(
                     instr,
                     Instruction::LoadConst { .. } | Instruction::ReturnConst { .. }
                 )
             } else {
                 false
             }
         }

Or simply remove the is_valid check since try_from already validates:

pub fn has_const(opcode: i32) -> bool {
    (0..=255).contains(&opcode)
        && Instruction::try_from(opcode as u8)
            .map(|instr| matches!(instr, Instruction::LoadConst { .. } | Instruction::ReturnConst { .. }))
            .unwrap_or(false)
}
crates/vm/src/frame.rs (1)

783-792: Good defensive check for CopyItem stack underflow.

Adding explicit bounds checking with diagnostic output will help debug issues during development. Consider using debug_assert! instead of runtime panic for release builds if this is a compiler invariant.

crates/vm/src/vm/python_run.rs (1)

139-145: Use read_exact for more robust magic number reading.

read() may return fewer bytes than requested even when more are available (though rare for regular files). Using read_exact makes the intent clearer and handles edge cases.

Proposed fix
     let mut file = std::fs::File::open(path)?;
     let mut buf = [0u8; 2];

     use std::io::Read;
-    if file.read(&mut buf)? != 2 {
-        return Ok(false);
+    if file.read_exact(&mut buf).is_err() {
+        return Ok(false);
     }
crates/stdlib/src/ssl.rs (1)

2746-2826: flush_pending_tls_output ordering and error mapping are mostly good; consider fast‑path for BIO

flush_pending_tls_output:

  • Enforces the “old data first” invariant by always emptying pending_tls_output before any new TLS records are sent.
  • Honors an optional deadline by shrinking the effective select timeout and returning a socket.timeout if the deadline is exceeded, while preserving unsent data in pending.
  • Differentiates blocking vs non‑blocking sockets via get_socket_timeout(), mapping non‑blocking “would block” conditions to SSLWantWriteError (create_ssl_want_write_error) and again leaving unsent bytes buffered.

Two smaller points:

  1. In BIO mode, get_socket_timeout() returns Ok(None), and pending is never populated (since MemoryBIO.write() always accepts full buffers). The early pending.is_empty() check ensures you currently bail out before trying to derive a PySocket, but it would be more robust and clearer to explicitly skip the whole sock_select/PySocket path when self.is_bio_mode() is true.

  2. For non‑blocking sockets (timeout == 0), a “timed_out” result from sock_select(..., timeout=0) is treated as a write timeout. Conceptually this is a “would block now” condition rather than a true timeout; if you want closer parity with CPython, you could map that to SSLWantWriteError as well, instead of a socket.timeout, while still preserving pending contents.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 956cbfd and b0859ae.

⛔ Files ignored due to path filters (52)
  • Lib/_opcode_metadata.py is excluded by !Lib/**
  • Lib/importlib/_bootstrap_external.py is excluded by !Lib/**
  • Lib/pkgutil.py is excluded by !Lib/**
  • Lib/platform.py is excluded by !Lib/**
  • Lib/plistlib.py is excluded by !Lib/**
  • Lib/tempfile.py is excluded by !Lib/**
  • Lib/test/_test_multiprocessing.py is excluded by !Lib/**
  • Lib/test/test__opcode.py is excluded by !Lib/**
  • Lib/test/test_asyncgen.py is excluded by !Lib/**
  • Lib/test/test_cmd_line_script.py is excluded by !Lib/**
  • Lib/test/test_compile.py is excluded by !Lib/**
  • Lib/test/test_contextlib.py is excluded by !Lib/**
  • Lib/test/test_contextlib_async.py is excluded by !Lib/**
  • Lib/test/test_dis.py is excluded by !Lib/**
  • Lib/test/test_exceptions.py is excluded by !Lib/**
  • Lib/test/test_ftplib.py is excluded by !Lib/**
  • Lib/test/test_generators.py is excluded by !Lib/**
  • Lib/test/test_import/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/circular_imports/import_cycle.py is excluded by !Lib/**
  • Lib/test/test_import/data/circular_imports/singlephase.py is excluded by !Lib/**
  • Lib/test/test_import/data/double_const.py is excluded by !Lib/**
  • Lib/test/test_import/data/package3/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/package3/submodule.py is excluded by !Lib/**
  • Lib/test/test_import/data/package4/__init__.py is excluded by !Lib/**
  • Lib/test/test_import/data/package4/submodule.py is excluded by !Lib/**
  • Lib/test/test_importlib/test_util.py is excluded by !Lib/**
  • Lib/test/test_patma.py is excluded by !Lib/**
  • Lib/test/test_pkgutil.py is excluded by !Lib/**
  • Lib/test/test_platform.py is excluded by !Lib/**
  • Lib/test/test_plistlib.py is excluded by !Lib/**
  • Lib/test/test_subprocess.py is excluded by !Lib/**
  • Lib/test/test_tempfile.py is excluded by !Lib/**
  • Lib/test/test_threading.py is excluded by !Lib/**
  • Lib/test/test_traceback.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_async_case.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_discovery.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_loader.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_program.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_runner.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testasync.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testhelpers.py is excluded by !Lib/**
  • Lib/test/test_unittest/testmock/testpatch.py is excluded by !Lib/**
  • Lib/test/test_weakref.py is excluded by !Lib/**
  • Lib/threading.py is excluded by !Lib/**
  • Lib/token.py is excluded by !Lib/**
  • Lib/unittest/__init__.py is excluded by !Lib/**
  • Lib/unittest/async_case.py is excluded by !Lib/**
  • Lib/unittest/loader.py is excluded by !Lib/**
  • Lib/unittest/main.py is excluded by !Lib/**
  • Lib/weakref.py is excluded by !Lib/**
  • crates/codegen/src/snapshots/rustpython_codegen__compile__tests__nested_double_async_with.snap is excluded by !**/*.snap
  • crates/codegen/src/snapshots/rustpython_compiler_core__compile__tests__nested_double_async_with.snap is excluded by !**/*.snap
📒 Files selected for processing (33)
  • .cspell.dict/cpython.txt
  • .vscode/launch.json
  • .vscode/tasks.json
  • README.md
  • crates/codegen/src/compile.rs
  • crates/codegen/src/error.rs
  • crates/codegen/src/ir.rs
  • crates/codegen/src/symboltable.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/compiler-core/src/lib.rs
  • crates/compiler-core/src/varint.rs
  • crates/compiler/src/lib.rs
  • crates/jit/src/instructions.rs
  • crates/sre_engine/src/constants.rs
  • crates/stdlib/src/opcode.rs
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
  • crates/vm/src/builtins/asyncgenerator.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/builtins/type.rs
  • crates/vm/src/coroutine.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/import.rs
  • crates/vm/src/stdlib/ast.rs
  • crates/vm/src/version.rs
  • crates/vm/src/vm/compile.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/python_run.rs
  • crates/vm/src/vm/vm_new.rs
  • scripts/generate_opcode_metadata.py
  • scripts/generate_sre_constants.py
  • src/lib.rs
💤 Files with no reviewable changes (2)
  • .vscode/tasks.json
  • .vscode/launch.json
✅ Files skipped from review due to trivial changes (2)
  • README.md
  • .cspell.dict/cpython.txt
🚧 Files skipped from review as they are similar to previous changes (4)
  • crates/codegen/src/error.rs
  • crates/vm/src/coroutine.rs
  • crates/compiler-core/src/lib.rs
  • crates/vm/src/version.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/import.rs
  • crates/jit/src/instructions.rs
  • crates/vm/src/vm/vm_new.rs
  • crates/vm/src/stdlib/ast.rs
  • crates/compiler/src/lib.rs
  • crates/codegen/src/ir.rs
  • crates/stdlib/src/opcode.rs
  • crates/vm/src/builtins/asyncgenerator.rs
  • crates/vm/src/builtins/code.rs
  • crates/vm/src/exceptions.rs
  • crates/vm/src/frame.rs
  • crates/vm/src/builtins/type.rs
  • crates/stdlib/src/ssl/compat.rs
  • crates/codegen/src/symboltable.rs
  • src/lib.rs
  • crates/compiler-core/src/varint.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/compile.rs
  • crates/sre_engine/src/constants.rs
  • crates/stdlib/src/ssl.rs
  • crates/vm/src/vm/python_run.rs
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_sre_constants.py
  • scripts/generate_opcode_metadata.py
🧠 Learnings (7)
📚 Learning: 2025-12-09T08:46:58.660Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6358
File: crates/vm/src/exception_group.rs:173-185
Timestamp: 2025-12-09T08:46:58.660Z
Learning: In crates/vm/src/exception_group.rs, the derive() method intentionally always creates a BaseExceptionGroup instance rather than preserving the original exception class type. This is a deliberate design decision that differs from CPython's behavior.

Applied to files:

  • crates/vm/src/frame.rs
  • crates/compiler-core/src/bytecode.rs
  • crates/vm/src/vm/mod.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • src/lib.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/compile.rs
  • crates/vm/src/vm/python_run.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When testing Python code, always use RustPython instead of the standard `python` command; use `cargo run -- script.py` or `cargo run` for interactive REPL

Applied to files:

  • src/lib.rs
  • crates/vm/src/vm/mod.rs
  • crates/vm/src/vm/python_run.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*.py : In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only

Applied to files:

  • src/lib.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: RustPython is a Python 3 interpreter written in Rust implementing Python 3.13.0+ compatibility with complete Python environment in Rust (not CPython bindings), clean implementation without compatibility hacks, and cross-platform support including WebAssembly compilation

Applied to files:

  • src/lib.rs
  • crates/vm/src/vm/mod.rs
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: When modifying bytecode instructions, perform a full clean build by running `rm -r target/debug/build/rustpython-* && find . | grep -E '\.pyc$' | xargs rm -r`

Applied to files:

  • scripts/generate_opcode_metadata.py
  • crates/vm/src/vm/mod.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/sre_engine/src/constants.rs
🔇 Additional comments (48)
crates/vm/src/builtins/type.rs (1)

2009-2021: I am unable to verify this review comment because the repository cannot be accessed in the current environment. The network connectivity issue prevents me from examining the code, verifying test coverage, checking clippy/fmt compliance, or comparing the implementation against CPython's behavior.

To properly verify this review comment, direct access to the codebase is required to:

  • Examine the full function implementation and verify the logic is correct
  • Search for existing test cases that exercise custom metaclass resolution
  • Run cargo clippy and cargo fmt to ensure compliance with Rust best practices
  • Compare the approach with CPython's _PyType_CalculateMetaclass implementation
crates/jit/src/instructions.rs (2)

208-216: Correct extension of unreachability tracking.

Adding Instruction::Jump to the unreachability tracking is correct—an unconditional jump means subsequent instructions are unreachable until the next label target. This change is consistent with the updated control flow handling.


219-228: Proper handling of unterminated blocks in post-processing.

The post-processing correctly checks if the final block needs a terminator and inserts a trap when necessary, ensuring all blocks are properly terminated. This complements the earlier label-handling improvements.

As noted in the previous comment, the terminator-checking logic here is identical to lines 180-186 and could benefit from extraction to a helper method.

crates/vm/src/builtins/asyncgenerator.rs (4)

11-11: LGTM! Destructor trait import added correctly.

The import is properly placed and necessary for the destructor implementation below.


35-35: LGTM! Destructor trait correctly registered with the class.

The trait is properly added to the with() clause, enabling the destructor implementation for async generators.


76-86: LGTM! Comment clarifications improve documentation.

The updated comments clearly document the function's behavior, especially the intentional error suppression that matches CPython's PyErr_WriteUnraisable semantics.


647-658: Implementation correctly follows PEP 525 async generator finalization semantics.

The destructor properly handles GC-time finalization by checking if the generator is already closed (preventing double-finalization) and calling the finalizer hook when needed. This matches PEP 525's requirement for async generators to trigger a finalizer during garbage collection, which typically schedules aclose() on the event loop.

Note: The claim about duplicate Destructor implementations could not be independently verified due to access constraints.

crates/vm/src/stdlib/ast.rs (1)

274-283: LGTM! Clean integration of end_location into parse error mapping.

The error mapping correctly computes both start and end locations from the text range and populates the new end_location field. This aligns with the broader PR goal of providing more precise error localization.

crates/vm/src/exceptions.rs (1)

226-269: LGTM! Well-structured caret highlighting for single-line syntax errors.

The logic correctly:

  1. Determines if the error spans a single line via same_line check
  2. Defaults to single-line behavior when line info is missing
  3. Ensures a minimum caret width of 1
  4. Skips caret rendering entirely for multi-line errors

This improves the user experience by providing accurate error highlighting without misleading carets on multi-line spans.

crates/compiler/src/lib.rs (1)

87-95: LGTM! Clean API addition for end location retrieval.

The python_end_location() method follows the same pattern as python_location() and appropriately returns Option to handle cases where end location isn't available (codegen errors).

crates/vm/src/vm/compile.rs (1)

1-30: LGTM! Clean separation of compilation from execution concerns.

The module now has a clear single responsibility with helpful documentation pointing to python_run.rs for execution functions. The trimmed imports reflect the narrowed scope.

crates/vm/src/vm/vm_new.rs (2)

65-80: LGTM with minor note on .expect() usage.

The new_scope_with_main() method cleanly encapsulates the __main__ module setup. The .expect() on line 71 is acceptable since it operates on a freshly created dict where set_item shouldn't fail. The subsequent operations correctly propagate errors via PyResult.


502-514: LGTM! Proper integration of end location attributes.

The conditional setting of end_lineno and end_offset correctly leverages the new python_end_location() API and follows the same pattern as the existing lineno/offset handling.

crates/vm/src/import.rs (1)

11-13: Verify intent: Only checking first 2 bytes of 4-byte PYC magic number.

According to the Python bytecode format standard, the PYC magic number consists of 4 bytes: a 2-byte version tag followed by 0x0D 0x0A (CR LF). This function only validates the first 2 bytes via &crate::version::PYC_MAGIC_NUMBER_BYTES[..2]. Confirm whether this is intentional (e.g., only validating the version tag while allowing flexibility on the CR LF bytes) or if the full 4-byte check is required. If intentional, add a comment explaining the design choice.

crates/vm/src/builtins/code.rs (2)

353-376: LGTM!

The from_pyc function correctly validates the magic number, extracts the code bytes after the 16-byte header, and delegates to the Python _compile_bytecode function. Error handling is appropriate.


1041-1066: LGTM!

The _varname_from_oparg method correctly resolves variable names by computing the appropriate index into varnames, cellvars, or freevars based on the oparg value. This aligns with CPython's variable indexing scheme.

src/lib.rs (2)

150-171: LGTM!

The run_file function correctly follows CPython's pymain_run_file_obj pattern: checking for package directories via get_importer, using runpy._run_module_as_main for packages, and properly handling sys.path[0] insertion based on the safe_path setting.


350-365: LGTM!

Tests are properly updated to use the new vm.new_scope_with_main() API and test both direct file execution via run_any_file and module/directory execution via run_file.

scripts/generate_opcode_metadata.py (1)

21-28: LGTM!

The variant extraction logic correctly skips comment lines and extracts PascalCase variant names. This is adequate for the expected format of the Instruction enum.

crates/compiler-core/src/varint.rs (2)

72-98: Verify read_varint semantics for entry boundary detection.

Line 87 checks byte & 0x80 != 0 && shift > 0 to detect the start of the next entry. This means on the first byte (shift == 0), an 0x80 marker won't trigger a break. This appears intentional for reading varints that don't have start markers, but if this function is called mid-stream where entries have start markers, the first byte's 0x80 bit would be consumed as part of the value rather than being recognized as a boundary.

Ensure this is the intended behavior for all call sites.


6-41: LGTM!

The varint encoding functions correctly implement 6-bit chunking with continuation bits. The zig-zag encoding for signed values properly handles i32::MIN using wrapping subtraction. The write_varint_with_start correctly sets the 0x80 marker on the first byte.

crates/codegen/src/symboltable.rs (1)

1503-1517: LGTM for the inlining logic structure.

The comprehension inlining decision based on is_generator and parent_can_see_class follows the general PEP 709 pattern. Generator expressions correctly pass is_generator = true (line 1281) while list/set/dict comprehensions pass is_generator = false.

crates/vm/src/vm/mod.rs (4)

463-497: LGTM!

The run_pyc_bytes method provides a clean API for executing in-memory .pyc bytecode, with good documentation including usage examples. The delegation to PyCode::from_pyc and with_simple_run is appropriate.


539-571: LGTM!

The with_simple_run helper correctly manages __file__ and __cached__ attributes on the main module, ensuring proper cleanup after execution. The IO flush ensures output is written before returning.


600-618: LGTM!

The frame-local exception isolation (push None before execution, pop after) correctly prevents exception state from leaking between function calls, aligning with Python's scoped exception semantics.


932-958: LGTM!

The Floyd's cycle detection in contextualize_exception correctly identifies pre-existing cycles in the exception context chain. The toggle-based slow pointer advancement (every other iteration) is a valid variant of the tortoise-and-hare algorithm that will detect cycles.

crates/codegen/src/ir.rs (4)

86-105: LGTM!

The ExceptHandlerInfo struct cleanly captures exception handler metadata (target block, stack depth, preserve_lasti flag), and integrating it into InstructionInfo enables per-instruction handler tracking for exception table generation.


332-343: LGTM!

The iteration limit safety guard is a good defensive measure. Returning InternalError::StackOverflow instead of silently breaking ensures that pathological cases don't result in underestimated stack depths that would cause runtime crashes.


390-418: LGTM!

The exception handler depth calculation correctly accounts for the stack state at handler entry: base stack_depth, plus 1 for the exception value, plus 1 for lasti if preserve_lasti is set. This depth represents the stack state before PUSH_EXC_INFO processes.


603-676: LGTM!

The exception table generation correctly implements a state machine that tracks handler regions, emitting entries when handlers change or end. The use of block_to_index properly resolves handler targets to instruction indices (including EXTENDED_ARG offsets).

crates/compiler-core/src/bytecode.rs (4)

17-43: Well-structured exception table entry.

The ExceptionTableEntry struct cleanly captures the CPython 3.11+ exception table format with appropriate field types and documentation.


45-85: Exception table encoding and lookup implementation looks correct.

The varint encoding follows CPython's format, and the lookup correctly handles the half-open interval [start, end). The early return on finding a matching handler is appropriate.


2366-2436: Good test coverage for exception table encoding/decoding.

The tests cover the essential cases: multiple entries, empty table, single entry, and boundary conditions (inclusive start, exclusive end).


994-998: Verify StoreFastLoadFast argument encoding.

This instruction has two Arg<NameIdx> fields, but CPython's STORE_FAST_LOAD_FAST packs both indices into a single oparg using INSTR_FMT_IB format (low byte for store_idx, high byte for load_idx). Ensure the argument decoding logic correctly unpacks both indices from the single operand, or clarify if this is handled differently.

crates/vm/src/frame.rs (6)

346-418: Exception handling refactor with is_reraise detection.

The logic correctly distinguishes between new exceptions and re-raises to avoid duplicate traceback entries. The handling of BareRaise and ReraiseFromStack as reraise operations aligns with CPython semantics.


1505-1520: Verify PushExcInfo sets exception before getting prev_exc.

The current implementation gets prev_exc from vm.current_exception() and then sets the new exception. This ordering is correct per CPython semantics. However, ensure the exception is set before the handler code runs (which happens after the jump in unwind_blocks).


1727-1763: CleanupThrow exception handling looks correct.

The logic properly handles StopIteration by extracting its value and cleaning up the stack, while re-raising other exceptions. The stack manipulation (popping exc, last_sent_val, sub_iter) matches the expected state after a failed throw.


1920-2006: Exception table-based unwinding implementation.

The unwind_blocks refactor correctly:

  1. Looks up handlers via find_exception_handler
  2. Pops stack to the handler's depth
  3. Pushes lasti if push_lasti is set (for cleanup handlers)
  4. Pushes exception and jumps to handler

The traceback cycle-breaking logic on return is a thoughtful addition for RustPython's non-GC environment.


2743-2758: Improved panic message for stack underflow.

The detailed diagnostic information (code name, lasti, op, source path) will be valuable for debugging compiler bugs. This is a good defensive improvement.


1603-1617: Defensive programming needed: StoreFastLoadFast unsafe expect() without validation.

The code unconditionally expects the load slot to contain a value after storing to a different slot, but there's no validation that load_idx is initialized. The comment indicates load and store indices "often" (not always) match. If store_idx != load_idx and load_idx was never assigned, the expect() will panic. Either verify that store_idx == load_idx always holds at the VM level and document this invariant, or handle the None case gracefully with an error.

crates/vm/src/vm/python_run.rs (1)

10-103: Well-structured Python execution API.

The module provides a clean interface for running Python code from various sources, with proper __main__ setup and loader configuration. The deprecation of run_code_string in favor of run_string is a good API cleanup.

crates/stdlib/src/ssl/compat.rs (1)

1457-1555: ssl_read integration with pending‑output buffering looks consistent

The new behavior in ssl_read:

  • Flushes pending_tls_output with a deadline before starting the read loop, ensuring prior writes reach the peer before waiting for a response.
  • When conn.wants_write() during a read (e.g. key update, post‑handshake messages), it generates TLS records via ssl_write_tls_records() and sends them with send_all_bytes, using the same deadline and treating WantWrite/Timeout from this flush as non‑fatal (data is buffered for later).

This matches the intended design: ordering is preserved, read timeouts are respected, and non‑blocking semantics are maintained by buffering rather than failing the read for incomplete writes. No changes requested here.

crates/stdlib/src/ssl.rs (6)

1887-1907: pending_tls_output wiring into PySSLSocket construction looks correct

Both _wrap_socket and _wrap_bio now initialize pending_tls_output with an empty Vec<u8>, and the field is added to PySSLSocket with clear documentation. The field is consistently marked pub(crate) and #[pytraverse(skip)], which is appropriate for an internal transport buffer and avoids GC traversal overhead.

No issues from a construction or ownership perspective.

Also applies to: 1957-1978, 2300-2354


2711-2744: sock_recv/sock_send abstractions are consistent with prior design

The updated sock_send(&self, data: &[u8], vm):

  • Correctly handles BIO mode by delegating to outgoing.write() and returning its result.
  • Uses socket.socket.send(self.sock, data) in socket mode, keeping the same duck‑typed “raw socket” approach as elsewhere in the module.

Given the new buffering logic depends on sock_send returning the number of bytes actually accepted, this interface is suitable; callers convert the Python return value to isize/usize and handle short writes and zero returns explicitly.

No changes requested here.


2831-2888: send_tls_output correctly feeds pending_tls_output on partial/blocked sends

send_tls_output:

  • Writes the provided TLS buffer in a loop, using sock_wait_for_io_impl(Write) to respect blocking vs timeout vs non‑blocking modes.
  • On timeout from sock_wait_for_io_impl, or BlockingIOError / zero bytes written / other send errors, it appends the unsent suffix into pending_tls_output and returns either a socket.timeout, SSLWantWriteError, or the underlying Python exception.

This matches the intended design: rustls’ internal TLS buffers are drained once into buf, and any network‑level backpressure is captured in pending_tls_output rather than silently dropping or re‑generating TLS records.

No changes requested here.


3432-3436: Auto‑completing the handshake in read is consistent with CPython

The change to read():

  • Ensures self.do_handshake(vm)? is called if handshake_done is still false, before any application data is read.
  • Then delegates to compat::ssl_read, mapping SslError variants to the right Python exceptions.

This matches OpenSSL/CPython behavior where SSL_read() implicitly completes the handshake. Given that do_handshake() already raises SSLWantReadError/SSLWantWriteError on non‑blocking sockets, the new behavior looks correct.

Also applies to: 3476-3477


3564-3568: write’s use of pending‑output flushing and chunked TLS writes is sensible

In write():

  • The handshake is auto‑completed if needed (self.do_handshake(vm)?).
  • For non‑BIO sockets, flush_pending_tls_output(vm, None)? is called before writing new application data, preserving TLS record order.
  • Data is written in 16KB chunks through conn.writer(), each followed by flush() and (for socket mode) a loop that:
    • Flushes any existing pending_tls_output.
    • Drains rustls’ TLS writer via conn.write_tls.
    • Sends resulting TLS bytes via send_tls_output, stashing unsent bytes into pending_tls_output.

This ensures:

  • Previously buffered TLS records are always pushed before new ones.
  • TLS internal buffers are not allowed to grow unbounded.
  • Network backpressure is handled via pending_tls_output rather than lost writes.

No issues from a correctness perspective.

Also applies to: 3592-3641


4103-4127: write_pending_tls and try_read_close_notify correctly integrate with pending buffer and BIO EOF

  • write_pending_tls now:

    • Flushes pending_tls_output before emitting new TLS records.
    • Uses send_tls_output so any partial sends are captured in pending_tls_output rather than dropped.
  • try_read_close_notify:

    • For BIO mode, distinguishes “no data yet” (empty read with eof == False) from actual EOF (empty read with eof == True), only treating the latter as peer close.
    • For sockets, treats empty reads and non‑blocking errors appropriately: BlockingIOError → “no data yet”; other errors/EOF → peer closed.
    • Feeds any non‑empty data back into rustls via read_tls/process_new_packets to allow it to observe close_notify.

These look correct and line up with the new shutdown logic’s expectations.

Also applies to: 4130-4177

@terryluan12
Copy link
Contributor Author

@terryluan12 This PR is including lots of unrelated previously merged commits. please use git rebase to clean them up and cut the own changes belong to this PR.

Whoops, sorry; accidentally rebased instead of merging from main. Should be good now

@youknowone
Copy link
Member

Either way okay, but by removing already merged commits, rebase usually makes the history usually cleaner.

main(*sys.argv[1:])


# dump(f, OPCODES, "SreOpcode", "u32", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these lines supposed to be deleted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... yep. They were deleted in a commit which I believe was lost to the ether on rebase lol

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! looks good . Please check the commented out part of the script.

Removed commented-out code related to generating SRE constants.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
scripts/generate_sre_constants.py (1)

49-49: Remove debug print statement.

This debug print should be removed before merging to production.

🔎 Proposed fix
-    print(f"items is {items}")
🧹 Nitpick comments (2)
scripts/generate_sre_constants.py (2)

76-76: Prefer standard if statement for side effects.

Using a conditional expression solely for its side effect is non-idiomatic Python. A standard if statement is clearer.

🔎 Proposed fix
-    content.append(f"{derives}\n") if derives else None
+    if derives:
+        content.append(f"{derives}\n")

86-133: Consider adding error handling for robustness.

The script lacks error handling around file operations and the exec() call. While the current implementation works for the controlled input file, adding defensive checks would provide clearer error messages and improve maintainability.

Consider:

  • Wrapping file operations in try-except blocks with informative error messages
  • Validating that required keys ('MAGIC', 'OPCODES', 'ATCODES', 'CHCODES') exist in ns after the exec() call
  • Catching and reporting exceptions from exec() with context about which file failed
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0859ae and 583e58b.

📒 Files selected for processing (1)
  • scripts/generate_sre_constants.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_sre_constants.py
🔇 Additional comments (2)
scripts/generate_sre_constants.py (2)

7-16: LGTM! Efficient file update pattern.

The function correctly avoids unnecessary file writes by comparing content first, which is good for build system performance.


55-55: The project's minimum Python version is documented in DEVELOPMENT.md as "CPython version 3.13 or higher" and enforced in CI workflows (using 3.13.1). The removeprefix() method (available in Python 3.9+) is fully compatible with this requirement.

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@youknowone youknowone merged commit 8977c39 into RustPython:main Jan 5, 2026
13 checks passed
terryluan12 added a commit to terryluan12/RustPython that referenced this pull request Jan 5, 2026
* Updated re library + test

* Copied over generate_sre_constants from cpython/Tools

* Customized `generate_sre_constants.py` + ran to update `constants.rs`

* Clarified `dump_enum` docstring in `generate_sre_constants.py`
@terryluan12 terryluan12 deleted the update_re branch January 5, 2026 19:06
terryluan12 added a commit to terryluan12/RustPython that referenced this pull request Jan 15, 2026
* Updated re library + test

* Copied over generate_sre_constants from cpython/Tools

* Customized `generate_sre_constants.py` + ran to update `constants.rs`

* Clarified `dump_enum` docstring in `generate_sre_constants.py`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants