Skip to content

Conversation

@terryluan12
Copy link
Contributor

@terryluan12 terryluan12 commented Jan 4, 2026

Also includes updates to sre_compile.py, sre_constants.py, sre_parse.py

Summary by CodeRabbit

  • Chores

    • Updated internal engine version identifier.
    • Removed a deprecated flag option from the engine's public flags.
    • Added an automated script to regenerate engine constants from source definitions.
  • User-visible

    • Pattern/regex formatting no longer displays the removed flag when showing compiled patterns.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 4, 2026

Important

Review skipped

Review was skipped due to path filters

⛔ Files ignored due to path filters (8)
  • Lib/test/test_zipfile/_path/test_path.py is excluded by !Lib/**
  • Lib/test/test_zipfile/test_core.py is excluded by !Lib/**
  • Lib/test/test_zipimport.py is excluded by !Lib/**
  • Lib/test/zipimport_data/sparse-zip64-c0-0x000000000.part is excluded by !Lib/**
  • Lib/test/zipimport_data/sparse-zip64-c0-0x100000000.part is excluded by !Lib/**
  • Lib/test/zipimport_data/sparse-zip64-c0-0x200000000.part is excluded by !Lib/**
  • Lib/zipfile/__init__.py is excluded by !Lib/**
  • Lib/zipimport.py is excluded by !Lib/**

CodeRabbit blocks several paths by default. You can override this behavior by explicitly including those paths in the path filters. For example, including **/dist/** will override the default block on the dist directory, by removing the pattern from both the lists.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

Updates SRE constants and tooling: SRE_MAGIC value changed, the SreFlag TEMPLATE bit removed, VM repr mapping no longer lists TEMPLATE, and a new Python script generates crates/sre_engine/src/constants.rs from Lib/re/_constants.py.

Changes

Cohort / File(s) Summary
Constants file
crates/sre_engine/src/constants.rs
pub const SRE_MAGIC changed from 20221023 to 20230612; removed TEMPLATE = 1 from SreFlag bitflags.
VM stdlib repr
crates/vm/src/stdlib/sre.rs
Removed re.TEMPLATE from the Pattern flag-name mapping used by repr_str (affects textual flag listing only).
Generator script
scripts/generate_sre_constants.py, Lib/re/_constants.py
Added scripts/generate_sre_constants.py which reads Lib/re/_constants.py and emits crates/sre_engine/src/constants.rs; includes update_file, dump_enum, dump_bitflags, and main for idempotent generation of enums/bitflags and SRE_MAGIC.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I nibbled constants late at night,
A number hopped to a newer light,
One tiny flag hopped out of view,
A Python pen sketched Rust anew,
Soft bytes and carrots shining bright.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ⚠️ Warning The PR title mentions only zipfile/zipimport libraries, but the actual changeset includes SRE engine constant updates, a new constant generation script, and flag removals—core changes not referenced in the title. Update the title to reflect the primary changes: 'Update SRE constants generation and remove TEMPLATE flag' or similar, or clarify why SRE changes are secondary to zipfile updates.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 4, 2026

Code has been automatically formatted

The code in this PR has been formatted using cargo fmt --all.
Please pull the latest changes before pushing again:

git pull origin update_zipfile_zipimport

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
scripts/generate_sre_constants.py (2)

7-16: Consider clarifying ValueError handling.

The function correctly avoids unnecessary writes when content hasn't changed. However, catching ValueError in addition to OSError is unusual for file operations. If there's a specific reason for catching ValueError, consider adding a comment. Otherwise, OSError alone should suffice.

🔎 Simplified exception handling
 def update_file(file, content):
     try:
         with open(file, "r") as fobj:
             if fobj.read() == content:
                 return False
-    except (OSError, ValueError):
+    except OSError:
         pass
     with open(file, "w") as fobj:
         fobj.write(content)
     return True

60-82: Minor style improvement for conditional append.

The function correctly generates bitflags definitions. Line 75 uses a conditional expression with append() which always returns None. While functional, this is slightly unconventional.

🔎 Cleaner conditional logic
     content = ["bitflags! {\n"]
-    content.append(f"{derives}\n") if derives else None
+    if derives:
+        content.append(f"{derives}\n")
     content.append(f"    pub struct {struct_name}: {int_t} {{\n")
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eee3608 and b95199b.

⛔ Files ignored due to path filters (18)
  • Lib/re/__init__.py is excluded by !Lib/**
  • Lib/re/_casefix.py is excluded by !Lib/**
  • Lib/re/_compiler.py is excluded by !Lib/**
  • Lib/re/_constants.py is excluded by !Lib/**
  • Lib/re/_parser.py is excluded by !Lib/**
  • Lib/sre_constants.py is excluded by !Lib/**
  • Lib/test/test_re.py is excluded by !Lib/**
  • Lib/test/test_runpy.py is excluded by !Lib/**
  • Lib/test/test_socket.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_async_case.py is excluded by !Lib/**
  • Lib/test/test_zipfile/_path/test_path.py is excluded by !Lib/**
  • Lib/test/test_zipfile/test_core.py is excluded by !Lib/**
  • Lib/test/test_zipimport.py is excluded by !Lib/**
  • Lib/test/zipimport_data/sparse-zip64-c0-0x000000000.part is excluded by !Lib/**
  • Lib/test/zipimport_data/sparse-zip64-c0-0x100000000.part is excluded by !Lib/**
  • Lib/test/zipimport_data/sparse-zip64-c0-0x200000000.part is excluded by !Lib/**
  • Lib/zipfile/__init__.py is excluded by !Lib/**
  • Lib/zipimport.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/sre_engine/src/constants.rs
  • crates/vm/src/stdlib/sre.rs
  • scripts/generate_sre_constants.py
💤 Files with no reviewable changes (1)
  • crates/vm/src/stdlib/sre.rs
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/sre_engine/src/constants.rs
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_sre_constants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (7)
crates/sre_engine/src/constants.rs (4)

6-7: LGTM! Clear documentation of auto-generation.

The header correctly indicates this file is auto-generated from Python source, which helps prevent inadvertent manual edits and improves maintainability.


65-121: LGTM! Improved code organization.

The additional blank lines between enum and bitflags definitions improve readability and align with Rust formatting conventions.


108-120: TEMPLATE flag removal is complete and has no breaking references.

The TEMPLATE flag has been successfully removed from SreFlag, and verification confirms no remaining references to SRE_FLAG_TEMPLATE or re.TEMPLATE exist in the codebase. This aligns with Python's deprecation of the flag in 3.6+.


16-16: Verify SRE_MAGIC aligns with the target CPython version being synchronized.

The SRE_MAGIC constant is a version identifier for the regex engine and is properly exported to the Python layer as MAGIC in the sre module. Changing this value could affect regex pattern compilation compatibility.

Confirm that the value 20230612 matches the SRE_MAGIC in the specific CPython version (3.12 or 3.13) being synchronized. Check the target CPython source or run python -c "import _sre; print(_sre.MAGIC)" on the target version to verify alignment.

Note: The .pyc file references in the codebase are for Python bytecode, not regex pattern caches, so the primary concern is ensuring the engine version is compatible with the source being synchronized.

scripts/generate_sre_constants.py (3)

1-5: LGTM! Clear script identification.

The shebang and documentation clearly identify the script's purpose and usage.


85-132: Build script with trusted input - exec() usage is acceptable.

The exec() call on line 92 executes code from Lib/re/_constants.py. While exec() generally poses security risks, this is acceptable here because:

  1. The input file is part of the trusted codebase
  2. This is a build-time script run by developers, not production code
  3. The script's purpose is to read Python constants and generate Rust code

The generated constants correctly match the structure in constants.rs.


141-184: Remove commented-out code.

This large block of commented-out code should be removed. Version control preserves history if needed, and dead code reduces maintainability. Additionally, the hardcoded list at lines 158-168 includes the now-deprecated TEMPLATE flag, making it outdated.

🔎 Remove dead code
-
-#         dump(f, OPCODES, "SreOpcode", "u32", "")
-#         dump(f, ATCODES, "SreAtCode", "u32", "AT_")
-#         dump(f, CHCODES, "SreCatCode", "u32", "CATEGORY_")
-#
-#         def bitflags(typ, int_t, prefix, flags):
-#             f.write(f"""\
-# bitflags! {{
-#     pub struct {typ}: {int_t} {{
-# """)
-#             for name in flags:
-#                 val = globals()[prefix + name]
-#                 f.write(f"        const {name} = {val};\n")
-#             f.write("""\
-#     }
-# }
-# """)
-#
-#         bitflags("SreFlag", "u16", "SRE_FLAG_", [
-#             "TEMPLATE",
-#             "IGNORECASE",
-#             "LOCALE",
-#             "MULTILINE",
-#             "DOTALL",
-#             "UNICODE",
-#             "VERBOSE",
-#             "DEBUG",
-#             "ASCII",
-#         ])
-#
-#         bitflags("SreInfo", "u32", "SRE_INFO_", [
-#             "PREFIX", "LITERAL", "CHARSET",
-#         ])
-#
-#     print("done")
-
-
-# if __name__ == "__main__":
-#     import sys
-#     if len(sys.argv) > 1:
-#         constants_file = sys.argv[1]
-#     else:
-#         import os
-#         constants_file = os.path.join(os.path.dirname(__file__), "../../sre-engine/src/constants.rs")
-#     with open(constants_file, "w") as f:

Likely an incorrect or invalid review comment.

@terryluan12 terryluan12 changed the title Update the zipfile, zipimport and re libraries + tests Update the zipfile, zipimport and re libraries + tests v3.13.11 Jan 4, 2026
@terryluan12 terryluan12 changed the title Update the zipfile, zipimport and re libraries + tests v3.13.11 Update the zipfile, zipimport and re libraries + tests - v3.13.11 Jan 4, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
scripts/generate_sre_constants.py (1)

36-58: Address the past review comment regarding enum value handling.

A previous review flagged an inconsistency in how dump_enum() handles values compared to dump_bitflags(). The past review comment suggests either:

  1. Updating dump_enum() to preserve values when a dict is passed (like dump_bitflags() does), or
  2. Clarifying in the docstring that the function expects a list and will always assign sequential values starting from 0.

Since the current usage passes lists (OPCODES, ATCODES, CHCODES), option 2 (clarifying the docstring) may be sufficient.

🧹 Nitpick comments (2)
scripts/generate_sre_constants.py (2)

76-76: Use proper if statement for conditional side effects.

Using a ternary expression for side effects is not idiomatic Python.

🔎 Proposed fix
     content = ["bitflags! {\n"]
-    content.append(f"{derives}\n") if derives else None
+    if derives:
+        content.append(f"{derives}\n")
     content.append(f"    pub struct {struct_name}: {int_t} {{\n")

86-133: Verify error handling for missing keys and file operations.

The script uses exec() to read constants from a trusted source file, which is acceptable for a build-time code generator. However, there's no validation that required keys (MAGIC, OPCODES, ATCODES, CHCODES, SRE_FLAG_*, SRE_INFO_*) exist in the namespace after execution. If the structure of Lib/re/_constants.py changes, the script will fail with cryptic KeyError exceptions.

Consider adding validation or wrapping the key accesses in try-except blocks to provide clearer error messages.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b95199b and 1c44c43.

📒 Files selected for processing (1)
  • scripts/generate_sre_constants.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_sre_constants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
scripts/generate_sre_constants.py (2)

86-134: Consider documenting security assumptions for exec() usage.

The use of exec() on line 93 executes arbitrary Python code from the input file. While this is acceptable for a build script reading from a trusted source file in the same repository, consider adding a comment to document this security assumption for future maintainers.

Additionally, the function lacks explicit error handling for file operations. While build scripts commonly fail fast, adding a try-except block with descriptive error messages would improve the developer experience when issues occur.

🔎 Optional improvements
 def main(
     infile="Lib/re/_constants.py",
     outfile_constants="crates/sre_engine/src/constants.rs",
 ):
+    # Execute trusted source file to load constants
+    # WARNING: This executes arbitrary Python code from infile
+    # Only use with trusted source files from this repository
     ns = {}
-    with open(infile) as fp:
-        code = fp.read()
-    exec(code, ns)
+    try:
+        with open(infile) as fp:
+            code = fp.read()
+    except FileNotFoundError:
+        raise FileNotFoundError(f"Input file not found: {infile}")
+    except Exception as e:
+        raise Exception(f"Failed to read {infile}: {e}")
+    
+    exec(code, ns)

136-139: LGTM!

Standard entry point pattern with support for command-line arguments.

As per coding guidelines, consider running ruff to ensure PEP 8 compliance and catch any potential linting issues.

#!/bin/bash
# Description: Run ruff linter on the new Python script to ensure PEP 8 compliance

# Check if ruff is available and run it on the script
if command -v ruff &> /dev/null; then
    ruff check scripts/generate_sre_constants.py
else
    echo "ruff is not installed. Install it with: pip install ruff"
fi
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c44c43 and deb1063.

📒 Files selected for processing (1)
  • scripts/generate_sre_constants.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.py: In most cases, Python code should not be edited; bug fixes should be made through Rust code modifications only
Follow PEP 8 style for custom Python code
Use ruff for linting Python code

Files:

  • scripts/generate_sre_constants.py
🧠 Learnings (1)
📚 Learning: 2025-12-27T14:03:49.034Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.034Z
Learning: Applies to **/*test*.py : NEVER comment out or delete any test code lines except for removing `unittest.expectedFailure` decorators and upper TODO comments

Applied to files:

  • scripts/generate_sre_constants.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (4)
scripts/generate_sre_constants.py (4)

1-5: LGTM!

Clear script header with appropriate shebang and documentation.


7-16: LGTM!

The idempotent file update logic is well-implemented, only writing when content actually changes.


19-33: LGTM!

Appropriate copyright header for generated Rust code.


60-83: LGTM!

The bitflags generation logic correctly preserves original values from the dictionary.

@ShaharNaveh
Copy link
Collaborator

@terryluan12 can you please split this PR into multiple PRs? Unless there's a connection between all the 20 files. I'd rather review multiple smaller PRs rather than one big PR, this will help with merging what can already be merged and not blocked by other unrelated changes.

And tysm for your contributions<3

@youknowone
Copy link
Member

probably zipfile and zipimport is related. If splitting re is possible, that will be the best.

@terryluan12 terryluan12 force-pushed the update_zipfile_zipimport branch from deb1063 to 83ec1c3 Compare January 5, 2026 00:26
@terryluan12
Copy link
Contributor Author

probably zipfile and zipimport is related. If splitting re is possible, that will be the best.

Sounds good! Just done so on #6648

@terryluan12 terryluan12 changed the title Update the zipfile, zipimport and re libraries + tests - v3.13.11 Update the zipfile + zipimport libraries + tests - v3.13.11 Jan 5, 2026
@terryluan12
Copy link
Contributor Author

(Btw, double checking if skip is needed)

@terryluan12 terryluan12 force-pushed the update_zipfile_zipimport branch 2 times, most recently from 503a471 to 5c07d25 Compare January 5, 2026 17:59
@terryluan12 terryluan12 force-pushed the update_zipfile_zipimport branch from 5c07d25 to f94cb62 Compare January 5, 2026 18:03
with zipfile.ZipFile(TESTFN, "r") as zipfp:
self._test_read(zipfp, expected_names, self.file_content)

@unittest.skip('TODO: RUSTPYTHON')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These skips too. Actually, I'd like to entirely remove skip in this patch.
Let's see what's happening without skip.

Usually we avoid skip because it hides bugs and problems. If it still fails, I will append the reason what justify skip

@terryluan12 terryluan12 requested a review from youknowone January 6, 2026 05:02
@youknowone youknowone force-pushed the update_zipfile_zipimport branch from 55d87a0 to b68b623 Compare January 6, 2026 07:27


# TODO: RUSTPYTHON
@unittest.skip("TODO: RUSTPYTHON shift_jis encoding unsupported")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@terryluan12 Because this is a setUp issue, skipping the entire tests as previous version is better than marking every functions as skip. They are caused by the single reason

Copy link
Member

@youknowone youknowone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much!

@youknowone youknowone merged commit cd613ed into RustPython:main Jan 6, 2026
23 of 24 checks passed
terryluan12 added a commit to terryluan12/RustPython that referenced this pull request Jan 15, 2026
…on#6639)

* Updated zipimport library + test

* Updated zipfile library + test

* Annotated failing/erroring tests in test_zipfile and test_zipimport

* Changed all skips in `test_core.py` to expectedFailures

* skip EncodedMetadataTests

---------

Co-authored-by: Jeong YunWon <jeong@youknowone.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants