Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 26, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Enforced UTF‑8 validation for type names and documentation to prevent invalid/surrogate text during type creation and renaming.
    • Strengthened slots validation so slot items must be strings and must be valid identifiers; improved handling of "dict" in slots.
    • Made string identifier validation accessible for use in identifier checks.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 26, 2025

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Exposes PyStr::ensure_valid_utf8 at crate scope and adds UTF‑8 and type validations during type creation and name assignment: validates type names, __doc__, and __slots__ items and rejects bytes in __slots__.

Changes

Cohort / File(s) Summary
String validation utility
crates/vm/src/builtins/str.rs
PyStr::ensure_valid_utf8 visibility changed from private to pub(crate) to allow reuse within the crate.
Type construction & attribute validation
crates/vm/src/builtins/type.rs
Calls to ensure_valid_utf8 added for type names, __doc__ strings, and in PyType.__set_name__/__name__ setter; __slots__ handling extended to reject bytes items and ensure slot identifiers; preserves existing control flow aside from added validations and error messages.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 I hopped through bytes and names so sly,
I checked each slot and gave a cry,
"No invalid utf, no sneaky byte!"
Now types stand tidy, neat, and right,
A rabbit's nod — safe strings in sight.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'check surrogates' is vague and does not clearly convey the main changes in the pull request, which involve enforcing UTF-8 validity and validating type names and slots. Consider using a more descriptive title such as 'Enforce UTF-8 validity for type names and validate slots' to better reflect the primary changes being made.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4457769 and 4184c27.

📒 Files selected for processing (2)
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/builtins/type.rs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review December 27, 2025 02:06
@youknowone youknowone marked this pull request as draft December 27, 2025 02:06
@youknowone youknowone marked this pull request as ready for review December 27, 2025 12:37
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/vm/src/builtins/type.rs (1)

1359-1379: Validate __slots__ string items for UTF-8 surrogates to match validation of type name and __doc__.

The PR adds UTF-8 surrogate validation for type names (line 1166) and __doc__ (line 1315), but __slots__ string items lack this validation. Currently, slot names containing surrogates (e.g., '\ud800') would pass the bytes check and try_into_typed without validation, then propagate through mangle_name and intern_str. Calling ensure_valid_utf8 on each slot string after conversion would maintain consistency with the validation applied elsewhere.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27ab62d and 4457769.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_builtin.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • crates/vm/src/builtins/str.rs
  • crates/vm/src/builtins/type.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/builtins/str.rs
  • crates/vm/src/builtins/type.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
🔇 Additional comments (6)
crates/vm/src/builtins/str.rs (1)

444-461: LGTM - Visibility change enables proper UTF-8 validation across the crate.

The pub(crate) visibility is the appropriate scope for this helper, allowing type construction paths to validate strings without exposing it publicly.

crates/vm/src/builtins/type.rs (5)

1154-1177: LGTM - Proper UTF-8 validation for type name setter.

The validation is correctly placed after the null character check, ensuring type names don't contain surrogates. This aligns with CPython's behavior.


1252-1259: LGTM - UTF-8 validation during type construction.

Correctly validates the type name immediately after the null character check, preventing type creation with surrogate-containing names.


1311-1316: LGTM - Validates __doc__ strings for surrogates during type creation.

The check correctly handles the optional nature of __doc__ - only validating when it exists and is a string.


1351-1358: LGTM - Rejects bytes as __slots__ value.

This correctly rejects __slots__ = b"foo" with an appropriate error message, matching CPython's behavior.


1366-1373: LGTM - Rejects bytes items within __slots__ iterable.

Correctly handles cases like __slots__ = ['a', b'b'] by validating each item during iteration.

@github-actions
Copy link
Contributor

Code has been automatically formatted

The code in this PR has been formatted using cargo fmt --all.
Please pull the latest changes before pushing again:

git pull origin surrogate-check

@youknowone youknowone marked this pull request as draft December 27, 2025 14:15
@youknowone youknowone marked this pull request as ready for review December 27, 2025 15:06
@youknowone youknowone merged commit 9b2ad34 into RustPython:main Dec 27, 2025
11 of 13 checks passed
@youknowone youknowone deleted the surrogate-check branch December 27, 2025 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant