Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Jan 16, 2026

Summary by CodeRabbit

  • New Features

    • I/O wrapper objects (BytesIO, TextIOWrapper, StringIO, etc.) now support pickling and unpickling of their internal state.
    • Improved pickling support for objects using new-argument and getnewargs_ex/getnewargs patterns for protocol ≥2.
  • Bug Fixes

    • Clearer, more specific error messages and stricter validation when pickling/unpickling incompatible or malformed object state.

✏️ Tip: You can customize this high-level summary in your review settings.

@youknowone youknowone marked this pull request as ready for review January 16, 2026 15:04
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 16, 2026

📝 Walkthrough

Walkthrough

Refactors object pickling/reduction to add a new reduce_newobj path and helpers for extracting getnewargs/getstate/slot and item data, adds getstate/setstate to IO wrapper classes for pickling, interns getnewargs_ex in VM context, and removes a legacy Python reducelib module.

Changes

Cohort / File(s) Summary
Object pickling & reduction
crates/vm/src/builtins/object.rs
Adds get_new_arguments, is_getstate_overridden, object_getstate, get_items_iter, reduce_newobj; refactors protocol>=2 reduction to use reduce_newobj; tighter itemsize and basicsize validation; clearer pickle error messages.
IO wrapper pickling
crates/vm/src/stdlib/io.rs
Adds __getstate__ / __setstate__ implementations to IO wrapper classes (BytesIO, TextIOWrapper, StringIO and other wrappers), exports PyDict, and enforces state-shape validation and closed-state checks.
VM context constants
crates/vm/src/vm/context.rs
Adds __getnewargs_ex__ field to ConstName for interned magic method name.
Removed legacy module
crates/vm/Lib/python_builtins/__reducelib.py
Deletes the Python-side reducelib module that previously provided reduce_2/slotnames/_abstract_method_error helpers.
Thread cleanup minor refactor
crates/vm/src/stdlib/thread.rs
Replaced map_or(false, ...) with is_some_and(...) in a retain filter; no semantic change.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • fanninpm

Poem

🐰
I nibbled bytes and stored a state,
Gathered args to reconstruct fate,
IO trunks now pack and send,
Basicsize checked from end to end,
Hopping home with pickles great! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly and specifically describes the main change: adding support for getnewargs_ex to improve pickle functionality, which aligns with the primary modifications across object.rs, io.rs, and context.rs.
Docstring Coverage ✅ Passed Docstring coverage is 81.25% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings


📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8c5d1dd and a418c68.

⛔ Files ignored due to path filters (11)
  • Lib/test/test_copy.py is excluded by !Lib/**
  • Lib/test/test_csv.py is excluded by !Lib/**
  • Lib/test/test_descr.py is excluded by !Lib/**
  • Lib/test/test_enum.py is excluded by !Lib/**
  • Lib/test/test_lzma.py is excluded by !Lib/**
  • Lib/test/test_memoryio.py is excluded by !Lib/**
  • Lib/test/test_pickle.py is excluded by !Lib/**
  • Lib/test/test_pickletools.py is excluded by !Lib/**
  • Lib/test/test_typing.py is excluded by !Lib/**
  • Lib/test/test_unittest/test_runner.py is excluded by !Lib/**
  • Lib/test/test_zlib.py is excluded by !Lib/**
📒 Files selected for processing (5)
  • crates/vm/Lib/python_builtins/__reducelib.py
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/io.rs
  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/vm/context.rs
💤 Files with no reviewable changes (1)
  • crates/vm/Lib/python_builtins/__reducelib.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/vm/src/vm/context.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style using cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/stdlib/thread.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/io.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
🔇 Additional comments (12)
crates/vm/src/stdlib/io.rs (5)

161-162: LGTM!

The PyDict import is correctly added to support the __setstate__ implementations that need to restore the __dict__ attribute.


4081-4103: LGTM!

The __getstate__ implementation correctly captures the buffer content, position, and optional __dict__. The TODO comment appropriately documents that the actual newline setting will be stored once implemented.


4105-4144: LGTM! Previous feedback addressed.

The __setstate__ implementation now includes the closed-file guard at line 4108, which was the concern raised in the previous review. The implementation correctly validates the tuple length, restores buffer content and position, and handles the __dict__ restoration properly.


4294-4313: LGTM!

The __getstate__ implementation correctly captures bytes content, position, and optional __dict__. Using a 3-tuple (without newline) is appropriate for binary I/O.


4315-4351: LGTM!

The __setstate__ implementation correctly:

  • Checks closed state before modification
  • Uses try_resizable to ensure no active buffer exports exist (line 4332)
  • Restores content and position
  • Handles __dict__ restoration properly
crates/vm/src/stdlib/thread.rs (1)

518-523: LGTM! Cleaner idiomatic Rust.

Using is_some_and instead of map_or(false, ...) is the preferred pattern for this check. Functionally equivalent with improved readability.

crates/vm/src/builtins/object.rs (6)

564-624: LGTM - Proper implementation of new arguments extraction.

The function correctly handles both __getnewargs_ex__ and __getnewargs__ with appropriate type validation and error messages matching CPython's behavior.


626-644: LGTM!

The identity comparison correctly detects whether __getstate__ is overridden by comparing method objects from the MRO.


646-656: LGTM!

Correct delegation: uses the default implementation with the required flag when not overridden, otherwise invokes the user-defined __getstate__.


658-674: LGTM!

Correctly provides item iterators for list/dict subclasses as required by the pickle protocol for proper content restoration.


736-743: LGTM!

Clean routing: protocol >= 2 uses the new reduce_newobj path while older protocols fall back to the copyreg helper.


187-239: LGTM - pickle validation logic is correct.

The itemsize check (line 190) and basicsize calculation correctly implement pickle validation. The logic properly accounts for dict size (if HAS_DICT flag), weakref support detection (with separate handling for heap vs built-in types), and slot sizes before comparing against the actual class basicsize.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all
    Please pull the latest changes before pushing again:
git pull origin pickle

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/vm/src/stdlib/io.rs`:
- Around line 4105-4140: The StringIO __setstate__ method is missing the
closed-file guard present in BytesIO.__setstate__; add a check at the top of fn
__setstate__(zelf: PyRef<Self>, ...) that tests zelf.closed.load() and returns
the same error used by BytesIO.__setstate__ when the object is closed (use the
same vm error construction so behavior and message match). Ensure this check
runs before mutating self.buffer or seeking so __setstate__ fails fast for
closed StringIO instances.
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e9a57d1 and 8c5d1dd.

⛔ Files ignored due to path filters (7)
  • Lib/test/test_copy.py is excluded by !Lib/**
  • Lib/test/test_csv.py is excluded by !Lib/**
  • Lib/test/test_descr.py is excluded by !Lib/**
  • Lib/test/test_enum.py is excluded by !Lib/**
  • Lib/test/test_lzma.py is excluded by !Lib/**
  • Lib/test/test_memoryio.py is excluded by !Lib/**
  • Lib/test/test_pickle.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/io.rs
  • crates/vm/src/vm/context.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style using cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/vm/src/vm/context.rs
  • crates/vm/src/builtins/object.rs
  • crates/vm/src/stdlib/io.rs
🧬 Code graph analysis (1)
crates/vm/src/builtins/object.rs (2)
crates/vm/src/builtins/tuple.rs (3)
  • class (41-43)
  • class (527-529)
  • __getnewargs__ (352-362)
crates/vm/src/builtins/int.rs (3)
  • __getnewargs__ (560-562)
  • a (623-623)
  • a (659-659)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Run rust tests (ubuntu-latest)
🔇 Additional comments (12)
crates/vm/src/vm/context.rs (1)

138-138: LGTM!

The __getnewargs_ex__ constant is correctly added in alphabetical order after __getnewargs__, following the existing pattern for magic method names.

crates/vm/src/stdlib/io.rs (4)

161-162: LGTM!

The added imports for PyDict and PyTupleRef are necessary for the new __getstate__ and __setstate__ methods.


4081-4103: LGTM!

The __getstate__ implementation correctly captures StringIO state (content, newline, position, dict). The TODO at line 4096 acknowledges that actual newline handling needs future implementation.


4290-4309: LGTM!

The BytesIO.__getstate__ correctly returns a 3-tuple (content, position, dict) which matches CPython's expected state format for BytesIO.


4311-4347: LGTM!

The BytesIO.__setstate__ implementation properly validates the closed state, checks for exports via try_resizable, and correctly restores content, position, and dict.

crates/vm/src/builtins/object.rs (7)

626-644: LGTM!

The is_getstate_overridden function correctly detects if a type has overridden __getstate__ by comparing the method reference against object.__getstate__.


646-656: LGTM!

The object_getstate function correctly delegates to either the overridden __getstate__ method or the default implementation based on whether the method is overridden.


658-674: LGTM!

The get_items_iter function correctly returns iterators for list/dict items when applicable, using fast_isinstance for efficient type checks.


676-734: LGTM!

The reduce_newobj function correctly implements protocol >= 2 reduction:

  • Properly checks for tp_new availability
  • Correctly chooses between __newobj__ and __newobj_ex__ based on kwargs presence
  • Computes the required flag correctly for state retrieval
  • Returns the expected 5-tuple format

736-743: LGTM!

The common_reduce function cleanly delegates to reduce_newobj for protocol >= 2, maintaining the fallback to copyreg._reduce_ex for older protocols.


564-624: Verify identifier! macro resolution for __getnewargs_ex__.

The function uses identifier!(vm, __getnewargs_ex__) at line 571. Ensure this macro properly resolves to the newly added constant in context.rs. Unable to verify in current sandbox environment—manual verification or compilation check via cargo clippy required.


187-239: No changes needed. The weakref detection logic in object_getstate_default is correct and consistent with the codebase patterns.

Both detection mechanisms are intentional and accurate:

  1. Heap types without explicit __slots__ automatically support weakref—this matches CPython behavior and is confirmed by comments in type.rs (lines 1294-1295).
  2. For builtin types, checking attributes.contains_key() for __weakref__ is the correct method in this codebase to detect weakref support, following the same pattern used for __dict__ descriptor detection (type.rs, lines 1441-1446).

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

@youknowone
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 16, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@youknowone youknowone merged commit 02cc257 into RustPython:main Jan 17, 2026
23 of 24 checks passed
@youknowone youknowone deleted the pickle branch January 17, 2026 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant