fix pyexpat #6582

youknowone · 2025-12-29T14:16:35Z

Summary by CodeRabbit

New Features
- Parser now accepts both text and binary XML input.
- Optional namespace-separator support for namespaced element/attribute names.
- Interning behavior now respects a provided intern mapping for names.
Bug Fixes / Behavior Changes
- Empty input is treated as a successful parse.
- Parse and file-parse operations return consistent integer status.
- Namespace-separator input is validated to a single character; explicit encoding argument is ignored.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2025-12-29T14:16:44Z

Warning

Rate limit exceeded

@youknowone has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6d84c60 and 5e7f7ef.

⛔ Files ignored due to path filters (4)

Lib/test/test_pyexpat.py is excluded by !Lib/**
Lib/test/test_xml_dom_xmlbuilder.py is excluded by !Lib/**
Lib/test/test_xmlrpc.py is excluded by !Lib/**
Lib/xmlrpc/client.py is excluded by !Lib/**

📒 Files selected for processing (1)

crates/stdlib/src/pyexpat.rs

📝 Walkthrough

Walkthrough

Parse now accepts either string or bytes and converts to bytes before parsing. A new optional namespace_separator is stored and used by a make_name helper to produce names; intern is initialized from the provided argument. do_parse now returns Results, Parse/ParseFile surface integer status, and ParserCreate validates namespace_separator and forwards intern/namespace_separator to the parser constructor.

Changes

Cohort / File(s)	Summary
pyexpat core changes `crates/stdlib/src/pyexpat.rs`	Large functional update: `parse` accepts `Either<PyStrRef, PyBytesRef>` and returns `PyResult<i32>`; `do_parse` now returns `Result<_, xml::reader::Error>`; `ParseFile` returns integer status and guards empty input.
Parser configuration & constructors `crates/stdlib/src/pyexpat.rs`	`ParserCreateArgs` fields changed from `OptionalArg` to `Option`/`Option<PyObjectRef>`; `PyExpatLikeXmlParser::new` signature updated to accept `namespace_separator: Option<String>` and `intern: Option<PyObjectRef>`; `parser_create` validates `namespace_separator` length.
Name handling & intern behavior `crates/stdlib/src/pyexpat.rs`	Added `namespace_separator: Option<String>` field, `make_name(&self, name: &xml::name::OwnedName) -> String`, and logic to use namespace+separator for element/attribute names; `intern` is initialized from the provided argument and used for name internment.
Imports & API surface `crates/stdlib/src/pyexpat.rs`	Added imports: `Either` and `PyBytesRef`; adjusted public signatures and exported types to match new behavior.
Error handling / compatibility `crates/stdlib/src/pyexpat.rs`	Parsing flow changed to propagate xml reader errors from `do_parse`; `Parse`/`ParseFile` now map certain errors to the legacy integer success value (1) for compatibility; encoding parameter is ignored.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Add missing xmlparser attributes: namespace_prefixes, ordered_attributes, specified_attributes, intern #6494 — Adds an intern attribute and related plumbing to PyExpatLikeXmlParser; strongly related to the intern/name changes in this PR.

Poem

🐰
I hopped through names, a tiny code sprite,
Joined bytes and strings in soft moonlight.
A separator stitched namespaces tight,
Interns snug in a dictionary bed,
Parsers purr as XML dreams are read.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.
Title check	❓ Inconclusive	The title 'fix pyexpat' is vague and generic, using non-descriptive language that doesn't convey what specific issue or functionality is being addressed in this substantial refactoring.	Replace with a more specific title that describes the main change, such as 'Add namespace support and bytes handling to pyexpat parser' or 'Extend pyexpat parser with namespace separator and bytes data support'.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

crates/stdlib/src/pyexpat.rs (1)

362-368: ParserCreate arguments are currently unused.

The encoding, namespace_separator, and intern parameters defined in ParserCreateArgs are not being used in the parser_create function. This means XML encoding declarations and namespace handling are not currently supported. While this appears to be existing technical debt (indicated by the _args prefix), users may expect these parameters to have an effect based on CPython's pyexpat API.

Would you like me to open an issue to track implementing support for these ParserCreate parameters?

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8059960 and b7b81a4.

⛔ Files ignored due to path filters (1)

Lib/xmlrpc/client.py is excluded by !Lib/**

📒 Files selected for processing (1)

crates/stdlib/src/pyexpat.rs

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

crates/stdlib/src/pyexpat.rs

🧠 Learnings (1)

📚 Learning: 2025-12-27T14:03:49.011Z

Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.011Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

crates/stdlib/src/pyexpat.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Run rust tests (windows-latest)
GitHub Check: Run snippets and cpython tests (windows-latest)
GitHub Check: Run snippets and cpython tests (macos-latest)

🔇 Additional comments (2)

crates/stdlib/src/pyexpat.rs (2)

52-52: LGTM: Import additions support the new functionality.

The addition of PyBytesRef and Either imports correctly enables the Parse method to accept both string and bytes inputs.

Also applies to: 54-54

354-360: Option with #[pyarg(any, optional)] is a valid pattern in RustPython.

The codebase uses both OptionalArg<T> and Option<T> with #[pyarg(any, optional)]. While OptionalArg is more commonly used, Option is also a supported and legitimate pattern, as evidenced by its use in multiple stdlib modules (codecs.rs, sys.rs, lzma.rs, openssl.rs) and elsewhere. The change is consistent with established patterns in the codebase.

coderabbitai · 2025-12-29T15:00:55Z

crates/stdlib/src/pyexpat.rs

+        fn parse(
+            &self,
+            data: Either<PyStrRef, PyBytesRef>,
+            _isfinal: OptionalArg<bool>,
+            vm: &VirtualMachine,
+        ) {
+            let bytes = match data {
+                Either::A(s) => s.as_bytes().to_vec(),
+                Either::B(b) => b.as_bytes().to_vec(),
+            };
+            let reader = Cursor::<Vec<u8>>::new(bytes);
            let parser = self.create_config().create_reader(reader);
            self.do_parse(vm, parser);
        }


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # First, let's examine the actual file and the Parse method cd /tmp && find . -name "pyexpat.rs" -path "*/stdlib/src/*" 2>/dev/null | head -1

Repository: RustPython/RustPython

Length of output: 47

🏁 Script executed:

#!/bin/bash # List files in the stdlib crates directory find . -path "*/stdlib/src*" -name "*.rs" | head -20

Repository: RustPython/RustPython

Length of output: 665

🏁 Script executed:

#!/bin/bash # Examine the pyexpat.rs file around the specified lines cat -n crates/stdlib/src/pyexpat.rs | sed -n '310,350p'

Repository: RustPython/RustPython

Length of output: 1799

🏁 Script executed:

#!/bin/bash # Search for Parse method definition and all its usages rg -n "fn parse|\.parse\(" crates/stdlib/src/pyexpat.rs -A 2

Repository: RustPython/RustPython

Length of output: 468

🏁 Script executed:

#!/bin/bash # Check if Parse is a public method and look for any breaking change documentation rg -n "pub.*parse|parse.*Either|PyBytesRef" crates/stdlib/src/pyexpat.rs -B 2 -A 5

Repository: RustPython/RustPython

Length of output: 719

🏁 Script executed:

#!/bin/bash # Check the PyBytesRef definition to verify as_bytes() method exists rg -n "impl.*PyBytesRef|fn as_bytes" --type=rust -B 2 -A 5 | head -50

Repository: RustPython/RustPython

Length of output: 2637

🏁 Script executed:

#!/bin/bash # Look for the definition of PyBytesRef and its methods fd -e rs -x rg "pub.*as_bytes|PyBytesRef" {} + | grep -i "as_bytes" | head -20

Repository: RustPython/RustPython

Length of output: 15647

🏁 Script executed:

#!/bin/bash # Check builtins module where PyBytesRef is likely defined find . -path "*/builtins*" -name "*.rs" | xargs rg -l "PyBytesRef" | head -5

Repository: RustPython/RustPython

Length of output: 230

🏁 Script executed:

#!/bin/bash # Look for PyBytesRef and its as_bytes method rg "impl.*PyBytes|fn as_bytes" -A 3 | head -40

Repository: RustPython/RustPython

Length of output: 1980

🏁 Script executed:

#!/bin/bash # Check if there are tests for the parse method rg -n "test.*parse|Parse" crates/stdlib/src/pyexpat.rs

Repository: RustPython/RustPython

Length of output: 988

🏁 Script executed:

#!/bin/bash # Check for any Python side behavior of Parse in tests fd -e py -path "*/test*" -exec rg -l "Parse\|pyexpat" {} \; | head -5

Repository: RustPython/RustPython

Length of output: 233

🏁 Script executed:

#!/bin/bash # Verify PyBytesRef specifically has as_bytes() method rg -n "impl.*PyBytes|pub struct PyBytes" crates/vm/src/builtins/bytes.rs -A 10 | head -40

Repository: RustPython/RustPython

Length of output: 921

🏁 Script executed:

#!/bin/bash # Check the PyBytesRef type definition to ensure it wraps PyBytesInner rg -n "type PyBytesRef|pub.*PyBytesRef" crates/vm/src/builtins/bytes.rs -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 154

🏁 Script executed:

#!/bin/bash # Check PyBytes as_bytes() method implementation rg -n "impl.*PyBytes|fn as_bytes" crates/vm/src/builtins/bytes.rs -A 3 | head -50

Repository: RustPython/RustPython

Length of output: 1518

🏁 Script executed:

#!/bin/bash # Find the actual as_bytes() method definition in PyBytes rg -n "pub.*fn as_bytes|fn as_bytes" crates/vm/src/builtins/bytes.rs -B 2 -A 3

Repository: RustPython/RustPython

Length of output: 178

Implementation is correct, but the signature change is not actually breaking.

Both PyStrRef and PyBytesRef have as_bytes() methods, and the conversion to Vec<u8> properly handles both string (UTF-8) and bytes inputs. However, the signature change from accepting PyStrRef to Either<PyStrRef, PyBytesRef> is backward compatible—existing code passing strings continues to work. This is a feature expansion, not a breaking change.

Note: The method has no error handling; do_parse() is called without checking its result.

🤖 Prompt for AI Agents

In crates/stdlib/src/pyexpat.rs around lines 321 to 334, the parse method currently calls self.do_parse(vm, parser) and ignores its return value or any errors; update the method to capture the result of do_parse, handle possible errors (map them to appropriate Python exceptions via the VirtualMachine or return a Result), and propagate or log failures instead of dropping them; ensure the function signature and callers are adjusted as needed so parsing failures produce a clear error (e.g., convert Err into a VM exception or return a PyResult) and preserve existing behavior for successful parses.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/stdlib/src/pyexpat.rs (1)

114-152: Validate the intern parameter type.

The intern parameter is accepted as PyObjectRef without type validation. If a non-dict object is passed, it could cause runtime errors when dict operations are attempted later. Consider validating that intern is a dict type when Some is provided.

🔎 Proposed validation

 fn new(
     namespace_separator: Option<String>,
     intern: Option<PyObjectRef>,
     vm: &VirtualMachine,
 ) -> PyResult<PyExpatLikeXmlParserRef> {
-    let intern_dict = intern.unwrap_or_else(|| vm.ctx.new_dict().into());
+    let intern_dict = match intern {
+        Some(obj) => {
+            // Verify it's a dict
+            if !obj.is(&vm.ctx.types.dict_type) {
+                return Err(vm.new_type_error("intern must be a dictionary".to_owned()));
+            }
+            obj
+        }
+        None => vm.ctx.new_dict().into(),
+    };
     Ok(Self {

♻️ Duplicate comments (2)

crates/stdlib/src/pyexpat.rs (2)

342-362: Silently ignoring all parsing errors may hide legitimate issues.

While the comment indicates this is for compatibility with existing Python code, silently ignoring all errors from do_parse (line 360) means legitimate parsing errors are never surfaced to the caller. Consider at least logging errors or handling specific error types that should be propagated.

This concern was previously raised in past review comments.

365-377: Same error handling concern as in Parse.

Like the parse method, this silently ignores all parsing errors (line 375). The same concerns about hiding legitimate issues apply here.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7b81a4 and 6d84c60.

⛔ Files ignored due to path filters (4)

Lib/test/test_pyexpat.py is excluded by !Lib/**
Lib/test/test_xml_dom_xmlbuilder.py is excluded by !Lib/**
Lib/test/test_xmlrpc.py is excluded by !Lib/**
Lib/xmlrpc/client.py is excluded by !Lib/**

📒 Files selected for processing (1)

crates/stdlib/src/pyexpat.rs

🧰 Additional context used

📓 Path-based instructions (1)

**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

crates/stdlib/src/pyexpat.rs

🧠 Learnings (4)

📚 Learning: 2025-12-24T23:47:38.444Z

Learnt from: youknowone
Repo: RustPython/RustPython PR: 0
File: :0-0
Timestamp: 2025-12-24T23:47:38.444Z
Learning: CPython's xml.parsers.expat xmlparser exposes a namespace_prefixes boolean attribute (implemented in the pyexpat C module) that controls namespace prefix reporting via XML_SetReturnNSTriplet. This attribute is used internally by xml.dom.expatbuilder and xml.etree.ElementTree.

Applied to files:

crates/stdlib/src/pyexpat.rs

📚 Learning: 2025-11-10T06:27:41.954Z

Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

crates/stdlib/src/pyexpat.rs

📚 Learning: 2025-12-27T14:03:49.011Z

Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.011Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

crates/stdlib/src/pyexpat.rs

📚 Learning: 2025-12-24T23:47:38.444Z

Learnt from: youknowone
Repo: RustPython/RustPython PR: 0
File: :0-0
Timestamp: 2025-12-24T23:47:38.444Z
Learning: CPython's xml.parsers.expat xmlparser does not expose an intern attribute. String interning for XML parsing in CPython is handled through SAX's expatreader feature_string_interning mechanism, not as a direct xmlparser.intern attribute.

Applied to files:

crates/stdlib/src/pyexpat.rs

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)

GitHub Check: Run snippets and cpython tests (windows-latest)
GitHub Check: Run rust tests (windows-latest)
GitHub Check: Ensure compilation on various targets
GitHub Check: Check the WASM package and demo
GitHub Check: Run snippets and cpython tests (ubuntu-latest)
GitHub Check: Run rust tests (ubuntu-latest)
GitHub Check: Run snippets and cpython tests (macos-latest)
GitHub Check: Run snippets and cpython tests on wasm-wasi
GitHub Check: Run tests under miri
GitHub Check: Run rust tests (macos-latest)
GitHub Check: Check Rust code with clippy

🔇 Additional comments (5)

crates/stdlib/src/pyexpat.rs (5)

52-54: LGTM!

The new imports are necessary for accepting both string and bytes input in the parse method.

69-70: LGTM!

The namespace_separator field is properly added with the correct #[pytraverse(skip)] annotation for non-Python object data.

291-297: LGTM!

The make_name helper correctly constructs element names with namespace prefixes when both separator and namespace are present.

299-339: LGTM!

The error handling is now properly structured with Result return type, and make_name is consistently used for namespace-aware naming.

381-388: LGTM!

The change from OptionalArg<T> to Option<T> is appropriate here since these are parsed arguments with #[pyarg(any, optional)]. The intern type change to PyObjectRef is also correct since it accepts dict objects.

coderabbitai · 2025-12-29T15:27:08Z

crates/stdlib/src/pyexpat.rs

    fn parser_create(
-        _args: ParserCreateArgs,
+        args: ParserCreateArgs,
        vm: &VirtualMachine,
    ) -> PyResult<PyExpatLikeXmlParserRef> {
-        PyExpatLikeXmlParser::new(vm)
+        // Validate namespace_separator: must be at most one character
+        let ns_sep = match args.namespace_separator {
+            Some(ref s) => {
+                let chars: Vec<char> = s.as_str().chars().collect();
+                if chars.len() > 1 {
+                    return Err(vm.new_value_error(
+                        "namespace_separator must be at most one character, omitted, or None"
+                            .to_owned(),
+                    ));
+                }
+                Some(s.as_str().to_owned())
+            }
+            None => None,
+        };
+
+        // encoding parameter is currently not used (xml-rs handles encoding from XML declaration)
+        let _ = args.encoding;
+
+        PyExpatLikeXmlParser::new(ns_sep, args.intern, vm)
    }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for encoding parameter usage in tests and documentation rg -n "encoding" crates/stdlib/src/pyexpat.rs -B2 -A2 # Search for ParserCreate tests rg -n "ParserCreate|parser_create" crates/stdlib/src/pyexpat.rs -B1 -A1 # Check if there are any test files related to pyexpat fd -e py -e rs "test" | grep -i "expat\|parser"

Repository: RustPython/RustPython

Length of output: 736

🏁 Script executed:

# Find the PyExpatLikeXmlParser implementation rg -n "impl.*PyExpatLikeXmlParser|fn new" crates/stdlib/src/pyexpat.rs -B2 -A5 # Search for test files related to expat or xml parsing find . -type f -name "*.py" | xargs rg -l "pyexpat|xml.parsers.expat" 2>/dev/null | head -10 # Check the struct definition and new method rg -n "struct PyExpatLikeXmlParser|pub fn new" crates/stdlib/src/pyexpat.rs -A10

Repository: RustPython/RustPython

Length of output: 878

🏁 Script executed:

# Search for pyexpat test files find . -type f -name "*.py" -path "*/test*" | xargs rg -l "expat|ParserCreate" 2>/dev/null | head -5 # Also search for any examples or tests that mention encoding with parsers rg -n "ParserCreate.*encoding|encoding.*parser" --type=py 2>/dev/null # Look in the test_pyexpat or similar if it exists find . -type f -name "*expat*" | grep -E "\.(py|rs)$"

Repository: RustPython/RustPython

Length of output: 95

🏁 Script executed:

# Search for tests within pyexpat.rs rg -n "#\[test\]|#\[cfg.*test" crates/stdlib/src/pyexpat.rs -A10 # Search for any inline examples or documentation rg -n "///" crates/stdlib/src/pyexpat.rs -B1 -A1 # Check the full file length to understand the scope wc -l crates/stdlib/src/pyexpat.rs # Search for any XML declaration or encoding-related processing rg -n "xml-rs|encoding|XML|declaration" crates/stdlib/src/pyexpat.rs -C2

Repository: RustPython/RustPython

Length of output: 1422

🏁 Script executed:

# Search for cargo dependencies to understand xml-rs version rg -n "xml-rs|xml" Cargo.toml 2>/dev/null | head -20 # Look for any Cargo.toml in stdlib crate cat crates/stdlib/Cargo.toml | grep -A5 -B5 "xml" # Search for any comments about Python API compatibility rg -n "python|API|compat" crates/stdlib/src/pyexpat.rs -i | head -15

Repository: RustPython/RustPython

Length of output: 151

🏁 Script executed:

# Look at the stdlib Cargo.toml directly head -100 crates/stdlib/Cargo.toml | grep -E "xml|dependencies" -A10 # Also check if there's information about xml-rs online or in comments rg -n "xml-rs" crates/stdlib/src/pyexpat.rs

Repository: RustPython/RustPython

Length of output: 1174

🌐 Web query:

Python xml.parsers.expat ParserCreate encoding parameter behavior documentation

💡 Result:

Short answer

xml.parsers.expat.ParserCreate(encoding=None, namespace_separator=None) accepts an optional encoding string that, if given, overrides any encoding declared or detected in the XML document. [1]

The encoding string must name an encoding supported by Expat; Expat only has a small built‑in set (US‑ASCII, UTF‑8, UTF‑16, ISO‑8859‑1). Any other name will cause Expat to invoke its UnknownEncodingHandler. [1][2]

Sources

Python docs: xml.parsers.expat.ParserCreate description. [1]

Expat reference: XML_ParserCreate behavior and built‑in encodings. [2]

🏁 Script executed:

# Search for TODO, FIXME, or other markers about encoding limitations rg -n "TODO|FIXME|HACK|NOTE|BUG|encoding" crates/stdlib/src/pyexpat.rs | head -20 # Check if there are any comments about limitations rg -n "limitation|not supported|unsupported|not implemented|doesn't support" crates/stdlib/src/pyexpat.rs -i

Repository: RustPython/RustPython

Length of output: 242

Ignoring the encoding parameter breaks Python API compatibility.

The encoding parameter in ParserCreate is part of Python's documented xml.parsers.expat API and allows overriding the encoding detected in the XML document. RustPython silently ignores this parameter because xml-rs only auto-detects from the XML declaration. This breaks code that explicitly specifies encoding before parsing. The parameter should either be supported or raise an error if provided with a non-default value, rather than silently discarding it.

🤖 Prompt for AI Agents

In crates/stdlib/src/pyexpat.rs around lines 391 to 414, the function currently ignores the ParserCreateArgs.encoding field which breaks Python API compatibility; change the function to check args.encoding and if Some(value) return an appropriate Python exception (e.g., vm.new_value_error) explaining that specifying encoding is not supported by this backend instead of silently discarding it, or alternatively implement support by passing the provided encoding into the underlying parser initialization; update the code path so that when encoding is present you either raise the error with a clear message or wire the encoding into PyExpatLikeXmlParser::new (and adjust that constructor to accept and honor the encoding).

youknowone marked this pull request as ready for review December 29, 2025 14:56

coderabbitai bot reviewed Dec 29, 2025

View reviewed changes

youknowone force-pushed the pyexpat branch from b7b81a4 to 6d84c60 Compare December 29, 2025 15:20

coderabbitai bot reviewed Dec 29, 2025

View reviewed changes

Fix pyexpat parse

5e7f7ef

youknowone force-pushed the pyexpat branch from 6d84c60 to 5e7f7ef Compare December 29, 2025 15:36

youknowone merged commit f91ffe3 into RustPython:main Dec 29, 2025
1 of 2 checks passed

youknowone deleted the pyexpat branch December 29, 2025 15:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix pyexpat #6582

fix pyexpat #6582

Uh oh!

youknowone commented Dec 29, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 29, 2025 •

edited

Loading

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 29, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

fix pyexpat #6582

fix pyexpat #6582

Uh oh!

Conversation

youknowone commented Dec 29, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

youknowone commented Dec 29, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 29, 2025 •

edited

Loading