Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 29, 2025

Summary by CodeRabbit

  • New Features

    • Parser now accepts both text and binary XML input.
    • Optional namespace-separator support for namespaced element/attribute names.
    • Interning behavior now respects a provided intern mapping for names.
  • Bug Fixes / Behavior Changes

    • Empty input is treated as a successful parse.
    • Parse and file-parse operations return consistent integer status.
    • Namespace-separator input is validated to a single character; explicit encoding argument is ignored.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 29, 2025

Warning

Rate limit exceeded

@youknowone has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 5 minutes and 48 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 6d84c60 and 5e7f7ef.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_pyexpat.py is excluded by !Lib/**
  • Lib/test/test_xml_dom_xmlbuilder.py is excluded by !Lib/**
  • Lib/test/test_xmlrpc.py is excluded by !Lib/**
  • Lib/xmlrpc/client.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/stdlib/src/pyexpat.rs
📝 Walkthrough

Walkthrough

Parse now accepts either string or bytes and converts to bytes before parsing. A new optional namespace_separator is stored and used by a make_name helper to produce names; intern is initialized from the provided argument. do_parse now returns Results, Parse/ParseFile surface integer status, and ParserCreate validates namespace_separator and forwards intern/namespace_separator to the parser constructor.

Changes

Cohort / File(s) Summary
pyexpat core changes
crates/stdlib/src/pyexpat.rs
Large functional update: parse accepts Either<PyStrRef, PyBytesRef> and returns PyResult<i32>; do_parse now returns Result<_, xml::reader::Error>; ParseFile returns integer status and guards empty input.
Parser configuration & constructors
crates/stdlib/src/pyexpat.rs
ParserCreateArgs fields changed from OptionalArg to Option/Option<PyObjectRef>; PyExpatLikeXmlParser::new signature updated to accept namespace_separator: Option<String> and intern: Option<PyObjectRef>; parser_create validates namespace_separator length.
Name handling & intern behavior
crates/stdlib/src/pyexpat.rs
Added namespace_separator: Option<String> field, make_name(&self, name: &xml::name::OwnedName) -> String, and logic to use namespace+separator for element/attribute names; intern is initialized from the provided argument and used for name internment.
Imports & API surface
crates/stdlib/src/pyexpat.rs
Added imports: Either and PyBytesRef; adjusted public signatures and exported types to match new behavior.
Error handling / compatibility
crates/stdlib/src/pyexpat.rs
Parsing flow changed to propagate xml reader errors from do_parse; Parse/ParseFile now map certain errors to the legacy integer success value (1) for compatibility; encoding parameter is ignored.

Sequence Diagram(s)

(omitted)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰
I hopped through names, a tiny code sprite,
Joined bytes and strings in soft moonlight.
A separator stitched namespaces tight,
Interns snug in a dictionary bed,
Parsers purr as XML dreams are read.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'fix pyexpat' is vague and generic, using non-descriptive language that doesn't convey what specific issue or functionality is being addressed in this substantial refactoring. Replace with a more specific title that describes the main change, such as 'Add namespace support and bytes handling to pyexpat parser' or 'Extend pyexpat parser with namespace separator and bytes data support'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review December 29, 2025 14:56
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/stdlib/src/pyexpat.rs (1)

362-368: ParserCreate arguments are currently unused.

The encoding, namespace_separator, and intern parameters defined in ParserCreateArgs are not being used in the parser_create function. This means XML encoding declarations and namespace handling are not currently supported. While this appears to be existing technical debt (indicated by the _args prefix), users may expect these parameters to have an effect based on CPython's pyexpat API.

Would you like me to open an issue to track implementing support for these ParserCreate parameters?

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8059960 and b7b81a4.

⛔ Files ignored due to path filters (1)
  • Lib/xmlrpc/client.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/stdlib/src/pyexpat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/pyexpat.rs
🧠 Learnings (1)
📚 Learning: 2025-12-27T14:03:49.011Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.011Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/stdlib/src/pyexpat.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
🔇 Additional comments (2)
crates/stdlib/src/pyexpat.rs (2)

52-52: LGTM: Import additions support the new functionality.

The addition of PyBytesRef and Either imports correctly enables the Parse method to accept both string and bytes inputs.

Also applies to: 54-54


354-360: Option with #[pyarg(any, optional)] is a valid pattern in RustPython.

The codebase uses both OptionalArg<T> and Option<T> with #[pyarg(any, optional)]. While OptionalArg is more commonly used, Option is also a supported and legitimate pattern, as evidenced by its use in multiple stdlib modules (codecs.rs, sys.rs, lzma.rs, openssl.rs) and elsewhere. The change is consistent with established patterns in the codebase.

Comment on lines 321 to 365
fn parse(
&self,
data: Either<PyStrRef, PyBytesRef>,
_isfinal: OptionalArg<bool>,
vm: &VirtualMachine,
) {
let bytes = match data {
Either::A(s) => s.as_bytes().to_vec(),
Either::B(b) => b.as_bytes().to_vec(),
};
let reader = Cursor::<Vec<u8>>::new(bytes);
let parser = self.create_config().create_reader(reader);
self.do_parse(vm, parser);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's examine the actual file and the Parse method
cd /tmp && find . -name "pyexpat.rs" -path "*/stdlib/src/*" 2>/dev/null | head -1

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

#!/bin/bash
# List files in the stdlib crates directory
find . -path "*/stdlib/src*" -name "*.rs" | head -20

Repository: RustPython/RustPython

Length of output: 665


🏁 Script executed:

#!/bin/bash
# Examine the pyexpat.rs file around the specified lines
cat -n crates/stdlib/src/pyexpat.rs | sed -n '310,350p'

Repository: RustPython/RustPython

Length of output: 1799


🏁 Script executed:

#!/bin/bash
# Search for Parse method definition and all its usages
rg -n "fn parse|\.parse\(" crates/stdlib/src/pyexpat.rs -A 2

Repository: RustPython/RustPython

Length of output: 468


🏁 Script executed:

#!/bin/bash
# Check if Parse is a public method and look for any breaking change documentation
rg -n "pub.*parse|parse.*Either|PyBytesRef" crates/stdlib/src/pyexpat.rs -B 2 -A 5

Repository: RustPython/RustPython

Length of output: 719


🏁 Script executed:

#!/bin/bash
# Check the PyBytesRef definition to verify as_bytes() method exists
rg -n "impl.*PyBytesRef|fn as_bytes" --type=rust -B 2 -A 5 | head -50

Repository: RustPython/RustPython

Length of output: 2637


🏁 Script executed:

#!/bin/bash
# Look for the definition of PyBytesRef and its methods
fd -e rs -x rg "pub.*as_bytes|PyBytesRef" {} + | grep -i "as_bytes" | head -20

Repository: RustPython/RustPython

Length of output: 15647


🏁 Script executed:

#!/bin/bash
# Check builtins module where PyBytesRef is likely defined
find . -path "*/builtins*" -name "*.rs" | xargs rg -l "PyBytesRef" | head -5

Repository: RustPython/RustPython

Length of output: 230


🏁 Script executed:

#!/bin/bash
# Look for PyBytesRef and its as_bytes method
rg "impl.*PyBytes|fn as_bytes" -A 3 | head -40

Repository: RustPython/RustPython

Length of output: 1980


🏁 Script executed:

#!/bin/bash
# Check if there are tests for the parse method
rg -n "test.*parse|Parse" crates/stdlib/src/pyexpat.rs

Repository: RustPython/RustPython

Length of output: 988


🏁 Script executed:

#!/bin/bash
# Check for any Python side behavior of Parse in tests
fd -e py -path "*/test*" -exec rg -l "Parse\|pyexpat" {} \; | head -5

Repository: RustPython/RustPython

Length of output: 233


🏁 Script executed:

#!/bin/bash
# Verify PyBytesRef specifically has as_bytes() method
rg -n "impl.*PyBytes|pub struct PyBytes" crates/vm/src/builtins/bytes.rs -A 10 | head -40

Repository: RustPython/RustPython

Length of output: 921


🏁 Script executed:

#!/bin/bash
# Check the PyBytesRef type definition to ensure it wraps PyBytesInner
rg -n "type PyBytesRef|pub.*PyBytesRef" crates/vm/src/builtins/bytes.rs -B 2 -A 2

Repository: RustPython/RustPython

Length of output: 154


🏁 Script executed:

#!/bin/bash
# Check PyBytes as_bytes() method implementation
rg -n "impl.*PyBytes|fn as_bytes" crates/vm/src/builtins/bytes.rs -A 3 | head -50

Repository: RustPython/RustPython

Length of output: 1518


🏁 Script executed:

#!/bin/bash
# Find the actual as_bytes() method definition in PyBytes
rg -n "pub.*fn as_bytes|fn as_bytes" crates/vm/src/builtins/bytes.rs -B 2 -A 3

Repository: RustPython/RustPython

Length of output: 178


Implementation is correct, but the signature change is not actually breaking.

Both PyStrRef and PyBytesRef have as_bytes() methods, and the conversion to Vec<u8> properly handles both string (UTF-8) and bytes inputs. However, the signature change from accepting PyStrRef to Either<PyStrRef, PyBytesRef> is backward compatible—existing code passing strings continues to work. This is a feature expansion, not a breaking change.

Note: The method has no error handling; do_parse() is called without checking its result.

🤖 Prompt for AI Agents
In crates/stdlib/src/pyexpat.rs around lines 321 to 334, the parse method
currently calls self.do_parse(vm, parser) and ignores its return value or any
errors; update the method to capture the result of do_parse, handle possible
errors (map them to appropriate Python exceptions via the VirtualMachine or
return a Result), and propagate or log failures instead of dropping them; ensure
the function signature and callers are adjusted as needed so parsing failures
produce a clear error (e.g., convert Err into a VM exception or return a
PyResult) and preserve existing behavior for successful parses.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/stdlib/src/pyexpat.rs (1)

114-152: Validate the intern parameter type.

The intern parameter is accepted as PyObjectRef without type validation. If a non-dict object is passed, it could cause runtime errors when dict operations are attempted later. Consider validating that intern is a dict type when Some is provided.

🔎 Proposed validation
 fn new(
     namespace_separator: Option<String>,
     intern: Option<PyObjectRef>,
     vm: &VirtualMachine,
 ) -> PyResult<PyExpatLikeXmlParserRef> {
-    let intern_dict = intern.unwrap_or_else(|| vm.ctx.new_dict().into());
+    let intern_dict = match intern {
+        Some(obj) => {
+            // Verify it's a dict
+            if !obj.is(&vm.ctx.types.dict_type) {
+                return Err(vm.new_type_error("intern must be a dictionary".to_owned()));
+            }
+            obj
+        }
+        None => vm.ctx.new_dict().into(),
+    };
     Ok(Self {
♻️ Duplicate comments (2)
crates/stdlib/src/pyexpat.rs (2)

342-362: Silently ignoring all parsing errors may hide legitimate issues.

While the comment indicates this is for compatibility with existing Python code, silently ignoring all errors from do_parse (line 360) means legitimate parsing errors are never surfaced to the caller. Consider at least logging errors or handling specific error types that should be propagated.

This concern was previously raised in past review comments.


365-377: Same error handling concern as in Parse.

Like the parse method, this silently ignores all parsing errors (line 375). The same concerns about hiding legitimate issues apply here.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b7b81a4 and 6d84c60.

⛔ Files ignored due to path filters (4)
  • Lib/test/test_pyexpat.py is excluded by !Lib/**
  • Lib/test/test_xml_dom_xmlbuilder.py is excluded by !Lib/**
  • Lib/test/test_xmlrpc.py is excluded by !Lib/**
  • Lib/xmlrpc/client.py is excluded by !Lib/**
📒 Files selected for processing (1)
  • crates/stdlib/src/pyexpat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/pyexpat.rs
🧠 Learnings (4)
📚 Learning: 2025-12-24T23:47:38.444Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 0
File: :0-0
Timestamp: 2025-12-24T23:47:38.444Z
Learning: CPython's xml.parsers.expat xmlparser exposes a namespace_prefixes boolean attribute (implemented in the pyexpat C module) that controls namespace prefix reporting via XML_SetReturnNSTriplet. This attribute is used internally by xml.dom.expatbuilder and xml.etree.ElementTree.

Applied to files:

  • crates/stdlib/src/pyexpat.rs
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/pyexpat.rs
📚 Learning: 2025-12-27T14:03:49.011Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-12-27T14:03:49.011Z
Learning: Applies to **/*.rs : Use the macro system (`pyclass`, `pymodule`, `pyfunction`, etc.) when implementing Python functionality in Rust

Applied to files:

  • crates/stdlib/src/pyexpat.rs
📚 Learning: 2025-12-24T23:47:38.444Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 0
File: :0-0
Timestamp: 2025-12-24T23:47:38.444Z
Learning: CPython's xml.parsers.expat xmlparser does not expose an intern attribute. String interning for XML parsing in CPython is handled through SAX's expatreader feature_string_interning mechanism, not as a direct xmlparser.intern attribute.

Applied to files:

  • crates/stdlib/src/pyexpat.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run tests under miri
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check Rust code with clippy
🔇 Additional comments (5)
crates/stdlib/src/pyexpat.rs (5)

52-54: LGTM!

The new imports are necessary for accepting both string and bytes input in the parse method.


69-70: LGTM!

The namespace_separator field is properly added with the correct #[pytraverse(skip)] annotation for non-Python object data.


291-297: LGTM!

The make_name helper correctly constructs element names with namespace prefixes when both separator and namespace are present.


299-339: LGTM!

The error handling is now properly structured with Result return type, and make_name is consistently used for namespace-aware naming.


381-388: LGTM!

The change from OptionalArg<T> to Option<T> is appropriate here since these are parsed arguments with #[pyarg(any, optional)]. The intern type change to PyObjectRef is also correct since it accepts dict objects.

Comment on lines 391 to 417
fn parser_create(
_args: ParserCreateArgs,
args: ParserCreateArgs,
vm: &VirtualMachine,
) -> PyResult<PyExpatLikeXmlParserRef> {
PyExpatLikeXmlParser::new(vm)
// Validate namespace_separator: must be at most one character
let ns_sep = match args.namespace_separator {
Some(ref s) => {
let chars: Vec<char> = s.as_str().chars().collect();
if chars.len() > 1 {
return Err(vm.new_value_error(
"namespace_separator must be at most one character, omitted, or None"
.to_owned(),
));
}
Some(s.as_str().to_owned())
}
None => None,
};

// encoding parameter is currently not used (xml-rs handles encoding from XML declaration)
let _ = args.encoding;

PyExpatLikeXmlParser::new(ns_sep, args.intern, vm)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Search for encoding parameter usage in tests and documentation
rg -n "encoding" crates/stdlib/src/pyexpat.rs -B2 -A2

# Search for ParserCreate tests
rg -n "ParserCreate|parser_create" crates/stdlib/src/pyexpat.rs -B1 -A1

# Check if there are any test files related to pyexpat
fd -e py -e rs "test" | grep -i "expat\|parser"

Repository: RustPython/RustPython

Length of output: 736


🏁 Script executed:

# Find the PyExpatLikeXmlParser implementation
rg -n "impl.*PyExpatLikeXmlParser|fn new" crates/stdlib/src/pyexpat.rs -B2 -A5

# Search for test files related to expat or xml parsing
find . -type f -name "*.py" | xargs rg -l "pyexpat|xml.parsers.expat" 2>/dev/null | head -10

# Check the struct definition and new method
rg -n "struct PyExpatLikeXmlParser|pub fn new" crates/stdlib/src/pyexpat.rs -A10

Repository: RustPython/RustPython

Length of output: 878


🏁 Script executed:

# Search for pyexpat test files
find . -type f -name "*.py" -path "*/test*" | xargs rg -l "expat|ParserCreate" 2>/dev/null | head -5

# Also search for any examples or tests that mention encoding with parsers
rg -n "ParserCreate.*encoding|encoding.*parser" --type=py 2>/dev/null

# Look in the test_pyexpat or similar if it exists
find . -type f -name "*expat*" | grep -E "\.(py|rs)$"

Repository: RustPython/RustPython

Length of output: 95


🏁 Script executed:

# Search for tests within pyexpat.rs
rg -n "#\[test\]|#\[cfg.*test" crates/stdlib/src/pyexpat.rs -A10

# Search for any inline examples or documentation
rg -n "///" crates/stdlib/src/pyexpat.rs -B1 -A1

# Check the full file length to understand the scope
wc -l crates/stdlib/src/pyexpat.rs

# Search for any XML declaration or encoding-related processing
rg -n "xml-rs|encoding|XML|declaration" crates/stdlib/src/pyexpat.rs -C2

Repository: RustPython/RustPython

Length of output: 1422


🏁 Script executed:

# Search for cargo dependencies to understand xml-rs version
rg -n "xml-rs|xml" Cargo.toml 2>/dev/null | head -20

# Look for any Cargo.toml in stdlib crate
cat crates/stdlib/Cargo.toml | grep -A5 -B5 "xml"

# Search for any comments about Python API compatibility
rg -n "python|API|compat" crates/stdlib/src/pyexpat.rs -i | head -15

Repository: RustPython/RustPython

Length of output: 151


🏁 Script executed:

# Look at the stdlib Cargo.toml directly
head -100 crates/stdlib/Cargo.toml | grep -E "xml|dependencies" -A10

# Also check if there's information about xml-rs online or in comments
rg -n "xml-rs" crates/stdlib/src/pyexpat.rs

Repository: RustPython/RustPython

Length of output: 1174


🌐 Web query:

Python xml.parsers.expat ParserCreate encoding parameter behavior documentation

💡 Result:

Short answer

  • xml.parsers.expat.ParserCreate(encoding=None, namespace_separator=None) accepts an optional encoding string that, if given, overrides any encoding declared or detected in the XML document. [1]
  • The encoding string must name an encoding supported by Expat; Expat only has a small built‑in set (US‑ASCII, UTF‑8, UTF‑16, ISO‑8859‑1). Any other name will cause Expat to invoke its UnknownEncodingHandler. [1][2]

Sources

  • Python docs: xml.parsers.expat.ParserCreate description. [1]
  • Expat reference: XML_ParserCreate behavior and built‑in encodings. [2]

🏁 Script executed:

# Search for TODO, FIXME, or other markers about encoding limitations
rg -n "TODO|FIXME|HACK|NOTE|BUG|encoding" crates/stdlib/src/pyexpat.rs | head -20

# Check if there are any comments about limitations
rg -n "limitation|not supported|unsupported|not implemented|doesn't support" crates/stdlib/src/pyexpat.rs -i

Repository: RustPython/RustPython

Length of output: 242


Ignoring the encoding parameter breaks Python API compatibility.

The encoding parameter in ParserCreate is part of Python's documented xml.parsers.expat API and allows overriding the encoding detected in the XML document. RustPython silently ignores this parameter because xml-rs only auto-detects from the XML declaration. This breaks code that explicitly specifies encoding before parsing. The parameter should either be supported or raise an error if provided with a non-default value, rather than silently discarding it.

🤖 Prompt for AI Agents
In crates/stdlib/src/pyexpat.rs around lines 391 to 414, the function currently
ignores the ParserCreateArgs.encoding field which breaks Python API
compatibility; change the function to check args.encoding and if Some(value)
return an appropriate Python exception (e.g., vm.new_value_error) explaining
that specifying encoding is not supported by this backend instead of silently
discarding it, or alternatively implement support by passing the provided
encoding into the underlying parser initialization; update the code path so that
when encoding is present you either raise the error with a clear message or wire
the encoding into PyExpatLikeXmlParser::new (and adjust that constructor to
accept and honor the encoding).

@youknowone youknowone merged commit f91ffe3 into RustPython:main Dec 29, 2025
1 of 2 checks passed
@youknowone youknowone deleted the pyexpat branch December 29, 2025 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant