
Conversation

@moreal
Contributor

@moreal moreal commented Jan 11, 2026

First, to minimize crossings between Python and Rust, I made the parser run in Rust whenever possible. This improved performance from 8.77s on the main branch (per my local benchmark) to 7.37s. While still slow compared to CPython's 0.2s, it's about an 18% improvement. I measured as follows, running it about 3 times and taking the highest value.

cargo build --release
time ./target/release/rustpython json_loads.py

Additionally, since JSON objects often repeat the same keys, I also implemented CPython's trick of reusing key objects. When measured separately, this showed no execution-time benefit; it is only expected to help memory efficiency.
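The idea, as a minimal standalone sketch (Rc<str> stands in here for the PyStrRef the PR actually caches in a HashMap<String, PyStrRef>; names are illustrative, not the PR's code):

use std::collections::HashMap;
use std::rc::Rc;

/// Toy key memoizer: repeated JSON object keys reuse one shared
/// allocation instead of allocating a fresh string per occurrence.
struct KeyMemo {
    cache: HashMap<String, Rc<str>>,
}

impl KeyMemo {
    fn intern(&mut self, key: &str) -> Rc<str> {
        if let Some(existing) = self.cache.get(key) {
            return Rc::clone(existing); // cache hit: no new allocation
        }
        let shared: Rc<str> = Rc::from(key);
        self.cache.insert(key.to_owned(), Rc::clone(&shared));
        shared
    }
}

fn main() {
    let mut memo = KeyMemo { cache: HashMap::new() };
    let a = memo.intern("id");
    let b = memo.intern("id");
    assert!(Rc::ptr_eq(&a, &b)); // both point at the same buffer
}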


While profiling with several flamegraph captures, I found that scanstring took the longest. The cause was that operations like chars().nth() are O(N) and painfully slow. I removed this bottleneck by tracking byte indices alongside character indices.
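To illustrate the cost (a toy sketch, not the PR's code): Rust strings are UTF-8 bytes, so chars().nth(i) must rescan from the start every time, while a byte index carried alongside allows direct O(1) slicing afterwards.

fn main() {
    let s = "héllo wörld"; // multi-byte chars, so byte index != char index

    // O(N): chars().nth() rescans from the start on every lookup.
    let slow = s.chars().nth(7).unwrap();

    // Tracking both indices in one forward pass avoids the rescans;
    // afterwards the byte index permits direct slicing.
    let mut char_idx = 0;
    let mut byte_idx = 0;
    for ch in s.chars() {
        if char_idx == 7 {
            break;
        }
        char_idx += 1;
        byte_idx += ch.len_utf8();
    }
    let fast = s[byte_idx..].chars().next().unwrap();
    assert_eq!(slow, fast); // both are 'ö'
}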

It now takes around 0.4 to 0.5 seconds versus CPython's 0.2 seconds: a huge improvement, over 16 times faster than before!

As for the original motivation, pip install: installing the black package used to take 5 seconds and now takes 1.61 seconds. Even in debug builds, what used to take 171 seconds now finishes in under 20 seconds.

Summary by CodeRabbit

  • Bug Fixes

    • Stronger detection and clearer errors for malformed JSON, invalid escapes, out-of-range positions, and trailing-comma cases.
  • Improvements

    • More accurate, character-aware error messages for strings and Unicode escapes (including surrogate handling).
    • Reduced allocations for object keys and overall parsing robustness and performance.
  • New Features

    • Accepts an escaped "/" in strings and improves fallback parsing behavior.


@coderabbitai
Contributor

coderabbitai bot commented Jan 11, 2026

📝 Walkthrough

Walkthrough

Refactored JSON parsing to use both Python character indices and Rust byte indices, moved string/object/array parsing into Rust, added byte-aware utilities and memoization, and exposed public scanstring and DecodeError::new with enhanced error positions.

Changes

Cohort / File(s) Summary
Core JSON parsing restructure
crates/stdlib/src/json.rs
Changed JsonScanner::parse and JsonScanner::call signatures to accept PyStrRef/char_idx/byte_idx. Added byte-aware whitespace utilities, dual-index dispatch, Rust implementations for parse_object, parse_array, call_scan_once, finalize_object, and make_decode_error. Added object key memoization and updated error/reporting to char/byte positions.
String scanning and decode errors
crates/stdlib/src/json/machinery.rs
Added public scanstring(s: &Wtf8, char_offset: usize, strict: bool) -> Result<(Wtf8Buf, usize, usize), DecodeError> returning (decoded, chars_consumed, bytes_consumed). Introduced memchr2 fast-path, revised escape/surrogate handling and slow-path assembly, improved positional error contexts, and made DecodeError::new public.
Imports & small utilities
crates/stdlib/src/json/machinery.rs (imports)
Added memchr2 import and small signature/visibility adjustments to support new fast-path scanning and public APIs.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant JsonScanner
    participant Parser
    participant ScanOnce
    participant ScanString
    participant ObjParser
    participant ArrParser
    participant Finalizer

    Caller->>JsonScanner: call(pystr, char_idx)
    JsonScanner->>Parser: parse(pystr, char_idx, byte_idx, scan_once)
    Parser->>ScanOnce: call_scan_once(byte_idx, char_idx)
    alt string
        ScanOnce->>ScanString: scanstring(s_bytes, char_offset, strict)
        ScanString-->>ScanOnce: (decoded, chars, bytes)
    else object
        ScanOnce->>ObjParser: parse_object(s_bytes, byte_idx)
        ObjParser->>ScanString: scanstring(key_bytes, char_offset, strict)
        ScanString-->>ObjParser: (key, chars, bytes)
        ObjParser->>ScanOnce: parse value (recursive)
        ObjParser->>Finalizer: finalize_object(pairs, hooks)
        Finalizer-->>ObjParser: Python object
    else array
        ScanOnce->>ArrParser: parse_array(s_bytes, byte_idx)
        ArrParser->>ScanOnce: parse element (recursive)
    else literal/number
        ScanOnce-->>Parser: parsed literal/number
    end
    ScanOnce-->>Parser: (value, end_positions)
    Parser-->>JsonScanner: PyIterReturn
    JsonScanner-->>Caller: result

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐇 I hop on bytes, I count each char,
I chase escaped slashes near and far,
I stitch surrogates, memo keys tight,
I parse through day and into night,
JSON nibbled neat — what a delight 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly summarizes the main objective: improving json.loads performance. The PR demonstrates significant optimization through Rust-based refactoring and achieves measurable performance gains.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.



🧹 Recent nitpick comments
crates/stdlib/src/json.rs (1)

36-40: Consider using the standard library method.

starts_with_bytes reimplements slice::starts_with. The standard library method is equivalent and may be clearer:

bytes[byte_idx..].starts_with(b"null")

This is a minor suggestion; the current implementation is correct.


📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 073a24a and 088fd83.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_json/test_decode.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • crates/stdlib/src/json.rs
  • crates/stdlib/src/json/machinery.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs:
  • Follow the default rustfmt code style, using cargo fmt to format Rust code
  • Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
  • Follow Rust best practices for error handling and memory management
  • Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/json/machinery.rs
  • crates/stdlib/src/json.rs
🧬 Code graph analysis (2)
crates/stdlib/src/json/machinery.rs (1)
crates/stdlib/src/json.rs (1)
  • scanstring (702-732)
crates/stdlib/src/json.rs (1)
crates/stdlib/src/json/machinery.rs (3)
  • scanstring (140-252)
  • new (112-115)
  • len (123-128)
🔇 Additional comments (10)
crates/stdlib/src/json/machinery.rs (3)

150-168: Fast path condition may miss edge cases with backslash.

The fast path activates when memchr2 finds " before any \. However, when bytes[pos] == b'"', it's possible the string contains no backslashes at all OR the quote was found first. The condition is correct, but note that a string like "hello\nworld" will correctly fall through to the slow path since \ would be found at a lower index than the closing ".

The control character check content_bytes.iter().any(|&b| b < 0x20) correctly validates strict mode. However, there's a subtle consideration: this check examines raw bytes, which is correct since control characters 0x00-0x1F are single-byte and won't appear as continuation bytes in valid UTF-8/WTF-8.
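For reference, the fast-path shape under review looks roughly like this (a simplified sketch over plain &[u8]; the real scanstring works on Wtf8 and also returns consumed counts):

use memchr::memchr2;

// Simplified fast path: `s` starts just past the opening quote. If the
// first '"' or '\' found is the closing quote, the content contains no
// escapes and can be returned as a single slice.
fn scan_fast(s: &[u8], strict: bool) -> Option<&[u8]> {
    let pos = memchr2(b'"', b'\\', s)?;
    if s[pos] != b'"' {
        return None; // backslash seen first: fall through to the slow path
    }
    let content = &s[..pos];
    // Strict mode rejects raw control characters; 0x00-0x1F are single
    // bytes and never appear as UTF-8/WTF-8 continuation bytes.
    if strict && content.iter().any(|&b| b < 0x20) {
        return None; // the real code reports an error here; the sketch bails
    }
    Some(content)
}

fn main() {
    assert_eq!(scan_fast(br#"hello""#, true), Some(&b"hello"[..]));
    assert_eq!(scan_fast(br#"a\nb""#, true), None); // escape: slow path
}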


254-267: LGTM!

The decode_unicode function correctly decodes 4 hex digits into a CodePoint. The bit-shifting logic is correct, and error handling properly uses the provided position.


130-144: Good documentation of the function contract.

The docstring clearly explains the parameters, return values, and the semantics of chars_consumed and bytes_consumed (both include the closing quote). This is important for callers to correctly compute end positions.

crates/stdlib/src/json.rs (7)

283-292: Key memoization approach is correct.

The memoization converts Wtf8Buf to String for HashMap keys. Since JSON requires keys to be valid UTF-8 strings (JSON is a subset of Unicode), this conversion is safe and won't lose data. The memoization effectively reduces allocations for duplicate keys, which is particularly beneficial for large JSON arrays of objects with repeated key names.


388-469: LGTM!

The parse_array implementation correctly handles empty arrays, trailing comma detection, and maintains consistent char/byte index tracking. The structure mirrors parse_object well.


600-608: Fallback byte index calculation is O(n) but acceptable.

The comment at line 601-602 correctly notes this is expensive but should be rare. The fallback only triggers for unrecognized input patterns that aren't handled by the Rust fast paths.

One edge case: if end_char_idx returned by the Python scan_once is larger than the string length, char_indices().nth() returns None, and unwrap_or(s.len()) is used. This seems intentional and safe.


641-653: Good design: O(n) conversion done once at entry.

The character-to-byte index conversion using char_indices().nth() is O(char_idx), but this is done once at the entry point. All subsequent operations use byte indices, avoiding the repeated O(n) lookups that the PR aimed to eliminate. This is the correct architectural approach.


701-732: LGTM!

The scanstring wrapper correctly bridges the Python character-index API with the Rust byte-index implementation. The conversion logic is consistent with WTF-8 semantics, and error handling for out-of-bounds indices is appropriate.


471-503: Hook handling follows CPython semantics.

The finalize_object method correctly implements the hook priority: object_pairs_hook takes precedence over object_hook when both are present. The pairs are correctly converted to a list of tuples for object_pairs_hook, and object_hook receives the constructed dict.
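The priority logic, sketched with a toy value type standing in for Python objects (function pointers stand in for the optional hooks; not the PR's actual signatures):

use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Object(HashMap<String, Value>),
}

// object_pairs_hook wins when both hooks are present; otherwise build a
// dict (last duplicate key wins) and optionally pass it to object_hook.
fn finalize_object(
    pairs: Vec<(String, Value)>,
    object_pairs_hook: Option<fn(Vec<(String, Value)>) -> Value>,
    object_hook: Option<fn(HashMap<String, Value>) -> Value>,
) -> Value {
    if let Some(hook) = object_pairs_hook {
        return hook(pairs);
    }
    let map: HashMap<String, Value> = pairs.into_iter().collect();
    match object_hook {
        Some(hook) => hook(map),
        None => Value::Object(map),
    }
}

fn main() {
    let pairs = vec![
        ("a".into(), Value::Str("1".into())),
        ("a".into(), Value::Str("2".into())), // duplicate key: "2" wins
    ];
    let v = finalize_object(pairs, None, None);
    let expected = HashMap::from([("a".into(), Value::Str("2".into()))]);
    assert_eq!(v, Value::Object(expected));
}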


114-123: String parsing position calculation is correct.

The calculation end_char_idx = char_idx + 1 + chars_consumed correctly accounts for:

  • char_idx: position of opening "
  • +1: skip opening quote
  • chars_consumed: content characters plus closing quote (included by scanstring)
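A concrete check of the arithmetic (values assumed for illustration):

fn main() {
    // Scanning the 4-character input `"ab"` from char_idx = 0:
    // scanstring sees `ab"` and reports chars_consumed = 3
    // (two content chars plus the closing quote).
    let char_idx = 0; // position of the opening quote
    let chars_consumed = 3;
    let end_char_idx = char_idx + 1 + chars_consumed;
    assert_eq!(end_char_idx, 4); // index just past the closing quote
}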




@github-actions
Contributor

github-actions bot commented Jan 11, 2026

Code has been automatically formatted

The code in this PR has been formatted using:

  • cargo fmt --all

Please pull the latest changes before pushing again:

git pull origin improve-json-perf

@moreal moreal force-pushed the improve-json-perf branch 3 times, most recently from e7bfda5 to 4772f32 on January 14, 2026 16:29
@moreal moreal changed the title WIP: Improve json.loads Improve json.loads Jan 14, 2026
@moreal moreal marked this pull request as ready for review January 14, 2026 17:55
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@crates/stdlib/src/json/machinery.rs`:
- Around line 212-227: The error position passed to decode_unicode for the
second surrogate is wrong; adjust the computed position so it points at the
start of the second "\uXXXX" escape (not inside the first). In the
surrogate-pair branch around uni.to_lead_surrogate(), change the position
argument passed to decode_unicode from char_offset + next_char_i + 1 to a value
that accounts for the first "\uXXXX" (e.g. char_offset + next_char_i + 7) or,
better, use the character index/byte_pos extracted from the tuple destructuring
(the byte_pos2/tuple element) to compute the exact offset and pass that into
decode_unicode so error locations refer to the second escape. Ensure chunk_start
and chars updates remain consistent with the new position calculation.
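To make the off-by-six concrete (indices assumed for illustration): in an escaped surrogate pair, each \uXXXX escape is 6 characters wide, so the second escape starts 6 characters after the first.

fn main() {
    // Input as it appears inside a JSON string: \ud83d\ude00
    let s = r"\ud83d\ude00";
    let first = s.find(r"\u").unwrap(); // 0
    let second = s[first + 1..].find(r"\u").unwrap() + first + 1; // 6
    // Each escape is 6 chars ("\u" plus 4 hex digits), hence the gap of 6
    // between the +1 and +7 offsets mentioned in the fix instruction above.
    assert_eq!(second - first, 6);
}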
🧹 Nitpick comments (1)
crates/stdlib/src/json.rs (1)

587-617: Consider validating non-negative return index in fallback path.

The fallback to Python scan_once correctly handles the interface, but if scan_once returns a negative index (line 600), the as usize cast at line 605 would produce a very large value, causing .nth() to return None and defaulting to s.len().

While negative indices from scan_once would indicate a bug in the Python code, an explicit check could provide a clearer error:

Optional defensive check
                             let end_char_idx: isize = tuple.as_slice()[1].try_to_value(vm)?;
+                            if end_char_idx < 0 {
+                                return Err(vm.new_value_error("scan_once returned negative index"));
+                            }
                             // For fallback, we need to calculate byte_idx from char_idx
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 13a875f and 7e51d21.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_json/test_decode.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • crates/stdlib/src/json.rs
  • crates/stdlib/src/json/machinery.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs:
  • Follow the default rustfmt code style, using cargo fmt to format Rust code
  • Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
  • Follow Rust best practices for error handling and memory management
  • Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/json/machinery.rs
  • crates/stdlib/src/json.rs
🧬 Code graph analysis (2)
crates/stdlib/src/json/machinery.rs (1)
crates/stdlib/src/json.rs (1)
  • scanstring (702-732)
crates/stdlib/src/json.rs (1)
crates/stdlib/src/json/machinery.rs (3)
  • scanstring (140-252)
  • new (112-115)
  • len (123-128)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (windows-2025)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-2025)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (macos-latest)
🔇 Additional comments (13)
crates/stdlib/src/json/machinery.rs (5)

33-33: LGTM on imports and public visibility change.

The memchr2 import enables efficient byte-scanning for the fast path, and making DecodeError::new public is necessary for error construction in the calling code.

Also applies to: 112-115


130-144: LGTM on function signature and documentation.

The documentation clearly explains the contract - input slice starts after the opening quote, and char_offset enables accurate error position reporting. The triple return value (result, chars_consumed, bytes_consumed) elegantly supports the dual-index tracking pattern used throughout the PR.


145-168: LGTM on fast path implementation.

The optimization strategy is sound: use memchr2 to quickly locate the first " or \, and if the first match is the closing quote, the entire string content can be processed without escape handling. The control character check correctly scans for bytes < 0x20, and the character count properly uses code_points().count() to handle multi-byte sequences.


197-241: LGTM on escape sequence handling.

All standard JSON escape sequences are correctly handled, including the optional / escape. The error reporting correctly uses the character-based position (char_offset + char_i) for accurate Python-level error messages.


254-267: LGTM on decode_unicode implementation.

The function correctly parses 4 hex digits using bit shifting and properly handles errors when characters are missing or non-hexadecimal. The conversion from u16 to CodePoint correctly handles both BMP characters and surrogates.

crates/stdlib/src/json.rs (8)

20-40: LGTM on helper functions.

The skip_whitespace function correctly handles all JSON whitespace characters, and the comment appropriately notes that ASCII whitespace characters are single-byte, making the byte count equal to the character count. The starts_with_bytes function is a clean utility for pattern matching.
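A sketch of that dual-index shape (assumed form; the real helper lives in crates/stdlib/src/json.rs):

// JSON whitespace is ASCII (space, tab, \n, \r), so every skipped
// character advances the byte index and the char index by exactly one.
fn skip_whitespace(bytes: &[u8], mut byte_idx: usize, mut char_idx: usize) -> (usize, usize) {
    while let Some(&b) = bytes.get(byte_idx) {
        if !matches!(b, b' ' | b'\t' | b'\n' | b'\r') {
            break;
        }
        byte_idx += 1;
        char_idx += 1;
    }
    (byte_idx, char_idx)
}

fn main() {
    assert_eq!(skip_whitespace(b" \t\n{", 0, 0), (3, 3));
}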


92-188: LGTM on updated parse method.

The method correctly maintains dual byte/char indices throughout. String parsing delegates to machinery::scanstring with proper slice offset, and the nested object/array parsing is cleanly delegated to dedicated methods. The constant handling correctly relies on ASCII lengths equaling character counts.


226-386: LGTM on parse_object implementation.

The method correctly handles all object parsing edge cases including empty objects, missing delimiters, and trailing commas. The key memoization via HashMap<String, PyStrRef> is an effective optimization for JSON with repeated keys. Error messages are clear and position-accurate.


388-469: LGTM on parse_array implementation.

The array parsing correctly mirrors the object parsing structure with proper handling for empty arrays, element delimiters, and trailing comma detection. The recursive value parsing via call_scan_once maintains consistent index tracking.


471-503: LGTM on finalize_object implementation.

The method correctly implements the CPython JSON module's hook priority: object_pairs_hook takes precedence when present, otherwise a dict is built and optionally passed to object_hook. The behavior for duplicate keys (last value wins) matches JSON specification.


619-629: LGTM on make_decode_error helper.

This provides a clean abstraction for creating decode errors within the JsonScanner methods, maintaining consistency with the existing py_decode_error function.


632-664: LGTM on Callable implementation.

The entry point correctly validates the index, computes the byte index from the character index (accepting the O(n) cost at the entry point), and properly handles the beyond-string-length case by returning StopIteration. This establishes the dual-index tracking that propagates through all nested parsing.


701-732: LGTM on scanstring function.

The function correctly bridges the Python API (character indices) with the Rust implementation (byte indices). The index conversion using code_point_indices().nth() handles multi-byte characters correctly, and the error handling for out-of-bounds indices is appropriate.


@moreal moreal changed the title Improve json.loads Improve json.loads performance Jan 14, 2026
moreal and others added 9 commits January 15, 2026 14:30
Parse JSON constants (null, true, false) directly in Rust within
call_scan_once() instead of falling back to Python scan_once.
This reduces Python-Rust boundary crossings for array/object values.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Parse JSON numbers starting with digits (0-9) directly in Rust within
call_scan_once() by reusing the existing parse_number() method.
This reduces Python-Rust boundary crossings for array/object values.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Parse special JSON constants (NaN, Infinity, -Infinity) and negative
numbers directly in Rust within call_scan_once(). This handles:
- 'N' -> NaN via parse_constant callback
- 'I' -> Infinity via parse_constant callback
- '-' -> -Infinity or negative numbers via parse_constant/parse_number

This reduces Python-Rust boundary crossings for array/object values.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@moreal
Contributor Author

moreal commented Jan 15, 2026

Thank you for resolving the CI issue!

)
.and_then(|x| x.to_pyresult(vm))
}
match s.char_indices().nth(char_idx) {
Member

@youknowone youknowone Jan 15, 2026


This part looks not very performance-friendly.
Unlike CPython, our string is not an array of chars.
Can these functions be written with only byte_idx instead of char_idx?

Contributor Author


(Please let me know if I misunderstood)

As the byte_idx tracking elsewhere in this PR shows, going through the chars iterator has a very negative impact on performance. However, in the Callable implementation you commented on, char_idx has to be used because the Python side calls it with a char_idx.

So this is the part where we convert char_idx to byte_idx before starting parsing. Is there a better way to convert to byte_idx than using char_indices().nth()?
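For reference, the one-time conversion under discussion looks roughly like this (a minimal sketch, not the PR's exact code, which works on Wtf8 code points):

fn main() {
    // Translate the Python-side character index into a byte offset once,
    // at the entry point; everything afterwards slices by bytes.
    let s = "→[1, 2]"; // '→' is 3 bytes in UTF-8, so the indices diverge
    let char_idx = 1;
    let byte_idx = s
        .char_indices()
        .nth(char_idx)
        .map(|(b, _)| b)
        .unwrap_or(s.len());
    assert_eq!(byte_idx, 3);
    assert_eq!(&s[byte_idx..], "[1, 2]");
}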

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thank you for the explanation! I had misunderstood.

Member

@youknowone youknowone left a comment


Thank you!

@youknowone youknowone merged commit 3a702ac into RustPython:main Jan 15, 2026
13 checks passed