Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Jan 3, 2026

Summary by CodeRabbit

  • Bug Fixes

    • TLS sockets now buffer and reliably flush pending TLS output, handling partial writes, non‑blocking I/O, and timeouts to avoid stalled sends and unnecessary would‑block errors.
    • Shutdown/close‑notify now ensure buffered TLS data is transmitted before closing; want‑read/want‑write semantics preserved.
    • Read and handshake flows handle mid‑operation write requirements and enforce deadlines to reduce spurious failures.
  • Refactor

    • TLS I/O and flushing behavior centralized into shared helpers used across handshake, read, write, and shutdown for consistent, robust behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 3, 2026

Caution

Review failed

The pull request is closed.

📝 Walkthrough

Walkthrough

Adds non-blocking TLS output buffering to SSLSocket/PySSLSocket with new pending buffer and helpers; integrates buffered send/flush into handshake, read, write, and shutdown flows to handle partial writes and preserve SSL_want_read/SSL_want_write semantics.

Changes

Cohort / File(s) Summary
Core SSL types & I/O
crates/stdlib/src/ssl.rs
Adds pending_tls_output: PyMutex<Vec<u8>> to PySSLSocket and _SSLSocket; implements flush_pending_tls_output, send_tls_output, blocking_flush_all_pending; integrates buffering into read/write/handshake/shutdown paths.
Compatibility helpers & send logic
crates/stdlib/src/ssl/compat.rs
Adds send_all_bytes(socket, buf, vm, deadline) to pre-flush pending output, handle partial writes, WantWrite, and deadline checks; replaces direct socket writes in handshake/read/shutdown flows.
Handshake / Read / Write flows
crates/stdlib/src/ssl.rs, crates/stdlib/src/ssl/compat.rs
Pre-flushes pending TLS output before producing/sending records; uses buffered-send helpers for TLS records and alerts; performs best-effort sends during reads when a write is required; ensures close_notify follows flushed output.
Initialization & constructors
crates/stdlib/src/ssl.rs
Initializes pending_tls_output in _wrap_socket and _wrap_bio constructors.
Socket helper signature change & call sites
crates/stdlib/src/ssl.rs
Changes sock_send signature from data: Vec<u8> to data: &[u8]; updates call sites to work with non-blocking buffering and partial writes.
Manifest
Cargo.toml
(Unchanged content reference) Manifest present for library version lookup.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App as Application
  participant SSL as SSL layer (PySSLSocket)
  participant Sock as TCP Socket
  participant VM as VirtualMachine

  rect rgba(135,206,235,0.10)
  Note over App,SSL: TLS write with pending-buffer handling
  App->>SSL: write(data)
  SSL->>SSL: produce TLS record(s)
  SSL->>SSL: flush_pending_tls_output(vm, deadline?)
  end

  alt pending buffer non-empty
    SSL->>Sock: send_all_bytes(pending, deadline)
    Sock-->>SSL: sent / partial / WouldBlock
    SSL->>SSL: buffer leftover (if any)
  end

  SSL->>Sock: send_all_bytes(new_record, deadline)
  alt socket sends all bytes
    Sock-->>SSL: sent complete
    SSL-->>App: return bytes_written
  else partial write or WouldBlock
    Sock-->>SSL: sent partial / WouldBlock
    SSL->>SSL: append remaining to pending_tls_output
    SSL-->>App: raise SSL_want_write
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • coolreader18
  • ShaharNaveh

Poem

🐇
I tuck the bytes in cozy rows,
I wait until the wire glows,
When lanes unfreeze I hop and send,
No TLS crumb is left to mend,
Hopped, flushed, and on its toes. 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Fix SSL handshake partial send' directly summarizes the main change: addressing an issue where TLS handshake-related bytes were not reliably sent during non-blocking operations, causing timeouts in downstream read operations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 114fc35 and 23436d4.

📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/stdlib/src/ssl/compat.rs (1)

1031-1061: Handshake partial-send loop fixes main issue; edge case still drops unsent bytes and logic is duplicated

This loop correctly handles the common case where sock_send returns a short count without error, ensuring the full TLS buffer is pushed out before proceeding, and mapping non-blocking write conditions to SslError::WantWrite, which is a big improvement over the previous single-shot send.

Two points worth considering:

  1. Partial write + BlockingIOError edge case
    If socket.sock_send ever writes some bytes and then raises BlockingIOError, the function returns SslError::WantWrite but the remaining unsent part of buf is dropped when handshake_write_loop unwinds. Because write_tls has already drained rustls’ outbound queue into this Vec<u8>, those bytes can’t be regenerated, so the peer will see a truncated TLS record and the handshake will eventually fail. This was effectively broken before as well, but if you want fully robust behaviour you’d need a persistent “pending TLS ciphertext” buffer on PySSLSocket that’s flushed before calling write_tls again, rather than relying solely on a local Vec.

  2. Keep send semantics in one place
    The partial-send logic here is nearly identical in structure to the new loop in PySSLSocket::write in ssl.rs (short writes, BlockingIOError → WANT_WRITE, 0-byte writes treated as WANT_WRITE). It would be easier to keep semantics aligned over time if this were factored into a small helper that operates on a &[u8] and a sock_send callback, and reused from both handshake write paths and the data write path.

Given how rare the partial+exception case is in practice and that this change already fixes the main short-send bug, these are follow‑up improvements rather than blockers.

crates/stdlib/src/ssl.rs (1)

3416-3454: Non-blocking write loop correctly handles partial sends; consider consolidating logic and tightening error handling

This new loop in write() does the right thing for the main problem the PR is targeting:

  • It keeps calling sock_send until all of buf is written, so short send() results no longer silently truncate TLS ciphertext.
  • It respects socket timeouts via sock_wait_for_io_impl, and maps non-blocking conditions to SSLWantWriteError, which matches Python/OpenSSL expectations.

A couple of follow‑ups you might want to consider:

  1. Share implementation with the handshake write path
    The structure here (send until sent_total == buf.len(), BlockingIOError → WANT_WRITE, 0-byte writes treated as WANT_WRITE) is almost identical to handshake_write_loop in compat.rs. Extracting a small helper that takes a buffer and a sock_send closure would reduce duplication and keep behaviour in sync for handshake and application data writes.

  2. Avoid silently assuming a full send on conversion failure
    result.try_into_value(vm).unwrap_or(buf.len() - sent_total) is very defensive, but if send() ever returned a non-int, it would mask the error by pretending the rest of the buffer was sent. Surfacing a TypeError (or turning that into an SSL syscall error) would be safer than assuming success in such an unexpected situation.

Functionally, though, this change is a clear improvement over the previous single-shot send.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cff37af and e4f26de.

📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (2)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
crates/stdlib/src/ssl/compat.rs (1)
  • is_blocking_io_error (1002-1004)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e4f26de and aaa6fd6.

⛔ Files ignored due to path filters (1)
  • Lib/test/test_ftplib.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (2)
crates/stdlib/src/ssl/compat.rs (2)
  • is_blocking_io_error (1002-1004)
  • new (1794-1796)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run tests under miri
🔇 Additional comments (2)
crates/stdlib/src/ssl/compat.rs (1)

1432-1475: ssl_read loop and timeout/deadline handling look consistent with SSL_read semantics.

The read loop’s deadline calculation, WANT_READ/WANT_WRITE mapping, and the pre‑read flush of pending TLS data via wants_write() integrate cleanly with the compat error model and CPython‑style behavior. Aside from the send‑buffer concern noted for send_all_bytes, the control flow here looks sound.

crates/stdlib/src/ssl.rs (1)

3232-3236: Auto‑completing the handshake on first read/write aligns with OpenSSL/CPython behavior.

The added do_handshake() calls in read() and write() when handshake_done is false make the I/O methods behave like SSL_read/SSL_write, which implicitly drive the handshake. Combined with the compat‑layer error mapping, this is a good behavioral match and keeps callers from needing to special‑case do_handshake() in most flows.

Also applies to: 3364-3368

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
crates/stdlib/src/ssl.rs (1)

3992-4039: BIO shutdown now correctly distinguishes “no data yet” from EOF.

The updated try_read_close_notify:

  • Treats an empty MemoryBIO.read() result as EOF only if incoming_bio.eof is true.
  • Otherwise, in BIO mode, an empty read with eof == False is interpreted as “no data yet” and returns Ok(false) instead of “peer closed”.

This matches MemoryBIO.eof semantics and avoids prematurely treating a quiet BIO as peer closure during shutdown.

In CPython’s `ssl.MemoryBIO`, can `read()` return an empty `bytes` object while `eof` is still `False` (i.e., buffer empty but no EOF signaled via `write_eof()`), and is `eof` the correct indicator of actual stream EOF?
crates/stdlib/src/ssl/compat.rs (1)

1006-1043: Non-blocking partial writes in send_all_bytes still drop TLS bytes produced by write_tls.

send_all_bytes stages rustls output into a Vec<u8> and assumes it either fully drains or returns SslError::WantWrite/error. On non-blocking sockets or full kernel send buffers, the flow is:

  • conn.write_tls(&mut buf) drains rustls’ internal output queue into buf.
  • send_all_bytes loops calling sock_send(buf[sent_total..]).
  • If the kernel send buffer fills, sock_send raises BlockingIOError or returns 0 with sent_total < buf.len().
  • send_all_bytes returns SslError::WantWrite (or error), and buf is dropped.
  • rustls has already drained its internal buffer; the unsent tail cannot be re‑drained on the next write_tls call.

This affects every call site that uses write_tls + send_all_bytes:

  • handshake_write_loop (lines 1051–1074): handshake bytes can be truncated on non-blocking sockets, breaking the handshake.
  • handle_handshake_complete socket path (lines 1183–1212): post-handshake NewSessionTicket and similar messages can be lost.
  • ssl_do_handshake BIO flush (lines 1331–1342): mostly safe in BIO mode today because sock_send writes to MemoryBIO without partials, but the helper’s semantics are still lossy for non-BIO.
  • ssl_read’s wants_write flush (lines 1457–1460): queued TLS records required for progress can be partially dropped.

Given that PySSLSocket now has a per-socket pending_tls_output buffer in ssl.rs, the handshake/compat paths should be updated to avoid staging into a throwaway Vec:

  • Either:
    • Route these handshake writes through the same flush_pending_tls_output / send_tls_output helpers used by write() and write_pending_tls(), so unsent TLS bytes are preserved and retried; or
    • Introduce a compat-layer equivalent (e.g., a per-socket pending-TLS buffer exposed via methods on PySSLSocket) and have send_all_bytes operate on that instead of a transient Vec.

Until that’s done, non-blocking handshakes and any path that relies on send_all_bytes can silently truncate TLS data.

For rustls 0.23.x, does `Connection::write_tls(&mut Write)` drain the internal outbound TLS buffer into the provided writer (i.e., after a successful call, rustls no longer retains a copy of those bytes), so that if the caller fails to actually send all of them over the network, the unsent tail cannot be re-drained on a later `write_tls` call?

Also applies to: 1051-1074, 1183-1212, 1331-1342, 1457-1460

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between caf807f and df1d22c.

⛔ Files ignored due to path filters (2)
  • Lib/test/_test_multiprocessing.py is excluded by !Lib/**
  • Lib/test/test_ftplib.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (2)
crates/stdlib/src/ssl.rs (2)

1887-1907: pending_tls_output field and initialisation look consistent.

The new pending_tls_output: PyMutex<Vec<u8>> is added to PySSLSocket and correctly initialised in both _wrap_socket and _wrap_bio, so all instances have a valid buffer regardless of mode. No issues here.

Also applies to: 1957-1978, 2342-2348


3356-3360: Auto-handshake in read()/write() diverges from CPython's ssl module semantics.

While OpenSSL's SSL_read()/SSL_write() can initiate the handshake if needed, CPython's ssl.SSLSocket.read() and write() do not automatically complete a pending handshake. CPython expects the handshake to be completed before I/O (either via do_handshake_on_connect=True during connect(), or explicit do_handshake() calls). The documented behavior is that read/write operations expect an already-active SSL connection.

The current implementation, while functionally safe due to the handshake_done flag and idempotency, introduces implicit behavior that differs from CPython's explicit handshake requirement and may surprise users familiar with the standard library.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
crates/stdlib/src/ssl.rs (1)

1905-1907: Pending TLS output plumbing looks solid overall; just watch shutdown blocking semantics.

The new pending_tls_output field plus flush_pending_tls_output, send_tls_output, and blocking_flush_all_pending give you a coherent story for preserving and later flushing unsent TLS bytes:

  • Writes now:
    • Ensure handshake is done before sending (write).
    • Flush any previously queued TLS bytes via flush_pending_tls_output before generating new records.
    • Use send_tls_output to send freshly produced TLS records, appending unsent tails to pending_tls_output and mapping write timeouts to socket.timeout via timeout_error_msg, and non-blocking EAGAIN to SSLWantWriteError.
  • Shutdown/write-pending:
    • shutdown() flushes pending_tls_output before sending close_notify, and write_pending_tls() now participates in the same pending-buffer pipeline by calling flush_pending_tls_output first and send_tls_output per TLS chunk.

These changes address the earlier “drop on partial send” problem for application writes and close-notify, and keep ordering correct by always draining pending_tls_output before pushing new TLS bytes.

One behavior to double-check is blocking_flush_all_pending in the shutdown path: even for sockets that were configured as non-blocking or with a finite timeout, once you enter the timeout_mode branch in shutdown() you call blocking_flush_all_pending, which uses sock_select(..., None) and will block until all pending bytes are sent or a hard error occurs. That may be acceptable (shutdown becomes fully blocking regardless of previous mode), but it’s a semantic change vs simply propagating SSLWantWriteError/socket.timeout from the non-blocking helpers. It may be worth confirming this matches CPython’s SSLSocket.shutdown() behavior for non-blocking and timeout sockets, or gating the fully-blocking flush to “truly blocking” sockets only.

Also applies to: 1976-1978, 2342-2348, 2748-2904, 3530-3534, 3575-3577, 3588-3600, 4023-4046

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df1d22c and ef4bc84.

📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl.rs
🧠 Learnings (1)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/ssl.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (2)
crates/stdlib/src/socket.rs (13)
  • timeout (835-837)
  • timeout_error_msg (3296-3298)
  • sock_select (2720-2776)
  • std (1252-1252)
  • std (1315-1315)
  • std (1415-1415)
  • std (1837-1837)
  • std (1843-1843)
  • std (1877-1877)
  • std (1883-1883)
  • std (2354-2354)
  • std (2397-2397)
  • std (2434-2434)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run tests under miri
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (1)
crates/stdlib/src/ssl.rs (1)

3398-3402: Auto-completing the handshake in read() matches OpenSSL/CPython expectations.

Having read() call do_handshake() when handshake_done is false (before delegating to compat::ssl_read) aligns with OpenSSL/CPython behavior where SSL_read() implicitly drives the handshake, and the subsequent error mapping (WantRead, WantWrite, Timeout, EOF/ZeroReturn) is consistent with the compat layer. No issues spotted here.

Also applies to: 3442-3443

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/stdlib/src/ssl/compat.rs (1)

1022-1059: Consider optimizing Vec allocation in send loop.

Line 1024 creates a new Vec<u8> on each iteration with buf[sent_total..].to_vec(). For large buffers or connections with many partial sends, this could result in multiple allocations.

If socket.sock_send() can accept a &[u8] or if there's an alternative approach, consider refactoring to reduce allocations. This would improve performance for high-throughput scenarios or slow networks where partial sends are more common.

#!/bin/bash
# Check sock_send signature to determine if it can accept &[u8]
rg -n "fn sock_send|pub fn sock_send" crates/stdlib/src/ssl.rs -A 10
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef4bc84 and 5f403fd.

📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl.rs
🧠 Learnings (1)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/ssl.rs
🧬 Code graph analysis (2)
crates/stdlib/src/ssl/compat.rs (1)
crates/stdlib/src/ssl.rs (5)
  • result (2660-2660)
  • result (2773-2773)
  • result (2832-2832)
  • result (2890-2890)
  • e (1113-1113)
crates/stdlib/src/ssl.rs (2)
crates/stdlib/src/socket.rs (12)
  • timeout (835-837)
  • timeout_error_msg (3296-3298)
  • std (1252-1252)
  • std (1315-1315)
  • std (1415-1415)
  • std (1837-1837)
  • std (1843-1843)
  • std (1877-1877)
  • std (1883-1883)
  • std (2354-2354)
  • std (2397-2397)
  • std (2434-2434)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (12)
crates/stdlib/src/ssl/compat.rs (1)

1008-1061: The error handling is intentional and correct. The flush_pending_tls_output method preserves all unsent bytes in the pending buffer regardless of error type, so no data is lost when errors are ignored. This is a consistent, documented pattern throughout the SSL module for non-blocking operations (e.g., ssl.rs:3999 explicitly notes "best-effort flush, ignore errors to avoid deadlock"), while timeout and blocking modes propagate errors (ssl.rs:4004). No changes needed.

crates/stdlib/src/ssl.rs (11)

1905-1905: LGTM: Pending TLS output buffer initialization.

The pending_tls_output field is correctly initialized to an empty vector in both _wrap_socket and _wrap_bio code paths, ensuring the buffer is ready to store unsent TLS data when partial writes occur.

Also applies to: 1976-1976


2748-2803: LGTM: Robust pending TLS output flushing.

The flush_pending_tls_output method correctly:

  • Flushes buffered TLS data before generating new output
  • Preserves unsent bytes in the pending buffer on partial writes or errors
  • Raises SSLWantWriteError for non-blocking sockets
  • Uses timeout_error_msg for write timeouts (addressing previous review feedback)

2805-2866: LGTM: TLS output sending with buffering.

The send_tls_output method properly:

  • Sends TLS data generated by rustls' write_tls()
  • Preserves unsent bytes by extending the pending buffer on partial writes or errors
  • Handles non-blocking mode and timeouts correctly
  • Prevents data loss when the socket cannot immediately accept all data

2868-2904: Note: Blocking flush behavior during shutdown.

The blocking_flush_all_pending method uses sock_select with None timeout, making it wait indefinitely until all pending TLS data is sent. This behavior is consistent with ensuring data integrity during shutdown but ignores the socket's configured timeout setting. As noted in previous review feedback, this is a known trade-off for ensuring all data is transmitted before closing the connection.


2711-2725: LGTM: Socket receive implementation.

The sock_recv method correctly handles both BIO and socket modes, reading from the incoming BIO or socket respectively. The comment clarifies that flags are not supported in BIO mode, which is consistent with the implementation.


3398-3402: LGTM: Auto-handshake in read and write operations.

Both read() and write() now automatically complete the handshake if not already done, matching OpenSSL's behavior where SSL_read() and SSL_write() auto-complete the handshake. This ensures the TLS connection is established before I/O operations proceed.

Also applies to: 3530-3534


3574-3600: LGTM: Write method properly integrates TLS output buffering.

The write() method correctly:

  • Flushes pending TLS data before generating new output (line 3589)
  • Uses send_tls_output to handle partial writes (line 3599)
  • Ensures data integrity by preserving unsent bytes in the pending buffer
  • Follows the documented pattern: flush old data → generate new data → send with buffering

3920-4016: LGTM: Shutdown properly flushes pending data and respects timeout modes.

The shutdown() method now:

  • Flushes pending TLS output before sending close_notify (line 3923), ensuring application data is transmitted before the shutdown alert
  • Respects socket timeout settings when flushing remaining data (lines 3996-4010):
    • Non-blocking mode (timeout=0): best-effort flush, ignores errors to avoid deadlock
    • Timeout mode: uses flush with the configured timeout
    • Blocking mode (timeout=None): waits until all data is sent via blocking_flush_all_pending
  • Returns immediately after sending close_notify in socket mode, matching CPython/OpenSSL behavior and avoiding potential deadlocks

This addresses previous review feedback about handling timeout settings appropriately.


4042-4065: LGTM: write_pending_tls integrates buffering correctly.

The write_pending_tls method properly:

  • Flushes any previously pending TLS output first (line 4044)
  • Uses send_tls_output to handle partial writes (line 4061)
  • Ensures unsent TLS data is preserved in the pending buffer

This maintains consistency with the overall buffering strategy.


4067-4116: LGTM: Improved BIO EOF detection in try_read_close_notify.

The try_read_close_notify method now correctly handles BIO mode by:

  • Checking the BIO's eof attribute when an empty read occurs (lines 4082-4091)
  • Distinguishing between "no data yet" (returns false) and "EOF signaled" (returns true)
  • Treating BlockingIOError as "no data yet" rather than peer closure (line 4108)

This fixes the issue where an empty BIO buffer without EOF would be misinterpreted as peer closure, addressing previous review feedback.


2342-2347: LGTM: Well-documented pending TLS output buffer design.

The comments clearly explain the purpose and design of the pending_tls_output buffer:

  • Stores unsent TLS bytes when the socket cannot accept data immediately
  • Prevents data loss when write_tls() drains rustls' internal buffer
  • Essential for non-blocking socket operation

This addresses the core issue identified in previous reviews where TLS bytes were lost on partial writes.

@youknowone
Copy link
Member Author

@coderabbitai Still I am experiencing test error. Please investigate the reason

ERROR: test_pwd (test.test_ftplib.TestTLS_FTPClassMixin.test_pwd)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 921, in setUp
    self.client.prot_p()
    ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 757, in prot_p
    self.voidcmd('PBSZ 0')
                 ^^^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 286, in voidcmd
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Let's examine the ssl.rs file to see the new buffering logic
cat crates/stdlib/src/ssl.rs

Length of output: 197088


🏁 Script executed:

#!/bin/bash
# Let's examine the compat.rs file to see the send_all_bytes implementation
cat crates/stdlib/src/ssl/compat.rs

Length of output: 78650


🏁 Script executed:

#!/bin/bash
# Let's also look at the test file to understand what it's doing
cat crates/pylib/Lib/test/test_ftplib.py | grep -A 30 "class TestTLS_FTPClassMixin"

Length of output: 68


@youknowone I've analyzed the code and error, and I believe I've found the issue causing the timeout.

Problem Analysis

The test is timing out when trying to read the FTP server's response after sending the PBSZ 0 command. The issue is in the send_all_bytes function in compat.rs:

fn send_all_bytes(socket: &PySSLSocket, buf: Vec<u8>, vm: &VirtualMachine) -> SslResult<()> {
    // First, flush any previously pending TLS data
    // Ignore errors - they will be saved in pending buffer and retried
    let _ = socket.flush_pending_tls_output(vm);  // ⚠️ ERROR IGNORED
    
    if buf.is_empty() {
        return Ok(());
    }
    
    let mut sent_total = 0;
    while sent_total < buf.len() {
        // ... send logic ...
    }
}

The bug: The flush operation error is being ignored with let _ =. This means:

  1. During handshake completion, some TLS data (like TLS 1.3 NewSessionTicket) gets buffered in pending_tls_output
  2. When sending the FTP command, send_all_bytes tries to flush but ignores any errors
  3. If the flush fails partially or has issues, old pending data might remain
  4. The server never receives all the handshake data, so it doesn't send a proper response
  5. Client times out waiting for the response

Recommended Fix

The send_all_bytes function should not ignore flush errors. Here's the fix:

fn send_all_bytes(socket: &PySSLSocket, buf: Vec<u8>, vm: &VirtualMachine) -> SslResult<()> {
    // First, flush any previously pending TLS data
    socket.flush_pending_tls_output(vm).map_err(SslError::Py)?;  // Don't ignore errors
    
    if buf.is_empty() {
        return Ok(());
    }
    
    let mut sent_total = 0;
    while sent_total < buf.len() {
        let to_send = buf[sent_total..].to_vec();
        match socket.sock_send(to_send, vm) {
            Ok(result) => {
                let sent: usize = result
                    .try_to_value::<isize>(vm)
                    .map_err(SslError::Py)?
                    .try_into()
                    .map_err(|_| SslError::Syscall("Invalid send return value".to_string()))?;
                if sent == 0 {
                    socket.pending_tls_output.lock().extend_from_slice(&buf[sent_total..]);
                    return Err(SslError::WantWrite);
                }
                sent_total += sent;
            }
            Err(e) => {
                if is_blocking_io_error(&e, vm) {
                    socket.pending_tls_output.lock().extend_from_slice(&buf[sent_total..]);
                    return Err(SslError::WantWrite);
                }
                socket.pending_tls_output.lock().extend_from_slice(&buf[sent_total..]);
                return Err(SslError::Py(e));
            }
        }
    }
    Ok(())
}

Additionally, check handshake_write_loop which has a similar issue:

// Change this:
let _ = socket.flush_pending_tls_output(vm);

// To this:
socket.flush_pending_tls_output(vm).map_err(SslError::Py)?;

This ensures that any pending TLS output from previous operations is fully sent before generating new TLS data, preventing the deadlock/timeout scenario.

@youknowone
Copy link
Member Author

@coderabbitai fixed but still error

ERROR: test_quit (test.test_ftplib.TestTLS_FTPClassMixin.test_quit)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 921, in setUp
    self.client.prot_p()
    ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 757, in prot_p
    self.voidcmd('PBSZ 0')
                 ^^^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 286, in voidcmd
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Examine the current send_all_bytes implementation
rg -n -A 30 "fn send_all_bytes" crates/stdlib/src/ssl/compat.rs

Length of output: 1393


🏁 Script executed:

#!/bin/bash
# Look at the flush_pending_tls_output implementation
rg -n -A 25 "fn flush_pending_tls_output" crates/stdlib/src/ssl.rs

Length of output: 1337


🏁 Script executed:

#!/bin/bash
# Check the handshake_write_loop to see how it's using flush
rg -n -B 5 -A 20 "fn handshake_write_loop" crates/stdlib/src/ssl/compat.rs

Length of output: 1030


🏁 Script executed:

#!/bin/bash
# Look for all places where pending_tls_output is being locked/modified
rg -n "pending_tls_output\.lock\(\)" crates/stdlib/src/ssl/compat.rs | head -30

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check the do_handshake implementation
rg -n -B 5 -A 40 "pub fn do_handshake\(" crates/stdlib/src/ssl/compat.rs

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Get the complete flush_pending_tls_output implementation
rg -n -A 45 "fn flush_pending_tls_output" crates/stdlib/src/ssl.rs

Length of output: 2481


🏁 Script executed:

#!/bin/bash
# Look at how handshake is completed - search for where do_handshake is called
rg -n -B 3 -A 15 "do_handshake" crates/stdlib/src/ssl/compat.rs | head -80

Length of output: 3421


🏁 Script executed:

#!/bin/bash
# Check if there's a handshake completion or finish function
rg -n -B 5 -A 30 "fn.*handshake.*complete" crates/stdlib/src/ssl/compat.rs

Length of output: 1556


🏁 Script executed:

#!/bin/bash
# Look for where TLS connection is being read during handshake
rg -n -B 5 -A 20 "handshake_read_loop\|read_tls" crates/stdlib/src/ssl/compat.rs | head -100

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check the ssl_read implementation to see if it flushes pending before reading
rg -n -B 5 -A 30 "pub fn ssl_read" crates/stdlib/src/ssl/compat.rs

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look for ssl_read function with different patterns
rg -n -B 5 -A 30 "fn ssl_read" crates/stdlib/src/ssl/compat.rs

Length of output: 3286


🏁 Script executed:

#!/bin/bash
# Check the complete handshake loop implementation
rg -n -A 80 "pub.super. fn ssl_do_handshake" crates/stdlib/src/ssl/compat.rs

Length of output: 3839


🏁 Script executed:

#!/bin/bash
# Look at handshake_read_loop without the pipe
rg -n -B 3 -A 25 "fn handshake_read_loop" crates/stdlib/src/ssl/compat.rs

Length of output: 0


🏁 Script executed:

#!/bin/bash
# See if there's read logic that needs to flush pending writes
rg -n -B 3 -A 20 "fn.*read.*loop" crates/stdlib/src/ssl/compat.rs

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Check if ssl_read exists with pub(super)
rg -n "pub\(super\) fn.*read" crates/stdlib/src/ssl/compat.rs

Length of output: 28


🏁 Script executed:

#!/bin/bash
# Check the full ssl_read implementation - it's probably longer
rg -n -A 100 "pub\(super\) fn ssl_read" crates/stdlib/src/ssl/compat.rs | head -150

Length of output: 4913


🏁 Script executed:

#!/bin/bash
# Search for any flush calls in ssl_read
rg -n "flush_pending" crates/stdlib/src/ssl/compat.rs

Length of output: 203


🏁 Script executed:

#!/bin/bash
# Look at what handshake_read_data does
rg -n -B 3 -A 30 "fn handshake_read_data" crates/stdlib/src/ssl/compat.rs

Length of output: 1303


🏁 Script executed:

#!/bin/bash
# Check if there's a sock_wait_for_io_impl that blocks on write during read
rg -n -B 3 -A 3 "sock_wait_for_io_impl.*Write" crates/stdlib/src/ssl.rs

Length of output: 689


@youknowone I found the root cause! The issue is a deadline management problem in the read path.

Root Cause

Looking at the code flow:

  1. ssl_read line 1456-1465: Has a deadline check at the top of the loop only
  2. ssl_read line 1479-1486: When conn.wants_write() (pending handshake data needs to be sent), it calls send_all_bytes
  3. send_all_bytes line 1016: Calls flush_pending_tls_output
  4. flush_pending_tls_output line 2761: Calls sock_wait_for_io_impl(SelectKind::Write, vm) which blocks

The problem: When flush_pending_tls_output blocks waiting for socket writability, it uses the socket's full timeout period. But we're in the middle of a READ operation that has its own deadline! The deadline check at the top of the ssl_read loop isn't executed while we're blocked inside flush_pending_tls_output.

Timeline of the bug:

  1. Handshake completes, TLS 1.3 NewSessionTicket gets buffered in pending_tls_output
  2. Client calls read() to get FTP response with a 30-second timeout
  3. ssl_read tries to flush pending write data first
  4. flush_pending_tls_output blocks for up to 30 seconds waiting for socket to be writable
  5. While blocked in WRITE, the READ deadline passes
  6. Eventually returns with "The read operation timed out"

Solution

The fix is to make ssl_read check the deadline before and after calling send_all_bytes. Here's the fix:

// In ssl_read, around line 1479-1486:
if conn.wants_write() && !is_bio {
    // Check deadline BEFORE attempting flush
    if let Some(deadline) = deadline
        && std::time::Instant::now() >= deadline
    {
        return Err(SslError::Timeout(
            "The read operation timed out".to_string(),
        ));
    }
    
    // Flush pending TLS data before continuing
    let tls_data = ssl_write_tls_records(conn)?;
    if !tls_data.is_empty() {
        // Use non-blocking send to avoid blocking too long during read
        match send_all_bytes(socket, tls_data, vm) {
            Ok(()) => {}
            Err(SslError::WantWrite) => {
                // Socket buffer full - for READ operation, this is acceptable
                // The pending data will be sent on next write/read call
                // Don't fail the READ just because we couldn't flush writes
            }
            Err(e) => return Err(e),
        }
    }
    
    // Check deadline AFTER flush attempt
    if let Some(deadline) = deadline
        && std::time::Instant::now() >= deadline
    {
        return Err(SslError::Timeout(
            "The read operation timed out".to_string(),
        ));
    }
    
    // After flushing, rustls may want to read again - continue loop
    continue;
}

Additional consideration: You might also want to make flush_pending_tls_output accept an optional deadline parameter so it can bail out early if called during a read operation with limited time remaining. But the above fix should resolve the immediate timeout issue.

The key insight: Don't fail a READ operation just because pending WRITE data couldn't be flushed immediately. The WantWrite error should be caught and ignored in the read path, with the pending data staying buffered for the next write opportunity.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/stdlib/src/ssl/compat.rs (1)

1308-1316: Use send_all_bytes for close_notify alert to ensure proper error handling and retry logic.

Lines 1308-1310 correctly flush pending TLS data before sending close_notify, and line 1365 properly uses send_all_bytes for BIO mode writes. However, line 1314 uses direct socket.sock_send(alert_data, vm) without the send_all_bytes wrapper, which lacks retry logic for partial sends and discards the error result. This violates Rust error handling best practices and creates inconsistency.

The send_all_bytes function handles partial sends with a retry loop, saves unsent bytes to pending buffer, and properly propagates errors. Replace the direct sock_send call at line 1314 with send_all_bytes to ensure the close_notify alert reaches the peer even if the initial send is partial, matching the pattern used throughout the rest of the file (lines 1094, 1213, 1228, 1365, 1483).

🧹 Nitpick comments (1)
crates/stdlib/src/ssl/compat.rs (1)

1006-1061: LGTM with optional performance consideration.

The send_all_bytes function correctly addresses the partial-send issues flagged in past reviews:

  • Flushes pending TLS data before sending new bytes (line 1016)
  • Buffers unsent data on partial sends or blocking errors (lines 1034-1038, 1045-1049, 1052-1056)
  • Returns appropriate SslError variants for retry/propagation

One optional optimization: line 1024 allocates a new Vec on each iteration (buf[sent_total..].to_vec()). For large buffers with many partial sends, this could be inefficient. However, this is required because sock_send takes ownership of the Vec. If sock_send were refactored to accept a slice, this allocation could be avoided.

Based on past review comments, this implementation resolves the previously identified issue where partial writes dropped TLS bytes.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5f403fd and 8f84261.

📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl.rs
🧠 Learnings (1)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/ssl.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (4)
crates/stdlib/src/ssl/compat.rs (4)
  • new (1818-1820)
  • is_blocking_io_error (1002-1004)
  • data (1524-1526)
  • data (1762-1764)
crates/stdlib/src/ssl/cert.rs (5)
  • new (628-636)
  • new (1034-1042)
  • new (1299-1304)
  • new (1456-1466)
  • new (1539-1549)
crates/stdlib/src/socket.rs (12)
  • timeout (835-837)
  • sock_select (2720-2776)
  • std (1252-1252)
  • std (1315-1315)
  • std (1415-1415)
  • std (1837-1837)
  • std (1843-1843)
  • std (1877-1877)
  • std (1883-1883)
  • std (2354-2354)
  • std (2397-2397)
  • std (2434-2434)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run tests under miri
🔇 Additional comments (9)
crates/stdlib/src/ssl/compat.rs (3)

1077-1095: LGTM! Handshake write logic correctly handles pending data.

The changes ensure proper ordering and reliable transmission:

  • Lines 1077-1079: Flush pending TLS data before generating new output maintains correct record ordering.
  • Line 1094: send_all_bytes replaces direct socket send, ensuring partial writes are buffered correctly.

Based on past review comments, this addresses the issue where pending TLS bytes in handshake flows lacked a flush mechanism.


1213-1232: LGTM! Post-handshake data handling is correct.

The changes properly handle post-handshake TLS data (e.g., TLS 1.3 NewSessionTicket):

  • Line 1213: BIO mode uses send_all_bytes to buffer partial writes.
  • Lines 1228-1232: Socket mode loop gracefully handles WantWrite by breaking and relying on buffered pending data for subsequent operations.

The updated comment (lines 1221-1222) accurately describes the best-effort approach, where buffered pending data will be flushed in subsequent read/write calls. This addresses the past review concern that the previous comment was misleading about data being "resent."


1479-1486: LGTM! Read-side write handling is correct.

Line 1483 properly uses send_all_bytes when ssl_read needs to write TLS data (e.g., for TLS key updates or renegotiation). This ensures partial writes during read operations are buffered correctly, and the subsequent continue (line 1486) allows rustls to retry reading after the flush.

crates/stdlib/src/ssl.rs (6)

1905-1905: LGTM: Pending TLS output buffer initialization.

The pending_tls_output field is properly initialized to an empty Vec in both socket and BIO wrapper paths, with clear documentation explaining its purpose in preventing data loss during non-blocking writes.

Also applies to: 1976-1976, 2342-2347


2748-2803: LGTM: flush_pending_tls_output correctly preserves unsent data.

The method properly handles all error cases:

  • Timeouts preserve unsent data and raise socket.timeout (matching CPython)
  • Non-blocking mode returns SSLWantWriteError with data queued for retry
  • Other errors also preserve pending data to prevent loss

The implementation ensures data integrity across partial writes.


2805-2866: LGTM: send_tls_output complements pending data flush.

This method correctly handles new TLS output by attempting to send immediately and buffering any unsent tail when the socket would block or times out. Together with flush_pending_tls_output, these methods ensure rustls' internal buffer is never drained without accounting for what actually reached the wire.


3574-3578: LGTM: Write path correctly maintains TLS data order.

The changes ensure:

  1. Application data is flushed to TLS records (line 3575-3577)
  2. Previously pending TLS data is sent before generating new TLS output (line 3589) — critical for maintaining record order
  3. New TLS data is sent with buffering for partial writes (line 3599)

This ordering prevents TLS record corruption and data loss.

Also applies to: 3588-3600


3918-3951: LGTM: Shutdown correctly flushes pending data before close_notify.

The critical pre-close_notify flush sequence (lines 3922-3945) is essential:

  1. Drain rustls' internal TLS buffer (lines 3926-3935)
  2. Flush pending_tls_output to socket/BIO (lines 3937-3945)
  3. Only then send close_notify (line 3947)

This ordering prevents data loss when the peer receives close_notify first and closes the connection.

The return-immediately behavior for socket mode (lines 3998-4036) correctly matches CPython/OpenSSL's SSL_shutdown(), avoiding deadlock in asyncore scenarios where both sides wait for each other's close_notify. The timeout-mode handling (lines 4015-4031) appropriately respects the socket's blocking/timeout/non-blocking configuration.

Also applies to: 3998-4036


4088-4137: LGTM: BIO EOF detection correctly distinguishes "no data yet" from peer close.

Lines 4101-4116 now check the eof attribute to avoid treating an empty BIO buffer as peer closure. This fixes the earlier issue where shutdown() in BIO mode would incorrectly classify a quiet connection as ragged EOF. The logic correctly returns Ok(false) when eof=false (no data available yet) and Ok(true) only when the BIO signals true EOF or a socket read returns empty.

@youknowone
Copy link
Member Author

@CodeRabbit fixed them, but still errors

ERROR: test_getwelcome (test.test_ftplib.TestTLS_FTPClassMixin.test_getwelcome)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 921, in setUp
    self.client.prot_p()
    ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 757, in prot_p
    self.voidcmd('PBSZ 0')
                 ^^^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 286, in voidcmd
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

======================================================================
ERROR: test_storlines (test.test_ftplib.TestTLS_FTPClassMixin.test_storlines)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 642, in test_storlines
    self.client.storlines('stor foo', f, callback=lambda x: flag.append(None))
                                                                        ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 534, in storlines
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/stdlib/src/ssl/compat.rs (1)

1008-1061: LGTM with optional performance consideration.

The send_all_bytes function correctly handles partial sends by buffering unsent data in pending_tls_output and returning WantWrite when progress stalls. The logic preserves data across all error paths (zero bytes sent, blocking IO, and other errors), which addresses the critical issue from past reviews where TLS bytes could be dropped.

Optional performance note: Line 1024 allocates a new Vec on each loop iteration with buf[sent_total..].to_vec(). For large buffers with small partial sends, this creates multiple allocations. If sock_send could accept a slice reference instead of taking ownership, this could be avoided. However, this optimization may require API changes and is not critical for correctness.

crates/stdlib/src/ssl.rs (1)

2868-2904: Verify blocking flush behavior aligns with shutdown timeout expectations.

The blocking_flush_all_pending method uses sock_select(..., None) (line 2885), which blocks indefinitely waiting for writability. However, in the shutdown() method (lines 4015-4031), the code attempts to respect different socket timeout modes (non-blocking, timeout, blocking).

Specifically:

  • Lines 4017-4021: Non-blocking mode should do best-effort flush
  • Lines 4022-4026: Timeout mode should use socket's timeout
  • Lines 4027-4030: Blocking mode should wait indefinitely

But blocking_flush_all_pending always blocks indefinitely. Consider:

  • Renaming to clarify it always blocks regardless of socket mode, or
  • Accepting an optional timeout parameter to respect the socket's timeout setting
🤖 Suggested refinement
// Option 1: Add timeout parameter
pub(crate) fn blocking_flush_all_pending(&self, timeout: Option<Duration>, vm: &VirtualMachine) -> PyResult<()> {
    loop {
        let pending_data = {
            let pending = self.pending_tls_output.lock();
            if pending.is_empty() {
                return Ok(());
            }
            pending.clone()
        };

        let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
        let socket = py_socket
            .sock()
            .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
        let timed_out = sock_select(&socket, SelectKind::Write, timeout)
            .map_err(|e| vm.new_os_error(format!("select failed: {e}")))?;
        
        if timed_out {
            return Err(timeout_error_msg(vm, "The write operation timed out".to_string()).upcast());
        }

        // ... rest of send logic
    }
}

Then call it from shutdown with the appropriate timeout based on socket mode.

Based on learnings, past review comments marked the blocking behavior as addressed, so this may be intentional.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f84261 and cb283cf.

📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl.rs
🧠 Learnings (1)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/ssl.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (2)
crates/stdlib/src/socket.rs (12)
  • timeout (835-837)
  • timeout_error_msg (3296-3298)
  • std (1252-1252)
  • std (1315-1315)
  • std (1415-1415)
  • std (1837-1837)
  • std (1843-1843)
  • std (1877-1877)
  • std (1883-1883)
  • std (2354-2354)
  • std (2397-2397)
  • std (2434-2434)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
🔇 Additional comments (10)
crates/stdlib/src/ssl/compat.rs (4)

1077-1095: LGTM!

The changes correctly integrate send_all_bytes into the handshake write loop. Flushing pending TLS data before generating new output (lines 1077-1079) maintains proper TLS record ordering, and using send_all_bytes (line 1094) ensures partial sends are buffered rather than dropped.


1213-1243: LGTM - Critical fix for TLS 1.3 handshake completion.

The changes correctly handle post-handshake data (e.g., TLS 1.3 NewSessionTicket) with send_all_bytes:

  • BIO mode (line 1213): Single drain to outgoing BIO, letting Python's ssl_io_loop handle retries
  • Socket mode (lines 1223-1233): Best-effort loop that breaks on WantWrite instead of failing, avoiding handshake failures on full socket buffers

Critical addition: The blocking flush at lines 1240-1242 ensures all pending TLS data (especially TLS 1.3 Finished messages) reaches the peer before the handshake is considered complete. Without this, the server might not process subsequent application data. This addresses the core issue described in the PR objectives where incomplete flushes caused peer-side read timeouts.


1317-1318: LGTM!

Line 1317-1318: Flushing pending TLS output before sending close_notify on handshake errors ensures any buffered handshake data is sent before the alert, maintaining proper TLS protocol order.

Line 1374: Consistent use of send_all_bytes in BIO mode.

Also applies to: 1374-1374


1489-1520: LGTM - Key fix for read operation timeouts!

This addresses the core issue from the PR objectives where read operations could timeout when blocking in write waits during flush_pending_tls_output:

Problem: When rustls wants to write during a read (e.g., TLS key update), the original code would call flush_pending_tls_output which could block waiting for socket writability using the full socket timeout, causing the read operation to miss its own deadline.

Solution implemented here:

  1. Deadline check before flush (lines 1490-1496): Ensures we don't even attempt flush if deadline already expired
  2. Best-effort send (lines 1501-1510): Uses send_all_bytes with graceful WantWrite handling - if socket buffer is full, pending data is buffered and the read continues rather than failing
  3. Deadline check after flush (lines 1513-1519): Catches timeouts that occurred during the flush attempt

This design prevents read operations from being blocked by write-side socket buffer pressure while still making best effort to flush pending TLS data.

Based on PR objectives and past review comments, this change fixes the test_pwd/test_quit timeouts in test.test_ftplib.TestTLS_FTPClassMixin.

crates/stdlib/src/ssl.rs (6)

1905-1905: Pending TLS output buffer properly initialized.

The pending_tls_output field is correctly initialized to an empty vector in both socket and BIO modes, with clear documentation explaining its purpose. The PyMutex ensures thread-safe access.

Also applies to: 1976-1976, 2342-2347


2748-2803: Flush implementation correctly preserves unsent data.

The flush_pending_tls_output method properly handles partial sends, non-blocking sockets, and timeouts:

  • Preserves unsent data in the pending buffer when sends are incomplete (lines 2764, 2777, 2788, 2794)
  • Returns SSLWantWriteError for non-blocking sockets that would block
  • Uses timeout_error_msg for timeout errors (matching CPython behavior)
  • Clears the buffer only after all data is successfully sent

2805-2866: Send implementation properly queues unsent data.

The send_tls_output method correctly appends unsent bytes to the pending buffer on partial sends or blocking conditions, ensuring no TLS data is lost when the socket cannot immediately accept all data.


3588-3599: Write integration correctly maintains TLS data ordering.

The write implementation properly:

  1. Flushes any previously pending TLS output before generating new TLS records (line 3589)
  2. Uses send_tls_output to buffer any unsent bytes (line 3599)

This ensures TLS data is sent in the correct order and prevents data loss on partial sends.


3922-3945: Critical fix: Flushes all pending data before close_notify.

This change addresses the core issue described in the PR objectives. By flushing both the rustls internal buffer (lines 3928-3935) and the pending_tls_output buffer (lines 3938-3945) before sending close_notify, the code prevents data loss that occurs when a peer receives the close alert before the application data.

The two-step flush ensures:

  1. All plaintext is converted to TLS records and drained from rustls' internal queue
  2. All TLS records (including those partially sent earlier) are transmitted

This matches the failing test scenario described in the PR (timeout in ftplib during SSL operations).


4088-4137: BIO mode EOF detection correctly distinguishes "no data yet" from peer close.

The updated try_read_close_notify method now properly checks the BIO's eof attribute (lines 4103-4112) rather than treating any empty read as EOF. This fixes the issue where an empty buffer with no data available yet was incorrectly classified as a peer close.

Additionally, BlockingIOError is now correctly treated as "no data yet" (lines 4129-4131) rather than connection closure.

Based on learnings, past review comments indicate this specific EOF handling logic was requested.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Examine the current send_all_bytes implementation to see what was fixed
rg -n -A 40 "fn send_all_bytes" crates/stdlib/src/ssl/compat.rs

Length of output: 1863


🏁 Script executed:

#!/bin/bash
# Check the flush_pending_tls_output implementation
rg -n -A 50 "fn flush_pending_tls_output" crates/stdlib/src/ssl.rs

Length of output: 2603


🏁 Script executed:

#!/bin/bash
# Look at the ssl_read implementation to see the deadline management fix
rg -n -A 80 "pub\(super\) fn ssl_read" crates/stdlib/src/ssl/compat.rs

Length of output: 3978


🏁 Script executed:

#!/bin/bash
# Check handshake completion - see where pending data gets generated
rg -n -A 60 "pub.super. fn ssl_do_handshake" crates/stdlib/src/ssl/compat.rs

Length of output: 2759


🏁 Script executed:

#!/bin/bash
# Look for handshake_write_loop to see if flush errors are now propagated
rg -n -A 30 "fn handshake_write_loop" crates/stdlib/src/ssl/compat.rs

Length of output: 1300


🏁 Script executed:

#!/bin/bash
# Get the complete ssl_do_handshake function to see what happens when handshake completes
rg -n -A 120 "pub.super. fn ssl_do_handshake" crates/stdlib/src/ssl/compat.rs | head -180

Length of output: 5632


🏁 Script executed:

#!/bin/bash
# Check if there's code that drains TLS output after handshake completes
rg -n -B 5 -A 20 "is_handshaking\(\)" crates/stdlib/src/ssl/compat.rs | head -100

Length of output: 3634


🏁 Script executed:

#!/bin/bash
# Look for where ssl_write_tls_records is called - this generates TLS output
rg -n "ssl_write_tls_records" crates/stdlib/src/ssl/compat.rs

Length of output: 342


🏁 Script executed:

#!/bin/bash
# Check the ssl_write_tls_records implementation
rg -n -A 20 "fn ssl_write_tls_records" crates/stdlib/src/ssl/compat.rs

Length of output: 807


🏁 Script executed:

#!/bin/bash
# Get the complete handle_handshake_complete function
rg -n -A 60 "fn handle_handshake_complete" crates/stdlib/src/ssl/compat.rs

Length of output: 2684


🏁 Script executed:

#!/bin/bash
# Check if there are any patterns where we drain TLS output after handshake
rg -n -B 3 -A 10 "wants_write.*after.*handshake|handshake.*complete.*write" crates/stdlib/src/ssl/compat.rs

Length of output: 0


🏁 Script executed:

#!/bin/bash
# Look at where do_handshake is called from to see what happens after it returns
rg -n -B 5 -A 15 "ssl_do_handshake\(" crates/stdlib/src/ssl/compat.rs

Length of output: 750


🏁 Script executed:

#!/bin/bash
# Check the blocking_flush_all_pending implementation
rg -n -A 40 "fn blocking_flush_all_pending" crates/stdlib/src/ssl.rs

Length of output: 1799


🏁 Script executed:

#!/bin/bash
# Check if blocking_flush_all_pending might have issues with blocking sockets
rg -n -B 3 -A 3 "blocking_flush_all_pending" crates/stdlib/src/ssl

Length of output: 453


🏁 Script executed:

#!/bin/bash
# Look at sock_send implementation to see if it might fail silently
rg -n -A 25 "fn sock_send" crates/stdlib/src/ssl.rs

Length of output: 1286


@youknowone I found the bug! It's in the blocking_flush_all_pending function.

The Problem

Looking at line 2885 in ssl.rs:

pub(crate) fn blocking_flush_all_pending(&self, vm: &VirtualMachine) -> PyResult<()> {
    loop {
        let pending_data = {
            let pending = self.pending_tls_output.lock();
            if pending.is_empty() {
                return Ok(());
            }
            pending.clone()
        };

        // Wait for socket to be writable (blocking, no timeout)
        let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
        let socket = py_socket
            .sock()
            .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
        let _ = sock_select(&socket, SelectKind::Write, None);  // ⚠️ RESULT IGNORED!

        // Try to send pending data
        match self.sock_send(pending_data.clone(), vm) {
            Ok(result) => {
                let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0);
                if sent > 0 {
                    let mut pending = self.pending_tls_output.lock();
                    pending.drain(..sent);
                }
            }
            // ...
        }
    }
}

The bug: sock_select is called to wait for the socket to become writable, but the result is ignored with let _ =. This means:

  1. Even if sock_select returns an error, the code continues
  2. The socket might not actually be writable yet
  3. The subsequent sock_send might return 0 (no bytes sent) or fail
  4. If sock_send returns 0 and sent == 0, the pending buffer isn't drained (line 2891 checks if sent > 0)
  5. The loop repeats infinitely, never making progress

What happens in your test:

  1. TLS handshake completes and generates NewSessionTicket data
  2. handle_handshake_complete calls blocking_flush_all_pending to send this data
  3. sock_select might fail or return false, but it's ignored
  4. sock_send is called anyway and returns 0 bytes sent (socket not ready)
  5. Loop continues forever without draining the buffer
  6. Eventually, some timeout mechanism allows the function to return
  7. But the NewSessionTicket data was never actually sent to the server
  8. Server is still waiting for handshake to complete
  9. Client thinks handshake is done, sends FTP command, waits for response
  10. Server never responds because it never received the handshake completion data
  11. Client times out

The Fix

pub(crate) fn blocking_flush_all_pending(&self, vm: &VirtualMachine) -> PyResult<()> {
    loop {
        let pending_data = {
            let pending = self.pending_tls_output.lock();
            if pending.is_empty() {
                return Ok(());
            }
            pending.clone()
        };

        // Wait for socket to be writable (blocking, no timeout)
        let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
        let socket = py_socket
            .sock()
            .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
        
        // DON'T ignore the result - propagate errors and check return value
        let ready = sock_select(&socket, SelectKind::Write, None)
            .map_err(|e| vm.new_os_error(format!("Select failed: {e}")))?;
        
        if !ready {
            // Socket not ready for write, but no timeout (shouldn't happen with None timeout)
            continue;
        }

        // Try to send pending data
        match self.sock_send(pending_data.clone(), vm) {
            Ok(result) => {
                let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0);
                if sent > 0 {
                    let mut pending = self.pending_tls_output.lock();
                    pending.drain(..sent);
                } else {
                    // No bytes sent even though socket was ready - shouldn't happen
                    // Avoid infinite loop by continuing
                    continue;
                }
            }
            Err(e) => {
                if is_blocking_io_error(&e, vm) {
                    continue;
                }
                return Err(e);
            }
        }
    }
}

The key changes:

  1. Don't ignore sock_select result - capture it and check if socket is ready
  2. Propagate select errors instead of silently continuing
  3. Handle the case where sent == 0 to avoid infinite loops

This ensures that pending TLS data (like TLS 1.3 NewSessionTicket) is actually sent before the handshake is considered complete, allowing the server to properly complete its side of the handshake and respond to subsequent application data.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
crates/stdlib/src/ssl/compat.rs (1)

1023-1025: Consider avoiding repeated allocations in the send loop.

On each iteration, buf[sent_total..].to_vec() creates a new Vec<u8>. If sock_send can accept a slice reference instead, pass &buf[sent_total..] directly to avoid allocating on every iteration.

🔎 Potential optimization

If sock_send signature allows it, refactor to:

-        let to_send = buf[sent_total..].to_vec();
-        match socket.sock_send(to_send, vm) {
+        match socket.sock_send(&buf[sent_total..], vm) {

However, if sock_send requires ownership of a Vec<u8>, the current approach is necessary.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb283cf and fb4c584.

📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧠 Learnings (1)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/ssl.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (2)
crates/stdlib/src/socket.rs (13)
  • timeout (835-837)
  • timeout_error_msg (3296-3298)
  • sock_select (2720-2776)
  • std (1252-1252)
  • std (1315-1315)
  • std (1415-1415)
  • std (1837-1837)
  • std (1843-1843)
  • std (1877-1877)
  • std (1883-1883)
  • std (2354-2354)
  • std (2397-2397)
  • std (2434-2434)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run tests under miri
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Check Rust code with clippy
🔇 Additional comments (4)
crates/stdlib/src/ssl/compat.rs (4)

1077-1095: LGTM! Proper flush-before-send sequencing.

The pre-flush of pending TLS output (lines 1077-1079) ensures correct ordering, and the switch to send_all_bytes (line 1094) properly handles partial sends with buffering. This resolves the concerns raised in past reviews about dropped TLS bytes.


1317-1325: LGTM! Proper error path cleanup with flush-before-alert.

The flush of pending TLS output before sending close_notify (line 1318) ensures correct sequencing, and using send_all_bytes (line 1324) for the alert handles partial sends. Ignoring errors with let _ = is appropriate for cleanup in an error path.


1375-1375: LGTM! Consistent use of send_all_bytes.

Properly uses send_all_bytes to handle partial sends in BIO mode.


1490-1520: LGTM! Proper deadline enforcement around write-side flushes during reads.

The deadline checks before (lines 1490-1497) and after (lines 1513-1520) the flush operation prevent a timed read from blocking indefinitely while flushing pending writes. The best-effort send (lines 1503-1510) correctly treats WantWrite as non-fatal during a read operation, leaving pending data for later. This directly addresses the timeout issues described in the PR objectives.

@youknowone
Copy link
Member Author

@coderabbitai Applied patch, but still failing

ERROR: test_exceptions (test.test_ftplib.TestTLS_FTPClassMixin.test_exceptions)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 921, in setUp
    self.client.prot_p()
    ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 757, in prot_p
    self.voidcmd('PBSZ 0')
                 ^^^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 286, in voidcmd
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

======================================================================
ERROR: test_storbinary (test.test_ftplib.TestTLS_FTPClassMixin.test_storbinary)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 921, in setUp
    self.client.prot_p()
    ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 757, in prot_p
    self.voidcmd('PBSZ 0')
                 ^^^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 286, in voidcmd
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

======================================================================
ERROR: test_storlines (test.test_ftplib.TestTLS_FTPClassMixin.test_storlines)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 636, in test_storlines
    self.client.storlines('stor', f)
                                  ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 534, in storlines
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get current send_all_bytes implementation
rg -n -A 45 "fn send_all_bytes" crates/stdlib/src/ssl/compat.rs

Length of output: 2012


🏁 Script executed:

#!/bin/bash
# Check ssl_read implementation focusing on the wants_write handling
rg -n -A 100 "pub\(super\) fn ssl_read" crates/stdlib/src/ssl/compat.rs | head -150

Length of output: 4811


🏁 Script executed:

#!/bin/bash
# Check handle_handshake_complete to see how it flushes pending data
rg -n -A 70 "fn handle_handshake_complete" crates/stdlib/src/ssl/compat.rs

Length of output: 3011


🏁 Script executed:

#!/bin/bash
# Check blocking_flush_all_pending current implementation
rg -n -A 50 "fn blocking_flush_all_pending" crates/stdlib/src/ssl.rs

Length of output: 2416


🏁 Script executed:

#!/bin/bash
# Check flush_pending_tls_output implementation
rg -n -A 50 "fn flush_pending_tls_output" crates/stdlib/src/ssl.rs

Length of output: 2585


@youknowone I've identified the root cause of the persistent timeouts! The issue is that READ operations are timing out because WRITE operations inside the read path are consuming all the time.

Root Cause

When ssl_read needs to flush pending TLS data, here's what happens:

  1. ssl_read line 1502: Calls send_all_bytes(socket, tls_data, vm)
  2. send_all_bytes line 1016: Calls socket.flush_pending_tls_output(vm)
  3. flush_pending_tls_output line 2757: Calls self.sock_wait_for_io_impl(SelectKind::Write, vm)
  4. sock_wait_for_io_impl: Blocks using the socket's full timeout (e.g., 30 seconds)

The problem: You're in a READ operation that might have only 5 seconds left on its deadline, but flush_pending_tls_output blocks for up to 30 seconds waiting for the socket to become writable. The deadline checks before and after send_all_bytes don't help because the blocking happens inside the flush function.

Timeline:

  • T+0s: Client sends FTP command, calls read() with 30-second timeout
  • T+5s: ssl_read needs to flush pending TLS data
  • T+5s: flush_pending_tls_output blocks waiting for socket writability
  • T+35s: flush_pending_tls_output returns (30-second socket timeout elapsed)
  • T+35s: READ deadline was at T+30s, so timeout error

The Fix

You need to make flush_pending_tls_output respect the read deadline when called from a read operation. Here are the changes:

1. Modify flush_pending_tls_output to accept an optional deadline:

pub(crate) fn flush_pending_tls_output(
    &self,
    vm: &VirtualMachine,
    deadline: Option<std::time::Instant>,
) -> PyResult<()> {
    let mut pending = self.pending_tls_output.lock();
    if pending.is_empty() {
        return Ok(());
    }

    let socket_timeout = self.get_socket_timeout(vm)?;
    let is_non_blocking = socket_timeout.map(|t| t.is_zero()).unwrap_or(false);

    let mut sent_total = 0;
    while sent_total < pending.len() {
        // Calculate timeout: use deadline if provided, otherwise use socket timeout
        let timeout_to_use = if let Some(dl) = deadline {
            let now = std::time::Instant::now();
            if now >= dl {
                // Deadline already passed
                *pending = pending[sent_total..].to_vec();
                return Err(timeout_error_msg(vm, "The operation timed out".to_string()).upcast());
            }
            Some(dl - now)
        } else {
            socket_timeout
        };

        // Wait for writability with computed timeout
        let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
        let socket = py_socket
            .sock()
            .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
        let timed_out = sock_select(&socket, SelectKind::Write, timeout_to_use)
            .map_err(|e| vm.new_os_error(format!("select failed: {e}")))?;

        if timed_out {
            // Keep unsent data in pending buffer
            *pending = pending[sent_total..].to_vec();
            return Err(
                timeout_error_msg(vm, "The write operation timed out".to_string()).upcast(),
            );
        }

        match self.sock_send(&pending[sent_total..], vm) {
            Ok(result) => {
                let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0);
                if sent == 0 {
                    if is_non_blocking {
                        // Keep unsent data in pending buffer
                        *pending = pending[sent_total..].to_vec();
                        return Err(create_ssl_want_write_error(vm).upcast());
                    }
                    continue;
                }
                sent_total += sent;
            }
            Err(e) => {
                if is_blocking_io_error(&e, vm) {
                    if is_non_blocking {
                        // Keep unsent data in pending buffer
                        *pending = pending[sent_total..].to_vec();
                        return Err(create_ssl_want_write_error(vm).upcast());
                    }
                    continue;
                }
                // Keep unsent data in pending buffer for other errors too
                *pending = pending[sent_total..].to_vec();
                return Err(e);
            }
        }
    }

    // All data sent successfully
    pending.clear();
    Ok(())
}

2. Update send_all_bytes to accept and pass deadline:

fn send_all_bytes(
    socket: &PySSLSocket,
    buf: Vec<u8>,
    vm: &VirtualMachine,
    deadline: Option<std::time::Instant>,
) -> SslResult<()> {
    // First, flush any previously pending TLS data with deadline
    socket.flush_pending_tls_output(vm, deadline).map_err(SslError::Py)?;

    if buf.is_empty() {
        return Ok(());
    }

    // Rest of the function remains the same...
    let mut sent_total = 0;
    while sent_total < buf.len() {
        // Check deadline before each send attempt
        if let Some(dl) = deadline && std::time::Instant::now() >= dl {
            socket.pending_tls_output.lock().extend_from_slice(&buf[sent_total..]);
            return Err(SslError::Timeout("The operation timed out".to_string()));
        }

        match socket.sock_send(&buf[sent_total..], vm) {
            // ... rest unchanged
        }
    }
    Ok(())
}

3. Update ssl_read to pass deadline to send_all_bytes:

// Around line 1502 in ssl_read:
match send_all_bytes(socket, tls_data, vm, deadline) {
    Ok(()) => {}
    Err(SslError::WantWrite) => {
        // Socket buffer full - acceptable during READ operation
        // Pending data will be sent on next write/read call
    }
    Err(e) => return Err(e),
}

4. Update all other calls to pass None for deadline:

// In handle_handshake_complete, handshake_write_loop, etc.:
send_all_bytes(socket, tls_data, vm, None)?

// When calling flush_pending_tls_output directly:
socket.flush_pending_tls_output(vm, None).map_err(SslError::Py)?;

This ensures that:

  • When called from ssl_read with a deadline, flush operations respect the time limit
  • When called from handshake or write paths without a deadline, they use the socket's timeout as before
  • READ operations won't time out because WRITE operations consumed all the time

🏁 Script executed:

#!/bin/bash
# Verify all call sites of send_all_bytes that need updating
rg -n "send_all_bytes\(" crates/stdlib/src/ssl/compat.rs

Length of output: 487


🏁 Script executed:

#!/bin/bash
# Verify all call sites of flush_pending_tls_output that need updating  
rg -n "flush_pending_tls_output\(" crates/stdlib/src/ssl

Length of output: 299


Perfect! Now I can provide you with a complete patch. Here are all the changes needed:

Complete Fix

1. Update flush_pending_tls_output signature in ssl.rs:

// Change the signature to accept an optional deadline
pub(crate) fn flush_pending_tls_output(
    &self,
    vm: &VirtualMachine,
    deadline: Option<std::time::Instant>,
) -> PyResult<()> {
    let mut pending = self.pending_tls_output.lock();
    if pending.is_empty() {
        return Ok(());
    }

    let socket_timeout = self.get_socket_timeout(vm)?;
    let is_non_blocking = socket_timeout.map(|t| t.is_zero()).unwrap_or(false);

    let mut sent_total = 0;
    while sent_total < pending.len() {
        // Calculate timeout: use deadline if provided, otherwise use socket timeout
        let timeout_to_use = if let Some(dl) = deadline {
            let now = std::time::Instant::now();
            if now >= dl {
                // Deadline already passed
                *pending = pending[sent_total..].to_vec();
                return Err(timeout_error_msg(vm, "The operation timed out".to_string()).upcast());
            }
            Some(dl - now)
        } else {
            socket_timeout
        };

        // Use sock_select directly instead of sock_wait_for_io_impl
        let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
        let socket = py_socket
            .sock()
            .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
        let timed_out = sock_select(&socket, SelectKind::Write, timeout_to_use)
            .map_err(|e| vm.new_os_error(format!("select failed: {e}")))?;

        if timed_out {
            // Keep unsent data in pending buffer
            *pending = pending[sent_total..].to_vec();
            return Err(
                timeout_error_msg(vm, "The write operation timed out".to_string()).upcast(),
            );
        }

        match self.sock_send(&pending[sent_total..], vm) {
            Ok(result) => {
                let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0);
                if sent == 0 {
                    if is_non_blocking {
                        *pending = pending[sent_total..].to_vec();
                        return Err(create_ssl_want_write_error(vm).upcast());
                    }
                    continue;
                }
                sent_total += sent;
            }
            Err(e) => {
                if is_blocking_io_error(&e, vm) {
                    if is_non_blocking {
                        *pending = pending[sent_total..].to_vec();
                        return Err(create_ssl_want_write_error(vm).upcast());
                    }
                    continue;
                }
                *pending = pending[sent_total..].to_vec();
                return Err(e);
            }
        }
    }

    // All data sent successfully
    pending.clear();
    Ok(())
}

2. Update send_all_bytes signature in compat.rs:

fn send_all_bytes(
    socket: &PySSLSocket,
    buf: Vec<u8>,
    vm: &VirtualMachine,
    deadline: Option<std::time::Instant>,
) -> SslResult<()> {
    // First, flush any previously pending TLS data with deadline
    socket.flush_pending_tls_output(vm, deadline).map_err(SslError::Py)?;

    if buf.is_empty() {
        return Ok(());
    }

    let mut sent_total = 0;
    while sent_total < buf.len() {
        // Check deadline before each send attempt
        if let Some(dl) = deadline {
            if std::time::Instant::now() >= dl {
                socket.pending_tls_output.lock().extend_from_slice(&buf[sent_total..]);
                return Err(SslError::Timeout("The operation timed out".to_string()));
            }
        }

        match socket.sock_send(&buf[sent_total..], vm) {
            Ok(result) => {
                let sent: usize = result
                    .try_to_value::<isize>(vm)
                    .map_err(SslError::Py)?
                    .try_into()
                    .map_err(|_| SslError::Syscall("Invalid send return value".to_string()))?;
                if sent == 0 {
                    socket
                        .pending_tls_output
                        .lock()
                        .extend_from_slice(&buf[sent_total..]);
                    return Err(SslError::WantWrite);
                }
                sent_total += sent;
            }
            Err(e) => {
                if is_blocking_io_error(&e, vm) {
                    socket
                        .pending_tls_output
                        .lock()
                        .extend_from_slice(&buf[sent_total..]);
                    return Err(SslError::WantWrite);
                }
                socket
                    .pending_tls_output
                    .lock()
                    .extend_from_slice(&buf[sent_total..]);
                return Err(SslError::Py(e));
            }
        }
    }
    Ok(())
}

3. Update all call sites in compat.rs:

// Line 1078 - handshake_write_loop
socket.flush_pending_tls_output(vm, None).map_err(SslError::Py)?;

// Line 1093 - handshake_write_loop
send_all_bytes(socket, buf, vm, None)?;

// Line 1212 - handle_handshake_complete (BIO mode)
send_all_bytes(socket, tls_data, vm, None)?;

// Line 1227 - handle_handshake_complete (socket mode)
match send_all_bytes(socket, tls_data, vm, None) {

// Line 1317 - ssl_do_handshake error path
let _ = socket.flush_pending_tls_output(vm, None);

// Line 1323 - ssl_do_handshake error path
let _ = send_all_bytes(socket, alert_data, vm, None);

// Line 1374 - ssl_do_handshake main path
send_all_bytes(socket, buf[..n].to_vec(), vm, None)?;

// Line 1502 - ssl_read (THE KEY ONE - pass the deadline!)
match send_all_bytes(socket, tls_data, vm, deadline) {

The critical change is at line 1502 in ssl_read where you pass deadline instead of None. This ensures that when flushing pending TLS output during a READ operation, the flush respects the read deadline and doesn't block for the full socket timeout.

All other call sites pass None because they're in handshake or write paths where there's no read deadline to respect.

@youknowone
Copy link
Member Author

@coderabbitai applied patch, but still happening:

ERROR: test_line_too_long (test.test_ftplib.TestTLS_FTPClassMixin.test_line_too_long)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 921, in setUp
    self.client.prot_p()
    ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 757, in prot_p
    self.voidcmd('PBSZ 0')
                 ^^^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 286, in voidcmd
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

ERROR: test_makepasv_issue43285_security_disabled (test.test_ftplib.TestTLS_FTPClassMixin.test_makepasv_issue43285_security_disabled)
Test the opt-in to the old vulnerable behavior.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 921, in setUp
    self.client.prot_p()
    ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 757, in prot_p
    self.voidcmd('PBSZ 0')
                 ^^^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 286, in voidcmd
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out

======================================================================
ERROR: test_storlines (test.test_ftplib.TestTLS_FTPClassMixin.test_storlines)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 636, in test_storlines
    self.client.storlines('stor', f)
                                  ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 534, in storlines
    return self.voidresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 259, in voidresp
    resp = self.getresp()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 244, in getresp
    resp = self.getmultiline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
                                             ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/socket.py", line 719, in readinto
    return self._sock.recv_into(b)
                                ^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1304, in recv_into
    return self.read(nbytes, buffer)
                             ^^^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/ssl.py", line 1138, in read
    return self._sslobj.read(len, buffer)
                                  ^^^^^^
TimeoutError: The read operation timed out



FAIL: test_storbinary (test.test_ftplib.TestTLS_FTPClassMixin.test_storbinary)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 618, in test_storbinary
    RETR_DATA.encode(self.server.encoding))
                     ^^^^
  File "/Users/user/Projects/rustpython2/crates/pylib/Lib/test/test_ftplib.py", line 509, in check_data
    self.assertEqual(len(received), len(expected))
                                        ^^^^^^^^
AssertionError: 16384 != 17000

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current send_all_bytes implementation
rg -n -A 50 "fn send_all_bytes" crates/stdlib/src/ssl/compat.rs

Length of output: 2052


🏁 Script executed:

#!/bin/bash
# Check the current flush_pending_tls_output implementation
rg -n -A 60 "pub\(crate\) fn flush_pending_tls_output" crates/stdlib/src/ssl.rs

Length of output: 3113


🏁 Script executed:

#!/bin/bash
# Check the current blocking_flush_all_pending implementation
rg -n -A 60 "pub\(crate\) fn blocking_flush_all_pending" crates/stdlib/src/ssl.rs

Length of output: 2880


🏁 Script executed:

#!/bin/bash
# Check handle_handshake_complete to see how it handles post-handshake data
rg -n -A 80 "fn handle_handshake_complete" crates/stdlib/src/ssl/compat.rs

Length of output: 3324


🏁 Script executed:

#!/bin/bash
# Check ssl_read around line 1502 to see the wants_write handling
rg -n -B 10 -A 30 "if conn\.wants_write\(\)" crates/stdlib/src/ssl/compat.rs

Length of output: 7328


🏁 Script executed:

#!/bin/bash
# Check sock_send signature - might be taking ownership vs reference
rg -n -A 15 "pub\(crate\) fn sock_send" crates/stdlib/src/ssl.rs

Length of output: 886


@youknowone I found the critical bug! It's a data loss issue in send_all_bytes.

The Critical Bug

Looking at send_all_bytes in compat.rs lines 1021-1023:

fn send_all_bytes(
    socket: &PySSLSocket,
    buf: Vec<u8>,  // NEW data to send
    vm: &VirtualMachine,
    deadline: Option<std::time::Instant>,
) -> SslResult<()> {
    // First, flush any previously pending TLS data with deadline
    socket
        .flush_pending_tls_output(vm, deadline)
        .map_err(SslError::Py)?;  // ⚠️ RETURNS IMMEDIATELY ON ERROR!

    // ... rest of function sends buf ...
}

The problem: When flush_pending_tls_output returns an error (e.g., WantWrite because socket buffer is full), the function returns immediately without saving the NEW data in buf to pending_tls_output. The NEW data is lost forever.

What happens in your tests:

  1. TLS handshake completes, generates NewSessionTicket data (let's call it "Data A")
  2. handle_handshake_complete line 1247 calls send_all_bytes(socket, tls_data, vm, None)
  3. send_all_bytes tries to flush any OLD pending data first
  4. Flush fails with WantWrite (socket buffer full or would block)
  5. send_all_bytes returns immediately - "Data A" is never saved to pending_tls_output
  6. Caller at line 1249 catches WantWrite and breaks
  7. NewSessionTicket data is lost - it's not in the pending buffer, not sent to the socket
  8. Server is still waiting for handshake completion data
  9. Client sends FTP command, waits for response
  10. Server never responds because it never received the handshake data
  11. Client times out

Data truncation (16384 vs 17000 bytes): Same issue - first chunk sent successfully, second chunk's data was lost via the same bug.

The Fix

Modify send_all_bytes to preserve the NEW data before returning on flush error:

fn send_all_bytes(
    socket: &PySSLSocket,
    buf: Vec<u8>,
    vm: &VirtualMachine,
    deadline: Option<std::time::Instant>,
) -> SslResult<()> {
    // First, flush any previously pending TLS data with deadline
    match socket.flush_pending_tls_output(vm, deadline) {
        Ok(()) => {
            // Flush succeeded, continue to send new data below
        }
        Err(e) => {
            // Flush failed - CRITICAL: Append new data to pending buffer before returning!
            // This ensures the new data isn't lost even if flush failed
            if !buf.is_empty() {
                socket.pending_tls_output.lock().extend_from_slice(&buf);
            }
            return Err(SslError::Py(e));
        }
    }

    if buf.is_empty() {
        return Ok(());
    }

    let mut sent_total = 0;
    while sent_total < buf.len() {
        // Check deadline before each send attempt
        if let Some(dl) = deadline {
            if std::time::Instant::now() >= dl {
                socket
                    .pending_tls_output
                    .lock()
                    .extend_from_slice(&buf[sent_total..]);
                return Err(SslError::Timeout("The operation timed out".to_string()));
            }
        }

        match socket.sock_send(&buf[sent_total..], vm) {
            Ok(result) => {
                let sent: usize = result
                    .try_to_value::<isize>(vm)
                    .map_err(SslError::Py)?
                    .try_into()
                    .map_err(|_| SslError::Syscall("Invalid send return value".to_string()))?;
                if sent == 0 {
                    // No progress - save unsent bytes to pending buffer
                    socket
                        .pending_tls_output
                        .lock()
                        .extend_from_slice(&buf[sent_total..]);
                    return Err(SslError::WantWrite);
                }
                sent_total += sent;
            }
            Err(e) => {
                if is_blocking_io_error(&e, vm) {
                    // Save unsent bytes to pending buffer
                    socket
                        .pending_tls_output
                        .lock()
                        .extend_from_slice(&buf[sent_total..]);
                    return Err(SslError::WantWrite);
                }
                // Save unsent bytes even for other errors
                socket
                    .pending_tls_output
                    .lock()
                    .extend_from_slice(&buf[sent_total..]);
                return Err(SslError::Py(e));
            }
        }
    }
    Ok(())
}

The key change is at the beginning: instead of using .map_err(SslError::Py)? which returns immediately, we use a match statement to catch the error, append the new data to the pending buffer, and then return the error. This ensures:

  1. All TLS data (old + new) is preserved in pending_tls_output
  2. Future calls to flush_pending_tls_output or send_all_bytes will attempt to send ALL the data
  3. No TLS handshake data or application data is ever lost
  4. The server receives all expected data and can respond properly

This single fix should resolve both the timeout errors and the data truncation issue.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 4, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Examine the current send_all_bytes implementation
echo "=== send_all_bytes ==="
rg -n -A 50 "fn send_all_bytes" crates/stdlib/src/ssl/compat.rs

Length of output: 2075


🏁 Script executed:

#!/bin/bash
# Check the current flush_pending_tls_output implementation
echo "=== flush_pending_tls_output ==="
rg -n -A 60 "pub\(crate\) fn flush_pending_tls_output" crates/stdlib/src/ssl.rs

Length of output: 3146


🏁 Script executed:

#!/bin/bash
# Check ssl_read where send_all_bytes is called with wants_write
echo "=== ssl_read wants_write handling ==="
rg -n -B 5 -A 25 "if conn.wants_write\(\)" crates/stdlib/src/ssl/compat.rs

Length of output: 6260


🏁 Script executed:

#!/bin/bash
# Check blocking_flush_all_pending implementation
echo "=== blocking_flush_all_pending ==="
rg -n -A 50 "pub\(crate\) fn blocking_flush_all_pending" crates/stdlib/src/ssl.rs

Length of output: 2480


🏁 Script executed:

#!/bin/bash
# Check handle_handshake_complete to see how it sends data after handshake
echo "=== handle_handshake_complete ==="
rg -n -A 80 "fn handle_handshake_complete" crates/stdlib/src/ssl/compat.rs

Length of output: 3358


🏁 Script executed:

#!/bin/bash
# Look for all places where blocking_flush_all_pending is called
echo "=== blocking_flush_all_pending calls ==="
rg -n "blocking_flush_all_pending" crates/stdlib/src/ssl

Length of output: 121


@youknowone I found it! The issue is in both flush_pending_tls_output and blocking_flush_all_pending. When sock_send returns 0 bytes sent (which can happen even after select says the socket is ready), the code continues the loop and calls sock_select again with the full timeout. This causes these functions to block for MUCH longer than intended.

The Bug

In blocking_flush_all_pending (lines 2920-2928):

match self.sock_send(&pending_data, vm) {
    Ok(result) => {
        let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0);
        if sent > 0 {
            let mut pending = self.pending_tls_output.lock();
            pending.drain(..sent);
        }
        // If sent == 0, just continues loop - BLOCKS AGAIN!
    }

In flush_pending_tls_output (lines 2797-2803):

if sent == 0 {
    if is_non_blocking {
        *pending = pending[sent_total..].to_vec();
        return Err(create_ssl_want_write_error(vm).upcast());
    }
    continue;  // BLOCKS AGAIN!
}

What happens:

  1. sock_select returns ready (takes 30 seconds if socket is slow)
  2. sock_send returns 0 bytes sent
  3. Loop continues, calls sock_select again with 30-second timeout
  4. This repeats, consuming 30s + 30s + 30s... = way more than the original read deadline!

The Fix

We need to avoid re-blocking after a 0-byte send. Here's the corrected code:

Fix blocking_flush_all_pending:

pub(crate) fn blocking_flush_all_pending(&self, vm: &VirtualMachine) -> PyResult<()> {
    let timeout = self.get_socket_timeout(vm)?;
    let mut zero_send_count = 0;
    const MAX_ZERO_SENDS: usize = 3;

    loop {
        let pending_data = {
            let pending = self.pending_tls_output.lock();
            if pending.is_empty() {
                return Ok(());
            }
            pending.clone()
        };

        let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
        let socket = py_socket
            .sock()
            .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
        let timed_out = sock_select(&socket, SelectKind::Write, timeout)
            .map_err(|e| vm.new_os_error(format!("select failed: {e}")))?;

        if timed_out {
            return Err(
                timeout_error_msg(vm, "The write operation timed out".to_string()).upcast(),
            );
        }

        match self.sock_send(&pending_data, vm) {
            Ok(result) => {
                let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0);
                if sent > 0 {
                    let mut pending = self.pending_tls_output.lock();
                    pending.drain(..sent);
                    zero_send_count = 0;  // Reset counter on successful send
                } else {
                    // Socket said it was ready but sent 0 bytes
                    zero_send_count += 1;
                    if zero_send_count >= MAX_ZERO_SENDS {
                        return Err(vm.new_os_error(
                            "Socket became unwritable despite select indicating ready".to_string()
                        ));
                    }
                    // Small delay before retry to avoid tight loop
                    std::thread::sleep(std::time::Duration::from_millis(1));
                }
            }
            Err(e) => {
                if is_blocking_io_error(&e, vm) {
                    continue;
                }
                return Err(e);
            }
        }
    }
}

Fix flush_pending_tls_output:

pub(crate) fn flush_pending_tls_output(
    &self,
    vm: &VirtualMachine,
    deadline: Option<std::time::Instant>,
) -> PyResult<()> {
    let mut pending = self.pending_tls_output.lock();
    if pending.is_empty() {
        return Ok(());
    }

    let socket_timeout = self.get_socket_timeout(vm)?;
    let is_non_blocking = socket_timeout.map(|t| t.is_zero()).unwrap_or(false);

    let mut sent_total = 0;
    let mut zero_send_count = 0;
    const MAX_ZERO_SENDS: usize = 3;

    while sent_total < pending.len() {
        let timeout_to_use = if let Some(dl) = deadline {
            let now = std::time::Instant::now();
            if now >= dl {
                *pending = pending[sent_total..].to_vec();
                return Err(
                    timeout_error_msg(vm, "The operation timed out".to_string()).upcast()
                );
            }
            Some(dl - now)
        } else {
            socket_timeout
        };

        let py_socket: PyRef<PySocket> = self.sock.clone().try_into_value(vm)?;
        let socket = py_socket
            .sock()
            .map_err(|e| vm.new_os_error(format!("Failed to get socket: {e}")))?;
        let timed_out = sock_select(&socket, SelectKind::Write, timeout_to_use)
            .map_err(|e| vm.new_os_error(format!("select failed: {e}")))?;

        if timed_out {
            *pending = pending[sent_total..].to_vec();
            return Err(
                timeout_error_msg(vm, "The write operation timed out".to_string()).upcast(),
            );
        }

        match self.sock_send(&pending[sent_total..], vm) {
            Ok(result) => {
                let sent: usize = result.try_to_value::<isize>(vm)?.try_into().unwrap_or(0);
                if sent == 0 {
                    if is_non_blocking {
                        *pending = pending[sent_total..].to_vec();
                        return Err(create_ssl_want_write_error(vm).upcast());
                    }
                    // Socket said it was ready but sent 0 bytes
                    zero_send_count += 1;
                    if zero_send_count >= MAX_ZERO_SENDS {
                        *pending = pending[sent_total..].to_vec();
                        return Err(vm.new_os_error(
                            "Socket became unwritable despite select indicating ready".to_string()
                        ));
                    }
                    // Small delay before retry to avoid tight loop
                    std::thread::sleep(std::time::Duration::from_millis(1));
                    continue;
                }
                sent_total += sent;
                zero_send_count = 0;  // Reset on successful send
            }
            Err(e) => {
                if is_blocking_io_error(&e, vm) {
                    if is_non_blocking {
                        *pending = pending[sent_total..].to_vec();
                        return Err(create_ssl_want_write_error(vm).upcast());
                    }
                    continue;
                }
                *pending = pending[sent_total..].to_vec();
                return Err(e);
            }
        }
    }

    pending.clear();
    Ok(())
}

The key changes:

  1. Track zero-byte send count: After 3 consecutive 0-byte sends despite select saying ready, bail out with an error
  2. Reset counter on progress: When bytes are successfully sent, reset the counter
  3. Small sleep on zero-send: Add a 1ms sleep to avoid tight spinning

This prevents the functions from blocking indefinitely when sock_send keeps returning 0, which was consuming all the read deadline time.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/stdlib/src/ssl/compat.rs (2)

1509-1545: Thorough deadline enforcement with best-effort write during read.

The implementation correctly handles the complex case where a READ operation requires writing TLS data (key updates, etc.):

  • Deadline checks before and after flush prevent timeout violations
  • WantWrite and Timeout are treated as non-fatal (lines 1525-1532) since failing the READ for a transient WRITE issue would be incorrect
  • The deadline parameter is properly threaded through send_all_bytes to flush_pending_tls_output

The multiple deadline checks are necessary to handle all edge cases, though the complexity is high.

Optional: Consider extracting deadline check to reduce duplication
fn check_read_deadline(deadline: Option<std::time::Instant>) -> SslResult<()> {
    if let Some(dl) = deadline && std::time::Instant::now() >= dl {
        return Err(SslError::Timeout("The read operation timed out".to_string()));
    }
    Ok(())
}

Then replace the three deadline checks with check_read_deadline(deadline)?;


1519-1523: Comment placement is slightly confusing.

The comment on line 1519 says "Pass deadline so flush respects read timeout", but it's positioned before ssl_write_tls_records(conn) which doesn't accept a deadline. The deadline is actually passed on line 1523 to send_all_bytes, which then passes it to flush_pending_tls_output.

Consider repositioning the comment for clarity
-                // CRITICAL: Pass deadline so flush respects read timeout
                 let tls_data = ssl_write_tls_records(conn)?;
                 if !tls_data.is_empty() {
                     // Use best-effort send - don't fail READ just because WRITE couldn't complete
+                    // CRITICAL: Pass deadline so flush respects read timeout
                     match send_all_bytes(socket, tls_data, vm, deadline) {
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92b79a1 and 114fc35.

📒 Files selected for processing (2)
  • crates/stdlib/src/ssl.rs
  • crates/stdlib/src/ssl/compat.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/ssl/compat.rs
  • crates/stdlib/src/ssl.rs
🧠 Learnings (1)
📚 Learning: 2025-11-10T06:27:41.954Z
Learnt from: youknowone
Repo: RustPython/RustPython PR: 6243
File: vm/src/function/buffer.rs:123-129
Timestamp: 2025-11-10T06:27:41.954Z
Learning: In vm/src/function/buffer.rs, line 36 in the try_rw_bytes_like method intentionally uses TypeError (not BufferError) for the "buffer is not a read-write bytes-like object" error case, even though a similar error message in ArgMemoryBuffer::try_from_borrowed_object uses BufferError. The TypeError is the intended exception type for this specific code path.

Applied to files:

  • crates/stdlib/src/ssl.rs
🧬 Code graph analysis (1)
crates/stdlib/src/ssl.rs (3)
crates/stdlib/src/ssl/compat.rs (2)
  • new (1879-1881)
  • is_blocking_io_error (1002-1004)
crates/stdlib/src/socket.rs (13)
  • std (1252-1252)
  • std (1315-1315)
  • std (1415-1415)
  • std (1837-1837)
  • std (1843-1843)
  • std (1877-1877)
  • std (1883-1883)
  • std (2354-2354)
  • std (2397-2397)
  • std (2434-2434)
  • timeout_error_msg (3296-3298)
  • sock_select (2720-2776)
  • timeout (835-837)
crates/stdlib/src/ssl/error.rs (1)
  • create_ssl_want_write_error (112-118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Ensure compilation on various targets
🔇 Additional comments (10)
crates/stdlib/src/ssl/compat.rs (4)

1006-1078: LGTM! Comprehensive partial-send handling with deadline awareness.

The function correctly:

  • Flushes pending data before sending new bytes to maintain ordering
  • Checks deadline before each send attempt to respect timeouts
  • Buffers unsent bytes on blocking/zero-send conditions to prevent data loss
  • Returns WantWrite when no progress can be made, allowing the caller to retry

1094-1114: Proper ordering and partial-send handling in handshake write path.

The pre-flush before generating new output ensures TLS data ordering is maintained, and using send_all_bytes provides consistent partial-send and buffering behavior throughout the handshake.


1240-1262: Best-effort send with critical blocking flush ensures handshake completion.

The socket mode implementation correctly handles partial sends:

  • The while loop attempts to send pending TLS data (e.g., NewSessionTicket)
  • Breaks on WantWrite rather than failing, preserving unsent data in the pending buffer
  • The blocking_flush_all_pending call (lines 1259-1261) ensures all buffered TLS data is transmitted before handshake completes

This two-phase approach prevents handshake completion with unsent data while gracefully handling socket buffer exhaustion.


1336-1344: Best-effort cleanup in error path is appropriate.

The error handling correctly attempts to flush pending data and send close_notify using send_all_bytes, with errors ignored via let _ = ... since we're already in an error state. This prevents secondary errors from masking the original handshake failure.

crates/stdlib/src/ssl.rs (6)

1905-1905: LGTM! Pending TLS output buffer addition is well-designed.

The pending_tls_output field correctly addresses the data loss issue when rustls drains its internal buffer but the socket cannot immediately accept all TLS bytes. The field is properly:

  • Wrapped in PyMutex for thread-safe access
  • Initialized to empty in both _wrap_socket and _wrap_bio
  • Documented with clear explanation of its purpose

Also applies to: 1976-1976, 2342-2347


2746-2938: LGTM! TLS output flush helpers are correctly implemented.

The three new helper methods properly address the pending data management requirements:

  1. flush_pending_tls_output (lines 2748-2826):

    • Accepts optional deadline to respect read-time limits
    • Correctly calculates timeout: uses deadline if provided, else socket timeout
    • Properly handles all socket modes (blocking, timeout, non-blocking)
    • Preserves unsent data on timeout/blocking/error conditions
  2. send_tls_output (lines 2831-2888):

    • Stages new TLS output and buffers unsent bytes
    • No deadline parameter (correctly—only used in write paths)
    • Consistent error handling with flush_pending_tls_output
  3. blocking_flush_all_pending (lines 2892-2938):

    • Respects socket timeout during flush
    • Correctly drains pending buffer incrementally
    • Handles sent == 0 to avoid infinite loops

All three methods use timeout_error_msg for timeout errors, addressing the previous review comment.


2727-2744: LGTM! Signature change improves efficiency and clarity.

Changing sock_send from data: Vec<u8> to data: &[u8] is a good improvement:

  • More efficient: avoids unnecessary allocations when passing slices
  • More idiomatic Rust: borrows instead of taking ownership
  • Updated documentation clearly states caller responsibilities for flushing pending data

3432-3436: LGTM! Read/write handshake auto-completion matches OpenSSL behavior.

The changes correctly implement SSL auto-handshake semantics:

  1. Lines 3432-3436, 3564-3568: Both read() and write() now auto-complete handshake if not done, matching OpenSSL's SSL_read()/SSL_write() behavior.

  2. Line 3596 (flush before write): CRITICAL fix—ensures TLS 1.3 Finished message and any pending post-handshake data reaches the server before sending new application data. Without this, servers may not be ready to process the data.

  3. Lines 3630, 3640: Modified write loop uses send_tls_output() to properly buffer unsent TLS bytes, preventing data loss.

All flush operations correctly pass None for deadline since write operations aren't under read-deadline constraints.

Also applies to: 3564-3568, 3592-3641


3959-4076: LGTM! Shutdown refactoring correctly prevents data loss and matches CPython behavior.

The shutdown() changes implement several critical fixes:

  1. Lines 3963-3987 (CRITICAL): Flushes ALL pending TLS data BEFORE sending close_notify. This two-step flush (rustls internal buffer → pending_tls_output buffer) ensures close_notify comes AFTER all application data, preventing data loss.

  2. Lines 4040-4050: Returns immediately after sending close_notify for sockets with usable timeout, matching CPython/OpenSSL behavior. This avoids deadlock where both sides wait for each other's close_notify.

  3. Lines 4056-4073: Respects socket timeout modes when flushing:

    • Non-blocking (line 4059-4062): Best-effort flush, ignores errors to prevent asyncore deadlock (intentional)
    • Timeout mode (line 4064-4067): Flushes with socket's timeout, propagates errors
    • Blocking mode (line 4069-4071): Waits until all data sent

The non-blocking mode's error suppression (line 4061 let _ = ...) is intentional per the comment and matches the asyncore usage pattern.


4103-4178: LGTM! BIO EOF handling fix addresses previous review concern.

The changes correctly implement the BIO mode distinction between "no data yet" and "peer closed":

  1. Lines 4104-4106: write_pending_tls now flushes any previously pending data first, ensuring correct ordering when sending new TLS data (including close_notify).

  2. Lines 4129-4178 (try_read_close_notify refactoring): Properly handles BIO EOF semantics:

    • Lines 4143-4157: In BIO mode, checks bio.eof attribute when data is empty
    • If eof=false, returns Ok(false) (no data available yet) ✓
    • If eof=true, returns Ok(true) (peer closed) ✓
    • Socket mode: Empty read indicates peer closed (ragged EOF)
    • Line 4170-4172: BlockingIOError treated as "no data yet" (not peer close)

This addresses the past review comment: "try_read_close_notify may treat 'no data yet' from BIO as peer close".

@youknowone youknowone merged commit fd27c2d into RustPython:main Jan 4, 2026
12 of 13 checks passed
@youknowone youknowone deleted the ssl branch January 4, 2026 08:54
terryluan12 pushed a commit to terryluan12/RustPython that referenced this pull request Jan 5, 2026
* simplify test_ftplib

* mark flaky test

* better ssl handshake, pending tls output

* flush_all_pending

* deadline

* flush pending
terryluan12 pushed a commit to terryluan12/RustPython that referenced this pull request Jan 5, 2026
* simplify test_ftplib

* mark flaky test

* better ssl handshake, pending tls output

* flush_all_pending

* deadline

* flush pending
@coderabbitai coderabbitai bot mentioned this pull request Jan 13, 2026
terryluan12 pushed a commit to terryluan12/RustPython that referenced this pull request Jan 15, 2026
* simplify test_ftplib

* mark flaky test

* better ssl handshake, pending tls output

* flush_all_pending

* deadline

* flush pending
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants