Skip to content

gh-142352: Fix start_tls() to transfer buffered data from StreamReader#142354

Merged
kumaraditya303 merged 5 commits intopython:mainfrom
kasimov-maxim:gh-142352-start-tls-buffered-data
Feb 28, 2026
Merged

gh-142352: Fix start_tls() to transfer buffered data from StreamReader#142354
kumaraditya303 merged 5 commits intopython:mainfrom
kasimov-maxim:gh-142352-start-tls-buffered-data

Conversation

@kasimov-maxim
Copy link
Contributor

@kasimov-maxim kasimov-maxim commented Dec 6, 2025

Fixes #142352

Summary

When using StreamWriter.start_tls() to upgrade a connection to TLS mid-stream, any data already buffered in the StreamReader was lost. This commonly occurs when implementing servers that support the PROXY protocol.

Changes

  • Added _transfer_buffered_data_to_ssl() method to BaseEventLoop that transfers buffered data from StreamReader._buffer to SSLProtocol._incoming before the TLS handshake begins
  • Added tests for the PROXY protocol + TLS scenario

Generative AI usage

Some parts of this PR were assisted by a generative AI tool. Specifically:

  • help with drafting English-language comments and documentation
  • suggestions for test refactoring
  • assistance with wording and structuring the PR description

All functional logic and final edits were reviewed, verified, and authored by
me, in accordance with the Python devguide guidelines.

Test plan

  • Added new tests in test_streams.py
  • All asyncio tests pass
  • make patchcheck passes

@python-cla-bot
Copy link

python-cla-bot bot commented Dec 6, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@kumaraditya303
Copy link
Contributor

@ambv Can you review this? You reviewed the original PR which added this API #91453

picnixz
picnixz previously requested changes Jan 6, 2026
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this PR been generated using an LLM? if so, indicate which places were so and read https://devguide.python.org/getting-started/generative-ai/.

self.assertEqual(msg2, b"hello world 2!\n")

def _run_test_start_tls_behind_proxy(self, send_combined):
"""Test start_tls() when TLS ClientHello arrives with PROXY header.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove docstrings from tests. It would be shown on failure. Use regular comments and shorten it. We don't want to restate the entire issue so simply link the issue's number as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, will update the tests accordingly.

test_message = b"hello world\n"
expected_response = reverse_message(test_message)

class TCPProxyServer:
Copy link
Member

@picnixz picnixz Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this class? don't we already have some helpers elsewhere? The test also looks overly complex. Why not just patching existing TCPServer? (if possible; I don't know if there isn't some class that could be used as is with some minimal modifications)

@bedevere-app
Copy link

bedevere-app bot commented Jan 6, 2026

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@picnixz
Copy link
Member

picnixz commented Jan 6, 2026

Also, I think reading and checking if we're compliant with https://datatracker.ietf.org/doc/html/rfc2817 should be done (even if the RFC is only proposed and not in the standard tracks) or whatever RFC should be considered (I do think it's correct to fix this but I don't know if we should only send the possible hello for tls or generally write whatever was buffered)

@kasimov-maxim
Copy link
Contributor Author

Has this PR been generated using an LLM? if so, indicate which places were so

Noted in PR.

Comment on lines 1335 to 1336
ssl_protocol._incoming.write(buffer)
buffer.clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could better belong @

self._incoming = ssl.MemoryBIO()
.

Technically, there could be something like

-         self._incoming = ssl.MemoryBIO()
+         self._incoming = ssl.MemoryBIO()
+         try:
+             underlying_raw_stream_buffer = app_protocol._stream_reader._buffer
+         except AttributeError:
+             pass
+         else:
+             self._incoming.write(underlying_raw_stream_buffer)
+             underlying_raw_stream_buffer.clear()

But I don't know how other places that initialize sslproto.SSLProtocol() would be affected. So it's just an idea to check out and see if other instances should have the same buffer transfer logic (looks like it's used in two other modules).

@kasimov-maxim
Copy link
Contributor Author

kasimov-maxim commented Jan 21, 2026

@picnixz

I think reading and checking if we're compliant with https://datatracker.ietf.org/doc/html/rfc2817 should be done

RFC 2817 isn’t relevant here; it covers HTTP/1.1 Upgrade rather than asyncio.start_tls(). At this point we’ve already decided to switch to TLS, so any buffered bytes belong to the TLS layer and should be passed as‑is to the SSL BIO. If non‑TLS bytes are still buffered, that means start_tls() was called too early in the protocol, which is outside asyncio’s responsibility.

@kasimov-maxim kasimov-maxim force-pushed the gh-142352-start-tls-buffered-data branch from b2ed3c3 to 86d80bf Compare January 21, 2026 16:08
@kasimov-maxim
Copy link
Contributor Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Jan 21, 2026

Thanks for making the requested changes!

@picnixz: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from picnixz January 21, 2026 16:15
@kasimov-maxim kasimov-maxim force-pushed the gh-142352-start-tls-buffered-data branch from 86d80bf to c2cd20b Compare January 21, 2026 16:49
@kasimov-maxim
Copy link
Contributor Author

@fantix

Conclusion first: I think loop.start_tls() should take an extra parameter forward_handshake_data: bytes | None instead of actively looking into the protocol.

I agree that explicitly passing the bytes to be forwarded is the cleanest model, since only the application knows its buffer semantics. A parameter like forward_handshake_data: bytes | None (or shorter handshake_data: bytes | None) would make that explicit.
It’s also symmetric with cases where the application knows the required ordering (e.g., PROXY header first, then TLS), which the event loop cannot infer.

@kasimov-maxim
Copy link
Contributor Author

@fantix

The root trigger of this bug is, as you stated in the bug report, "the TLS ClientHello (arrived) in the same TCP segment (as some application flag did)".

The new regression test (test_start_tls_buffered_data) shows the trigger is broader than “ClientHello in the same TCP segment as some application flag.” The issue is any buffered TLS data that reaches StreamReader before start_tls() is called. That can happen even without proxy‑protocol or segment coalescing.

@kumaraditya303 kumaraditya303 added topic-asyncio needs backport to 3.13 bugs and security fixes needs backport to 3.14 bugs and security fixes type-bug An unexpected behavior, bug, or error labels Feb 8, 2026
@kumaraditya303
Copy link
Contributor

Please resolve the Github comments once they are addressed, otherwise it makes reviewing difficult.

Copy link
Contributor

@fantix fantix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@kumaraditya303 kumaraditya303 merged commit 0598f4a into python:main Feb 28, 2026
51 checks passed
@miss-islington-app
Copy link

Thanks @kasimov-maxim for the PR, and @kumaraditya303 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13, 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 28, 2026
…a from StreamReader (pythonGH-142354)

(cherry picked from commit 0598f4a)

Co-authored-by: Maksym Kasimov <39828623+kasimov-maxim@users.noreply.github.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@miss-islington-app
Copy link

Sorry, @kasimov-maxim and @kumaraditya303, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 0598f4a8999b96409e0a2bf9c480afc76a876860 3.13

@bedevere-app
Copy link

bedevere-app bot commented Feb 28, 2026

GH-145363 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Feb 28, 2026
@bedevere-app
Copy link

bedevere-app bot commented Feb 28, 2026

GH-145364 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Feb 28, 2026
kumaraditya303 added a commit to kumaraditya303/cpython that referenced this pull request Feb 28, 2026
…red data from StreamReader (pythonGH-142354)

(cherry picked from commit 0598f4a)

Co-authored-by: Maksym Kasimov <39828623+kasimov-maxim@users.noreply.github.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
kumaraditya303 added a commit to miss-islington/cpython that referenced this pull request Feb 28, 2026
…a from StreamReader (pythonGH-142354)

(cherry picked from commit 0598f4a)

Co-authored-by: Maksym Kasimov <39828623+kasimov-maxim@users.noreply.github.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
bkap123 pushed a commit to bkap123/cpython that referenced this pull request Feb 28, 2026
…a from StreamReader (python#142354)

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
kumaraditya303 added a commit that referenced this pull request Feb 28, 2026
…ta from StreamReader (GH-142354) (#145363)

gh-142352: Fix `asyncio` `start_tls()` to transfer buffered data from StreamReader (GH-142354)
(cherry picked from commit 0598f4a)

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Co-authored-by: Maksym Kasimov <39828623+kasimov-maxim@users.noreply.github.com>
kumaraditya303 added a commit that referenced this pull request Feb 28, 2026
…rom StreamReader (GH-142354) (#145364)

[3.13] gh-142352: Fix `asyncio` `start_tls()` to transfer buffered data from StreamReader (GH-142354)
(cherry picked from commit 0598f4a)

Co-authored-by: Maksym Kasimov <39828623+kasimov-maxim@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

topic-asyncio type-bug An unexpected behavior, bug, or error

Projects

None yet

Development

Successfully merging this pull request may close these issues.

asyncio: start_tls() loses buffered data from StreamReader

5 participants