Skip to content

ROX-33569: Rotate TLS session ticket keys on TLS certificate rotation#19384

Open
kurlov wants to merge 2 commits intomasterfrom
akurlov/ROX-33569-rotate-central-tls-session-keys-along-with-certificate
Open

ROX-33569: Rotate TLS session ticket keys on TLS certificate rotation#19384
kurlov wants to merge 2 commits intomasterfrom
akurlov/ROX-33569-rotate-central-tls-session-keys-along-with-certificate

Conversation

@kurlov
Copy link
Member

@kurlov kurlov commented Mar 11, 2026

Description

When Central's default TLS certificate is rotated the certificate file watcher correctly detects the change and loads the new certificate into memory. However, a browser which typically caches TLS sessions continue to see the old certificate because TLS session ticket keys are not rotated along with the certificate.

The problem is that TLS Session ticket keys are static:

  • Go's TLS library generates session ticket keys at startup
  • When we Clone() the config, it copies these static keys
  • As long as the keys remain unchanged, clients can resume indefinitely
  • Even after certificate rotation, resumed sessions never see the new certificate

Originally I've experienced this issue when rotated certificate for the integration ACSCS. The browser still see the old certificate even after two days. Basically, I had to invalidate browser cache or close all tabs and wait for connection timeout to receive a new certificate.

Refreshing TLS Session ticket keys along with certificate rotation should prevent the above issue.

A sign of another potential issue

By default TLS Session ticket keys should be rotated every 24 hours. However since I saw the old certificate after two days it means the automatical rotation of session ticket keys does not work. If look at crypto/tls
/common.go ticketKeys
the cloned config will have static session ticket keys. Thus the keys will be valid indefinitely.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Added unit test

@rhacs-bot
Copy link
Contributor

Images are ready for the commit at 74b8748.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-298-g74b874869c.

}

// rotateSessionTicketKeys generates and sets new session ticket keys for the TLS config.
// Note: Calling SetSessionTicketKeys disables Go's automatic 24-hour session ticket key rotation.
Copy link
Member Author

@kurlov kurlov Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that automatic session ticket rotation is effectively disabled at the current implementaion. The cloned config does not automatically updates ticket keys. See PR description for more details

errNoTLSConfig = errors.New("no TLS config is available")

// sessionTicketKeyRotator is the function used to rotate session ticket keys. Used for testing.
sessionTicketKeyRotator = rotateSessionTicketKeys
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to Go's TLS API limitation. There is no easy way to test that SessionTicketKeys are updated. Hence validate that the update function is called at least

@kurlov
Copy link
Member Author

kurlov commented Mar 11, 2026

/retest

@codecov
Copy link

codecov bot commented Mar 11, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.63%. Comparing base (b0dfc15) to head (74b8748).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
pkg/mtls/certwatch/tls_config_holder.go 40.00% 4 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19384      +/-   ##
==========================================
- Coverage   49.64%   49.63%   -0.02%     
==========================================
  Files        2698     2701       +3     
  Lines      203075   203233     +158     
==========================================
+ Hits       100822   100879      +57     
- Misses      94732    94827      +95     
- Partials     7521     7527       +6     
Flag Coverage Δ
go-unit-tests 49.63% <40.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@openshift-ci
Copy link

openshift-ci bot commented Mar 11, 2026

@kurlov: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ocp-4-20-qa-e2e-tests 74b8748 link false /test ocp-4-20-qa-e2e-tests
ci/prow/ocp-4-12-qa-e2e-tests 74b8748 link false /test ocp-4-12-qa-e2e-tests
ci/prow/ocp-4-21-operator-e2e-tests 74b8748 link false /test ocp-4-21-operator-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@vladbologa vladbologa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I think this PR introduces a subtle side effect that could impact long running existing connections, but I'd say it's acceptable.

Consider this scenario:

  • long running Sensor - Central connection (more than 6 months)
  • Sensor certificates expired and it didn't reload them (refresh occurs 6 months in advance, but a pod restart is required to load new certs)
  • Central updates its TLS info and refreshes its session keys -> re authentication will now fail

So a client with expired certs would be disconnected, whereas before it kept running until the connection is broken and has to be re-established. Basically it surfaces a problem that was silently hiding. I'm not sure how likely this is to happen, but I thought it's worth mentioning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants