ROX-25946, ROX-25947: New cluster registration logic#13239
ROX-25946, ROX-25947: New cluster registration logic#13239mclasmeier merged 95 commits intomasterfrom
Conversation
|
Skipping CI for Draft Pull Request. |
|
Images are ready for the commit at 690cae2. To use with deploy scripts, first |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #13239 +/- ##
==========================================
+ Coverage 48.54% 48.56% +0.01%
==========================================
Files 2467 2474 +7
Lines 177811 178837 +1026
==========================================
+ Hits 86319 86849 +530
- Misses 84564 85045 +481
- Partials 6928 6943 +15
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚨 Try these New Features:
|
4d2b146 to
e744aa1
Compare
|
/retest |
porridge
left a comment
There was a problem hiding this comment.
LGTM overall, but I have some questions, some important change requests, and a whole bunch of optional nitpicks.
|
/test all |
7d1c267 to
769d838
Compare
|
FTR, the updated code, including the refactoring of the crs flow implementation, is tested successfully in a follow-up PR. |
Refactoring related to NewClient function.
|
/retest |
lvalerom
left a comment
There was a problem hiding this comment.
Finished a first pass. It looks good. I have a couple of overall questions:
- Out of curiosity, are these init containers executed every time the pod restarts or only the first time?
- Do I need to rebase any branches in order to test these changes? I'd like to play a little bit with different sensor/central versions.
The init containers are executed on every pod start. But the CRS container would terminate very quickly, because it sees that nothing is to do and the |
Yes, you can test this. Go to this branch for that. It contains the changes of this branch plus the necessary modifications to the Helm chart. |
…se time-capped context in CRS handshake
…ficateSet, adjust code and tests accordingly
|
@mclasmeier I was able to make sensor panic by:
I'm not sure if this is something that we even want to address. It looks to me like a very extreme case but it thought it was worth mentioning. |
Thank you for the testing! I think what you are outlining above seems like somewhat of an artificial breakage caused by combining code fragments from different versions that were not designed to play nice with each other. My guess is that in your scenario the following happens: ROX_DEPLOY_SENSOR_WITH_CRS=true (within the deploy scripts) causes the new Helm chart to use CRS. But the specified sensor (4.6.1) doesn't know how to do CRS, hence sensor would immediately try to start normally as a service and fails to find certificates? |
…g CA cert or missing service certificates
|
/retest |
2 similar comments
|
/retest |
|
/retest |
Agree.
Just to clarify. I didn't use the deploy scripts. I generated the helm charts with roxctl v4.7 and I had the |
Yeah sensor panics because it cannot find the certificates. There's only one way I can see we can avoid this (if we want to):
|
lvalerom
left a comment
There was a problem hiding this comment.
LGTM! Thank you for the changes!
Isn't sensor deliberately panicking in such a case? In any case, it would still require an upgrade on the user's side to benefit from whatever we change here.
Thank you, I think this one -- if feasible -- would be the way to go. But not as part of this PR. |
Hmm. |
Thank you so much! |
Description
This PR implements the CRS-based registration flow between central and sensor. The sensor-side of the CRS flow is implemented as a sub-command of the
sensorbinary for convenient testing and usage in the form of an init container in the sensor pod.Some reviewing notes:
emptyDir) so that they can be referenced as files and thus injected into the existing MTLS-based authentication flow.User-facing documentation
Testing and quality
Automated testing
How I validated my change
In order to keep PRs at a manageable size this PR only contains the changes for the flow between central and sensor. Actually leveraging this flow in our Helm charts is part of #12713. Thus, most testing has been done on the feature branch which contains the Helm chart changes, which is based on this branch.