Skip to content

ROX-8752: Keep local scanner certificates up to date#278

Closed
juanrh wants to merge 45 commits intomasterfrom
juanrh/ROX-8752
Closed

ROX-8752: Keep local scanner certificates up to date#278
juanrh wants to merge 45 commits intomasterfrom
juanrh/ROX-8752

Conversation

@juanrh
Copy link
Contributor

@juanrh juanrh commented Jan 11, 2022

Description

This is a follow up of #219, only commit from 88dd23a4a on are relevant.

In order to keep local scanner certificates up to date, this change adds a new SensorComponent to Sensor, exposed by localscanner.NewLocalScannerOperator, that periodically fetches the certificates from Central, using the bidirectional RPC Communicate.

This is a first version to discuss the general approach. In particular naming of the new SensorComponent should be revised.

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required
  • Determined and documented upgrade steps

If any of these don't apply, please comment below.

Testing Performed

TODO(replace-me)

@juanrh juanrh changed the base branch from master to juanrh/ROX-8790 January 12, 2022 08:35
@ghost
Copy link

ghost commented Jan 12, 2022

Tag for build #107572 is 3.67.x-276-gabefd5cf18.

💻 For deploying this image using the dev scripts, run the following first:

export MAIN_IMAGE_TAG='3.67.x-276-gabefd5cf18'

📦 You can also generate an installation bundle with:

docker run -i --rm stackrox/main:3.67.x-276-gabefd5cf18 central generate interactive > bundle.zip

🕹️ A roxctl binary artifact can be downloaded from CircleCI.

Copy link
Contributor

@SimonBaeumer SimonBaeumer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of the logic is implemented twice for both secrets. How about creating a small component which allows to register timers on a secret to manage the tls cert?

This CertManager (or however it is called) implements a Start() and Update() func to start the timer managing the resource. All the CertManager needs to know is

  • which secret to manage?
  • how to update the certs in the secret? (maybe an anonymous func solves that or simply adding the field names as input for the first implementation)
  • timer duration (may also be depended on cert content expiration)

The certmanager itself

  • sends the request to central if the timer expires
  • updates the cert if a response message was received
    • handles also errors if no response was received, so small storage to keep track of already send requests

The localscannerOperatorImpl only takes care of constructing and starting the different cert managers (this could be used later also for other components).

Comment on lines 30 to 31
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the secret is scanner-tls and scanner-db-tls. For the time being let us add these secrets like that and only persist them if the scanner-tls is
A) does not already exists
or B) has the ownership of the Sensor

Comment on lines 119 to 120
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the following behaviour from maintaining the scanner tls secrets:

  • Create secret if not found (set owner ref on object, may happen on sensor startup)
  • Update secret if found (and owner ref matches sensor deployment)
  • Don't update secret if it exists and does not exist

OwnerReferneces on objects are necessary to determine if a resource is controlled by something living in the cluster and to garbage collect if the owning object reference can't be found.
In this case, if the Sensor deployment is deleted the scanner-tls secret should be deleted too. You can find the k8s GC docs here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may missed that but how is the first certificate issued if no scanner certs existed beforehand (when a SecuredCluster is installed into a fresh cluster)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was written to renew the secrets data if the secrets exists, and do nothing if the secrets don't exist. I'll change that according to your comments above.
In any case, I dry-run tested this version by creating empty secrets with kubectl -n stackrox create secret generic scanner-local-tls. For an empty secret we have secret.Data[mtls.ServiceCertFileName] returning the zero value for []byte. The func getScannerSecretDuration passes that empty slice to helpers.ParseCertificatePEM and gets an error, which leads to getScannerSecretDuration returning 0 as the time until the next refresh. Start is calling o.scheduleLocalScannerSecretsRefresh() that calls o.doScheduleLocalScannerSecretsRefresh(getScannerSecretsDuration(localScannerCredsSecret, localScannerDBCredsSecret)) so o.doScheduleLocalScannerSecretsRefresh schedules the secret refresh immediately.
Long story short, in this version of the code the operator starts reading the secrets. If there are no certs in the secret data then it assumes the secrets are corrupt and issues new certificates; if there is some certs in the secret data then it schedules a refresh accordingly to the certificates expiration

Base automatically changed from juanrh/ROX-8790 to master January 17, 2022 13:37
@juanrh
Copy link
Contributor Author

juanrh commented Jan 18, 2022

Splitting this PR into pieces, see subtasks of https://issues.redhat.com/browse/ROX-8752.
First PR of the set replacing this is #350

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants