Skip to content

ROX-9148: Create cert refresher for business logic of local scanner certificate refresh#474

Merged
juanrh merged 30 commits intomasterfrom
juanrh/ROX-9148
Feb 17, 2022
Merged

ROX-9148: Create cert refresher for business logic of local scanner certificate refresh#474
juanrh merged 30 commits intomasterfrom
juanrh/ROX-9148

Conversation

@juanrh
Copy link
Contributor

@juanrh juanrh commented Jan 28, 2022

Description

This is a fragment of #400 that has the business logic for the client side of keeping local scanner certificates updated, that is not related to a sensor component.

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required
  • Determined and documented upgrade steps

No need to add a CHANGELOG entry or update steps.

Testing Performed

New unit tests

go test -count=100 -race -v -p=1 github.com/stackrox/rox/sensor/kubernetes/localscanner -run TestCertRefresher
go test -count=100 -race -v -p=1 github.com/stackrox/rox/pkg/concurrency -run TestRetryTicker

@ghost
Copy link

ghost commented Jan 28, 2022

Tag for build #216678 is 3.68.x-235-g3f4e8b1ec4.

💻 For deploying this image using the dev scripts, run the following first:

export MAIN_IMAGE_TAG='3.68.x-235-g3f4e8b1ec4'

📦 You can also generate an installation bundle with:

docker run -i --rm stackrox/main:3.68.x-235-g3f4e8b1ec4 central generate interactive > bundle.zip

🕹️ A roxctl binary artifact can be downloaded from CircleCI.

Copy link
Contributor

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice so far, looking forward to a version rebased on all the dependent PRs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is a lie in case timeToRefresh was > 0, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timeToRefresh is the time for the next refresh, we just refreshed the certificates in line 85 with r.refreshCertificates(ctx). I'll rename timeToRefresh to timeToNextRefresh to make this more clear

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that if we landed here because refreshCertificates returned early, on line 108 then nothing got refreshed this time round, and this message is a lie. I think we should log this around line 123 instead, and even potentially rename refreshCertificates to maybeRefreshCertificates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right, done

@juanrh juanrh changed the base branch from juanrh/ROX-9129 to master February 7, 2022 18:51
@juanrh juanrh requested a review from porridge February 7, 2022 18:52
@juanrh juanrh marked this pull request as ready for review February 7, 2022 18:52
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is that if we landed here because refreshCertificates returned early, on line 108 then nothing got refreshed this time round, and this message is a lie. I think we should log this around line 123 instead, and even potentially rename refreshCertificates to maybeRefreshCertificates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an error right after deployment, when it's expected, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in #457 the repository should already populate the secret data, so this would be unexpected. Although I wonder if this could also happen even after the repository creates the secrets initialising the secret data with the certificates, due to k8s eventual consistency. If that is the case we'd have the certificates emitted twice in a row, do you think that could happen too?

For the same, I think scanner should be able to wait without crashing until the secrets are created and populated. Is that correct or am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make it such that when a secret is present, it contains expected data. In other words we should never create an empty secret.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand you mean that in that case we should not treat ErrEmptyCertificate differently to other errors. The current proposal for #457 creates secrets with data populated with the certificates. So I think we can go two paths here:

  1. Treat errors in getTimeToRefresh as unrecoverable: so getTimeToRefresh just returns 0, err for any error returned by r.getCertsRenewalTime(certificates). This implies an infinite loop of errors when the secret data is corrupted. The current proposal for ROX-9128: Create certSecretsRepo type to store certificates in k8s secrets #457 only creates a secret when it doesn't exist, so restarting Sensor wouldn't fix that
  2. Treat errors in getTimeToRefresh as recoverable, and recover by returning 0, nil for any error returned by r.getCertsRenewalTime(certificates). This can also leads to an infinite loop of requesting certificates, if the certificate renewal process always fails.

I tend to prefer option 2, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the repository should already populate the secret data, so this would be unexpected

I understand you mean that in that case we should not treat ErrEmptyCertificate differently to other errors.

Yes, if we never create an empty secret then we should not be having a special error instance for an empty secret. Just treat it as invalid cert ripe for a refresh.

Copy link
Contributor Author

@juanrh juanrh Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I first modified getTimeToRefresh to never fail and always recover using 0 as timeToNextRefresh.
But then I realised that if we get an error in the second call to r.getTimeToRefresh in the body of ensureCertificatesAreFresh, for the new certificates returned by r.requestCertificates(ctx), then we should not return 0, nil to the ticker that is calling RefreshCertificates, because that would imply an immediate retry, thus hammering central with certificate requests just because the refresher doesn't understand the certificates. So I have added a recoverFromErr parameter to getTimeToRefresh to optionally ignore errors, so in ensureCertificatesAreFresh for the the first call we ignore errors and refresh the certificate immediately if we cannot parse the certificates stored in the secrets, but for the second call where we parse the new certificates we don't recover from an error so the backoff mechanism in the ticker can work.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there are no other errs in this function:

Suggested change
renewalTime, renewalTimeErr := r.getCertsRenewalTime(certificates)
renewalTime, err := r.getCertsRenewalTime(certificates)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this interface is all that useful...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this would be useful for testing the sensor component that launches the refresher, as it would allow to mock the refresher easily

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but an interface should be defined in the user's package, not in the implementer's one.
https://blog.chewxy.com/2018/03/18/golang-interfaces/

Here it's the same package, but arguably the same principles should apply when we split things across files.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I'm understanding here is that the suggestion is moving the certRefresher interface to the sensor component that uses it, e.g. to file tls_issuer.go in https://github.com/stackrox/stackrox/pull/565/files#diff-81bd3808339e34da64a1fa5e66c27f0ec9d75a8a51956b1773c4e0c8b6295c08. I'm thus removing the interface from this PR

Comment on lines 37 to 38
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ticker := concurrency.NewRetryTicker(refresher.RefreshCertificates, timeout, backoff)
return ticker
return concurrency.NewRetryTicker(refresher.RefreshCertificates, timeout, backoff)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 58 to 59
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// PutServiceCertificates TODO replace by serviceCertificatesRepo from ROX-9128
type PutServiceCertificates interface {
// certificatesRepo TODO replace by serviceCertificatesRepo from ROX-9128
type certificatesRepo interface {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@juanrh juanrh requested a review from porridge February 8, 2022 16:49
Comment on lines 22 to 23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that because the last line of parameter list is indented the same as the first line of code, I find it hard to tell them apart visually. WDYT about inserting a blank line as the first line of function body whenever there is a line break inside the parameter list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, I agree that is more readable

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only passed as the function to the ticker and not accessed anywhere else (apart from unit tests), right?
In that case I guess it does not need to be exported? 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed to lower case

Comment on lines 53 to 65
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the entire API is just creating a ticker and delegating Stop and Start to it, then perhaps the refresher should not be wrapping it in the first place? Perhaps we can simplify the whole thing by having newCertRefresher return an appropriately configured ticker instead? WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense, and it simplifies the code significantly, done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the repository should already populate the secret data, so this would be unexpected

I understand you mean that in that case we should not treat ErrEmptyCertificate differently to other errors.

Yes, if we never create an empty secret then we should not be having a special error instance for an empty secret. Just treat it as invalid cert ripe for a refresh.

@juanrh juanrh requested a review from porridge February 10, 2022 15:39
Copy link
Contributor

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, apart from a few nitpicks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: corrupted might be too strong a word for the case where CA certs differ. Perhaps just inconsistent state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed this to

log.Errorf("local scanner certificates are in an inconsistent state, will refresh certificates immediately: %s", getCertsErr)

Comment on lines 26 to 25
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
timeout time.Duration, backoff wait.Backoff) concurrency.RetryTicker {
return concurrency.NewRetryTicker(func(ctx context.Context) (timeToNextTick time.Duration, err error) {
timeout time.Duration, backoff wait.Backoff) concurrency.RetryTicker {
return concurrency.NewRetryTicker(func(ctx context.Context) (timeToNextTick time.Duration, err error) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a consistent wrapping style for the same function prototypes (i.e. here vs refreshCertificates). For 4 parameters I'd prefer the style used here, but up to you as long as it's consistent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now using longer lines for refreshCertificates too

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// recoverFromErr to so the ticker knows this is an error, and it retries with backoff.
// recoverFromEr=false so the ticker knows this is an error, and it retries with backoff.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH my preference would be to inline this function. The extra recoverFromErr parameter seems to introduce more complexity than is solved by having this standalone function. At least I had a hard time figuring out what is going on :-)

Copy link
Contributor Author

@juanrh juanrh Feb 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I like how your comments keep making the code smaller!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ErrDifferentCAForDifferentServiceTypes TODO replace by ROX-9128
// ErrDifferentCAForDifferentServiceTypes TODO(ROX-9128): <description why>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan is not merging this even with a TODO, because that would cause a conflict with #457 because both would define the same package variable and break the build. Merging this PR is blocked by merging #457

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

references to ROX-9128 removed after rebase

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ErrDifferentCAForDifferentServiceTypes = errors.New("found different CA PEM in secret Data for different service types")
ErrDifferentCAForDifferentServiceTypes = errors.New("found different CA PEM in secret Data field for different service types")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ErrMissingSecretData TODO replace by ROX-9128
// ErrMissingSecretData TODO(ROX-9128): <description>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

references to ROX-9128 removed after rebase

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mock right? Can this be removed?
If you add temporary code try to mark it as TODO(do-not-merge): ... to prevent accidental merges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not a mock, per another review comment I moved the interface definition to the place where they are used, see #474 (comment) for context

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean we add the interfaces where they are needed?
Tbh I've never seen this before it and seems to not make sense anymore if we have another package using the same interface as a dependency (in structs or function signatures).
cc: @porridge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. the Reader is also defined in io.go where the implementation lives

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this particular case, another option is putting all interfaces in a new file sensor/kubernetes/localscanner/types.go, what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@porridge Do you have any production examples? Preferably in a well-known project?
Also took a look at the uber style guide where it is not mentioned.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SimonBaeumer here's the most authoritative source I could find: https://github.com/golang/go/wiki/CodeReviewComments#interfaces

This is also alluded to by people like Rob Pike in places like https://groups.google.com/g/golang-nuts/c/mg-8_jMyasY/m/lo-kDuEd540J which explains the philosophical background of why this rule was created in the first place.
I'm also reasonably sure I've seen a talk by one of the Go creators that mentioned this, but I cannot find it :-(

Some more places where it's mentioned by other people:

https://blog.logrocket.com/exploring-structs-interfaces-go/#other-interface-types-considerations
https://commandercoriander.net/blog/2018/03/30/go-interfaces/

Of course there are cases where this is hard to apply, but I don't think this is one of them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the talk I had in mind, but fun to watch: https://www.youtube.com/watch?v=PAAkCSZUG1c&t=318s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very interesting discussion. Some golang concepts like interfaces or methods are sneaky to me because they sound similar to Java or C++ concepts, but are actually quite different, so this material is really useful

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, thanks for sharing @porridge. I was not aware of this, need to dig deeper later 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// refreshCertificates determines refreshes the certificate secrets if needed, and returns the time
// refreshCertificates refreshes the certificate secrets if needed, and returns the time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 81 to 85
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs to be replaced with ensureSecrets after the repo secrets PR is merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@SimonBaeumer
Copy link
Contributor

I did not review tests as waiting for updating the interface for ensure secrets.

@juanrh juanrh requested a review from SimonBaeumer February 15, 2022 12:14
Copy link
Contributor

@SimonBaeumer SimonBaeumer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already looks very good.
Small remarks on the interface discussion and mocking (using generated mocks instead).
Good job here!

Comment on lines +125 to +126
time.Sleep(capTime)
assert.Nil(t, ticker.getTickTimer())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have something like an assert.Until function?
Or retry this until either ticker.getTickerTimer==null or timeout exceeded?
I looks like a general pattern here. Would be great to have something like that here. (Must not be part of this PR, instead adding it as a separate ticket.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen that in testify, but we can implement something like that with PollWithTimeout

func assertTickerEventuallyStops(t *testing.T, ticker *retryTickerImpl) {
	ok := PollWithTimeout(func() bool {
		return ticker.getTickTimer() == nil
	}, 10 * time.Millisecond, capTime)
	assert.True(t, ok, "ticker should eventually stop")
}

I have checked that this catches the error for TestRetryTickerStop and TestRetryTickerStopsOnNonRecoverableErrors when changing scheduleTick to work wrong, so I'm using this instead.

Nice catch, this should make our tests slightly faster

}

func (s *certRefresherSuite) TestRefreshCertificatesRequestCertificatesFailure() {
certificates := (*storage.TypedServiceCertificateSet)(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
certificates := (*storage.TypedServiceCertificateSet)(nil)
var certificates *storage.TypedServiceCertificateSet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

func (s *certRefresherSuite) TestRefreshCertificatesRequestCertificatesResponseFailure() {
certificates := (*storage.TypedServiceCertificateSet)(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
certificates := (*storage.TypedServiceCertificateSet)(nil)
var certificates *storage.TypedServiceCertificateSet

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean we add the interfaces where they are needed?
Tbh I've never seen this before it and seems to not make sense anymore if we have another package using the same interface as a dependency (in structs or function signatures).
cc: @porridge

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I.e. the Reader is also defined in io.go where the implementation lives

return 0, nil
}
if k8sErrors.IsNotFound(getCertsErr) {
log.Warnf("local scanner certificates are missing, "+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Warnf("local scanner certificates are missing, "+
log.Warnf("local scanner certificates not found, " +

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@SimonBaeumer SimonBaeumer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the interface location this looks good. Imho if we don't have evidence to put in the cert_refresher let's put it back to implementation side.

@juanrh juanrh merged commit 5f295b0 into master Feb 17, 2022
@juanrh juanrh deleted the juanrh/ROX-9148 branch February 17, 2022 08:34
RTann pushed a commit that referenced this pull request Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants