ROX-8790: Create gRPC endpoint to generate Scanner certificate by juanrh · Pull Request #219 · stackrox/stackrox

juanrh · 2022-01-03T13:18:45Z

Description

Create new gRPC endpoint in Central that Sensor can use to ask for the TLS certificates for a local Scanner. See design doc for additional context.

Note: this also includes commits from #211, which hasn't been merged yet. The first relevant commit is 76d65ee.

Checklist

Investigated and inspected CI test results
Unit test and regression tests added
Evaluated and added CHANGELOG entry if required
Determined and documented upgrade steps

This creates a new gRPC endpoint, so it doesn't change the existing behaviour of the system; therefore there is no need to add either a CHANGELOG entry or upgrade instructions.

Testing Performed

Added additional unit test.

ghost · 2022-01-03T13:35:09Z

Tag for build #107531 is 3.67.x-267-g747e03b381.

💻 For deploying this image using the dev scripts, run the following first:

export MAIN_IMAGE_TAG='3.67.x-267-g747e03b381'

📦 You can also generate an installation bundle with:

docker run -i --rm stackrox/main:3.67.x-267-g747e03b381 central generate interactive > bundle.zip

🕹️ A roxctl binary artifact can be downloaded from CircleCI.

central/localscanner/service.go

porridge · 2022-01-04T09:23:39Z

central/localscanner/service.go

Shouldn't this be inferred from the connection credentials, rather than specified explicitly by the client?
As things are now, this would allow cluster A to generate scanner certs for cluster B, which shouldn't be allowed or necessary IMHO?

I am now getting the cluster id from authn.IdentityFromContextOrNil(ctx), following what is done in the certdistribution service. Please let me know if that works.

Here I followed what is done in serviceImpl.Communicate and serviceImpl.getClusterForConnection from central/sensor/service/service_impl.go to get the cluster id using authn.IdentityFromContext. However, after investigating how authn.IdentityFromContext works, I understand this would use as cluster id whatever was used for the Identifier field of the mtls.Subject that was used to issue the sensor certificates. Except for statically registered clusters, that should always be either centralsensor.RegisteredInitCertClusterID or centralsensor.EphemeralInitCertClusterID, not the cluster id that Central eventually assigns during the hello protocol and returns in the CentralHello message.

I also see that Sensor caches CentralHello.cluster_id in /var/cache/stackrox/cluster-id (with helmconfig.StoreCachedClusterID in centralCommunicationImpl.initialSync), and adds it to sensorHello.HelmManagedConfigInit.ClusterId when it is available (with helmconfig.LoadCachedClusterID in centralCommunicationImpl.sendEvents), which I guess happens on a restart of the sensor Pod, when the sensor has already performed the hello protocol once and already got a cluster id from Central. This is an example of an existing gRPC call in Central that receives the cluster id twice: once in the request context as part of the TLS auth, and again in an argument of the rpc operation. This makes sense to me because, in order to be able to get the cluster id just from the auth, Sensor would need to create new TLS certificates that use the cluster id returned in CentralHello.cluster_id for the Identifier field of the mtls.Subject, and then restart the connection to Central by calling client.Communicate with the new credentials, which is not currently being done.

So I was thinking of the following:

Adding a parameter for the cluster id to IssueLocalScannerCertsRequest, and in central/localscanner/service.go having authorizeAndGetClusterID do something similar to what is done in getClusterForConnection at central/sensor/service/service_impl.go, which uses centralsensor.GetClusterID to check that the cluster id obtained from the auth is compatible with the one sent in the request. That would cover "would allow cluster A to generate scanner certs for cluster B", at least partially.

In a future PR, for the client side I was thinking of adding a public method to sensor/common/sensor/sensor.go to return s.centralConnection, and passing that to the operator in sensor/kubernetes/main.go. From checking how grpcUtil.LazyClientConn works, that would block any operation on sensor that uses s.centralConnection until the connection setup is completed in sensor.gRPCConnectToCentralWithRetries, which calls s.centralConnection.Set(centralConnection). I could then get the cluster id from clusterid.Get(), which also blocks until centralCommunicationImpl.initialSync calls clusterid.Set(clusterID) with the cluster id returned in CentralHello. That would ensure that the operator only calls IssueLocalScannerCerts when the connection to Central is ready, using the cluster id returned by Central during hello.

Do you think that could work?
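As a rough illustration of the compatibility check proposed in the first bullet point, here is a minimal, self-contained sketch. It is not the actual centralsensor.GetClusterID implementation: the helper resolveClusterID, the function isInitCertID, and the two constant values are hypothetical placeholders chosen only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical placeholder values; the real init-cert cluster IDs are defined
// in the centralsensor package and are not reproduced here.
const (
	registeredInitCertClusterID = "init-registered"
	ephemeralInitCertClusterID  = "init-ephemeral"
)

// isInitCertID reports whether the ID taken from the certificate is one of
// the wildcard IDs carried by init-bundle certificates.
func isInitCertID(id string) bool {
	return id == registeredInitCertClusterID || id == ephemeralInitCertClusterID
}

// resolveClusterID is a hypothetical helper that reconciles the cluster ID
// sent in the request with the one extracted from the TLS identity.
func resolveClusterID(idFromRequest, idFromCert string) (string, error) {
	if idFromCert == "" {
		return "", errors.New("no cluster ID in certificate")
	}
	if isInitCertID(idFromCert) {
		// Wildcard certificate: only the request can name the cluster.
		if idFromRequest == "" {
			return "", errors.New("init certificate requires an explicit cluster ID")
		}
		return idFromRequest, nil
	}
	// Cluster-specific certificate: the request may omit the ID,
	// but it must not contradict the certificate.
	if idFromRequest != "" && idFromRequest != idFromCert {
		return "", fmt.Errorf("cluster ID %q in request does not match %q in certificate",
			idFromRequest, idFromCert)
	}
	return idFromCert, nil
}

func main() {
	// Cluster A tries to request certificates for cluster B.
	id, err := resolveClusterID("cluster-b", "cluster-a")
	fmt.Println(id, err)
}
```

Note that with a wildcard init certificate the check can only trust the ID in the request, which is why this covers the "cluster A requests certs for cluster B" case only partially.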

I agree with the first bullet point.

As for the second bullet point, I'm not sure all this complexity is worth it currently. Perhaps for a request from a sensor that presents an init cert cluster ID we just create init-style (wildcard ID) scanner certificates?

After all, if Mallory has an init bundle, he could impersonate any cluster created with that bundle. (Potentially any other cluster as well, I don't know whether our code checks that.)

Once we tackle https://issues.redhat.com/browse/ROX-8091 we could tighten this up.

WDYT @SimonBaeumer ?

Hm, yeah the complexity worries me too.
In the beginning I had the idea of an "embedded" operator running orthogonal to Sensor but given authn/authz complexities emerging now I would rather take a step back and ignore this for now.
Running the secret creation as part of Sensor itself and using the already established bi-directional connection to me looks like the better approach without speculative improvements.

The only downside I see is that the cert is generated only after the initial syncs.

For using the gRPC stream the message needs to be dispatched which is done in the connection_impl.go:runRecv which calls the connection_impl.go:handleMessage.

Hi Simon,

I've saved the previous prototype using the embedded operator as client in juanrh/ROX-8752-embedded-operator.
I'll now pivot to the new approach you outline. Just so we are on the same page, let me write down what I understand here:

The client will be now sensor/common/sensor/sensor.go

Instead of a new gRPC service LocalScannerService we use the existing bidirectional rpc Communicate from service SensorService, which implies adding new variants to message MsgFromSensor and message MsgToSensor, and using the methods you point out to handle the new messages.

With initial sync I understand you mean the Hello protocol, because s.manager.HandleConnection(server.Context(), sensorHello, cluster, eventPipeline, server) is only called in serviceImpl.Communicate after the Central hello response is sent in server.Send(&central.MsgToSensor{Msg: &central.MsgToSensor_Hello{Hello: centralHello}})

I'll send a draft PR when I have a prototype to foster alignment.

The client will be now sensor/common/sensor/sensor.go

Yes, more specifically started from the Sensor start-up routine and not outside of it (i.e. in main.go).

Instead of a new gRPC service LocalScannerService we use the existing bidirectional rpc Communicate from service SensorService, which implies adding new variants to message MsgFromSensor and message MsgToSensor, and using the methods you point out to handle the new messages.

Exactly

With initial sync I understand you mean the Hello protocol, because s.manager.HandleConnection(server.Context(), sensorHello, cluster, eventPipeline, server) is only called in serviceImpl.Communicate after the Central hello response is sent in server.Send(&central.MsgToSensor{Msg: &central.MsgToSensor_Hello{Hello: centralHello}})

Yes. To clarify I meant the time to wait until the first scanner cert can be generated in Central.

porridge

I think this makes sense. Can you please rebase on the tip of the other branch for a final pass?

central/localscanner/certificates.go

central/localscanner/service.go

Also add validations on caller identity

proto/internalapi/central/local_scanner.proto

SimonBaeumer · 2022-01-11T09:03:12Z

central/localscanner/service.go

Hm, yeah the complexity worries me too.
In the beginning I had the idea of an "embedded" operator running orthogonal to Sensor but given authn/authz complexities emerging now I would rather take a step back and ignore this for now.
Running the secret creation as part of Sensor itself and using the already established bi-directional connection to me looks like the better approach without speculative improvements.

The only downside I see is that the cert is generated only after the initial syncs.

For using the gRPC stream the message needs to be dispatched which is done in the connection_impl.go:runRecv which calls the connection_impl.go:handleMessage.

SimonBaeumer

Cool to see how you already navigate around the source!

Imho let's handle the cert gen message inside the stream instead of doing it outside to prevent duplicating logic and issues due to cluster initialization.
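To make "handling the message inside the stream" concrete, here is a minimal sketch of dispatching on a oneof-style message variant with a Go type switch. All type names (msgFromSensor, eventMsg, issueLocalScannerCertsRequest) are hypothetical stand-ins for the generated protobuf types, and this handleMessage is a simplified illustration rather than the code in connection_impl.go.

```go
package main

import "fmt"

// isMsg marks the variants that may appear inside msgFromSensor,
// mimicking how generated protobuf code models a oneof.
type isMsg interface{ isMsg() }

// msgFromSensor is a hypothetical stand-in for the generated wrapper message.
type msgFromSensor struct {
	msg isMsg
}

// eventMsg stands in for an already existing message variant.
type eventMsg struct{ id string }

// issueLocalScannerCertsRequest stands in for the newly added variant.
type issueLocalScannerCertsRequest struct{ requestID string }

func (eventMsg) isMsg()                      {}
func (issueLocalScannerCertsRequest) isMsg() {}

// handleMessage dispatches on the concrete variant carried by the message
// and returns a description of the action taken.
func handleMessage(m msgFromSensor) (string, error) {
	switch v := m.msg.(type) {
	case eventMsg:
		return "event:" + v.id, nil
	case issueLocalScannerCertsRequest:
		// The new request is handled on the same, already authenticated stream.
		return "issue-certs:" + v.requestID, nil
	default:
		return "", fmt.Errorf("unsupported message type %T", v)
	}
}

func main() {
	out, err := handleMessage(msgFromSensor{msg: issueLocalScannerCertsRequest{requestID: "req-1"}})
	fmt.Println(out, err)
}
```

Because the request arrives on the stream that has already completed the hello protocol, the handler can reuse the cluster ID associated with the connection instead of re-deriving it from the credentials.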

central/localscanner/certificates_test.go

central/localscanner/service.go

pkg/mtls/crypto.go

SimonBaeumer

This looks already very promising!
@misberner could you take a look at the review & changes as well?

central/localscanner/certificates.go

central/localscanner/certificates_test.go

central/sensor/service/connection/connection_impl.go

SimonBaeumer · 2022-01-12T15:28:35Z

central/sensor/service/connection/connection_impl.go

+	certs, err := localscanner.IssueLocalScannerCerts(namespace, c.clusterID)
+	errMsgTemplate := "Error issuing local Scanner certificates for cluster with ID %s and namespace %s"
+	if err != nil {
+		return errors.Wrapf(err, errMsgTemplate, c.clusterID, namespace)


Hm, with this approach Sensor never gets notified that the certificate was not generated. Imho Sensor should always expect a valid response, even if that response contains an error.

That's right. I modified IssueLocalScannerCertsResponse to use a oneof that returns either the certificates or the error message.

central/sensor/service/connection/connection_impl.go

misberner · 2022-01-12T22:28:06Z

This looks already very promising! @misberner could you take a look at the review & changes as well?

I'm happy to TAL, but given the size of this PR I won't be able to do so before Friday at best. Is there maybe some subset of the PR that you specifically would like me to take a look at?

juanrh · 2022-01-13T16:43:46Z

This looks already very promising! @misberner could you take a look at the review & changes as well?

I'm happy to TAL, but given the size of this PR I won't be able to do so before Friday at best. Is there maybe some subset of the PR that you specifically would like me to take a look at?

Thanks for taking a look @misberner. Here we are adding new variants to message MsgFromSensor and message MsgToSensor so Sensor can make a request to Central through the existing rpc Communicate. So I'd ask your feedback about the changes to central/sensor/service/connection/connection_impl.go and the proto files.

`envisolator` doesn't enable env vars in release builds

central/localscanner/certificates.go

central/localscanner/certificates_test.go

central/sensor/service/connection/connection_impl.go

central/sensor/service/connection/connection_test.go

pkg/mtls/crypto.go

proto/internalapi/central/local_scanner.proto

misberner · 2022-01-13T23:24:31Z

proto/internalapi/central/local_scanner.proto

+
+package central;
+
+message LocalScannerCertificates {


So, per the design doc, the reason to not request the certificates via the SensorHello protocol is that the local scanner cert's validity might be shorter than the lifetime of the sensor<>central connection. That makes sense. However, I find it unfortunate that you deviate from the format that significantly. Encoding the service name into the file name might not be the most elegant thing, but neither is sending two distinct ca certificates even if they can only ever possibly be the same.

If you want to keep it more explicitly structured, I'd still suggest to generalize this message to message ServiceCertificate (there really isn't anything scanner-specific here), and then structure request/response as follows:

message TypedServiceCertificate {
  storage.ServiceType service_type = 1;
  ServiceCertificate cert = 2;
}

message IssueServiceCertsRequest {
  repeated storage.ServiceType service_types = 1;
}

message IssueServiceCertsResponse {
  repeated TypedServiceCertificate service_certs = 1;
}

this will hardly be any harder to use, but way more future-proof. However, I would maintain that the map<string, string> cert_bundle approach is perfectly fine as well.

You are right. I modified the protos to avoid sending the CA twice, and to avoid local-scanner-specific stuff when it's not needed. However, I still think IssueLocalScannerCertsRequest makes sense, because local scanner certificates should have a specific expiration time measured in days. Let me know what you think.

Makes sense about the expiration, though I'd say that could be addressed by an extra parameter. In the end, we might want to switch to more short-lived certificates for other services as well.

Also add test to check the right feature flag is being used.

Also extend test cases to cover more missing parameters combinations

central/localscanner/certificates_test.go

central/sensor/service/connection/connection_test.go

pkg/mtls/crypto.go

Co-authored-by: Malte Isberner <2822367+misberner@users.noreply.github.com>

instead of assertions with info

SimonBaeumer

Looks good to me!
Please wait for @porridge comment on the race condition in subtests before merging 👍

SimonBaeumer · 2022-01-17T10:23:06Z

central/sensor/service/connection/connection_test.go

+	for tcName, tc := range testCases {
+		s.Run(tcName, func() {


@porridge Is there a possible race here? I remember you working on a problem like this in tests.

There was a rumour from someone (unfortunately I don't remember who that was) that the suite tests are run in parallel by default, in which case this would be racy. I vaguely remember the same person later claimed they might have been wrong. So it's inconclusive.
If you'd like to be safe, just add tc := tc right before s.Run() and mention something like // TODO(ROX-8730): just in case this is racy, so we remember to remove it if this turns out not to be necessary.

Never heard that, but I can promise you that suite tests are not run in parallel, and in fact cannot be run in parallel

There. So it's safe.

I'll make that change anyway, as it doesn't hurt
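For reference, the tc := tc idiom discussed here can be sketched as below. The example uses plain closures instead of a testify suite so that it is self-contained; testCase and collectSubtests are hypothetical names. The copy only matters when a closure runs after the loop has advanced (for example with parallel subtests); a subtest body that runs synchronously inside the loop sees the right value either way, and since Go 1.22 loop variables are per-iteration, so the copy is redundant but harmless there.

```go
package main

import "fmt"

// testCase is a hypothetical table entry.
type testCase struct{ input, want int }

// collectSubtests builds one closure per table entry. The closures are
// executed only after the loop has finished, which is the situation in
// which capturing a shared loop variable would be a bug before Go 1.22.
func collectSubtests(cases map[string]testCase) map[string]func() bool {
	subtests := make(map[string]func() bool, len(cases))
	for name, tc := range cases {
		tc := tc // per-iteration copy captured by the closure below
		subtests[name] = func() bool { return tc.input*2 == tc.want }
	}
	return subtests
}

func main() {
	cases := map[string]testCase{
		"one": {input: 1, want: 2},
		"two": {input: 2, want: 4},
	}
	for name, run := range collectSubtests(cases) {
		fmt.Println(name, run())
	}
}
```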

porridge

Just a few optional nitpicks inline.
This looks great!

central/sensor/service/connection/connection_impl.go

central/sensor/service/connection/connection_test.go

central/localscanner/certificates.go

This reverts commit af418a3.

juanrh requested review from SimonBaeumer and porridge January 3, 2022 13:18

github-actions bot added the area/central label Jan 3, 2022

porridge reviewed Jan 4, 2022

juanrh marked this pull request as ready for review January 4, 2022 16:54

porridge requested changes Jan 5, 2022

Juan Rodriguez Hortala added 10 commits January 5, 2022 11:46

Initial code for central service to generate local scanner certificates

f9dbf7b

fix style issues

e19fc2e

Add unit test for LocalScannerService

1d5fda9

Simplify IssueLocalScannerCerts

c3b0959

remove redundant field

71d7f13

Infer cluster id from request context

2a087bd

Reorder func to have entry point on top, and aux funcs dowmn

7a35476

Memoize CAForSigning

ff0aee4

fix code style

d1b7d62

Number proto message fields starting in 1

3e23314

juanrh force-pushed the juanrh/ROX-8790 branch from cf581a1 to 3e23314 Compare January 5, 2022 13:43

juanrh requested a review from porridge January 5, 2022 13:43

porridge requested changes Jan 5, 2022

Properly infer cluster id from request context

b1d908e

Also add validations on caller identity

juanrh requested a review from porridge January 5, 2022 16:12

SimonBaeumer reviewed Jan 11, 2022

SimonBaeumer suggested changes Jan 11, 2022

Replace new gRPC service with new messages in SensorService.Communicate

225f4db

juanrh mentioned this pull request Jan 11, 2022

ROX-8752: Keep local scanner certificates up to date #278

Closed

4 tasks

fix checkstyle

2057450

SimonBaeumer suggested changes Jan 12, 2022

SimonBaeumer requested a review from misberner January 12, 2022 15:36

Juan Rodriguez Hortala added 3 commits January 13, 2022 11:48

enable IssueLocalScannerCerts feature flag for tests

c3c97b3

Return a failure message on local certificate issue error

6951eb8

Add test for processIssueLocalScannerCertsRequest

b88111d

Skip tests when feature flag dependency is disabled

83e208c

`envisolator` doesn't enable env vars in release builds

misberner suggested changes Jan 13, 2022

Juan Rodriguez Hortala added 8 commits January 14, 2022 09:22

Use features.LocalImageScanning always directly

8921d85

Also add test to check the right feature flag is being used.

Quote namespace in error log

4fe13ce

Use assertion methods directly insted of through s.Assert()

1f5bd75

Make sure the result of handleMessage is always checked

1a1b1fd

Add format to proto field names for certs and keys

7bfbd6c

get namespace from sensor hello insteadof request parameter

a2b5cef

Also extend test cases to cover more missing parameters combinations

Avoid redundancies and generalize proto messages

a771af3

Add request id for pairing responses with their requests

40a6fa0

misberner approved these changes Jan 14, 2022

Juan Rodriguez Hortala and others added 5 commits January 14, 2022 14:09

Use Len assertion instead of Equal of len

b1703fe

Co-authored-by: Malte Isberner <2822367+misberner@users.noreply.github.com>

Use require to avoid panic later on in test

2e3b639

Co-authored-by: Malte Isberner <2822367+misberner@users.noreply.github.com>

use proper list comparison instead of a loop and len check

06e8eaa

use subtest for all test tables

98bb61b

instead of assertions with info

use require instead of assert to prevent potential test panic

7356d0e

SimonBaeumer approved these changes Jan 17, 2022

Make test resilient to eventual parallel test support in suites

af418a3

porridge approved these changes Jan 17, 2022

Juan Rodriguez Hortala added 3 commits January 17, 2022 14:04

improve readability of certificate creation code

df63203

Revert "Make test resilient to eventual parallel test support in suites"

8139ea7

This reverts commit af418a3.

explicitly return named return values

747e03b

juanrh merged commit 2db707e into master Jan 17, 2022

juanrh deleted the juanrh/ROX-8790 branch January 17, 2022 13:37

RTann pushed a commit that referenced this pull request Apr 6, 2022

ROX-8790: Create gRPC endpoint to generate Scanner certificate (#219)

ff80e8f

vladbologa mentioned this pull request Sep 18, 2024

ROX-25949: gRPC endpoint to return Secured Cluster TLS certificates #12740

Merged

9 tasks
