ROX-9589 - Remove scanner slim prefixes#820
Conversation
|
Tag for build #277239 is 💻 For deploying this image using the dev scripts, run the following first: export MAIN_IMAGE_TAG='3.69.x-19-gc3f5e3a64e'📦 You can also generate an installation bundle with: docker run -i --rm stackrox/main:3.69.x-19-gc3f5e3a64e central generate interactive > bundle.zip🕹️ A |
| "wrong owner for scanner and scanner db secrets": {secretNames: []string{"scanner-slim-tls", "scanner-db-slim-tls"}}, | ||
| "wrong owner for scanner secret": {secretNames: []string{"scanner-tls"}}, | ||
| "wrong owner for scanner db secret": {secretNames: []string{"scanner-db-tls"}}, | ||
| "wrong owner for scanner and scanner db secrets": {secretNames: []string{"scanner-tls", "scanner-db-slim-tls"}}, |
There was a problem hiding this comment.
Remove scanner-db-slim-tls
There was a problem hiding this comment.
Is there another PR for this?
There was a problem hiding this comment.
No, this is a review comment of myself but did not fixed it yet.
RTann
left a comment
There was a problem hiding this comment.
LGTM, but with two questions
| [< if not .FeatureFlags.ROX_LOCAL_IMAGE_SCANNING >] | ||
| {{ include "srox.fail" "Scanner's slim mode currently not supported" }} | ||
| [< end >] | ||
| {{ $_ := set $scannerCfg "name" "scanner-slim" }} |
There was a problem hiding this comment.
Somewhat unrelated, but I see full mode mention certs. Do we not do that here because the slim scanners get their certs from Central?
| "wrong owner for scanner and scanner db secrets": {secretNames: []string{"scanner-slim-tls", "scanner-db-slim-tls"}}, | ||
| "wrong owner for scanner secret": {secretNames: []string{"scanner-tls"}}, | ||
| "wrong owner for scanner db secret": {secretNames: []string{"scanner-db-tls"}}, | ||
| "wrong owner for scanner and scanner db secrets": {secretNames: []string{"scanner-tls", "scanner-db-slim-tls"}}, |
There was a problem hiding this comment.
Is there another PR for this?
|
Manual tests results for commit 0764658 # on commit 07646583c
git checkout -b sb/rox-9589-remove-scanner-slim-prefixes --track origin/sb/rox-9589-remove-scanner-slim-prefixes
# Launch openshift-4
infractl create openshift-4 juan-test --lifespan=24h
infractl list
infractl artifacts juan-test --download-dir ~/infra/juan-test
cat ~/infra/juan-test/url
cat ~/infra/juan-test/kubeadmin-password && echo
export KUBECONFIG="$HOME/infra/juan-test/kubeconfig"
export ROX_LOCAL_IMAGE_SCANNING=true
roxctl helm output central-services --debug --remove
roxctl helm output secured-cluster-services --debug --remove
$ find stackrox-secured-cluster-services-chart/ -name _scanner_init.tpl
stackrox-secured-cluster-services-chart//templates/_scanner_init.tpl
CLUSTER_NAME='os4'
DOCKER_PASS=..
helm -n stackrox install stackrox-central-services \
./stackrox-central-services-chart \
--set clusterName=${CLUSTER_NAME} \
--set imagePullSecrets.username=juanrhrox \
--set imagePullSecrets.password=${DOCKER_PASS}
ADMIN_PWD=...
kubectl -n stackrox port-forward svc/central 8443:443 &
rm cluster-init-bundle.yaml
./bin/darwin/roxctl central init-bundles generate init_bundle --output cluster-init-bundle.yaml --insecure-skip-tls-verify -p ${ADMIN_PWD}
INIT_BUNDLE_PATH=cluster-init-bundle.yaml
#########
## Test: SecuredCluster & Central deployed in the namespace
#########
### PASS Test case 1: With scanner.disable=false Helm install fails because Central exists with a "not supported" message
helm -n stackrox install stackrox-secured-cluster-services ./stackrox-secured-cluster-services-chart \
--set clusterName=${CLUSTER_NAME} \
-f ${INIT_BUNDLE_PATH} \
--set scanner.disable=false \
--set imagePullSecrets.username=juanrhrox \
--set imagePullSecrets.password=${DOCKER_PASS}
# Fails as expected in image/templates/helm/stackrox-secured-cluster/templates/_init.tpl.htpl line 285
# FATAL ERROR:
# Local scanner is not supported to be deployed within a namespace running Central,
# Scanner must be deployed with the Central chart. To fix this error set scanner.disable=true and
# re-deploy.
#########
### PASS Test case 2: With scanner.disable=true Helm install succeeds and tls issuer does not touch scanner-tls or scanner-db-tls secrets nor fails execution
function rox_get_pod_name_by_component {
component="${1}"
POD_NAME=$(kubectl -n stackrox get pods -l app.kubernetes.io/component=${component} -o jsonpath='{.items[0].metadata.name}')
echo ${POD_NAME}
}
function rox_sensor_logs {
kubectl logs -n stackrox $(rox_get_pod_name_by_component sensor) -f
}
helm -n stackrox install stackrox-secured-cluster-services ./stackrox-secured-cluster-services-chart \
--set clusterName=${CLUSTER_NAME} \
-f ${INIT_BUNDLE_PATH} \
--set scanner.disable=true \
--set imagePullSecrets.username=juanrhrox \
--set imagePullSecrets.password=${DOCKER_PASS}
$ helm -n stackrox ls | grep secured
...
stackrox-secured-cluster-services stackrox 1 2022-03-04 16:56:23.522308 +0100 CET deployed stackrox-secured-cluster-services-69.0.18-g07646583c4 3.69.x-18-g07646583c4
# Enable feature flag in sensor and check it doesn't touch the secrets
kubectl -n stackrox set env deployment sensor ROX_LOCAL_IMAGE_SCANNING=true
# PASS: cert refresher stops itself because it doesn't own the local scanner secrets
# kubernetes/localscanner: 2022/03/04 17:16:12.453143 cert_refresher.go:50: Error: non-recoverable error refreshing local scanner credentials, automatic refresh will be stopped: 2 errors occurred:
# * unexpected owner for certificate secrets
# * unexpected owner for certificate secrets
rox_sensor_logs
#########
## Test: SecuredCluster & Central deployed in a different namespace
#########
#########
### PASS Test case 1: With scanner.disable=false should succeed, scanner tls secrets should be created by Sensor
helm -n stackrox uninstall stackrox-secured-cluster-services
# no such namespace
$ kubectl get namespaces | grep another-ns
kubectl create namespace another-ns
# StackRox Secured Cluster Services 3.69.x-18-g07646583c4 has been installed.
helm -n another-ns install stackrox-secured-cluster-services ./stackrox-secured-cluster-services-chart \
--set clusterName=${CLUSTER_NAME} \
-f ${INIT_BUNDLE_PATH} \
--set allowNonstandardNamespace=true \
--set scanner.disable=false \
--set imagePullSecrets.username=juanrhrox \
--set centralEndpoint=central.stackrox.svc:443 \
--set imagePullSecrets.password=${DOCKER_PASS}
# scanner-tls and scanner-db-tls are missing
jrodrig-mac:stackrox jrodrig$ kubectl -n another-ns get secrets | grep -i scanner
scanner-db-password Opaque 1 23s
scanner-dockercfg-pftxh kubernetes.io/dockercfg 1 21s
scanner-token-gn7cr kubernetes.io/service-account-token 4 21s
scanner-token-trsf7 kubernetes.io/service-account-token 4 21s
jrodrig-mac:stackrox jrodrig$
$
# Enable feature flag in sensor and check it doesn't touch the secrets
kubectl -n another-ns set env deployment sensor ROX_LOCAL_IMAGE_SCANNING=true
kubectl -n another-ns get pods
kubectl -n stackrox set env deployment central ROX_LOCAL_IMAGE_SCANNING=true
# kubernetes/localscanner: 2022/03/04 18:16:46.968814 cert_refresher.go:92: Info: successfully refreshed local scanner credentials
kubectl logs -n another-ns sensor-759d68b675-ft6c5 -f
jrodrig-mac:stackrox jrodrig$ kubectl -n another-ns get secrets | grep -i scanner
scanner-db-password Opaque 1 16m
scanner-db-tls Opaque 3 11s
scanner-dockercfg-pftxh kubernetes.io/dockercfg 1 16m
scanner-tls Opaque 3 11s
scanner-token-gn7cr kubernetes.io/service-account-token 4 16m
scanner-token-trsf7 kubernetes.io/service-account-token 4 16m
jrodrig-mac:stackrox jrodrig$
jrodrig-mac:stackrox jrodrig$ kubectl get clusterroles | grep scanner
stackrox-another-ns-scanner-psp 2022-03-04T18:00:12Z
stackrox-scanner-psp 2022-03-04T15:39:54Z
jrodrig-mac:stackrox jrodrig$ |
porridge
left a comment
There was a problem hiding this comment.
What worries me most, is that the check for central deployment is racy and can have non-idempotent consequences. For example:
SecuredClusterandCentralCRs are created at the same time (this is perfectly legal given we now have init bundle autoprovisioning),- operator happens to evaluate the secured cluster chart before it sees the Central and before the central
Deploymentexists, - the first installation of this chart will install scanner.
- the when the central chart is evaluated, it will likely fail to process because of the clash with already-existing scanner.
- now we seem to have a wedged installation :-/
To be clear, I think temporary errors are fine, but I'd like to make sure that we don't end up in a situation which requires human action to resolve.
|
|
||
| {{ if eq ._rox.scanner.disable false }} | ||
| {{ $centralDeployment := dict }} | ||
| {{ include "srox.safeLookup" (list $ $centralDeployment "apps/v1" "Deployment" $.Release.Namespace "central") }} |
There was a problem hiding this comment.
This creates an implicit coupling between this file and the central deployment template. Not a big issue, but perhaps worth adding a comment there pointing to this file? 🤔
| {{ $centralDeployment := dict }} | ||
| {{ include "srox.safeLookup" (list $ $centralDeployment "apps/v1" "Deployment" $.Release.Namespace "central") }} | ||
| {{ if $centralDeployment.result }} | ||
| {{ include "srox.fail" "Local scanner is not supported to be deployed within a namespace running Central, Scanner must be deployed with the Central chart. To fix this error set scanner.disable=true and re-deploy." }} |
There was a problem hiding this comment.
| {{ include "srox.fail" "Local scanner is not supported to be deployed within a namespace running Central, Scanner must be deployed with the Central chart. To fix this error set scanner.disable=true and re-deploy." }} | |
| {{ include "srox.fail" "Deploying scanner with the stackrox-secured-cluster-services chart is not supported within a namespace running Central. In this case, scanner must be deployed with the stackrox-central-services chart. To fix this error set scanner.disable=true and re-deploy." }} |
Description
This PR removes the scanner-slim suffixes to reduce complexity in distinguishing between scanner instances.
The
-slimsuffix added complexity in different regions:scanner.stackroxandscanner-slim.stackroxif deployed along with Central._rox.scanner.namevariableDespite this scanner and scanner-slim only differentiate that the slim variant is less cached.
Checklist
If any of these don't apply, please comment below.
Testing Manual
Important: Central's scanner is always active in all test scenarios.
scanner.disable=falseHelm install fails because Central exists with a "not supported" messagescanner.disable=trueHelm install succeeds and tls issuer does not touchscanner-tlsorscanner-db-tlssecrets nor fails executionscanner.disable=falseshould succeed, scanner tls secrets should be created by SensorSetting up Helm