Skip to content

ROX-9589 - Remove scanner slim prefixes#820

Merged
SimonBaeumer merged 5 commits intomasterfrom
sb/rox-9589-remove-scanner-slim-prefixes
Mar 7, 2022
Merged

ROX-9589 - Remove scanner slim prefixes#820
SimonBaeumer merged 5 commits intomasterfrom
sb/rox-9589-remove-scanner-slim-prefixes

Conversation

@SimonBaeumer
Copy link
Contributor

@SimonBaeumer SimonBaeumer commented Mar 4, 2022

Description

This PR removes the scanner-slim suffixes to reduce complexity in distinguishing between scanner instances.

The -slim suffix added complexity in different regions:

  • Reconciliation logic needs to distinguish between slim and full scanner resources
  • Selectors need to distinguish between slim and full
  • SecuredCluster deployed within Central endpoint must be auto-sensed between scanner.stackrox and scanner-slim.stackrox if deployed along with Central
  • Debugging was harder because it was harder to find resource names as scanner resources were abstracted behind the ._rox.scanner.name variable
  • Tests which would work for scanner don't work for scanner anymore

Despite this scanner and scanner-slim only differentiate that the slim variant is less cached.

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required
  • Determined and documented upgrade steps

If any of these don't apply, please comment below.

Testing Manual

Important: Central's scanner is always active in all test scenarios.

  • SecuredCluster & Central deployed in the namespace
    • With scanner.disable=false Helm install fails because Central exists with a "not supported" message
    • With scanner.disable=true Helm install succeeds and tls issuer does not touch scanner-tls or scanner-db-tls secrets nor fails execution
  • SecuredCluster & Central deployed in a different namespace
    • With scanner.disable=false should succeed, scanner tls secrets should be created by Sensor

Setting up Helm

# enable feature flag
export ROX_LOCAL_IMAGE_SCANNING=true

# use openshift cluster
# issue init-bundles with Central

# Render Helm chart
$ roxctl helm outout stackrox-secured-cluster-services --debug

# Install Secured Cluster 
helm upgrade -n <namespace> --install --set scanner.disable=false -f init-bundle.yaml stackrox-secured-cluster-services ./stackrox-secured-cluster-services-chart

## If deployed into a different namespace it must be manually allowed with something like --set allowNonstandardNamespace=true

@ghost
Copy link

ghost commented Mar 4, 2022

Tag for build #277239 is 3.69.x-19-gc3f5e3a64e.

💻 For deploying this image using the dev scripts, run the following first:

export MAIN_IMAGE_TAG='3.69.x-19-gc3f5e3a64e'

📦 You can also generate an installation bundle with:

docker run -i --rm stackrox/main:3.69.x-19-gc3f5e3a64e central generate interactive > bundle.zip

🕹️ A roxctl binary artifact can be downloaded from CircleCI.

"wrong owner for scanner and scanner db secrets": {secretNames: []string{"scanner-slim-tls", "scanner-db-slim-tls"}},
"wrong owner for scanner secret": {secretNames: []string{"scanner-tls"}},
"wrong owner for scanner db secret": {secretNames: []string{"scanner-db-tls"}},
"wrong owner for scanner and scanner db secrets": {secretNames: []string{"scanner-tls", "scanner-db-slim-tls"}},
Copy link
Contributor Author

@SimonBaeumer SimonBaeumer Mar 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove scanner-db-slim-tls

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another PR for this?

Copy link
Contributor Author

@SimonBaeumer SimonBaeumer Mar 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is a review comment of myself but did not fixed it yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Contributor

@RTann RTann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but with two questions

[< if not .FeatureFlags.ROX_LOCAL_IMAGE_SCANNING >]
{{ include "srox.fail" "Scanner's slim mode currently not supported" }}
[< end >]
{{ $_ := set $scannerCfg "name" "scanner-slim" }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat unrelated, but I see full mode mention certs. Do we not do that here because the slim scanners get their certs from Central?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly!

"wrong owner for scanner and scanner db secrets": {secretNames: []string{"scanner-slim-tls", "scanner-db-slim-tls"}},
"wrong owner for scanner secret": {secretNames: []string{"scanner-tls"}},
"wrong owner for scanner db secret": {secretNames: []string{"scanner-db-tls"}},
"wrong owner for scanner and scanner db secrets": {secretNames: []string{"scanner-tls", "scanner-db-slim-tls"}},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there another PR for this?

@juanrh
Copy link
Contributor

juanrh commented Mar 4, 2022

Manual tests results for commit 0764658

# on commit 07646583c
git checkout -b sb/rox-9589-remove-scanner-slim-prefixes --track origin/sb/rox-9589-remove-scanner-slim-prefixes

# Launch openshift-4
infractl create openshift-4 juan-test --lifespan=24h
infractl list
infractl artifacts juan-test  --download-dir  ~/infra/juan-test
cat  ~/infra/juan-test/url
cat ~/infra/juan-test/kubeadmin-password && echo
export KUBECONFIG="$HOME/infra/juan-test/kubeconfig"

export ROX_LOCAL_IMAGE_SCANNING=true
roxctl helm output central-services --debug --remove
roxctl helm output  secured-cluster-services --debug --remove
$ find stackrox-secured-cluster-services-chart/ -name _scanner_init.tpl
stackrox-secured-cluster-services-chart//templates/_scanner_init.tpl

CLUSTER_NAME='os4'
DOCKER_PASS=..
helm -n stackrox install stackrox-central-services \
  ./stackrox-central-services-chart \
  --set clusterName=${CLUSTER_NAME} \
  --set imagePullSecrets.username=juanrhrox  \
  --set imagePullSecrets.password=${DOCKER_PASS}

ADMIN_PWD=...
kubectl -n stackrox port-forward svc/central 8443:443 &
rm cluster-init-bundle.yaml 
./bin/darwin/roxctl central init-bundles generate init_bundle --output cluster-init-bundle.yaml  --insecure-skip-tls-verify -p ${ADMIN_PWD}
INIT_BUNDLE_PATH=cluster-init-bundle.yaml  

#########
## Test: SecuredCluster & Central deployed in the namespace
#########
### PASS Test case 1: With scanner.disable=false Helm install fails because Central exists with a "not supported" message
helm -n stackrox install stackrox-secured-cluster-services ./stackrox-secured-cluster-services-chart \
  --set clusterName=${CLUSTER_NAME} \
  -f ${INIT_BUNDLE_PATH} \
  --set scanner.disable=false  \
  --set imagePullSecrets.username=juanrhrox  \
  --set imagePullSecrets.password=${DOCKER_PASS} 

# Fails as expected in image/templates/helm/stackrox-secured-cluster/templates/_init.tpl.htpl line 285
# FATAL ERROR:
# Local scanner is not supported to be deployed within a namespace running Central,
# Scanner must be deployed with the Central chart. To fix this error set scanner.disable=true and
# re-deploy.

#########
### PASS Test case 2:  With scanner.disable=true Helm install succeeds and tls issuer does not touch scanner-tls or scanner-db-tls secrets nor fails execution
function rox_get_pod_name_by_component {
  component="${1}"

  POD_NAME=$(kubectl -n stackrox get pods -l app.kubernetes.io/component=${component} -o jsonpath='{.items[0].metadata.name}')
  echo ${POD_NAME}
}
function rox_sensor_logs {
  kubectl logs -n stackrox $(rox_get_pod_name_by_component sensor) -f
}
helm -n stackrox install stackrox-secured-cluster-services ./stackrox-secured-cluster-services-chart \
  --set clusterName=${CLUSTER_NAME} \
  -f ${INIT_BUNDLE_PATH} \
  --set scanner.disable=true  \
  --set imagePullSecrets.username=juanrhrox  \
  --set imagePullSecrets.password=${DOCKER_PASS} 

$ helm -n stackrox ls  | grep secured
...
stackrox-secured-cluster-services       stackrox        1               2022-03-04 16:56:23.522308 +0100 CET    deployed   stackrox-secured-cluster-services-69.0.18-g07646583c4    3.69.x-18-g07646583c4


# Enable feature flag in sensor and check it doesn't touch the secrets
kubectl -n stackrox set env deployment sensor ROX_LOCAL_IMAGE_SCANNING=true
# PASS: cert refresher stops itself because it doesn't own the local scanner secrets
# kubernetes/localscanner: 2022/03/04 17:16:12.453143 cert_refresher.go:50: Error: non-recoverable error refreshing local scanner credentials, automatic refresh will be stopped: 2 errors occurred:
#        * unexpected owner for certificate secrets
#        * unexpected owner for certificate secrets
rox_sensor_logs

#########
## Test: SecuredCluster & Central deployed in a different namespace
#########
#########
### PASS Test case 1: With scanner.disable=false should succeed, scanner tls secrets should be created by Sensor
helm -n stackrox uninstall stackrox-secured-cluster-services
# no such namespace
$ kubectl get namespaces | grep another-ns

kubectl create namespace another-ns
# StackRox Secured Cluster Services 3.69.x-18-g07646583c4 has been installed.
helm -n another-ns install stackrox-secured-cluster-services ./stackrox-secured-cluster-services-chart \
  --set clusterName=${CLUSTER_NAME} \
  -f ${INIT_BUNDLE_PATH} \
  --set allowNonstandardNamespace=true \
  --set scanner.disable=false  \
  --set imagePullSecrets.username=juanrhrox  \
  --set centralEndpoint=central.stackrox.svc:443  \
  --set imagePullSecrets.password=${DOCKER_PASS} 

# scanner-tls and scanner-db-tls are missing
jrodrig-mac:stackrox jrodrig$ kubectl -n another-ns get secrets | grep -i scanner
scanner-db-password                                       Opaque                                1      23s
scanner-dockercfg-pftxh                                   kubernetes.io/dockercfg               1      21s
scanner-token-gn7cr                                       kubernetes.io/service-account-token   4      21s
scanner-token-trsf7                                       kubernetes.io/service-account-token   4      21s
jrodrig-mac:stackrox jrodrig$ 
$ 


# Enable feature flag in sensor and check it doesn't touch the secrets
kubectl -n another-ns set env deployment sensor ROX_LOCAL_IMAGE_SCANNING=true
kubectl -n another-ns get pods

kubectl -n stackrox set env deployment central ROX_LOCAL_IMAGE_SCANNING=true

# kubernetes/localscanner: 2022/03/04 18:16:46.968814 cert_refresher.go:92: Info: successfully refreshed local scanner credentials
kubectl logs -n another-ns sensor-759d68b675-ft6c5 -f

jrodrig-mac:stackrox jrodrig$ kubectl -n another-ns get secrets | grep -i scanner
scanner-db-password                                       Opaque                                1      16m
scanner-db-tls                                            Opaque                                3      11s
scanner-dockercfg-pftxh                                   kubernetes.io/dockercfg               1      16m
scanner-tls                                               Opaque                                3      11s
scanner-token-gn7cr                                       kubernetes.io/service-account-token   4      16m
scanner-token-trsf7                                       kubernetes.io/service-account-token   4      16m
jrodrig-mac:stackrox jrodrig$ 

jrodrig-mac:stackrox jrodrig$ kubectl get clusterroles | grep scanner
stackrox-another-ns-scanner-psp                                             2022-03-04T18:00:12Z
stackrox-scanner-psp                                                        2022-03-04T15:39:54Z
jrodrig-mac:stackrox jrodrig$ 

Copy link
Contributor

@juanrh juanrh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Additional commit c3f5e3a after manual test just fixes a constant in a unit test

@SimonBaeumer SimonBaeumer merged commit 86de9d2 into master Mar 7, 2022
@SimonBaeumer SimonBaeumer deleted the sb/rox-9589-remove-scanner-slim-prefixes branch March 7, 2022 10:24
Copy link
Contributor

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What worries me most, is that the check for central deployment is racy and can have non-idempotent consequences. For example:

  1. SecuredCluster and Central CRs are created at the same time (this is perfectly legal given we now have init bundle autoprovisioning),
  2. operator happens to evaluate the secured cluster chart before it sees the Central and before the central Deployment exists,
  3. the first installation of this chart will install scanner.
  4. the when the central chart is evaluated, it will likely fail to process because of the clash with already-existing scanner.
  5. now we seem to have a wedged installation :-/

To be clear, I think temporary errors are fine, but I'd like to make sure that we don't end up in a situation which requires human action to resolve.


{{ if eq ._rox.scanner.disable false }}
{{ $centralDeployment := dict }}
{{ include "srox.safeLookup" (list $ $centralDeployment "apps/v1" "Deployment" $.Release.Namespace "central") }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates an implicit coupling between this file and the central deployment template. Not a big issue, but perhaps worth adding a comment there pointing to this file? 🤔

{{ $centralDeployment := dict }}
{{ include "srox.safeLookup" (list $ $centralDeployment "apps/v1" "Deployment" $.Release.Namespace "central") }}
{{ if $centralDeployment.result }}
{{ include "srox.fail" "Local scanner is not supported to be deployed within a namespace running Central, Scanner must be deployed with the Central chart. To fix this error set scanner.disable=true and re-deploy." }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{ include "srox.fail" "Local scanner is not supported to be deployed within a namespace running Central, Scanner must be deployed with the Central chart. To fix this error set scanner.disable=true and re-deploy." }}
{{ include "srox.fail" "Deploying scanner with the stackrox-secured-cluster-services chart is not supported within a namespace running Central. In this case, scanner must be deployed with the stackrox-central-services chart. To fix this error set scanner.disable=true and re-deploy." }}

@SimonBaeumer SimonBaeumer modified the milestones: 69.0-rc.2, 69.1-rc.1 Mar 8, 2022
@parametalol parametalol modified the milestones: 69.1-rc.1, 69.0-rc.2 Mar 9, 2022
RTann pushed a commit that referenced this pull request Mar 9, 2022
RTann pushed a commit that referenced this pull request Apr 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants