Skip to content

ROX-10034: Make PodSecurityPolicies Opt-Out#1249

Merged
mtesseract merged 2 commits intomasterfrom
mc/psp-removal
Jun 8, 2022
Merged

ROX-10034: Make PodSecurityPolicies Opt-Out#1249
mtesseract merged 2 commits intomasterfrom
mc/psp-removal

Conversation

@mtesseract
Copy link
Contributor

@mtesseract mtesseract commented Apr 11, 2022

Description

ROX-10034

This PR is the first step towards deprecation of PSPs (PodSecurityPolicies), which are to be removed with Kubernetes 1.25.

More concretely, this PR...

  • implements Toggles within our Helm charts for enabling/disabling generation of PSP resources (defaulting to PSPs enabled).
  • implements CLI flag for roxctl for controlling the generation of PSPs for deployment bundles (defaulting to PSPs enabled).
  • extends deployment script to respect the $POD_SECURITY_POLICIES environment variable (defaulting to true)
  • extends CI to run a subset of tests on a Kubernetes cluster provisioned specifically with PSPs switched off.

For a follow-up PR (#1747) we can then change the above defaults so that

  • Helm charts and deployment bundles do not generate PSPs by default, making them opt-in.
  • Deployment scripts default to POD_SECURITY_POLICIES=false.
  • CI tests are executed (mostly?) with PSPs disabled.

Some more detailed explanations:

  • For the Helm charts I thought it makes most sense to introduce a setting for enabling PSPs, which currently defaults to true to retain current behaviour and will be changed to false in the future (before arrival of K8s 1.25).
  • On the other hand, for APIs I have implemented a toggle for disabling this. This is for backwards compatibility: This way one can use old roxctl version which are not aware of this toggle together with a new central and retrieve the expected output (i.e. with PSPs still there).
  • I have extracted the PSP related resources (ClusterRole, PodSecurityPolicy and RoleBinding) within the Helm charts to separate files, to keep them logically separated from the other non-deprecated resources.
  • For the Helm charts, the default values for the PSP toggle are injected via our meta-templating mechanism.

This PR should be safe to merge without any feature flags because it retains the current behaviour of producing PSPs while allowing to switch them off.

Note that this PR extends the meaning of an environment variable which is used in our CI pipeline: $POD_SECURITY_POLICIES=true/false. Irregardless of this PR the configuration of clusters to be provisionsed with respect to PSPs is based on this environment setting. E.g., $POD_SECURITY_POLICIES=false causes a cluster to be provisionsed with the PSP Admission Controller being disabled. With this PR the same environment variable is also taken into account by deployment scripts and controls whether StackRox itself should be installed including PSP resources or not.

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required (don't think we need another CHANGELOG entry for this)
  • Determined and documented upgrade steps
  • Documented user facing changes (create PR based on stackrox/openshift-docs and merge into rhacs-docs)

Testing Performed

CI plus some manual tests.

@ghost
Copy link

ghost commented Apr 11, 2022

Tag for build #645421 is 3.70.x-307-gb13a08e56e.

💻 For deploying this image using the dev scripts, run the following first:

export MAIN_IMAGE_TAG='3.70.x-307-gb13a08e56e'

🕹️ A roxctl binary can be downloaded from the CircleCI artifacts.

@mtesseract
Copy link
Contributor Author

/retest

@mtesseract mtesseract changed the title Deactivate PSPs Make PodSecurityPolicies Opt-Out May 17, 2022
@mtesseract
Copy link
Contributor Author

/retest

@mtesseract
Copy link
Contributor Author

/test openshift-api-e2e-tests

@openshift-ci
Copy link

openshift-ci bot commented May 17, 2022

@mtesseract: The specified target(s) for /test were not found.
The following commands are available to trigger required jobs:

  • /test gke-upgrade-tests
  • /test style-checks

Use /test all to run all jobs.

Details

In response to this:

/test openshift-api-e2e-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@mtesseract mtesseract marked this pull request as ready for review May 17, 2022 21:32
@mtesseract mtesseract requested a review from a team May 17, 2022 21:32
@mtesseract mtesseract requested a review from a team as a code owner May 17, 2022 21:32
# Verify that submodules have been initialized, otherwise we cannot build much.
[ -f content/.git ] || { echo >&2 "Submodule docs/content is not initialized, please run 'git submodules update --init'" ; exit 1 ; }
[ -f tools/.git ] || { echo >&2 "Submodule docs/tools is not initialized, please run 'git submodules update --init'" ; exit 1 ; }
[ -f content/.git ] || { echo >&2 "Submodule docs/content is not initialized, please run 'git submodule update --init'" ; exit 1 ; }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a doc fix I stumbled over.

@SimonBaeumer SimonBaeumer self-requested a review May 18, 2022 12:24
@mtesseract mtesseract requested a review from porridge May 19, 2022 07:10
Copy link
Contributor

@SimonBaeumer SimonBaeumer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @mtesseract!

Two major things I noticed:

  1. CI tests are too costly compared to how little it tests. Deploying to a cluster without PSPs works if PSPs are disabled, we only need to test that our Helm option does work properly.
  2. Speaking of DeprecatedPodSecurityPolicies is not optimal. The deprecation is relative to the used kubernetes version, rather mentioning that this is a setting for k8s < 1.25.

key: null # string
generate: null # bool
system:
enableDeprecatedPodSecurityPolicies: null # bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enableDeprecatedPodSecurityPolicies: null # bool
enablePodSecurityPolicies: null # bool

@@ -0,0 +1,69 @@
{{- include "srox.init" . -}}

{{- if and (not ._rox.scanner.disable) ._rox.system.enableDeprecatedPodSecurityPolicies }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{- if and (not ._rox.scanner.disable) ._rox.system.enableDeprecatedPodSecurityPolicies }}
{{- if and (not ._rox.scanner.disable) ._rox.system.enablePodSecurityPolicies }}

globalPrefix: null # string
system:
createSCCs: null # bool
enableDeprecatedPodSecurityPolicies: null # bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
enableDeprecatedPodSecurityPolicies: null # bool
enablePodSecurityPolicies: null # bool

@@ -1,5 +1,6 @@
{{- include "srox.init" . -}}

{{- if ._rox.system.enableDeprecatedPodSecurityPolicies }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{{- if ._rox.system.enableDeprecatedPodSecurityPolicies }}
{{- if ._rox.system.enablePodSecurityPolicies }}

SlimCollector *bool `json:"slimCollector"`
IstioVersion string `json:"istioVersion"`

DisableDeprecatedPodSecurityPolicies bool `json:"disableDeprecatedPodSecurityPolicies"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this field be inverted so that it aligns with the other configurations to enablePodSecurityPolicies?

Suggested change
DisableDeprecatedPodSecurityPolicies bool `json:"disableDeprecatedPodSecurityPolicies"`
DisableDeprecatedPodSecurityPolicies bool `json:"enablePodSecurityPolicies"`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, as I tried to explain in the PR description, I inverted the logic here on purpose for backwards compatibility reasons: If it is a positive toggle (enable...) and it is missing, then it would be implicitly understood to be false when an old roxctl is speaking to a new central.

When it is a negative toggle (disable...), then the absence would be understood as false, i.e. PSPs are enabled -- I guess this would follow the principle of the least surprise for users using older roxctl versions?


IstioVersion string `json:"istioVersion"`

DisableDeprecatedPodSecurityPolicies bool `json:"disableDeprecatedPodSecurityPolicies"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DisableDeprecatedPodSecurityPolicies bool `json:"disableDeprecatedPodSecurityPolicies"`
DisablePodSecurityPolicies bool `json:"disableDeprecatedPodSecurityPolicies"`

- check-label-to-skip-tests:
label: ci-no-qa-tests

gke-api-e2e-tests-no-psps:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job testing it the Helm chart without PSPs but these tests look very costly to me to only test that our application can be installed into a cluster without PSPs.

In cases like these it makes more sense to have a forward test, so that our CI always runs our test suite against the next unstable kubernetes version under development.

To me PSPs seem sufficient to test via Helm test.

Copy link
Contributor

@porridge porridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please consider:

  1. including the ticket number in the PR title
  2. mentioning that POD_SECURITY_POLICIES is a variable we already use in CI to control PSP support in the control plane, and that we're also honoring it in deploy scripts
  3. splitting out doc fixes to a separate PR (to not lose them in case this change needs to be rolled back for whatever reason)

I don't know whether skipping PSP installation has any impact on the core product functionality, so just running the API e2e tests might be of limited usefulness. But then again, saying "I don't know" proves it's a good idea to run them :-) so +1 on this careful approach. As you say, we can flip the flag in the 3.71 release cycle and then remove this flavor. Let's use those CircleCI credits while we still have them!

@porridge
Copy link
Contributor

porridge commented May 19, 2022

Also, since this is intended to be a preparatory no-visible-impact change, I think you can tick those last two checkboxes already?

@mtesseract mtesseract changed the title Make PodSecurityPolicies Opt-Out ROX-10034: Make PodSecurityPolicies Opt-Out May 26, 2022
@mtesseract
Copy link
Contributor Author

Also, since this is intended to be a preparatory no-visible-impact change, I think you can tick those last two checkboxes already?

done

@mtesseract
Copy link
Contributor Author

  1. including the ticket number in the PR title

Done

  1. mentioning that POD_SECURITY_POLICIES is a variable we already use in CI to control PSP support in the control plane, and that we're also honoring it in deploy scripts

You mean in the PR description or where?

  1. splitting out doc fixes to a separate PR (to not lose them in case this change needs to be rolled back for whatever reason)

done

I don't know whether skipping PSP installation has any impact on the core product functionality, so just running the API e2e tests might be of limited usefulness. But then again, saying "I don't know" proves it's a good idea to run them :-) so +1 on this careful approach. As you say, we can flip the flag in the 3.71 release cycle and then remove this flavor. Let's use those CircleCI credits while we still have them!

Yeah, I am still not exactly sure what to do about the tests for the non-PSP case.

@porridge
Copy link
Contributor

  1. mentioning that POD_SECURITY_POLICIES is a variable we already use in CI to control PSP support in the control plane, and that we're also honoring it in deploy scripts

You mean in the PR description or where?

Yes. LGTM

@mtesseract mtesseract requested a review from SimonBaeumer May 27, 2022 07:27
@mtesseract mtesseract force-pushed the mc/psp-removal branch 3 times, most recently from 0cfdeb8 to 54bf9f1 Compare June 1, 2022 07:07
@ghost
Copy link

ghost commented Jun 1, 2022

Images are ready for the commit at b13a08e.

To use with deploy scripts, first export MAIN_IMAGE_TAG=3.70.x-326-ge7f0d592e3.

@mtesseract mtesseract force-pushed the mc/psp-removal branch 2 times, most recently from a7b931c to 67b38af Compare June 2, 2022 14:40
@mtesseract
Copy link
Contributor Author

mtesseract commented Jun 7, 2022

@SimonBaeumer The API e2e were removed from the CircleCI configuration (fon master, I mean). For the purpose of this PR I would like to leave it this way and skip no-psp tests for two reasons:

  1. It is a temporary situation anyway, because the second PR will need to be merged soon

and

  1. The previous CI test run gave positive results for the no-psp tests as they were previously configured.

Are you OK with this?

We could still open a PR for openshift/releases and extend the CI test suite there to run some smoke test in a non-PSP setting.

Copy link
Contributor

@SimonBaeumer SimonBaeumer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@openshift-ci
Copy link

openshift-ci bot commented Jun 8, 2022

@mtesseract: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-qa-e2e-tests b13a08e link false /test gke-qa-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@mtesseract
Copy link
Contributor Author

mtesseract commented Jun 8, 2022

@mtesseract mtesseract merged commit 1771472 into master Jun 8, 2022
@mtesseract mtesseract deleted the mc/psp-removal branch June 8, 2022 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants