ROX-30135: Auto-lock process baselines #16564
237bfd8 Lifecycle manager sends baselines to sensor
eee08dc Better separation of baseline creation and inserting them into the database
30d4c46 Cleanup
f8fcb4b Added a feature flag
f092fcc Only setting the user lock timestamp in detection lifecycle manager if the autolock feature flag is enabled
278b1bc Creating message separate from sending it
01ac856 Not sending baselines to sensor if they already exist and are locked
This reverts commit 64c4f44.
…it was called from the process baseline queue or process indicator queue
Co-authored-by: David Shrewsberry <99685630+dashrews78@users.noreply.github.com>
Description
When a deployment leaves the observation window, a "user" locked process baseline is automatically created for it, persisted in the database, and sent to sensor. This is done in the detection lifecycle manager and is behind a feature flag. Previously, the detection lifecycle manager created a StackRox locked baseline and persisted it in the database without sending it to sensor.
As a consequence, when the feature flag is enabled, anomalous processes trigger alerts after the observation period, whereas before they were merely flagged in "Risk".
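The decision described above can be sketched as a small model. This is an illustration in Python, not the project's implementation: the `Baseline` record, its two lock fields, and the `leave_observation_window` function are hypothetical names, and the rule that an already user-locked baseline is not re-sent to sensor is an assumption read from the commit titles.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Baseline:
    # Hypothetical, simplified stand-in for a process baseline record.
    stackrox_locked: bool = False
    user_locked: bool = False


def leave_observation_window(baseline, autolock_enabled):
    """Return (updated_baseline, send_to_sensor) when the window ends."""
    # Pre-existing behavior per the description: a StackRox lock is set and
    # persisted, but nothing is sent to sensor.
    updated = replace(baseline, stackrox_locked=True)
    if not autolock_enabled:
        return updated, False
    if baseline.user_locked:
        # Assumption: a baseline that is already user locked is not re-sent.
        return updated, False
    # New behavior behind the feature flag: user lock it and notify sensor.
    return replace(updated, user_locked=True), True


updated, send = leave_observation_window(Baseline(), autolock_enabled=True)
print(updated.user_locked, send)
```

With the flag disabled the same call leaves `user_locked` unset and returns `False` for the send decision, which corresponds to the earlier behavior where anomalous processes only showed up in "Risk".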
There are two PRs built upon this PR.
Add process baseline autolocking to cluster config #16427
Configure process baseline auto locking via helm #16462
User-facing documentation
Testing and quality
ocp-4-19-qa-e2e-tests and ocp-4-12-qa-e2e-tests failed, but they also failed in the nightlies with the same error.
Automated testing
How I validated my change
The observation period was set to 3m
The e2e-test.sh script is used for testing. It has the following contents
It does the following steps
cat command inside it
tac command inside pod
ls command inside the pod
basename command inside the pod
After each step there are API calls to check the status of the baseline, the processes that are associated with the pod and if they are anomalous, and alerts.
cat command the following is the state
The process baseline is unlocked, with the sleep and cat commands in the baseline. Neither of the processes is anomalous. There are no unauthorized process violations for the pod.
The process baseline is locked. There is no change to the processes or violations.
tac command
The process baseline is unchanged. The tac command is listed as anomalous, and appears in violations.
The process baseline is unlocked. There are no other changes.
ls command
The ls command is anomalous, but does not appear in violations.
The baseline is locked. There are no other changes.
basename command is run
The basename process is anomalous and shows up in violations with the tac process.
With the feature flag disabled
cat command the following is the state
The same as when the feature flag was enabled.
The process baseline is unlocked, with the sleep and cat commands in the baseline. Neither of the processes is anomalous. There are no unauthorized process violations for the pod.
No change. The process baseline remains unlocked. This is different from the case where the feature flag was enabled.
tac command
No change to the process baseline. The tac command is anomalous, but does not show up in violations.
The process baseline was already unlocked so there is no change.
ls command
The ls command is anomalous, but does not show up in violations.
The process baseline is locked. There are no other changes.
basename command is run
The basename process is anomalous and appears in violations.
Testing on master
The results on master were the same as running on this branch with feature flag disabled.
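The violation results reported above, with the flag enabled and disabled, are consistent with one simple rule: a process raises a violation only if it starts while the baseline is locked and it is not in the baseline. The sketch below replays the test timeline under that rule. It is a model inferred from the observations, not StackRox code; `replay`, the event names, and the starting baseline are assumptions for illustration.

```python
def replay(events, autolock_enabled, baseline=("sleep", "cat")):
    """Replay a timeline and return the processes that raised violations."""
    known = set(baseline)  # processes learned during the observation window
    locked = False
    violations = []
    for kind, *args in events:
        if kind == "window_end":
            # Auto-lock happens only behind the feature flag.
            locked = locked or autolock_enabled
        elif kind == "lock":
            locked = True
        elif kind == "unlock":
            locked = False
        elif kind == "run":
            process = args[0]
            # Inferred rule: violation only if locked at process start.
            if locked and process not in known:
                violations.append(process)
    return violations


TIMELINE = [
    ("window_end",),
    ("run", "tac"),
    ("unlock",),
    ("run", "ls"),
    ("lock",),
    ("run", "basename"),
]
print(replay(TIMELINE, autolock_enabled=True))   # ['tac', 'basename']
print(replay(TIMELINE, autolock_enabled=False))  # ['basename']
```

Inserting a `("lock",)` event right after `("window_end",)` with the flag disabled gives the same output as the flag-enabled run, which matches the manual-locking comparisons reported in this section.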
Testing on this branch with feature flag disabled and manually locking after three minutes
The results were the same as with the feature flag enabled on this branch, and the same as on master with the process baseline manually locked after three minutes.
Testing on master with manually locking after three minutes
The results were the same as with this branch and the feature flag enabled.
Testing with early locking
The observation period was increased to 5m
The following script was used for testing what happens when a process baseline is locked early
e2e-early-lock.sh
The script used for testing did the following steps.
cat command inside it
tac command inside pod
ls command inside the pod
basename command inside the pod
This branch with feature flag enabled
cat command the following is the state
The process baseline is unlocked, with cat and sleep in the baseline. Neither process is anomalous. There are no violations.
The process baseline is locked. There is no other change.
tac command
The tac process is anomalous and appears in violations.
The process baseline is unlocked. The tac process is no longer considered anomalous.
ls command
The ls command appears in the baseline. There continues to be an alert for tac.
The process baseline is locked. The tac process becomes anomalous again.
basename command
The basename process is anomalous. The basename and tac processes are both in violations.
The deployment should leave the observation period during this time. There was no change.
Testing early locking with the feature flag disabled
The results were the same as when the feature flag was enabled.
Testing early locking on master
The results were the same as this branch with and without the feature enabled.
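The early-locking observations suggest how the "anomalous" flag behaves inside the observation window: a process outside the baseline is flagged only while the baseline is locked or once the window is over. The predicate below is a sketch of that inferred rule, not the product's logic; `is_anomalous` and its parameters are hypothetical names.

```python
def is_anomalous(process, baseline, locked, in_window):
    """Inferred rule: outside the baseline AND (locked OR window over)."""
    if process in baseline:
        return False
    return locked or not in_window


baseline = {"sleep", "cat"}  # learned while unlocked inside the window

# Early lock inside the 5m window: tac is flagged.
print(is_anomalous("tac", baseline, locked=True, in_window=True))   # True
# Unlock while still inside the window: tac is no longer flagged.
print(is_anomalous("tac", baseline, locked=False, in_window=True))  # False
# ls runs while unlocked inside the window, so it is learned into the baseline.
baseline.add("ls")
# Lock again: tac is flagged again, ls is not.
print(is_anomalous("tac", baseline, locked=True, in_window=True))   # True
print(is_anomalous("ls", baseline, locked=True, in_window=True))    # False
```

Because the baseline is already locked when the deployment leaves the observation period, the end of the window changes nothing under this rule, which agrees with the identical results with and without the feature flag.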