
ROX-30577: Add process baseline autolocking to cluster config#16427

Closed
JoukoVirtanen wants to merge 38 commits into jv-ROX-30135-send-baselines-to-sensor-when-deployment-leaves-observation from jv-add-proceess-baseline-autolocking-to-cluster-config

Conversation

@JoukoVirtanen
Contributor

@JoukoVirtanen JoukoVirtanen commented Aug 18, 2025

Description

Adds auto-locking to the cluster protobuf and uses the cluster configuration to control it, so process baseline auto-locking can be controlled at the cluster level. The feature flag remains in place: to enable process baseline auto-locking for a cluster, the feature flag must be enabled and the setting must also be enabled for the cluster via the cluster config.

After this change it is not yet possible to control the new cluster field via Helm or the Operator; that will be done in follow-up PRs.

The PR to control process baseline auto-locking via Helm can be found at #16462.

The new setting can be controlled via the API.

This PR is built on top of another PR that auto-locks process baselines and sends them to sensor: #16077.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Set the following environment variables

export ROX_BASELINE_GENERATION_DURATION=5m
export ROX_AUTOLOCK_PROCESS_BASELINES=true

Deployed ACS.

Created a pod that could be used to run some processes and entered it.

kubectl run ubuntu-pod --image=ubuntu --restart=Never --command -- sleep infinity
kubectl exec ubuntu-pod -it -- /bin/bash

Logged into the UI and checked "Risk".


After more than five minutes the process baseline was still unlocked.

Ran the following script to enable process baseline auto-locking for the cluster:

#!/usr/bin/env bash
set -euo pipefail

ROX_ENDPOINT=${1:-https://localhost:8000}

json_clusters="$(curl --location --silent --request GET "${ROX_ENDPOINT}/v1/clusters" -k -H "Authorization: Bearer $ROX_API_TOKEN")"

json_cluster="$(echo "$json_clusters" | jq '.clusters[0]')"
id="$(echo "$json_cluster" | jq -r .id)"

json_cluster="$(echo "$json_cluster" | jq '.dynamicConfig.autolockProcessBaseline.enabled = true')"
echo "$json_cluster" | jq

curl --location --silent --request PUT "${ROX_ENDPOINT}/v1/clusters/${id}" -k -H "Authorization: Bearer $ROX_API_TOKEN" --data "$json_cluster" > /dev/null

json_clusters="$(curl --location --silent --request GET "${ROX_ENDPOINT}/v1/clusters" -k -H "Authorization: Bearer $ROX_API_TOKEN")"
echo "$json_clusters" | jq
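The heart of the script is the jq assignment that flips the cluster's dynamic config field. A minimal, self-contained sketch of that edit, using a made-up sample payload in place of a real /v1/clusters response (the cluster data is hypothetical; only the field path mirrors the script above):

```shell
# Sample payload standing in for a /v1/clusters response (hypothetical data).
sample='{"clusters":[{"id":"abc123","dynamicConfig":{"autolockProcessBaseline":{"enabled":false}}}]}'

# Flip the auto-lock flag on the first cluster, as the script does.
updated="$(echo "$sample" | jq '.clusters[0].dynamicConfig.autolockProcessBaseline.enabled = true')"

# Read the new value back; prints "true".
echo "$updated" | jq -r '.clusters[0].dynamicConfig.autolockProcessBaseline.enabled'
```

jq's `=` assignment returns the whole document with only that path updated, which is why the script can send the edited object straight back in the PUT request.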

Created another pod, entered it, and ran a command:

kubectl run ubuntu-pod-2 --image=ubuntu --restart=Never --command -- sleep infinity
kubectl exec ubuntu-pod-2 -it -- /bin/bash
cat /proc/1/net/tcp

Initially the process baseline is unlocked.


After a little more than five minutes the baseline is locked.

Running a new process results in a violation:

tac /proc/1/net/tcp

@openshift-ci

openshift-ci bot commented Aug 18, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@red-hat-konflux
Contributor

Caution

There are some errors in your PipelineRun template.

PipelineRun Error
The following PipelineRuns all failed with the same CEL expression evaluation error: central-db-on-push, main-on-push, operator-on-push, operator-bundle-on-push, retag-collector, retag-scanner-db-slim, retag-scanner-db, retag-scanner-slim, retag-scanner, roxctl-on-push, scanner-v4-on-push, scanner-v4-db-on-push.

expression "(\n event == \"push\" && target_branch.matches(\"^(master|release-.*|refs/tags/.*)$\")\n) || (\n event == \"pull_request\" && (\n target_branch.startsWith(\"release-\") ||\n source_branch.matches(\"(konflux|renovate|appstudio|rhtap)\") ||\n body.pull_request.labels.exists(l, l.name == \"konflux-build\")\n )\n)\n" failed to evaluate: no such key: labels

@JoukoVirtanen JoukoVirtanen changed the base branch from master to jv-ROX-30135-send-baselines-to-sensor-when-deployment-leaves-observation August 18, 2025 03:05
@JoukoVirtanen JoukoVirtanen force-pushed the jv-add-proceess-baseline-autolocking-to-cluster-config branch from 622133b to d47f009 Compare August 18, 2025 17:03
@codecov

codecov bot commented Aug 18, 2025

Codecov Report

❌ Patch coverage is 51.40845% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.13%. Comparing base (2b7fee0) to head (27e1369).

Files with missing lines Patch % Lines
central/graphql/resolvers/generated.go 10.52% 34 Missing ⚠️
central/detection/lifecycle/manager_impl.go 76.19% 11 Missing and 4 partials ⚠️
central/processbaseline/service/service_impl.go 22.22% 6 Missing and 1 partial ⚠️
...entral/processbaseline/datastore/datastore_impl.go 73.68% 3 Missing and 2 partials ⚠️
central/detection/lifecycle/manager.go 0.00% 4 Missing ⚠️
central/detection/lifecycle/singleton.go 0.00% 2 Missing ⚠️
central/alert/service/service_impl.go 0.00% 1 Missing ⚠️
central/processbaseline/service/singleton.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@                                            Coverage Diff                                             @@
##           jv-ROX-30135-send-baselines-to-sensor-when-deployment-leaves-observation   #16427    +/-   ##
==========================================================================================================
  Coverage                                                                     49.13%   49.13%            
==========================================================================================================
  Files                                                                          2641     2641            
  Lines                                                                        195674   195781   +107     
==========================================================================================================
+ Hits                                                                          96140    96200    +60     
- Misses                                                                        91995    92037    +42     
- Partials                                                                       7539     7544     +5     
Flag Coverage Δ
go-unit-tests 49.13% <51.40%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.


@JoukoVirtanen JoukoVirtanen changed the title Add process baseline autolocking to cluster config ROX-30577: Add process baseline autolocking to cluster config Aug 18, 2025
@rhacs-bot
Contributor

rhacs-bot commented Aug 19, 2025

Images are ready for the commit at 27e1369.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.9.x-572-g27e1369688.

@JoukoVirtanen JoukoVirtanen marked this pull request as ready for review August 19, 2025 16:57
@JoukoVirtanen JoukoVirtanen requested a review from a team as a code owner August 19, 2025 16:57
@clickboo
Contributor

  1. What should be the expected behavior if someone turns this config on for a cluster? and then after a few days turns this off? Several baselines did get auto locked in that interim. So the auto locked baseline should not be used to generate violations after the config is turned off? Will we clean up the baselines in a way to say they are no longer locked or would they remain locked?
  2. How does this work with user locking baseline of a single deployment but turning the cluster level config for auto off?
  3. I see a RoxLocked function and a UserLocked one. Is Auto locked going to be a third type? Or is the same as RoxLocked? and you will also be using the StackRoxLockedTimestamp field?

@JoukoVirtanen
Contributor Author

  1. What should be the expected behavior if someone turns this config on for a cluster? and then after a few days turns this off? Several baselines did get auto locked in that interim. So the auto locked baseline should not be used to generate violations after the config is turned off? Will we clean up the baselines in a way to say they are no longer locked or would they remain locked?
  2. How does this work with user locking baseline of a single deployment but turning the cluster level config for auto off?
  3. I see a RoxLocked function and a UserLocked one. Is Auto locked going to be a third type? Or is the same as RoxLocked? and you will also be using the StackRoxLockedTimestamp field?
  1. If the feature is turned off after being on, all of the process baselines that were locked will continue to be locked. New deployments will not have their baselines locked without user action.

  2. There will be no change from the current behavior. If a user locks a baseline while this feature is disabled, the baseline will be locked in the usual way.

  3. For now UserLocked will be used for auto locking. It might not be 100% accurate as UserLocked implies that a user has taken action to lock the specific baseline, but it is the simplest solution. An AutoLock field can be added to process baselines in the future. Using RoxLocked and StackRoxLockedTimestamp would be incorrect as they are not used to trigger alerts on baseline violations. UserLocked is more accurately HardLocked and RoxLocked is more accurately SoftLocked. RoxLocked just means that new processes are anomalous, but alerts are not triggered. UserLocked means that alerts are triggered by anomalous processes.

@dashrews78
Contributor

  1. What should be the expected behavior if someone turns this config on for a cluster? and then after a few days turns this off? Several baselines did get auto locked in that interim. So the auto locked baseline should not be used to generate violations after the config is turned off? Will we clean up the baselines in a way to say they are no longer locked or would they remain locked?
  2. How does this work with user locking baseline of a single deployment but turning the cluster level config for auto off?
  3. I see a RoxLocked function and a UserLocked one. Is Auto locked going to be a third type? Or is the same as RoxLocked? and you will also be using the StackRoxLockedTimestamp field?
  1. If the feature is turned off after being on, all of the process baselines that were locked will continue to be locked. New deployments will not have their baselines locked without user action.
  2. There will be no change from the current behavior. If a user locks a baseline while this feature is disabled, the baseline will be locked in the usual way.
  3. For now UserLocked will be used for auto locking. It might not be 100% accurate as UserLocked implies that a user has taken action to lock the specific baseline, but it is the simplest solution. An AutoLock field can be added to process baselines in the future. Using RoxLocked and StackRoxLockedTimestamp would be incorrect as they are not used to trigger alerts on baseline violations. UserLocked is more accurately HardLocked and RoxLocked is more accurately SoftLocked. RoxLocked just means that new processes are anomalous, but alerts are not triggered. UserLocked means that alerts are triggered by anomalous processes.

This conversation has me thinking. I think I'm going to spend some more time on #16077 tomorrow. This has me wondering if we can isolate the changes further than we have.

I think question #1 is a very valid question that revolves around the experience and expectations of the user. Perhaps the result is that what is locked stays locked. Perhaps that has been discussed already; if it hasn't, it probably should be, as the user expectation matters. (Though I suspect the answer will be that they can just add exclusions.) If a baseline is locked with the auto feature, will it show as user-locked in the UI so that the user can toggle it if the feature is off? Which brings up another point: if the feature is on, should that toggle switch be read-only?

@JoukoVirtanen
Contributor Author

JoukoVirtanen commented Aug 19, 2025

Perhaps the result of that is what is locked is locked. Perhaps that has been discussed already. If it hasn't it probably should be as the user expectation matters. (Though I suspect the answer will be they can just add exclusions).

The first sentence seems simple, but given the context I am not 100% sure what is meant. Locked process baselines are locked until they are unlocked. To me, turning off the auto-lock feature does not imply that previously locked process baselines are unlocked, only that process baselines will need to be manually locked in the future, if it is desired that they be locked.

If a baseline is locked with the auto feature will it show as user locked in the UI such that the user can toggle that if the feature is off.

Yes. Users need to know if a process baseline is locked or unlocked, so it needs to be displayed in the UI.

Which brings another point, if the feature is on, should that toggle switch be read-only?

I don't think we should be taking power away from users. I think there are legitimate reasons why a user might want to enable the auto locking feature in a cluster, but exempt a deployment or set of deployments.

@JoukoVirtanen
Contributor Author

I have added a "Q&A" section to the design document with these questions and my answers to them: https://docs.google.com/document/d/1t4O5sVhPt30Ikm5m7fw7XwC6NrviHktOOj8cxunsPF0/edit?usp=sharing

237bfd8 Lifecycle manager sends baselines to sensor
eee08dc Better separation of baseline creation and inserting them into the database
30d4c46 Cleanup
f8fcb4b Added a feature flag
f092fcc Only setting the user lock timestamp in detection lifecycle manager if the autolock feature flag is enabled
278b1bc Creating message separate from sending it
01ac856 Not sending baselines to sensor if they already exist and are locked
@JoukoVirtanen JoukoVirtanen force-pushed the jv-ROX-30135-send-baselines-to-sensor-when-deployment-leaves-observation branch from 0af82d4 to b5bc597 Compare August 20, 2025 03:41
@JoukoVirtanen JoukoVirtanen requested a review from a team as a code owner August 20, 2025 03:41
@JoukoVirtanen JoukoVirtanen force-pushed the jv-add-proceess-baseline-autolocking-to-cluster-config branch from 27908d7 to 27e1369 Compare August 20, 2025 23:13
@JoukoVirtanen JoukoVirtanen force-pushed the jv-ROX-30135-send-baselines-to-sensor-when-deployment-leaves-observation branch from 8575a13 to 6c97e89 Compare August 27, 2025 17:16
@JoukoVirtanen JoukoVirtanen deleted the branch jv-ROX-30135-send-baselines-to-sensor-when-deployment-leaves-observation September 2, 2025 18:56