ROX-33252: optimize ViolationsMultiplier to query what it needs by dashrews78 · Pull Request #19115 · stackrox/stackrox

dashrews78 · 2026-02-19T15:47:28Z

ViolationsMultiplier.Score() only needs policy_name and policy_severity from alerts but was deserializing full Alert protobuf blobs via SearchListAlerts. This is unnecessary because we only need two fields, both of which are populated columns. Fetching the entire serialized object and unmarshalling it is expensive and wasteful.

Added SearchAlertPolicyNamesAndSeverities, which uses RunSelectRequestForSchemaFn to query only the two needed columns directly.

Follows the same pattern as central/secret/datastore for DB access (ROX-31142).

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) noreply@anthropic.com

Description

change me!

User-facing documentation

CHANGELOG.md is updated OR update is not needed
documentation PR is created and is linked above OR is not needed

Testing and quality

the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
CI results are inspected

Automated testing

How I validated my change

Added unit tests. Ran the existing tests and CI. Validated on a long-running cluster.

dashrews78 · 2026-02-19T15:47:30Z

This change is part of the following stack:

ROX-33252: optimize ViolationsMultiplier to query what it needs #19115 ◀

Change managed by git-spice.

openshift-ci · 2026-02-19T15:47:32Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

rhacs-bot · 2026-02-19T16:13:05Z

Images are ready for the commit at 2994031.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-164-g2994031a4b.

codecov · 2026-02-19T19:59:41Z

Codecov Report

❌ Patch coverage is 81.25000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.52%. Comparing base (573923b) to head (2994031).
⚠️ Report is 10 commits behind head on master.

Files with missing lines	Patch %	Lines
central/alert/datastore/datastore_impl.go	85.71%	2 Missing and 1 partial ⚠️
central/alert/datastore/singleton.go	0.00%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #19115   +/-   ##
=======================================
  Coverage   49.52%   49.52%           
=======================================
  Files        2672     2672           
  Lines      201665   201686   +21     
=======================================
+ Hits        99870    99895   +25     
+ Misses      94337    94334    -3     
+ Partials     7458     7457    -1

Flag	Coverage Δ
go-unit-tests	`49.52% <81.25%> (+<0.01%)`	⬆️


dashrews78 · 2026-02-19T21:11:09Z

I asked Claude to evaluate the impact of this change, assuming 10K deployments with an average of 6 active violations each. The quantification below is what Claude provided, and it is likely reasonable. Before this change, this was a very expensive operation.

Summary

ViolationsMultiplier.Score() only needs policy_name and policy_severity from alerts but was deserializing full storage.Alert protobuf blobs via SearchListAlerts. The optimization replaces this with a column projection query using RunSelectRequestForSchema, fetching only the two needed columns directly from PostgreSQL.

Call Frequency

ViolationsMultiplier.Score() is the first multiplier in the deployment scoring pipeline and is called on:

Every policy violation detection (build-time, deploy-time, runtime)
Every deployment change (via sensor pipeline)
Service account and process baseline changes
Periodic batch reprocessing every 10 minutes (deduped)
Full reprocessing of all deployments every 4 hours
Up to 15 concurrent risk scoring operations per Central instance (semaphore-controlled)
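
The semaphore-controlled limit in the last item can be pictured with a buffered channel used as a counting semaphore. This is a minimal sketch, not the code in Central: `scoreAll` and `maxConcurrentScores` are hypothetical names, and only the limit of 15 comes from the list above.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// maxConcurrentScores mirrors the limit of 15 quoted above. The name is
// hypothetical and is not taken from the stackrox code base.
const maxConcurrentScores = 15

// scoreAll calls score once per deployment ID while keeping at most
// maxConcurrentScores calls in flight. It returns the highest number of
// calls that were observed running at the same time.
func scoreAll(deploymentIDs []string, score func(id string)) int64 {
	sem := make(chan struct{}, maxConcurrentScores) // counting semaphore
	var wg sync.WaitGroup
	var inFlight, peak int64

	for _, id := range deploymentIDs {
		wg.Add(1)
		sem <- struct{}{} // blocks while 15 scorers are already running
		go func(id string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			n := atomic.AddInt64(&inFlight, 1)
			// Record the peak concurrency with a compare-and-swap loop.
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			score(id)
			atomic.AddInt64(&inFlight, -1)
		}(id)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	ids := make([]string, 100)
	for i := range ids {
		ids[i] = fmt.Sprintf("deployment-%d", i)
	}
	var scored int64
	peak := scoreAll(ids, func(string) { atomic.AddInt64(&scored, 1) })
	fmt.Println(scored, peak <= maxConcurrentScores) // prints "100 true"
}
```

With a cap like this, the per-call cost of Score() bounds how much memory and database bandwidth the scoring path can consume at any moment, which is why the per-call numbers below matter.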

Query Change

Old:

SELECT alerts.serialized FROM alerts WHERE deployment_id=$1 AND state=$2

Transfers entire serialized protobuf blob per row
Deserializes full storage.Alert proto (all nested messages)
Converts to ListAlert (allocates another struct, copies ~11 fields)
Caller reads 2 fields, discards the rest

New:

SELECT alerts.policy_name, alerts.policy_severity FROM alerts WHERE deployment_id=$1 AND state=$2

Transfers only a varchar + integer per row
No protobuf deserialization
Scans directly into a 2-field struct
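
To make the difference between the two paths concrete, here is a small self-contained sketch. It is an illustration under loud assumptions: JSON stands in for the protobuf encoding, the rows live in memory instead of PostgreSQL, and every type and function name (`fullAlert`, `alertRow`, `policyView`, `oldPath`, `newPath`) is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fullAlert is a hypothetical stand-in for the full stored alert message.
type fullAlert struct {
	PolicyName string   `json:"policy_name"`
	Severity   int32    `json:"policy_severity"`
	Violations []string `json:"violations"`
	Deployment string   `json:"deployment"`
}

// alertRow models one table row: two populated columns plus the blob.
type alertRow struct {
	policyName     string
	policySeverity int32
	serialized     []byte
}

// policyView is the two-field projection the caller actually needs.
type policyView struct {
	Name     string
	Severity int32
}

// mustRow builds a row whose columns and blob agree with each other.
func mustRow(a fullAlert) alertRow {
	blob, err := json.Marshal(a)
	if err != nil {
		panic(err)
	}
	return alertRow{policyName: a.PolicyName, policySeverity: a.Severity, serialized: blob}
}

// oldPath decodes every blob in full, then keeps two fields and drops the rest.
func oldPath(rows []alertRow) ([]policyView, error) {
	out := make([]policyView, 0, len(rows))
	for _, r := range rows {
		var a fullAlert
		if err := json.Unmarshal(r.serialized, &a); err != nil {
			return nil, err
		}
		out = append(out, policyView{Name: a.PolicyName, Severity: a.Severity})
	}
	return out, nil
}

// newPath reads the two columns directly and never touches the blob.
func newPath(rows []alertRow) []policyView {
	out := make([]policyView, 0, len(rows))
	for _, r := range rows {
		out = append(out, policyView{Name: r.policyName, Severity: r.policySeverity})
	}
	return out
}

func main() {
	rows := []alertRow{
		mustRow(fullAlert{PolicyName: "Latest tag", Severity: 1, Violations: []string{"image uses latest"}, Deployment: "web"}),
		mustRow(fullAlert{PolicyName: "Privileged container", Severity: 3, Deployment: "web"}),
	}
	old, err := oldPath(rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(old)
	fmt.Println(newPath(rows))
}
```

Both paths produce the same two-field results. The new path simply skips the decode step and the intermediate objects.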

Assumptions

Typical serialized storage.Alert protobuf: ~2-5KB (includes policy with all fields, violation messages, deployment metadata, timestamps, enforcement info, nested messages)
The two projected columns: policy_name (varchar, ~30-50 bytes) + policy_severity (integer, 4 bytes) ~ 50 bytes per row
Each UnmarshalVTUnsafe of a full Alert allocates the top-level object plus every nested message (Policy, Deployment, Violation list entries, ProcessIndicators, NetworkFlows, timestamps) — easily 20+ allocations per alert
WHERE clause and index usage are identical (deployment_id and state are both btree indexed)

Scenario: 10,000 Deployments, 3 Violations Each (30,000 alerts)

Per Score() Call (1 deployment, 3 alerts)

Metric	Old Path	New Path
SQL query	`SELECT serialized` (bytea blob)	`SELECT policy_name, policy_severity`
Data from Postgres	~6-15KB	~150 bytes
Proto deserializations	3 full `storage.Alert` unmarshals	0
Heap allocations	3 `Alert` + 3 `ListAlert` + all nested messages	3 two-field structs

Per Full Reprocessing Cycle (10,000 deployments)

Metric	Old Path	New Path	Reduction
Data transferred from Postgres	~60-150MB	~1.5MB	~98%
Protobuf deserializations	30,000	0	100%
Heap object allocations	~600,000+ (nested messages)	30,000	~95%
GC pressure	Significant	Minimal

Scenario: 10,000 Deployments, 6 Violations Each (60,000 alerts)

Per Score() Call (1 deployment, 6 alerts)

Metric	Old Path	New Path
Data from Postgres	~12-30KB	~300 bytes
Proto deserializations	6 full `storage.Alert` unmarshals	0
Heap allocations	6 `Alert` + 6 `ListAlert` + all nested messages	6 two-field structs

Per Full Reprocessing Cycle (10,000 deployments)

Metric	Old Path	New Path	Reduction
Data transferred from Postgres	~120-300MB	~3MB	~98%
Protobuf deserializations	60,000	0	100%
Heap object allocations	~1,200,000+	60,000	~95%
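
The figures in both scenario tables follow from the stated assumptions by plain multiplication. The sketch below reproduces that arithmetic so the inputs can be varied. The constants restate the assumptions above (2-5KB per blob, about 50 bytes per projected row, about 20 allocations per unmarshalled alert), using decimal KB and MB. The function and type names are hypothetical.

```go
package main

import "fmt"

const (
	blobMinBytes      = 2_000 // low end of the ~2-5KB blob assumption
	blobMaxBytes      = 5_000 // high end of the ~2-5KB blob assumption
	projectedRowBytes = 50    // policy_name + policy_severity per row
	allocsPerAlert    = 20    // assumed allocations per full unmarshal
	bytesPerMB        = 1_000_000.0
)

// estimate holds the per-cycle numbers for one scenario.
type estimate struct {
	Alerts               int
	OldMinMB, OldMaxMB   float64
	NewMB                float64
	OldAllocs, NewAllocs int
}

// estimateCycle computes one full reprocessing cycle in which every
// deployment is scored once.
func estimateCycle(deployments, alertsPerDeployment int) estimate {
	alerts := deployments * alertsPerDeployment
	return estimate{
		Alerts:    alerts,
		OldMinMB:  float64(alerts*blobMinBytes) / bytesPerMB,
		OldMaxMB:  float64(alerts*blobMaxBytes) / bytesPerMB,
		NewMB:     float64(alerts*projectedRowBytes) / bytesPerMB,
		OldAllocs: alerts * allocsPerAlert,
		NewAllocs: alerts, // one two-field struct per row
	}
}

// reductionPct returns the percentage saved when going from before to after.
func reductionPct(before, after float64) float64 {
	return 100 * (1 - after/before)
}

func main() {
	for _, perDeployment := range []int{3, 6} {
		e := estimateCycle(10_000, perDeployment)
		fmt.Printf("%d alerts: old %.0f-%.0fMB, new %.1fMB, transfer reduction %.1f%%-%.1f%%, allocation reduction %.0f%%\n",
			e.Alerts, e.OldMinMB, e.OldMaxMB, e.NewMB,
			reductionPct(e.OldMinMB, e.NewMB), reductionPct(e.OldMaxMB, e.NewMB),
			reductionPct(float64(e.OldAllocs), float64(e.NewAllocs)))
	}
}
```

Under these assumptions the transfer reduction lands between 97.5% and 99%, which is consistent with the ~98% quoted in the tables, and the ratio does not depend on the number of violations per deployment.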

What This Doesn't Change

WHERE clause and index usage are identical — query planning cost is the same
SAC (Scoped Access Control) filtering still runs
The scoring math (severityImpact, NormalizeScore, factor sorting) is unchanged and cheap
Savings scale linearly with violation count per deployment

ViolationsMultiplier.Score() only needs policy_name and policy_severity from alerts but was deserializing full Alert protobuf blobs via SearchListAlerts. Add SearchAlertPolicyNamesAndSeverities which uses RunSelectRequestForSchema to query only the two needed columns directly, avoiding protobuf deserialization entirely. Follows the same pattern as central/secret/datastore for DB access (ROX-31142). Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move the lightweight projection type to a dedicated views package under central/alert/ so future projection types can be co-located. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add GetPolicyName() and GetSeverity() methods so callers use accessors instead of direct field access. GetSeverity() returns storage.Severity directly, removing the need for manual casts. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
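
A projection type with accessors along these lines might look like the sketch below. Only the method names GetPolicyName() and GetSeverity() come from the commit message. The struct name, its fields, the nil-safe getter behavior (borrowed from the protobuf getter convention), and the local `Severity` type standing in for storage.Severity are all assumptions made so the example compiles on its own.

```go
package main

import "fmt"

// Severity is a local stand-in for storage.Severity so that this sketch
// is self-contained. The constant names below are illustrative.
type Severity int32

const (
	UnsetSeverity Severity = iota
	LowSeverity
	MediumSeverity
	HighSeverity
	CriticalSeverity
)

// policySeverityView is a hypothetical projection row. The severity is kept
// as a raw integer, the way it would be scanned from an integer column.
type policySeverityView struct {
	PolicyName string
	Severity   int32
}

// GetPolicyName returns the policy name, or "" for a nil receiver.
func (v *policySeverityView) GetPolicyName() string {
	if v == nil {
		return ""
	}
	return v.PolicyName
}

// GetSeverity returns the typed severity so callers need no manual cast.
func (v *policySeverityView) GetSeverity() Severity {
	if v == nil {
		return UnsetSeverity
	}
	return Severity(v.Severity)
}

func main() {
	v := &policySeverityView{PolicyName: "Privileged container", Severity: 3}
	fmt.Println(v.GetPolicyName(), v.GetSeverity() == HighSeverity)

	var missing *policySeverityView
	fmt.Println(missing.GetPolicyName() == "", missing.GetSeverity() == UnsetSeverity)
}
```

Keeping the cast inside the accessor means the integer-to-enum conversion happens in one place instead of at every call site.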

Sort imports alphabetically and pass DB pool to New() in datastore_impl_test.go. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace deprecated RunSelectRequestForSchema with the callback-based RunSelectRequestForSchemaFn. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
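
As a rough picture of what "callback-based" means here, the sketch below hands each row to a caller-supplied function instead of returning a fully built slice. It is not the signature of RunSelectRequestForSchemaFn. `selectEach`, `alertRow`, and `errStop` are hypothetical names, and an in-memory slice stands in for the database cursor.

```go
package main

import (
	"errors"
	"fmt"
)

// alertRow is a hypothetical two-column row.
type alertRow struct {
	PolicyName string
	Severity   int32
}

// errStop is a sentinel a callback can return to abort iteration early.
var errStop = errors.New("stop iteration")

// selectEach passes each row to fn in order and stops at the first error,
// which it returns to the caller. No intermediate result slice is built.
func selectEach[T any](source []T, fn func(*T) error) error {
	for i := range source {
		if err := fn(&source[i]); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	rows := []alertRow{
		{PolicyName: "Latest tag", Severity: 1},
		{PolicyName: "Privileged container", Severity: 3},
		{PolicyName: "No resource limits", Severity: 2},
	}

	// Fold rows into a running maximum as they arrive.
	var maxSeverity int32
	err := selectEach(rows, func(r *alertRow) error {
		if r.Severity > maxSeverity {
			maxSeverity = r.Severity
		}
		return nil
	})
	fmt.Println(maxSeverity, err)

	// Abort after the first row by returning the sentinel.
	seen := 0
	err = selectEach(rows, func(*alertRow) error {
		seen++
		return errStop
	})
	fmt.Println(seen, errors.Is(err, errStop))
}
```

The callback shape lets the caller decide what to keep per row, and whether to stop early, without the query helper allocating a slice of results first.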

Add integration tests in datastore_impl_test.go covering basic behavior, excludeResolved filtering, deployment ID filtering, and multiple alerts with different severities. Add SAC tests in datastore_sac_test.go covering scoped and unrestricted access patterns. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

central/alert/datastore/datastore_impl.go

clickboo

Code looks good, curious to see the improvements (because they will be significant for the risk reprocessing path on a scaled cluster). Left a few thoughts as comments.

central/alert/datastore/singleton.go

central/alert/views/views.go

openshift-ci bot added the do-not-merge/work-in-progress label Feb 19, 2026

github-actions bot added the area/central label Feb 19, 2026

dashrews78 force-pushed the dashrews/update-violations-multiplier-33252 branch from 40dde73 to 46e7d88 Compare February 19, 2026 20:56

dashrews78 marked this pull request as ready for review February 19, 2026 21:13

openshift-ci bot removed the do-not-merge/work-in-progress label Feb 19, 2026

dashrews78 and others added 8 commits February 23, 2026 07:26

ROX-33252: fix gofmt import ordering and test compilation

5719b1c

regenerate the mocks

89d1df7

ROX-33252: use non-deprecated RunSelectRequestForSchemaFn

6b1af37

drop some wordiness from my robot friend

2994031

dashrews78 force-pushed the dashrews/update-violations-multiplier-33252 branch from 46e7d88 to 2994031 Compare February 23, 2026 12:26

dashrews78 mentioned this pull request Feb 23, 2026

ROX-33251: use SQL aggregate query for FailingPolicyCounter resolvers #19136

Open

9 tasks

charmik-redhat approved these changes Feb 24, 2026

ksurabhi91 reviewed Feb 24, 2026

central/alert/datastore/datastore_impl.go (outdated, resolved)

clickboo approved these changes Feb 24, 2026

central/alert/datastore/singleton.go (resolved)

central/alert/views/views.go (resolved)

dashrews78 merged commit 06385ec into master Feb 24, 2026
94 checks passed

dashrews78 deleted the dashrews/update-violations-multiplier-33252 branch February 24, 2026 16:56

dashrews78 mentioned this pull request Feb 24, 2026

ROX-33254: use SQL aggregate query for GetAlertsGroup #19166

Draft

9 tasks

dashrews78 merged 8 commits into master from dashrews/update-violations-multiplier-33252

5 participants
