ROX-33252: optimize ViolationsMultiplier to query what it needs#19115

Merged
dashrews78 merged 8 commits intomasterfrom
dashrews/update-violations-multiplier-33252
Feb 24, 2026

@dashrews78 dashrews78 commented Feb 19, 2026

ViolationsMultiplier.Score() only needs policy_name and policy_severity from alerts, but it was deserializing full Alert protobuf blobs via SearchListAlerts. That is wasteful: both fields are already populated columns, so fetching the entire serialized object and unmarshalling it is expensive and unnecessary.

Added SearchAlertPolicyNamesAndSeverities, which uses RunSelectRequestForSchemaFn to query only the two needed columns directly.

Follows the same pattern as central/secret/datastore for DB access
(ROX-31142).

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Description

change me!

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Added unit tests. Existing tests. CI. long running cluster.

dashrews78 commented Feb 19, 2026

This change is part of the following stack:

Change managed by git-spice.

openshift-ci bot commented Feb 19, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

rhacs-bot commented Feb 19, 2026

Images are ready for the commit at 2994031.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-164-g2994031a4b.

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 81.25000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.52%. Comparing base (573923b) to head (2994031).
⚠️ Report is 10 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| central/alert/datastore/datastore_impl.go | 85.71% | 2 Missing and 1 partial ⚠️ |
| central/alert/datastore/singleton.go | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #19115   +/-   ##
=======================================
  Coverage   49.52%   49.52%           
=======================================
  Files        2672     2672           
  Lines      201665   201686   +21     
=======================================
+ Hits        99870    99895   +25     
+ Misses      94337    94334    -3     
+ Partials     7458     7457    -1     
| Flag | Coverage Δ |
|---|---|
| go-unit-tests | 49.52% <81.25%> (+<0.01%) ⬆️ |


@dashrews78 dashrews78 force-pushed the dashrews/update-violations-multiplier-33252 branch from 40dde73 to 46e7d88 Compare February 19, 2026 20:56
@dashrews78
Contributor Author

I asked Claude to estimate the impact of this change, assuming 10K deployments with an average of 6 active violations each. The quantification Claude provided follows, and it is likely reasonable. Before this change, this was a very expensive operation.

Summary

ViolationsMultiplier.Score() only needs policy_name and policy_severity from alerts but was deserializing full storage.Alert protobuf blobs via SearchListAlerts. The optimization replaces this with a column projection query using RunSelectRequestForSchema, fetching only the two needed columns directly from PostgreSQL.


Call Frequency

ViolationsMultiplier.Score() is the first multiplier in the deployment scoring pipeline and is called on:

  • Every policy violation detection (build-time, deploy-time, runtime)
  • Every deployment change (via sensor pipeline)
  • Service account and process baseline changes
  • Periodic batch reprocessing every 10 minutes (deduped)
  • Full reprocessing of all deployments every 4 hours
  • Up to 15 concurrent risk scoring operations per Central instance (semaphore-controlled)

Query Change

Old:

SELECT alerts.serialized FROM alerts WHERE deployment_id=$1 AND state=$2
  • Transfers entire serialized protobuf blob per row
  • Deserializes full storage.Alert proto (all nested messages)
  • Converts to ListAlert (allocates another struct, copies ~11 fields)
  • Caller reads 2 fields, discards the rest

New:

SELECT alerts.policy_name, alerts.policy_severity FROM alerts WHERE deployment_id=$1 AND state=$2
  • Transfers only a varchar + integer per row
  • No protobuf deserialization
  • Scans directly into a 2-field struct

Assumptions

  • Typical serialized storage.Alert protobuf: ~2-5KB (includes policy with all fields, violation messages, deployment metadata, timestamps, enforcement info, nested messages)
  • The two projected columns: policy_name (varchar, ~30-50 bytes) + policy_severity (integer, 4 bytes) ~ 50 bytes per row
  • Each UnmarshalVTUnsafe of a full Alert allocates the top-level object plus every nested message (Policy, Deployment, Violation list entries, ProcessIndicators, NetworkFlows, timestamps) — easily 20+ allocations per alert
  • WHERE clause and index usage are identical (deployment_id and state are both btree indexed)

Scenario: 10,000 Deployments, 3 Violations Each (30,000 alerts)

Per Score() Call (1 deployment, 3 alerts)

| Metric | Old Path | New Path |
|---|---|---|
| SQL query | SELECT serialized (bytea blob) | SELECT policy_name, policy_severity |
| Data from Postgres | ~6-15KB | ~150 bytes |
| Proto deserializations | 3 full storage.Alert unmarshals | 0 |
| Heap allocations | 3 Alert + 3 ListAlert + all nested messages | 3 two-field structs |

Per Full Reprocessing Cycle (10,000 deployments)

| Metric | Old Path | New Path | Reduction |
|---|---|---|---|
| Data transferred from Postgres | ~60-150MB | ~1.5MB | ~98% |
| Protobuf deserializations | 30,000 | 0 | 100% |
| Heap object allocations | ~600,000+ (nested messages) | 30,000 | ~95% |
| GC pressure | Significant | Minimal | |

Scenario: 10,000 Deployments, 6 Violations Each (60,000 alerts)

Per Score() Call (1 deployment, 6 alerts)

| Metric | Old Path | New Path |
|---|---|---|
| Data from Postgres | ~12-30KB | ~300 bytes |
| Proto deserializations | 6 full storage.Alert unmarshals | 0 |
| Heap allocations | 6 Alert + 6 ListAlert + all nested messages | 6 two-field structs |

Per Full Reprocessing Cycle (10,000 deployments)

| Metric | Old Path | New Path | Reduction |
|---|---|---|---|
| Data transferred from Postgres | ~120-300MB | ~3MB | ~98% |
| Protobuf deserializations | 60,000 | 0 | 100% |
| Heap object allocations | ~1,200,000+ | 60,000 | ~95% |

What This Doesn't Change

  • WHERE clause and index usage are identical — query planning cost is the same
  • SAC (Scoped Access Control) filtering still runs
  • The scoring math (severityImpact, NormalizeScore, factor sorting) is unchanged and cheap
  • Savings scale linearly with violation count per deployment

@dashrews78 dashrews78 marked this pull request as ready for review February 19, 2026 21:13
dashrews78 and others added 8 commits February 23, 2026 07:26

ViolationsMultiplier.Score() only needs policy_name and policy_severity
from alerts but was deserializing full Alert protobuf blobs via
SearchListAlerts. Add SearchAlertPolicyNamesAndSeverities which uses
RunSelectRequestForSchema to query only the two needed columns directly,
avoiding protobuf deserialization entirely.

Follows the same pattern as central/secret/datastore for DB access
(ROX-31142).

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Move the lightweight projection type to a dedicated views package
under central/alert/ so future projection types can be co-located.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add GetPolicyName() and GetSeverity() methods so callers use accessors
instead of direct field access. GetSeverity() returns storage.Severity
directly, removing the need for manual casts.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Sort imports alphabetically and pass DB pool to New() in
datastore_impl_test.go.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace deprecated RunSelectRequestForSchema with the callback-based
RunSelectRequestForSchemaFn.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add integration tests in datastore_impl_test.go covering basic behavior,
excludeResolved filtering, deployment ID filtering, and multiple alerts
with different severities. Add SAC tests in datastore_sac_test.go
covering scoped and unrestricted access patterns.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickboo clickboo left a comment

Code looks good, curious to see the improvements (because they will be significant for the risk reprocessing path on a scaled cluster). Left a few thoughts as comments.

@dashrews78 dashrews78 merged commit 06385ec into master Feb 24, 2026
94 checks passed
@dashrews78 dashrews78 deleted the dashrews/update-violations-multiplier-33252 branch February 24, 2026 16:56