ROX-33935: improve component ranking by dashrews78 · Pull Request #19739 · stackrox/stackrox

dashrews78 · 2026-04-01T10:42:20Z

Description

Updating the initialization of the component rankers because the way it works with the walk was slow and essentially was holding a lot of memory unnecessarily.

imagecomponent/v2/datastore.initializeRankers() walks the entire image_components_v2 table (~15M rows in production), deserializing every full serialized protobuf blob via
UnmarshalVTUnsafe just to extract two fields: id and risk_score.

Because UnmarshalVTUnsafe performs zero-copy deserialization, the resulting protobuf string fields (including the component ID) hold direct references into the pgx-allocated
[]byte row buffers. When the ranker stores the ID as a map key in idToScore, it pins the entire serialized row buffer in memory for the lifetime of the process.

Benchmark Results

Benchmark compares the current Walk-based ranker initialization (full protobuf deserialization) against the new SelectRiskView approach (fetches only id and riskscore
columns).

BenchmarkInitializeRankers/Walk-12 210 5709767 ns/op 338654 B/op 1243 allocs/op
BenchmarkInitializeRankers/Walk-12 222 5441142 ns/op 338608 B/op 1243 allocs/op
BenchmarkInitializeRankers/Walk-12 202 5737878 ns/op 338808 B/op 1247 allocs/op
BenchmarkInitializeRankers/Walk-12 206 5612813 ns/op 338728 B/op 1250 allocs/op
BenchmarkInitializeRankers/Walk-12 216 5732239 ns/op 338735 B/op 1250 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 236 5096242 ns/op 245012 B/op 3829 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 236 5071273 ns/op 245004 B/op 3829 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 235 5057107 ns/op 244997 B/op 3829 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 232 5142105 ns/op 244969 B/op 3829 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 222 5292091 ns/op 245011 B/op 3829 allocs/op

Metric	Walk (before)	SelectRiskView (after)	Delta
ns/op	~5,647,000	~5,132,000	-9% faster
B/op	338,705	244,999	-28% less memory

Run with 500 test components using small serialized payloads (~200 bytes). In production (~15M components with multi-KB serialized blobs), the memory improvement is dramatically
larger — the serialized column is never fetched at all, eliminating the ~6.7GB of retained heap from pinned pgx row buffers.

User-facing documentation

CHANGELOG.md is updated OR update is not needed
documentation PR is created and is linked above OR is not needed

Testing and quality

the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
CI results are inspected

Automated testing

How I validated my change

unit test and benchmarks

…zation initializeRankers() previously used Walk() which deserialized the full protobuf serialized column for every row. With ~15M components in production, UnmarshalVTUnsafe zero-copy deserialization pinned pgx row buffers in memory (~6.7GB retained heap), causing Central OOM. Replace Walk() with RunSelectRequestForSchemaFn using a ComponentRiskView struct that fetches only the id and riskscore columns, following the established pattern from imagev2/datastore. Uses streaming callback to avoid intermediate slice allocation for 15M rows. Benchmark (500 components): 22% faster, 55% less memory per operation. Production impact is significantly larger due to multi-KB serialized blobs. This code was partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-01T10:50:34Z

📝 Walkthrough

Walkthrough

Datastore constructor now accepts a leading postgres.DB. Ranker initialization was changed to run a Postgres select that returns component ID and risk score into ComponentRiskView rather than walking full objects. Call sites and tests were updated; a SQL integration benchmark for ranker initialization was added.

Changes

Cohort / File(s)	Summary
Constructor & call sites `central/imagecomponent/v2/datastore/datastore.go`, `central/imagecomponent/v2/datastore/test_setup_utils.go`, `central/image/datastore/datastore_impl_flat_postgres_test.go`, `central/imagev2/datastoretest/datastore_impl_test.go`, `central/imagecomponent/v2/datastore/singleton.go`	`New` constructor signature changed to `New(db postgres.DB, storage, risks, ranker)`. All call sites updated to pass the Postgres handle as the first argument.
Datastore implementation `central/imagecomponent/v2/datastore/datastore_impl.go`	`datastoreImpl` now stores a typed Postgres DB handle. `initializeRankers` replaced in-memory `storage.Walk` with a Postgres select into `views.ComponentRiskView` and iterates result rows to populate the ranker.
Views `central/imagecomponent/v2/views/component_risk_view.go`	Added exported `ComponentRiskView` struct with `ComponentID string` and `ComponentRiskScore float32` annotated with `db` tags for scanning.
Benchmarks / Tests `central/imagecomponent/v2/datastore/datastore_bench_postgres_test.go`	Added SQL integration benchmark `BenchmarkInitializeRankers` (build tag `sql_integration`) that inserts 500 components and benchmarks `Walk` vs. Postgres select-based risk view retrieval.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test/App
    participant DS as Datastore
    participant DB as PostgreSQL

    Test->>DS: New(db, storage, risks, ranker)
    DS->>DS: initializeRankers()
    DS->>DB: SELECT component_id, component_risk_score<br/>FROM image_component_v2
    DB-->>DS: ResultSet (ComponentRiskView rows)
    DS->>DS: Iterate rows and update ranker
    DS-->>Test: Datastore ready

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'ROX-33935: improve component ranking' directly relates to the main change: optimizing component ranker initialization to avoid memory overhead and improve performance.
Docstring Coverage	✅ Passed	Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Description check	✅ Passed	The PR description provides detailed context about the performance issue, includes benchmark results comparing the old Walk-based approach with the new SelectRiskView approach, and explains the memory pinning problem that motivated the change.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch dashrews/improve-component-ranking-33935

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

central/imagecomponent/v2/datastore/datastore.go (1)
31-35: Avoid making db and storage separate sources of truth.

initializeRankers() reads through db, while the rest of the datastore reads through storage. That makes it possible to construct a datastore whose ranker is initialized from a different database than the one serving reads. Prefer collapsing this to one dependency—either build storage from db inside New, or move the lightweight select behind the store and drop the extra db parameter.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@central/imagecomponent/v2/datastore/datastore.go` around lines 31 - 35, The
constructor New currently accepts both db and storage, causing initializeRankers
to read from db while other methods use storage; pick a single source of truth
and unify them: either (A) remove the db parameter and update initializeRankers
to use the provided storage (modify initializeRankers to call storage methods on
datastoreImpl instead of direct db queries), or (B) construct the pgStore.Store
from the db inside New (create storage via the store constructor using db and
assign only storage to datastoreImpl) and stop passing storage in; update New,
datastoreImpl, and initializeRankers accordingly so all reads use the same store
symbol (storage) and drop the extra db dependency.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@central/imagecomponent/v2/datastore/datastore.go`:
- Around line 31-35: The constructor New currently accepts both db and storage,
causing initializeRankers to read from db while other methods use storage; pick
a single source of truth and unify them: either (A) remove the db parameter and
update initializeRankers to use the provided storage (modify initializeRankers
to call storage methods on datastoreImpl instead of direct db queries), or (B)
construct the pgStore.Store from the db inside New (create storage via the store
constructor using db and assign only storage to datastoreImpl) and stop passing
storage in; update New, datastoreImpl, and initializeRankers accordingly so all
reads use the same store symbol (storage) and drop the extra db dependency.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 418fd758-49cf-4031-8523-694b9edd7459

📥 Commits

Reviewing files that changed from the base of the PR and between b0defe6 and d151b08.

📒 Files selected for processing (6)

central/graphql/resolvers/test_setup_utils.go
central/imagecomponent/v2/datastore/datastore.go
central/imagecomponent/v2/datastore/datastore_bench_postgres_test.go
central/imagecomponent/v2/datastore/datastore_impl.go
central/imagecomponent/v2/datastore/singleton.go
central/imagecomponent/v2/views/component_risk_view.go

rhacs-bot · 2026-04-01T10:56:32Z

Images are ready for the commit at 620d9bb.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-523-g620d9bb383.

Two test files were not updated when the New() signature gained a postgres.DB parameter. This caused compilation failures in CI (style-check, golangci-lint, go-postgres, go-bench). Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

codecov · 2026-04-01T14:57:29Z

Codecov Report

❌ Patch coverage is 66.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.60%. Comparing base (5fe70ee) to head (620d9bb).
⚠️ Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
...tral/imagecomponent/v2/datastore/datastore_impl.go	70.58%	4 Missing and 1 partial ⚠️
central/imagecomponent/v2/datastore/singleton.go	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #19739      +/-   ##
==========================================
- Coverage   49.60%   49.60%   -0.01%     
==========================================
  Files        2756     2756              
  Lines      208036   208050      +14     
==========================================
+ Hits       103194   103195       +1     
- Misses      97183    97194      +11     
- Partials     7659     7661       +2

Flag	Coverage Δ
go-unit-tests	`49.60% <66.66%> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The benchmark was discarding id/score values without adding them to a ranker, so it did not exercise the memory pinning behavior of UnmarshalVTUnsafe strings stored as map keys in idToScore. Partially generated by AI. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

openshift-ci · 2026-04-01T15:52:59Z

@dashrews78: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/gke-nongroovy-e2e-tests	`620d9bb`	link	true	`/test gke-nongroovy-e2e-tests`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

dashrews78 and others added 2 commits April 1, 2026 06:07

X-Smart-Branch-Parent: master

0631d8f

github-actions bot added the area/central label Apr 1, 2026

dashrews78 added coderabbit-review and removed area/central labels Apr 1, 2026

coderabbitai bot reviewed Apr 1, 2026

View reviewed changes

github-actions bot added the area/central label Apr 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ROX-33935: improve component ranking#19739

ROX-33935: improve component ranking#19739
dashrews78 wants to merge 4 commits intomasterfrom
dashrews/improve-component-ranking-33935

dashrews78 commented Apr 1, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Apr 1, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

rhacs-bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

codecov bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

openshift-ci bot commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dashrews78 commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Benchmark Results

User-facing documentation

Testing and quality

Automated testing

How I validated my change

Uh oh!

coderabbitai bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

rhacs-bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

openshift-ci bot commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

dashrews78 commented Apr 1, 2026 •

edited

Loading

coderabbitai bot commented Apr 1, 2026 •

edited

Loading

rhacs-bot commented Apr 1, 2026 •

edited

Loading

codecov bot commented Apr 1, 2026 •

edited

Loading