Skip to content

ROX-33935: improve component ranking#19739

Open
dashrews78 wants to merge 4 commits intomasterfrom
dashrews/improve-component-ranking-33935
Open

ROX-33935: improve component ranking#19739
dashrews78 wants to merge 4 commits intomasterfrom
dashrews/improve-component-ranking-33935

Conversation

@dashrews78
Copy link
Copy Markdown
Contributor

@dashrews78 dashrews78 commented Apr 1, 2026

Description

Updating the initialization of the component rankers because the way it works with the walk was slow and essentially was holding a lot of memory unnecessarily.

imagecomponent/v2/datastore.initializeRankers() walks the entire image_components_v2 table (~15M rows in production), deserializing every full serialized protobuf blob via
UnmarshalVTUnsafe just to extract two fields: id and risk_score.

Because UnmarshalVTUnsafe performs zero-copy deserialization, the resulting protobuf string fields (including the component ID) hold direct references into the pgx-allocated
[]byte row buffers. When the ranker stores the ID as a map key in idToScore, it pins the entire serialized row buffer in memory for the lifetime of the process.

Benchmark Results

Benchmark compares the current Walk-based ranker initialization (full protobuf deserialization) against the new SelectRiskView approach (fetches only id and riskscore
columns).

BenchmarkInitializeRankers/Walk-12 210 5709767 ns/op 338654 B/op 1243 allocs/op
BenchmarkInitializeRankers/Walk-12 222 5441142 ns/op 338608 B/op 1243 allocs/op
BenchmarkInitializeRankers/Walk-12 202 5737878 ns/op 338808 B/op 1247 allocs/op
BenchmarkInitializeRankers/Walk-12 206 5612813 ns/op 338728 B/op 1250 allocs/op
BenchmarkInitializeRankers/Walk-12 216 5732239 ns/op 338735 B/op 1250 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 236 5096242 ns/op 245012 B/op 3829 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 236 5071273 ns/op 245004 B/op 3829 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 235 5057107 ns/op 244997 B/op 3829 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 232 5142105 ns/op 244969 B/op 3829 allocs/op
BenchmarkInitializeRankers/SelectRiskView-12 222 5292091 ns/op 245011 B/op 3829 allocs/op

Metric Walk (before) SelectRiskView (after) Delta
ns/op ~5,647,000 ~5,132,000 -9% faster
B/op 338,705 244,999 -28% less memory

Run with 500 test components using small serialized payloads (~200 bytes). In production (~15M components with multi-KB serialized blobs), the memory improvement is dramatically
larger — the serialized column is never fetched at all, eliminating the ~6.7GB of retained heap from pinned pgx row buffers.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

unit test and benchmarks

dashrews78 and others added 2 commits April 1, 2026 06:07
…zation

initializeRankers() previously used Walk() which deserialized the full
protobuf serialized column for every row. With ~15M components in
production, UnmarshalVTUnsafe zero-copy deserialization pinned pgx row
buffers in memory (~6.7GB retained heap), causing Central OOM.

Replace Walk() with RunSelectRequestForSchemaFn using a ComponentRiskView
struct that fetches only the id and riskscore columns, following the
established pattern from imagev2/datastore. Uses streaming callback to
avoid intermediate slice allocation for 15M rows.

Benchmark (500 components): 22% faster, 55% less memory per operation.
Production impact is significantly larger due to multi-KB serialized blobs.

This code was partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 1, 2026

📝 Walkthrough

Walkthrough

Datastore constructor now accepts a leading postgres.DB. Ranker initialization was changed to run a Postgres select that returns component ID and risk score into ComponentRiskView rather than walking full objects. Call sites and tests were updated; a SQL integration benchmark for ranker initialization was added.

Changes

Cohort / File(s) Summary
Constructor & call sites
central/imagecomponent/v2/datastore/datastore.go, central/imagecomponent/v2/datastore/test_setup_utils.go, central/image/datastore/datastore_impl_flat_postgres_test.go, central/imagev2/datastoretest/datastore_impl_test.go, central/imagecomponent/v2/datastore/singleton.go
New constructor signature changed to New(db postgres.DB, storage, risks, ranker). All call sites updated to pass the Postgres handle as the first argument.
Datastore implementation
central/imagecomponent/v2/datastore/datastore_impl.go
datastoreImpl now stores a typed Postgres DB handle. initializeRankers replaced in-memory storage.Walk with a Postgres select into views.ComponentRiskView and iterates result rows to populate the ranker.
Views
central/imagecomponent/v2/views/component_risk_view.go
Added exported ComponentRiskView struct with ComponentID string and ComponentRiskScore float32 annotated with db tags for scanning.
Benchmarks / Tests
central/imagecomponent/v2/datastore/datastore_bench_postgres_test.go
Added SQL integration benchmark BenchmarkInitializeRankers (build tag sql_integration) that inserts 500 components and benchmarks Walk vs. Postgres select-based risk view retrieval.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test/App
    participant DS as Datastore
    participant DB as PostgreSQL

    Test->>DS: New(db, storage, risks, ranker)
    DS->>DS: initializeRankers()
    DS->>DB: SELECT component_id, component_risk_score<br/>FROM image_component_v2
    DB-->>DS: ResultSet (ComponentRiskView rows)
    DS->>DS: Iterate rows and update ranker
    DS-->>Test: Datastore ready
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'ROX-33935: improve component ranking' directly relates to the main change: optimizing component ranker initialization to avoid memory overhead and improve performance.
Docstring Coverage ✅ Passed Docstring coverage is 80.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description provides detailed context about the performance issue, includes benchmark results comparing the old Walk-based approach with the new SelectRiskView approach, and explains the memory pinning problem that motivated the change.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dashrews/improve-component-ranking-33935

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
central/imagecomponent/v2/datastore/datastore.go (1)

31-35: Avoid making db and storage separate sources of truth.

initializeRankers() reads through db, while the rest of the datastore reads through storage. That makes it possible to construct a datastore whose ranker is initialized from a different database than the one serving reads. Prefer collapsing this to one dependency—either build storage from db inside New, or move the lightweight select behind the store and drop the extra db parameter.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@central/imagecomponent/v2/datastore/datastore.go` around lines 31 - 35, The
constructor New currently accepts both db and storage, causing initializeRankers
to read from db while other methods use storage; pick a single source of truth
and unify them: either (A) remove the db parameter and update initializeRankers
to use the provided storage (modify initializeRankers to call storage methods on
datastoreImpl instead of direct db queries), or (B) construct the pgStore.Store
from the db inside New (create storage via the store constructor using db and
assign only storage to datastoreImpl) and stop passing storage in; update New,
datastoreImpl, and initializeRankers accordingly so all reads use the same store
symbol (storage) and drop the extra db dependency.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@central/imagecomponent/v2/datastore/datastore.go`:
- Around line 31-35: The constructor New currently accepts both db and storage,
causing initializeRankers to read from db while other methods use storage; pick
a single source of truth and unify them: either (A) remove the db parameter and
update initializeRankers to use the provided storage (modify initializeRankers
to call storage methods on datastoreImpl instead of direct db queries), or (B)
construct the pgStore.Store from the db inside New (create storage via the store
constructor using db and assign only storage to datastoreImpl) and stop passing
storage in; update New, datastoreImpl, and initializeRankers accordingly so all
reads use the same store symbol (storage) and drop the extra db dependency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 418fd758-49cf-4031-8523-694b9edd7459

📥 Commits

Reviewing files that changed from the base of the PR and between b0defe6 and d151b08.

📒 Files selected for processing (6)
  • central/graphql/resolvers/test_setup_utils.go
  • central/imagecomponent/v2/datastore/datastore.go
  • central/imagecomponent/v2/datastore/datastore_bench_postgres_test.go
  • central/imagecomponent/v2/datastore/datastore_impl.go
  • central/imagecomponent/v2/datastore/singleton.go
  • central/imagecomponent/v2/views/component_risk_view.go

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Apr 1, 2026

Images are ready for the commit at 620d9bb.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-523-g620d9bb383.

Two test files were not updated when the New() signature gained a
postgres.DB parameter. This caused compilation failures in CI
(style-check, golangci-lint, go-postgres, go-bench).

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.60%. Comparing base (5fe70ee) to head (620d9bb).
⚠️ Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
...tral/imagecomponent/v2/datastore/datastore_impl.go 70.58% 4 Missing and 1 partial ⚠️
central/imagecomponent/v2/datastore/singleton.go 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19739      +/-   ##
==========================================
- Coverage   49.60%   49.60%   -0.01%     
==========================================
  Files        2756     2756              
  Lines      208036   208050      +14     
==========================================
+ Hits       103194   103195       +1     
- Misses      97183    97194      +11     
- Partials     7659     7661       +2     
Flag Coverage Δ
go-unit-tests 49.60% <66.66%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

The benchmark was discarding id/score values without adding them to a
ranker, so it did not exercise the memory pinning behavior of
UnmarshalVTUnsafe strings stored as map keys in idToScore.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Apr 1, 2026

@dashrews78: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-nongroovy-e2e-tests 620d9bb link true /test gke-nongroovy-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants