ROX-31174: reverting ID back to index based by dashrews78 · Pull Request #17200 · stackrox/stackrox

dashrews78 · 2025-10-08T20:58:51Z

Description

The hash based index is horribly expensive. We only needed it because of graphql image scan resolver converted to/from embedded scan components and image component. #17154 eliminated that need so we can fix the ID. Since the records are repopulated with each scan there is no need to migrate.

Cursor helped re-factor the tests to get the correct IDs set.

User-facing documentation

CHANGELOG.md is updated OR update is not needed
documentation PR is created and is linked above OR is not needed

Testing and quality

the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
CI results are inspected

Automated testing

How I validated my change

Here is the payoff. Compared the benchmark of writing an image to the benchmark in PR #17274 soon to be merged to master. Notice the very significant speed and memory improvements by being able to remove the need to hash the objects for the ID.

dashrews-mac:stackrox dashrews$ benchcmp master.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                               old ns/op     new ns/op     delta
BenchmarkImageUpsert/UpsertImage-12     860959104     506537424     -41.17%
BenchmarkImageUpsert/UpsertImage-12     843851417     533249821     -36.81%
BenchmarkImageUpsert/UpsertImage-12     862154073     566152417     -34.33%
BenchmarkImageUpsert/UpsertImage-12     844871594     566797389     -32.91%
BenchmarkImageUpsert/UpsertImage-12     885779938     563705000     -36.36%
BenchmarkImageUpsert/UpsertImage-12     910828500     556446847     -38.91%

benchmark                               old allocs     new allocs     delta
BenchmarkImageUpsert/UpsertImage-12     7013800        3673820        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013811        3673804        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013809        3673771        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013798        3673793        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013784        3673782        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013782        3673777        -47.62%

benchmark                               old bytes     new bytes     delta
BenchmarkImageUpsert/UpsertImage-12     165261882     104668724     -36.66%
BenchmarkImageUpsert/UpsertImage-12     165264484     104665533     -36.67%
BenchmarkImageUpsert/UpsertImage-12     165265146     104662940     -36.67%
BenchmarkImageUpsert/UpsertImage-12     165262820     104665182     -36.67%
BenchmarkImageUpsert/UpsertImage-12     165258840     104664850     -36.67%
BenchmarkImageUpsert/UpsertImage-12     165259720     104662657     -36.67%

dashrews78 · 2025-10-08T20:58:52Z

This change is part of the following stack:

ROX-31173: do not embed component in resolver context #17154
- ROX-31174: reverting ID back to index based #17200 ◀

_{Change managed by git-spice.}

openshift-ci · 2025-10-08T20:58:55Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

rhacs-bot · 2025-10-08T21:22:21Z

Images are ready for the commit at 745882a.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.10.x-61-g745882a152.

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `pkg/scancomponent/component_id.go:16` </location>
<code_context>
-	}
-
-	return pgSearch.IDFromPks([]string{component.GetName(), strconv.FormatUint(hash, 10), imageID}), nil
+func ComponentIDV2(component *storage.EmbeddedImageScanComponent, imageID string, index int) string {
+	// The index it occurs in the component list is sufficient for uniqueness.  We do not need to be able to
+	// rebuild this ID at query time from an embedded object.  Which is why we were forced to use a hash before.
</code_context>

<issue_to_address>
**issue (bug_risk):** Switching to index-based uniqueness may risk collisions if component order changes.

If component order is not guaranteed to be stable, index-based IDs may collide or mismatch. Please verify order stability or implement safeguards to prevent these issues.
</issue_to_address>

### Comment 2
<location> `pkg/cve/cve.go:27` </location>
<code_context>
-	}
-
-	return pgSearch.IDFromPks([]string{cve.GetCve(), strconv.FormatUint(hash, 10), componentID}), nil
+func IDV2(cve *storage.EmbeddedVulnerability, componentID string, index int) string {
+	// The index it occurs in the component list is sufficient for uniqueness.  We do not need to be able to
+	// rebuild this ID at query time from an embedded object.
</code_context>

<issue_to_address>
**issue (bug_risk):** Index-based CVE ID generation may be fragile if vulnerability ordering is not guaranteed.

Please ensure the vulnerability list order remains stable to prevent inconsistent or duplicate CVE IDs.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

pkg/scancomponent/component_id.go

pkg/cve/cve.go

codecov · 2025-10-09T17:45:30Z

Codecov Report

❌ Patch coverage is 50.00000% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.84%. Comparing base (2c41166) to head (745882a).

Files with missing lines	Patch %	Lines
central/risk/manager/manager.go	0.00%	14 Missing ⚠️
central/imagev2/datastore/datastore_impl.go	0.00%	4 Missing ⚠️
pkg/cve/cve.go	0.00%	4 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #17200   +/-   ##
=======================================
  Coverage   48.84%   48.84%           
=======================================
  Files        2720     2718    -2     
  Lines      203032   202948   -84     
=======================================
- Hits        99166    99125   -41     
+ Misses      96079    96042   -37     
+ Partials     7787     7781    -6

Flag	Coverage Δ
go-unit-tests	`48.84% <50.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

sourcery-ai

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents

Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `central/imagev2/datastore/store/common/split.go:46` </location>
<code_context>

-		// dedupe components within the component
-		if _, ok := componentMap[generatedComponentV2.GetId()]; ok {
+		// dedupe components within the component using content-based key
+		dedupKey := fmt.Sprintf("%s:%s:%d:%s:%s:%d",
+			component.GetName(),
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying deduplication by using map[string]struct{} keyed on generated IDs instead of manual fmt.Sprintf keys.

```go
// In splitComponents, replace the manual fmt.Sprintf and map[string]*storage.EmbeddedImageScanComponent
// with a simple map[string]struct{} keyed on generatedComponentV2.GetId():
func splitComponents(parts ImagePartsV2) ([]ComponentPartsV2, error) {
    ret := make([]ComponentPartsV2, 0, len(parts.Image.GetScan().GetComponents()))
    seen := make(map[string]struct{})

    for _, comp := range parts.Image.GetScan().GetComponents() {
        genComp, err := GenerateImageComponentV2(
            parts.Image.GetScan().GetOperatingSystem(), parts.Image, /* index */ comp,
        )
        if err != nil {
            return nil, err
        }

        key := genComp.GetId()
        if _, ok := seen[key]; ok {
            log.Infof("Component %s-%s has already been processed. Skipping...", comp.GetName(), comp.GetVersion())
            continue
        }
        seen[key] = struct{}{}

        cves, err := splitCVEs(parts.Image.GetId(), genComp.GetId(), comp)
        if err != nil {
            return nil, err
        }

        ret = append(ret, ComponentPartsV2{ComponentV2: genComp, Children: cves})
    }

    return ret, nil
}
```

```go
// In splitCVEs, do the same: use convertedCVE.GetId() instead of a long fmt.Sprintf:
func splitCVEs(imageID, componentID string, embedded *storage.EmbeddedImageScanComponent) ([]CVEPartsV2, error) {
    ret := make([]CVEPartsV2, 0, len(embedded.GetVulns()))
    seen := make(map[string]struct{})

    for _, v := range embedded.GetVulns() {
        converted, err := utils.EmbeddedVulnerabilityToImageCVEV2(imageID, componentID, v)
        if err != nil {
            return nil, err
        }

        key := converted.GetId()
        if _, ok := seen[key]; ok {
            log.Infof("CVE %s has already been processed. Skipping...", v.GetCve())
            continue
        }
        seen[key] = struct{}{}

        ret = append(ret, CVEPartsV2{CVEV2: converted})
    }

    return ret, nil
}
```

This keeps the exact dedupe behavior (ComponentIDV2 and EmbeddedVulnerabilityToImageCVEV2 already encode all relevant fields), removes duplicated `fmt.Sprintf` calls, and centralizes the logic into a single, easy‐to‐scan pattern.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

dashrews78 · 2025-10-16T09:42:56Z

/retest

ajheflin

The local index for the ID generation concerned me but then I saw this comment which assuaged those concerns, good comment.

stackrox/pkg/scancomponent/component_id.go

Lines 17 to 18 in dcbf109

    
           // The index it occurs in the component list is sufficient for uniqueness.  We do not need to be able to 
        
           // rebuild this ID at query time from an embedded object.  Which is why we were forced to use a hash before.

central/image/datastore/store/common/v2/split_v2.go

charmik-redhat

Would it cause any trouble if someone rolls back from a version with index based IDs to a version with hash based IDs? It seems that reprocessing should take care of it but just want to confirm.

central/image/datastore/store/common/v2/split_v2.go

central/risk/manager/manager.go

dashrews78 · 2025-10-16T16:05:33Z

Would it cause any trouble if someone rolls back from a version with index based IDs to a version with hash based IDs? It seems that reprocessing should take care of it but just want to confirm.

Really doesn't matter at all. There would be a period of time where the "deprecated" dashboards may not fully work that use the image_scan resolver. But that would eventually resolve it self as images are reprocessed.

central/imagev2/datastore/store/common/split.go

charmik-redhat

LGTM

openshift-ci bot added the do-not-merge/work-in-progress label Oct 8, 2025

github-actions bot added the area/central label Oct 8, 2025

dashrews78 mentioned this pull request Oct 8, 2025

ROX-31173: do not embed component in resolver context #17154

Merged

9 tasks

dashrews78 added the ai-review label Oct 8, 2025

sourcery-ai bot reviewed Oct 8, 2025

View reviewed changes

pkg/scancomponent/component_id.go Show resolved Hide resolved

pkg/cve/cve.go Show resolved Hide resolved

dashrews78 added the ai-assisted label Oct 8, 2025

dashrews78 force-pushed the dashrews/eliminate-query-time-embedded-component-conversion-31173 branch from c58a45a to 718d07d Compare October 13, 2025 14:20

dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from fa39764 to cd42b57 Compare October 13, 2025 19:53

dashrews78 force-pushed the dashrews/eliminate-query-time-embedded-component-conversion-31173 branch from 3e0ae95 to a8e3bf3 Compare October 14, 2025 10:23

dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from cd42b57 to 320c1ba Compare October 14, 2025 10:23

dashrews78 mentioned this pull request Oct 14, 2025

chore(bench): image upsert benchmark #17274

Merged

9 tasks

dashrews78 force-pushed the dashrews/eliminate-query-time-embedded-component-conversion-31173 branch from a8e3bf3 to 4cda588 Compare October 14, 2025 15:33

dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from 7cda5ba to b5bfcd9 Compare October 14, 2025 15:33

dashrews78 force-pushed the dashrews/eliminate-query-time-embedded-component-conversion-31173 branch from 4cda588 to 3bbf294 Compare October 15, 2025 10:09

dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from b5bfcd9 to 8fe0b3c Compare October 15, 2025 17:01

Base automatically changed from dashrews/eliminate-query-time-embedded-component-conversion-31173 to master October 15, 2025 18:54

dashrews78 marked this pull request as ready for review October 15, 2025 19:23

dashrews78 requested review from a team and janisz as code owners October 15, 2025 19:23

openshift-ci bot removed the do-not-merge/work-in-progress label Oct 15, 2025

dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from 8fe0b3c to dcbf109 Compare October 15, 2025 19:23

sourcery-ai bot reviewed Oct 15, 2025

View reviewed changes

janisz approved these changes Oct 16, 2025

View reviewed changes

dashrews78 requested review from clickboo and ksurabhi91 October 16, 2025 13:43

dashrews78 requested review from ajheflin, charmik-redhat and johannes94 October 16, 2025 13:43

ajheflin approved these changes Oct 16, 2025

View reviewed changes

janisz reviewed Oct 16, 2025

View reviewed changes

central/image/datastore/store/common/v2/split_v2.go Outdated Show resolved Hide resolved

charmik-redhat reviewed Oct 16, 2025

View reviewed changes

central/image/datastore/store/common/v2/split_v2.go Outdated Show resolved Hide resolved

central/image/datastore/store/common/v2/split_v2.go Outdated Show resolved Hide resolved

central/risk/manager/manager.go Show resolved Hide resolved

janisz reviewed Oct 16, 2025

View reviewed changes

central/imagev2/datastore/store/common/split.go Outdated Show resolved Hide resolved

charmik-redhat approved these changes Oct 16, 2025

View reviewed changes

dashrews78 added 10 commits October 16, 2025 12:29

reverting ID back to index based

be90240

fixing the test

4a601b9

fix another test

986d60f

fixing a bunch of tests

42b0d91

fixing another test

69815c8

fixing dedup logic

7de63de

update tests

2149c45

rebase and remove hash stuff

4dd6f7c

test fix

126fbb9

remove the dupe map as it is no longer necessary

745882a

dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from dcbf109 to 745882a Compare October 16, 2025 16:29

charmik-redhat approved these changes Oct 16, 2025

View reviewed changes

dashrews78 merged commit fa2968e into master Oct 16, 2025
149 of 179 checks passed

dashrews78 deleted the dashrews/do-not-hash-component-cve-id-31174 branch October 16, 2025 19:22

dashrews78 added a commit that referenced this pull request Oct 16, 2025

ROX-31174: reverting ID back to index based (#17200)

c4c0308

dashrews78 mentioned this pull request Oct 16, 2025

ROX-31173, ROX-31174: backport CVE data model id performance improvements #17339

Merged

9 tasks

dashrews78 added a commit that referenced this pull request Oct 17, 2025

ROX-31174: reverting ID back to index based (#17200)

81feecb

dashrews78 mentioned this pull request Oct 17, 2025

ROX-31173, ROX-31174: 4.8 backport CVE performance improvements #17351

Merged

9 tasks

	// The index it occurs in the component list is sufficient for uniqueness. We do not need to be able to
	// rebuild this ID at query time from an embedded object. Which is why we were forced to use a hash before.

Conversation

dashrews78 commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

User-facing documentation

Testing and quality

Automated testing

How I validated my change

Uh oh!

dashrews78 commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openshift-ci bot commented Oct 8, 2025

Uh oh!

rhacs-bot commented Oct 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

codecov bot commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

dashrews78 commented Oct 16, 2025

Uh oh!

ajheflin left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

charmik-redhat left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dashrews78 commented Oct 16, 2025

Uh oh!

Uh oh!

charmik-redhat left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

dashrews78 commented Oct 8, 2025 •

edited

Loading

dashrews78 commented Oct 8, 2025 •

edited

Loading

rhacs-bot commented Oct 8, 2025 •

edited

Loading

codecov bot commented Oct 9, 2025 •

edited

Loading