Skip to content

ROX-31174: reverting ID back to index based#17200

Merged
dashrews78 merged 10 commits intomasterfrom
dashrews/do-not-hash-component-cve-id-31174
Oct 16, 2025
Merged

ROX-31174: reverting ID back to index based#17200
dashrews78 merged 10 commits intomasterfrom
dashrews/do-not-hash-component-cve-id-31174

Conversation

@dashrews78
Copy link
Contributor

@dashrews78 dashrews78 commented Oct 8, 2025

Description

The hash based index is horribly expensive. We only needed it because of graphql image scan resolver converted to/from embedded scan components and image component. #17154 eliminated that need so we can fix the ID. Since the records are repopulated with each scan there is no need to migrate.

Cursor helped re-factor the tests to get the correct IDs set.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Here is the payoff. Compared the benchmark of writing an image to the benchmark in PR #17274 soon to be merged to master. Notice the very significant speed and memory improvements by being able to remove the need to hash the objects for the ID.

dashrews-mac:stackrox dashrews$ benchcmp master.txt new.txt
benchcmp is deprecated in favor of benchstat: https://pkg.go.dev/golang.org/x/perf/cmd/benchstat
benchmark                               old ns/op     new ns/op     delta
BenchmarkImageUpsert/UpsertImage-12     860959104     506537424     -41.17%
BenchmarkImageUpsert/UpsertImage-12     843851417     533249821     -36.81%
BenchmarkImageUpsert/UpsertImage-12     862154073     566152417     -34.33%
BenchmarkImageUpsert/UpsertImage-12     844871594     566797389     -32.91%
BenchmarkImageUpsert/UpsertImage-12     885779938     563705000     -36.36%
BenchmarkImageUpsert/UpsertImage-12     910828500     556446847     -38.91%

benchmark                               old allocs     new allocs     delta
BenchmarkImageUpsert/UpsertImage-12     7013800        3673820        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013811        3673804        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013809        3673771        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013798        3673793        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013784        3673782        -47.62%
BenchmarkImageUpsert/UpsertImage-12     7013782        3673777        -47.62%

benchmark                               old bytes     new bytes     delta
BenchmarkImageUpsert/UpsertImage-12     165261882     104668724     -36.66%
BenchmarkImageUpsert/UpsertImage-12     165264484     104665533     -36.67%
BenchmarkImageUpsert/UpsertImage-12     165265146     104662940     -36.67%
BenchmarkImageUpsert/UpsertImage-12     165262820     104665182     -36.67%
BenchmarkImageUpsert/UpsertImage-12     165258840     104664850     -36.67%
BenchmarkImageUpsert/UpsertImage-12     165259720     104662657     -36.67%

@dashrews78
Copy link
Contributor Author

dashrews78 commented Oct 8, 2025

This change is part of the following stack:

Change managed by git-spice.

@openshift-ci
Copy link

openshift-ci bot commented Oct 8, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rhacs-bot
Copy link
Contributor

rhacs-bot commented Oct 8, 2025

Images are ready for the commit at 745882a.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.10.x-61-g745882a152.

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `pkg/scancomponent/component_id.go:16` </location>
<code_context>
-	}
-
-	return pgSearch.IDFromPks([]string{component.GetName(), strconv.FormatUint(hash, 10), imageID}), nil
+func ComponentIDV2(component *storage.EmbeddedImageScanComponent, imageID string, index int) string {
+	// The index it occurs in the component list is sufficient for uniqueness.  We do not need to be able to
+	// rebuild this ID at query time from an embedded object.  Which is why we were forced to use a hash before.
</code_context>

<issue_to_address>
**issue (bug_risk):** Switching to index-based uniqueness may risk collisions if component order changes.

If component order is not guaranteed to be stable, index-based IDs may collide or mismatch. Please verify order stability or implement safeguards to prevent these issues.
</issue_to_address>

### Comment 2
<location> `pkg/cve/cve.go:27` </location>
<code_context>
-	}
-
-	return pgSearch.IDFromPks([]string{cve.GetCve(), strconv.FormatUint(hash, 10), componentID}), nil
+func IDV2(cve *storage.EmbeddedVulnerability, componentID string, index int) string {
+	// The index it occurs in the component list is sufficient for uniqueness.  We do not need to be able to
+	// rebuild this ID at query time from an embedded object.
</code_context>

<issue_to_address>
**issue (bug_risk):** Index-based CVE ID generation may be fragile if vulnerability ordering is not guaranteed.

Please ensure the vulnerability list order remains stable to prevent inconsistent or duplicate CVE IDs.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@codecov
Copy link

codecov bot commented Oct 9, 2025

Codecov Report

❌ Patch coverage is 50.00000% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.84%. Comparing base (2c41166) to head (745882a).

Files with missing lines Patch % Lines
central/risk/manager/manager.go 0.00% 14 Missing ⚠️
central/imagev2/datastore/datastore_impl.go 0.00% 4 Missing ⚠️
pkg/cve/cve.go 0.00% 4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master   #17200   +/-   ##
=======================================
  Coverage   48.84%   48.84%           
=======================================
  Files        2720     2718    -2     
  Lines      203032   202948   -84     
=======================================
- Hits        99166    99125   -41     
+ Misses      96079    96042   -37     
+ Partials     7787     7781    -6     
Flag Coverage Δ
go-unit-tests 48.84% <50.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@dashrews78 dashrews78 force-pushed the dashrews/eliminate-query-time-embedded-component-conversion-31173 branch from c58a45a to 718d07d Compare October 13, 2025 14:20
@dashrews78 dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from fa39764 to cd42b57 Compare October 13, 2025 19:53
@dashrews78 dashrews78 force-pushed the dashrews/eliminate-query-time-embedded-component-conversion-31173 branch from 3e0ae95 to a8e3bf3 Compare October 14, 2025 10:23
@dashrews78 dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from cd42b57 to 320c1ba Compare October 14, 2025 10:23
@dashrews78 dashrews78 force-pushed the dashrews/eliminate-query-time-embedded-component-conversion-31173 branch from a8e3bf3 to 4cda588 Compare October 14, 2025 15:33
@dashrews78 dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from 7cda5ba to b5bfcd9 Compare October 14, 2025 15:33
@dashrews78 dashrews78 force-pushed the dashrews/eliminate-query-time-embedded-component-conversion-31173 branch from 4cda588 to 3bbf294 Compare October 15, 2025 10:09
@dashrews78 dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from b5bfcd9 to 8fe0b3c Compare October 15, 2025 17:01
Base automatically changed from dashrews/eliminate-query-time-embedded-component-conversion-31173 to master October 15, 2025 18:54
@dashrews78 dashrews78 marked this pull request as ready for review October 15, 2025 19:23
@dashrews78 dashrews78 requested review from a team and janisz as code owners October 15, 2025 19:23
@dashrews78 dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from 8fe0b3c to dcbf109 Compare October 15, 2025 19:23
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `central/imagev2/datastore/store/common/split.go:46` </location>
<code_context>

-		// dedupe components within the component
-		if _, ok := componentMap[generatedComponentV2.GetId()]; ok {
+		// dedupe components within the component using content-based key
+		dedupKey := fmt.Sprintf("%s:%s:%d:%s:%s:%d",
+			component.GetName(),
</code_context>

<issue_to_address>
**issue (complexity):** Consider simplifying deduplication by using map[string]struct{} keyed on generated IDs instead of manual fmt.Sprintf keys.

```go
// In splitComponents, replace the manual fmt.Sprintf and map[string]*storage.EmbeddedImageScanComponent
// with a simple map[string]struct{} keyed on generatedComponentV2.GetId():
func splitComponents(parts ImagePartsV2) ([]ComponentPartsV2, error) {
    ret := make([]ComponentPartsV2, 0, len(parts.Image.GetScan().GetComponents()))
    seen := make(map[string]struct{})

    for _, comp := range parts.Image.GetScan().GetComponents() {
        genComp, err := GenerateImageComponentV2(
            parts.Image.GetScan().GetOperatingSystem(), parts.Image, /* index */ comp,
        )
        if err != nil {
            return nil, err
        }

        key := genComp.GetId()
        if _, ok := seen[key]; ok {
            log.Infof("Component %s-%s has already been processed. Skipping...", comp.GetName(), comp.GetVersion())
            continue
        }
        seen[key] = struct{}{}

        cves, err := splitCVEs(parts.Image.GetId(), genComp.GetId(), comp)
        if err != nil {
            return nil, err
        }

        ret = append(ret, ComponentPartsV2{ComponentV2: genComp, Children: cves})
    }

    return ret, nil
}
```

```go
// In splitCVEs, do the same: use convertedCVE.GetId() instead of a long fmt.Sprintf:
func splitCVEs(imageID, componentID string, embedded *storage.EmbeddedImageScanComponent) ([]CVEPartsV2, error) {
    ret := make([]CVEPartsV2, 0, len(embedded.GetVulns()))
    seen := make(map[string]struct{})

    for _, v := range embedded.GetVulns() {
        converted, err := utils.EmbeddedVulnerabilityToImageCVEV2(imageID, componentID, v)
        if err != nil {
            return nil, err
        }

        key := converted.GetId()
        if _, ok := seen[key]; ok {
            log.Infof("CVE %s has already been processed. Skipping...", v.GetCve())
            continue
        }
        seen[key] = struct{}{}

        ret = append(ret, CVEPartsV2{CVEV2: converted})
    }

    return ret, nil
}
```

This keeps the exact dedupe behavior (ComponentIDV2 and EmbeddedVulnerabilityToImageCVEV2 already encode all relevant fields), removes duplicated `fmt.Sprintf` calls, and centralizes the logic into a single, easy‐to‐scan pattern.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@dashrews78
Copy link
Contributor Author

/retest

Copy link
Contributor

@ajheflin ajheflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The local index for the ID generation concerned me but then I saw this comment which assuaged those concerns, good comment.

// The index it occurs in the component list is sufficient for uniqueness. We do not need to be able to
// rebuild this ID at query time from an embedded object. Which is why we were forced to use a hash before.

Copy link
Contributor

@charmik-redhat charmik-redhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it cause any trouble if someone rolls back from a version with index based IDs to a version with hash based IDs? It seems that reprocessing should take care of it but just want to confirm.

@dashrews78
Copy link
Contributor Author

Would it cause any trouble if someone rolls back from a version with index based IDs to a version with hash based IDs? It seems that reprocessing should take care of it but just want to confirm.

Really doesn't matter at all. There would be a period of time where the "deprecated" dashboards may not fully work that use the image_scan resolver. But that would eventually resolve it self as images are reprocessed.

@dashrews78 dashrews78 force-pushed the dashrews/do-not-hash-component-cve-id-31174 branch from dcbf109 to 745882a Compare October 16, 2025 16:29
Copy link
Contributor

@charmik-redhat charmik-redhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dashrews78 dashrews78 merged commit fa2968e into master Oct 16, 2025
149 of 179 checks passed
@dashrews78 dashrews78 deleted the dashrews/do-not-hash-component-cve-id-31174 branch October 16, 2025 19:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants