ROX-31174: reverting ID back to index based#17200
Conversation
|
This change is part of the following stack: Change managed by git-spice. |
|
Skipping CI for Draft Pull Request. |
|
Images are ready for the commit at 745882a. To use with deploy scripts, first |
There was a problem hiding this comment.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `pkg/scancomponent/component_id.go:16` </location>
<code_context>
- }
-
- return pgSearch.IDFromPks([]string{component.GetName(), strconv.FormatUint(hash, 10), imageID}), nil
+func ComponentIDV2(component *storage.EmbeddedImageScanComponent, imageID string, index int) string {
+ // The index it occurs in the component list is sufficient for uniqueness. We do not need to be able to
+ // rebuild this ID at query time from an embedded object. Which is why we were forced to use a hash before.
</code_context>
<issue_to_address>
**issue (bug_risk):** Switching to index-based uniqueness may risk collisions if component order changes.
If component order is not guaranteed to be stable, index-based IDs may collide or mismatch. Please verify order stability or implement safeguards to prevent these issues.
</issue_to_address>
### Comment 2
<location> `pkg/cve/cve.go:27` </location>
<code_context>
- }
-
- return pgSearch.IDFromPks([]string{cve.GetCve(), strconv.FormatUint(hash, 10), componentID}), nil
+func IDV2(cve *storage.EmbeddedVulnerability, componentID string, index int) string {
+ // The index it occurs in the component list is sufficient for uniqueness. We do not need to be able to
+ // rebuild this ID at query time from an embedded object.
</code_context>
<issue_to_address>
**issue (bug_risk):** Index-based CVE ID generation may be fragile if vulnerability ordering is not guaranteed.
Please ensure the vulnerability list order remains stable to prevent inconsistent or duplicate CVE IDs.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17200 +/- ##
=======================================
Coverage 48.84% 48.84%
=======================================
Files 2720 2718 -2
Lines 203032 202948 -84
=======================================
- Hits 99166 99125 -41
+ Misses 96079 96042 -37
+ Partials 7787 7781 -6
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
c58a45a to
718d07d
Compare
fa39764 to
cd42b57
Compare
3e0ae95 to
a8e3bf3
Compare
cd42b57 to
320c1ba
Compare
a8e3bf3 to
4cda588
Compare
7cda5ba to
b5bfcd9
Compare
4cda588 to
3bbf294
Compare
b5bfcd9 to
8fe0b3c
Compare
8fe0b3c to
dcbf109
Compare
There was a problem hiding this comment.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `central/imagev2/datastore/store/common/split.go:46` </location>
<code_context>
- // dedupe components within the component
- if _, ok := componentMap[generatedComponentV2.GetId()]; ok {
+ // dedupe components within the component using content-based key
+ dedupKey := fmt.Sprintf("%s:%s:%d:%s:%s:%d",
+ component.GetName(),
</code_context>
<issue_to_address>
**issue (complexity):** Consider simplifying deduplication by using map[string]struct{} keyed on generated IDs instead of manual fmt.Sprintf keys.
```go
// In splitComponents, replace the manual fmt.Sprintf and map[string]*storage.EmbeddedImageScanComponent
// with a simple map[string]struct{} keyed on generatedComponentV2.GetId():
func splitComponents(parts ImagePartsV2) ([]ComponentPartsV2, error) {
ret := make([]ComponentPartsV2, 0, len(parts.Image.GetScan().GetComponents()))
seen := make(map[string]struct{})
for _, comp := range parts.Image.GetScan().GetComponents() {
genComp, err := GenerateImageComponentV2(
parts.Image.GetScan().GetOperatingSystem(), parts.Image, /* index */ comp,
)
if err != nil {
return nil, err
}
key := genComp.GetId()
if _, ok := seen[key]; ok {
log.Infof("Component %s-%s has already been processed. Skipping...", comp.GetName(), comp.GetVersion())
continue
}
seen[key] = struct{}{}
cves, err := splitCVEs(parts.Image.GetId(), genComp.GetId(), comp)
if err != nil {
return nil, err
}
ret = append(ret, ComponentPartsV2{ComponentV2: genComp, Children: cves})
}
return ret, nil
}
```
```go
// In splitCVEs, do the same: use convertedCVE.GetId() instead of a long fmt.Sprintf:
func splitCVEs(imageID, componentID string, embedded *storage.EmbeddedImageScanComponent) ([]CVEPartsV2, error) {
ret := make([]CVEPartsV2, 0, len(embedded.GetVulns()))
seen := make(map[string]struct{})
for _, v := range embedded.GetVulns() {
converted, err := utils.EmbeddedVulnerabilityToImageCVEV2(imageID, componentID, v)
if err != nil {
return nil, err
}
key := converted.GetId()
if _, ok := seen[key]; ok {
log.Infof("CVE %s has already been processed. Skipping...", v.GetCve())
continue
}
seen[key] = struct{}{}
ret = append(ret, CVEPartsV2{CVEV2: converted})
}
return ret, nil
}
```
This keeps the exact dedupe behavior (ComponentIDV2 and EmbeddedVulnerabilityToImageCVEV2 already encode all relevant fields), removes duplicated `fmt.Sprintf` calls, and centralizes the logic into a single, easy‐to‐scan pattern.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
/retest |
ajheflin
left a comment
There was a problem hiding this comment.
The local index for the ID generation concerned me but then I saw this comment which assuaged those concerns, good comment.
stackrox/pkg/scancomponent/component_id.go
Lines 17 to 18 in dcbf109
charmik-redhat
left a comment
There was a problem hiding this comment.
Would it cause any trouble if someone rolls back from a version with index based IDs to a version with hash based IDs? It seems that reprocessing should take care of it but just want to confirm.
Really doesn't matter at all. There would be a period of time where the "deprecated" dashboards may not fully work that use the image_scan resolver. But that would eventually resolve it self as images are reprocessed. |
dcbf109 to
745882a
Compare
Description
The hash based index is horribly expensive. We only needed it because of graphql image scan resolver converted to/from embedded scan components and image component. #17154 eliminated that need so we can fix the ID. Since the records are repopulated with each scan there is no need to migrate.
Cursor helped re-factor the tests to get the correct IDs set.
User-facing documentation
Testing and quality
Automated testing
How I validated my change
Here is the payoff. Compared the benchmark of writing an image to the benchmark in PR #17274 soon to be merged to master. Notice the very significant speed and memory improvements by being able to remove the need to hash the objects for the ID.