
ROX-30961: migration to populate indicator container start time WIP DO NOT REVIEW#16935

Closed
dashrews78 wants to merge 20 commits into master from dashrews/container-start-migration-30961

Conversation


@dashrews78 dashrews78 commented Sep 19, 2025

Description

Adds a migration to populate the container start time column in process indicators.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Unit tests and the upgrade test, plus additional manual testing to ensure the upgrade and rollback worked properly.



openshift-ci bot commented Sep 19, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


rhacs-bot commented Sep 19, 2025

Images are ready for the commit at 302311e.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.9.x-936-g302311e7c0.


codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 61.16505% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.82%. Comparing base (2b2a970) to head (302311e).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...dicators/test/schema/convert_process_indicators.go | 0.00% | 28 Missing ⚠️ |
| ...ainer_start_column_to_indicators/migration_impl.go | 71.60% | 16 Missing and 7 partials ⚠️ |
| migrator/version/version.go | 0.00% | 17 Missing ⚠️ |
| ...to_indicators/schema/convert_process_indicators.go | 79.31% | 4 Missing and 2 partials ⚠️ |
| ...lumn_to_indicators/test/schema/convert_clusters.go | 71.42% | 4 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #16935      +/-   ##
==========================================
+ Coverage   48.80%   48.82%   +0.01%     
==========================================
  Files        2706     2716      +10     
  Lines      202090   202398     +308     
==========================================
+ Hits        98631    98814     +183     
- Misses      95695    95804     +109     
- Partials     7764     7780      +16     
| Flag | Coverage Δ |
| --- | --- |
| go-unit-tests | 48.82% <61.16%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@dashrews78 dashrews78 force-pushed the dashrews/container-start-column-30960 branch from 43b8187 to 95eab27 on September 22, 2025 13:34
@dashrews78 dashrews78 force-pushed the dashrews/container-start-migration-30961 branch from c2da3ac to 6df86ee on September 22, 2025 14:31
@dashrews78 dashrews78 force-pushed the dashrews/container-start-column-30960 branch from 6f0c461 to 92c8cbc on September 23, 2025 15:02
@dashrews78 dashrews78 force-pushed the dashrews/container-start-migration-30961 branch from 6df86ee to 494fd3a on September 24, 2025 14:33
Base automatically changed from dashrews/container-start-column-30960 to master on September 24, 2025 16:38
@dashrews78 dashrews78 force-pushed the dashrews/container-start-migration-30961 branch 2 times, most recently from e985ea0 to cf81ef1 on September 26, 2025 09:39
@dashrews78 dashrews78 marked this pull request as ready for review September 26, 2025 13:52
@dashrews78 dashrews78 requested a review from a team as a code owner September 26, 2025 13:52
@dashrews78

/retest

@red-hat-konflux

Caution

There are some errors in your PipelineRun template.

PipelineRun Error
All twelve PipelineRuns (central-db-on-push, main-on-push, operator-on-push, operator-bundle-on-push, retag-collector, retag-scanner-db-slim, retag-scanner-db, retag-scanner-slim, retag-scanner, roxctl-on-push, scanner-v4-on-push, scanner-v4-db-on-push) failed with the same CEL expression evaluation error: `no such key: pull_request`. The shared expression is:

```
(
  event == "push" && target_branch.matches("^(master|release-.*|refs/tags/.*)$")
) || (
  event == "pull_request" && (
    target_branch.startsWith("release-") ||
    source_branch.matches("(konflux|renovate|appstudio|rhtap)") ||
    (has(body.pull_request.labels) && body.pull_request.labels.exists(l, l.name == "konflux-build"))
  ) && body.action != "ready_for_review"
)
```


@clickboo clickboo left a comment


some comments


```go
var clusters []string
db.Model(&updatedSchema.ProcessIndicators{}).Distinct("clusterid").Pluck("clusterid", &clusters)
log.Infof("clusters found: %v", clusters)
```
Contributor:

Debug? Applies to a bunch of the messages, and I assume you've set them to Info for purposes of testing and plan to change most of them to Debug after.

Contributor Author:

Actually, I've set some of these intentionally. What we found with the network flow migration is that customers would look at the log, see that it wasn't moving, and restart on the assumption that the migration was stuck. So I logged a little extra to try to avoid that.

```go
// have no need to worry about the old schema and can simply perform all our work on the new one.
db := database.GormDB
pgutils.CreateTableFromModel(database.DBCtx, db, updatedSchema.CreateTableProcessIndicatorsStmt)
db = db.WithContext(database.DBCtx).Table(updatedSchema.ProcessIndicatorsTableName)
```
Contributor:

As we discussed, use clusters table here for less expensive query

```go
log.Infof("Processing %s with %d indicators", cluster, len(storeIndicators))
for objBatch := range slices.Chunk(storeIndicators, batchSize) {
	if err = store.UpsertMany(ctx, objBatch); err != nil {
		return errors.Wrap(err, "failed to upsert all converted objects")
```
Contributor:

nit: "upsert %d objects" or "upsert chunk" (these are not converted).
(There is a repeated instance below, so this applies in other places too.)

Contributor Author:

The repeated instance is unnecessary because batching with slices.Chunk doesn't leave any leftovers, so I was actually upserting everything twice, which was probably slow.


@charmik-redhat charmik-redhat left a comment


Why migrate by cluster and not directly migrate all process_indicators in batches? Could there be process indicators without a cluster ID which would be skipped this way?

Contributor:

Why copy the generic store too?

Contributor Author:

I did it purely to remove the mutex locks since there should be no risk of deadlock here as the migration is the only thing running.

```go
defer wg.Done()
err := migrateByCluster(cluster, database)
if err != nil {
	errorList = append(errorList, err)
```
Contributor:

Would concurrent additions to the errorList cause any issue?
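For reference, appending to a slice shared across goroutines is indeed a data race in Go. One common fix is to guard the slice with a mutex; a minimal sketch under that assumption (migrateCluster and migrateAll are hypothetical stand-ins, not the PR's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// migrateCluster is a hypothetical stand-in for the PR's migrateByCluster.
func migrateCluster(cluster string) error {
	if cluster == "bad" {
		return fmt.Errorf("migration failed for cluster %q", cluster)
	}
	return nil
}

// migrateAll fans out one goroutine per cluster; the mutex makes the
// appends to errorList safe, since unsynchronized append on a shared
// slice from multiple goroutines is a data race.
func migrateAll(clusters []string) []error {
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		errorList []error
	)
	for _, cluster := range clusters {
		wg.Add(1)
		go func(cluster string) {
			defer wg.Done()
			if err := migrateCluster(cluster); err != nil {
				mu.Lock()
				errorList = append(errorList, err)
				mu.Unlock()
			}
		}(cluster)
	}
	wg.Wait()
	return errorList
}

func main() {
	errs := migrateAll([]string{"a", "bad", "c"})
	fmt.Println(len(errs)) // Prints: 1
}
```

An errgroup, or a buffered error channel sized to the number of clusters, would be an equally idiomatic alternative.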

```go
ctx, cancel := context.WithTimeout(database.DBCtx, types.DefaultMigrationTimeout)
defer cancel()

store := updatedStore.New(database.PostgresDB)
```
Contributor:

Initialize it outside the function to avoid doing it many times

Contributor:

This looks the same as migrator/migrations/m_212_to_m_213_add_container_start_column_to_indicators/schema/clusters.go

Can the tests use the above clusters schema instead?

@dashrews78

/test

@openshift-ci

This comment was marked as outdated.

@dashrews78

/test gke-race-condition-qa-e2e-tests

@dashrews78

> Why migrate by cluster and not directly migrate all process_indicators in batches? Could there be process indicators without a cluster ID which are skipped this way?

Indicators without a cluster ID aren't valid. The pipeline adds the cluster ID to the indicator on create:

```go
	case central.ResourceAction_CREATE_RESOURCE:
		indicator := event.GetProcessIndicator()
		normalize.Indicator(indicator)

		indicator.ClusterId = clusterID

		// Build indicator from exec filepath, process, and args
		// This allows for a consistent ID to be inserted into the DB
		id.SetIndicatorID(indicator)

		return s.process(indicator)
```

As for the reason: I was trying to group them in such a way that I could process multiple batches at the same time, which let me remove the mutex lock in UpsertMany to maximize speed.

@dashrews78 dashrews78 force-pushed the dashrews/container-start-migration-30961 branch from 894aad5 to 3915aa5 Compare September 29, 2025 13:40
@dashrews78

/retest

@dashrews78

/test gke-upgrade-tests

@dashrews78 dashrews78 changed the title ROX-30961: migration to populate indicator container start time ROX-30961: migration to populate indicator container start time WIP DO NOT REVIEW Sep 29, 2025

openshift-ci bot commented Sep 29, 2025

@dashrews78: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/gke-operator-e2e-tests | 302311e | link | false | /test gke-operator-e2e-tests |
| ci/prow/gke-upgrade-tests | 302311e | link | false | /test gke-upgrade-tests |
| ci/prow/gke-qa-e2e-tests | 302311e | link | false | /test gke-qa-e2e-tests |
| ci/prow/ocp-4-19-operator-e2e-tests | 302311e | link | false | /test ocp-4-19-operator-e2e-tests |
| ci/prow/ocp-4-12-operator-e2e-tests | 302311e | link | false | /test ocp-4-12-operator-e2e-tests |
| ci/prow/ocp-4-12-qa-e2e-tests | 302311e | link | false | /test ocp-4-12-qa-e2e-tests |


@dashrews78

The migrator and pools don't easily support concurrency. It would take a lot of work to disable statement caching and various other things.

@dashrews78 dashrews78 closed this Oct 1, 2025
@dashrews78 dashrews78 deleted the dashrews/container-start-migration-30961 branch October 1, 2025 16:28