
ROX-30961: migration to populate indicator container start time WIP DO NOT REVIEW#16935

Closed
dashrews78 wants to merge 20 commits into master from dashrews/container-start-migration-30961

Conversation


@dashrews78 dashrews78 commented Sep 19, 2025

Description

Adds a migration to populate the container start time column in process indicators.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Unit tests and the upgrade test, plus additional manual testing to ensure the upgrade and rollback worked properly.



openshift-ci bot commented Sep 19, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


rhacs-bot commented Sep 19, 2025

Images are ready for the commit at 302311e.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.9.x-936-g302311e7c0.


codecov bot commented Sep 19, 2025

Codecov Report

❌ Patch coverage is 61.16505% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 48.82%. Comparing base (2b2a970) to head (302311e).
⚠️ Report is 2 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...dicators/test/schema/convert_process_indicators.go | 0.00% | 28 Missing ⚠️ |
| ...ainer_start_column_to_indicators/migration_impl.go | 71.60% | 16 Missing and 7 partials ⚠️ |
| migrator/version/version.go | 0.00% | 17 Missing ⚠️ |
| ...to_indicators/schema/convert_process_indicators.go | 79.31% | 4 Missing and 2 partials ⚠️ |
| ...lumn_to_indicators/test/schema/convert_clusters.go | 71.42% | 4 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #16935      +/-   ##
==========================================
+ Coverage   48.80%   48.82%   +0.01%     
==========================================
  Files        2706     2716      +10     
  Lines      202090   202398     +308     
==========================================
+ Hits        98631    98814     +183     
- Misses      95695    95804     +109     
- Partials     7764     7780      +16     
| Flag | Coverage Δ |
| --- | --- |
| go-unit-tests | 48.82% <61.16%> (+0.01%) ⬆️ |

Flags with carried forward coverage won't be shown. Click here to find out more.


@dashrews78 dashrews78 force-pushed the dashrews/container-start-column-30960 branch from 43b8187 to 95eab27 on September 22, 2025 13:34
@dashrews78 dashrews78 force-pushed the dashrews/container-start-migration-30961 branch from c2da3ac to 6df86ee on September 22, 2025 14:31
@dashrews78 dashrews78 force-pushed the dashrews/container-start-column-30960 branch from 6f0c461 to 92c8cbc on September 23, 2025 15:02
@dashrews78 dashrews78 force-pushed the dashrews/container-start-migration-30961 branch from 6df86ee to 494fd3a on September 24, 2025 14:33
Base automatically changed from dashrews/container-start-column-30960 to master on September 24, 2025 16:38
@dashrews78 dashrews78 force-pushed the dashrews/container-start-migration-30961 branch 2 times, most recently from e985ea0 to cf81ef1 on September 26, 2025 09:39
@dashrews78 dashrews78 marked this pull request as ready for review September 26, 2025 13:52
@dashrews78 dashrews78 requested a review from a team as a code owner September 26, 2025 13:52
@dashrews78

/retest

@red-hat-konflux

Caution

There are some errors in your PipelineRun template.

PipelineRun Error
All twelve PipelineRuns (central-db-on-push, main-on-push, operator-on-push, operator-bundle-on-push, retag-collector, retag-scanner-db-slim, retag-scanner-db, retag-scanner-slim, retag-scanner, roxctl-on-push, scanner-v4-on-push, scanner-v4-db-on-push) failed with the same CEL expression evaluation error: `no such key: pull_request`. The shared expression is:

```
(
  event == "push" && target_branch.matches("^(master|release-.*|refs/tags/.*)$")
) || (
  event == "pull_request" && (
    target_branch.startsWith("release-") ||
    source_branch.matches("(konflux|renovate|appstudio|rhtap)") ||
    (has(body.pull_request.labels) && body.pull_request.labels.exists(l, l.name == "konflux-build"))
  ) && body.action != "ready_for_review"
)
```


@clickboo clickboo left a comment


some comments


```go
var clusters []string
db.Model(&updatedSchema.ProcessIndicators{}).Distinct("clusterid").Pluck("clusterid", &clusters)
log.Infof("clusters found: %v", clusters)
```
Contributor:

Debug? Applies to a bunch of the messages, and I assume you've set them to Info for purposes of testing and plan to change most of them to Debug after.

Contributor Author:

Actually, I've set some of these intentionally. What we found with the network flow migration is that customers would look at the log, see that it wasn't moving, and restart on the assumption that the migration was stuck. So I logged a little extra to try to avoid that.

```go
// have no need to worry about the old schema and can simply perform all our work on the new one.
db := database.GormDB
pgutils.CreateTableFromModel(database.DBCtx, db, updatedSchema.CreateTableProcessIndicatorsStmt)
db = db.WithContext(database.DBCtx).Table(updatedSchema.ProcessIndicatorsTableName)
```
Contributor:

As we discussed, use clusters table here for less expensive query

```go
log.Infof("Processing %s with %d indicators", cluster, len(storeIndicators))
for objBatch := range slices.Chunk(storeIndicators, batchSize) {
	if err = store.UpsertMany(ctx, objBatch); err != nil {
		return errors.Wrap(err, "failed to upsert all converted objects")
```
Contributor:

nit: "upsert %d objects" or "upsert chunk" (these are not converted).
(There is a repeated instance below, so this applies in other places too.)

Contributor Author:

The repeated instance is unnecessary because batching with slices.Chunk doesn't leave any leftovers, so I was actually upserting everything twice, which was probably slow.


@charmik-redhat charmik-redhat left a comment


Why migrate by cluster and not directly migrate all process_indicators in batches? Could there be process indicators without a cluster ID which would be skipped this way?

Contributor:

Why copy the generic store too?

Contributor Author:

I did it purely to remove the mutex locks since there should be no risk of deadlock here as the migration is the only thing running.

```go
defer wg.Done()
err := migrateByCluster(cluster, database)
if err != nil {
	errorList = append(errorList, err)
```
Contributor:

Would concurrent additions to the errorList cause any issue?
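For reference, appending to a slice shared across goroutines is indeed a data race in Go. One common fix is to guard the slice with a mutex; a minimal sketch under that assumption (migrateCluster and migrateAll are hypothetical stand-ins, not the PR's actual code):

```go
package main

import (
	"fmt"
	"sync"
)

// migrateCluster is a hypothetical stand-in for the PR's migrateByCluster.
func migrateCluster(cluster string) error {
	if cluster == "bad" {
		return fmt.Errorf("migration failed for cluster %q", cluster)
	}
	return nil
}

// migrateAll fans out one goroutine per cluster; the mutex makes the
// appends to errorList safe, since unsynchronized append on a shared
// slice from multiple goroutines is a data race.
func migrateAll(clusters []string) []error {
	var (
		wg        sync.WaitGroup
		mu        sync.Mutex
		errorList []error
	)
	for _, cluster := range clusters {
		wg.Add(1)
		go func(cluster string) {
			defer wg.Done()
			if err := migrateCluster(cluster); err != nil {
				mu.Lock()
				errorList = append(errorList, err)
				mu.Unlock()
			}
		}(cluster)
	}
	wg.Wait()
	return errorList
}

func main() {
	errs := migrateAll([]string{"a", "bad", "c"})
	fmt.Println(len(errs)) // Prints: 1
}
```

An errgroup, or a buffered error channel sized to the number of clusters, would be an equally idiomatic alternative.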

```go
ctx, cancel := context.WithTimeout(database.DBCtx, types.DefaultMigrationTimeout)
defer cancel()

store := updatedStore.New(database.PostgresDB)
```
Contributor:

Initialize it outside the function to avoid doing it many times

Contributor:

This looks the same as migrator/migrations/m_212_to_m_213_add_container_start_column_to_indicators/schema/clusters.go

Can the tests use the above clusters schema instead?

@dashrews78

/test

@openshift-ci

This comment was marked as outdated.

@dashrews78

/test gke-race-condition-qa-e2e-tests

@dashrews78

> Why migrate by cluster and not directly migrate all process_indicators in batches? Could there be process indicators without a cluster ID which are skipped this way?

Indicators without a cluster ID aren't valid. The pipeline adds the cluster ID to the indicator on create:

```go
	case central.ResourceAction_CREATE_RESOURCE:
		indicator := event.GetProcessIndicator()
		normalize.Indicator(indicator)

		indicator.ClusterId = clusterID

		// Build indicator from exec filepath, process, and args
		// This allows for a consistent ID to be inserted into the DB
		id.SetIndicatorID(indicator)

		return s.process(indicator)
```

As for the reason: I was trying to group them in such a way that I could process multiple batches at the same time, which let me remove the mutex lock in UpsertMany to maximize speed.

@dashrews78 dashrews78 force-pushed the dashrews/container-start-migration-30961 branch from 894aad5 to 3915aa5 Compare September 29, 2025 13:40
@dashrews78

/retest

@dashrews78

/test gke-upgrade-tests

@dashrews78 dashrews78 changed the title ROX-30961: migration to populate indicator container start time ROX-30961: migration to populate indicator container start time WIP DO NOT REVIEW Sep 29, 2025

openshift-ci bot commented Sep 29, 2025

@dashrews78: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/gke-operator-e2e-tests | 302311e | link | false | /test gke-operator-e2e-tests |
| ci/prow/gke-upgrade-tests | 302311e | link | false | /test gke-upgrade-tests |
| ci/prow/gke-qa-e2e-tests | 302311e | link | false | /test gke-qa-e2e-tests |
| ci/prow/ocp-4-19-operator-e2e-tests | 302311e | link | false | /test ocp-4-19-operator-e2e-tests |
| ci/prow/ocp-4-12-operator-e2e-tests | 302311e | link | false | /test ocp-4-12-operator-e2e-tests |
| ci/prow/ocp-4-12-qa-e2e-tests | 302311e | link | false | /test ocp-4-12-qa-e2e-tests |


@dashrews78

The migrator and pools don't easily support concurrency. It would take a lot of work to disable statement caching and various other things.

@dashrews78 dashrews78 closed this Oct 1, 2025
@dashrews78 dashrews78 deleted the dashrews/container-start-migration-30961 branch October 1, 2025 16:28