
ROX-33865: Scanner V4 retry initial DB connection #19761

Open
dcaravel wants to merge 1 commit into master from dc/v4-restart


@dcaravel
Contributor

dcaravel commented Apr 2, 2026

Description

Fixes a crash during Scanner V4 Indexer and Matcher startup.

The upgrade to pgx v5 in claircore changed postgres.Connect() so that it no longer establishes a DB connection. As a result, the Scanner V4 Indexer and Matcher crash at a point in the code where a DB connection is expected to be available (see the example below).

This was impacting ROX-27690 (enabling Scanner V4 in CI): jobs were failing due to 'unexpected pod restarts'.

Example

No certificates found in /usr/local/share/ca-certificates
'/etc/pki/injected-ca-trust/tls-ca-bundle.pem' -> '/etc/pki/ca-trust/source/anchors/tls-ca-bundle.pem'
pkg/memlimit: 2026/03/25 13:13:59.887533 memlimit.go:43: Info: ROX_MEMLIMIT set to 1.86Gi
2026/03/25 13:13:59 config_file "/run/secrets/stackrox.io/proxy-config/config.yaml" does not exist, continuing...
{"level":"info","host":"scanner-v4-matcher-54bcbf4564-hhcc6","component":"main","version":"4.11.x-443-g8494726004","build_flavor":"development","time":"2026-03-25T13:13:59Z","message":"starting scanner"}
pkg/metrics: 2026/03/25 13:14:00.009988 tls.go:176: Info: Updating secure metrics client CAs based on kube-system/extension-apiserver-authentication
{"level":"info","host":"scanner-v4-matcher-54bcbf4564-hhcc6","component":"main","time":"2026-03-25T13:14:00Z","message":"indexer is disabled"}
{"level":"info","host":"scanner-v4-matcher-54bcbf4564-hhcc6","component":"main","time":"2026-03-25T13:14:00Z","message":"matcher is enabled"}
{"level":"info","host":"scanner-v4-matcher-54bcbf4564-hhcc6","component":"main","time":"2026-03-25T13:14:00Z","message":"remote indexer is enabled"}

panic: migrate: failed to connect to `user=postgres database=`: 172.30.93.203:5432 (scanner-v4-db.stackrox.svc): dial error: dial tcp 172.30.93.203:5432: connect: connection refused
goroutine 1 [running]:
github.com/remind101/migrate.(*postgresLocker).do(0xc000134930, {0x47ee513?, 0xc00011a45c?})
	github.com/remind101/migrate@v0.0.0-20170729031349-52c1edff7319/migrate.go:116 +0x10c
github.com/remind101/migrate.(*postgresLocker).Lock(0x200c000089358?)
	github.com/remind101/migrate@v0.0.0-20170729031349-52c1edff7319/migrate.go:105 +0x1f
github.com/remind101/migrate.(*Migrator).Exec(0xc000bea2a0, 0x0, {0x79364e0, 0x10, 0x10})
	github.com/remind101/migrate@v0.0.0-20170729031349-52c1edff7319/migrate.go:140 +0x65
github.com/quay/claircore/datastore/postgres.InitPostgresMatcherStore({0xc000696c30?, 0xc0000896d0?}, 0xc0008ac8c0, 0x1)
	github.com/quay/claircore@v1.5.44/datastore/postgres/matcher_store.go:28 +0x153
github.com/stackrox/rox/scanner/datastore/postgres.InitPostgresMatcherStore({0x4ed1ec8?, 0xc0006aec30?}, 0xc0008ac8c0, 0xd9?)
	github.com/stackrox/rox/scanner/datastore/postgres/matcher_store.go:34 +0x2f
github.com/stackrox/rox/scanner/matcher.NewMatcher({0x4ed1ec8?, 0xc00084f500?}, {0x1, {{0xc0001801c0, 0xd9}, {0xc0000d90e0, 0x29}}, 0x1, {0xc0000d9110, 0x24}, ...})
	github.com/stackrox/rox/scanner/matcher/matcher.go:112 +0x1d4
main.createBackends({0x4ed1ec8, 0xc00084f500}, 0xc000b30000)
	github.com/stackrox/rox/scanner/cmd/scanner/main.go:245 +0x374
main.main()
	github.com/stackrox/rox/scanner/cmd/scanner/main.go:120 +0x865 

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

No automated tests were added because Scanner V4 is not yet running in CI. Once Scanner V4 is enabled in CI, the existing pod restart checks will catch this kind of crash.

How I validated my change

Manually (because Scanner V4 is not yet running in CI)

Scaled the DB pod to zero, restarted the indexer and matcher pods, and observed the retry attempts in the logs:

{"level":"error","host":"scanner-v4-indexer-66487598f7-dcwc9","component":"scanner/backend/indexer.NewIndexer","error":"failed to connect to `user=postgres database=`: 172.30.187.242:5432 (scanner-v4-db.stackrox.svc): dial error: dial tcp 172.30.187.242:5432: connect: connection refused","time":"2026-04-02T04:00:47Z","message":"failed to connect to postgres database"}
{"level":"warn","host":"scanner-v4-indexer-66487598f7-dcwc9","component":"scanner/backend/indexer.NewIndexer","attempt":2,"time":"2026-04-02T04:00:47Z","message":"retrying connection to postgres database"}
{"level":"error","host":"scanner-v4-indexer-66487598f7-dcwc9","component":"scanner/backend/indexer.NewIndexer","error":"failed to connect to `user=postgres database=`: 172.30.187.242:5432 (scanner-v4-db.stackrox.svc): dial error: dial tcp 172.30.187.242:5432: connect: connection refused","time":"2026-04-02T04:00:57Z","message":"failed to connect to postgres database"}
{"level":"warn","host":"scanner-v4-indexer-66487598f7-dcwc9","component":"scanner/backend/indexer.NewIndexer","attempt":3,"time":"2026-04-02T04:00:57Z","message":"retrying connection to postgres database"}
{"level":"error","host":"scanner-v4-matcher-b7d8665cf-j89tc","component":"scanner/backend/matcher.NewMatcher","error":"failed to connect to `user=postgres database=`: 172.30.187.242:5432 (scanner-v4-db.stackrox.svc): dial error: dial tcp 172.30.187.242:5432: connect: connection refused","time":"2026-04-02T04:04:38Z","message":"failed to connect to postgres database"}
{"level":"warn","host":"scanner-v4-matcher-b7d8665cf-j89tc","component":"scanner/backend/matcher.NewMatcher","attempt":2,"time":"2026-04-02T04:04:38Z","message":"retrying connection to postgres database"}
{"level":"error","host":"scanner-v4-matcher-b7d8665cf-j89tc","component":"scanner/backend/matcher.NewMatcher","error":"failed to connect to `user=postgres database=`: 172.30.187.242:5432 (scanner-v4-db.stackrox.svc): dial error: dial tcp 172.30.187.242:5432: connect: connection refused","time":"2026-04-02T04:04:48Z","message":"failed to connect to postgres database"}
{"level":"warn","host":"scanner-v4-matcher-b7d8665cf-j89tc","component":"scanner/backend/matcher.NewMatcher","attempt":3,"time":"2026-04-02T04:04:48Z","message":"retrying connection to postgres database"}

@openshift-ci

openshift-ci bot commented Apr 2, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@dcaravel dcaravel marked this pull request as ready for review April 2, 2026 04:13
@dcaravel dcaravel requested a review from a team as a code owner April 2, 2026 04:13
@codecov

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 0% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.58%. Comparing base (f042134) to head (7324d70).

Files with missing lines | Patch % | Lines
scanner/datastore/postgres/connect.go | 0.00% | 9 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19761      +/-   ##
==========================================
- Coverage   49.58%   49.58%   -0.01%     
==========================================
  Files        2761     2761              
  Lines      208140   208146       +6     
==========================================
  Hits       103214   103214              
- Misses      97260    97266       +6     
  Partials     7666     7666              
Flag | Coverage Δ
go-unit-tests | 49.58% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


@rhacs-bot
Contributor

Images are ready for the commit at 7324d70.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-529-g7324d709e0.



2 participants