Dashrews/ROX-13253 wait for central-db to come back after bounce and allow FATAL connection lost error#3537
Conversation
|
Images are ready for the commit at fa4ffe3. To use with deploy scripts, first |
|
/test gke-postgres-upgrade-tests |
|
/retest |
|
/test gke-postgres-upgrade-tests |
|
/test gke-postgres-upgrade-tests |
2 similar comments
|
/test gke-postgres-upgrade-tests |
|
/test gke-postgres-upgrade-tests |
| # When working as expected it takes less than one minute for the API server to | ||
| # reach ready. Often times out on OSD. If this call fails in CI we need to | ||
| # identify the source of pull/scheduling latency, request throttling, etc. | ||
| # I tried increasing the timeout from 5m to 20m for OSD but it did not help. |
There was a problem hiding this comment.
I see you set the wait timeout to be 5 mins. Do you know why did OSD take so long to reinstate? How long does it usually take?
There was a problem hiding this comment.
@c-du I literally copied the top half of wait_for_api where it checks to make sure the deployment is running. That includes the copying of the comment. So I don't know.
| validate_upgrade "01-bounce-db-after-upgrade" "bounce central db after postgres upgrade" "268c98c6-e983-4f4e-95d2-9793cebddfd7" | ||
| collect_and_check_stackrox_logs "$log_output_dir" "01_post_bounce-db" | ||
| validate_upgrade "02-bounce-db-after-upgrade" "bounce central db after postgres upgrade" "268c98c6-e983-4f4e-95d2-9793cebddfd7" | ||
| # Since we bounced the DB we may see some errors. Those need to be allowed in the case of this test ONLY. |
|
@dashrews78: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
|
||
| # Ready case | ||
| if [[ "$replicas" == 1 && "$ready_replicas" == 1 ]]; then | ||
| sleep 30 |
There was a problem hiding this comment.
Why do we need this if replicas are ready?
There was a problem hiding this comment.
@janisz I mirrored (actually copied) this from the existing wait_for_api. So I assumed that was there for reasons. I could take it out if everyone likes, but I don't think it does any harm.
7ffc6be ROX-13368: Skip failing nongroovy tests on PG (#3721) bbdd7a0 Bump github.com/gofrs/uuid from 4.3.0+incompatible to 4.3.1+incompatible (#3642) 1f253f2 Bump github.com/google/certificate-transparency-go from 1.1.3 to 1.1.4 (#3543) d434c8d [ROX-13030] : Add delete collection API endpoint and service implementation (#3648) f062c21 Dashrews/ROX-13253 wait for central-db to come back after bounce and allow FATAL connection lost error (#3537) edc1174 CI: Fill the gaps for https://testgrid.k8s.io/ (#3715) 86d7c54 ROX-13231: use passed context when non-postgres (#3540) 9093195 Add less specific type for BE collection response string (#3728) 5abb652 Only enable ROX_OBJECT_COLLECTIONS feature flag during gke-postgres-ui-e2e job (#3727) 4f64cd1 Add centralDBOnly mode in render (#3707) d67bbe5 Dashrews/ROX-13082 UUID searcher and common updates to set allow use of postgres UUID PR 1 of 4 (#3679) 6f829d5 ROX-13259: graphInit called during init time (#3705) f3bc50d ROX-13380: Conditional rendering edges for deployments and namespaces (#3641) 3764476 ROX-12319: implement smoke test step with groovy test filter (#3220) f202fd4 ROX-11826: Disable kernel support package uploads for managed central (#3661) 61f03dc ROX-11101: Remove deprecated resources from central (#3115) e6aa6d7 ROX-11101: Restore Role permission in UI (#3428) 3203e04 ROX-11101: Remove deprecated resources (#3036) a35f41e Bump golang.org/x/sys from 0.1.0 to 0.2.0 (#3733) 4120524 Bump snakeyaml from 1.29 to 1.33 in /qa-tests-backend (#3732) e1785c0 Bump github.com/coreos/go-systemd/v22 from 22.4.0 to 22.5.0 (#3724) 870df4a Bump google.golang.org/api from 0.101.0 to 0.102.0 (#3723) 721454c Generalize User-Agent setup (#3672) 6a11bf0 Bumps collector version to 3.11.x-145-gc345f72f5e (#3736) ec5d343 [ROX-12923] Walk retries - remainder work (#3729) 40f3d43 ROX-13440: Replace ambiguous central with sensor in networkGraph integration test (#3730) 281ed22 Bump groovy-xml from 2.5.18 to 2.5.19 in /qa-tests-backend (#3741) 49d1651 Bump github.com/prometheus/client_golang from 1.13.1 to 1.14.0 (#3742) b5544aa Bump cloud.google.com/go/storage from 1.27.0 to 1.28.0 (#3743) 9c61e53 ensure CVSS is present for istio vulns (#3706) ae29d52 ROX-13452: don't always clobber scoped ctx when non-postgres (#3748) 517bf05 ROX-13261: DryRunUpdate on collection datastore (#3687) baf7654 ROX-13378: Group new resources with deprecated in UI (#3690) 569922f ROX-13421: Enable roxctl netpol generate and add tech-preview messages (#3740) 2465fc5 Dashrews/ROX-13082 UUID generator templates PR 2 of 4 (#3681) c093c68 Bump slack-api-client from 1.20.2 to 1.27.0 in /qa-tests-backend (#3752) 2c860bb Bump ubi8-micro from 8.6 to 8.7 in /operator (#3751) 80eb04c Make deploy.sh and deploy-local.sh pass shellcheck (#3582) 2182b43 Dashrews/ROX-13082 UUID test updates PR 3 of 4 (#3694) 6dc6ca5 [ROX-13403] : Fix node -> topVuln sub resolver bug when node cves is empty (#3689) 1b21361 Move integration tests for page title from general to specific containers (#3675) e1a9f31 Bump google.golang.org/api from 0.102.0 to 0.103.0 (#3773) a05ea31 Bump golang.org/x/crypto from 0.1.0 to 0.2.0 (#3772) 65ddf4f ROX-12824: Add roxctl commands to generate Central DB bundle (#3602) c3f1e2f Remove obsolete authProviders request for Integrations page (#3759) 7ccd54d Dashrews/ROX-13082 UUID protos generated PR 4 of 4 (#3698) 9ab5c8f cleanup image digest utilities (#3764) 187ed44 ROX-11931: Convert junit failure artifacts to Slack attachments (#3438) b5d8790 ROX-13432: leaning up unused code copied/pasted from topology demo (#3750) ab05bfc Refactor collection form page for better composition (#3744) c5562f7 Remove babel devDependencies in ui-components (#3761) 2b90b3a Extract collection form from drawer wrapper layout (#3745) a779fc9 [ROX-12625 + ROX-13032] : Add GetCollectionCount and UpdateCollection endpoints and services (#3749) e77f0da Upgrade cypress 11.0.0 devDependencies in ui (#3760) a3fba94 ROX-13068: Use real data for deployment details (#3688) 4c7d90e ROX-12617: Collection to search query converter (#3683) 3e98aec ROX-13067: fill out port configurations section of deployment details (#3714) a48de36 ROX-12835: Add support for NodeScanV2 to Sensor (#3533) 30c5dc7 ROX-13466: Fix deletion of groups with empty properties (#3756) 5cb2470 Add autocomplete for name selector dropdowns (#3676) b9a75ad ROX-13464 adding flows dropdown in NG (#3763) 3217a67 [ROX-13500] Perform type check for V1 CronJob (#3787) af3790d Remove bulk delete from collections table (#3776) dda123b Add more info in migration log (#3788) 179f0c9 ROX-13502: Remove the circular dependency between cluster datastore init and cscc notifier init (#3790) 029d584 Update SCANNER_VERSION (#3774) cbca57c Bump github.com/ckaznocha/protoc-gen-lint from 0.2.4 to 0.3.0 (#3783) 3613b56 Bump golang.org/x/tools from 0.2.0 to 0.3.0 (#3782) 5fc0a6a Bump github.com/google/go-containerregistry from 0.12.0 to 0.12.1 (#3781) 1d1c687 Bump controller-gen version to 0.10.0 (#3754) c3a5290 Untie documentation link from the product version (#3799) ed822aa use correct package for migration (#3784) 397a0b4 Validate that label keys are valid k8s labels and ensure correct key splitting (#3777) edd1050 Rename variable ScannerGRPCEndpoint to ScannerSlimGRPCEndpoint (#3657) 6662c9f ROX-13378: Access Control page permissions (#3720) b0e73c5 fix Operator reconciliation for external Central DB (#3796) b83bc1f ROX-13505: Fix error log scanning the postgres stat collection (#3795) ca660cb Prevent the collection being edited from displaying in its own embedded list (#3778) 3f7b3fc [ROX-13441][POSTGRES] Propagate context correctly in retries (#3793) e0cbc6f ROX-12839: Update changelog to announce removal of in-product docs (#3805) 696e8bc [ROX-12358] Follow up on vulnerability request proto change (#2851) c4b46d8 Change getCollectionCount endpoint and updateCollection request type 5f2efbc remove make proto-fmt (#3804) 0c75540 Remove os.Std* from roxctl/central (#3758) 25a90de Add ability to view embedded collections in a pop up modal (#3747) 5c1bf81 ROX-13240: fix scanner-slim updates when WebSockets are used (#3704) 1d98577 Add more context to jira notifier logging (#3812) da2fd28 ROX-13031: DryRun Collection API (#3766) 1c418d5 Test data migration code in postgres tests (#3803) ed95b37 Update UI Collection requests for BE compatibiltiy (#3762) 09cc188 ROX-11931: Fix junit-parse install in CI (#3811) d2b01e3 ROX-12814: Disable PolicyFieldsTest on openshift. (#3797) d10ce27 ROX-13345: disable 'missing required registry' aspect on openshift (#3798) 3d22396 Update collector to 3.12 (#3809) 1eb33fb ROX-13347: Modify scope queries to included quoted cluster and nameace names, to allow exact matches instead of erroneous and unintended prefix matches. (#3767) 3811a69 ROX-12621: list collection selectors api (#3806) f6d3f9d Add migration for groups with invalid values (#3789) cc21125 Bugfixes for collection autocomplete (#3816) 7623dec ROX-9350 Use fine-grained host paths for compliance mounts (#2479) b4bf5c2 Fix collector volumeMounts (#3826) 0e9be05 ROX-12953: figure out last 4 versions of sensor automatically (#3611) 459c7ae ROX-12814: Add proper todo for reenabling the test (#3817) 9ee40ff ROX-13523: add isEnabled enum to central db spec (#3815) 535bc72 Replace requestConfig with routeMatcherMap in helper functions for integration tests (#3686) 2b75b61 `gosec` G104: Add `ShouldErr(err)` that returns `err` (#3830) fb1b82f WIP: Introduce nodescan call 35f8a8f WIP: Prepare converter 716144b Moved and renamed fake nodescan tests 4748de7 Introduce real node scanner with conversion functions 5e6d9a8 wip: real scanner 0169868 wip: log results 1438d2c wip: Debug Analyze call b09894c wip: Debug Analyze call 3ceed72 wip: Update and improve debug logs d1669fd Remove copied lib, bump scanner version, add debug 14a3f73 Merge branch 'master' into mm/ROX-12967-real-nodescan fbd0450 Fix style issues 17ccb31 Debug: let both scans finish to see what they return
Description
2 issues here:
FATAL: terminating connection due to administrator command (SQLSTATE 57P01). These errors need to be allowed in this test, however, we probably shouldn't blindly allow them to ensure we don't miss issues in other tests. So that string is added to the allow list for this test only.Checklist
- [ ] Unit test and regression tests added- [ ] Evaluated and added CHANGELOG entry if required- [ ] Determined and documented upgrade steps- [ ] Documented user facing changes (create PR based on [openshift/openshift-docs](https://github.com/openshift/openshift-docs) and merge into rhacs-docs)If any of these don't apply, please comment below.
Testing Performed
This is a test. The testing is running it.