Skip to content

ROX-12934: Fix operator reconciliation when external DB is in use#3322

Merged
vladbologa merged 16 commits intomasterfrom
mi/rox-12934-operator-reconcile-external-db
Nov 3, 2022
Merged

ROX-12934: Fix operator reconciliation when external DB is in use#3322
vladbologa merged 16 commits intomasterfrom
mi/rox-12934-operator-reconcile-external-db

Conversation

@misberner
Copy link
Contributor

@misberner misberner commented Oct 5, 2022

Description

This PR fixes the reconciliation of the central-db persistent volume and password.

Unfortunately, the JIRA is missing a concrete description of the issue (and also doesn't cover all the existing issues). However, this PR functionally addresses the following:

  • a generated password secret is deleted when central DB is disabled
  • when an external DB is used, the use of a manually specified secret is enforced
  • when an external DB is used, and the specified secret is not called central-db-secret, the contents of the user-supplied secret are mirrored into a central-db-secret secret (as only that is mounted into Central)
  • when the password entry in the manually specified secret changed, the central-db-secret password is also updated upon reconciliation
  • when there exists an unmanaged central-db-secret, and a password is manually specified in a different secret, reconciliation will fail if the passwords are different
  • if the central DB deployment is not used, no PVC will be created
  • if the central DB deployment is turned off, an existing, managed PVC will be disowned
  • better error checking to prevent inconsistent settings, such as using central-db persistence setting with an external DB

On the non-functional sides, this PR improves some naming and adds extensive unit tests for central DB password reconciler, as well as kuttl tests.

This PR is based upon #3293 by @connorgorman .

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required
  • Determined and documented upgrade steps
  • Documented user facing changes (create PR based on openshift/openshift-docs and merge into rhacs-docs)

If any of these don't apply, please comment below.

Testing Performed

  • Manual testing
  • CI

@ghost
Copy link

ghost commented Oct 5, 2022

Images are ready for the commit at e3059f6.

To use with deploy scripts, first export MAIN_IMAGE_TAG=3.72.x-498-ge3059f6665.

@misberner misberner force-pushed the mi/rox-12934-operator-reconcile-external-db branch 2 times, most recently from 5f7c4e7 to 89fefa6 Compare October 18, 2022 11:10
@misberner misberner force-pushed the mi/rox-12934-operator-reconcile-external-db branch from 83cf14a to 823b611 Compare October 19, 2022 13:24
@misberner misberner changed the title [wip] ROX-12934: Fix operator reconciliation when external DB is in use ROX-12934: Fix operator reconciliation when external DB is in use Oct 19, 2022
@misberner misberner force-pushed the mi/rox-12934-operator-reconcile-external-db branch from 0b709ce to 5213d5f Compare October 19, 2022 21:37
tail: -1
---
apiVersion: apps/v1
kind: Deployment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to also assert health of the central deployment in addition to central-db's health?

cv.SetError(errors.New("if a connection string is provided, no persistence settings must be supplied"))
}

// TODO: there are other settings which are ignored in external mode - should we error if those are set, too?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for erroring loud and clear on dubious configurations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add it. My main reservation against this stemmed from the fact that switching between configurations gets challenging, as you cannot comment out stuff in objects on etcd when doing, e.g., oc edit. But I guess it's not a very important use case to switch back and forth between internal and external DB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think now I remember the problem, but that problem is even bigger.

The OpenShift console has the very unfortunate property of importing OpenAPI field defaults into the form and then acting as if the user has set those field values. See:

Screen.Recording.2022-10-26.at.18.26.49.mov

Currently there's no issue because Gorman didn't introduce a default value for the PVC name for the DB, but, arguably, there should be one, as we also have a stackrox-db default for the central persistence, and setting a default for a field is really nothing very unreasonable.

The bigger problem now is that setting a default for any of the fields means a user using OpenShift console can no longer disable central DB. OpenShift console simply does not respect the fact that (non-)nullness of fields in the CR might carry any meaning, it uses a lossy round-trip between the two representations. This, by the way, also applies in the other direction: with the existing CR structure, it is impossible to specify "use central DB, but with default settings" in the form view in OpenShift console, you have to set any of the fields (such as explicitly typing in the default central-db pvc name) or use YAML view.

@porridge I know you had some discussion with Gorman on the original PR re: an explicit boolean field vs the mere presence of another field (this was about external, not the DB as a whole). Putting the aspect of bool vs enum aside: what is your opinion on (a) avoiding explicit enabled/disabled fields and (b) erroring on inconsistent combinations of set fields, in light of these OpenShift console form view realities?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wow. This is a game-changer. 😞
I guess this means that mere presence of a struct can no longer have any meaning.
This in turn means CentralDBEnabled() needs to look at a new explicit toggle (I would very much prefer an enum on the CentralComponentSpec level rather than bools) rather than at the nilness of .DB 🤔

Please also mention in a (non-user visible) comment somewhere why we don't look at struct nilness. This is probably not obvious for a non-openshift developer.

return errors.New("no password secret was specified")
}

passwordSecretName := r.centralObj.Spec.Central.DB.PasswordSecret.Name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
passwordSecretName := r.centralObj.Spec.Central.DB.PasswordSecret.Name
passwordSecretName := r.centralObj.Spec.Central.DB.GetPasswordSecret().Name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only makes sense if there's a nil-safe getter for Name. Adding this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, we already check that GetPasswordSecret() is non-nil above, so I don't think this adds any value.

return errors.Wrapf(err, "reading central db password from secret %s", passwordSecretName)
}

r.password = password
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not returning the password and setting it from the main function? It feels odd to me because the result of that function is not returned, instead in sets state on the struct.
Setting it from the caller makes it more explicit to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it. I don't feel very strongly tbh because the reconciliation execution structs are heavily dependent on modifying state. For example, validate also sets the current password, and this is intended.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think it makes more sense to leave the behavior as-is. I will rename the function to readAndSetPassword...

Namespace: testutils.TestNamespace,
},
Data: map[string][]byte{
"password": []byte(fmt.Sprintf("%s\n", pw1)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job, did not thought about this test-case.

return central.Spec.Central.GetPersistence()
case DefaultCentralDBPVCName:
return convertDBPersistenceToPersistence(central.Spec.Central.DB.GetPersistence())
func getPersistenceByTarget(central *platform.Central, target PVCTarget) *platform.Persistence {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a small doc string what this function is used for? I found it hard to understand, looking into the caller also did not really helped me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@misberner misberner force-pushed the mi/rox-12934-operator-reconcile-external-db branch from 5213d5f to da160c9 Compare October 26, 2022 17:04
Comment on lines +168 to +169
// When using this option, you must explicitly set a password secret; automatically generating a password will not
// be supported.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// When using this option, you must explicitly set a password secret; automatically generating a password will not
// be supported.
// When using this option, you must explicitly set a password secret; a password will not be automatically generated.

cv.SetError(errors.New("if a connection string is provided, no persistence settings must be supplied"))
}

// TODO: there are other settings which are ignored in external mode - should we error if those are set, too?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, wow. This is a game-changer. 😞
I guess this means that mere presence of a struct can no longer have any meaning.
This in turn means CentralDBEnabled() needs to look at a new explicit toggle (I would very much prefer an enum on the CentralComponentSpec level rather than bools) rather than at the nilness of .DB 🤔

Please also mention in a (non-user visible) comment somewhere why we don't look at struct nilness. This is probably not obvious for a non-openshift developer.

@openshift-ci
Copy link

openshift-ci bot commented Nov 3, 2022

@misberner: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/eks-qa-e2e-tests 5f7c4e7 link false /test eks-qa-e2e-tests
ci/prow/osd-aws-qa-e2e-tests 5f7c4e7 link false /test osd-aws-qa-e2e-tests
ci/prow/rosa-qa-e2e-tests 5f7c4e7 link false /test rosa-qa-e2e-tests
ci/prow/aro-qa-e2e-tests 5f7c4e7 link false /test aro-qa-e2e-tests
ci/prow/gke-upgrade-tests 5f7c4e7 link false /test gke-upgrade-tests
ci/prow/gke-postgres-upgrade-tests 5213d5f link false /test gke-postgres-upgrade-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@vladbologa vladbologa force-pushed the mi/rox-12934-operator-reconcile-external-db branch from d806cde to e3059f6 Compare November 3, 2022 11:54
@vladbologa vladbologa merged commit 7f8bf59 into master Nov 3, 2022
@vladbologa vladbologa deleted the mi/rox-12934-operator-reconcile-external-db branch November 3, 2022 14:26
ivan-degtiarenko pushed a commit that referenced this pull request Nov 5, 2022
)

Co-authored-by: Connor Gorman <cgorman@redhat.com>
Co-authored-by: Vlad Bologa <vbologa@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants