
fix: Operator reconciliation loop fixes #19194

Open
mclasmeier wants to merge 3 commits into master from mc/status-controller-deployment-hpa-fix

mclasmeier (Contributor) commented Feb 25, 2026

Description

Three fixes:

  1. A change to our Helm templates that prevents our operator from playing ping-pong with the horizontal pod autoscaler (HPA): the templates no longer set spec.replicas for Deployments when autoscaling is enabled.
  2. A new reconciler predicate, which the status controller uses to filter out Update events for Deployments unless they actually change the status subresource. If an update only changes the spec (e.g., the number of replicas), the status controller does not react to it.
  3. A fix to the SkipStatusControllerUpdates predicate so that it works properly for unstructured.Unstructured objects.
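The filtering idea in item 2 can be sketched as follows. This is a minimal, stdlib-only illustration, not the PR's code: Deployment, DeploymentSpec, DeploymentStatus and statusChanged are hypothetical, simplified stand-ins for appsv1.Deployment and for the predicate, which in the operator would typically be plugged into a controller-runtime predicate's UpdateFunc. An update passes only if the status differs, so an HPA changing spec.replicas alone does not trigger the status controller.

```go
package main

import (
	"fmt"
	"reflect"
)

// Hypothetical, simplified stand-ins for appsv1.Deployment; the real operator
// works with Kubernetes API types and controller-runtime predicates.
type DeploymentSpec struct {
	Replicas int32
	Image    string
}

type DeploymentStatus struct {
	ReadyReplicas     int32
	AvailableReplicas int32
}

type Deployment struct {
	Name   string
	Spec   DeploymentSpec
	Status DeploymentStatus
}

// statusChanged reports whether an update event should reach the status
// controller: only if the status differs between the old and new object.
// Spec-only changes (e.g. an HPA adjusting spec.replicas) are filtered out.
func statusChanged(oldObj, newObj *Deployment) bool {
	return !reflect.DeepEqual(oldObj.Status, newObj.Status)
}

func main() {
	base := &Deployment{
		Name:   "example",
		Spec:   DeploymentSpec{Replicas: 1},
		Status: DeploymentStatus{ReadyReplicas: 1, AvailableReplicas: 1},
	}

	// An HPA scales the Deployment: only the spec changes.
	scaled := *base
	scaled.Spec.Replicas = 3
	fmt.Println(statusChanged(base, &scaled)) // false

	// Later, the new pods become ready: the status changes.
	ready := scaled
	ready.Status.ReadyReplicas = 3
	ready.Status.AvailableReplicas = 3
	fmt.Println(statusChanged(&scaled, &ready)) // true
}
```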

User-facing documentation

Not needed.

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • modified existing tests
  • added unit tests

How I validated my change

change me!

mclasmeier requested a review from a team as a code owner February 25, 2026 14:48
mclasmeier requested review from porridge and removed request for a team February 25, 2026 14:48
mclasmeier changed the title Operator reconciliation loop fixes fix: Operator reconciliation loop fixes Feb 25, 2026
rhacs-bot (Contributor) commented Feb 25, 2026

Images are ready for the commit at 87be64a.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-181-g87be64afbd.

Moritz Clasmeier added 2 commits February 25, 2026 16:13
mclasmeier force-pushed the mc/status-controller-deployment-hpa-fix branch from d2c0d1d to 87be64a February 25, 2026 15:14
codecov bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 57.44681% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.55%. Comparing base (a4212e1) to head (9dec3f8).
⚠️ Report is 5 commits behind head on master.

Files with missing lines | Patch % | Lines
operator/internal/common/status/predicate.go | 60.00% | 14 missing and 4 partials ⚠️
operator/internal/common/status/controller.go | 0.00% | 2 missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #19194      +/-   ##
==========================================
- Coverage   49.56%   49.55%   -0.01%     
==========================================
  Files        2675     2675              
  Lines      201820   201859      +39     
==========================================
+ Hits       100033   100040       +7     
- Misses      94332    94357      +25     
- Partials     7455     7462       +7     
Flag | Coverage Δ
go-unit-tests | 49.55% <57.44%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown.


porridge added the auto-retest label Feb 25, 2026
porridge (Contributor) left a comment

LGTM, although I view the predicate as a minor improvement and would lean toward a minimal change such as #19199.

rhacs-bot (Contributor) commented Feb 25, 2026

Images are ready for the commit at 9dec3f8.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-182-g9dec3f8f85.

Review thread on the predicate changes (commit: "… to also work for unstructured.Unstructured."), diff excerpt:

    conditionsChanged(objOldT, objNewT, platform.ConditionAvailable)

    return !statusControllerConditionsChanged
    if statusControllerConditionsChanged {
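To make the unstructured case concrete, here is a minimal sketch of comparing one condition type between two objects held as unstructured content (map[string]interface{}, the shape of unstructured.Unstructured.Object). The helpers conditionStatus and conditionsChanged below are hypothetical and only borrow the name from the excerpt; the real signature may differ. The condition type string "Available" is an assumption standing in for platform.ConditionAvailable, and only each condition's status field is compared (not reason, message, or timestamps). Real code could use apimachinery's unstructured.NestedSlice to reach status.conditions instead of the manual type assertions.

```go
package main

import "fmt"

// conditionStatus returns the "status" field of the condition with the given
// type from status.conditions of unstructured object content, or "" if the
// condition (or any field on the path) is missing.
func conditionStatus(obj map[string]interface{}, condType string) string {
	status, ok := obj["status"].(map[string]interface{})
	if !ok {
		return ""
	}
	conditions, ok := status["conditions"].([]interface{})
	if !ok {
		return ""
	}
	for _, c := range conditions {
		cond, ok := c.(map[string]interface{})
		if !ok {
			continue
		}
		if t, _ := cond["type"].(string); t == condType {
			s, _ := cond["status"].(string)
			return s
		}
	}
	return ""
}

// conditionsChanged (hypothetical) reports whether any of the given condition
// types has a different status in newObj than in oldObj.
func conditionsChanged(oldObj, newObj map[string]interface{}, condTypes ...string) bool {
	for _, t := range condTypes {
		if conditionStatus(oldObj, t) != conditionStatus(newObj, t) {
			return true
		}
	}
	return false
}

// withAvailable builds unstructured content with a single "Available" condition.
func withAvailable(status string) map[string]interface{} {
	return map[string]interface{}{
		"status": map[string]interface{}{
			"conditions": []interface{}{
				map[string]interface{}{"type": "Available", "status": status},
			},
		},
	}
}

func main() {
	fmt.Println(conditionsChanged(withAvailable("False"), withAvailable("True"), "Available")) // true
	fmt.Println(conditionsChanged(withAvailable("True"), withAvailable("True"), "Available"))  // false
}
```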
porridge (Contributor) commented Feb 26, 2026

Now that I think about it, this condition seems to have been wrong all along. We should skip reconciliation not merely when these two conditions changed, but only if nothing other than these two conditions changed. (Well, modulo the resource version.)

I mean, I don't know whether the system guarantees that individual updates are always delivered separately. What if we see two changes combined in a single event, e.g., due to a temporary disconnection from the API server? Then this line would make us miss an update we potentially care about.

Okay, maybe this is not the best example: we could assume that after a disconnection there is an unconditional reconcile once the cache is reset, rather than an update event, so this code always works in practice. But I don't think we should be making such assumptions here. We should keep things simple and not add implicit coupling.

WDYT?
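The semantics proposed in this comment can be sketched as follows, assuming objects are handled as unstructured content (map[string]interface{}). The function onlyOwnConditionsChanged and the condition type "Available" are hypothetical. The sketch deep-copies both objects, strips metadata.resourceVersion and the status controller's own condition types, and compares what remains; an event that bundles a condition change with any other change is therefore not skipped.

```go
package main

import (
	"fmt"
	"reflect"
)

// deepCopy recursively copies maps and slices of unstructured content.
func deepCopy(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			out[k] = deepCopy(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = deepCopy(val)
		}
		return out
	default:
		return v
	}
}

// normalize returns a copy of obj without metadata.resourceVersion and without
// the status conditions whose type is in ownTypes.
func normalize(obj map[string]interface{}, ownTypes map[string]bool) map[string]interface{} {
	out := deepCopy(obj).(map[string]interface{})
	if md, ok := out["metadata"].(map[string]interface{}); ok {
		delete(md, "resourceVersion")
	}
	if st, ok := out["status"].(map[string]interface{}); ok {
		if conds, ok := st["conditions"].([]interface{}); ok {
			kept := make([]interface{}, 0, len(conds))
			for _, c := range conds {
				if cm, ok := c.(map[string]interface{}); ok {
					if t, _ := cm["type"].(string); ownTypes[t] {
						continue // owned by the status controller: ignore
					}
				}
				kept = append(kept, c)
			}
			st["conditions"] = kept
		}
	}
	return out
}

// onlyOwnConditionsChanged (hypothetical) reports whether oldObj and newObj
// differ at most in resourceVersion and in the listed condition types, i.e.
// whether the update can safely be skipped.
func onlyOwnConditionsChanged(oldObj, newObj map[string]interface{}, condTypes ...string) bool {
	own := make(map[string]bool, len(condTypes))
	for _, t := range condTypes {
		own[t] = true
	}
	return reflect.DeepEqual(normalize(oldObj, own), normalize(newObj, own))
}

// object builds example unstructured content.
func object(rv, available string, replicas int64) map[string]interface{} {
	return map[string]interface{}{
		"metadata": map[string]interface{}{"name": "example", "resourceVersion": rv},
		"spec":     map[string]interface{}{"replicas": replicas},
		"status": map[string]interface{}{
			"conditions": []interface{}{
				map[string]interface{}{"type": "Available", "status": available},
			},
		},
	}
}

func main() {
	prev := object("1", "False", 1)
	// Only the Available condition (and resourceVersion) changed: skip.
	fmt.Println(onlyOwnConditionsChanged(prev, object("2", "True", 1), "Available")) // true
	// Condition and spec changed in the same event: do not skip.
	fmt.Println(onlyOwnConditionsChanged(prev, object("2", "True", 3), "Available")) // false
}
```

A predicate built on this would let an update through whenever onlyOwnConditionsChanged returns false. Real objects likely carry additional metadata that changes on every write (for example managedFields), so a production version would probably need to ignore more fields than this sketch does.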

Follow-up comment (Contributor):

In other words, I think this case should pass.

Reply (Contributor Author):

You are right in that this code is based on the assumption that the API server does not do any implicit batching of events. Back when I wrote it, I was convinced that this assumption holds...


Labels: area/helm, area/operator, auto-retest


3 participants