Skip to content

ROX-33336: [Operator] Propagate cluster-wide TLS settings#18864

Open
vladbologa wants to merge 13 commits intomasterfrom
vb/inherit-tls-settings
Open

ROX-33336: [Operator] Propagate cluster-wide TLS settings#18864
vladbologa wants to merge 13 commits intomasterfrom
vb/inherit-tls-settings

Conversation

@vladbologa
Copy link
Copy Markdown
Contributor

@vladbologa vladbologa commented Feb 5, 2026

Description

When an OpenShift cluster has a cluster-wide TLS security profile configured via apiserver.config.openshift.io/cluster, the Operator now:

  1. Applies the TLS profile to its own metrics server (always, on OpenShift)
  2. Propagates the profile to all managed StackRox workloads via ROX_TLS_MIN_VERSION, ROX_TLS_CIPHER_SUITES, and ROX_OPENSSL_TLS_CIPHER_SUITES environment variables (when the spec.tlsAdherence policy requires it, or when FORCE_OPENSHIFT_TLS_PROFILE=true). The individual deployments can use the env var that is suitable for them (e.g. Go apps would use ROX_TLS_CIPHER_SUITES, C++/Rust/Postgres components would use ROX_OPENSSL_TLS_CIPHER_SUITES)
  3. Watches for changes to the TLS profile or adherence policy and restarts to apply them

The propagation uses customize.envVars in the Helm values, which would also allows users to set per-component overrides. The FORCE_OPENSHIFT_TLS_PROFILE Operator env var allows enforcement on clusters where spec.tlsAdherence is not yet available.

On non-OpenShift clusters, the feature is a no-op.

This PR follows the recommendations from this document, but adapted to the fact that we need to also support non-Openshift k8s. Note that in our implementation, we fetch the TLS settings only in the Operator. This avoids having multiple implementations in various languages for all our components and makes sure that the handling of these settings is consistent, by parsing them in a single place.

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

Tested on an OpenShift 4.21 cluster with Central and SecuredCluster CRs deployed. Verified env vars on all 8 workloads: central, central-db, sensor, admission-control, scanner-v4-indexer, scanner-v4-matcher, scanner-v4-db, collector.

Default (Intermediate) profile with FORCE_OPENSHIFT_TLS_PROFILE=true:

  • Set FORCE_OPENSHIFT_TLS_PROFILE=true on the operator CSV
  • No explicit tlsSecurityProfile set on the cluster (defaults to Intermediate)
  • All workloads received: ROX_TLS_MIN_VERSION=TLSv1.2, 6 ECDHE ciphers in both IANA and OpenSSL formats
  • TLS 1.3 ciphers and DHE ciphers correctly filtered out

Modern profile (watcher restart):

  • Changed apiserver.config.openshift.io/cluster to type: Modern
  • Operator logs: cluster TLS profile changed, restarting Operator to apply new settings, oldMinTLSVersion=VersionTLS12, newMinTLSVersion=VersionTLS13
  • Graceful shutdown of all controllers, operator restarted automatically
  • All workloads updated: ROX_TLS_MIN_VERSION=TLSv1.3, empty cipher lists (TLS 1.3 ciphers handled separately by Go/OpenSSL)

Custom profile (watcher restart):

  • Changed to Custom profile with 3 specific ciphers and minTLSVersion: VersionTLS12
  • Operator detected the change and restarted again
  • All workloads updated: ROX_TLS_MIN_VERSION=TLSv1.2, exactly the 3 specified ciphers in both IANA (TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
    TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) and OpenSSL format

Note that I had to use FORCE_OPENSHIFT_TLS_PROFILE=true for testing, as the tlsAdherence field will only be available in OpenShift 4.22.

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Feb 5, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rhacs-bot
Copy link
Copy Markdown
Contributor

rhacs-bot commented Feb 5, 2026

Images are ready for the commit at b9e88ce.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-477-gb9e88cef70.

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 63.36634% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.38%. Comparing base (804901c) to head (b9e88ce).
⚠️ Report is 6 commits behind head on master.

Files with missing lines Patch % Lines
operator/internal/tlsprofile/watch.go 0.00% 26 Missing ⚠️
operator/internal/tlsprofile/provider.go 66.66% 8 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18864      +/-   ##
==========================================
+ Coverage   49.34%   49.38%   +0.03%     
==========================================
  Files        2742     2746       +4     
  Lines      206953   207054     +101     
==========================================
+ Hits       102126   102252     +126     
+ Misses      97231    97216      -15     
+ Partials     7596     7586      -10     
Flag Coverage Δ
go-unit-tests 49.38% <63.36%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch from 65477e3 to 1947c4e Compare February 5, 2026 12:37
@vladbologa vladbologa added the konflux-build Run Konflux in PR. Push commit to trigger it. label Feb 6, 2026
@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch 3 times, most recently from 5f90175 to 67aa3a3 Compare February 27, 2026 11:02
@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch 2 times, most recently from 0f9fa38 to be483be Compare March 2, 2026 09:53
@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch from be483be to 8a708ee Compare March 12, 2026 11:29
@vladbologa vladbologa changed the title Log TLS ciphers Honor cluster-wide TLS settings Mar 12, 2026
@stackrox stackrox deleted a comment from openshift-ci bot Mar 12, 2026
@stackrox stackrox deleted a comment from github-actions bot Mar 12, 2026
@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch 2 times, most recently from b3d11f9 to 6c56818 Compare March 12, 2026 14:50
@vladbologa vladbologa changed the title Honor cluster-wide TLS settings [wip] Honor cluster-wide TLS settings Mar 12, 2026
@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch from 6c56818 to 25bf072 Compare March 13, 2026 10:34
@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch from 25bf072 to b75703a Compare March 13, 2026 12:29
@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch from b75703a to 43f247e Compare March 24, 2026 13:13
@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch 3 times, most recently from 0023d83 to 7f73d85 Compare March 25, 2026 15:04
@stackrox stackrox deleted a comment from github-actions bot Mar 25, 2026
@stackrox stackrox deleted a comment from github-actions bot Mar 25, 2026
@vladbologa vladbologa requested a review from a team as a code owner March 26, 2026 14:01
@vladbologa vladbologa requested review from GrimmiMeloni and removed request for a team March 26, 2026 14:01
@vladbologa vladbologa changed the title [wip] Honor cluster-wide TLS settings ROX-33336: Honor cluster-wide TLS settings Mar 26, 2026
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 3 issues, and left some high level feedback:

  • Consider using a bounded context (e.g., with timeout) instead of context.Background() when calling FetchProfile at startup, so a slow or non-responsive API server doesn’t block operator initialization indefinitely.
  • When ConvertProfile returns nil because the adherence policy does not require honoring the cluster TLS profile and FORCE_OPENSHIFT_TLS_PROFILE is false, it may be helpful to log this decision once so users understand why TLS env vars are not injected even though a profile exists.
  • The tls13Ciphers set in convert.go is hardcoded to the current three TLS 1.3 ciphers; consider delegating TLS 1.3 detection to library-go or another shared helper to avoid silently including new TLS 1.3 ciphers in future OpenShift releases.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider using a bounded context (e.g., with timeout) instead of context.Background() when calling FetchProfile at startup, so a slow or non-responsive API server doesn’t block operator initialization indefinitely.
- When ConvertProfile returns nil because the adherence policy does not require honoring the cluster TLS profile and FORCE_OPENSHIFT_TLS_PROFILE is false, it may be helpful to log this decision once so users understand why TLS env vars are not injected even though a profile exists.
- The tls13Ciphers set in convert.go is hardcoded to the current three TLS 1.3 ciphers; consider delegating TLS 1.3 detection to library-go or another shared helper to avoid silently including new TLS 1.3 ciphers in future OpenShift releases.

## Individual Comments

### Comment 1
<location path="operator/internal/tlsprofile/provider_test.go" line_range="145-152" />
<code_context>
+	assert.Equal(t, "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384", profile.OpenSSLCiphers)
+}
+
+func TestFetchProfile_StrictWithNilTLSSecurityProfile(t *testing.T) {
+	apiServer := &configv1.APIServer{
+		ObjectMeta: metav1.ObjectMeta{Name: "cluster"},
+		Spec: configv1.APIServerSpec{
+			TLSAdherence: configv1.TLSAdherencePolicyStrictAllComponents,
+		},
+	}
+
+	clusterTLS, err := FetchProfile(context.Background(), fakeClientWithAPIServer(apiServer))
+	require.NoError(t, err)
+	profile := ConvertProfile(clusterTLS, false)
+	require.NotNil(t, profile)
+	assert.Equal(t, "TLSv1.2", profile.MinVersion)
+}
+
</code_context>
<issue_to_address>
**suggestion (testing):** Strengthen the nil TLSSecurityProfile test to also validate cipher conversion.

Since this path runs `FetchProfile` and `ConvertProfile` against library-go defaults, please also assert that `CipherSuites` and `OpenSSLCiphers` are populated and look reasonable (e.g., exclude TLS 1.3-only ciphers). That will verify default profiles yield valid cipher configuration, not just a minimum version.

```suggestion
	clusterTLS, err := FetchProfile(context.Background(), fakeClientWithAPIServer(apiServer))
	require.NoError(t, err)
	profile := ConvertProfile(clusterTLS, false)
	require.NotNil(t, profile)

	assert.Equal(t, "TLSv1.2", profile.MinVersion)

	// validate that default cipher configuration is populated and excludes TLS 1.3–only ciphers
	assert.NotEmpty(t, profile.CipherSuites)
	assert.Contains(t, profile.CipherSuites, "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384")
	assert.Contains(t, profile.CipherSuites, "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384")
	assert.NotContains(t, profile.CipherSuites, "TLS_AES_")

	assert.NotEmpty(t, profile.OpenSSLCiphers)
	assert.Contains(t, profile.OpenSSLCiphers, "ECDHE-ECDSA-AES256-GCM-SHA384")
	assert.Contains(t, profile.OpenSSLCiphers, "ECDHE-RSA-AES256-GCM-SHA384")
	assert.NotContains(t, profile.OpenSSLCiphers, "TLS_AES_")
}
```
</issue_to_address>

### Comment 2
<location path="operator/internal/tlsprofile/enricher_test.go" line_range="34-54" />
<code_context>
+	assert.Contains(t, envVars["ROX_OPENSSL_TLS_CIPHER_SUITES"], "ECDHE-ECDSA-AES256-GCM-SHA384")
+}
+
+func TestEnricher_UserValuesOverrideInjected(t *testing.T) {
+	e := NewEnricher(intermediateProfile)
+
+	vals := chartutil.Values{
+		"customize": map[string]interface{}{
+			"envVars": map[string]interface{}{
+				"ROX_TLS_MIN_VERSION": "TLSv1.3",
+			},
+		},
+	}
+	result, err := e.Enrich(context.Background(), &unstructured.Unstructured{}, vals)
+	require.NoError(t, err)
+
+	envVars, err := result.Table("customize.envVars")
+	require.NoError(t, err)
+
+	assert.Equal(t, "TLSv1.3", envVars["ROX_TLS_MIN_VERSION"],
+		"user-specified value should take precedence over injected value")
+	assert.Contains(t, envVars["ROX_TLS_CIPHER_SUITES"], "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
+		"non-overridden values should still be injected")
+}
</code_context>
<issue_to_address>
**suggestion (testing):** Extend override test to verify that unrelated user envVars are preserved alongside injected TLS vars.

To better cover the coalescing behavior, please add an extra non-TLS user env var (e.g. `"FOO": "bar"`) alongside `ROX_TLS_MIN_VERSION` and assert that it remains in `customize.envVars` after enrichment, confirming we don't drop arbitrary user-defined env vars when injecting TLS settings.

```suggestion
func TestEnricher_UserValuesOverrideInjected(t *testing.T) {
	e := NewEnricher(intermediateProfile)

	vals := chartutil.Values{
		"customize": map[string]interface{}{
			"envVars": map[string]interface{}{
				"ROX_TLS_MIN_VERSION": "TLSv1.3",
				"FOO":                 "bar",
			},
		},
	}
	result, err := e.Enrich(context.Background(), &unstructured.Unstructured{}, vals)
	require.NoError(t, err)

	envVars, err := result.Table("customize.envVars")
	require.NoError(t, err)

	assert.Equal(t, "TLSv1.3", envVars["ROX_TLS_MIN_VERSION"],
		"user-specified value should take precedence over injected value")
	assert.Contains(t, envVars["ROX_TLS_CIPHER_SUITES"], "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
		"non-overridden values should still be injected")
	assert.Equal(t, "bar", envVars["FOO"],
		"unrelated user-specified env vars should be preserved alongside injected TLS settings")
}
```
</issue_to_address>

### Comment 3
<location path="operator/cmd/main.go" line_range="150" />
<code_context>
 	}
 	defer restore()

+	// Read the cluster TLS profile once at startup. The TLSProfileWatcher
+	// restarts the Operator when this changes, so a single read is always current.
+	// The result is used for both the Operator's own metrics server TLS and the
</code_context>
<issue_to_address>
**issue (complexity):** Consider encapsulating the TLS profile bootstrap, conversion, and watcher wiring into helper functions so that run() focuses only on wiring controllers and starting the manager.

You can keep all the TLS-profile functionality while reducing `run()`’s cognitive load by pushing the orchestration into the `tlsprofile` package (or a small local helper). The main benefits: `run()` goes back to being “wire controllers + start manager”, and TLS concerns become self-contained.

### 1. Encapsulate TLS profile bootstrap + operand conversion + watcher

Instead of handling client creation, profile fetch, conversion, and watcher wiring inline, introduce a helper that returns everything `run()` needs:

```go
// in internal/tlsprofile/init.go (for example)
package tlsprofile

import (
    "context"
    "crypto/tls"

    configv1 "github.com/openshift/api/config/v1"
    tlspkg "github.com/openshift/controller-runtime-common/pkg/tls"
    "github.com/pkg/errors"
    ctrl "sigs.k8s.io/controller-runtime"
    ctrlClient "sigs.k8s.io/controller-runtime/pkg/client"
)

type InitResult struct {
    ClusterProfile   *configv1.TLSSecurityProfile
    OperandProfile   OperandTLSProfile // whatever ConvertProfile returns
    ManagerTLSOption func(*tls.Config) // nil if no cluster profile
}

func Initialize(
    ctx context.Context,
    mgr ctrl.Manager,
    cfg *rest.Config,          // utils.GetRHACSConfigOrDie()
    scheme *runtime.Scheme,
    forceOpenShift bool,
    log logr.Logger,
) (*InitResult, error) {
    bootstrapClient, err := ctrlClient.New(cfg, ctrlClient.Options{Scheme: scheme})
    if err != nil {
        return nil, errors.Wrap(err, "create bootstrap client for TLS profile")
    }

    clusterProfile, err := FetchProfile(ctx, bootstrapClient)
    if err != nil {
        return nil, errors.Wrap(err, "fetch cluster TLS profile")
    }

    var tlsOpt func(*tls.Config)
    if clusterProfile != nil {
        tlsConfigFn, unsupported := tlspkg.NewTLSConfigFromProfile(clusterProfile.ProfileSpec)
        if len(unsupported) > 0 {
            log.Info("some ciphers from cluster TLS profile are not supported by Go, skipping them", "ciphers", unsupported)
        }
        tlsOpt = tlsConfigFn
    }

    operandProfile := ConvertProfile(clusterProfile, forceOpenShift)

    return &InitResult{
        ClusterProfile:   clusterProfile,
        OperandProfile:   operandProfile,
        ManagerTLSOption: tlsOpt,
    }, nil
}
```

Then `run()` can be simplified to something like:

```go
ctx := context.Background()

tlsInit, err := tlsprofile.Initialize(
    ctx,
    mgr,
    utils.GetRHACSConfigOrDie(),
    scheme,
    forceOpenShiftTLSProfile.BooleanSetting(),
    setupLog,
)
if err != nil {
    return err
}

var tlsOpts []func(*tls.Config)
if tlsInit.ManagerTLSOption != nil {
    tlsOpts = append(tlsOpts, tlsInit.ManagerTLSOption)
}
if !enableHTTP2 {
    tlsOpts = append(tlsOpts, func(c *tls.Config) {
        c.NextProtos = []string{"http/1.1"}
    })
}

// ...

operandTLSProfile := tlsInit.OperandProfile
if centralReconcilerEnabled.BooleanSetting() {
    if err = centralReconciler.RegisterNewReconciler(mgr, centralLabelSelector, operandTLSProfile); err != nil {
        return errors.Wrap(err, "unable to set up Central reconciler")
    }
}
```

### 2. Hide watcher-specific context wiring

Similarly, hide the watcher + cancel wiring behind a helper so `run()` keeps the familiar pattern:

```go
// in internal/tlsprofile/init.go
func WithProfileWatcherContext(
    baseCtx context.Context,
    mgr ctrl.Manager,
    clusterProfile *configv1.TLSSecurityProfile,
) (context.Context, error) {
    if clusterProfile == nil {
        return baseCtx, nil
    }

    ctx, cancel := context.WithCancel(baseCtx)
    if err := SetupTLSProfileWatcher(mgr, *clusterProfile, cancel); err != nil {
        cancel()
        return nil, errors.Wrap(err, "set up TLS profile watcher")
    }
    return ctx, nil
}
```

And in `run()`:

```go
baseCtx := ctrl.SetupSignalHandler()
ctx, err := tlsprofile.WithProfileWatcherContext(baseCtx, mgr, tlsInit.ClusterProfile)
if err != nil {
    return err
}

setupLog.Info("starting manager")
if err = mgr.Start(ctx); err != nil {
    return errors.Wrap(err, "problem running manager")
}
```

These small helpers keep all the introduced behavior (profile fetch, operand conversion, manager TLS config, and watcher-triggered shutdown) but move the low-level details out of `run()`, which should make the startup flow easier to follow and maintain.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@vladbologa vladbologa force-pushed the vb/inherit-tls-settings branch 2 times, most recently from 7c68f15 to 8f7581f Compare March 26, 2026 19:14
@vladbologa
Copy link
Copy Markdown
Contributor Author

/konflux-retest operator-bundle-on-push

@vladbologa vladbologa changed the title ROX-33336: Honor cluster-wide TLS settings ROX-33336: [Operator] Propagate cluster-wide TLS settings Mar 27, 2026
@vladbologa vladbologa removed the konflux-build Run Konflux in PR. Push commit to trigger it. label Mar 27, 2026
@vladbologa
Copy link
Copy Markdown
Contributor Author

/test gke-operator-e2e-tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants