perf: reduce busybox init-time memory allocation #19952

Draft
davdhacs wants to merge 96 commits into master from davdhacs/busybox-init-memory-reduction

@davdhacs
Contributor

Description

Reduce init-time memory for the busybox binary and all components by eliminating unnecessary imports, deferring allocations with sync.OnceValue, and breaking heavy transitive dependency chains.

Results (Linux amd64):

  • Busybox: 16.1 MB -> 12.9 MB heap (-20%), 245K -> 173K mallocs (-29%)
  • AC standalone: 9.1 MB -> 7.2 MB heap (-21%), 87K -> 51K mallocs (-41%)
  • Binary size: 205 MB -> 194 MB (-5%)

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • modified existing tests

How I validated my change

Heap profiling with pprof in Linux amd64 containers via podman.

🤖 Generated with Claude Code

Reduce init-time memory for the busybox binary by eliminating unnecessary
imports, deferring allocations with sync.OnceValue, and breaking heavy
transitive dependency chains.

Results (Linux amd64):
- Busybox: 16.1 MB -> 12.9 MB heap (-20%), 245K -> 173K mallocs (-29%)
- AC standalone: 9.1 MB -> 7.2 MB heap (-21%), 87K -> 51K mallocs (-41%)
- Binary size: 205 MB -> 194 MB (-5%)

Generated with assistance from AI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-ci

openshift-ci bot commented Apr 11, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

@sourcery-ai sourcery-ai bot left a comment

Sorry @github-actions[bot], your pull request is larger than the review limit of 150000 diff characters

@github-actions
Contributor

github-actions bot commented Apr 11, 2026

🚀 Build Images Ready

Images are ready for commit e0550b8. To use with deploy scripts:

export MAIN_IMAGE_TAG=4.11.x-721-ge0550b8251

davdhacs and others added 17 commits April 11, 2026 11:03
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test files with sql_integration build tag reference schema vars that
are now sync.OnceValue functions. Add () to all schema.XxxSchema and
pkgSchema.XxxSchema references in test files.
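The reason for the added `()` can be sketched as follows. This is a minimal illustration, not the generated code: `Schema`, `AlertsSchema`, and `buildCount` are hypothetical names, and the only real API used is `sync.OnceValue` (Go 1.21+).

```go
package main

import (
	"fmt"
	"sync"
)

// Schema is a hypothetical stand-in for the real schema type.
type Schema struct{ Table string }

// buildCount records how many times the expensive builder ran.
var buildCount int

// AlertsSchema used to be a package-level *Schema built at init time.
// Wrapped in sync.OnceValue it becomes a func() *Schema, so every caller
// has to write AlertsSchema() instead of AlertsSchema.
var AlertsSchema = sync.OnceValue(func() *Schema {
	buildCount++
	return &Schema{Table: "alerts"}
})

func main() {
	fmt.Println(AlertsSchema().Table)             // first call builds the schema
	fmt.Println(AlertsSchema() == AlertsSchema()) // later calls reuse the same pointer
	fmt.Println(buildCount)
}
```

A binary that never calls `AlertsSchema()` never pays for the build, which is where the init-time savings come from.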

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each zap logger created with sampling enabled allocates a
counters [7][4096]counter array (~450 KB). With 100+ loggers
across sensor's dependency tree, this totals ~46 MB of heap
(40% of sensor's runtime memory at 128Mi limit).

Remove per-logger sampling from the zap config. This trades
potential log volume increase for massive memory savings.
On an idle cluster, sensor's heap drops from ~115 MB to ~69 MB.

For edge deployments with tight memory limits, this is critical —
it enables sensor to run at ~50 Mi instead of ~82 Mi.
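The ~450 KB and ~46 MB figures can be reproduced with a short calculation. The `counter` struct below is an assumption that mirrors a slot made of two 8-byte fields; it is not copied from zap, and only the `[7][4096]` shape comes from the text above.

```go
package main

import (
	"fmt"
	"unsafe"
)

// counter is an assumed layout for one sampler slot: two 8-byte words.
type counter struct {
	resetAt int64
	count   uint64
}

const (
	numLevels        = 7    // first dimension of the quoted array
	countersPerLevel = 4096 // second dimension of the quoted array
)

// samplerBytes returns the size of one [7][4096]counter array.
func samplerBytes() uintptr {
	return unsafe.Sizeof([numLevels][countersPerLevel]counter{})
}

func main() {
	perLogger := samplerBytes()
	fmt.Printf("per logger: %d bytes (%.0f KiB)\n", perLogger, float64(perLogger)/1024)
	fmt.Printf("100 loggers: %.1f MB\n", float64(perLogger)*100/1e6)
	// per logger: 458752 bytes (448 KiB)
	// 100 loggers: 45.9 MB
}
```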

Generated with assistance from AI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive fix for all _test.go files referencing schema vars
that are now sync.OnceValue functions. Covers sql_integration,
benchmark, and unit test files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With sync.OnceValue schemas, registration happens on first access.
Tests that explicitly call RegisterCategoryToTable or RegisterTable
after accessing a schema would cause fatal duplicate registration.
Make both functions idempotent — silently ignore re-registration
of the same table.
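A rough sketch of what idempotent registration could look like. The registry shape (`tables` keyed by name, holding a category string) and the signature are hypothetical, and keeping an error for a conflicting re-registration is an assumption rather than something the commit states.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	mu     sync.Mutex
	tables = map[string]string{} // table name -> category (hypothetical registry shape)
)

// RegisterTable records a table once. Registering the same table with the
// same category again is silently ignored; a conflicting category is
// reported as an error (an assumption made for this sketch).
func RegisterTable(name, category string) error {
	mu.Lock()
	defer mu.Unlock()
	if existing, ok := tables[name]; ok {
		if existing == category {
			return nil // idempotent re-registration
		}
		return fmt.Errorf("table %q already registered as %q", name, existing)
	}
	tables[name] = category
	return nil
}

func main() {
	fmt.Println(RegisterTable("alerts", "Alerts")) // first registration
	fmt.Println(RegisterTable("alerts", "Alerts")) // ignored, no error
	fmt.Println(RegisterTable("alerts", "Images")) // conflicting, error
}
```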

Also fix select_field_test.go which incorrectly added () to
TestStructsSchema (a test schema not converted to sync.OnceValue).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The GraphQL schema is parsed eagerly at startup even when no UI or
API client ever connects. On edge clusters without UI access, this
is 5 MB wasted.

Defer parsing to the first GraphQL HTTP request using sync.Once.
The first request pays a one-time parsing cost (on the order of
milliseconds); subsequent requests use the cached schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Scale database connection pool based on ROX_MEMLIMIT (set via
Kubernetes downward API from container memory limits).

By default, pgx creates max(4, NumCPU) connections, each with
512-entry statement and description caches. On a 16-core host
this means 16 connections × 2 × 512 cached entries — duplicating
cache data across connections and using 10+ MB.

For memory-constrained environments:
- <512 Mi: 2 connections, 64-entry caches
- 512 Mi-2 Gi: 4 connections, 128-entry caches
- >2 Gi: pgx defaults (unchanged)
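The tiers above could be selected with something like the following. `poolSettings` is a hypothetical helper; treating exactly 2 Gi as part of the middle tier and an unknown limit as "use defaults" are assumptions, and how the values are applied to pgx is left out.

```go
package main

import "fmt"

const (
	mi = int64(1) << 20
	gi = int64(1) << 30
)

// poolSettings maps a container memory limit (bytes) to a connection count
// and per-connection cache size. A zero result means "keep pgx defaults".
func poolSettings(memLimitBytes int64) (maxConns, cacheEntries int) {
	switch {
	case memLimitBytes <= 0: // limit unknown: leave defaults alone
		return 0, 0
	case memLimitBytes < 512*mi:
		return 2, 64
	case memLimitBytes <= 2*gi:
		return 4, 128
	default:
		return 0, 0
	}
}

func main() {
	for _, limit := range []int64{128 * mi, 1 * gi, 8 * gi} {
		conns, cache := poolSettings(limit)
		fmt.Printf("limit=%d MiB conns=%d cache=%d\n", limit/mi, conns, cache)
	}
}
```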

This also reduces CPU overhead from connection management on
small-core edge nodes.

Generated with assistance from AI

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix goimports formatting (blank lines in import groups) and replace
empty measurement tool files with valid Go stubs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sensor compiles ALL ~100 default policies into regexp matchers at
startup during initialPolicySync. Each policy's criteria (CVE severity,
image name patterns, etc.) gets compiled into booleanpolicy evaluators
with regexp matchers. This costs ~6 MB.

On an idle edge cluster, most policies are never evaluated because
they don't match the lifecycle stage or resource type being processed.

Wrap CompilePolicy with a lazy proxy that defers the expensive
compilation (regexp building, matcher construction) until the first
Match* or AppliesTo call. Policies that never get evaluated never
get compiled.
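A much-reduced sketch of the lazy proxy idea. The `Matcher` interface, `NewLazyMatcher`, and `MatchImage` are hypothetical stand-ins for the real compiled-policy types; `sync.OnceValues` and `regexp.Compile` are standard library.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// Matcher is a hypothetical, single-method version of a compiled policy.
type Matcher interface {
	MatchImage(name string) (bool, error)
}

// lazyMatcher keeps only the raw pattern and compiles it on first use.
type lazyMatcher struct {
	compile func() (*regexp.Regexp, error)
}

// NewLazyMatcher returns immediately; no regexp is built here.
func NewLazyMatcher(pattern string) Matcher {
	return &lazyMatcher{compile: sync.OnceValues(func() (*regexp.Regexp, error) {
		return regexp.Compile(pattern)
	})}
}

// MatchImage triggers compilation the first time it is called.
func (m *lazyMatcher) MatchImage(name string) (bool, error) {
	re, err := m.compile()
	if err != nil {
		return false, err
	}
	return re.MatchString(name), nil
}

func main() {
	m := NewLazyMatcher(`^docker\.io/`)
	fmt.Println(m.MatchImage("docker.io/library/nginx"))
}
```

One consequence of this design is that an invalid pattern surfaces at the first match call rather than at construction time, so callers need to handle the error there.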

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gRPC compression between sensor and central costs ~3 MB of compression
buffers and ongoing CPU. On local/same-cluster networks, the bandwidth
savings don't justify the cost. Disable by default with opt-in via
ROX_SENSOR_GRPC_COMPRESSION=true.

Also includes lazy policy compilation wrapper that defers 6 MB of
regexp building until policies are first evaluated.

Additional findings for future optimization:
- Process enricher LRU hardcoded at 100K entries (should scale with memory)
- Multiple cache/buffer sizes not memory-aware
- Network graph default entities use 33 MB (could be optional for edge)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each logger that writes to a file spawns a lumberjack goroutine for
log rotation. With ~30 loggers writing to /var/log/stackrox/log.txt,
that's 30 idle goroutines + 30 independent file handles to the same
file. In container environments, logs go to stdout and are collected
by the container runtime — file logging is unnecessary overhead.

Set ROX_LOGGING_TO_FILE=false to disable file logging, saving:
- 30 goroutines and their stacks
- File I/O overhead
- lumberjack rotation processing

Default is true (unchanged behavior) for backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Each CreateLogger call created an independent lumberjack.Logger for
the same log file, spawning its own rotation goroutine. With ~30
loggers, that's 30 goroutines + 30 file handles to the same file.

Share a single writer per path via a map. This reduces log rotation
goroutines from 30 to 1 and eliminates potential corruption from
concurrent uncoordinated writes to the same file.

GC sweet spot experiment findings (included in commit message for context):
- 128Mi: GC thrashing (84 GC/min, 200m CPU)
- 160Mi: Sweet spot (2 GC/min, 4m CPU)
- 192Mi: Comfortable (0 GC/min, 3m CPU)
- Rule: set limit to 1.3-1.5x natural heap size

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Test schemas in tools/generate-helpers/pg-table-bindings/ were not
regenerated with the sync.OnceValue template change. Test files that
reference these schemas had () added by the bulk fix but the schemas
were still direct *walker.Schema, causing type mismatch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a ROX_SENSOR_LITE env var (default: false) for edge deployments. When set to true:
- Skip local policy compilation in ProcessPolicySync (saves 6 MB + CPU)
- Skip network entity knowledge base loading (saves 16 MB)
- Events still flow to central for evaluation
- Admission controller still receives policies
- Enforcement (pod kill) still works via central commands

This reduces sensor's runtime memory by ~22 MB for edge clusters
that don't need local policy evaluation or cloud provider network
flow attribution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Process enrichment LRU cache was hardcoded at 100K entries — designed
for large enterprise clusters with thousands of containers. On a
50-container edge cluster, this is 2000x oversized.

Use pkg/sensor/queue.ScaleSize to scale based on ROX_MEMLIMIT:
- 128Mi limit → ~3K entries (sufficient for 50 containers)
- 4Gi limit → 100K entries (unchanged behavior)
- Minimum: 100 entries
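The numbers in the list are consistent with linear scaling against a 4Gi reference. The sketch below assumes exactly that; `scaleSize` is a hypothetical function and the actual formula in `pkg/sensor/queue.ScaleSize` may differ.

```go
package main

import "fmt"

const (
	fullSizeLimit = int64(4) << 30 // assumed reference: full size at 4Gi
	minEntries    = 100
)

// scaleSize shrinks base proportionally to the memory limit, never going
// below minEntries and never above base.
func scaleSize(base int, memLimitBytes int64) int {
	if memLimitBytes <= 0 || memLimitBytes >= fullSizeLimit {
		return base
	}
	scaled := int(int64(base) * memLimitBytes / fullSizeLimit)
	if scaled < minEntries {
		return minEntries
	}
	return scaled
}

func main() {
	fmt.Println(scaleSize(100_000, 128<<20)) // 3125
	fmt.Println(scaleSize(100_000, 4<<30))   // 100000
	fmt.Println(scaleSize(100_000, 1<<20))   // 100
}
```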

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pkg/env imported pkg/timeutil just for HoursInDay=24 and DaysInWeek=7.
pkg/timeutil imports go-timezone which loads a 0.5 MB timezone map at
init. Since pkg/env is imported by EVERY component, this added 0.5 MB
to every binary.

Inline the constants (24 and 7) to eliminate this path. The timezone
dep still exists through cfssl/mtls but this removes one import chain.

Also regenerates test schemas with sync.OnceValue template.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
davdhacs and others added 22 commits April 12, 2026 21:38
…om sensor

Replace typed v1alpha1 struct access with unstructured field access across
all 15 compliance operator dispatcher and handler files. The v1alpha1 Go
package imports compliance-sdk/pkg/scanner which imports google/cel-go
(21 packages + antlr) — all for a customrule_types.go file that sensor
never uses.

Pattern: each dispatcher previously did
  runtime.DefaultUnstructuredConverter.FromUnstructured(obj, &typedObj)
  field := typedObj.Spec.FieldName
Now does:
  field, _, _ := unstructured.NestedString(obj.Object, "spec", "fieldName")
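For readers unfamiliar with the unstructured helpers, here is a standard-library stand-in that behaves similarly to `unstructured.NestedString`, which returns `(string, bool, error)`. `nestedString` is a hypothetical reimplementation for illustration only.

```go
package main

import "fmt"

// nestedString walks nested maps by field name and returns the string at
// the end of the path, whether it was found, and a type error if any
// intermediate or final value has the wrong type.
func nestedString(obj map[string]interface{}, fields ...string) (string, bool, error) {
	var cur interface{} = obj
	for i, f := range fields {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false, fmt.Errorf("%v is not a map", fields[:i])
		}
		if cur, ok = m[f]; !ok {
			return "", false, nil
		}
	}
	s, ok := cur.(string)
	if !ok {
		return "", false, fmt.Errorf("%v is not a string", fields)
	}
	return s, true, nil
}

func main() {
	obj := map[string]interface{}{
		"spec": map[string]interface{}{"fieldName": "value"},
	}
	fmt.Println(nestedString(obj, "spec", "fieldName"))
	fmt.Println(nestedString(obj, "spec", "missing"))
}
```

Note that the pattern in the commit discards the found flag and the error, so a missing or mistyped field silently becomes an empty string.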

Constants (CheckResultPass, severity levels, annotation keys) inlined
into a local constants.go file.

Sensor third-party deps: 140 → 115 (-25)
Sensor total deps: 1330 → 1297 (-33)
Eliminated: cel-go (21), antlr (1), compliance-sdk (1), compliance-operator (1),
stoewer/go-strcase (1), plus transitive deps.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s sensor

Convert sensor from typed k8s client (kubernetes.Interface, which registers
all 58 API groups) to dynamic.Interface (which registers none). The typed
client's SharedInformerFactory pulled in informers/ (72 packages), listers/
(51), applyconfigurations/ (52), and typed API group clients (57) = 232
client-go packages.

Changes across 54 files:
- client.Interface: Kubernetes() removed, Discovery() added (lightweight)
- All enforcers: accept dynamic.Interface, use JSON merge patches
- Pod informer: k8swatch.InformerAdapter + v1Listers.NewPodLister
- Listener: removed SharedInformerFactory creation entirely
- Network policies: dynamic client for CRUD
- Cert refresh chain: 7 files converted to dynamic
- Telemetry gatherers: dynamic + discovery clients
- Heritage, orchestrator, configmap, CRS: all dynamic
- New gvr.go: 18 standard GVR constants
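As an illustration of the JSON merge patch approach mentioned for the enforcers, a patch body can be built with `encoding/json` alone. Scaling a deployment to zero replicas is an assumed example action, and `scaleToZeroPatch` is a hypothetical helper; such a body would be sent with the `application/merge-patch+json` content type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scaleToZeroPatch builds a JSON merge patch that sets spec.replicas to 0
// and leaves every other field of the object untouched.
func scaleToZeroPatch() ([]byte, error) {
	patch := map[string]interface{}{
		"spec": map[string]interface{}{"replicas": 0},
	}
	return json.Marshal(patch)
}

func main() {
	body, err := scaleToZeroPatch()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"spec":{"replicas":0}}
}
```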

Remaining typed client paths (follow-up):
- pkg/k8sutil.MustCreateK8sClient (shared package, needs sub-package split)
- sensor/kubernetes/upgrade (self-contained, creates own client)

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- sensor/kubernetes/upgrade: replace kubernetes.Interface with
  dynamic.Interface for deployment CRUD operations
- pkg/k8sutil: move MustCreateK8sClient to pkg/k8sutil/k8sclient
  sub-package so the parent package no longer imports the typed client

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…client

Convert the last shared packages that imported k8s.io/client-go/kubernetes:
- pkg/metrics/tls.go: TLS cert watcher uses dynamic ConfigMapWatcher
- pkg/k8sintrospect/collector.go: diagnostic collector uses dynamic + REST
- pkg/cloudproviders/{aws,azure,gcp}: metadata from node labels via dynamic
- pkg/cloudproviders/utils: node label fetcher uses dynamic
- pkg/k8scfgwatch/cfg_watcher.go: ConfigMap watcher uses dynamic
- pkg/secretinformer: secret informer uses dynamic informer factory
- sensor/common/centralproxy: RBAC authorizer uses unstructured TokenReview

No stackrox code now directly imports k8s.io/client-go/kubernetes.
The typed client remains only as a transitive dep through
dynamicinformer → informers → kubernetes (k8s upstream coupling).

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…informers

Replace the last dependency chain to the typed k8s client:
dynamicinformer → client-go/informers → client-go/kubernetes (232 packages)

Convert 17 CRD informers from DynamicSharedInformerFactory to k8swatch:
- 9 compliance operator CRDs
- 2 virtual machine CRDs (kubevirt)
- 6 OpenShift CRDs (routes, deploymentconfigs, clusteroperators, mirrors)

Also converts CRD watcher and secret informer to k8swatch.

Helper function gvrToAPIPath converts GroupVersionResource to k8s API
paths (e.g., apps/v1/deployments → /apis/apps/v1/deployments).
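A minimal sketch of such a helper, assuming cluster-wide list/watch paths only. The `GVR` struct is a local stand-in for `schema.GroupVersionResource`; the core group (empty group name) living under `/api` rather than `/apis` is standard Kubernetes behavior.

```go
package main

import "fmt"

// GVR is a local stand-in for schema.GroupVersionResource.
type GVR struct{ Group, Version, Resource string }

// gvrToAPIPath returns the cluster-wide REST path for a resource.
// Core-group resources (empty Group) are served under /api.
func gvrToAPIPath(g GVR) string {
	if g.Group == "" {
		return "/api/" + g.Version + "/" + g.Resource
	}
	return "/apis/" + g.Group + "/" + g.Version + "/" + g.Resource
}

func main() {
	fmt.Println(gvrToAPIPath(GVR{"apps", "v1", "deployments"})) // /apis/apps/v1/deployments
	fmt.Println(gvrToAPIPath(GVR{"", "v1", "pods"}))            // /api/v1/pods
}
```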

k8s client-go packages: 270 → 36 (-234)
Sensor total deps: 1297 → 1062 (-235)

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The external sources handler eagerly builds a ~16 MB CIDR → entity map
when Central pushes network entities. Most clusters never query this map
from policy evaluation (only network policies referencing external
entities trigger lookups).

Change to lazy indexing: ProcessMessage stores the raw entity list,
and the CIDR/ID maps are built on the first LookupByNetwork or
LookupByID call. The IP network list for collectors is still generated
immediately (lightweight — just IP/prefix byte extraction, no maps).

On clusters without external-entity network policies: saves ~16 MB heap.
On clusters with such policies: same memory, one-time indexing cost on
first lookup (~1ms for 50K entities).
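The store-raw, index-on-first-lookup flow might be sketched like this. `Entity`, `store`, and the `builds` counter are hypothetical; a mutex with a nil check is used instead of `sync.Once` so that a new message from Central can invalidate the index.

```go
package main

import (
	"fmt"
	"sync"
)

// Entity is a hypothetical, reduced network entity.
type Entity struct{ ID, CIDR string }

type store struct {
	mu     sync.Mutex
	raw    []Entity
	byCIDR map[string]Entity // nil until the first lookup
	builds int               // how many times the index was built
}

// ProcessMessage only keeps the raw list and drops any stale index.
func (s *store) ProcessMessage(entities []Entity) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.raw = entities
	s.byCIDR = nil
}

// LookupByNetwork builds the index on first use, then serves from it.
func (s *store) LookupByNetwork(cidr string) (Entity, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.byCIDR == nil {
		s.byCIDR = make(map[string]Entity, len(s.raw))
		for _, e := range s.raw {
			s.byCIDR[e.CIDR] = e
		}
		s.builds++
	}
	e, ok := s.byCIDR[cidr]
	return e, ok
}

func main() {
	s := &store{}
	s.ProcessMessage([]Entity{{ID: "a", CIDR: "10.0.0.0/8"}})
	fmt.Println(s.builds) // 0: nothing indexed yet
	fmt.Println(s.LookupByNetwork("10.0.0.0/8"))
	fmt.Println(s.builds) // 1
}
```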

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ation

The standard protoc-gen-go generates init() functions that call
TypeBuilder.Build() to register every message type in the global
protobuf registry. This decompresses gzip descriptors, parses file
descriptor protos, and allocates type metadata — ~10-15 MB of heap
for 1,393 message types across 191 files.

Since we use vtprotobuf for all gRPC serialization (MarshalVT/
UnmarshalVT/SizeVT methods generated directly on message structs),
the global registry is never consulted at runtime for marshal/unmarshal.
The only sensor usage was one protojson.Marshal call for debug logging,
replaced with json.Marshal.

The Makefile now strips the init() call after protoc-gen-go runs:
  sed 's/^func init() { file_.*_proto_init() }/func init() {}/'

This is applied per-file during generation, so the stripped init()s
are what gets committed. The struct definitions, vtprotobuf methods,
and all message functionality remain — only the registry population
is skipped.

Note: Central's REST API (grpc-gateway) uses protojson which needs
the registry. A follow-up can conditionally re-enable registration
for the central entrypoint if needed.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PolicySet compilation (regexp compilation, matcher tree construction)
runs during initial Central sync via ProcessPolicySync. This allocates
~6 MB for ~100 policies with compiled regexps and matchers.

Defer compilation to the first Detect* call. During initial sync,
the deployment store is empty (informers haven't started), so the
re-evaluation flush is a no-op — policies are stored raw and compiled
only when the first deployment/process/network event triggers detection.

Also replaced protojson.Marshal with json.Marshal in dispatcher debug
logging — the last protojson usage in sensor.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These new files were created by the k8s dynamic client refactor and
compliance operator unstructured conversion but were not tracked by git.

- sensor/kubernetes/client/gvr.go: 18 standard GVR constants for the
  dynamic client (PodGVR, DeploymentGVR, SecretGVR, etc.)
- sensor/kubernetes/complianceoperator/dispatchers/constants.go: inlined
  compliance operator constants (check result statuses, severity levels,
  annotation keys) replacing the v1alpha1 package import

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removed from go.mod after origami replacements:
- aws-sdk-go-v2/feature/ec2/imds (replaced by pkg/cloudproviders/aws/imds.go)
- docker/distribution (replaced by inline manifest types)
- gobwas/glob (replaced by stdlib path.Match)
- heroku/docker-registry-client (replaced by origami registry client)
- openshift/runtime-utils (replaced by in-memory mirror matching)
- tkuchiki/go-timezone (replaced by stdlib time.Time.Zone())

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- sensor/debugger/k8s: ClientSet implements Dynamic() + Discovery(),
  retains Kubernetes() for test infrastructure compatibility
- sensor/debugger/certs: cert_fetcher uses dynamic client for secrets
- sensor/tests: service_test uses e2e Resources() API instead of
  removed Kubernetes() method

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Update 18 test files for the k8s dynamic client refactor:
- Replace fake.NewClientset() with dynamicfake.NewSimpleDynamicClient
- Add Discovery() methods to fake client structs
- Update function call signatures for dynamic client parameters
- Convert typed k8s resource creation to unstructured in tests

Two complex test files (securedcluster_tls_issuer_test.go,
rbac/store_impl_test.go) still need refactoring — their mock
function signatures need updating for dynamic.Interface.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add ClusterRoleGVR and RoleGVR to GVR constants
- Fix rbac store_impl_test.go: add createUnstructured/updateUnstructured
  helpers, convert remaining fakeClient calls to dynamic
- Fix securedcluster_tls_issuer_test.go: update mock signatures for
  dynamic.Interface, create dynamic fake in test helper
- Fix sensor_owner_ref_test.go: use dynamic fake for FetchSensorDeploymentOwnerRef
- Fix tls_challenge_cert_loader_test.go: use dynamic fake for handleCABundleConfigMapUpdate
- Remove unused imports (binding_fetcher_test.go)

go vet ./sensor/... now passes with zero errors.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Revert proto init() stripping from committed generated files. The
policyutil CI tool panics with nil pointer dereference when proto
type registration is skipped — it uses protobuf reflection which
requires the global registry.

The Makefile sed approach is kept for documentation but the stripping
should be applied at build time for sensor-only images, not in the
committed generated code that all tools share.

The ~10-15 MB savings from proto init stripping remain a valid
optimization for sensor-lite builds where a separate image is built
with stripped init()s.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…_INIT

Instead of stripping proto init()s at compile time (which broke CI tools
that need the registry), add a runtime guard: when ROX_SKIP_PROTO_INIT=true,
the proto file init functions early-return without calling TypeBuilder.Build().

Mechanism: each generated package gets a 00_skip_proto_init.go file with:
  var skipProtoInit = os.Getenv("ROX_SKIP_PROTO_INIT") == "true"

Go initializes all package-level vars before running any init() function
in the package, so the var is already set when a proto init() checks it.
The "00_" prefix just keeps the file first in the alphabetical file order.

Each proto init guard is patched from:
  if File_storage_alert_proto != nil { return }
To:
  if File_storage_alert_proto != nil || skipProtoInit { return }
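The ordering this mechanism relies on can be demonstrated in isolation: the Go spec guarantees that a package's variables are initialized before any of its init() functions run. In the sketch below `registered` is a hypothetical stand-in for the effect of TypeBuilder.Build().

```go
package main

import (
	"fmt"
	"os"
)

// skipProtoInit is evaluated during package variable initialization,
// which completes before any init() function in this package runs.
var skipProtoInit = os.Getenv("ROX_SKIP_PROTO_INIT") == "true"

// registered stands in for "the proto registry was populated".
var registered bool

func init() {
	if skipProtoInit {
		return // early return, mirroring the patched guard
	}
	registered = true
}

func main() {
	fmt.Printf("skip=%v registered=%v\n", skipProtoInit, registered)
}
```

Running it with and without `ROX_SKIP_PROTO_INIT=true` in the environment flips both values.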

Sensor, admission-control, and config-controller can set this env var
to save ~10-15 MB of heap. Central and CLI tools run with the default
(false) and get the full proto registry for protojson/reflection.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace ROX_SKIP_PROTO_INIT env var with automatic os.Args[0] detection.
The busybox binary knows which entrypoint it is before any init() runs
(os.Args is set by the Go runtime before user code).

Central, roxctl, and migrator get full proto registry (they need
protojson/reflection). All other entrypoints (sensor, admission-control,
config-controller, compliance) skip registration automatically since
vtprotobuf handles serialization.

Zero configuration required — the binary does the right thing based on
how it was invoked.
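Entrypoint detection of this kind could be as small as the following. `needsProtoRegistry` is a hypothetical helper, and the exact name strings compared against are assumptions based on the entrypoints listed above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// needsProtoRegistry reports whether the binary was invoked under a name
// that requires the full proto registry (protojson/reflection users).
func needsProtoRegistry(arg0 string) bool {
	switch filepath.Base(arg0) {
	case "central", "roxctl", "migrator":
		return true
	}
	return false
}

func main() {
	fmt.Println(needsProtoRegistry(os.Args[0]))
	fmt.Println(needsProtoRegistry("/stackrox/central"))           // true
	fmt.Println(needsProtoRegistry("/stackrox/kubernetes-sensor")) // false
}
```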

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace eager proto type registry initialization in init() with lazy
registration triggered by the first ProtoReflect() call via sync.Once.

Before: init() calls TypeBuilder.Build() for every proto file at startup,
allocating ~10-15 MB of type descriptors regardless of whether any binary
entrypoint uses reflection.

After: init() is a no-op. Each proto file gets:
  var file_*_proto_init_once sync.Once
  func file_*_proto_init_ensure() { ..._once.Do(file_*_proto_init) }

Every ProtoReflect() method calls ensure() before accessing msgTypes.
Central gets the registry when grpc-gateway calls ProtoReflect() during
startup. Sensor never triggers it because vtprotobuf uses MarshalVT/
UnmarshalVT which don't call ProtoReflect().
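The before/after described here can be reduced to a small model. Everything below is hypothetical (`protoInit`, `ensure`, `Alert`, `TypeName`); it only shows that a serialization path that skips `ensure()` never pays for registration, while a reflection-like path triggers it exactly once.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	initOnce  sync.Once
	initCalls int
	msgTypes  []string
)

// protoInit stands in for the expensive file_*_proto_init function.
func protoInit() {
	initCalls++
	msgTypes = []string{"storage.Alert"}
}

// ensure runs protoInit at most once, on demand.
func ensure() { initOnce.Do(protoInit) }

type Alert struct{ ID string }

// MarshalVT models the vtprotobuf path: no registry access, no ensure().
func (a *Alert) MarshalVT() []byte { return []byte(a.ID) }

// TypeName models a ProtoReflect-style path that needs the registry.
func (a *Alert) TypeName() string {
	ensure()
	return msgTypes[0]
}

func main() {
	a := &Alert{ID: "x"}
	a.MarshalVT()
	fmt.Println(initCalls) // 0: marshal path did not register anything
	fmt.Println(a.TypeName())
	fmt.Println(initCalls) // 1
}
```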

A Go tool (tools/proto-lazy-init) performs the transformation after
protoc-gen-go runs. The Makefile applies it to each generated .pb.go file.

No env vars, no os.Args detection, no separate builds. The right thing
happens automatically based on code paths.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The previous lazy init only added ensure() to ProtoReflect() methods.
Enum types also need registration — their String() method calls
EnumStringOf(x.Descriptor()) which accesses the file descriptor.

Add ensure() calls to every enum Descriptor() method so enum string
conversion triggers lazy registration on first use.

Also removes the 00_skip_proto_init.go files — no longer needed with
the sync.Once approach.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clean up leftover skipProtoInit guards and 00_skip_proto_init.go files
that were re-introduced during the transition to sync.Once lazy init.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clean application of proto-lazy-init tool from restored generated files.
Handles split proto files (e.g., image.pb.go + image_v2.pb.go) correctly:
only the file defining _proto_init() gets _once and _ensure declarations,
secondary files just make init() a no-op.

Adds ensure() to both ProtoReflect() and enum Descriptor()/Type() methods
so enum String() works correctly (calls EnumStringOf via Descriptor).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
statement_cache_capacity and description_cache_capacity are pgx
client-side settings, not PostgreSQL server runtime parameters.
Sending them via RuntimeParams causes PostgreSQL to reject the
connection with "unrecognized configuration parameter".

Remove these from RuntimeParams — pgx manages its own statement
cache internally. The MaxConns reduction for small/medium memory
environments is retained.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs
Contributor Author

/test ?

@davdhacs
Contributor Author

/test gke-nongroovy-e2e-tests

@davdhacs
Contributor Author

/test gke-latest-qa-e2e-tests

When ROX_SENSOR_LITE=true, sensor skips creating components that are
unnecessary for a lightweight event forwarder + enforcer:

Skipped (~30 MB, ~82 goroutines saved):
- Process pipeline and signal service
- Network flow manager and enrichment
- Image scanning service
- Policy detector queues (detector still exists for lazy compilation)
- Compliance operator and node inventory
- Admission control forwarding and alerts
- Telemetry, cluster metrics, external sources
- Deployment enhancer, reprocessor, delegated registry
- Virtual machine handler

Kept (core secured cluster function):
- Enforcer (kill pods on command from Central)
- Event pipeline (forward k8s events to Central)
- Cluster status and health reporting
- Network policy enforcement
- Certificate refresh and config management
- Upgrade handler

Target: sensor-lite at ~33 Mi idle (vs 132 Mi full mode under load).
Central handles policy evaluation, process analysis, and network flow
enrichment — sensor just watches and forwards.

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs
Contributor Author

/test gke-nongroovy-e2e-tests

In lite mode, imageService is nil — guard the SetClient call in
Sensor.Start() to prevent nil pointer dereference.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@davdhacs
Contributor Author

/test gke-nongroovy-e2e-tests

@openshift-ci

openshift-ci bot commented Apr 14, 2026

@davdhacs: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-nongroovy-e2e-tests e0550b8 link true /test gke-nongroovy-e2e-tests

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
