Skip to content

ROX-32316: Add support for the new ACK to Sensor#18228

Merged
vikin91 merged 10 commits intomasterfrom
piotr/ROX-32316-sensor-new-ack
Jan 27, 2026
Merged

ROX-32316: Add support for the new ACK to Sensor#18228
vikin91 merged 10 commits intomasterfrom
piotr/ROX-32316-sensor-new-ack

Conversation

@vikin91
Copy link
Contributor

@vikin91 vikin91 commented Dec 16, 2025

Description

This PR is implementing Sensor ACK/NACK forwarding for the new SensorACK message.

This PR enables Sensor to receive the new SensorACK messages from Central and forward them to Compliance as ComplianceACK messages. This is required for the end-to-end ACK/NACK flow when Central rate-limits VM index reports.

Changes:

  1. Node Inventory Handler (sensor/common/compliance/node_inventory_handler_impl.go):

    • Advertises SensorACKSupport capability to Central
    • Accepts both legacy NodeInventoryACK and new SensorACK messages
    • Forwards node-related ACKs (NODE_INVENTORY, NODE_INDEX_REPORT) to Compliance as ComplianceACK
    • Ignores non-node message types (whitelist approach for future-proofing)
  2. VM Handler (sensor/common/virtualmachine/index/handler_impl.go):

    • Accepts SensorACK messages for VM_INDEX_REPORT type
    • Logs and records metrics (VM relay uses unary gRPC, no async return path)
  3. Metrics (sensor/common/virtualmachine/metrics/metrics.go):

    • Added IndexReportAcksReceived counter with action label (ACK/NACK)

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Ran unit tests locally with race detection (20 iterations):
    go test -race -count=20 ./sensor/common/compliance/... -run "TestNodeInventoryHandler/TestHandlerSensorACKsToCompliance"
    go test -race -count=20 ./sensor/common/virtualmachine/index/... -run "TestVirtualMachineHandler"
  • CI
  • Manual test on a cluster and looking into metrics

AI Disclosure: Initial implementation and tests generated by AI. User corrected test patterns to avoid flaky timeout-based goroutines and reviewed whitelist logic for message type handling.

@vikin91
Copy link
Contributor Author

vikin91 commented Dec 16, 2025

@openshift-ci
Copy link

openshift-ci bot commented Dec 16, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • In nodeInventoryHandlerImpl.Accepts, you now return true for any SensorAck, but processSensorACK only handles node-related types; consider narrowing Accepts to the specific SensorACK_* values this handler actually processes to avoid accidentally claiming unrelated ACKs if routing changes in the future.
  • The TestHandlerSensorACKsToCompliance test uses a 3s timeout with time.After for each subtest, which can slow down the suite and be flaky under load; consider using a buffered channel or driving ProcessMessage in a goroutine and doing a direct receive without a long timeout.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `nodeInventoryHandlerImpl.Accepts`, you now return true for any `SensorAck`, but `processSensorACK` only handles node-related types; consider narrowing `Accepts` to the specific `SensorACK_*` values this handler actually processes to avoid accidentally claiming unrelated ACKs if routing changes in the future.
- The `TestHandlerSensorACKsToCompliance` test uses a 3s timeout with `time.After` for each subtest, which can slow down the suite and be flaky under load; consider using a buffered channel or driving `ProcessMessage` in a goroutine and doing a direct receive without a long timeout.

## Individual Comments

### Comment 1
<location> `sensor/common/compliance/node_inventory_handler_impl.go:117-118` </location>
<code_context>
 }

 func (c *nodeInventoryHandlerImpl) Accepts(msg *central.MsgToSensor) bool {
-	return msg.GetNodeInventoryAck() != nil
+	return msg.GetNodeInventoryAck() != nil || msg.GetSensorAck() != nil
 }

</code_context>

<issue_to_address>
**suggestion (bug_risk):** Restrict SensorACK acceptance to node-related message types to avoid unnecessary routing

`Accepts` now returns true for any message with a `SensorACK`, but `processSensorACK` only handles `NODE_INVENTORY` and `NODE_INDEX_REPORT`, ignoring other types. This can cause unrelated SensorACK messages to be routed here and no-op. Please restrict `Accepts` to only return true for the node-related SensorACK types so other SensorACKs go to their appropriate handlers.

Suggested implementation:

```golang
func (c *nodeInventoryHandlerImpl) Accepts(msg *central.MsgToSensor) bool {
	// Always accept explicit NodeInventory acks
	if msg.GetNodeInventoryAck() != nil {
		return true
	}

	// For generic SensorAck, only accept node-related message types
	ack := msg.GetSensorAck()
	if ack == nil {
		return false
	}

	switch ack.GetMsgType() {
	case central.MsgFromSensorType_NODE_INVENTORY,
		central.MsgFromSensorType_NODE_INDEX_REPORT:
		return true
	default:
		return false
	}
}

```

The constants in the `switch` (`central.MsgFromSensorType_NODE_INVENTORY` and `central.MsgFromSensorType_NODE_INDEX_REPORT`) may need to be adjusted to match the actual enum names and package path used in your `SensorAck`/message-type definitions (for example, they might be `central.MsgFromSensor_NODE_INVENTORY`, `central.MsgFromSensor_NODE_INDEX_REPORT`, or similar). Ensure `ack.GetMsgType()` exists and returns the enum used for these values; if the accessor has a different name (e.g. `GetMsgTypeId()` or `GetOriginalType()`), update accordingly.
</issue_to_address>

### Comment 2
<location> `sensor/common/compliance/node_inventory_handler_impl.go:156-157` </location>
<code_context>
-	log.Debugf("Received node-scanning-ACK message of type %s, action %s for node %s",
+
+	// Map central.SensorACK action to sensor.ComplianceACK action
+	var action sensor.MsgToCompliance_ComplianceACK_Action
+	switch sensorAck.GetAction() {
+	case central.SensorACK_ACK:
+		action = sensor.MsgToCompliance_ComplianceACK_ACK
</code_context>

<issue_to_address>
**issue (bug_risk):** Handle unknown SensorACK actions explicitly instead of relying on zero-value enum

Right now, if `sensorAck.GetAction()` is neither ACK nor NACK (e.g., a new enum value added in Central), `action` stays at its zero value and we silently send a ComplianceACK with that default, which can misrepresent the actual action. Please add explicit handling for unknown values (e.g., a `default` branch that at least logs and ideally returns without sending).
</issue_to_address>

### Comment 3
<location> `sensor/common/virtualmachine/index/handler_impl.go:136` </location>
<code_context>
+		reason := sensorAck.GetReason()
</code_context>

<issue_to_address>
**suggestion:** Consider handling or logging unexpected SensorACK actions for VM index reports

Right now only ACK and NACK are handled; any other `SensorACK_Action` is silently ignored. As with the node handler, consider adding a `default` case that logs unexpected actions (and optionally increments a metric) to help catch protocol mismatches or future enum values.
</issue_to_address>

### Comment 4
<location> `sensor/common/compliance/node_inventory_handler_test.go:433-440` </location>
<code_context>
+				// (channel is unbuffered so we need a receiver ready)
+				var msg common.MessageToComplianceWithAddress
+				received := concurrency.NewSignal()
+				go func() {
+					defer received.Signal()
+					select {
+					case msg = <-handler.ComplianceC():
+					case <-time.After(3 * time.Second):
+						s.Fail("ComplianceACK message not received within 3 seconds")
+					}
+				}()
+
+				// Send the SensorACK message
</code_context>

<issue_to_address>
**suggestion (testing):** Avoid calling testify assertions from a goroutine and relying on a long timeout, to reduce flakiness and data races

Here `s.Fail` is called from a background goroutine and guarded by a 3s timeout, which can introduce races with the test framework and slow failures. Instead, consider having the goroutine send either the received message or an error on a buffered channel, and then `select` on that (plus a `time.After`) in the main test goroutine so all `s.*` calls stay there. Alternatively, if feasible, make `ComplianceC` buffered so you can receive synchronously without a separate goroutine. This will make the test more deterministic and race-free.
</issue_to_address>

### Comment 5
<location> `sensor/common/compliance/node_inventory_handler_test.go:388-397` </location>
<code_context>
+		"NODE_INVENTORY ACK should be forwarded": {
</code_context>

<issue_to_address>
**suggestion (testing):** Add SensorACK NACK cases (and unknown message types) to cover negative paths for node-related ACK handling

The table in `TestHandlerSensorACKsToCompliance` currently covers only ACKs and one VM message type. Please add cases for:

- `NODE_INVENTORY` with `Action: SensorACK_NACK`
- `NODE_INDEX_REPORT` with `Action: SensorACK_NACK`
- A `SensorACK` with an unknown/unused `MessageType`

For NACKs, assert that `ComplianceACK` uses `Action: ComplianceACK_NACK` and propagates `Reason`. For unknown message types, assert that `ComplianceC()` remains empty. This will exercise the negative paths of `processSensorACK`.

Suggested implementation:

```golang
		"NODE_INDEX_REPORT ACK should be forwarded": {
			sensorACK: &central.SensorACK{
				Action:      central.SensorACK_ACK,
				MessageType: central.SensorACK_NODE_INDEX_REPORT,
				ResourceId:  "node-1",
			},
			shouldForward:       true,
			expectedAction:      sensor.MsgToCompliance_ComplianceACK_ACK,
			expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INDEX_REPORT,
		},
		"NODE_INVENTORY NACK should be forwarded with reason": {
			sensorACK: &central.SensorACK{
				Action:      central.SensorACK_NACK,
				MessageType: central.SensorACK_NODE_INVENTORY,
				ResourceId:  "node-1",
				Reason:      "validation failed",
			},
			shouldForward:       true,
			expectedAction:      sensor.MsgToCompliance_ComplianceACK_NACK,
			expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INVENTORY,
		},
		"NODE_INDEX_REPORT NACK should be forwarded with reason": {
			sensorACK: &central.SensorACK{
				Action:      central.SensorACK_NACK,
				MessageType: central.SensorACK_NODE_INDEX_REPORT,
				ResourceId:  "node-1",
				Reason:      "validation failed",
			},
			shouldForward:       true,
			expectedAction:      sensor.MsgToCompliance_ComplianceACK_NACK,
			expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INDEX_REPORT,
		},
		"Unknown SensorACK message type should be ignored": {
			sensorACK: &central.SensorACK{
				Action:      central.SensorACK_ACK,
				MessageType: central.SensorACK_UNKNOWN, // unknown/unused message type
				ResourceId:  "node-1",
			},
			shouldForward: false,
		},

```

To fully satisfy the test expectations from your comment, you should also:

1. In the test loop inside `TestHandlerSensorACKsToCompliance`:
   - For cases where `shouldForward` is `true`, in addition to checking `expectedAction` and `expectedMessageType`, assert that `ComplianceACK.GetReason()` equals the `Reason` from the input `SensorACK` for the NACK cases.
   - For cases where `shouldForward` is `false` (including the "Unknown SensorACK message type should be ignored" and the VM case), assert that no `ComplianceC` messages were emitted (e.g., the slice/queue/channel you read from is empty).

2. If `central.SensorACK_UNKNOWN` does not exist in your protobuf enum, replace it with an appropriate unknown/unused value, for example:
   - A dedicated enum value like `central.SensorACK_MESSAGE_TYPE_UNKNOWN`, or
   - A cast of an out-of-range value, e.g. `MessageType: central.SensorACK_MessageType(999)`,
   depending on how your generated code names the enum.

3. Ensure the existing VM-related case (e.g. `VM_INDEX_REPORT`) remains in the `cases` map and is still asserting that no `ComplianceACK` is forwarded for it.
</issue_to_address>

### Comment 6
<location> `sensor/common/virtualmachine/index/handler_impl_test.go:254-256` </location>
<code_context>
+	s.Assert().False(s.handler.Accepts(otherMsg), "Handler should not accept other message types")
+}
+
 func (s *virtualMachineHandlerSuite) TestProcessMessage() {
 	msg := &central.MsgToSensor{}
 	err := s.handler.ProcessMessage(context.Background(), msg)
</code_context>

<issue_to_address>
**suggestion (testing):** Extend VM handler ProcessMessage tests to cover ACK/NACK SensorACK handling and metrics

`TestProcessMessage` currently only covers the non-ACK path. Please add cases that:

- Send a `SensorACK` with `MessageType: VM_INDEX_REPORT` and `Action: SensorACK_ACK`, and assert `IndexReportAcksReceived` is incremented with label `"ACK"`.
- Send a `SensorACK` with `SensorACK_NACK` and assert the metric is incremented with label `"NACK"`.
- Optionally, send a `SensorACK` with a non-VM message type (e.g., `NODE_INDEX_REPORT`) and assert no metric changes and no error.

Prometheus `testutil` (e.g., `testutil.ToFloat64`) or resetting the counter between cases can help keep assertions deterministic.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@rhacs-bot
Copy link
Contributor

rhacs-bot commented Dec 16, 2025

Images are ready for the commit at d075ff9.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.10.x-921-gd075ff91af.

@codecov
Copy link

codecov bot commented Dec 16, 2025

Codecov Report

❌ Patch coverage is 95.45455% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 49.32%. Comparing base (6fc06f6) to head (d075ff9).
⚠️ Report is 4 commits behind head on master.

Files with missing lines Patch % Lines
...r/common/compliance/node_inventory_handler_impl.go 94.20% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18228      +/-   ##
==========================================
+ Coverage   49.29%   49.32%   +0.02%     
==========================================
  Files        2663     2663              
  Lines      200405   200514     +109     
==========================================
+ Hits        98786    98896     +110     
  Misses      94177    94177              
+ Partials     7442     7441       -1     
Flag Coverage Δ
go-unit-tests 49.32% <95.45%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@vikin91 vikin91 force-pushed the piotr/ROX-32316-central-rate-limiter branch from cdefaba to 1bbae43 Compare December 17, 2025 15:56
@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch from c3bb1ab to 37454fc Compare December 17, 2025 15:56
@vikin91 vikin91 force-pushed the piotr/ROX-32316-central-rate-limiter branch from 1bbae43 to 8f2bc58 Compare December 18, 2025 11:33
@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch from 37454fc to 62eb393 Compare December 18, 2025 11:33
@vikin91 vikin91 force-pushed the piotr/ROX-32316-central-rate-limiter branch from 8f2bc58 to fe92563 Compare January 2, 2026 14:29
@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch from 62eb393 to a06a205 Compare January 2, 2026 14:29
@vikin91 vikin91 force-pushed the piotr/ROX-32316-central-rate-limiter branch from fe92563 to 7ec45b6 Compare January 6, 2026 10:54
@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch 4 times, most recently from 967f10d to 69b8da0 Compare January 7, 2026 16:08
@vikin91 vikin91 force-pushed the piotr/ROX-32316-central-rate-limiter branch from fb2af82 to 06ca2c1 Compare January 8, 2026 16:57
@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch 3 times, most recently from 89f1c95 to 698dad6 Compare January 13, 2026 15:20
@vikin91 vikin91 changed the base branch from piotr/ROX-32316-central-rate-limiter to master January 14, 2026 09:50
@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch from 698dad6 to 09868a3 Compare January 14, 2026 09:50
@vikin91
Copy link
Contributor Author

vikin91 commented Jan 14, 2026

@sourcery-ai review

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In nodeInventoryHandlerImpl.Accepts, consider restricting SensorACK acceptance to the node-related message types you actually handle, so VM-related SensorACKs don’t appear to be owned by this handler and can be cleanly routed to the VM handler only.
  • In sendComplianceAckToCompliance, the debug log is emitted even when the select exits via StopRequested, which can be misleading; move the log into the send branch or add a separate log path for the stop case.
  • For IndexReportAcksReceived, you might prefer to derive the action label from SensorACK_Action.String() or constants instead of hardcoded "ACK"/"NACK" literals to avoid drift if new actions are added or names change.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `nodeInventoryHandlerImpl.Accepts`, consider restricting `SensorACK` acceptance to the node-related message types you actually handle, so VM-related `SensorACK`s don’t appear to be owned by this handler and can be cleanly routed to the VM handler only.
- In `sendComplianceAckToCompliance`, the debug log is emitted even when the select exits via `StopRequested`, which can be misleading; move the log into the send branch or add a separate log path for the stop case.
- For `IndexReportAcksReceived`, you might prefer to derive the `action` label from `SensorACK_Action.String()` or constants instead of hardcoded "ACK"/"NACK" literals to avoid drift if new actions are added or names change.

## Individual Comments

### Comment 1
<location> `sensor/common/compliance/node_inventory_handler_test.go:382-391` </location>
<code_context>
+	cases := map[string]struct {
</code_context>

<issue_to_address>
**suggestion (testing):** Consider extending SensorACK forwarding tests to cover NACKs and broadcast semantics

The table in `TestHandlerSensorACKsToCompliance` only covers ACKs for node-related message types. Please add: (1) NODE_INVENTORY and NODE_INDEX_REPORT cases with `SensorACK_NACK`, asserting `ComplianceACK_NACK` and `Reason` propagation; and (2) a case with an empty `ResourceId` to verify `Hostname` and `Broadcast` are set correctly for broadcast behavior. This will better cover the mapping logic for NACK and broadcast flows.

Suggested implementation:

```golang
func (s *NodeInventoryHandlerTestSuite) TestHandlerSensorACKsToCompliance() {
	cases := map[string]struct {
		sensorACK           *central.SensorACK
		shouldForward       bool // true if message should be forwarded to Compliance
		expectedAction      sensor.MsgToCompliance_ComplianceACK_Action
		expectedMessageType sensor.MsgToCompliance_ComplianceACK_MessageType
		// expectedReason verifies that NACK reasons are propagated from SensorACK to ComplianceACK.
		expectedReason string
		// expectedHostname and expectedBroadcast are used for broadcast semantics (empty ResourceId).
		expectedHostname  string
		expectedBroadcast bool
	}{

```

To fully implement your comment, the following additional edits are needed elsewhere in `TestHandlerSensorACKsToCompliance` (and possibly helper code in this file):

1. **Add NACK cases to the table:**
   - After the existing `"NODE_INVENTORY ACK should be forwarded"` case, add:
     - `"NODE_INVENTORY NACK should be forwarded with reason"`:
       - `sensorACK`: `Action: central.SensorACK_NACK`, `MessageType: central.SensorACK_NODE_INVENTORY`, non-empty `ResourceId` (e.g. `"node-1"`), and `Reason: "some failure reason"`.
       - `shouldForward: true`.
       - `expectedAction: sensor.MsgToCompliance_ComplianceACK_NACK`.
       - `expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INVENTORY`.
       - `expectedReason: "some failure reason"`.
     - `"NODE_INDEX_REPORT NACK should be forwarded with reason"`:
       - `sensorACK`: `Action: central.SensorACK_NACK`, `MessageType: central.SensorACK_NODE_INDEX_REPORT`, non-empty `ResourceId`, and `Reason: "index failure"`.
       - `shouldForward: true`.
       - `expectedAction: sensor.MsgToCompliance_ComplianceACK_NACK`.
       - `expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INDEX_REPORT`.
       - `expectedReason: "index failure"`.
   - Use the same enum constants for `expectedAction` and `expectedMessageType` as already used in the existing ACK case (e.g. `ComplianceACK_ACK`, `ComplianceACK_NODE_INVENTORY`, etc.).

2. **Add a broadcast case (empty `ResourceId`) to the table:**
   - Add a case like `"Broadcast NODE_INVENTORY ACK should set hostname and broadcast"`:
     - `sensorACK`: `Action: central.SensorACK_ACK`, `MessageType: central.SensorACK_NODE_INVENTORY`, `ResourceId: ""` (empty).
     - `shouldForward: true`.
     - `expectedAction: sensor.MsgToCompliance_ComplianceACK_ACK`.
     - `expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INVENTORY`.
     - Set `expectedBroadcast` to whatever the handler uses to indicate broadcast (likely `true`).
     - Set `expectedHostname` to the value the handler is expected to set for broadcast (this might be `""` or derived from test suite state; align with existing handler logic).

3. **Extend the assertion logic in the test loop:**
   - Inside the loop over `cases`, where you currently assert on forwarding behavior, `Action`, and `MessageType`, add:
     - If `tc.expectedReason != ""`, assert that `complianceACK.GetReason()` (or equivalent accessor) equals `tc.expectedReason`.
     - If `tc.expectedHostname != ""` or `tc.expectedBroadcast` is set, assert:
       - `complianceACK.GetHostname()` (or equivalent) equals `tc.expectedHostname`.
       - `complianceACK.GetBroadcast()` (or equivalent) equals `tc.expectedBroadcast`.
   - Ensure the assertions are gated so that existing cases that leave these fields at their zero values (`""` / `false`) do not fail.

4. **Align with existing field names and accessors:**
   - Use the actual field names and getters on the generated `ComplianceACK` proto (e.g. `Hostname`, `GetHostname`, `Broadcast`, `GetBroadcast`, `Reason`, `GetReason`) that are already used elsewhere in the file.
   - Ensure the enum values for `ComplianceACK_Action` and `ComplianceACK_MessageType` match what’s defined in the `sensor` proto package (reusing any already referenced constants in the test).

These additions will extend the table-driven test to cover:
- NACK mapping for `NODE_INVENTORY` and `NODE_INDEX_REPORT`, including reason propagation.
- Broadcast semantics for node-related ACKs when `ResourceId` is empty, including hostname and broadcast flag expectations.
</issue_to_address>

### Comment 2
<location> `sensor/common/virtualmachine/index/handler_impl_test.go:254-256` </location>
<code_context>
+	s.Assert().False(s.handler.Accepts(otherMsg), "Handler should not accept other message types")
+}
+
 func (s *virtualMachineHandlerSuite) TestProcessMessage() {
 	msg := &central.MsgToSensor{}
 	err := s.handler.ProcessMessage(context.Background(), msg)
</code_context>

<issue_to_address>
**suggestion (testing):** Expand VM handler ProcessMessage tests to cover ACK/NACK SensorACK behavior and metrics

With the new `SensorACK` handling for `VM_INDEX_REPORT`, this test (using an empty `MsgToSensor`) no longer validates the main behavior. Please update it to: (1) send a `SensorACK` with `MessageType=VM_INDEX_REPORT` and `Action=SensorACK_ACK` and assert `IndexReportAcksReceived` increments with the `"ACK"` label; (2) send a NACK and assert the `"NACK"` label increments and no error is returned; and (3) confirm non-VM `SensorACK` messages and non-ACK messages are ignored (no counter change, no error).

Suggested implementation:

```golang
func (s *virtualMachineHandlerSuite) TestProcessMessage() {
	ctx := context.Background()

	// Helper to read the current counter value for a given label.
	getIndexReportAcksReceived := func(label string) float64 {
		return testutil.ToFloat64(s.handler.indexReportAcksReceived.WithLabelValues(label))
	}

	initialAckCount := getIndexReportAcksReceived("ACK")
	initialNackCount := getIndexReportAcksReceived("NACK")

	// 1) ACK for VM_INDEX_REPORT should increment "ACK" label counter and not error.
	ackMsg := &central.MsgToSensor{
		Msg: &central.MsgToSensor_SensorAck{
			SensorAck: &central.SensorACK{
				MessageType: central.SensorMessageType_VM_INDEX_REPORT,
				Action:      central.SensorACK_ACK,
			},
		},
	}

	err := s.handler.ProcessMessage(ctx, ackMsg)
	s.Require().NoError(err, "ACK for VM_INDEX_REPORT should not return an error")
	s.Assert().Equal(initialAckCount+1, getIndexReportAcksReceived("ACK"), "ACK counter should increment for VM_INDEX_REPORT ACK")
	s.Assert().Equal(initialNackCount, getIndexReportAcksReceived("NACK"), "NACK counter should not change for VM_INDEX_REPORT ACK")

	// 2) NACK for VM_INDEX_REPORT should increment "NACK" label counter and not error.
	nackMsg := &central.MsgToSensor{
		Msg: &central.MsgToSensor_SensorAck{
			SensorAck: &central.SensorACK{
				MessageType: central.SensorMessageType_VM_INDEX_REPORT,
				Action:      central.SensorACK_NACK,
			},
		},
	}

	err = s.handler.ProcessMessage(ctx, nackMsg)
	s.Require().NoError(err, "NACK for VM_INDEX_REPORT should not return an error")
	s.Assert().Equal(initialAckCount+1, getIndexReportAcksReceived("ACK"), "ACK counter should not change for VM_INDEX_REPORT NACK")
	s.Assert().Equal(initialNackCount+1, getIndexReportAcksReceived("NACK"), "NACK counter should increment for VM_INDEX_REPORT NACK")

	// 3a) ACK for non-VM message type should be ignored (no counter change, no error).
	nonVmAckMsg := &central.MsgToSensor{
		Msg: &central.MsgToSensor_SensorAck{
			SensorAck: &central.SensorACK{
				MessageType: central.SensorMessageType_NODE_INDEX_REPORT,
				Action:      central.SensorACK_ACK,
			},
		},
	}

	currentAckCount := getIndexReportAcksReceived("ACK")
	currentNackCount := getIndexReportAcksReceived("NACK")

	err = s.handler.ProcessMessage(ctx, nonVmAckMsg)
	s.Require().NoError(err, "ACK for non-VM message type should not return an error")
	s.Assert().Equal(currentAckCount, getIndexReportAcksReceived("ACK"), "ACK counter should not change for non-VM SensorACK")
	s.Assert().Equal(currentNackCount, getIndexReportAcksReceived("NACK"), "NACK counter should not change for non-VM SensorACK")

	// 3b) Non-ACK action for VM_INDEX_REPORT should be ignored (no counter change, no error).
	nonAckVmMsg := &central.MsgToSensor{
		Msg: &central.MsgToSensor_SensorAck{
			SensorAck: &central.SensorACK{
				MessageType: central.SensorMessageType_VM_INDEX_REPORT,
				Action:      central.SensorACK_RETRY,
			},
		},
	}

	currentAckCount = getIndexReportAcksReceived("ACK")
	currentNackCount = getIndexReportAcksReceived("NACK")

	err = s.handler.ProcessMessage(ctx, nonAckVmMsg)
	s.Require().NoError(err, "Non-ACK action for VM_INDEX_REPORT should not return an error")
	s.Assert().Equal(currentAckCount, getIndexReportAcksReceived("ACK"), "ACK counter should not change for non-ACK VM_INDEX_REPORT")
	s.Assert().Equal(currentNackCount, getIndexReportAcksReceived("NACK"), "NACK counter should not change for non-ACK VM_INDEX_REPORT")

```

To make this compile and work as intended, you will likely need to:
1. Ensure `testutil` from Prometheus is imported at the top of `handler_impl_test.go`:
   - `github.com/prometheus/client_golang/prometheus/testutil`
2. Adjust the metric accessor to match the actual handler field:
   - If the metric field is named differently (e.g. `indexReportAcksReceived` vs `indexReportAcksReceivedCounter` or exposed via a getter on the suite), update `s.handler.indexReportAcksReceived` accordingly.
3. Confirm the proto enum names:
   - If the enum for VM index reports is not `central.SensorMessageType_VM_INDEX_REPORT`, update to the correct constant.
   - Likewise, if the ACK/NACK/RETRY enums differ from `central.SensorACK_ACK`, `central.SensorACK_NACK`, `central.SensorACK_RETRY`, adjust them to the actual generated names.
4. If there is already a helper in this test file for reading metrics (e.g. a generic `getCounterValue`), consider reusing it instead of inlining `getIndexReportAcksReceived`.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@vikin91 vikin91 marked this pull request as ready for review January 15, 2026 13:53
@vikin91 vikin91 requested a review from a team as a code owner January 15, 2026 13:53
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In processSensorACK, unknown or unset SensorACK.Action values fall through to the zero ComplianceACK_Action; consider explicitly handling the default/unknown case (e.g., log and return early) to avoid sending ambiguous ACKs to Compliance.
  • In TestHandlerSensorACKsToCompliance, s.Fail is called from within a goroutine, which is not goroutine-safe for testify; instead, send any error or failure flag over a channel and assert in the main test goroutine after received.Wait().
  • For the VM index ACK metrics, you rely on action.String() as the label value; if you intend the label to be strictly ACK/NACK, it may be safer to map the enum explicitly rather than relying on the generated String() representation.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `processSensorACK`, unknown or unset `SensorACK.Action` values fall through to the zero `ComplianceACK_Action`; consider explicitly handling the default/unknown case (e.g., log and return early) to avoid sending ambiguous ACKs to Compliance.
- In `TestHandlerSensorACKsToCompliance`, `s.Fail` is called from within a goroutine, which is not goroutine-safe for testify; instead, send any error or failure flag over a channel and assert in the main test goroutine after `received.Wait()`.
- For the VM index ACK metrics, you rely on `action.String()` as the label value; if you intend the label to be strictly `ACK`/`NACK`, it may be safer to map the enum explicitly rather than relying on the generated `String()` representation.

## Individual Comments

### Comment 1
<location> `sensor/common/compliance/node_inventory_handler_impl.go:147-156` </location>
<code_context>
+func (c *nodeInventoryHandlerImpl) processSensorACK(sensorAck *central.SensorACK) error {
</code_context>

<issue_to_address>
**issue:** Guard against unexpected/unknown SensorACK actions before forwarding to Compliance.

Right now only ACK and NACK are handled; any other value falls through to the zero value of `sensor.MsgToCompliance_ComplianceACK_Action`, which then gets forwarded silently. That could be misinterpreted as a valid action. Consider explicitly handling unknown actions (e.g., log and return) or mapping them to a dedicated enum value instead of relying on the zero value.
</issue_to_address>

### Comment 2
<location> `sensor/common/compliance/node_inventory_handler_test.go:423-400` </location>
<code_context>
+			expectedAction:      sensor.MsgToCompliance_ComplianceACK_ACK,
+			expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INDEX_REPORT,
+		},
+		"NODE_INDEX_REPORT NACK should be forwarded with reason": {
+			sensorACK: &central.SensorACK{
+				Action:      central.SensorACK_NACK,
+				MessageType: central.SensorACK_NODE_INDEX_REPORT,
+				ResourceId:  "node-2",
+				Reason:      "index failure",
+			},
+			shouldForward:       true,
+			expectedAction:      sensor.MsgToCompliance_ComplianceACK_NACK,
+			expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INDEX_REPORT,
+			expectedReason:      "index failure",
+		},
+		"Broadcast NODE_INVENTORY ACK should set broadcast": {
+			sensorACK: &central.SensorACK{
</code_context>

<issue_to_address>
**suggestion (testing):** Add a test case for SensorACK with an unexpected or unspecified action to document handler behavior

Current cases only cover `ACK` and `NACK`. Since `processSensorACK` uses a `switch` and unknown actions fall through to the zero-value `ComplianceACK_Action`, add a case where `Action` is default (`0`) or an unexpected enum. Assert the expected behavior (ignored vs forwarded with a specific mapped action) so behavior is explicit and guarded against changes when new `SensorACK` actions are added in the proto.

Suggested implementation:

```golang
		"NODE_INDEX_REPORT ACK should be forwarded": {
			sensorACK: &central.SensorACK{
				Action:      central.SensorACK_ACK,
				MessageType: central.SensorACK_NODE_INDEX_REPORT,
				ResourceId:  "node-2",
			},
			shouldForward:       true,
			expectedAction:      sensor.MsgToCompliance_ComplianceACK_ACK,
			expectedMessageType: sensor.MsgToCompliance_ComplianceACK_NODE_INDEX_REPORT,
		},
		"NODE_INDEX_REPORT with unknown/unspecified action should not be forwarded": {
			sensorACK: &central.SensorACK{
				// Zero-value / unspecified action should exercise the default branch
				// in processSensorACK's switch over central.SensorACK_Action.
				Action:      central.SensorACK_Action(0),
				MessageType: central.SensorACK_NODE_INDEX_REPORT,
				ResourceId:  "node-3",
			},
			shouldForward: false,
			// No expectedAction / expectedMessageType because nothing should be forwarded.
		},

```

If the test table struct enforces non-zero values for `expectedAction` or `expectedMessageType`, or the test code always asserts them, you may need to:
1. Update the test harness to only assert `expectedAction` / `expectedMessageType` when `shouldForward` is `true`.
2. Alternatively, set `expectedAction` to the zero-value `sensor.MsgToCompliance_ComplianceACK_Action(0)` and assert explicitly that nothing is written to the outbound channel when `shouldForward` is `false`.
</issue_to_address>

### Comment 3
<location> `sensor/common/virtualmachine/index/handler_impl_test.go:256-265` </location>
<code_context>
+func (s *virtualMachineHandlerSuite) TestProcessMessage() {
</code_context>

<issue_to_address>
**suggestion (testing):** Extend TestProcessMessage to cover messages without SensorAck to fully exercise the early-return path

Beyond non-VM `SensorAck` messages, `ProcessMessage` also returns early when `msg.GetSensorAck()` is `nil` (for other `MsgToSensor` variants). Please add coverage for this case—either in this test or a subtest—by sending a message with no `SensorAck` and asserting that metrics remain unchanged.

Suggested implementation:

```golang
	initialAck := getMetric(central.SensorACK_ACK.String())
	initialNack := getMetric(central.SensorACK_NACK.String())

	s.Run("message without SensorAck does not change metrics", func() {
		// Construct a MsgToSensor that has no SensorAck payload.
		msgWithoutAck := &central.MsgToSensor{
			Msg: &central.MsgToSensor_ClusterConfig{
				ClusterConfig: &central.ClusterConfig{},
			},
		}

		err := s.handler.ProcessMessage(ctx, msgWithoutAck)
		s.Require().NoError(err, "ProcessMessage should not error for messages without SensorAck")

		// Metrics should remain unchanged due to early return
		s.Equal(initialAck, getMetric(central.SensorACK_ACK.String()),
			"ACK metric should remain unchanged for messages without SensorAck")
		s.Equal(initialNack, getMetric(central.SensorACK_NACK.String()),
			"NACK metric should remain unchanged for messages without SensorAck")
	})

```

If `ProcessMessage` has a different signature (e.g., does not return an `error` or requires additional parameters), adjust the call accordingly and update the `s.Require().NoError(err, ...)` assertion to match the actual return type / behavior. Also ensure that `central.ClusterConfig` is the correct type for the `ClusterConfig` payload; if it has required fields, populate them appropriately in `msgWithoutAck`.
</issue_to_address>

### Comment 4
<location> `sensor/common/compliance/node_inventory_handler_impl.go:144` </location>
<code_context>
+	return nil
+}
+
+// processSensorACK handles the new generic SensorACK message from Central.
+// Only node-related ACK/NACK messages (NODE_INVENTORY, NODE_INDEX_REPORT) are forwarded to Compliance.
+// All other message types are ignored - they should be handled by their respective handlers.
</code_context>

<issue_to_address>
**issue (complexity):** Consider extracting shared ACK handling and send-and-log behavior into common helpers so the legacy and new ACK paths only differ in their small type-specific mappings.

You can reduce the new complexity by extracting the common ACK handling and send‑and‑log pattern into small helpers, while keeping the legacy/new behavior intact.

### 1. Unify ACK processing logic

Both `processSensorACK` and `processNodeInventoryACK`:

- log receipt
- map an action enum
- derive a message type enum
- forward to Compliance

You can factor out the common part into a generic helper that takes a small “adapter” struct for each ACK type.

Example sketch:

```go
type ackContext struct {
	resourceID   string
	actionStr    string
	messageTypeStr string

	// enums for Compliance
	compAction     sensor.MsgToCompliance_ComplianceACK_Action
	compMsgType    sensor.MsgToCompliance_ComplianceACK_MessageType

	// legacy metrics fields (optional)
	nodeName       string
	legacyAction   string
	legacyMsgType  string
	metricsEnabled bool

	reason         string
	logPrefix      string
}

func (c *nodeInventoryHandlerImpl) handleAck(ctx ackContext) {
	log.Debugf("%s type=%s, action=%s, resource_id=%s, reason=%s",
		ctx.logPrefix, ctx.messageTypeStr, ctx.actionStr, ctx.resourceID, ctx.reason)

	if ctx.metricsEnabled {
		metrics.ObserveNodeScanningAck(
			ctx.nodeName,
			ctx.legacyAction,
			ctx.legacyMsgType,
			metrics.AckOperationReceive,
			"",
			metrics.AckOriginSensor,
		)
	}

	c.sendComplianceAck(ctx.resourceID, ctx.compAction, ctx.compMsgType, ctx.reason)
}
```

Then each concrete handler just builds the context:

```go
func (c *nodeInventoryHandlerImpl) processSensorACK(sensorAck *central.SensorACK) error {
	var msgType sensor.MsgToCompliance_ComplianceACK_MessageType
	switch sensorAck.GetMessageType() {
	case central.SensorACK_NODE_INVENTORY:
		msgType = sensor.MsgToCompliance_ComplianceACK_NODE_INVENTORY
	case central.SensorACK_NODE_INDEX_REPORT:
		msgType = sensor.MsgToCompliance_ComplianceACK_NODE_INDEX_REPORT
	default:
		log.Debugf("Ignoring SensorACK message type %s - not handled by node inventory handler", sensorAck.GetMessageType())
		return nil
	}

	var action sensor.MsgToCompliance_ComplianceACK_Action
	switch sensorAck.GetAction() {
	case central.SensorACK_ACK:
		action = sensor.MsgToCompliance_ComplianceACK_ACK
	case central.SensorACK_NACK:
		action = sensor.MsgToCompliance_ComplianceACK_NACK
	}

	c.handleAck(ackContext{
		resourceID:      sensorAck.GetResourceId(),
		actionStr:       sensorAck.GetAction().String(),
		messageTypeStr:  sensorAck.GetMessageType().String(),
		compAction:      action,
		compMsgType:     msgType,
		reason:          sensorAck.GetReason(),
		logPrefix:       "Received SensorACK",
		metricsEnabled:  false, // or true if you want metrics for new path later
	})
	return nil
}
```

```go
func (c *nodeInventoryHandlerImpl) processNodeInventoryACK(ackMsg *central.NodeInventoryACK) error {
	var msgType sensor.MsgToCompliance_ComplianceACK_MessageType
	switch ackMsg.GetMessageType() {
	case central.NodeInventoryACK_NodeIndexer:
		msgType = sensor.MsgToCompliance_ComplianceACK_NODE_INDEX_REPORT
	// add other mappings if needed
	}

	var action sensor.MsgToCompliance_ComplianceACK_Action
	switch ackMsg.GetAction() {
	case central.NodeInventoryACK_ACK:
		action = sensor.MsgToCompliance_ComplianceACK_ACK
	case central.NodeInventoryACK_NACK:
		action = sensor.MsgToCompliance_ComplianceACK_NACK
	}

	c.handleAck(ackContext{
		resourceID:      ackMsg.GetNodeName(),
		actionStr:       ackMsg.GetAction().String(),
		messageTypeStr:  ackMsg.GetMessageType().String(),
		compAction:      action,
		compMsgType:     msgType,
		reason:          "", // legacy has no reason
		logPrefix:       "Received legacy node-scanning-ACK message",
		metricsEnabled:  true,
		nodeName:        ackMsg.GetNodeName(),
		legacyAction:    ackMsg.GetAction().String(),
		legacyMsgType:   ackMsg.GetMessageType().String(),
	})
	return nil
}
```

This keeps type‑specific mapping in the per‑ACK functions, but consolidates logging + “send to Compliance” behavior.

### 2. Unify the send‑and‑log functions

`sendAckToCompliance` and `sendComplianceAckToCompliance` share the same select + sent flag + logging pattern. You can extract the shared part and keep only the payload builder different.

Example:

```go
func (c *nodeInventoryHandlerImpl) sendComplianceAck(
	resourceID string,
	action sensor.MsgToCompliance_ComplianceACK_Action,
	messageType sensor.MsgToCompliance_ComplianceACK_MessageType,
	reason string,
) {
	msg := &sensor.MsgToCompliance{
		Msg: &sensor.MsgToCompliance_ComplianceAck{
			ComplianceAck: &sensor.MsgToCompliance_ComplianceACK{
				Action:      action,
				MessageType: messageType,
				ResourceId:  resourceID,
				Reason:      reason,
			},
		},
	}
	c.sendToComplianceWithLogging(resourceID, msg,
		"Sent ComplianceACK to Compliance",
		"Skipped sending ComplianceACK (stop requested)")
}

func (c *nodeInventoryHandlerImpl) sendToComplianceWithLogging(
	resourceID string,
	msg *sensor.MsgToCompliance,
	successMsg, skippedMsg string,
) {
	sent := false
	select {
	case <-c.stopper.Flow().StopRequested():
	case c.toCompliance <- common.MessageToComplianceWithAddress{
		Msg:       msg,
		Hostname:  resourceID,
		Broadcast: resourceID == "",
	}:
		sent = true
	}
	if sent {
		log.Debugf("%s: resource_id=%s", successMsg, resourceID)
	} else {
		log.Debugf("%s: resource_id=%s", skippedMsg, resourceID)
	}
}
```

You can then migrate existing `sendAckToCompliance` to use `sendToComplianceWithLogging` as well, so all ACK paths share identical send behavior and logging structure while differing only in the payload builder.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch 2 times, most recently from 0380835 to 4909e9e Compare January 23, 2026 08:25
Copy link
Contributor

@lvalerom lvalerom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting partial review (I need to check the unit tests).

LGTM. I just added minor nitpicks

@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch from 942aa19 to 8340b7e Compare January 26, 2026 09:18
Copy link
Contributor

@lvalerom lvalerom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a two small comments in the tests feel free to address them or resolve them 🙂

Add SensorACK message handling in Sensor to forward Central's ACK/NACK
responses to Compliance for node scanning workflows.

Changes:
- Node inventory handler advertises SensorACKSupport capability
- Node-related SensorACKs (NODE_INVENTORY, NODE_INDEX_REPORT) forwarded
  to Compliance as ComplianceACK messages
- VM handler accepts VM_INDEX_REPORT ACKs for logging/metrics only
  (VM relay uses unary gRPC, no async return path)
- Added IndexReportAcksReceived metric for VM ACK observability

AI-assisted: Initial implementation and tests generated by AI.
User-corrected: Test patterns to avoid flaky timeout-based goroutines.
@vikin91 vikin91 force-pushed the piotr/ROX-32316-sensor-new-ack branch from 8340b7e to 166d103 Compare January 27, 2026 08:22
@openshift-ci
Copy link

openshift-ci bot commented Jan 27, 2026

@vikin91: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-qa-e2e-tests d075ff9 link false /test gke-qa-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@vikin91
Copy link
Contributor Author

vikin91 commented Jan 27, 2026

The qa-e2e-test failure is unrelated "VulnScanWithGraphQLTest: Verify image info from CVE-2017-5638 in GraphQL"

@vikin91 vikin91 merged commit 2143a77 into master Jan 27, 2026
154 of 181 checks passed
@vikin91 vikin91 deleted the piotr/ROX-32316-sensor-new-ack branch January 27, 2026 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants