Skip to content

ROX-28315: Add UpdatedAt field to the NetFlows table#14483

Merged
lvalerom merged 9 commits intomasterfrom
lvm/rox-28315-add-updated-at-field-to-network-flows
Mar 17, 2025
Merged

ROX-28315: Add UpdatedAt field to the NetFlows table#14483
lvalerom merged 9 commits intomasterfrom
lvm/rox-28315-add-updated-at-field-to-network-flows

Conversation

@lvalerom
Copy link
Contributor

@lvalerom lvalerom commented Mar 4, 2025

Description

This PR aims to fix an edge case where we do not received a NetworkFlow with the LastSeenTimestamp set and the flow is received after the deployment was deleted.

A little bit of context:

  1. NetworkFlows events and Deployment/Pod events are handled in different pipelines, thus there's no guarantee that the Deployment/Pod event will be received before the NetworkFlows associated with it.
  2. NetworkFlows associated with a Deployment/Pod are deleted when the REMOVE event of the associated Deployment/Pod is handled.
  3. Due to (1) we can receive a NetworkFlow update associated with a deployment that was already deleted.
  4. Due to (1) we can receive a NetworkFlow update associated with a deployment that has yet to arrive.
  5. Due to (3) and (4) we cannot discard NetworkFlow events even if the deployment is not found.
  6. Central has a pruning mechanism to remove orphaned flows that relies on LastSeenTimestamp being set.

The bug:

Sometimes we might never receive the updated flows with LastSeenTimestamp set. This could happen because Collector/Sensor failed to process the LastSeenTimestamp (#14538 will improve on that) or simply the message was lost and never arrived to Central (e.g. Sensor restarted and lost all the information about the flows).

An example step by step of the bug:

  1. Central receives a Deployment event (CREATE/SYNC).
  2. A NetworkFlow associated with that deployment is handled and stored in centraldb.
  3. Central receives a Deployment event (REMOVE) for the same deployment.
  4. The deployment and all the NetworkFlows associated with it are deleted.
  5. Central receives a NetworkFlow event associated with the already deleted deployment without LastSeenTimestamp.
  6. Central stores the orphaned flow.
  7. Sensor fails to send the NetworkFlow event associated with the deployment with LastSeenTimestamp (Maybe Sensor restarted for some reason or collector failed to send the information).
  8. The orphaned flow will stay in centraldb forever.

The solution:

  • Add a new column UpdatedAt to the NetworkFlows table.
  • This new field will be handled with central and represents the last time central upserted the flow.
  • Update the pruning mechanism to use this UpdatedAt field instead of the LastSeenTimestamp.

User-facing documentation

  • CHANGELOG is updated OR update is not needed
  • documentation PR is created and is linked above OR is not needed

Testing and quality

  • the change is production ready: the change is GA or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

  • Updated postgres test.
  • Tested manually that the column is set.
  • Migration added and tested.
  • Manually tested the pruning mechanism with fake-workloads.
  • Manually tested the upgrade path
    • Deploy with older version
    • Upgrade to this branch's version
    • Observe the column is created and set: select * from network_flows_v2 where updatedat is null; should return zero
  • Manually tested the rollback path
    • Deploy with older version
    • Upgrade to this branch's version
    • Downgrade to older version
    • Observe new flows will set the UpdatedAt field to NULL: select * from network_flows_v2 where updatedat is null; should return some flows
    • Upgrade again to this branch's version
    • Observe there are no flows with UpdatedAt unset: select * from network_flows_v2 where updatedat is null; should return zero

@openshift-ci
Copy link

openshift-ci bot commented Mar 4, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rhacs-bot
Copy link
Contributor

rhacs-bot commented Mar 4, 2025

Images are ready for the commit at 8ddb721.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.8.x-203-g8ddb7214df.

@codecov
Copy link

codecov bot commented Mar 4, 2025

Codecov Report

Attention: Patch coverage is 72.85714% with 19 lines in your changes missing coverage. Please review.

Project coverage is 49.29%. Comparing base (7413310) to head (8ddb721).
Report is 41 commits behind head on master.

Files with missing lines Patch % Lines
...d_updated_at_to_network_flows_v2/migration_impl.go 56.66% 9 Missing and 4 partials ⚠️
central/graphql/resolvers/generated.go 25.00% 3 Missing ⚠️
...ph/flow/datastore/internal/store/postgres/store.go 85.71% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #14483       +/-   ##
===========================================
- Coverage   49.45%   49.29%    -0.16%     
===========================================
  Files         723     2532     +1809     
  Lines       72094   185086   +112992     
===========================================
+ Hits        35655    91245    +55590     
- Misses      33198    86610    +53412     
- Partials     3241     7231     +3990     
Flag Coverage Δ
go-unit-tests 49.29% <72.85%> (-0.16%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@lvalerom lvalerom force-pushed the lvm/rox-28315-add-updated-at-field-to-network-flows branch 7 times, most recently from 4b8ae34 to bb4e156 Compare March 10, 2025 12:41
@lvalerom lvalerom force-pushed the lvm/rox-28315-add-updated-at-field-to-network-flows branch from bb4e156 to d32ab30 Compare March 10, 2025 12:45
@lvalerom lvalerom force-pushed the lvm/rox-28315-add-updated-at-field-to-network-flows branch from d32ab30 to c49386c Compare March 10, 2025 14:19
@lvalerom lvalerom marked this pull request as ready for review March 10, 2025 15:45
@lvalerom lvalerom requested a review from a team as a code owner March 10, 2025 15:45
@lvalerom lvalerom requested a review from dashrews78 March 10, 2025 15:46
@lvalerom lvalerom changed the title ROX-28315: wip ROX-28315: Add UpdatedAt field to the NetFlows table Mar 10, 2025
@lvalerom lvalerom requested review from a team and dashrews78 March 12, 2025 08:52
@janisz janisz requested a review from a team March 13, 2025 10:11
Copy link
Contributor

@dashrews78 dashrews78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@janisz janisz self-requested a review March 13, 2025 10:17
Copy link
Contributor

@mtodor mtodor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Added a few comments related to some edge cases.

@openshift-ci
Copy link

openshift-ci bot commented Mar 14, 2025

@lvalerom: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/ocp-4-12-ui-e2e-tests 8ddb721 link false /test ocp-4-12-ui-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@lvalerom lvalerom merged commit 44acdf8 into master Mar 17, 2025
99 of 100 checks passed
@lvalerom lvalerom deleted the lvm/rox-28315-add-updated-at-field-to-network-flows branch March 17, 2025 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants