Skip to content

ROX-12967: Replace fake NodeScan with real NodeInventory#3722

Merged
Maddosaurus merged 14 commits intomasterfrom
mm/ROX-12967-real-nodescan
Jan 3, 2023
Merged

ROX-12967: Replace fake NodeScan with real NodeInventory#3722
Maddosaurus merged 14 commits intomasterfrom
mm/ROX-12967-real-nodescan

Conversation

@Maddosaurus
Copy link
Contributor

@Maddosaurus Maddosaurus commented Nov 7, 2022

Description

This PR introduces the real NodeScan functionality that collects installed components on a Node.
Prerequisite for this PR to work correctly is the use of latest Scanner after stackrox/scanner#1026 is merged.

The scanning code requires two changes to the running Compliance image:

  • rpmdb binary, provided by the RPM package
  • Writable /tmp dir in Compliance (achieved through an emptyDir mount)

Checklist

  • Investigated and inspected CI test results
  • Unit test and regression tests added
  • Evaluated and added CHANGELOG entry if required
  • Determined and documented upgrade steps
  • Documented user facing changes (create PR based on openshift/openshift-docs and merge into rhacs-docs)

If any of these don't apply, please comment below.

Testing Performed

Deployed to an infra-provisioned OpenShift 4 cluster and observed the logs in the Compliance container to state the correct namespace (rhcos:4.11) as well as a list of installed components in its log message.

Example log line:

collection/nodescanv2: 2022/12/08 13:40:06.588628 nodescan.go:27: Info: Components found under /host: &{0xc000b7d440 [] rhcos:4.11 - [ ][ mozjs60:60.9.0-4.el8::x86_64 (...)

Steps to reproduce

  • Pull required Scanner version into stackrox/stackrox:
    • go get github.com/stackrox/scanner@mm/rox-12967-fix-rhcos-detection && go mod tidy
  • Pushed resulting roxctl (make docker-build-roxctl-image) and main (make image) images to a private quay repo
    • e.g. push roxctl and main as two tags to quay.io/mmeiding/playground
  • Set required env variables for deploy scripts
export MAIN_IMAGE_REPO=quay.io/mmeiding/playground
export MAIN_IMAGE_TAG=main-g87a5cdaf05-dirty

export CENTRAL_DB_IMAGE_TAG=3.73.x-176-g5e618a632d

export ROXCTL_IMAGE_REPO=quay.io/mmeiding/playground
export ROXCTL_IMAGE_TAG=roxctl-g87a5cdaf05-dirty
  • While KUBECONFIG is pointed to the target cluster, call ./deploy/openshift/deploy.sh

@Maddosaurus Maddosaurus force-pushed the mm/ROX-12967-real-nodescan branch from 166591b to 0debacd Compare November 7, 2022 15:30
@ghost
Copy link

ghost commented Nov 7, 2022

Images are ready for the commit at 17897d7.

To use with deploy scripts, first export MAIN_IMAGE_TAG=3.73.x-290-g17897d704e.

@vikin91 vikin91 force-pushed the mm/ROX-12967-real-nodescan branch from 7a0ae57 to d1669fd Compare November 16, 2022 16:09
@github-actions github-actions bot added the ci-all-qa-tests Tells CI to run all API tests (not just BAT). label Nov 16, 2022
@vikin91 vikin91 changed the title Mm/rox 12967 real nodescan ROX-12967: Replace fake NodeScan with real NodeInventory Nov 16, 2022
@vikin91
Copy link
Contributor

vikin91 commented Nov 16, 2022

Local result on Colima:

No certificates found in /usr/local/share/ca-certificates
No certificates found in /etc/pki/injected-ca-trust
main: 2022/11/16 17:11:04.873193 main.go:249: Info: Running StackRox Version: 3.73.x-19-gfbd0450942
main: 2022/11/16 17:11:04.879051 main.go:257: Info: Initialized Sensor gRPC stream connection
main: 2022/11/16 17:11:04.879204 main.go:274: Info: Node Rescan interval: 4h0m0s
time="2022-11-16T17:11:04Z" level=info msg="add files from directory" directory=root/buildinfo/ root=/host
time="2022-11-16T17:11:04Z" level=info msg="add files from directory" directory=etc/ root=/host
time="2022-11-16T17:11:04Z" level=info msg="add files from directory" directory=usr/lib/ root=/host
time="2022-11-16T17:11:04Z" level=info msg="add files from directory" directory=var/lib/ root=/host
main: 2022/11/16 17:11:04.977963 main.go:243: Info: Successfully connected to Sensor at sensor.stackrox.svc:443
collection/nodescanv2: 2022/11/16 17:11:51.358488 nodescan.go:23: Info: Finished node inventory /host scan
collection/nodescanv2: 2022/11/16 17:11:51.358664 nodescan.go:25: Error: Error scanning node /host inventory: failed to match filesMap at "var/lib/" (at "/host"): failed to read "/host/var/lib/docker/overlay2/d9133d536321a91d08b1f461014bbe80e59a0185a1060d4a0cfb94ed411afa55/merged/host/proc/32738/net": readdirent /host/var/lib/docker/overlay2/d9133d536321a91d08b1f461014bbe80e59a0185a1060d4a0cfb94ed411afa55/merged/host/proc/32738/net: invalid argument
main: 2022/11/16 17:11:51.358725 main.go:161: Error: error running scanNode: failed to match filesMap at "var/lib/" (at "/host"): failed to read "/host/var/lib/docker/overlay2/d9133d536321a91d08b1f461014bbe80e59a0185a1060d4a0cfb94ed411afa55/merged/host/proc/32738/net": readdirent /host/var/lib/docker/overlay2/d9133d536321a91d08b1f461014bbe80e59a0185a1060d4a0cfb94ed411afa55/merged/host/proc/32738/net: invalid argument

@RTann
Copy link
Contributor

RTann commented Nov 16, 2022

Local result on Colima:

No certificates found in /usr/local/share/ca-certificates
No certificates found in /etc/pki/injected-ca-trust
main: 2022/11/16 17:11:04.873193 main.go:249: Info: Running StackRox Version: 3.73.x-19-gfbd0450942
main: 2022/11/16 17:11:04.879051 main.go:257: Info: Initialized Sensor gRPC stream connection
main: 2022/11/16 17:11:04.879204 main.go:274: Info: Node Rescan interval: 4h0m0s
time="2022-11-16T17:11:04Z" level=info msg="add files from directory" directory=root/buildinfo/ root=/host
time="2022-11-16T17:11:04Z" level=info msg="add files from directory" directory=etc/ root=/host
time="2022-11-16T17:11:04Z" level=info msg="add files from directory" directory=usr/lib/ root=/host
time="2022-11-16T17:11:04Z" level=info msg="add files from directory" directory=var/lib/ root=/host
main: 2022/11/16 17:11:04.977963 main.go:243: Info: Successfully connected to Sensor at sensor.stackrox.svc:443
collection/nodescanv2: 2022/11/16 17:11:51.358488 nodescan.go:23: Info: Finished node inventory /host scan
collection/nodescanv2: 2022/11/16 17:11:51.358664 nodescan.go:25: Error: Error scanning node /host inventory: failed to match filesMap at "var/lib/" (at "/host"): failed to read "/host/var/lib/docker/overlay2/d9133d536321a91d08b1f461014bbe80e59a0185a1060d4a0cfb94ed411afa55/merged/host/proc/32738/net": readdirent /host/var/lib/docker/overlay2/d9133d536321a91d08b1f461014bbe80e59a0185a1060d4a0cfb94ed411afa55/merged/host/proc/32738/net: invalid argument
main: 2022/11/16 17:11:51.358725 main.go:161: Error: error running scanNode: failed to match filesMap at "var/lib/" (at "/host"): failed to read "/host/var/lib/docker/overlay2/d9133d536321a91d08b1f461014bbe80e59a0185a1060d4a0cfb94ed411afa55/merged/host/proc/32738/net": readdirent /host/var/lib/docker/overlay2/d9133d536321a91d08b1f461014bbe80e59a0185a1060d4a0cfb94ed411afa55/merged/host/proc/32738/net: invalid argument

interesting. why are we trying to read docker/overlay2 here?

@vikin91 vikin91 force-pushed the mm/ROX-12967-real-nodescan branch from 17ccb31 to d9d8759 Compare November 25, 2022 08:50
@vikin91 vikin91 requested review from a team as code owners November 25, 2022 08:50
@vikin91 vikin91 marked this pull request as draft November 25, 2022 08:50
@vikin91 vikin91 force-pushed the mm/ROX-12967-real-nodescan branch 2 times, most recently from 17ccb31 to 9831cb4 Compare November 25, 2022 09:10
@ivan-degtiarenko
Copy link
Contributor

@vikin91 is the review from the Merlin team expected?

@vikin91
Copy link
Contributor

vikin91 commented Nov 25, 2022

@vikin91 is the review from the Merlin team expected?

@ivan-degtiarenko sorry for the confusion, this PR is not yet ready for review. I marked it as draft to make it clear.

@janisz janisz removed the request for review from a team November 28, 2022 12:31
@Maddosaurus Maddosaurus force-pushed the mm/ROX-12967-real-nodescan branch from 9831cb4 to 9a5c795 Compare November 29, 2022 16:42
@Maddosaurus
Copy link
Contributor Author

@RTann @vikin91 I was also able to reproduce the docker layer message in a slightly different wording on an OS4 cluster as well:

main: 2022/11/30 10:01:08.843531 main.go:274: Info: Node Rescan interval: 4h0m0s
time="2022-11-30T10:01:08Z" level=info msg="add files from directory" directory=usr/lib/ root=/host
main: 2022/11/30 10:01:08.849471 main.go:238: Info: Sleeping for 0.55 seconds between attempts to connect to Sensor
main: 2022/11/30 10:01:09.406021 main.go:238: Info: Sleeping for 1.08 seconds between attempts to connect to Sensor
time="2022-11-30T10:01:09Z" level=info msg="add files from directory" directory=var/lib/ root=/host
main: 2022/11/30 10:01:10.536573 main.go:243: Info: Successfully connected to Sensor at sensor.stackrox.svc:443
collection/nodescanv2: 2022/11/30 10:01:23.965963 nodescan.go:23: Info: Finished node inventory /host scan
collection/nodescanv2: 2022/11/30 10:01:23.966044 nodescan.go:25: Error: Error scanning node /host inventory: failed to match filesMap at "var/lib/" (at "/host"): failed to read "/host/var/lib/containers/storage/overlay/69ce03231b78e7fc0173a8de0c2cc7476bf3d485e72e62b2f262a6d5284e883c/merged/host/proc/sys/fs/binfmt_misc": open /host/var/lib/containers/storage/overlay/69ce03231b78e7fc0173a8de0c2cc7476bf3d485e72e62b2f262a6d5284e883c/merged/host/proc/sys/fs/binfmt_misc: too many levels of symbolic links
collection/nodescanv2: 2022/11/30 10:01:23.966101 nodescan.go:27: Info: Components found under /host: <nil>
time="2022-11-30T10:01:23Z" level=info msg="add files from directory" directory=var/lib/ root=/
time="2022-11-30T10:01:23Z" level=info msg="add files from directory" directory=root/buildinfo/ root=/
time="2022-11-30T10:01:23Z" level=info msg="add files from directory" directory=etc/ root=/
time="2022-11-30T10:01:23Z" level=info msg="add files from directory" directory=usr/lib/ root=/
time="2022-11-30T10:01:23Z" level=error msg="could not create temporary folder for the rpm database" error="mkdir /tmp/rpm4230183219: read-only file system"
collection/nodescanv2: 2022/11/30 10:01:23.987265 nodescan.go:31: Info: Finished node inventory / scan
collection/nodescanv2: 2022/11/30 10:01:23.987299 nodescan.go:35: Info: Components found under /: <nil>
main: 2022/11/30 10:01:23.987317 main.go:161: Error: error running scanNode: failed to match filesMap at "var/lib/" (at "/host"): failed to read "/host/var/lib/containers/storage/overlay/69ce03231b78e7fc0173a8de0c2cc7476bf3d485e72e62b2f262a6d5284e883c/merged/host/proc/sys/fs/binfmt_misc": open /host/var/lib/containers/storage/overlay/69ce03231b78e7fc0173a8de0c2cc7476bf3d485e72e62b2f262a6d5284e883c/merged/host/proc/sys/fs/binfmt_misc: too many levels of symbolic links

@msugakov msugakov requested a review from a team November 30, 2022 12:13
@Maddosaurus Maddosaurus force-pushed the mm/ROX-12967-real-nodescan branch from 980bafe to cbcdfa6 Compare December 8, 2022 14:55
@Maddosaurus Maddosaurus marked this pull request as ready for review December 8, 2022 14:56
microdnf clean all && \
rm /tmp/snappy.rpm /tmp/postgres.rpm /tmp/postgres-libs.rpm RPM-GPG-KEY-CentOS-Official && \
# (Optional) Remove line below to keep package management utilities
rpm -e --nodeps $(rpm -qa curl '*rpm*' '*dnf*' '*libsolv*' '*hawkey*' 'yum*') && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this here? (I am not sure what this file does, that is why I am asking)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the spot where we remove all package managers from the main image.
I have only removed *RPM* from this line, as we want to keep RPM in the images for the rpmdb command Compliance needs.
IIUC, this file is the template from which the Dockerfile for main is generated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, now I recall. We remove it from the removal code, so in fact we are keeping the package. All good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to a lack of a follow-up // TODO(ROX-12345): comment, I have a question: where do we stand in this decision? Related Slack thread looks unfinished to me https://srox.slack.com/archives/C04213W844U/p1670426415333499

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Customers may ask about this (someone asked why Scanner still has it). Perhaps we should consider updating the docs or CHANGELOG? Definitely need to also make sure our default policy which allows Scanner to have RPM also allows our other images have it.

Perhaps a followup ticket, find a way to do this without rpm (eh. I wanted to remove it in Scanner, but it's the only official way to get rpmdb data), or perhaps use a separate Dockerfile for Compliance, so we minimize the number images with rpm installed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have created a follow up ticket (ROX-13934) so we don't forget about it and will push this as a TODO comment.

@Maddosaurus Maddosaurus requested review from a team and removed request for a team December 8, 2022 17:36
microdnf clean all && \
rm /tmp/snappy.rpm /tmp/postgres.rpm /tmp/postgres-libs.rpm RPM-GPG-KEY-CentOS-Official && \
# (Optional) Remove line below to keep package management utilities
rpm -e --nodeps $(rpm -qa curl '*rpm*' '*dnf*' '*libsolv*' '*hawkey*' 'yum*') && \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Customers may ask about this (someone asked why Scanner still has it). Perhaps we should consider updating the docs or CHANGELOG? Definitely need to also make sure our default policy which allows Scanner to have RPM also allows our other images have it.

Perhaps a followup ticket, find a way to do this without rpm (eh. I wanted to remove it in Scanner, but it's the only official way to get rpmdb data), or perhaps use a separate Dockerfile for Compliance, so we minimize the number images with rpm installed

var scanner nodescanv2.NodeScanner
if features.UseFakeNodeInventory.Enabled() {
log.Infof("Using FakeNodeScanner")
scanner = &nodescanv2.FakeNodeScanner{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a big deal, but something like scanner = nodescanv2.NewFakeNodeScanner() looks cleaner (similar for below). Actually something like nodescanner.NewFake() and nodescanner.New() where this is defined in something like pkg/nodescanner/v2 looks even cleaner IMO, but maybe that's for another time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Customers may ask about this

I have introduced a changelog entry in cb77bec informing of this change, and also created a follow up ticket ROX-13934 to track this TODO.

looks even cleaner IMO

I concur! The FakeNodeScanner was introduced as a temporary measure to unblock parallel workstreams on the RHCOS topic.
We aim to remove it (and the according Interface) with the MVP release. I really like your suggestion here, and am definitely using this on future occasions. But for now, I'd tend to leave it as is and remove this code with the MVP release.

)

// NodeInventorizer is the interface that defines the interface a scanner must implement
type NodeInventorizer interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come we changed the name from NodeScanner? I figure if we define a Scan function, NodeScanner makes sense. I don't think Inventorizer is a word, so I'm a bit skeptical using it to name this interface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We originally planned this as NodeScanning V2, but later changed it to NodeInventory, to improve separation, because the changes were rather confusing, as NodeScanning is already used in multiple places in the codebase.
I decided to reflect these changes in the implementation as well, to make it more clear we're talking about a NodeInventory and not the already existing NodeScan.

I concur that the name doesn't feel common, as it's a nonstandard word.
I'd be happy to consider alternatives, because I currently don't know how we could call it otherwise (except reverting back to NodeScanner, which would also be an option). WDYT?

Scan(nodeName string) (*storage.NodeInventory, error)
}

// NodeInventoryCollector is the implementation of NodeInventorizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it is "an" implementation. not necessarily "the" only one


// Scan scans the current node and returns the results as storage.NodeInventory object
func (n *NodeInventoryCollector) Scan(nodeName string) (*storage.NodeInventory, error) {
componentsHost, err := nodes.Analyze(nodeName, "/host/", false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: hostComponents?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe comment why we use false here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, since we have a log to say when we finish, can we have a log to state when we start?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the code, added a more detailed comment and update the loglevel

log.Errorf("Error scanning node /host inventory: %v", err)
return nil, err
}
log.Infof("Components found under /host: %v", componentsHost)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we plan on keeping this? If so, then maybe do a Debug log instead

return m, nil
}

func protoComponentsFromScanComponents(c *nodes.Components) *scannerV1.Components {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we discussed probably moving this inside the Scanner repo, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we did. I have created ROX-14029 to track this change for the future. Thank you for reminding me 👍

@RTann RTann requested a review from jvdm December 19, 2022 20:01
Copy link
Contributor

@msugakov msugakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Some final touches could be:

  • Please address my last two comments.
  • Style checks are failing for some reason.
  • I hope your testing works out fine.

Once you update the code and ping me for re-review, I'll most likely approve it.

@openshift-ci
Copy link

openshift-ci bot commented Dec 20, 2022

@Maddosaurus: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/gke-ui-e2e-tests d9d8759 link false /test gke-ui-e2e-tests
ci/prow/gke-postgres-ui-e2e-tests d9d8759 link false /test gke-postgres-ui-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@Maddosaurus
Copy link
Contributor Author

This looks good to me. Some final touches could be:

  • Please address my last two comments.

Done!

  • Style checks are failing for some reason.

It looks like style checks moved into GH Actions. Maybe a rebase fixes this.

  • I hope your testing works out fine.

Testing worked out fine with the latest changes from here as well as the combined Scanner PR (1026).
Instructions how to get a running cluster with these changes are here: stackrox/scanner#1026 (comment)

Once you update the code and ping me for re-review, I'll most likely approve it.

👍

vikin91 and others added 14 commits December 20, 2022 16:39
Experiment with calls to Analyze
…iance image, temporary switch to bleeding edge scanner PR branch
- Restructured code
- Moved nil handling into functions
- Add TODOs for tracking open points
- Add changelog for RPM
- Rename tmp volume in Helm chart
- Rework logging for nodeinventory.go
Co-authored-by: Misha Sugakov <537715+msugakov@users.noreply.github.com>
@Maddosaurus Maddosaurus force-pushed the mm/ROX-12967-real-nodescan branch from 7f222ff to 17897d7 Compare December 20, 2022 15:40
@Maddosaurus
Copy link
Contributor Author

@msugakov Let's see if the rebase fixed the style check. quickstyle succeeds locally, so I suspect this might be because we moved to GH Actions.

@Maddosaurus Maddosaurus requested a review from vikin91 January 3, 2023 15:21
v1rhelc := make([]*scannerV1.RHELComponent, 0, len(rc.Packages))
for _, rhelc := range rc.Packages {
v1rhelc = append(v1rhelc, &scannerV1.RHELComponent{
// TODO(ROX-13936): Find out if ID field is needed here
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: We need a unique ID here. For now, use loop index.
If no index is provided, it is set to 0. If IDs are not unique, results are overwritten in ScannerDB
Idea for long term: Hash(Name+Version) that yields int64.

More information: https://srox.slack.com/archives/C04213W844U/p1667825859123589

Copy link
Contributor

@vikin91 vikin91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as discussed in the sync-review.

The issue with ID in scannerV1.RHELComponent must be addressed (we need a unique int64 for a name:version combination), but I am fine doing it in ROX-13936 (the ticket would need to be slightly repurposed for this).

@Maddosaurus Maddosaurus merged commit 227a794 into master Jan 3, 2023
@Maddosaurus Maddosaurus deleted the mm/ROX-12967-real-nodescan branch January 3, 2023 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/helm ci-all-qa-tests Tells CI to run all API tests (not just BAT). team/maple

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants