ROX-33621: Refactor VM host probing and diagnostics#19458
ROX-33621: Refactor VM host probing and diagnostics#19458
Conversation
|
This change is part of the following stack: Change managed by git-spice. |
|
Skipping CI for Draft Pull Request. |
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The new diagnostics in
logIndexReportDiagnosticslog every repository and distribution unconditionally; consider gating the per-repo/dist lines (and possibly the filesystem diagnostics) behind the--debugflag or similar to avoid very verbose logs on large systems. - In
logFilesystemDiagnostics,YumRepoCounttreating a missing repo directory the same as an unreadable one results in a genericYum repo dir unreadablemessage; it may be more actionable to distinguishos.IsNotExistfrom permission errors in both the helper and the log message.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new diagnostics in `logIndexReportDiagnostics` log every repository and distribution unconditionally; consider gating the per-repo/dist lines (and possibly the filesystem diagnostics) behind the `--debug` flag or similar to avoid very verbose logs on large systems.
- In `logFilesystemDiagnostics`, `YumRepoCount` treating a missing repo directory the same as an unreadable one results in a generic `Yum repo dir unreadable` message; it may be more actionable to distinguish `os.IsNotExist` from permission errors in both the helper and the log message.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
|
Images are ready for the commit at 2f59bb3. To use with deploy scripts, first |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #19458 +/- ##
==========================================
- Coverage 49.56% 49.33% -0.23%
==========================================
Files 2711 2723 +12
Lines 204849 205270 +421
==========================================
- Hits 101524 101268 -256
- Misses 95744 96465 +721
+ Partials 7581 7537 -44
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@sourcery-ai recheck |
|
Sure! I'm generating a new review now. |
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- In
discoverDnfStatusFlags, consider reusinghostprobe.DetectDNFVersioninstead of re-deriving DNF version from history DB paths to keep version-detection logic centralized and avoid future drift. - In
discoverDnfCacheRepoDirPresence, the DNF5 detection logic relies onstrings.Contains(cachePath, "/libdnf5"); it would be more robust to infer DNF5 based on the specific cache root being iterated (e.g., the input path) rather than substring matching on the full path.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `discoverDnfStatusFlags`, consider reusing `hostprobe.DetectDNFVersion` instead of re-deriving DNF version from history DB paths to keep version-detection logic centralized and avoid future drift.
- In `discoverDnfCacheRepoDirPresence`, the DNF5 detection logic relies on `strings.Contains(cachePath, "/libdnf5")`; it would be more robust to infer DNF5 based on the specific cache root being iterated (e.g., the input path) rather than substring matching on the full path.
## Individual Comments
### Comment 1
<location path="compliance/virtualmachines/roxagent/index/index.go" line_range="78-85" />
<code_context>
+ indexerCfg.HostPath, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout)
+
+ // This may slow the indexing process down by 1-2 seconds, but the diagnostics are invaluable for debugging.
+ logFilesystemDiagnostics(indexerCfg.HostPath)
+ fetchRepo2CPEMappingForDiagnostics(ctx, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout, httpClient)
+
+ log.Info("Running node indexer...")
</code_context>
<issue_to_address>
**suggestion (performance):** Diagnostics and repo2cpe fetch run unconditionally and may add network/latency overhead for every index run
`logFilesystemDiagnostics` and `fetchRepo2CPEMappingForDiagnostics` now run on every `runIndexer` call, even when not debugging. The repo2cpe call adds a network round-trip and full-body read, which can materially slow indexing and add external coupling per run. Please gate these behind `cfg.Debug` (or a dedicated diagnostics flag), or at least make the repo2cpe fetch conditional, so normal runs avoid this overhead and only pay the cost when troubleshooting.
```suggestion
log.Infof("Indexer config: HostPath=%q, Repo2CPEMappingURL=%q, Timeout=%v",
indexerCfg.HostPath, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout)
if cfg.Debug {
// These diagnostics may slow the indexing process down by 1-2 seconds due to filesystem inspection
// and an extra repo2cpe network round-trip, so run them only in debug mode.
logFilesystemDiagnostics(indexerCfg.HostPath)
fetchRepo2CPEMappingForDiagnostics(ctx, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout, httpClient)
}
log.Info("Running node indexer...")
```
</issue_to_address>
### Comment 2
<location path="compliance/virtualmachines/roxagent/vsock/discovery.go" line_range="276" />
<code_context>
- return false, fmt.Errorf("unsupported OS detected: missing %s: %w", cachePath, err)
- }
- return false, fmt.Errorf("reading %s: %w", cachePath, err)
+func discoverDnfCacheRepoDirPresence(hostPath string, cacheDirPaths []string) (hasDir, hasRepo bool, err error) {
+ if len(cacheDirPaths) == 0 {
+ return false, false, errors.New("missing cache directories")
</code_context>
<issue_to_address>
**suggestion (bug_risk):** DNF5 detection via `strings.Contains(cachePath, "/libdnf5")` is brittle and may misclassify future layouts
This logic tightly couples behavior to the literal path string and may fail if cache roots change (different prefixes, symlinks) or if an unrelated path happens to contain `libdnf5`. Since the caller already passes explicit `cacheDirPaths`, consider deciding DNF5 vs non-DNF based on which configured root you’re iterating (e.g., compare against a known constant or index) rather than substring-matching the host-adjusted path, so the behavior remains stable if paths change.
Suggested implementation:
```golang
const dnf5CacheRoot = "/var/cache/libdnf5"
// - Tested against DNF4 defaults; DNF5 uses /var/cache/libdnf5:
// https://github.com/rpm-software-management/dnf5/blob/185eaef1e0ad663bdb827a2179ab1df574a27d88/include/libdnf5/conf/const.hpp
func discoverDnfCacheRepoDirPresence(hostPath string, cacheDirPaths []string) (hasDir, hasRepo bool, err error) {
```
```golang
for _, cacheDirPath := range cacheDirPaths {
// Decide DNF5 vs non-DNF based on the configured cache root, not the host-adjusted path.
// This avoids coupling behavior to the literal host path and keeps detection stable if
// host path prefixes or layouts change.
isDnf5Root := cacheDirPath == dnf5CacheRoot
cachePath := hostprobe.HostPathFor(hostPath, cacheDirPath)
```
```golang
if isDnf5Root {
```
</issue_to_address>
### Comment 3
<location path="compliance/virtualmachines/roxagent/index/diagnostics.go" line_range="23" />
<code_context>
+ zlog.Set(&l)
+}
+
+func logFilesystemDiagnostics(hostPath string) {
+ switch hostprobe.DetectDNFVersion(hostPath) {
+ case hostprobe.DNFVersion5:
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting small helpers for error handling, repeated logging patterns, and HTTP body checks to keep these diagnostics functions linear and less repetitive without changing behavior.
You can reduce the added complexity without changing behavior by extracting a few tiny helpers and reusing patterns.
### 1. Factor out Yum repo error classification
`logFilesystemDiagnostics` mixes error classification and logging; a small helper keeps the main flow linear:
```go
func logYumRepoError(err error) {
switch {
case os.IsNotExist(err):
log.Info("Yum repo dir not found")
case errors.Is(err, fs.ErrPermission):
log.Warnf("Yum repo dir is not readable: %v", err)
default:
log.Infof("Yum repo dir is unavailable: %v", err)
}
}
func logFilesystemDiagnostics(hostPath string) {
switch hostprobe.DetectDNFVersion(hostPath) {
case hostprobe.DNFVersion5:
log.Info("DNF history DB (v5) found")
case hostprobe.DNFVersion4:
log.Info("DNF history DB (v4) found")
default:
log.Warn("DNF history DB not found!")
}
hasEntitlement, err := hostprobe.HasEntitlementCertKeyPair(hostPath)
if err != nil {
log.Warnf("Entitlement certificates not found: %v", err)
} else if !hasEntitlement {
log.Warn("Entitlement certificates not found")
} else {
log.Info("Entitlement certificates found")
}
repoCount, err := hostprobe.YumRepoCount(hostPath)
if err != nil {
logYumRepoError(err)
return
}
if repoCount == 0 {
log.Info("Yum repo dir is present but empty (0 repositories)")
return
}
log.Infof("Yum repo dir contains %d repositories", repoCount)
}
```
### 2. Reduce repetition in `logIndexReportDiagnostics`
The repo/dist loops differ only in formatting. You can encapsulate the “limited iteration with truncation” pattern:
```go
func logLimited[T any, ID comparable](
name string,
items map[ID]T,
limit int,
debug bool,
formatter func(idx, total int, id ID, item T) string,
) {
total := len(items)
idx := 0
for id, item := range items {
idx++
log.Info(formatter(idx, total, id, item))
if !debug && idx == limit {
log.Info(" (truncated for brevity)")
break
}
}
}
func logIndexReportDiagnostics(report *v4.IndexReport, debug bool) {
contents := report.GetContents()
numPkgs := len(contents.GetPackages())
numRepos := len(contents.GetRepositories())
numDists := len(contents.GetDistributions())
numEnvs := len(contents.GetEnvironments())
log.Infof("Index report summary: packages=%d, repositories=%d, distributions=%d, environments=%d",
numPkgs, numRepos, numDists, numEnvs)
logLimited("repositories", contents.GetRepositories(), 10, debug,
func(idx, total int, id string, repo *v4.Repository) string {
return fmt.Sprintf(
"Repository (%d of %d) id=%q name=%q key=%q cpe=%q",
idx, total, id, repo.GetName(), repo.GetKey(), repo.GetCpe(),
)
})
logLimited("distributions", contents.GetDistributions(), 10, debug,
func(idx, total int, id string, dist *v4.Distribution) string {
return fmt.Sprintf(
"Distribution (%d of %d) id=%s name=%q version=%q cpe=%q did=%q",
idx, total, id, dist.GetName(), dist.GetVersion(), dist.GetCpe(), dist.GetDid(),
)
})
if numRepos == 0 {
log.Warn("Index report contains 0 repositories. Packages will be marked as UNSCANNED.")
}
}
```
If you prefer smaller steps, you can start with a non-generic helper just for the truncation logic and keep two separate formatters.
### 3. Simplify `fetchRepo2CPEMappingForDiagnostics` body handling
You only care about “non-empty vs empty/unavailable”, so you don't need `io.ReadAll` and `len(body)`. A single-byte read via `LimitedReader` reduces IO and branches:
```go
func fetchRepo2CPEMappingForDiagnostics(ctx context.Context, mappingURL string, timeout time.Duration, client *http.Client) {
if mappingURL == "" {
log.Warn("Repo2CPE mapping URL is empty")
return
}
log.Info("Attempting to fetch repo2cpe mapping for diagnostics...")
reqCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, mappingURL, nil)
if err != nil {
log.Warnf("Could not create repo2cpe mapping request for %q: %v", mappingURL, err)
return
}
resp, err := client.Do(req)
if err != nil {
log.Warnf("Could not download repo2cpe mapping from %q: %v", mappingURL, err)
return
}
defer resp.Body.Close()
var lr io.LimitedReader
lr.R = resp.Body
lr.N = 1
n, _ := io.Copy(io.Discard, &lr) // n is 0 if body is empty or not readable
log.Debugf("Repo2CPE mapping fetch: status=%d, hasBody=%t", resp.StatusCode, n > 0)
if resp.StatusCode >= http.StatusBadRequest || n == 0 {
log.Warn("Repo2CPE mapping is unavailable or empty")
} else if resp.StatusCode == http.StatusOK && n > 0 {
log.Info("Repo2CPE mapping downloaded successfully")
}
}
```
This preserves the logging semantics (success only on 200 + non-empty body, warn otherwise) while reducing control flow and memory usage.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| log.Infof("Indexer config: HostPath=%q, Repo2CPEMappingURL=%q, Timeout=%v", | ||
| indexerCfg.HostPath, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout) | ||
|
|
||
| // This may slow the indexing process down by 1-2 seconds, but the diagnostics are invaluable for debugging. | ||
| logFilesystemDiagnostics(indexerCfg.HostPath) | ||
| fetchRepo2CPEMappingForDiagnostics(ctx, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout, httpClient) | ||
|
|
||
| log.Info("Running node indexer...") |
There was a problem hiding this comment.
suggestion (performance): Diagnostics and repo2cpe fetch run unconditionally and may add network/latency overhead for every index run
logFilesystemDiagnostics and fetchRepo2CPEMappingForDiagnostics now run on every runIndexer call, even when not debugging. The repo2cpe call adds a network round-trip and full-body read, which can materially slow indexing and add external coupling per run. Please gate these behind cfg.Debug (or a dedicated diagnostics flag), or at least make the repo2cpe fetch conditional, so normal runs avoid this overhead and only pay the cost when troubleshooting.
| log.Infof("Indexer config: HostPath=%q, Repo2CPEMappingURL=%q, Timeout=%v", | |
| indexerCfg.HostPath, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout) | |
| // This may slow the indexing process down by 1-2 seconds, but the diagnostics are invaluable for debugging. | |
| logFilesystemDiagnostics(indexerCfg.HostPath) | |
| fetchRepo2CPEMappingForDiagnostics(ctx, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout, httpClient) | |
| log.Info("Running node indexer...") | |
| log.Infof("Indexer config: HostPath=%q, Repo2CPEMappingURL=%q, Timeout=%v", | |
| indexerCfg.HostPath, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout) | |
| if cfg.Debug { | |
| // These diagnostics may slow the indexing process down by 1-2 seconds due to filesystem inspection | |
| // and an extra repo2cpe network round-trip, so run them only in debug mode. | |
| logFilesystemDiagnostics(indexerCfg.HostPath) | |
| fetchRepo2CPEMappingForDiagnostics(ctx, indexerCfg.Repo2CPEMappingURL, indexerCfg.Timeout, httpClient) | |
| } | |
| log.Info("Running node indexer...") |
| return false, fmt.Errorf("unsupported OS detected: missing %s: %w", cachePath, err) | ||
| } | ||
| return false, fmt.Errorf("reading %s: %w", cachePath, err) | ||
| func discoverDnfCacheRepoDirPresence(hostPath string, cacheDirPaths []string) (hasDir, hasRepo bool, err error) { |
There was a problem hiding this comment.
suggestion (bug_risk): DNF5 detection via strings.Contains(cachePath, "/libdnf5") is brittle and may misclassify future layouts
This logic tightly couples behavior to the literal path string and may fail if cache roots change (different prefixes, symlinks) or if an unrelated path happens to contain libdnf5. Since the caller already passes explicit cacheDirPaths, consider deciding DNF5 vs non-DNF based on which configured root you’re iterating (e.g., compare against a known constant or index) rather than substring-matching the host-adjusted path, so the behavior remains stable if paths change.
Suggested implementation:
const dnf5CacheRoot = "/var/cache/libdnf5"
// - Tested against DNF4 defaults; DNF5 uses /var/cache/libdnf5:
// https://github.com/rpm-software-management/dnf5/blob/185eaef1e0ad663bdb827a2179ab1df574a27d88/include/libdnf5/conf/const.hpp
func discoverDnfCacheRepoDirPresence(hostPath string, cacheDirPaths []string) (hasDir, hasRepo bool, err error) { for _, cacheDirPath := range cacheDirPaths {
// Decide DNF5 vs non-DNF based on the configured cache root, not the host-adjusted path.
// This avoids coupling behavior to the literal host path and keeps detection stable if
// host path prefixes or layouts change.
isDnf5Root := cacheDirPath == dnf5CacheRoot
cachePath := hostprobe.HostPathFor(hostPath, cacheDirPath) if isDnf5Root {| zlog.Set(&l) | ||
| } | ||
|
|
||
| func logFilesystemDiagnostics(hostPath string) { |
There was a problem hiding this comment.
issue (complexity): Consider extracting small helpers for error handling, repeated logging patterns, and HTTP body checks to keep these diagnostics functions linear and less repetitive without changing behavior.
You can reduce the added complexity without changing behavior by extracting a few tiny helpers and reusing patterns.
1. Factor out Yum repo error classification
logFilesystemDiagnostics mixes error classification and logging; a small helper keeps the main flow linear:
func logYumRepoError(err error) {
switch {
case os.IsNotExist(err):
log.Info("Yum repo dir not found")
case errors.Is(err, fs.ErrPermission):
log.Warnf("Yum repo dir is not readable: %v", err)
default:
log.Infof("Yum repo dir is unavailable: %v", err)
}
}
func logFilesystemDiagnostics(hostPath string) {
switch hostprobe.DetectDNFVersion(hostPath) {
case hostprobe.DNFVersion5:
log.Info("DNF history DB (v5) found")
case hostprobe.DNFVersion4:
log.Info("DNF history DB (v4) found")
default:
log.Warn("DNF history DB not found!")
}
hasEntitlement, err := hostprobe.HasEntitlementCertKeyPair(hostPath)
if err != nil {
log.Warnf("Entitlement certificates not found: %v", err)
} else if !hasEntitlement {
log.Warn("Entitlement certificates not found")
} else {
log.Info("Entitlement certificates found")
}
repoCount, err := hostprobe.YumRepoCount(hostPath)
if err != nil {
logYumRepoError(err)
return
}
if repoCount == 0 {
log.Info("Yum repo dir is present but empty (0 repositories)")
return
}
log.Infof("Yum repo dir contains %d repositories", repoCount)
}2. Reduce repetition in logIndexReportDiagnostics
The repo/dist loops differ only in formatting. You can encapsulate the “limited iteration with truncation” pattern:
func logLimited[T any, ID comparable](
name string,
items map[ID]T,
limit int,
debug bool,
formatter func(idx, total int, id ID, item T) string,
) {
total := len(items)
idx := 0
for id, item := range items {
idx++
log.Info(formatter(idx, total, id, item))
if !debug && idx == limit {
log.Info(" (truncated for brevity)")
break
}
}
}
func logIndexReportDiagnostics(report *v4.IndexReport, debug bool) {
contents := report.GetContents()
numPkgs := len(contents.GetPackages())
numRepos := len(contents.GetRepositories())
numDists := len(contents.GetDistributions())
numEnvs := len(contents.GetEnvironments())
log.Infof("Index report summary: packages=%d, repositories=%d, distributions=%d, environments=%d",
numPkgs, numRepos, numDists, numEnvs)
logLimited("repositories", contents.GetRepositories(), 10, debug,
func(idx, total int, id string, repo *v4.Repository) string {
return fmt.Sprintf(
"Repository (%d of %d) id=%q name=%q key=%q cpe=%q",
idx, total, id, repo.GetName(), repo.GetKey(), repo.GetCpe(),
)
})
logLimited("distributions", contents.GetDistributions(), 10, debug,
func(idx, total int, id string, dist *v4.Distribution) string {
return fmt.Sprintf(
"Distribution (%d of %d) id=%s name=%q version=%q cpe=%q did=%q",
idx, total, id, dist.GetName(), dist.GetVersion(), dist.GetCpe(), dist.GetDid(),
)
})
if numRepos == 0 {
log.Warn("Index report contains 0 repositories. Packages will be marked as UNSCANNED.")
}
}If you prefer smaller steps, you can start with a non-generic helper just for the truncation logic and keep two separate formatters.
3. Simplify fetchRepo2CPEMappingForDiagnostics body handling
You only care about “non-empty vs empty/unavailable”, so you don't need io.ReadAll and len(body). A single-byte read via LimitedReader reduces IO and branches:
func fetchRepo2CPEMappingForDiagnostics(ctx context.Context, mappingURL string, timeout time.Duration, client *http.Client) {
if mappingURL == "" {
log.Warn("Repo2CPE mapping URL is empty")
return
}
log.Info("Attempting to fetch repo2cpe mapping for diagnostics...")
reqCtx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
req, err := http.NewRequestWithContext(reqCtx, http.MethodGet, mappingURL, nil)
if err != nil {
log.Warnf("Could not create repo2cpe mapping request for %q: %v", mappingURL, err)
return
}
resp, err := client.Do(req)
if err != nil {
log.Warnf("Could not download repo2cpe mapping from %q: %v", mappingURL, err)
return
}
defer resp.Body.Close()
var lr io.LimitedReader
lr.R = resp.Body
lr.N = 1
n, _ := io.Copy(io.Discard, &lr) // n is 0 if body is empty or not readable
log.Debugf("Repo2CPE mapping fetch: status=%d, hasBody=%t", resp.StatusCode, n > 0)
if resp.StatusCode >= http.StatusBadRequest || n == 0 {
log.Warn("Repo2CPE mapping is unavailable or empty")
} else if resp.StatusCode == http.StatusOK && n > 0 {
log.Info("Repo2CPE mapping downloaded successfully")
}
}This preserves the logging semantics (success only on 200 + non-empty body, warn otherwise) while reducing control flow and memory usage.
|
Hey @vikin91, I've posted a new review for you! |
Description
This PR refactors VM roxagent discovery/diagnostics to share host probing logic and reduces noisy/high-cardinality diagnostics output.
compliance/virtualmachines/roxagent/internal/hostprobeas a shared probe layer for host-path resolution, DNF version detection, entitlement pair detection, and.repocounting.roxagent/indexdiagnostics to use compact operational logs (DNF version summary, entitlement status, yum.repocount, and repo2cpe fetch diagnostics with clear failure context).--debug-gated debug logger setup for roxagent diagnostics.dnf_metadata_status) and moves granular DNF state to a separate low-cardinality metric (virtual_machine_discovered_data_dnf_status_total).AI/user contribution note:
hostprobepackage extraction, diagnostics rewiring, metric split, and initial/expanded test scaffolding).User-facing documentation
Testing and quality
Automated testing
How I validated my change
Sensor logs: