Skip to content

chore: mixed benchmark test#3223

Merged
chatton merged 17 commits intomainfrom
cian/mixed-benchmark-test
Apr 1, 2026
Merged

chore: mixed benchmark test#3223
chatton merged 17 commits intomainfrom
cian/mixed-benchmark-test

Conversation

@chatton
Copy link
Copy Markdown
Contributor

@chatton chatton commented Apr 1, 2026

Overview

Runs a benchmark test that deploys a variety of workloads

Summary by CodeRabbit

  • Chores

    • Updated logging library dependency to latest patch version
  • Tests

    • Added new E2E benchmark tests for spammer distribution scenarios
    • Added mixed workload benchmark test for concurrent multi-type operations
    • Updated benchmark configuration defaults for spammer count
    • Enhanced metric polling validation for test reliability

chatton added 16 commits March 24, 2026 11:34
emit full benchmark run metadata (config, tags, metrics, block range,
spamoor stats) as JSON when BENCH_RESULT_OUTPUT is set. consumed by
external matrix runner for table generation.
- Deduplicate overhead/reth-rate computation: move stats-based helpers
  to helpers.go, make span-based wrappers delegate to them
- Fix sub-millisecond precision loss in engine span timings by using
  microsecond-based float division instead of integer truncation
- Add spamoor stats to TestGasBurner for consistency with other tests
- Add MaxPending, Rebroadcast, BaseFee, TipFee to benchConfig
- Fix ERC20 test hardcoding max_wallets=200 instead of using cfg
- Replace all hardcoded spamoor params with cfg fields across tests
- Add resourceAttrs struct with host, OS, and service fields
- Extract attributes from VictoriaTraces LogsQL span data via
  resourceAttrCollector interface
- Include host metadata in structured benchmark result JSON
Move emitRunResult into a deferred closure in all three test functions.
If the test fails after metrics are collected, the structured JSON is
still written. If it fails before result data exists, the defer is a
no-op.
Remove the 3-second sleep before requireSpammersRunning that caused all
transactions to be mined before the measurement window started, leaving
SteadyState at 0s. Also add deferred emitRunResult, configurable spamoor
params, and spamoorStats collection to match the other benchmark tests.
… mode

In external mode the sequencer reports spans as "ev-node" (not the
test-specific name like "ev-node-erc20"), so trace queries returned
zero spans. Store service names on env: local mode uses the
test-specific name, external mode defaults to "ev-node"/"ev-reth"
with BENCH_EVNODE_SERVICE_NAME/BENCH_EVRETH_SERVICE_NAME overrides.
fetchResourceAttrs only needs one span but was streaming the full
result set from VictoriaTraces. Add limit=1 to the LogsQL query to
avoid wasting bandwidth on long-lived instances with many spans.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 1, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94ac0c94-ecf9-40b0-832d-15e1250ea540

📥 Commits

Reviewing files that changed from the base of the PR and between cc9f965 and 8410d69.

⛔ Files ignored due to path filters (3)
  • apps/grpc/go.sum is excluded by !**/*.sum
  • execution/evm/test/go.sum is excluded by !**/*.sum
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (6)
  • apps/grpc/go.mod
  • go.mod
  • test/e2e/benchmark/config.go
  • test/e2e/benchmark/distribute_spammers_test.go
  • test/e2e/benchmark/helpers.go
  • test/e2e/benchmark/mixed_workload_test.go

📝 Walkthrough

Walkthrough

Updates Go module dependencies to patch version, increases default benchmark concurrency parameters, introduces a new comprehensive MixedWorkload benchmark test with configurable workload distribution across four spammer types, adds dedicated tests for the distributeSpammers helper function, and enhances test failure handling to fatally terminate tests on timeout.

Changes

Cohort / File(s) Summary
Dependency Updates
apps/grpc/go.mod, go.mod
Updated github.com/hashicorp/go-hclog indirect dependency from v1.6.2 to v1.6.3.
Benchmark Configuration
test/e2e/benchmark/config.go
Increased default NumSpammers parameter from 2 to 4 via BENCH_NUM_SPAMMERS environment variable fallback.
Distributor Tests
test/e2e/benchmark/distribute_spammers_test.go
Added table-driven tests for distributeSpammers function validating proportional distribution, sum correctness, and panic behavior on invalid totals.
Mixed Workload Benchmark
test/e2e/benchmark/mixed_workload_test.go
Added TestMixedWorkload benchmark with environment-driven workload mix percentages, spammer allocation across four types (ERC20, Uniswap, gas, storage), metrics polling for warmup/steady-state, block-level data collection, and comprehensive assertions.
Test Helper Enhancement
test/e2e/benchmark/helpers.go
Modified waitForMetricTarget to fail tests fatally using t.Fatalf on timeout or context cancellation, replacing previous t.Logf behavior.

Sequence Diagram

sequenceDiagram
    participant Test as TestMixedWorkload
    participant Suite as Benchmark Suite
    participant Spammers as Spammer Nodes
    participant Metrics as Spamoor Metrics
    participant Chain as Blockchain

    Test->>Suite: Setup benchmark environment
    Test->>Suite: Remove stale spammers
    Test->>Test: Calculate spammer distribution<br/>(4 workload types)
    
    loop For each workload type
        Test->>Spammers: CreateSpammer (ERC20/Uniswap/Gas/Storage)
    end
    
    Test->>Test: Wait for spammers running
    
    Test->>Metrics: Poll spamoor_transactions_sent_total<br/>(deployment target)
    Metrics-->>Test: Metrics received
    
    Test->>Metrics: Poll spamoor_transactions_sent_total<br/>(warmup target)
    Metrics-->>Test: Warmup complete
    
    Test->>Chain: Capture start block header
    
    Test->>Metrics: Poll until steady-state<br/>transaction count reached
    Metrics-->>Test: Target reached
    
    Test->>Chain: Drain pending transactions
    Test->>Chain: Capture end block header
    
    Test->>Chain: Collect block-level metrics<br/>for range
    Chain-->>Test: Metrics data
    
    Test->>Suite: Gather execution traces
    Suite-->>Test: Trace data
    
    Test->>Metrics: Fetch final spamoor metrics
    Metrics-->>Test: Final metrics (sent/failed)
    
    Test->>Test: Assert results (duration > 0,<br/>sent > 0, failed == 0)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • alpe
  • julienrbrt
  • tac0turtle

Poem

🐰 Hop along, the benchmarks grow,
Four spammers dance in a measured show,
Mixed workloads blend with proportioned grace,
Metrics collected at a rapid pace,
Tests ensure no spam shall fail,
A well-tuned system tells its tale!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The pull request description provides a clear overview of the changes ('Runs a benchmark test that deploys a variety of workloads') but lacks detailed context, background, rationale, and issue links required by the template. Expand the description with background context, detailed rationale for the test, measurement methodology, and any related issue numbers to provide complete context for reviewers.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title 'chore: mixed benchmark test' directly and specifically describes the main change in the changeset—a new mixed workload benchmark test is the primary addition, aligning with the semantic commit convention and clearly summarizing the key contribution.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cian/mixed-benchmark-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed⏩ skipped✅ passed✅ passedApr 1, 2026, 9:35 AM

@claude
Copy link
Copy Markdown
Contributor

claude bot commented Apr 1, 2026

Claude finished @chatton's task in 2m 13s —— View job


Code Review: chore: mixed benchmark test

  • Gather context and read changed files
  • Review mixed_workload_test.go
  • Review distribute_spammers_test.go
  • Review helpers.go and config.go changes
  • Post comprehensive review feedback

Overall this is well-structured code with good documentation. A few things worth addressing before merge:


Issues

1. Missing non-negative validation for BENCH_MIX_*_PCT env vars (mixed_workload_test.go:52)

The test validates that the four percentages sum to 100, but doesn't guard against negative values. A configuration like 150, -50, 50, 50 would satisfy the sum check but cause distributeSpammers to produce incorrect (or even negative) counts before the 1-minimum floor applies. Fix this →

// current
s.Require().Equal(100, pctSum, "BENCH_MIX_*_PCT values must sum to 100, got %d", pctSum)
// also add:
for i, p := range pcts {
    s.Require().GreaterOrEqual(p, 0, "BENCH_MIX_*_PCT[%d] must be non-negative, got %d", i, p)
}

2. distributeSpammers doesn't validate pcts for negative values (mixed_workload_test.go:212)

The panic guard correctly rejects total < 4, but negative percentages in pcts will silently produce wrong distributions (negative floor, compensated by the 1-minimum, but still incorrect). Consider adding a guard or documenting the assumption that all pcts values are ≥ 0.

3. requireSpammersRunning uses a hardcoded 5-second timeout (helpers.go:201)

In slow CI environments (particularly when deploying contracts for the first time), 5 seconds may be insufficient to confirm all spammers have transitioned to a running state.

}, 5*time.Second, 100*time.Millisecond)

This is especially relevant for the mixed workload, which starts 4+ spammers (vs 2 previously). Consider using cfg.WaitTimeout or at least a longer fixed value. Fix this →

4. envInt silently ignores invalid values (config.go:83)

If a user sets BENCH_MIX_ERC20_PCT=abc, it silently falls back to the default 40 without any log message. For a benchmark that's typically run with explicit configuration, a warning log would improve debuggability. Minor DX issue.


Observations

waitForMetricTarget: t.Logft.Fatalf (good fix) (helpers.go diff)

The change from non-fatal logging to t.Fatalf on timeout/context cancellation is the right call — without it, timed-out benchmarks would silently pass with incomplete data.

distributeSpammers algorithm and tests are solid (distribute_spammers_test.go)

The largest-remainder method is the correct approach for this kind of proportional allocation. Table-driven tests cover edge cases well (minimum total, uneven splits, large totals). The panic-on-total < 4 is tested explicitly.

config.go default bump from 2 → 4 spammers

Necessary for the mixed workload (1 per type minimum). Note this also changes the default for all other benchmark tests (ERC20, gas burner, etc.), which now get 4 spammers by default. Intended?

collectBlockMetrics makes O(N) sequential RPC calls (helpers.go:485)

One HeaderByNumber + one TransactionCount per block in the range. For a 10-minute benchmark run at 100ms block time that's ~6000 blocks / ~12000 RPC calls made serially. Not a correctness issue (benchmark teardown phase), but worth noting for very long runs.

Log message uses cfg.NumSpammers as total (mixed_workload_test.go:56)

t.Logf("mix distribution: erc20=%d, defi=%d, gasburner=%d, state=%d (total=%d)",
    counts[0], counts[1], counts[2], counts[3], cfg.NumSpammers)

Since distributeSpammers guarantees sum(counts) == total, this is equivalent, but using counts[0]+counts[1]+counts[2]+counts[3] would be more self-evident that we're logging the actual distributed sum.


@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.45%. Comparing base (cc9f965) to head (8410d69).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3223      +/-   ##
==========================================
+ Coverage   61.43%   61.45%   +0.01%     
==========================================
  Files         120      120              
  Lines       12504    12504              
==========================================
+ Hits         7682     7684       +2     
+ Misses       3960     3959       -1     
+ Partials      862      861       -1     
Flag Coverage Δ
combined 61.45% <ø> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- fail test on waitForMetricTarget timeout instead of silently continuing
- bump default NumSpammers from 2 to 4 to satisfy mixed workload minimum
- replace time.Sleep with metric polling for contract deployment readiness
- add panic guard to distributeSpammers for total < 4
- add table-driven unit tests for distributeSpammers
@chatton chatton marked this pull request as ready for review April 1, 2026 10:15
@chatton chatton added this pull request to the merge queue Apr 1, 2026
Merged via the queue into main with commit 4840f50 Apr 1, 2026
40 checks passed
@chatton chatton deleted the cian/mixed-benchmark-test branch April 1, 2026 13:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants