chore: mixed benchmark test by chatton · Pull Request #3223 · evstack/ev-node

chatton · 2026-04-01T08:28:36Z

Overview

Runs a benchmark test that deploys a variety of workloads

Summary by CodeRabbit

Chores
- Updated logging library dependency to latest patch version
Tests
- Added new E2E benchmark tests for spammer distribution scenarios
- Added mixed workload benchmark test for concurrent multi-type operations
- Updated benchmark configuration defaults for spammer count
- Enhanced metric polling validation for test reliability

emit full benchmark run metadata (config, tags, metrics, block range, spamoor stats) as JSON when BENCH_RESULT_OUTPUT is set. consumed by external matrix runner for table generation.

- Deduplicate overhead/reth-rate computation: move stats-based helpers to helpers.go, make span-based wrappers delegate to them - Fix sub-millisecond precision loss in engine span timings by using microsecond-based float division instead of integer truncation - Add spamoor stats to TestGasBurner for consistency with other tests

- Add MaxPending, Rebroadcast, BaseFee, TipFee to benchConfig - Fix ERC20 test hardcoding max_wallets=200 instead of using cfg - Replace all hardcoded spamoor params with cfg fields across tests

- Add resourceAttrs struct with host, OS, and service fields - Extract attributes from VictoriaTraces LogsQL span data via resourceAttrCollector interface - Include host metadata in structured benchmark result JSON

Move emitRunResult into a deferred closure in all three test functions. If the test fails after metrics are collected, the structured JSON is still written. If it fails before result data exists, the defer is a no-op.

Remove the 3-second sleep before requireSpammersRunning that caused all transactions to be mined before the measurement window started, leaving SteadyState at 0s. Also add deferred emitRunResult, configurable spamoor params, and spamoorStats collection to match the other benchmark tests.

… mode In external mode the sequencer reports spans as "ev-node" (not the test-specific name like "ev-node-erc20"), so trace queries returned zero spans. Store service names on env: local mode uses the test-specific name, external mode defaults to "ev-node"/"ev-reth" with BENCH_EVNODE_SERVICE_NAME/BENCH_EVRETH_SERVICE_NAME overrides.

fetchResourceAttrs only needs one span but was streaming the full result set from VictoriaTraces. Add limit=1 to the LogsQL query to avoid wasting bandwidth on long-lived instances with many spans.

coderabbitai · 2026-04-01T08:28:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 94ac0c94-ecf9-40b0-832d-15e1250ea540

📥 Commits

Reviewing files that changed from the base of the PR and between cc9f965 and 8410d69.

⛔ Files ignored due to path filters (3)

apps/grpc/go.sum is excluded by !**/*.sum
execution/evm/test/go.sum is excluded by !**/*.sum
go.sum is excluded by !**/*.sum

📒 Files selected for processing (6)

apps/grpc/go.mod
go.mod
test/e2e/benchmark/config.go
test/e2e/benchmark/distribute_spammers_test.go
test/e2e/benchmark/helpers.go
test/e2e/benchmark/mixed_workload_test.go

📝 Walkthrough

Walkthrough

Updates Go module dependencies to patch version, increases default benchmark concurrency parameters, introduces a new comprehensive MixedWorkload benchmark test with configurable workload distribution across four spammer types, adds dedicated tests for the distributeSpammers helper function, and enhances test failure handling to fatally terminate tests on timeout.

Changes

Cohort / File(s)	Summary
Dependency Updates `apps/grpc/go.mod`, `go.mod`	Updated `github.com/hashicorp/go-hclog` indirect dependency from v1.6.2 to v1.6.3.
Benchmark Configuration `test/e2e/benchmark/config.go`	Increased default `NumSpammers` parameter from 2 to 4 via `BENCH_NUM_SPAMMERS` environment variable fallback.
Distributor Tests `test/e2e/benchmark/distribute_spammers_test.go`	Added table-driven tests for `distributeSpammers` function validating proportional distribution, sum correctness, and panic behavior on invalid totals.
Mixed Workload Benchmark `test/e2e/benchmark/mixed_workload_test.go`	Added `TestMixedWorkload` benchmark with environment-driven workload mix percentages, spammer allocation across four types (ERC20, Uniswap, gas, storage), metrics polling for warmup/steady-state, block-level data collection, and comprehensive assertions.
Test Helper Enhancement `test/e2e/benchmark/helpers.go`	Modified `waitForMetricTarget` to fail tests fatally using `t.Fatalf` on timeout or context cancellation, replacing previous `t.Logf` behavior.

Sequence Diagram

sequenceDiagram
    participant Test as TestMixedWorkload
    participant Suite as Benchmark Suite
    participant Spammers as Spammer Nodes
    participant Metrics as Spamoor Metrics
    participant Chain as Blockchain

    Test->>Suite: Setup benchmark environment
    Test->>Suite: Remove stale spammers
    Test->>Test: Calculate spammer distribution<br/>(4 workload types)
    
    loop For each workload type
        Test->>Spammers: CreateSpammer (ERC20/Uniswap/Gas/Storage)
    end
    
    Test->>Test: Wait for spammers running
    
    Test->>Metrics: Poll spamoor_transactions_sent_total<br/>(deployment target)
    Metrics-->>Test: Metrics received
    
    Test->>Metrics: Poll spamoor_transactions_sent_total<br/>(warmup target)
    Metrics-->>Test: Warmup complete
    
    Test->>Chain: Capture start block header
    
    Test->>Metrics: Poll until steady-state<br/>transaction count reached
    Metrics-->>Test: Target reached
    
    Test->>Chain: Drain pending transactions
    Test->>Chain: Capture end block header
    
    Test->>Chain: Collect block-level metrics<br/>for range
    Chain-->>Test: Metrics data
    
    Test->>Suite: Gather execution traces
    Suite-->>Test: Trace data
    
    Test->>Metrics: Fetch final spamoor metrics
    Metrics-->>Test: Final metrics (sent/failed)
    
    Test->>Test: Assert results (duration > 0,<br/>sent > 0, failed == 0)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

feat(benchmarking): add mixed workload benchmark (TestMixedTraffic) #3186: The PR directly implements the proposed mixed-workload benchmark test architecture described in the issue, including the distributeSpammers distribution logic and comprehensive TestMixedWorkload test flow.

Possibly related PRs

feat(benchmarking): enable tests to run in dedicated environment or in docker #3157: Overlaps in modifying benchmark test suite files (test/config.go, helpers.go) and benchmark distribution logic (distributeSpammers), with related changes to waitForMetricTarget behavior.
feat: adding spammoor test to benchmark #3105: Both PRs enhance the spamoor/e2e benchmark test suite with new benchmark tests and modify test helpers for metrics collection and JSON output handling.
fix: state pressure benchmark test #3203: Both PRs modify benchmark configuration defaults in test/e2e/benchmark/config.go, specifically updating parameters in newBenchConfig.

Suggested reviewers

alpe
julienrbrt
tac0turtle

Poem

🐰 Hop along, the benchmarks grow,
Four spammers dance in a measured show,
Mixed workloads blend with proportioned grace,
Metrics collected at a rapid pace,
Tests ensure no spam shall fail,
A well-tuned system tells its tale! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The pull request description provides a clear overview of the changes ('Runs a benchmark test that deploys a variety of workloads') but lacks detailed context, background, rationale, and issue links required by the template.	Expand the description with background context, detailed rationale for the test, measurement methodology, and any related issue numbers to provide complete context for reviewers.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'chore: mixed benchmark test' directly and specifically describes the main change in the changeset—a new mixed workload benchmark test is the primary addition, aligning with the semantic commit convention and clearly summarizing the key contribution.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch cian/mixed-benchmark-test

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-04-01T08:28:49Z

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Apr 1, 2026, 9:35 AM

claude · 2026-04-01T08:28:53Z

Claude finished @chatton's task in 2m 13s —— View job

Code Review: `chore: mixed benchmark test`

Gather context and read changed files
Review mixed_workload_test.go
Review distribute_spammers_test.go
Review helpers.go and config.go changes
Post comprehensive review feedback

Overall this is well-structured code with good documentation. A few things worth addressing before merge:

Issues

1. Missing non-negative validation for BENCH_MIX_*_PCT env vars (mixed_workload_test.go:52)

The test validates that the four percentages sum to 100, but doesn't guard against negative values. A configuration like 150, -50, 50, 50 would satisfy the sum check but cause distributeSpammers to produce incorrect (or even negative) counts before the 1-minimum floor applies. Fix this →

// current
s.Require().Equal(100, pctSum, "BENCH_MIX_*_PCT values must sum to 100, got %d", pctSum)
// also add:
for i, p := range pcts {
    s.Require().GreaterOrEqual(p, 0, "BENCH_MIX_*_PCT[%d] must be non-negative, got %d", i, p)
}

2. distributeSpammers doesn't validate pcts for negative values (mixed_workload_test.go:212)

The panic guard correctly rejects total < 4, but negative percentages in pcts will silently produce wrong distributions (negative floor, compensated by the 1-minimum, but still incorrect). Consider adding a guard or documenting the assumption that all pcts values are ≥ 0.

3. requireSpammersRunning uses a hardcoded 5-second timeout (helpers.go:201)

In slow CI environments (particularly when deploying contracts for the first time), 5 seconds may be insufficient to confirm all spammers have transitioned to a running state.

}, 5*time.Second, 100*time.Millisecond)

This is especially relevant for the mixed workload, which starts 4+ spammers (vs 2 previously). Consider using cfg.WaitTimeout or at least a longer fixed value. Fix this →

4. envInt silently ignores invalid values (config.go:83)

If a user sets BENCH_MIX_ERC20_PCT=abc, it silently falls back to the default 40 without any log message. For a benchmark that's typically run with explicit configuration, a warning log would improve debuggability. Minor DX issue.

Observations

waitForMetricTarget: t.Logf → t.Fatalf (good fix) (helpers.go diff)

The change from non-fatal logging to t.Fatalf on timeout/context cancellation is the right call — without it, timed-out benchmarks would silently pass with incomplete data.

distributeSpammers algorithm and tests are solid (distribute_spammers_test.go)

The largest-remainder method is the correct approach for this kind of proportional allocation. Table-driven tests cover edge cases well (minimum total, uneven splits, large totals). The panic-on-total < 4 is tested explicitly.

config.go default bump from 2 → 4 spammers

Necessary for the mixed workload (1 per type minimum). Note this also changes the default for all other benchmark tests (ERC20, gas burner, etc.), which now get 4 spammers by default. Intended?

collectBlockMetrics makes O(N) sequential RPC calls (helpers.go:485)

One HeaderByNumber + one TransactionCount per block in the range. For a 10-minute benchmark run at 100ms block time that's ~6000 blocks / ~12000 RPC calls made serially. Not a correctness issue (benchmark teardown phase), but worth noting for very long runs.

Log message uses cfg.NumSpammers as total (mixed_workload_test.go:56)

t.Logf("mix distribution: erc20=%d, defi=%d, gasburner=%d, state=%d (total=%d)",
    counts[0], counts[1], counts[2], counts[3], cfg.NumSpammers)

Since distributeSpammers guarantees sum(counts) == total, this is equivalent, but using counts[0]+counts[1]+counts[2]+counts[3] would be more self-evident that we're logging the actual distributed sum.

codecov · 2026-04-01T08:34:06Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.45%. Comparing base (cc9f965) to head (8410d69).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3223      +/-   ##
==========================================
+ Coverage   61.43%   61.45%   +0.01%     
==========================================
  Files         120      120              
  Lines       12504    12504              
==========================================
+ Hits         7682     7684       +2     
+ Misses       3960     3959       -1     
+ Partials      862      861       -1

Flag	Coverage Δ
combined	`61.45% <ø> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

- fail test on waitForMetricTarget timeout instead of silently continuing - bump default NumSpammers from 2 to 4 to satisfy mixed workload minimum - replace time.Sleep with metric polling for contract deployment readiness - add panic guard to distributeSpammers for total < 4 - add table-driven unit tests for distributeSpammers

chatton added 16 commits March 24, 2026 11:34

feat(benchmarking): add structured result output via BENCH_RESULT_OUTPUT

23d0718

emit full benchmark run metadata (config, tags, metrics, block range, spamoor stats) as JSON when BENCH_RESULT_OUTPUT is set. consumed by external matrix runner for table generation.

Merge branch 'main' into cian/benchmark-results-artifact

3dc3e35

refactor: make spamoor config fully configurable via BENCH_* env vars

c0592ee

- Add MaxPending, Rebroadcast, BaseFee, TipFee to benchConfig - Fix ERC20 test hardcoding max_wallets=200 instead of using cfg - Replace all hardcoded spamoor params with cfg fields across tests

fix: defer emitRunResult so results are written even on test failure

c55bec1

Move emitRunResult into a deferred closure in all three test functions. If the test fails after metrics are collected, the structured JSON is still written. If it fails before result data exists, the defer is a no-op.

perf: use limit=1 for resource attribute trace queries

f2936c8

fetchResourceAttrs only needs one span but was streaming the full result set from VictoriaTraces. Add limit=1 to the LogsQL query to avoid wasting bandwidth on long-lived instances with many spans.

Merge branch 'main' into cian/benchmark-results-artifact

1028b10

Merge branch 'main' into cian/benchmark-results-artifact

41a4e40

docs: add missing doc comments to run_result.go functions

3fb75ee

Merge branch 'main' into cian/benchmark-results-artifact

f0eb3a7

Merge branch 'main' into cian/benchmark-results-artifact

f0f71bd

feat: adding mixed benchmarking test

09f1b19

chore: merge main

e5073ee

github-actions bot assigned chatton Apr 1, 2026

chatton marked this pull request as ready for review April 1, 2026 10:15

julienrbrt approved these changes Apr 1, 2026

View reviewed changes

chatton added this pull request to the merge queue Apr 1, 2026

Merged via the queue into main with commit 4840f50 Apr 1, 2026
40 checks passed

chatton deleted the cian/mixed-benchmark-test branch April 1, 2026 13:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: mixed benchmark test#3223

chore: mixed benchmark test#3223
chatton merged 17 commits intomainfrom
cian/mixed-benchmark-test

chatton commented Apr 1, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 1, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

github-actions bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

claude bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

codecov bot commented Apr 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chatton commented Apr 1, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

github-actions bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review: chore: mixed benchmark test

Issues

Observations

Uh oh!

codecov bot commented Apr 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chatton commented Apr 1, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 1, 2026 •

edited

Loading

github-actions bot commented Apr 1, 2026 •

edited

Loading

claude bot commented Apr 1, 2026 •

edited

Loading

Code Review: `chore: mixed benchmark test`

codecov bot commented Apr 1, 2026 •

edited

Loading