feat(benchmarking): [WIP] enable tests to run in dedicated environment or in docker by chatton · Pull Request #3157 · evstack/ev-node

chatton · 2026-03-12T15:12:45Z

Overview

…mark - Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite) - Move spamoor smoke test into suite as TestSpamoorSmoke - Split helpers into focused files: traces.go, output.go, metrics.go - Introduce resultWriter for defer-based benchmark JSON output - Export shared symbols from evm_test_common.go for cross-package use - Restructure CI to fan-out benchmark jobs and fan-in publishing - Run benchmarks on PRs only when benchmark-related files change

Resolve conflicts keeping the benchmark suite refactoring: - benchmark.yml: keep path filters and suite-style test command - evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg) - evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON (now in benchmark/output.go)

go test sets the working directory to the package under test, so the env var should be relative to test/e2e/benchmark/, not test/e2e/.

go test treats all arguments after an unknown flag (--evm-binary) as test binary args, so ./benchmark/ was never recognized as a package pattern.

go test sets the cwd to the package directory (test/e2e/benchmark/), so the binary path needs an extra parent traversal.

The benchmark package doesn't define the --binary flag that test-e2e passes. It has its own CI workflow so it doesn't need to run here.

…nfig collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes. Replace with direct header iteration over [startBlock, endBlock] and add Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and tx/block p50/p99. Optimize spamoor configuration for 100ms block time: - --slot-duration 100ms, --startup-delay 0 on daemon - throughput=50 per 100ms slot (500 tx/s per spammer) - max_pending=50000 to avoid 3s block poll backpressure - 5 staggered spammers with 50K txs each Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).

- Move startBlock capture after spammer creation to exclude warm-up - Replace 20s drain sleep with smart poll (waitForDrain) - Add deleteAllSpammers cleanup to handle stale spamoor DB entries - Lower trace sample rate to 10% to prevent Jaeger OOM

- make reth tag configurable via EV_RETH_TAG env var (default pr-140) - fix OTLP config: remove duplicate env vars, use http/protobuf protocol - use require.Eventually for host readiness polling - rename requireHTTP to requireHostUp - use non-fatal logging in resultWriter.flush deferred context - fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv) - rename loop variable to avoid shadowing testing.TB convention - add block/internal/executing/** to CI path trigger - remove unused require import from output.go

# Conflicts: # scripts/test.mk

# Conflicts: # test/e2e/benchmark/suite_test.go

move EV_RETH_TAG resolution and rpc connection limits into setupEnv so all benchmark tests share the same reth configuration. lower ERC20 spammer count from 5 to 2 to reduce resource contention on local hardware while keeping the loop for easy scaling on dedicated infra.

- add blockMetricsSummary with summarize(), log(), and entries() methods - add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead - add collectTraces() suite method to deduplicate trace collection pattern - add addEntries() convenience method on resultWriter - slim TestERC20Throughput from ~217 to ~119 lines - reuse collectTraces in TestSpamoorSmoke

Bumps tastora to pick up host network support in the spamoor builder. Spamoor in external mode now uses host networking so it can resolve the same hostnames as the host machine.

Record startTime when the provider is created and use it as the lower bound for trace queries, preventing spans from previous runs being included in the analysis.

Replace the Jaeger-compatible API with VictoriaTraces native LogsQL endpoint which streams all results without the 1000 trace limit.

…run-without-infra # Conflicts: # .github/workflows/benchmark.yml # test/e2e/benchmark/gasburner_test.go # test/e2e/benchmark/helpers.go # test/e2e/benchmark/spamoor_erc20_test.go # test/e2e/benchmark/suite_test.go # test/e2e/benchmark/traces.go # test/e2e/go.mod # test/e2e/go.sum

…metrics - Make gas_units_to_burn, max_wallets, num_spammers, throughput, and warmup_txs configurable via BENCH_* env vars - Add rethExecutionRate() for ev-reth GGas/s measurement - Add engineSpanEntries() for ProduceBlock/GetPayload/NewPayload timing - Switch local benchmarks from Jaeger to VictoriaTraces - Add setupExternalEnv for running against pre-deployed infrastructure - Update tastora to 2ee1b0a (victoriatraces support)

- benchConfig consolidates all BENCH_* env vars into a single struct constructed once per test via newBenchConfig(serviceName) - benchmarkResult collects all output metrics (block summary, overhead, GGas/s, engine span timing, seconds_per_gigagas, span averages) and produces entries via a single entries() call - Removes scattered envInt/envOrDefault calls from test files - Removes manual entry-by-entry result assembly from each test - Net reduction of ~129 lines across existing files

coderabbitai · 2026-03-12T15:13:00Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77b0c502-f8da-4578-8615-7fe10531f436

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch cian/enable-tests-to-run-without-infra

📝 Coding Plan for PR comments

Generate coding plan

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-03-12T15:13:03Z

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build	Format	Lint	Breaking	Updated (UTC)
`✅ passed`	`⏩ skipped`	`✅ passed`	`✅ passed`	Mar 12, 2026, 3:13 PM

claude · 2026-03-12T15:13:06Z

Claude finished @chatton's task in 3m 15s —— View job

Code Review

Gather context and understand the PR
Read changed files
Analyze code quality, architecture, error handling, security, performance, testing
Post comprehensive review

Summary

This PR introduces a dual-mode benchmark harness: tests can run against locally-spun-up Docker infrastructure (the existing path) or against pre-deployed external infrastructure (new, via BENCH_ETH_RPC_URL). It also extracts benchmark config into benchConfig, adds a benchmarkResult type for aggregated metrics, and introduces rich span/flowchart rendering from VictoriaTraces.

Overall the architecture is clean and the goal is well-executed. There are a few actionable issues and several nits worth addressing before merge.

Medium Issues

1. TestERC20Throughput ignores benchConfig env vars — hardcoded constants

test/e2e/benchmark/spamoor_erc20_test.go:19-24

The test creates a benchConfig and calls setupEnv(cfg), but uses its own hardcoded constants for numSpammers, countPerSpammer, waitTimeout, and erc20Config["throughput"]. This means BENCH_NUM_SPAMMERS, BENCH_COUNT_PER_SPAMMER, BENCH_THROUGHPUT, and BENCH_WAIT_TIMEOUT have no effect on this test — inconsistent with TestGasBurner which uses cfg.* throughout. Fix this →

2. waitForMetricTarget doesn't respect context cancellation during sleep

test/e2e/benchmark/helpers.go:454-466

for time.Now().Before(deadline) {
    v, err := poll()
    if err == nil && v >= target {
        return
    }
    time.Sleep(2 * time.Second)  // blocks for 2s ignoring context cancellation
}

If the test context is cancelled while sleeping (e.g., test timeout), this loop won't wake up for up to 2 additional seconds. Use a time.NewTimer in a select to be responsive to cancellation. Fix this →

3. Dead code: truncateID defined but never called

test/e2e/benchmark/flowchart.go:289-294

func truncateID(id string) string { ... }

This function is defined but has no callers. Remove it or use it. Fix this →

4. fetchAllSpans and fetchAllRichSpans duplicate HTTP boilerplate

test/e2e/benchmark/traces.go:151-207 and :209-262

Both functions construct the same URL, create the same request, handle the same status check, and configure identical scanner buffers. The only differences are the JSON struct and the output type. A shared fetchLogStream(ctx, url) (io.ReadCloser, error) or generic helper would eliminate ~40 lines of duplication and a potential future divergence. Fix this →

Minor Issues

5. extractHostName allocates a full map per span — hot path inefficiency

test/e2e/benchmark/traces.go:282-293

func extractHostName(line []byte) string {
    var raw map[string]string  // allocates for every span
    if err := json.Unmarshal(line, &raw); err != nil { ... }

This deserializes the entire JSON line (already parsed once into logsqlRichSpan) just to find one non-standard field. For large trace collections this is wasteful. Consider a bytes.Contains pre-check or map[string]json.RawMessage on the original parse.

6. Fragile hardcoded string comparison in result.go

test/e2e/benchmark/result.go:80

if e.Name == r.prefix+" - ProduceBlock avg" {

This reconstructs a derived string at call-site to compare against engineSpanEntries's output. If the suffix format in engineSpanEntries changes, this silently stops matching. Use the same constant (spanProduceBlock) and a shared format function.

7. time.Sleep(3 * time.Second) after spammer creation is fragile

test/e2e/benchmark/gasburner_test.go:55-56

A fixed 3-second sleep is brittle under CI load. The subsequent assertSpammersRunning call only confirms spammers haven't already failed — a slow environment could still report status > 0 while the spammer is still in its startup phase. Consider polling assertSpammersRunning with retries or relying solely on the warmup tx check (which already waits for a meaningful number of sends).

8. Stale "Jaeger" comment

test/e2e/benchmark/spamoor_smoke_test.go:67

// so that the expected tracing spans appear in Jaeger.

The project has migrated to VictoriaTraces. Fix this →

9. envInt silently swallows parse errors

test/e2e/benchmark/config.go:68-77

If a user sets BENCH_NUM_SPAMMERS=two by mistake, it silently falls back to the default with no warning. This is acceptable but t.Logf isn't available at package init time — at minimum document the silent-fallback behaviour in a comment.

10. WaitTimeout is not configurable via env var

test/e2e/benchmark/config.go:46

All other benchConfig fields are env-var-driven, but WaitTimeout is hardcoded to 10 minutes. In external/dedicated environments with different network characteristics, this may need tuning. Consider envDuration("BENCH_WAIT_TIMEOUT", 10*time.Minute).

Architecture Observations

Dual-mode dispatch is clean. The setupEnv → setupLocalEnv | setupExternalEnv dispatch in suite_test.go:60-65 is minimal and easy to follow. The contract for external mode (requires BENCH_PRIVATE_KEY + BENCH_TRACE_QUERY_URL) is clearly guarded with s.Require().NotEmpty.

traceProvider / richSpanCollector interface split is appropriate. The optional upgrade via type assertion (if rc, ok := e.traces.(richSpanCollector)) is idiomatic Go and keeps the interface minimal.

buildTree multiple-root heuristic. flowchart.go:187-212 picks the longest-duration root when multiple spans have no parent in the set. This is a reasonable heuristic but silently drops orphan spans. A comment explaining this trade-off would help future readers.

SetupTest vs SetupSuite for Docker client. Each test in SpamoorSuite calls tastoradocker.Setup — creating a new Docker client and network per test. If tests are run individually this is fine, but if the full suite runs this creates unnecessary setup overhead. Worth considering SetupSuite + TearDownSuite if the Docker client is safe to share.

codecov · 2026-03-12T15:17:28Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.20%. Comparing base (abb6b71) to head (87348de).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3157   +/-   ##
=======================================
  Coverage   60.20%   60.20%           
=======================================
  Files         115      115           
  Lines       11896    11896           
=======================================
  Hits         7162     7162           
  Misses       3923     3923           
  Partials      811      811

Flag	Coverage Δ
combined	`60.20% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

chatton added 30 commits February 25, 2026 12:16

fix: correct BENCH_JSON_OUTPUT path for spamoor benchmark

18fc15a

go test sets the working directory to the package under test, so the env var should be relative to test/e2e/benchmark/, not test/e2e/.

fix: place package pattern before test binary flags in benchmark CI

fccd9db

go test treats all arguments after an unknown flag (--evm-binary) as test binary args, so ./benchmark/ was never recognized as a package pattern.

fix: adjust evm-binary path for benchmark subpackage working directory

ae525ca

go test sets the cwd to the package directory (test/e2e/benchmark/), so the binary path needs an extra parent traversal.

wip: erc20 benchmark test

039eaf7

fix: exclude benchmark subpackage from make test-e2e

85c9d2d

The benchmark package doesn't define the --binary flag that test-e2e passes. It has its own CI workflow so it doesn't need to run here.

Merge branch 'main' into cian/bench-refactor

1c3b560

# Conflicts: # scripts/test.mk

chore: specify http

03b9239

chore: filter out benchmark tests from test-e2e

fe3ca23

Merge branch 'main' into cian/bench-refactor

8752fee

Merge branch 'cian/bench-refactor' into cian/erc20-benchmark

06f532b

# Conflicts: # test/e2e/benchmark/suite_test.go

Merge remote-tracking branch 'origin/main' into cian/erc20-benchmark

560974a

# Conflicts: # test/e2e/benchmark/suite_test.go

chore: collect all traces at once

26bb117

chore: self review

b88cae3

docs: add detailed documentation to benchmark helper methods

676e0d1

ci: add ERC20 throughput benchmark job

f4949a1

chore: remove span assertions

4c7b7e1

chore: adding gas burner test

6f56c80

feat: allowing for env vars to be set for external resources

40ff689

chore: adding private key as env var

4a7af54

chore: use host networking for spamoor in external mode

d2aeebc

Bumps tastora to pick up host network support in the spamoor builder. Spamoor in external mode now uses host networking so it can resolve the same hostnames as the host machine.

fix: pin tastora to correct commit with host network support

7a85f3b

chore: change service name

f42ffe0

chore: scope victoria traces queries to current test run

53d12e1

Record startTime when the provider is created and use it as the lower bound for trace queries, preventing spans from previous runs being included in the analysis.

chatton added 25 commits March 4, 2026 12:26

chore: increase gas burner spammers from 4 to 8

2756ce3

chore: revert gas burner spammers back to 4

8802838

chore: switch victoria traces to LogsQL API

2f27ab9

Replace the Jaeger-compatible API with VictoriaTraces native LogsQL endpoint which streams all results without the 1000 trace limit.

fix: use _stream filter syntax for LogsQL query

67c00d3

chore: tune gasburner config for 100M gas limit blocks

f7a8c5f

chore: match spamoor slot-duration to deployed 250ms block time

1308af4

chore: increase gasburner tx count for sustained load

3b4f5d1

chore: use 8 spammers for sustained gasburner load

c0de7a3

chore: revert to 4 spammers to avoid nonce collisions

f5d0eaa

chore: bump gasburner fees to 1000/500 gwei to clear stale txs

6c4ca40

chore: adding support for rich flowchart display

2d4ab9a

adding host name to span

9bd2381

chore: add re-broadcast

dffc429

chore: adding flowchart

7fac0a3

chore: bump fee

4445cde

chore: bump wallet amounts

64633d8

chore: increase throughput

f6100a2

chore: 4x tx volume

51754b8

chore: fewer tx, higher gas

f1e68d2

chore: remove amount

a3325dd

Bumped from 50/20 ETH to 500/200 ETH per wallet

aea69a2

chore: cleaning up flowchart dissplay and fixing UI link error

87348de

github-actions bot assigned chatton Mar 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(benchmarking): [WIP] enable tests to run in dedicated environment or in docker#3157

feat(benchmarking): [WIP] enable tests to run in dedicated environment or in docker#3157
chatton wants to merge 57 commits intomainfrom
cian/enable-tests-to-run-without-infra

chatton commented Mar 12, 2026

Uh oh!

coderabbitai bot commented Mar 12, 2026

Review skipped

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

claude bot commented Mar 12, 2026 •

edited

Loading

Uh oh!

codecov bot commented Mar 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

chatton commented Mar 12, 2026

Overview

Uh oh!

coderabbitai bot commented Mar 12, 2026

Review skipped

Uh oh!

github-actions bot commented Mar 12, 2026

Uh oh!

claude bot commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Summary

Medium Issues

Minor Issues

Architecture Observations

Uh oh!

codecov bot commented Mar 12, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Mar 12, 2026 •

edited

Loading