feat(benchmarking): [WIP] enable tests to run in dedicated environment or in docker#3157
feat(benchmarking): [WIP] enable tests to run in dedicated environment or in docker#3157
Conversation
…mark - Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite) - Move spamoor smoke test into suite as TestSpamoorSmoke - Split helpers into focused files: traces.go, output.go, metrics.go - Introduce resultWriter for defer-based benchmark JSON output - Export shared symbols from evm_test_common.go for cross-package use - Restructure CI to fan-out benchmark jobs and fan-in publishing - Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring: - benchmark.yml: keep path filters and suite-style test command - evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg) - evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON (now in benchmark/output.go)
go test sets the working directory to the package under test, so the env var should be relative to test/e2e/benchmark/, not test/e2e/.
go test treats all arguments after an unknown flag (--evm-binary) as test binary args, so ./benchmark/ was never recognized as a package pattern.
go test sets the cwd to the package directory (test/e2e/benchmark/), so the binary path needs an extra parent traversal.
The benchmark package doesn't define the --binary flag that test-e2e passes. It has its own CI workflow so it doesn't need to run here.
…nfig collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes. Replace with direct header iteration over [startBlock, endBlock] and add Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and tx/block p50/p99. Optimize spamoor configuration for 100ms block time: - --slot-duration 100ms, --startup-delay 0 on daemon - throughput=50 per 100ms slot (500 tx/s per spammer) - max_pending=50000 to avoid 3s block poll backpressure - 5 staggered spammers with 50K txs each Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).
- Move startBlock capture after spammer creation to exclude warm-up - Replace 20s drain sleep with smart poll (waitForDrain) - Add deleteAllSpammers cleanup to handle stale spamoor DB entries - Lower trace sample rate to 10% to prevent Jaeger OOM
- make reth tag configurable via EV_RETH_TAG env var (default pr-140) - fix OTLP config: remove duplicate env vars, use http/protobuf protocol - use require.Eventually for host readiness polling - rename requireHTTP to requireHostUp - use non-fatal logging in resultWriter.flush deferred context - fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv) - rename loop variable to avoid shadowing testing.TB convention - add block/internal/executing/** to CI path trigger - remove unused require import from output.go
# Conflicts: # scripts/test.mk
# Conflicts: # test/e2e/benchmark/suite_test.go
# Conflicts: # test/e2e/benchmark/suite_test.go
move EV_RETH_TAG resolution and rpc connection limits into setupEnv so all benchmark tests share the same reth configuration. lower ERC20 spammer count from 5 to 2 to reduce resource contention on local hardware while keeping the loop for easy scaling on dedicated infra.
- add blockMetricsSummary with summarize(), log(), and entries() methods - add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead - add collectTraces() suite method to deduplicate trace collection pattern - add addEntries() convenience method on resultWriter - slim TestERC20Throughput from ~217 to ~119 lines - reuse collectTraces in TestSpamoorSmoke
Bumps tastora to pick up host network support in the spamoor builder. Spamoor in external mode now uses host networking so it can resolve the same hostnames as the host machine.
Record startTime when the provider is created and use it as the lower bound for trace queries, preventing spans from previous runs being included in the analysis.
Replace the Jaeger-compatible API with VictoriaTraces native LogsQL endpoint which streams all results without the 1000 trace limit.
…run-without-infra # Conflicts: # .github/workflows/benchmark.yml # test/e2e/benchmark/gasburner_test.go # test/e2e/benchmark/helpers.go # test/e2e/benchmark/spamoor_erc20_test.go # test/e2e/benchmark/suite_test.go # test/e2e/benchmark/traces.go # test/e2e/go.mod # test/e2e/go.sum
…metrics - Make gas_units_to_burn, max_wallets, num_spammers, throughput, and warmup_txs configurable via BENCH_* env vars - Add rethExecutionRate() for ev-reth GGas/s measurement - Add engineSpanEntries() for ProduceBlock/GetPayload/NewPayload timing - Switch local benchmarks from Jaeger to VictoriaTraces - Add setupExternalEnv for running against pre-deployed infrastructure - Update tastora to 2ee1b0a (victoriatraces support)
- benchConfig consolidates all BENCH_* env vars into a single struct constructed once per test via newBenchConfig(serviceName) - benchmarkResult collects all output metrics (block summary, overhead, GGas/s, engine span timing, seconds_per_gigagas, span averages) and produces entries via a single entries() call - Removes scattered envInt/envOrDefault calls from test files - Removes manual entry-by-entry result assembly from each test - Net reduction of ~129 lines across existing files
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan for PR comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
|
|
Claude finished @chatton's task in 3m 15s —— View job Code Review
SummaryThis PR introduces a dual-mode benchmark harness: tests can run against locally-spun-up Docker infrastructure (the existing path) or against pre-deployed external infrastructure (new, via Overall the architecture is clean and the goal is well-executed. There are a few actionable issues and several nits worth addressing before merge. Medium Issues1.
The test creates a 2.
for time.Now().Before(deadline) {
v, err := poll()
if err == nil && v >= target {
return
}
time.Sleep(2 * time.Second) // blocks for 2s ignoring context cancellation
}If the test context is cancelled while sleeping (e.g., test timeout), this loop won't wake up for up to 2 additional seconds. Use a 3. Dead code:
func truncateID(id string) string { ... }This function is defined but has no callers. Remove it or use it. Fix this → 4.
Both functions construct the same URL, create the same request, handle the same status check, and configure identical scanner buffers. The only differences are the JSON struct and the output type. A shared Minor Issues5.
func extractHostName(line []byte) string {
var raw map[string]string // allocates for every span
if err := json.Unmarshal(line, &raw); err != nil { ... }This deserializes the entire JSON line (already parsed once into 6. Fragile hardcoded string comparison in
if e.Name == r.prefix+" - ProduceBlock avg" {This reconstructs a derived string at call-site to compare against 7.
A fixed 3-second sleep is brittle under CI load. The subsequent 8. Stale "Jaeger" comment
// so that the expected tracing spans appear in Jaeger.The project has migrated to VictoriaTraces. Fix this → 9.
If a user sets 10.
All other Architecture ObservationsDual-mode dispatch is clean. The
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3157 +/- ##
=======================================
Coverage 60.20% 60.20%
=======================================
Files 115 115
Lines 11896 11896
=======================================
Hits 7162 7162
Misses 3923 3923
Partials 811 811
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Overview