chore: Better check for stored meta not stale#3221
Conversation
|
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
|
|
Claude finished @julienrbrt's task in 2m 12s —— View job Code Review
Overall AssessmentThis is a solid improvement. The stale-in-progress check prevents resuming an execution that was built for different block inputs (different timestamp or transactions), which is the right defense against the failover/crash-recovery scenario. The general structure and logic are correct. A few issues worth addressing:
|
| Area | Severity | Issue |
|---|---|---|
hashTxs collision |
Major | Different tx-lists can produce identical hashes |
Named return err reuse |
Minor | Fragile, should use local var |
| Test map ordering | Minor | Non-deterministic subtest order |
| Mock error message | Nit | Misleading error on HeaderByNumber |
| CHANGELOG section | Minor | Should be under Fixed, not Changes |
CHANGELOG stray * |
Nit | Orphan bullet at line 333 |
The hashTxs collision is the one issue worth fixing before merge — everything else is quality-of-life. The core logic and test coverage are correct.
📝 WalkthroughWalkthroughReconciliation now validates stored ExecStageStarted metadata against the requested execution timestamp and transaction set before attempting payload retrieval. A tx-hash helper was introduced, logging/control flow for stale executions adjusted, changelog formatting updated, and a unit test added covering matching and mismatching scenarios. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3221 +/- ##
=======================================
Coverage 61.43% 61.43%
=======================================
Files 120 120
Lines 12504 12504
=======================================
Hits 7682 7682
Misses 3960 3960
Partials 862 862
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
lgtm! this makes #3172 more readable as well so great
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
execution/evm/execution_reconcile_test.go (1)
19-57: Use a slice-backed table so the subtests run in a stable order.
rangeover a map randomizes the case order, which makes failures a bit noisier to reproduce. A slice keeps this table-driven without losing determinism.As per coding guidelines, "Use table-driven tests in Go unit tests" and "Ensure tests are deterministic in Go."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@execution/evm/execution_reconcile_test.go` around lines 19 - 57, The test currently builds specs as a map (variable specs) and ranges over it, which randomizes subtest order; change specs to a slice of structs that include a name field (e.g., type case struct { name string; execMetaTimestamp int64; execMetaTxs [][]byte; requestedTxs [][]byte; requestedTime time.Time; expectFound bool; expectPayloadID bool; expectGetPayloads int }) and populate it in the desired stable order, then iterate over that slice (for _, spec := range cases { t.Run(spec.name, func(t *testing.T) { ... }) }) instead of ranging the map so subtests run deterministically while keeping the same assertions and using the same symbols (specs -> cases, t.Run).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Line 333: Remove the orphan list marker '*' from the links footer in
CHANGELOG.md (the stray asterisk that renders as an empty bullet); open
CHANGELOG.md, locate the lone '*' in the footer area and delete that line so the
list renders correctly without the empty bullet.
- Around line 12-14: Move the entry "chore: Better check for stored meta not
stale" out of the "### Added" section into the "### Fixed" section and rewrite
it in user-facing language that explains the bug and the user impact (e.g.,
"Fixed stale execution-resume behavior by improving validation of stored
metadata to prevent resuming with outdated state"). Update the changelog entry
to remove commit-style phrasing and use a concise, user-first sentence
describing the fix and its effect.
In `@execution/evm/execution.go`:
- Around line 758-785: The resume path for ExecStageStarted currently only
checks timestamp and tx hash (execMeta.Timestamp, execMeta.TxHash via
hashTxs(txs)) before returning the stored PayloadID (pid); you must also
validate that the current parent hash matches execMeta.ParentHash to avoid
resuming on a changed parent. Retrieve the current parent hash for the block
(height-1) from the chain/headers API available in this package and compare it
to execMeta.ParentHash; if they differ, treat the payload as stale (log and fall
through to fresh execution) rather than returning pid even if
c.engineClient.GetPayload(ctx, pid) succeeds. Ensure you still keep the existing
GetPayload existence check (c.engineClient.GetPayload) but only return pid when
both the payload exists and the parent hash matches execMeta.ParentHash.
- Around line 1048-1059: The hashTxs function currently concatenates raw tx
bytes which can produce ambiguous collisions; modify hashTxs to make the tx-list
encoding unambiguous by prefixing each tx with its length (or hashing each tx
separately with a domain separator) before writing into the sha256 hasher: e.g.,
for each tx in hashTxs write the tx length as a fixed-size varint/uint64 or a
clear separator and then write the tx bytes (or write sha256(tx) with a
separator), then return h.Sum(nil) so the digest uniquely represents the ordered
list.
---
Nitpick comments:
In `@execution/evm/execution_reconcile_test.go`:
- Around line 19-57: The test currently builds specs as a map (variable specs)
and ranges over it, which randomizes subtest order; change specs to a slice of
structs that include a name field (e.g., type case struct { name string;
execMetaTimestamp int64; execMetaTxs [][]byte; requestedTxs [][]byte;
requestedTime time.Time; expectFound bool; expectPayloadID bool;
expectGetPayloads int }) and populate it in the desired stable order, then
iterate over that slice (for _, spec := range cases { t.Run(spec.name, func(t
*testing.T) { ... }) }) instead of ranging the map so subtests run
deterministically while keeping the same assertions and using the same symbols
(specs -> cases, t.Run).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e12b9912-7b63-430f-b6da-57e2d9c96b2c
📒 Files selected for processing (3)
CHANGELOG.mdexecution/evm/execution.goexecution/evm/execution_reconcile_test.go
CHANGELOG.md
Outdated
| ### Added | ||
|
|
||
| * chore: Better check for stored meta not stale [#3221](https://github.com/evstack/ev-node/pull/3221) |
There was a problem hiding this comment.
Move this entry under Fixed and rewrite it user-first.
This change hardens stale execution-resume behavior, so it reads like a bug fix rather than a new feature, and chore: Better check for stored meta not stale is still commit-title/internal wording instead of changelog wording.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CHANGELOG.md` around lines 12 - 14, Move the entry "chore: Better check for
stored meta not stale" out of the "### Added" section into the "### Fixed"
section and rewrite it in user-facing language that explains the bug and the
user impact (e.g., "Fixed stale execution-resume behavior by improving
validation of stored metadata to prevent resuming with outdated state"). Update
the changelog entry to remove commit-style phrasing and use a concise,
user-first sentence describing the fix and its effect.
| <!-- Links --> | ||
|
|
||
| - | ||
| * |
There was a problem hiding this comment.
Remove the orphan list marker in the links footer.
Line 333 renders as an empty bullet and looks like a leftover from the list-marker normalization.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@CHANGELOG.md` at line 333, Remove the orphan list marker '*' from the links
footer in CHANGELOG.md (the stray asterisk that renders as an empty bullet);
open CHANGELOG.md, locate the lone '*' in the footer area and delete that line
so the list renders correctly without the empty bullet.
| if execMeta.Stage == ExecStageStarted && len(execMeta.PayloadID) == 8 { | ||
| var pid engine.PayloadID | ||
| copy(pid[:], execMeta.PayloadID) | ||
| requestedTxHash := hashTxs(txs) | ||
| if execMeta.Timestamp != timestamp.Unix() || !bytes.Equal(execMeta.TxHash, requestedTxHash) { | ||
| c.logger.Debug(). | ||
| Uint64("height", height). | ||
| Int64("execmeta_timestamp", execMeta.Timestamp). | ||
| Int64("requested_timestamp", timestamp.Unix()). | ||
| Msg("ExecuteTxs: ignoring stale in-progress execution for different block inputs") | ||
| } else { | ||
| var pid engine.PayloadID | ||
| copy(pid[:], execMeta.PayloadID) | ||
|
|
||
| // Validate payload still exists by attempting to retrieve it | ||
| if _, err = c.engineClient.GetPayload(ctx, pid); err == nil { | ||
| c.logger.Info(). | ||
| // Validate payload still exists by attempting to retrieve it | ||
| if _, err = c.engineClient.GetPayload(ctx, pid); err == nil { | ||
| c.logger.Info(). | ||
| Uint64("height", height). | ||
| Str("stage", execMeta.Stage). | ||
| Msg("ExecuteTxs: found in-progress execution with payloadID, returning payloadID for resume") | ||
| return nil, &pid, true, nil | ||
| } | ||
| // Payload is stale (expired or node restarted) - proceed with fresh execution | ||
| c.logger.Debug(). | ||
| Uint64("height", height). | ||
| Str("stage", execMeta.Stage). | ||
| Msg("ExecuteTxs: found in-progress execution with payloadID, returning payloadID for resume") | ||
| return nil, &pid, true, nil | ||
| Str("payloadID", pid.String()). | ||
| Err(err). | ||
| Msg("ExecuteTxs: stale ExecMeta payloadID no longer valid in EL, will re-execute") | ||
| // Don't return - fall through to fresh execution | ||
| } |
There was a problem hiding this comment.
ExecStageStarted resumes still ignore parent-chain changes.
This gate only checks timestamp and tx hash before reusing pid. After a rollback/reorg, the parent at height-1 can change while both still match, so we can resume and later promote a payload that was built on the old parent. Please also validate the current parent hash before returning pid; ExecMeta already has a ParentHash field you can use.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@execution/evm/execution.go` around lines 758 - 785, The resume path for
ExecStageStarted currently only checks timestamp and tx hash
(execMeta.Timestamp, execMeta.TxHash via hashTxs(txs)) before returning the
stored PayloadID (pid); you must also validate that the current parent hash
matches execMeta.ParentHash to avoid resuming on a changed parent. Retrieve the
current parent hash for the block (height-1) from the chain/headers API
available in this package and compare it to execMeta.ParentHash; if they differ,
treat the payload as stale (log and fall through to fresh execution) rather than
returning pid even if c.engineClient.GetPayload(ctx, pid) succeeds. Ensure you
still keep the existing GetPayload existence check (c.engineClient.GetPayload)
but only return pid when both the payload exists and the parent hash matches
execMeta.ParentHash.
| func hashTxs(txs [][]byte) []byte { | ||
| if len(txs) == 0 { | ||
| return nil | ||
| } | ||
|
|
||
| h := sha256.New() | ||
| for _, tx := range txs { | ||
| h.Write(tx) | ||
| } | ||
|
|
||
| return h.Sum(nil) | ||
| } |
There was a problem hiding this comment.
Make the tx-list hash unambiguous.
hashTxs hashes the raw concatenation of every tx, so different [][]byte values can collide when they flatten to the same byte stream. That weakens the stale-meta guard this PR is adding. Hash each element independently or include a separator/length per element.
Possible fix
func hashTxs(txs [][]byte) []byte {
if len(txs) == 0 {
return nil
}
h := sha256.New()
for _, tx := range txs {
- h.Write(tx)
+ txHash := sha256.Sum256(tx)
+ h.Write(txHash[:])
}
return h.Sum(nil)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func hashTxs(txs [][]byte) []byte { | |
| if len(txs) == 0 { | |
| return nil | |
| } | |
| h := sha256.New() | |
| for _, tx := range txs { | |
| h.Write(tx) | |
| } | |
| return h.Sum(nil) | |
| } | |
| func hashTxs(txs [][]byte) []byte { | |
| if len(txs) == 0 { | |
| return nil | |
| } | |
| h := sha256.New() | |
| for _, tx := range txs { | |
| txHash := sha256.Sum256(tx) | |
| h.Write(txHash[:]) | |
| } | |
| return h.Sum(nil) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@execution/evm/execution.go` around lines 1048 - 1059, The hashTxs function
currently concatenates raw tx bytes which can produce ambiguous collisions;
modify hashTxs to make the tx-list encoding unambiguous by prefixing each tx
with its length (or hashing each tx separately with a domain separator) before
writing into the sha256 hasher: e.g., for each tx in hashTxs write the tx length
as a fixed-size varint/uint64 or a clear separator and then write the tx bytes
(or write sha256(tx) with a separator), then return h.Sum(nil) so the digest
uniquely represents the ordered list.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
CHANGELOG.md (2)
333-333:⚠️ Potential issue | 🟡 MinorRemove the orphan list marker in the links footer.
Line 333 is a stray bullet and renders as an empty list item.
Suggested edit
-*🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 333, Remove the stray orphan list marker "*" in the links/footer area (the lone asterisk at line 333) so it no longer renders an empty list item; locate the "*" token in CHANGELOG.md near the links/footer section and delete that single character to fix the rendered output.
12-15:⚠️ Potential issue | 🟡 MinorMove this Unreleased entry to
### Fixedand rewrite it user-first.This is a bug-fix outcome (preventing stale execution resume), not a generic “Changes” item, and the current text is still commit-style/internal phrasing.
Suggested edit
-### Changes - -* Improve execution/evm check for stored meta not stale [`#3221`](https://github.com/evstack/ev-node/pull/3221) +### Fixed + +* Prevent resuming execution from stale stored metadata during failover by validating execution timestamp and transaction set before restore. [`#3221`](https://github.com/evstack/ev-node/pull/3221)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` around lines 12 - 15, Move the Unreleased bullet "Improve execution/evm check for stored meta not stale [`#3221`]" from the generic "### Changes" section into the "### Fixed" section and rewrite the line in user‑facing language (e.g., "Prevented resuming a stale execution by improving EVM stored-metadata freshness checks") so it reads as the user-visible fix rather than commit-style/internal wording; update the entry text where "Improve execution/evm check for stored meta not stale" appears to the new phrasing and ensure the PR link [`#3221`] remains appended.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@CHANGELOG.md`:
- Line 333: Remove the stray orphan list marker "*" in the links/footer area
(the lone asterisk at line 333) so it no longer renders an empty list item;
locate the "*" token in CHANGELOG.md near the links/footer section and delete
that single character to fix the rendered output.
- Around line 12-15: Move the Unreleased bullet "Improve execution/evm check for
stored meta not stale [`#3221`]" from the generic "### Changes" section into the
"### Fixed" section and rewrite the line in user‑facing language (e.g.,
"Prevented resuming a stale execution by improving EVM stored-metadata freshness
checks") so it reads as the user-visible fix rather than commit-style/internal
wording; update the entry text where "Improve execution/evm check for stored
meta not stale" appears to the new phrasing and ensure the PR link [`#3221`]
remains appended.
Overview
Enhanced stale block detection during failover to prevent resuming outdated execution states.
Summary by CodeRabbit