Conversation
(cherry picked from commit a6573fe6c40082380759f55018a47c8da5bf97bd)
📝 WalkthroughWalkthroughAdds a retry-with-backoff loop for fetching block timestamps, caches timestamps, introduces a RetrieveBlobs API/path that omits header timestamp, makes Retrieve fail with structured errors instead of using current time on timestamp resolution failure, and updates Submit to accept an Changes
Sequence Diagram(s)sequenceDiagram
participant Client as DA Client
participant Cache as BlockTimestampCache (local)
participant Header as DA Header Service
participant Store as Result / Caller
Client->>Cache: cachedBlockTimestamp(height)?
alt cache hit
Cache-->>Client: timestamp
else cache miss
Client->>Header: fetch header by height (attempt 1)
alt header returns timestamp
Header-->>Client: header with timestamp
Client->>Cache: storeBlockTimestamp(height, ts)
else header error
Client->>Client: wait (backoff)
Client->>Header: fetch header by height (attempt 2)
alt header returns timestamp
Header-->>Client: header with timestamp
Client->>Cache: storeBlockTimestamp(height, ts)
else header error
Client->>Client: wait (backoff)
Client->>Header: fetch header by height (attempt 3)
alt header returns timestamp
Header-->>Client: header with timestamp
Client->>Cache: storeBlockTimestamp(height, ts)
else retries exhausted
Header-->>Client: error
Client-->>Store: return structured ResultRetrieve error (no fallback to now)
end
end
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
|
|
Claude finished @alpe's task in 2m 39s —— View job PR Review: feat: Retry getBlockTimestamp
SummaryThe core objective (retry with backoff + hard error instead of Issues1. // Outer function scope:
rawCh, err := c.blobAPI.Subscribe(ctx, ns)
...
go func() {
...
blockTime, err = c.getBlockTimestamp(ctx, resp.Height) // captures outer `err` by referenceThe goroutine captures var getErr error
blockTime, getErr = c.getBlockTimestamp(ctx, resp.Height)
if getErr != nil { ... }2. Subscribe emits events with zero timestamp on retry exhaustion — blockTime, err = c.getBlockTimestamp(ctx, resp.Height)
if err != nil {
c.logger.Error()...
blockTime = time.Time{} // zero time emitted downstream
}
3. Missing test: context cancellation during retry —
select {
case <-ctx.Done():
return time.Time{}, fmt.Errorf("fetching header timestamp for block %d: %w", height, ctx.Err())
case <-time.After(backoff):
}But there is no test verifying this branch. A cancelled context mid-retry should return quickly with a wrapped 4. if !strict {
if blockTime, ok := c.cachedBlockTimestamp(height); ok {
return blockTime, nil
}
return time.Time{}, nil // silent zero time, no error
}This is by design for ObservationsRetry logic is correct — exponential backoff, no sleep after last attempt, and context cancellation handled properly in the select. The error wrapping at line 294 is clean. Duplicate test is resolved — the pre-merge check flagged a duplicate Interface break is propagated cleanly — Test backoff timing — tests run against the real
VerdictThe critical fix (deterministic block timestamps via retry + hard error) is correct. Address the |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3166 +/- ##
==========================================
+ Coverage 60.75% 61.13% +0.37%
==========================================
Files 117 117
Lines 11988 12082 +94
==========================================
+ Hits 7283 7386 +103
+ Misses 3886 3870 -16
- Partials 819 826 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
julienrbrt
left a comment
There was a problem hiding this comment.
This makes sense. The sequencer should retry eventually if this happens on the force inclusion da fetching? We should make sure it cannot kill a sequencer. Although we do need this fix, as it could produce non deterministic blocks otherwise.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
block/internal/da/client.go (2)
432-437: Inconsistency:Subscribestill falls back totime.Now()on error.The
Retrievemethod now properly returns an error when timestamp fetch fails, butSubscribestill falls back totime.Now(). The TODO comment acknowledges this, but it creates an inconsistency in timestamp handling between the two code paths.Is this intentional for this PR, deferring Subscribe fixes to a follow-up? If so, consider linking the TODO to issue
#3142explicitly.📝 Link TODO to issue
if err != nil { c.logger.Error().Uint64("height", resp.Height).Err(err).Msg("failed to get DA block timestamp for subscription event") blockTime = time.Now() - // TODO: we should retry fetching the timestamp. Current time may mess block time consistency for based sequencers. + // TODO(`#3142`): we should retry fetching the timestamp. Current time may mess block time consistency for based sequencers. }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@block/internal/da/client.go` around lines 432 - 437, Subscribe currently falls back to time.Now() when getBlockTimestamp fails, causing inconsistency with Retrieve which returns an error; update the Subscribe code path in the block subscription handler (the blockTime assignment block that calls c.getBlockTimestamp and logs via c.logger.Error().Uint64(...).Err(err).Msg(...)) to stop silently using time.Now(): either propagate the error to the subscription caller (return or send an error into the subscription channel) or fail the subscription event instead of substituting the current time, and remove or update the TODO accordingly; if you intentionally defer this change, update the TODO to reference issue `#3142` explicitly (e.g., "TODO: address in `#3142`") so the inconsistency is tracked.
45-48: Consider making retry parameters configurable.The hardcoded constants work for an initial implementation, but you may want to expose these via
Configfor environments with different latency characteristics or reliability needs.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@block/internal/da/client.go` around lines 45 - 48, The retry constants blockTimestampFetchMaxAttempts and blockTimestampFetchBackoff should be made configurable via the client's Config: add fields (e.g., BlockTimestampFetchMaxAttempts int and BlockTimestampFetchBackoff time.Duration) to the Config struct, initialize them with the current defaults (3 and 100*time.Millisecond) when constructing the client (e.g., in NewClient or config validation), and replace direct uses of the constants with Config references wherever block timestamp fetching logic runs (references to blockTimestampFetchMaxAttempts and blockTimestampFetchBackoff). Ensure nil/zero-value handling preserves the existing defaults so behavior is unchanged unless the caller overrides the config.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@block/internal/da/client.go`:
- Around line 432-437: Subscribe currently falls back to time.Now() when
getBlockTimestamp fails, causing inconsistency with Retrieve which returns an
error; update the Subscribe code path in the block subscription handler (the
blockTime assignment block that calls c.getBlockTimestamp and logs via
c.logger.Error().Uint64(...).Err(err).Msg(...)) to stop silently using
time.Now(): either propagate the error to the subscription caller (return or
send an error into the subscription channel) or fail the subscription event
instead of substituting the current time, and remove or update the TODO
accordingly; if you intentionally defer this change, update the TODO to
reference issue `#3142` explicitly (e.g., "TODO: address in `#3142`") so the
inconsistency is tracked.
- Around line 45-48: The retry constants blockTimestampFetchMaxAttempts and
blockTimestampFetchBackoff should be made configurable via the client's Config:
add fields (e.g., BlockTimestampFetchMaxAttempts int and
BlockTimestampFetchBackoff time.Duration) to the Config struct, initialize them
with the current defaults (3 and 100*time.Millisecond) when constructing the
client (e.g., in NewClient or config validation), and replace direct uses of the
constants with Config references wherever block timestamp fetching logic runs
(references to blockTimestampFetchMaxAttempts and blockTimestampFetchBackoff).
Ensure nil/zero-value handling preserves the existing defaults so behavior is
unchanged unless the caller overrides the config.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3f6f2ee7-12b4-46b5-a4f7-e262388bfcb5
📒 Files selected for processing (2)
CHANGELOG.mdblock/internal/da/client.go
🚧 Files skipped from review as they are similar to previous changes (1)
- CHANGELOG.md
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
block/internal/syncing/da_retriever_test.go (1)
262-269: Use a fixed timestamp in mockedRetrieveBlobsresponses.
time.Now()here is unnecessary for assertions and introduces avoidable nondeterminism in test fixtures.As per coding guidelines, "Ensure tests are deterministic."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@block/internal/syncing/da_retriever_test.go` around lines 262 - 269, The mocked RetrieveBlobs responses use time.Now(), causing nondeterministic tests; replace those timestamps with a fixed time variable (e.g., ts := time.Unix(...)) and use that ts in the BaseResult.Timestamp for both datypes.ResultRetrieve returns in the test so assertions are deterministic; update the two client.On("RetrieveBlobs", ...).Return(...) calls to set BaseResult.Timestamp to the fixed ts instead of time.Now() and run the test.block/internal/da/client_test.go (1)
349-351: Guard event receive with a timeout to avoid hanging tests.A timeout-based
selectkeeps this test failure mode deterministic if subscription delivery regresses.As per coding guidelines, "Ensure tests are deterministic."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@block/internal/da/client_test.go` around lines 349 - 351, The test currently blocks on a direct receive from the events channel (ev := <-events) which can hang if delivery regresses; change this to a timeout-guarded select that waits for either an event from the events channel (assign to ev and run the existing assertions on ev.Height and ev.Timestamp) or a time.After timeout branch that fails the test (use t.Fatalf or require.FailNow with a clear timeout message). Reference the existing events channel and ev variable in the select and pick a reasonable timeout (e.g., 1s or configurable) to make the test deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@block/internal/syncing/da_retriever.go`:
- Around line 87-94: The current callers of RetrieveBlobs (in the block using
r.client.RetrieveBlobs with r.client.GetHeaderNamespace()/GetDataNamespace())
rely on validateBlobResponse returning nil for unknown non-success DA statuses
which can mask real failures; update validateBlobResponse to return a non-nil
error for any unhandled/default DA status (rather than nil), and change the
callers around RetrieveBlobs to propagate and return that error immediately (do
not convert it into ErrBlobNotFound). Specifically, modify validateBlobResponse
to handle the default case with an explicit error describing the unexpected DA
status and ensure both call sites that invoke validateBlobResponse after
RetrieveBlobs (the header/data branch and the other similar block) check and
return that error instead of proceeding.
---
Nitpick comments:
In `@block/internal/da/client_test.go`:
- Around line 349-351: The test currently blocks on a direct receive from the
events channel (ev := <-events) which can hang if delivery regresses; change
this to a timeout-guarded select that waits for either an event from the events
channel (assign to ev and run the existing assertions on ev.Height and
ev.Timestamp) or a time.After timeout branch that fails the test (use t.Fatalf
or require.FailNow with a clear timeout message). Reference the existing events
channel and ev variable in the select and pick a reasonable timeout (e.g., 1s or
configurable) to make the test deterministic.
In `@block/internal/syncing/da_retriever_test.go`:
- Around line 262-269: The mocked RetrieveBlobs responses use time.Now(),
causing nondeterministic tests; replace those timestamps with a fixed time
variable (e.g., ts := time.Unix(...)) and use that ts in the
BaseResult.Timestamp for both datypes.ResultRetrieve returns in the test so
assertions are deterministic; update the two client.On("RetrieveBlobs",
...).Return(...) calls to set BaseResult.Timestamp to the fixed ts instead of
time.Now() and run the test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: daeec9e5-12d4-4cf5-b8bf-86c3bda137f2
📒 Files selected for processing (10)
apps/evm/server/force_inclusion_test.goblock/internal/da/client.goblock/internal/da/client_test.goblock/internal/da/interface.goblock/internal/da/tracing.goblock/internal/da/tracing_test.goblock/internal/syncing/da_retriever.goblock/internal/syncing/da_retriever_test.gotest/mocks/da.gotest/testda/dummy.go
| headerRes := r.client.RetrieveBlobs(ctx, daHeight, r.client.GetHeaderNamespace()) | ||
|
|
||
| // If namespaces are the same, return header result | ||
| if bytes.Equal(r.client.GetHeaderNamespace(), r.client.GetDataNamespace()) { | ||
| return headerRes, r.validateBlobResponse(headerRes, daHeight) | ||
| } | ||
|
|
||
| dataRes := r.client.Retrieve(ctx, daHeight, r.client.GetDataNamespace()) | ||
| dataRes := r.client.RetrieveBlobs(ctx, daHeight, r.client.GetDataNamespace()) |
There was a problem hiding this comment.
Do not silently accept unknown non-success DA statuses.
With both paths now using RetrieveBlobs, validateBlobResponse returning nil in default can mask real failures and later return ErrBlobNotFound incorrectly.
Suggested fix
func (r *daRetriever) validateBlobResponse(res datypes.ResultRetrieve, daHeight uint64) error {
switch res.Code {
case datypes.StatusError:
return fmt.Errorf("DA retrieval failed: %s", res.Message)
case datypes.StatusHeightFromFuture:
return fmt.Errorf("%w: height from future", datypes.ErrHeightFromFuture)
case datypes.StatusNotFound:
return fmt.Errorf("%w: blob not found", datypes.ErrBlobNotFound)
case datypes.StatusSuccess:
r.logger.Debug().Uint64("da_height", daHeight).Msg("successfully retrieved from DA")
return nil
default:
- return nil
+ return fmt.Errorf("DA retrieval failed with status code %d: %s", res.Code, res.Message)
}
}Also applies to: 141-154
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@block/internal/syncing/da_retriever.go` around lines 87 - 94, The current
callers of RetrieveBlobs (in the block using r.client.RetrieveBlobs with
r.client.GetHeaderNamespace()/GetDataNamespace()) rely on validateBlobResponse
returning nil for unknown non-success DA statuses which can mask real failures;
update validateBlobResponse to return a non-nil error for any unhandled/default
DA status (rather than nil), and change the callers around RetrieveBlobs to
propagate and return that error immediately (do not convert it into
ErrBlobNotFound). Specifically, modify validateBlobResponse to handle the
default case with an explicit error describing the unexpected DA status and
ensure both call sites that invoke validateBlobResponse after RetrieveBlobs (the
header/data branch and the other similar block) check and return that error
instead of proceeding.
Resolves #3141
Overview
Retry fetching timestamp from DA with backoff on failure.
Fails with error instead of defaulting to current time.
Summary by CodeRabbit
Bug Fixes
New Features
Tests
Breaking Changes