Summary of Changes

Hello @alpe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on significant performance improvements for block execution and storage operations. By introducing caching mechanisms, optimizing data serialization, and refining key generation, the changes aim to reduce CPU and memory overhead, particularly in high-throughput scenarios. The modifications also streamline the block production pipeline by avoiding redundant computations and ensuring correct P2P broadcast ordering, contributing to a more efficient and robust node operation.
Claude finished @alpe's task in 4m 36s

Code Review
Great set of optimizations with solid benchmarks backing them up. The ~3x throughput improvement and ~50% allocation reduction are impressive. A few issues need addressing before merge:

Bugs / Correctness

1. Optimization #6 is incomplete
// line 534
signature, _, err := e.signHeader(&header.Header)
// ...
// line 553
if err := batch.SaveBlockData(header, data, &signature); err != nil {

The optimization is wired halfway: the interface change and the extra signHeader return value are in place, but the serialized header bytes are discarded here and batch.SaveBlockData re-marshals the header. Note: this only works if …

2. Cached last-block info

e.lastBlockInfo.Store(&lastBlockInfo{
headerHash: newState.LastHeaderHash,
dataHash: data.Hash(),
signature: signature,
})

If …

Thread Safety

3. Signer.MarshalledPubKey caching

func (s *Signer) MarshalledPubKey() ([]byte, error) {
if len(s.marshalledPubKey) > 0 {
return s.marshalledPubKey, nil
}
// ...
s.marshalledPubKey = bz // unsynchronized write
return bz, nil
}

This is a check-then-act without synchronization. If the same Signer is used from multiple goroutines, the unsynchronized write is a data race.

// executor.go:738-743
Signer: types.Signer{
PubKey: pubKey, // marshalledPubKey is zero
Address: e.genesis.ProposerAddress,
},

4. pendingCheckCounter uint64

This field is only accessed from the execution loop goroutine, so it's safe today. But it's a plain field adjacent to …

Design / Correctness Concerns

5. The new lightweight validation intentionally skips the full Validate() call, which re-marshals and re-hashes the block data to verify the DACommitment.

6. Pending-block flag handling

if err := e.savePendingBlock(ctx, header, data); err != nil {
return fmt.Errorf("failed to save block data: %w", err)
}
// hasPendingBlock is still false here

The flag only transitions …

Code Quality

7. signatureHash := *signature
8. func getHeightKey() string {
return "/" + heightPrefix
}

All other height-related keys now use the heightKey helper; this one hardcodes the prefix.

9. The old implementation presumably used path.Clean; the optimized GenerateKey no longer canonicalizes its output.

10. LastBlockTime: &timestamppb.Timestamp{Seconds: secs, Nanos: nanos},
Performance (Minor)

11. Sequential broadcast adds latency to the block cycle

The correctness argument for sequential broadcast is sound. But this adds the latency of header broadcast to the block production cycle, potentially increasing block times under slow P2P conditions. Consider whether the broadcast should happen in a background goroutine after the batch is committed (with the header-before-data ordering preserved). The current approach is the safest and likely acceptable, but worth profiling under network load.

Testing

12. No test for the new lightweight validation

The new lightweight validation method has no dedicated unit tests. Given it's now on the critical path for every produced block (replacing the full validation), it deserves direct coverage.

13. The tracing wrapper and mock exist, but there's no test verifying that …

Summary

The performance results are compelling and the architecture is sound. The most actionable items before merge are the ones listed above.
Code Review
The pull request introduces several performance optimizations aimed at reducing redundant store reads and protobuf marshaling/unmarshaling. Key changes include caching lastHeaderHash, lastDataHash, and lastSignature in the Executor struct, amortizing pending limit checks, and introducing SaveBlockDataFromBytes to avoid re-serialization. Additionally, TxsByteSize is added for more efficient metrics reporting, and MarshalledPubKey caches public key serialization. The signHeader function now returns serialized header bytes to prevent redundant marshaling. Overall, these changes should lead to a more efficient block production pipeline.
block/internal/executing/executor.go
| "github.com/libp2p/go-libp2p/core/crypto" | ||
| "github.com/rs/zerolog" | ||
| "golang.org/x/sync/errgroup" | ||
|
|
The golang.org/x/sync/errgroup import was removed, but the corresponding usage in ProduceBlock (lines 548-558 in the old code) was replaced with sequential broadcasting. This is a functional change that might impact the overall block production time if broadcasting was previously done concurrently. While the comment on line 598 explains the sequential broadcast, it's worth noting the performance implication of this change.
- // For based sequencer, this will return an empty signature
- signature, err := e.signHeader(header.Header)
+ // For based sequencer, this will return an empty signature.
+ signature, _, err := e.signHeader(&header.Header)
The signHeader function now returns three values, but only two are being captured here. The headerBytes (second return value) is not being used, which could be passed to batch.SaveBlockDataFromBytes to avoid re-marshalling the header. This would further optimize the block production process.
- signature, _, err := e.signHeader(&header.Header)
+ signature, headerBytes, err := e.signHeader(&header.Header)
height := header.Height()
signatureHash := *signature
headerBlob, err := header.MarshalBinary()
if err != nil {
// SaveBlockDataFromBytes saves pre-serialized block data to the batch.
// This avoids re-marshalling header and data when the caller already has the binary blobs.
func (b *DefaultBatch) SaveBlockDataFromBytes(header *types.SignedHeader, headerBlob, dataBlob []byte, signature *types.Signature) error {
	height := header.Height()
@@ -39,29 +39,40 @@ const (
	heightPrefix = "t"
The heightPrefix constant is defined but not used in the new heightKey function. It seems like the intention was to use it as the prefix, but it's hardcoded as 't' in the constant definition and then 'h', 'd', 'c', 's' in the heightKey calls. This could lead to confusion or incorrect key generation.
func getHeightKey() string {
-	return GenerateKey([]string{heightPrefix})
+	return "/" + heightPrefix
The getHeightKey function now returns a hardcoded string "/t". This is inconsistent with the heightKey function which dynamically generates keys based on a prefix and height. If heightPrefix is intended to be used here, it should be passed to heightKey or directly used in the string concatenation.
func getHeightKey() string {
-	return GenerateKey([]string{heightPrefix})
-	return "/" + heightPrefix
+	return heightKey(heightPrefix, 0) // Or whatever default height is appropriate
}
func GenerateKey(fields []string) string {
-	key := "/" + strings.Join(fields, "/")
-	return path.Clean(key)
+	// Pre-calculate total size to avoid re-allocation.
+	n := 0
+	for _, f := range fields {
+		n += 1 + len(f) // '/' + field
+	}
+	var b strings.Builder
+	b.Grow(n)
+	for _, f := range fields {
+		b.WriteByte('/')
+		b.WriteString(f)
+	}
+	return b.String()
The GenerateKey function was optimized to use strings.Builder for performance. However, the path.Clean call was removed. While strings.Builder is more efficient for concatenation, path.Clean ensures that the generated key is always in a canonical form (e.g., handling double slashes or ./.. components). Removing it might lead to non-canonical keys if the input fields are not perfectly clean, which could cause issues with datastore lookups.
julienrbrt left a comment:
nice! mostly lgtm! i think we can delete some old code and delete some struct cache to put them in cached_store instead (better for readability and consistency)
block/internal/executing/executor.go
// Cached per-block data to avoid store reads + protobuf deserialization
// in CreateBlock. Updated after each successful block production.
lastHeaderHash types.Hash
I still believe those aren't necessary, because GetBlockData is cached on the second call.
We may simply want to add GetSignature to the pkg/store/cached_store.go for consistency.
I replaced the code to use GetBlockData and ran benchmarks. It is slower with GetBlockData (using a memory DB).
This branch:
BenchmarkProduceBlock/empty_batch-12 43850 26704 ns/op 6758 B/op 70 allocs/op
BenchmarkProduceBlock/empty_batch-12 45126 26834 ns/op 6854 B/op 70 allocs/op
BenchmarkProduceBlock/empty_batch-12 47043 27658 ns/op 7074 B/op 70 allocs/op
BenchmarkProduceBlock/empty_batch-12 45932 25491 ns/op 6985 B/op 70 allocs/op
BenchmarkProduceBlock/empty_batch-12 46345 26730 ns/op 7016 B/op 70 allocs/op
BenchmarkProduceBlock/single_tx-12 44605 27264 ns/op 7258 B/op 80 allocs/op
BenchmarkProduceBlock/single_tx-12 44408 26924 ns/op 7252 B/op 80 allocs/op
BenchmarkProduceBlock/single_tx-12 44490 27180 ns/op 7253 B/op 80 allocs/op
BenchmarkProduceBlock/single_tx-12 44314 27036 ns/op 7234 B/op 80 allocs/op
BenchmarkProduceBlock/single_tx-12 45469 27158 ns/op 7367 B/op 80 allocs/op
BenchmarkProduceBlock/100_txs-12 36204 32863 ns/op 25767 B/op 80 allocs/op
BenchmarkProduceBlock/100_txs-12 36198 32815 ns/op 25767 B/op 80 allocs/op
BenchmarkProduceBlock/100_txs-12 37375 32799 ns/op 25745 B/op 80 allocs/op
BenchmarkProduceBlock/100_txs-12 36219 32371 ns/op 25767 B/op 80 allocs/op
BenchmarkProduceBlock/100_txs-12 36669 32604 ns/op 25758 B/op 80 allocs/op
PASS
ok github.com/evstack/ev-node/block/internal/executing 23.490s
With GetBlockData
BenchmarkProduceBlock/empty_batch-12 41150 28790 ns/op 8979 B/op 107 allocs/op
BenchmarkProduceBlock/empty_batch-12 42402 28799 ns/op 8966 B/op 107 allocs/op
BenchmarkProduceBlock/empty_batch-12 41312 29003 ns/op 8977 B/op 107 allocs/op
BenchmarkProduceBlock/empty_batch-12 42162 29241 ns/op 8967 B/op 107 allocs/op
BenchmarkProduceBlock/empty_batch-12 41576 28934 ns/op 8973 B/op 107 allocs/op
BenchmarkProduceBlock/single_tx-12 41461 29483 ns/op 9471 B/op 120 allocs/op
BenchmarkProduceBlock/single_tx-12 40598 29281 ns/op 9484 B/op 120 allocs/op
BenchmarkProduceBlock/single_tx-12 41346 29488 ns/op 9472 B/op 120 allocs/op
BenchmarkProduceBlock/single_tx-12 40634 28958 ns/op 9483 B/op 120 allocs/op
BenchmarkProduceBlock/single_tx-12 38173 29275 ns/op 9523 B/op 120 allocs/op
BenchmarkProduceBlock/100_txs-12 30086 39664 ns/op 39148 B/op 226 allocs/op
BenchmarkProduceBlock/100_txs-12 30886 39006 ns/op 39126 B/op 226 allocs/op
BenchmarkProduceBlock/100_txs-12 30547 39378 ns/op 39135 B/op 226 allocs/op
BenchmarkProduceBlock/100_txs-12 31329 38776 ns/op 39115 B/op 226 allocs/op
BenchmarkProduceBlock/100_txs-12 31376 38786 ns/op 39113 B/op 226 allocs/op
PASS
ok github.com/evstack/ev-node/block/internal/executing 23.685s
Ok, interesting. Have you made sure to call store.NewCachedStore on that memdb store?
EDIT: Checked with store.NewCachedStore and it is slightly slower still, a field access will probably always be faster 👍🏾
	lastSignature = e.lastSignature
} else {
	// Cold start fallback: read from store
	lastHeader, lastData, err := e.store.GetBlockData(ctx, height-1)
technically not true, without those cached struct, it would have read from cache anyway
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

@@ Coverage Diff @@
## main #3093 +/- ##
==========================================
- Coverage 61.18% 60.91% -0.27%
==========================================
Files 113 113
Lines 11444 11617 +173
==========================================
+ Hits 7002 7077 +75
- Misses 3654 3740 +86
- Partials 788 800 +12
Flags with carried forward coverage won't be shown.
Bring moonshot 2 optimizations to main
Performance Optimizations Analysis (alex/bench_phase1_start vs HEAD)
Summary: Targeted optimizations focused on reducing CPU overhead (marshalling/hashing) and I/O latency (store reads) in the block production hot path.
1. Executor: In-Memory Caching for Block Production

Executor now caches lastHeaderHash, lastDataHash, and lastSignature in memory after committing a block. This avoids store.GetBlockData and store.GetSignature (database I/O) at the start of CreateBlock and significantly speeds up the "cold start" of a new block round.

2. Executor: Avoid Re-hashing

CreateBlock reuses newState.LastHeaderHash (computed during state transition) instead of calling header.Hash() again before broadcasting. header.Hash() triggers a full Protobuf marshal and SHA-256 hash, so reusing the already-computed hash saves CPU cycles.

3. Executor: Optimized Metrics Collection

Replaced data.Size() with data.TxsByteSize() for the BlockSizeBytes metric. data.Size() marshals the entire object to Protobuf to get the size; data.TxsByteSize() simply sums the lengths of the transaction byte slices, avoiding expensive allocations and serialization.

4. Types: Cached Public Key Marshalling

The Signer struct now caches the marshalled public key bytes (marshalledPubKey). The ToProto methods (called frequently during P2P exchange and storage) used to call crypto.MarshalPublicKey every time; caching this eliminates repeated serialization overhead.

5. Types: Lightweight Sequence Validation

Introduced AssertValidSequence, avoiding the State.Validate() call, which requires re-marshalling and re-hashing the entire block data to verify the DACommitment.

6. Store/Executor: Pre-serialized writes (SaveBlockDataFromBytes)

Added SaveBlockDataFromBytes to the Store interface; signHeader now returns the serialized header bytes, avoiding a redundant MarshalBinary call at the storage layer.

7. Protobuf: Zero-Allocation Timestamp Conversion

State.ToProto now manually constructs &timestamppb.Timestamp{Seconds: secs, Nanos: nanos} instead of calling timestamppb.New(time), which performs a heap allocation. Small optimization for a high-frequency operation.

8. Executor: Sequential P2P Broadcast

Switched the broadcast from concurrent (errgroup) to sequential (Header then Data).