
Move CRC32C specific compilation options to only impact crc-impl.c++ #6078

Open
FlorentCollin wants to merge 1 commit into main from fcollin/crc-different-translation-units

Conversation

@FlorentCollin
Member

While investigating a D1 issue, I found that two translation units compile different implementations of kj::_::intHash32(), which causes a bug in the statement cache in src/workerd/api/sql.c++.

Disclaimer: I used Claude Opus 4.6 to help me understand all the moving parts and to analyze the workerd binaries. I have checked everything it claims, but I'm not an expert in these sorts of low-level optimizations, so if anything is wrong please correct me so that I can learn along the way!

Description

This PR tries to fix two reported issues.

Both issues report that executing large SQL files on Linux and Windows fails with:

$ npx wrangler d1 execute db-name --file=database.sql
# output from wrangler...

kj/table.c++:57: error: HashIndex detected hash table inconsistency. This can happen if you create a kj::Table with a hash index and you modify the rows in the table post-indexing in a way that would change their hash. This is a serious bug which will lead to undefined behavior.
stack: ; kj::getStackTrace() = 7ff6490d0fb7 7ff649ab56f1 7ff649aafb57 7ff649ab3e1e 7ff6491c488b 7ff648ddd9e0 7ff648ddd70f 7ff648ddd4c3 7ff64aa403b9 7ff6aaa87b74 3df00000010 3df00cda16c 9 3df00020000 3df0c09b9a8

However, the same command executes successfully on macOS ARM.

Investigation

After some testing, I found that the error only appears for SQL files containing more than 1 MB of SQL statements. This led me to the SQL_STATEMENT_CACHE_MAX_SIZE constant:

static constexpr uint SQL_STATEMENT_CACHE_MAX_SIZE = 1024 * 1024;

and the code that evicts statements from the cache:

// If the statement cache grew too big, drop the least-recently-used entry.
while (statementCache.totalSize > SQL_STATEMENT_CACHE_MAX_SIZE) {
  auto& toRemove = *statementCache.lru.begin();
  auto oldQuery = jsg::JsString(toRemove.query.getHandle(js));
  statementCache.totalSize -= toRemove.statementSize;
  statementCache.lru.remove(toRemove);
  KJ_ASSERT(statementCache.map.eraseMatch(oldQuery));
}

Specifically, it is the call to statementCache.map.eraseMatch(oldQuery) that produces the "HashIndex detected hash table inconsistency" error.

The SQL statement cache uses a kj::Table with a kj::HashIndex:

using StatementMap = kj::Table<kj::Rc<CachedStatement>, kj::HashIndex<StatementCacheCallbacks>>;

There are two code paths that compute hashes for cache entries:

  1. Insertion via findOrCreate(JsString) calls JsString::hashCode() (defined in jsvalue.c++):

     kj::Rc<CachedStatement>& slot = statementCache.map.findOrCreate(querySql, [&]() {
       auto result = kj::rc<CachedStatement>(js, *this, db, querySql, js.toString(querySql));
       statementCache.totalSize += result->statementSize;
       return result;
     });
  2. Eviction uses HashableV8Ref::hashCode(), cached in the CachedStatement constructor (defined in sql.c++):

     auto oldQuery = jsg::JsString(toRemove.query.getHandle(js));
     statementCache.totalSize -= toRemove.statementSize;
     statementCache.lru.remove(toRemove);
     KJ_ASSERT(statementCache.map.eraseMatch(oldQuery));

The problem is a One Definition Rule violation: kj::_::intHash32() is an inline function in kj/hash.h that compiles to different implementations depending on whether CRC32 compiler flags are set:

  • sql.c++ is compiled as part of //src/workerd/io, which has -mcrc (ARM) / -msse4.2 (x86) copts, so intHash32() uses hardware CRC32.
  • jsvalue.c++ is compiled as part of //src/workerd/jsg:jsg-core, which has no CRC flags, so intHash32() uses the Thomas Wang software fallback.

Both implementations take the same input but produce different outputs. Entries are inserted into a bucket chosen by the Thomas Wang hash, but eviction looks them up in a bucket chosen by the CRC32 hash.
Thanks to the amazing logging in kj::Table, the kj::HashIndex detects the mismatch and logs the "hash table inconsistency" error.
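
To make the ODR hazard concrete, here is a minimal illustration (not the actual kj/hash.h source; intHash32Sketch and its body are hypothetical, though the 2057 multiplier is the Thomas Wang constant that shows up in the disassembly below) of how an inline function guarded by predefined feature macros ends up with two incompatible definitions:

// Illustrative sketch only -- not the real kj/hash.h. The inline function's
// body depends on predefined feature macros, so a TU built with -msse4.2 and
// a TU built without it each keep a different definition of the "same"
// function. (__ARM_FEATURE_CRC32 plays the analogous role on ARM.)
#include <cstdint>
#if defined(__SSE4_2__)
#include <nmmintrin.h>  // _mm_crc32_u32
#endif

inline uint32_t intHash32Sketch(uint32_t i) {
#if defined(__SSE4_2__)
  // Hardware path: CRC32C of the input (what sql.c++ gets under -msse4.2).
  return _mm_crc32_u32(0, i);
#else
  // Software path: classic Thomas Wang 32-bit mix (what jsvalue.c++ gets).
  // 2057 == 0x809, the constant visible in the disassembly below.
  i = ~i + (i << 15);
  i = i ^ (i >> 12);
  i = i + (i << 2);
  i = i ^ (i >> 4);
  i = i * 2057;
  i = i ^ (i >> 16);
  return i;
#endif
}
// Both bodies are inline definitions of the "same" function, so the ODR
// violation goes undiagnosed; depending on inlining and which definition the
// linker keeps, callers in different TUs can disagree at runtime.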

I'm not entirely sure, but I think that macOS ARM is unaffected because the default compiler flags already enable __ARM_FEATURE_CRC32 for all targets.

I had Claude disassemble the relevant functions in the workerd binary distributed with the wrangler package, and it confirmed the mismatch in the Linux x86_64 binary:

JsString::hashCode() at 0x31d1860 -- Thomas Wang software fallback:

31d1886: imull  $0x809, %ecx, %ecx    # Thomas Wang constant (2057)

CachedStatement constructor at 0x2cae690 -- CRC32 hardware instruction:

2cae6f8: crc32l  %eax, %edx            # SSE4.2 CRC32 instruction
2cae6fd: movl    %edx, 0x30(%rbx)      # stored as cached identityHash

eraseImpl at 0x2cafb60 loads the cached CRC32 hash and has an explicit call to logHashTableInconsistency at 0x2cafbcc -- confirming this is the path that produces the error.

I believe the compilation flag mismatch was introduced in c7ad54b. It seems the copts were intended for crypto/crc-impl.c++, but that file is compiled under a separate bazel target (//src/workerd/api:crypto-crc-impl) that never received them. The flags ended up applying to sql.c++ and every other file in //src/workerd/io.

Changes

I'm not sure how to best fix the problem. This PR removes the CRC32 compilation options from //src/workerd/io and adds them to the //src/workerd/api:crypto-crc-impl target. However, this means that the CRC32 optimization isn't used anymore in //src/workerd/io, and that's kind of sad.

We could also remove the call to kj::hashCode in the identityHash implementation, but Kenton wrote that it's not clear if V8's GetIdentityHash() returns uniform results.

HashableV8Ref(v8::Isolate* isolate, v8::Local<T> handle)
    // TODO(perf): It's not clear if V8's `GetIdentityHash()` is intended to return uniform
    // results as required for KJ hashing, so we pass it to `kj::hashCode()` to further hash
    // it. This may be unnecessary. Note that there are several other call sites of
    // `GetIdentityHash()` which do the same -- if we decide we don't need this we should fix
    // all of them.
    : V8Ref<T>(isolate, handle),
      identityHash(kj::hashCode(handle->GetIdentityHash())) {}
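
For reference, a hedged sketch of what that alternative would look like (mirroring the constructor above; it only removes the mismatch if the other GetIdentityHash() call sites mentioned in the TODO, such as JsString::hashCode() per this PR's analysis, drop the kj::hashCode() wrapper as well, and it assumes GetIdentityHash() is uniform enough for KJ hashing):

// Alternative sketch, not proposed as the actual fix: store V8's identity
// hash directly instead of re-hashing it with kj::hashCode(). The open
// question from the TODO (uniformity of GetIdentityHash()) still applies,
// and every other call site would have to make the same change for the
// insertion and eviction paths to agree.
HashableV8Ref(v8::Isolate* isolate, v8::Local<T> handle)
    : V8Ref<T>(isolate, handle),
      identityHash(handle->GetIdentityHash()) {}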

What's missing from this PR / Questions

  1. Is there a way to create a test?
    I wanted to create a test to reproduce this, but the hash table inconsistency is only a log and doesn't crash or throw.

  2. What would be the best fix for this problem?
    I'm not familiar with the codebase, so the change in this PR is almost certainly not the best one. It does solve the problem, but it would be great to solve it while still using the CRC32 compilation options across the whole codebase.

@FlorentCollin FlorentCollin requested review from a team as code owners February 13, 2026 21:35
@fhanau fhanau self-requested a review February 13, 2026 21:39
@fhanau
Contributor

fhanau commented Feb 13, 2026

Thank you for investigating this – I'll need to check if the conclusions here are correct, but that does sound plausible.
Some initial thoughts:

  • For testing, if you can write a test that results in the HashIndex detected hash table inconsistency message, that would be helpful. We've run into logs that indicate bugs without causing test failures before; we're currently planning to make error logs produce an exception in tests or in debug builds.
  • While moving compiler flags might fix the issue at hand, it sounds like the real problem is in libkj, in that we can end up with two versions of kj::_::intHash32() that produce different results. I suggested that CRC32 variants be used here in the first place, so I'll think about how best to address this.
  • As for still benefiting from the CRC32 code path, we can certainly enable that for more bazel targets once the underlying issue is addressed; I think that's something we can get right relatively easily.

@fhanau
Contributor

fhanau commented Feb 13, 2026

The best approach to avoid compiler-flag-dependent behavior might be to always use CRC32C; since we'll be hashing at most 8 bytes, we should be able to get acceptable performance with a small lookup table when hardware support is unavailable.
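
For illustration, here is a minimal sketch (hypothetical names, not kj code) of the kind of table-driven software CRC32C fallback that could back such an always-CRC32C approach:

// Hypothetical sketch: byte-at-a-time software CRC32C (Castagnoli polynomial,
// reflected form 0x82F63B78) driven by a 256-entry (1 KiB) lookup table. It
// uses accumulate-style semantics (no initial or final XOR), intended to
// mirror the hardware crc32 instructions so both paths can agree on the bits.
#include <array>
#include <cstddef>
#include <cstdint>

static const std::array<uint32_t, 256>& crc32cTable() {
  // Built once, thread-safely, on first use (C++11 magic static).
  static const std::array<uint32_t, 256> table = [] {
    std::array<uint32_t, 256> t{};
    for (uint32_t i = 0; i < 256; i++) {
      uint32_t crc = i;
      for (int k = 0; k < 8; k++) {
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
      }
      t[i] = crc;
    }
    return t;
  }();
  return table;
}

static uint32_t crc32cAccumulate(uint32_t crc, const void* data, size_t len) {
  const auto& table = crc32cTable();
  const uint8_t* p = static_cast<const uint8_t*>(data);
  while (len-- > 0) {
    crc = (crc >> 8) ^ table[(crc ^ *p++) & 0xffu];
  }
  return crc;
}

On a little-endian target, crc32cAccumulate(seed, &x, sizeof(x)) should match the hardware accumulate instructions over the same bytes, though that equivalence would of course need to be verified before relying on it.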

@fhanau
Contributor

fhanau commented Feb 13, 2026

A small correction: when c7ad54b was added, crc-impl.c++ was part of the io target, which had the ISA extension flags added, so the optimizations were being applied correctly to that file at the time. They are not being applied now that we have a separate target, which is another thing to fix.
