Skip to content

Comments

sqlite: optimize column name creation and text value encoding#61954

Open
thisalihassan wants to merge 2 commits intonodejs:mainfrom
thisalihassan:sqlite-one-byte-encoding
Open

sqlite: optimize column name creation and text value encoding#61954
thisalihassan wants to merge 2 commits intonodejs:mainfrom
thisalihassan:sqlite-one-byte-encoding

Conversation

@thisalihassan
Copy link
Contributor

@thisalihassan thisalihassan commented Feb 23, 2026

Skip the full UTF-8 decode path for text values that are pure ASCII by validating with simdutf and creating OneByte V8 strings, halving memory.Internalize column name strings so V8 shares hidden classes across row objects. Cache column names in the iterate() hot loop, invalidating on schema changes via SQLITE_STMTSTATUS_REPREPARE.

Benchmark: 30-run
sqlite-benchmark

Refs: nodejs/performance#181

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/sqlite

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. sqlite Issues and PRs related to the SQLite subsystem. labels Feb 23, 2026
@thisalihassan thisalihassan force-pushed the sqlite-one-byte-encoding branch from 6adb625 to ff6a788 Compare February 23, 2026 14:58
Use simdutf to detect ASCII text values and create them
via NewFromOneByte for compact one-byte representation.
Internalize column name strings with kInternalized so V8
shares hidden classes across row objects. Cache column
names on StatementSync for iterate(), invalidated via
SQLITE_STMTSTATUS_REPREPARE on schema changes.

Refs: nodejs/performance#181
@thisalihassan thisalihassan force-pushed the sqlite-one-byte-encoding branch from ff6a788 to 4fcb1ed Compare February 23, 2026 15:05
@thisalihassan
Copy link
Contributor Author

thisalihassan commented Feb 23, 2026

I didn't push the changes for sqlite-prepare-select-all-options and sqlite-prepare-select-all where I tested the iterate method because it slows down the benchmark by a lot.
Adding iterate nearly ~2.3x the total benchmark runtime from ~40 minutes to ~90 minutes, but if needed I can push that as well
Benchmark with iterate: 30 runs
sqlite-benchmark_all_2

@codecov
Copy link

codecov bot commented Feb 23, 2026

Codecov Report

❌ Patch coverage is 83.72093% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.72%. Comparing base (da5efc4) to head (fd4215d).
⚠️ Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
src/node_sqlite.cc 83.72% 4 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61954      +/-   ##
==========================================
+ Coverage   88.84%   89.72%   +0.88%     
==========================================
  Files         674      672       -2     
  Lines      204957   204559     -398     
  Branches    39309    39274      -35     
==========================================
+ Hits       182087   183537    +1450     
+ Misses      15088    13326    -1762     
+ Partials     7782     7696      -86     
Files with missing lines Coverage Δ
src/node_sqlite.h 80.39% <ø> (ø)
src/node_sqlite.cc 80.92% <83.72%> (+0.02%) ⬆️

... and 122 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ronag ronag requested review from addaleax, anonrig and geeksilva97 and removed request for geeksilva97 February 24, 2026 09:45
Comment on lines 60 to 61
const char* data,
size_t length) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just take a std::string_view input as an argument rather than this C API like function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

NewStringType::kNormal,
len);
}
return String::NewFromUtf8(isolate, data, NewStringType::kNormal, len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you use kInternalized here and in line 66?

Copy link
Contributor Author

@thisalihassan thisalihassan Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Utf8StringMaybeOneByte is used for text cell values (row data), not column names, text values are typically unique user data ("Ali Hassan", "ali.n@example.com", etc.) so the lookup would almost always miss

Column names are already interned separately in ColumnNameToName

if (!iter->stmt_->ColumnNameToName(i).ToLocal(&key)) return;
row_keys.emplace_back(key);
}
if (!iter->stmt_->GetCachedColumnNames(&row_keys)) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to here?

.As<Name>();
}

bool StatementSync::GetCachedColumnNames(LocalVector<Name>* keys) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment to this function?

@thisalihassan thisalihassan force-pushed the sqlite-one-byte-encoding branch from ad603d4 to fd4215d Compare February 24, 2026 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. sqlite Issues and PRs related to the SQLite subsystem.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants