research: add --no-serialized store generator option #19669
dashrews78 wants to merge 4 commits into master from
Conversation
Add a --no-serialized flag to the pg-table-bindings generator that
produces stores without the serialized bytea column. All proto fields
become individual DB columns, and reads reconstruct the proto from
column values instead of unmarshaling a blob.
Key design decisions:
- Writes use unnest()-based bulk INSERT for parent tables (single SQL
statement for all rows) with per-row UPDATE fallback for array columns
that can't be unnested (e.g., text[]). Child table rows are also
bulk-inserted via unnest.
- Reads scan individual columns into Go variables and construct the
proto directly. Child table fetching is opt-in via the exported
FetchChildren function — callers must explicitly request it since
analysis showed no existing ProcessIndicator caller reads child data.
- The search framework's GET query path selects individual columns
instead of the serialized blob when NoSerialized is set.
- A new NoSerializedStore type in pkg/search/postgres handles the
store lifecycle without requiring the ClonedUnmarshaler constraint.
Includes a copy of ProcessIndicator as ProcessIndicatorNoSerialized
for integration testing and benchmarking. Benchmark results at 1000
objects on Apple M3 Pro:
Writes (UpsertMany 1000):
Serialized: 11.7 ms
NoSerialized: 30.9 ms (2.6x, driven by child table inserts)
Reads (GetMany 100):
Serialized: 181 µs
NoSerialized: 260 µs (1.4x)
Reads (Get single):
Serialized: 48 µs
NoSerialized: 62 µs (1.3x)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a --jsonb flag to the pg-table-bindings generator that stores the serialized column as jsonb instead of bytea. The data is marshaled with protojson.Marshal and unmarshaled with protojson.Unmarshal, making it human-readable and SQL-queryable while preserving the single-column read/write pattern of the existing stores.
Key changes:
- Walker: Jsonb bool on Schema, getSerializedField returns jsonb type
- Generator: --jsonb CLI flag, WithJsonb() walk option
- Templates: conditional protojson marshal/unmarshal, scanRow/scanRows for jsonb that scan []byte and call protojson.Unmarshal
- Reuses NoSerializedStore infrastructure (no new store type needed)
Includes ProcessIndicatorJsonb example with integration tests and benchmarks comparing bytea vs jsonb vs no-serialized performance.
Partially generated by AI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Eliminate the intermediate buildFromScan function and scan directly into proto struct fields where possible. Fields that need type conversion (datetime→timestamppb, enum→int32, uuid strings) still use temp vars, but all other fields (strings, uints, bools, slices) scan straight into &obj.Field, skipping the intermediate copy. For ProcessIndicatorNoSerialized (20 columns), this reduces temp vars from 20 to 6.
Benchmarks show:
- GetMany/500: 31% fewer allocs (17,654→12,154), 9% faster
- GetMany/100: 30% fewer allocs (3,647→2,547), 4% faster
- GetSingle: 7% fewer allocs (166→155)
Partially generated by AI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comprehensive comparison of bytea vs jsonb vs no-serialized store strategies covering Go-side performance (writes, reads, allocations), Postgres-side performance (storage, buffer hits, EXPLAIN ANALYZE), migration tradeoffs, and the direct-scan optimization. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This change is part of the following stack. Change managed by git-spice.
Skipping CI for Draft Pull Request. |
Images are ready for the commit at 25e3e07. To use with deploy scripts, first |
📝 Walkthrough

Summary by CodeRabbit — Release Notes
Walkthrough

This pull request introduces two new storage variants for ProcessIndicator: JSONB-backed storage and no-serialized (column-based) storage. It adds protocol buffer definitions, schema generation infrastructure enhancements, store implementations with custom scanning logic, comprehensive tests, and benchmarks. The changes include updates to the walker and schema modules to support configurable serialization modes, and new generic store implementations.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
Actionable comments posted: 13
🧹 Nitpick comments (1)
tools/generate-helpers/pg-table-bindings/main.go (1)
181-188: Consider adding mutual exclusion validation for `--no-serialized` and `--jsonb`. Both flags can currently be set simultaneously, but they represent mutually exclusive storage strategies: `--jsonb` uses a jsonb serialized column while `--no-serialized` eliminates the serialized column entirely.

Proposed fix

```diff
 var walkOpts []walker.WalkOption
+if props.NoSerialized && props.Jsonb {
+	log.Fatal("--no-serialized and --jsonb are mutually exclusive")
+}
 if props.NoSerialized {
 	walkOpts = append(walkOpts, walker.WithNoSerialized())
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/generate-helpers/pg-table-bindings/main.go` around lines 181 - 188, The flags props.NoSerialized and props.Jsonb are mutually exclusive but currently both can be set; validate and error early before building walkOpts: in main (where walkOpts is constructed and walker.Walk is called) add a check that if props.NoSerialized && props.Jsonb then return an error/exit with a clear message; reference the variables props.NoSerialized, props.Jsonb and the walk option helpers walker.WithNoSerialized / walker.WithJsonb so the validation occurs before creating walkOpts and calling walker.Walk.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@central/processindicator_jsonb/store/postgres/bench_test.go`:
- Around line 237-247: The NoSerialized benchmark currently uses
noSerializedStore.New(db.DB) which creates a store without a ChildFetcher so
nSt.Get and GetMany only return parent records; change the benchmark to either
instantiate the no-serialized store with the same ChildFetcher used in the other
benchmarks (i.e., construct the store with the child fetcher and use that
instance in the NoSerialized sub-bench) or add a new sub-benchmark (e.g.,
"NoSerialized+Children") that uses noSerializedStore.New(db.DB, ChildFetcher:
<existing child fetcher>) and performs the same Get/GetMany loops (using
noSerializedIDs and seedCount) so the LineageInfo is populated and the
comparison to Serialized/Jsonb is apples-to-apples.
In `@central/processindicator_jsonb/store/postgres/pg_internals_test.go`:
- Around line 65-98: The test is only measuring the parent table
"process_indicator_no_serializeds" and omits its child table
"process_indicator_no_serializeds_lineage_infos", underreporting sizes/counts;
update the queries in pg_internals_test.go (the loop that builds sizeResult and
the separate block for lineage) so that when tbl ==
"process_indicator_no_serializeds" you aggregate both tables (e.g., use
pg_total_relation_size on each table and sum, or use a single query that SELECTs
pg_total_relation_size('parent')+pg_total_relation_size('parent_lineage_infos')
AS total and similarly sum pg_table_size and pg_indexes_size and COUNT(*) from
both tables) and populate r.totalSize/r.tableSize/r.indexSize/r.rowCount
accordingly, or alternatively change the logged label to explicitly state
"parent-table-only" where you do not aggregate; apply the same change to the
later block that currently queries
process_indicator_no_serializeds_lineage_infos so the reported footprint
represents the complete no-serialized footprint.
In `@central/processindicator_no_serialized/store/postgres/bench_test.go`:
- Around line 495-507: The benchmark is measuring parent-only reads twice
because nSt (created via New(db.DB)) lacks a ChildFetcher, so the
"WithChildFetch" sub-benchmarks end up using the same no-child semantics as
nStNoChildren; fix by ensuring one store is constructed with a real child
fetcher: create the child-enabled store by passing the ChildFetcher (e.g.,
NoSerializedStoreOpts{ChildFetcher: FetchChildren} or the equivalent option)
when constructing the store used for "WithChildFetch" (symbols to update: New,
nSt, pgSearch.NewNoSerializedStore, nStNoChildren,
NoSerializedStoreOpts{ChildFetcher: FetchChildren}, FetchChildren), and apply
the same change in the other affected block around lines 550-560.
In `@central/processindicator_no_serialized/store/postgres/store.go`:
- Around line 231-244: The COPY generator increments idx before writing it to
the row, causing an off-by-one shift (first lineage row written with idx=1). In
the pgx.CopyFromFunc closure (inputRows) change the order so the current index
is used for the row (use idx or a local rowIdx := idx) and only then increment
idx (or increment after returning the row) so COPY produces the same 0-based
ordering as the batched/un-nest paths; update references to
procesIndicatorNoSerializedID, idx, and pgx.CopyFromFunc accordingly.
In `@pkg/postgres/schema/process_indicator_jsonbs.go`:
- Around line 32-35: The code is using an anonymous numeric cast
v1.SearchCategory(86) in the SetOptionsMap and RegisterCategoryToTable calls,
which lacks a corresponding named enum value in the proto definitions and makes
the code brittle. Add a new named SearchCategory constant to
proto/api/v1/search_service.proto file, then replace both instances of the raw
v1.SearchCategory(86) cast in the SetOptionsMap and RegisterCategoryToTable
function calls within the process_indicator_jsonbs.go file with the new named
constant instead of the magic number.
In `@pkg/postgres/schema/process_indicator_no_serializeds.go`:
- Around line 37-41: The code registers ProcessIndicatorNoSerialized schema
under an unnamed enum value v1.SearchCategory(85); update the registration to
use a real declared enum constant (e.g., replace v1.SearchCategory(85) with the
appropriate generated enum like v1.SearchCategory_<NAME>) or, if this is only
for tests, remove mapping.RegisterCategoryToTable and the SetOptionsMap call for
production code and keep the schema confined to test-only wiring; look for
symbols ProcessIndicatorNoSerialized, v1.SearchCategory(85),
schema.SetOptionsMap, and mapping.RegisterCategoryToTable to make the change and
ensure the chosen enum exists in generated/api/v1/search_service.pb.go before
committing.
In `@pkg/postgres/walker/walker.go`:
- Around line 422-429: The InlinedSubMessage being appended only stores
FieldName as structField.Name, which loses the full nested access path and
breaks initializer generation; change the appended data so FieldName (or a new
Path field) contains the full assignment path using ctx.Getter(structField.Name)
(or combine GetterPrefix and FieldName) instead of just structField.Name, so
schema.Root().InlinedSubMessages entries provide the complete nested access
expression for codegen (refer to InlinedSubMessage, FieldName, GetterPrefix,
ctx.Getter, and structField.Name).
In `@pkg/search/postgres/no_serialized_store.go`:
- Around line 44-45: fetchChildren currently invokes the fetcher with s.db which
causes child queries to bypass any transaction stored in ctx; update
fetchChildren (and the similar calls around the 159-164 region) to extract the
transactional DB from ctx (the same DB used by Get/GetMany/GetByQuery) and pass
that DB into the ChildFetcher[T] instead of s.db so child queries run inside the
active transaction; locate usages of fetchChildren and the call site where
ChildFetcher is invoked and replace s.db with the DB obtained from the ctx
transaction helper (or fall back to s.db only if no tx exists).
In `@tools/generate-helpers/pg-table-bindings/funcs.go`:
- Around line 329-344: The file has formatting issues around the template helper
block (notably the getterToSetter anonymous function and nearby helpers like
joinPath and stripPointer) causing golangci-lint/gofmt failures; fix by running
gofmt (or go fmt) on tools/generate-helpers/pg-table-bindings/funcs.go to
reformat the block so the anonymous functions (getterToSetter, joinPath,
stripPointer) and their surrounding commas/indentation conform to gofmt rules,
then commit the modified file.
In `@tools/generate-helpers/pg-table-bindings/list.go`:
- Around line 99-100: The map entries using &storage.ProcessIndicatorJsonb{} and
&storage.ProcessIndicatorNoSerialized{} currently contain extra alignment spaces
that break gofmt/golangci-lint; open the map in list.go and remove the extra
spaces so each key:value pair uses normal Go spacing (e.g.
"&storage.ProcessIndicatorJsonb{}: resources.DeploymentExtension," and
"&storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,") then
run gofmt/golangci-lint to confirm formatting is fixed.
In `@tools/generate-helpers/pg-table-bindings/store.go.tpl`:
- Around line 524-539: FetchChildren assumes a single UUID parent key and builds
objsByID as map[string]*storeType, ids []string, and the SQL uses
ANY($1::uuid[]) with only one query argument; to fix, make FetchChildren handle
arbitrary primary key types and composite keys by (1) changing the in-memory key
map to use the actual primary key tuple type returned by pkGetter (or serialize
composite keys consistently), (2) building a parameter slice matching the number
of parent key columns in FieldsReferringToParent and passing one argument per
column to db.Query, (3) removing the hard-coded ::uuid[] cast and instead use
appropriate type casts or parameter binding based on each parent PK column type,
and (4) update the SQL q construction and the call to db.Query (the q variable
and rows, err := db.Query(ctx, q, ...)) to supply the correct number and form of
arguments so composite and non-UUID parents work correctly.
- Around line 425-444: The bulk unnest loop can panic when a parent nested
message or an individual child is nil because it dereferences obj.{{
getterToSetter $child.ObjectGetter }} and child.* directly; update the loop that
iterates over objs and for childIdx, child := range obj.{{ getterToSetter
$child.ObjectGetter }} to first check the parent submessage for nil (skip or
append nil-equivalents for all child arrays when parent is nil) and for each
child check if child == nil before accessing child.{{ joinPath (setterPath
$field) }}; when child is nil, append the appropriate zero/NULL placeholder (or
use the $singlePK.Getter "obj" for parent references) instead of dereferencing
so arr_child_{{ scanVarName $field }} remains safe for batch inserts.
- Around line 381-403: The JSONB unmarshalling currently uses
protojson.Unmarshal which rejects unknown fields; update both scanRow and
scanRows to use a tolerant unmarshal by calling
protojson.UnmarshalOptions{DiscardUnknown:true}.Unmarshal(data, msg) (replace
protojson.Unmarshal(data, msg)), referencing the functions scanRow and scanRows
and the storeType message so the JSONB rows tolerate unknown future fields
during rolling upgrades.
---
Nitpick comments:
In `@tools/generate-helpers/pg-table-bindings/main.go`:
- Around line 181-188: The flags props.NoSerialized and props.Jsonb are mutually
exclusive but currently both can be set; validate and error early before
building walkOpts: in main (where walkOpts is constructed and walker.Walk is
called) add a check that if props.NoSerialized && props.Jsonb then return an
error/exit with a clear message; reference the variables props.NoSerialized,
props.Jsonb and the walk option helpers walker.WithNoSerialized /
walker.WithJsonb so the validation occurs before creating walkOpts and calling
walker.Walk.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Central YAML (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
Run ID: 65ada43b-debc-4691-b942-b11f5de4471b
⛔ Files ignored due to path filters (4)
generated/storage/process_indicator_jsonb.pb.go is excluded by !**/*.pb.go, !**/generated/**
generated/storage/process_indicator_jsonb_vtproto.pb.go is excluded by !**/*.pb.go, !**/generated/**
generated/storage/process_indicator_no_serialized.pb.go is excluded by !**/*.pb.go, !**/generated/**
generated/storage/process_indicator_no_serialized_vtproto.pb.go is excluded by !**/*.pb.go, !**/generated/**
📒 Files selected for processing (24)
central/processindicator_jsonb/DESIGN.md
central/processindicator_jsonb/store/postgres/bench_test.go
central/processindicator_jsonb/store/postgres/gen.go
central/processindicator_jsonb/store/postgres/pg_internals_test.go
central/processindicator_jsonb/store/postgres/store.go
central/processindicator_jsonb/store/postgres/store_test.go
central/processindicator_no_serialized/store/postgres/bench_test.go
central/processindicator_no_serialized/store/postgres/gen.go
central/processindicator_no_serialized/store/postgres/store.go
central/processindicator_no_serialized/store/postgres/store_test.go
pkg/postgres/schema/process_indicator_jsonbs.go
pkg/postgres/schema/process_indicator_no_serializeds.go
pkg/postgres/walker/schema.go
pkg/postgres/walker/walker.go
pkg/search/options.go
pkg/search/postgres/common.go
pkg/search/postgres/no_serialized_store.go
proto/storage/process_indicator_jsonb.proto
proto/storage/process_indicator_no_serialized.proto
tools/generate-helpers/pg-table-bindings/funcs.go
tools/generate-helpers/pg-table-bindings/list.go
tools/generate-helpers/pg-table-bindings/main.go
tools/generate-helpers/pg-table-bindings/schema.go.tpl
tools/generate-helpers/pg-table-bindings/store.go.tpl
```go
b.Run("NoSerialized", func(b *testing.B) {
	i := 0
	for b.Loop() {
		id := noSerializedIDs[i%seedCount]
		obj, exists, err := nSt.Get(ctx, id)
		require.NoError(b, err)
		require.True(b, exists)
		require.NotNil(b, obj)
		i++
	}
})
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Generated no-serialized store:"
sed -n '1,260p' central/processindicator_no_serialized/store/postgres/store.go
echo
echo "Shared no-serialized store implementation:"
sed -n '1,320p' pkg/search/postgres/no_serialized_store.go
echo
echo "FetchChildren and read-path call sites:"
rg -n -C3 'FetchChildren|GetMany\(|Get\(' central/processindicator_no_serialized/store/postgres pkg/search/postgres

Repository: stackrox/stackrox
Length of output: 50373
🏁 Script executed:
cat -n central/processindicator_jsonb/store/postgres/bench_test.go | sed -n '213,315p'

Repository: stackrox/stackrox
Length of output: 3446
Fix the NoSerialized benchmarks to include child fetches for fair comparison.
The benchmark at lines 237–247 and 305–312 uses noSerializedStore.New(db.DB), which creates a store without a ChildFetcher. This means Get() and GetMany() return parent records only, without populating LineageInfo from the child table. In contrast, the Serialized and Jsonb benchmarks reconstruct the full proto including all child fields. The comparison is not apples-to-apples.
Either use a store instance configured with ChildFetcher (as available in the no-serialized test setup), or clearly document that this benchmark measures parent-only reads and add a separate sub-benchmark for the full-fetch case for completeness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@central/processindicator_jsonb/store/postgres/bench_test.go` around lines 237
- 247, The NoSerialized benchmark currently uses noSerializedStore.New(db.DB)
which creates a store without a ChildFetcher so nSt.Get and GetMany only return
parent records; change the benchmark to either instantiate the no-serialized
store with the same ChildFetcher used in the other benchmarks (i.e., construct
the store with the child fetcher and use that instance in the NoSerialized
sub-bench) or add a new sub-benchmark (e.g., "NoSerialized+Children") that uses
noSerializedStore.New(db.DB, ChildFetcher: <existing child fetcher>) and
performs the same Get/GetMany loops (using noSerializedIDs and seedCount) so the
LineageInfo is populated and the comparison to Serialized/Jsonb is
apples-to-apples.
```go
for _, tbl := range []string{
	"process_indicators",
	"process_indicator_jsonbs",
	"process_indicator_no_serializeds",
} {
	var r sizeResult
	r.table = tbl
	err := db.DB.QueryRow(ctx, fmt.Sprintf(`
		SELECT
			pg_size_pretty(pg_total_relation_size('%s')) AS total,
			pg_size_pretty(pg_table_size('%s')) AS tbl,
			pg_size_pretty(pg_indexes_size('%s')) AS idx,
			(SELECT count(*) FROM %s)::int AS cnt
	`, tbl, tbl, tbl, tbl)).Scan(&r.totalSize, &r.tableSize, &r.indexSize, &r.rowCount)
	require.NoError(t, err)
	t.Logf("  %-40s rows=%-5d total=%-10s table=%-10s indexes=%s",
		r.table, r.rowCount, r.totalSize, r.tableSize, r.indexSize)
}

// Also show child table for no-serialized
{
	var total, tbl, idx string
	var cnt int
	err := db.DB.QueryRow(ctx, `
		SELECT
			pg_size_pretty(pg_total_relation_size('process_indicator_no_serializeds_lineage_infos')),
			pg_size_pretty(pg_table_size('process_indicator_no_serializeds_lineage_infos')),
			pg_size_pretty(pg_indexes_size('process_indicator_no_serializeds_lineage_infos')),
			(SELECT count(*) FROM process_indicator_no_serializeds_lineage_infos)::int
	`).Scan(&total, &tbl, &idx, &cnt)
	require.NoError(t, err)
	t.Logf("  %-40s rows=%-5d total=%-10s table=%-10s indexes=%s",
		"  └─ _lineage_infos (child)", cnt, total, tbl, idx)
}
```
Compare the complete no-serialized footprint, not just the parent table.
These sections use only process_indicator_no_serializeds for the no-serialized side, while the seeded objects also write child rows to process_indicator_no_serializeds_lineage_infos. That makes the no-serialized storage and EXPLAIN numbers systematically lower than the bytea/jsonb full-object numbers, and those figures appear to feed the tables in central/processindicator_jsonb/DESIGN.md. Either aggregate the child table into the no-serialized results or label the output as parent-table-only.
Also applies to: 149-239
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@central/processindicator_jsonb/store/postgres/pg_internals_test.go` around
lines 65 - 98, The test is only measuring the parent table
"process_indicator_no_serializeds" and omits its child table
"process_indicator_no_serializeds_lineage_infos", underreporting sizes/counts;
update the queries in pg_internals_test.go (the loop that builds sizeResult and
the separate block for lineage) so that when tbl ==
"process_indicator_no_serializeds" you aggregate both tables (e.g., use
pg_total_relation_size on each table and sum, or use a single query that SELECTs
pg_total_relation_size('parent')+pg_total_relation_size('parent_lineage_infos')
AS total and similarly sum pg_table_size and pg_indexes_size and COUNT(*) from
both tables) and populate r.totalSize/r.tableSize/r.indexSize/r.rowCount
accordingly, or alternatively change the logged label to explicitly state
"parent-table-only" where you do not aggregate; apply the same change to the
later block that currently queries
process_indicator_no_serializeds_lineage_infos so the reported footprint
represents the complete no-serialized footprint.
```go
nSt := New(db.DB)
// Also create a store without child fetcher to isolate parent-only reads
nStNoChildren := pgSearch.NewNoSerializedStore[storeType](
	db.DB,
	schema,
	pkGetter,
	insertIntoProcessIndicatorNoSerializeds,
	nil,
	scanRow,
	scanRows,
	nil, nil, nil,
	targetResource,
)
```
The child-fetch breakdown is measuring parent-only reads twice.
New(db.DB) does not wire ChildFetcher, so the later "WithChildFetch" sub-benchmarks are using the same no-child store semantics as nStNoChildren. The reported child-fetch delta will stay at zero until the second store is constructed with NoSerializedStoreOpts{ChildFetcher: FetchChildren}.
Also applies to: 550-560
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@central/processindicator_no_serialized/store/postgres/bench_test.go` around
lines 495 - 507, The benchmark is measuring parent-only reads twice because nSt
(created via New(db.DB)) lacks a ChildFetcher, so the "WithChildFetch"
sub-benchmarks end up using the same no-child semantics as nStNoChildren; fix by
ensuring one store is constructed with a real child fetcher: create the
child-enabled store by passing the ChildFetcher (e.g.,
NoSerializedStoreOpts{ChildFetcher: FetchChildren} or the equivalent option)
when constructing the store used for "WithChildFetch" (symbols to update: New,
nSt, pgSearch.NewNoSerializedStore, nStNoChildren,
NoSerializedStoreOpts{ChildFetcher: FetchChildren}, FetchChildren), and apply
the same change in the other affected block around lines 550-560.
```go
idx := 0
inputRows := pgx.CopyFromFunc(func() ([]any, error) {
	if idx >= len(objs) {
		return nil, nil
	}
	obj := objs[idx]
	idx++

	return []interface{}{
		pgutils.NilOrUUID(procesIndicatorNoSerializedID),
		idx,
		obj.GetParentUid(),
		obj.GetParentExecFilePath(),
	}, nil
```
COPY FROM shifts lineage indexes by one.
idx is incremented before it is written into the row, so the first lineage element is stored as idx = 1. That makes the COPY-based path disagree with the batched insert and bulk-unnest paths on row keys and ordering. This needs to be fixed in the generator too.
Possible fix
idx := 0
inputRows := pgx.CopyFromFunc(func() ([]any, error) {
if idx >= len(objs) {
return nil, nil
}
- obj := objs[idx]
- idx++
+ currentIdx := idx
+ obj := objs[idx]
+ idx++
return []interface{}{
pgutils.NilOrUUID(procesIndicatorNoSerializedID),
- idx,
+ currentIdx,
obj.GetParentUid(),
obj.GetParentExecFilePath(),
}, nil
})

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@central/processindicator_no_serialized/store/postgres/store.go` around lines
231 - 244, The COPY generator increments idx before writing it to the row,
causing an off-by-one shift (first lineage row written with idx=1). In the
pgx.CopyFromFunc closure (inputRows) change the order so the current index is
used for the row (use idx or a local rowIdx := idx) and only then increment idx
(or increment after returning the row) so COPY produces the same 0-based
ordering as the batched/un-nest paths; update references to
procesIndicatorNoSerializedID, idx, and pgx.CopyFromFunc accordingly.
```go
schema.SetOptionsMap(search.Walk(v1.SearchCategory(86), "processindicatorjsonb", (*storage.ProcessIndicatorJsonb)(nil)))
schema.ScopingResource = resources.DeploymentExtension
RegisterTable(schema, CreateTableProcessIndicatorJsonbsStmt)
mapping.RegisterCategoryToTable(v1.SearchCategory(86), schema)
```
Don't register an anonymous search category here.
v1.SearchCategory(86) is not a named enum value in the provided proto/api/v1/search_service.proto, so the schema registration and options map are hanging off an undefined category. That is brittle for any code that depends on generated enum names or shared category mappings. Please add a real SearchCategory constant in the proto and use it here instead of a raw cast.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/postgres/schema/process_indicator_jsonbs.go` around lines 32 - 35, The
code is using an anonymous numeric cast v1.SearchCategory(86) in the
SetOptionsMap and RegisterCategoryToTable calls, which lacks a corresponding
named enum value in the proto definitions and makes the code brittle. Add a new
named SearchCategory constant to proto/api/v1/search_service.proto file, then
replace both instances of the raw v1.SearchCategory(86) cast in the
SetOptionsMap and RegisterCategoryToTable function calls within the
process_indicator_jsonbs.go file with the new named constant instead of the
magic number.
| "scanVarType": scanVarType, | ||
| "scanVarName": scanVarName, | ||
| "setterPath": setterPath, | ||
| "joinPath": func(parts []string) string { return strings.Join(parts, ".") }, | ||
| "stripPointer": func(s string) string { return strings.TrimPrefix(s, "*") }, | ||
| "getterToSetter": func(getter string) string { | ||
| // Convert "GetSignal().GetLineageInfo()" to "Signal.LineageInfo" | ||
| parts := strings.Split(getter, ".") | ||
| var result []string | ||
| for _, part := range parts { | ||
| part = strings.TrimPrefix(part, "Get") | ||
| part = strings.TrimSuffix(part, "()") | ||
| result = append(result, part) | ||
| } | ||
| return strings.Join(result, ".") | ||
| }, |
Fix formatting to pass CI.
The golangci-lint check is failing due to improper formatting around line 333.
Proposed fix - run gofmt
"scanVarType": scanVarType,
"scanVarName": scanVarName,
"setterPath": setterPath,
- "joinPath": func(parts []string) string { return strings.Join(parts, ".") },
- "stripPointer": func(s string) string { return strings.TrimPrefix(s, "*") },
+ "joinPath": func(parts []string) string { return strings.Join(parts, ".") },
+ "stripPointer": func(s string) string { return strings.TrimPrefix(s, "*") },
"getterToSetter": func(getter string) string {📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "scanVarType": scanVarType, | |
| "scanVarName": scanVarName, | |
| "setterPath": setterPath, | |
| "joinPath": func(parts []string) string { return strings.Join(parts, ".") }, | |
| "stripPointer": func(s string) string { return strings.TrimPrefix(s, "*") }, | |
| "getterToSetter": func(getter string) string { | |
| // Convert "GetSignal().GetLineageInfo()" to "Signal.LineageInfo" | |
| parts := strings.Split(getter, ".") | |
| var result []string | |
| for _, part := range parts { | |
| part = strings.TrimPrefix(part, "Get") | |
| part = strings.TrimSuffix(part, "()") | |
| result = append(result, part) | |
| } | |
| return strings.Join(result, ".") | |
| }, | |
| "scanVarType": scanVarType, | |
| "scanVarName": scanVarName, | |
| "setterPath": setterPath, | |
| "joinPath": func(parts []string) string { return strings.Join(parts, ".") }, | |
| "stripPointer": func(s string) string { return strings.TrimPrefix(s, "*") }, | |
| "getterToSetter": func(getter string) string { | |
| // Convert "GetSignal().GetLineageInfo()" to "Signal.LineageInfo" | |
| parts := strings.Split(getter, ".") | |
| var result []string | |
| for _, part := range parts { | |
| part = strings.TrimPrefix(part, "Get") | |
| part = strings.TrimSuffix(part, "()") | |
| result = append(result, part) | |
| } | |
| return strings.Join(result, ".") | |
| }, |
🧰 Tools
🪛 GitHub Actions: Style
[error] 333-333: golangci-lint reported file is not properly formatted (gofmt) at tools/generate-helpers/pg-table-bindings/funcs.go:333:1.
🪛 GitHub Check: golangci-lint
[failure] 333-333:
File is not properly formatted (gofmt)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/generate-helpers/pg-table-bindings/funcs.go` around lines 329 - 344,
The file has formatting issues around the template helper block (notably the
getterToSetter anonymous function and nearby helpers like joinPath and
stripPointer) causing golangci-lint/gofmt failures; fix by running gofmt (or go
fmt) on tools/generate-helpers/pg-table-bindings/funcs.go to reformat the block
so the anonymous functions (getterToSetter, joinPath, stripPointer) and their
surrounding commas/indentation conform to gofmt rules, then commit the modified
file.
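The suggested helper is self-contained, so its behavior can be checked in isolation. A minimal runnable sketch (a standalone reimplementation for illustration, not the generator's actual package):

```go
package main

import (
	"fmt"
	"strings"
)

// getterToSetter converts a chained getter expression such as
// "GetSignal().GetLineageInfo()" into a field path like "Signal.LineageInfo",
// mirroring the template helper in funcs.go.
func getterToSetter(getter string) string {
	parts := strings.Split(getter, ".")
	var result []string
	for _, part := range parts {
		part = strings.TrimPrefix(part, "Get")
		part = strings.TrimSuffix(part, "()")
		result = append(result, part)
	}
	return strings.Join(result, ".")
}

func main() {
	fmt.Println(getterToSetter("GetSignal().GetLineageInfo()")) // Signal.LineageInfo
	fmt.Println(getterToSetter("GetImage()"))                   // Image
}
```

Note the helper assumes every segment is a `GetX()` call; a bare field segment would pass through unchanged.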
    &storage.ProcessIndicatorJsonb{}:        resources.DeploymentExtension,
    &storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,

Fix formatting to pass CI.
The golangci-lint check is failing due to improper formatting. The extra spaces for alignment don't conform to gofmt standards.

Proposed fix

    - &storage.ProcessIndicatorJsonb{}:          resources.DeploymentExtension,
    - &storage.ProcessIndicatorNoSerialized{}:   resources.DeploymentExtension,
    + &storage.ProcessIndicatorJsonb{}:        resources.DeploymentExtension,
    + &storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,

📝 Committable suggestion

    &storage.ProcessIndicatorJsonb{}:        resources.DeploymentExtension,
    &storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,
🧰 Tools
🪛 GitHub Check: golangci-lint
[failure] 99-99:
File is not properly formatted (gofmt)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/generate-helpers/pg-table-bindings/list.go` around lines 99 - 100, The
map entries using &storage.ProcessIndicatorJsonb{} and
&storage.ProcessIndicatorNoSerialized{} currently contain extra alignment spaces
that break gofmt/golangci-lint; open the map in list.go and remove the extra
spaces so each key:value pair uses normal Go spacing (e.g.
"&storage.ProcessIndicatorJsonb{}: resources.DeploymentExtension," and
"&storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,") then
run gofmt/golangci-lint to confirm formatting is fixed.
    func scanRow(row pgx.Row) (*storeType, error) {
        var data []byte
        if err := row.Scan(&data); err != nil {
            return nil, err
        }
        msg := &storeType{}
        if err := protojson.Unmarshal(data, msg); err != nil {
            return nil, err
        }
        return msg, nil
    }

    func scanRows(rows pgx.Rows) (*storeType, error) {
        var data []byte
        if err := rows.Scan(&data); err != nil {
            return nil, err
        }
        msg := &storeType{}
        if err := protojson.Unmarshal(data, msg); err != nil {
            return nil, err
        }
        return msg, nil
    }
Use tolerant JSONB unmarshalling on persisted rows.
protojson.Unmarshal rejects unknown fields by default. This means an older binary cannot read JSONB rows written by a newer binary that added fields—a critical rolling-upgrade regression compared to binary protobuf storage, which preserves unknown fields.
Fix
    func scanRow(row pgx.Row) (*storeType, error) {
        var data []byte
        if err := row.Scan(&data); err != nil {
            return nil, err
        }
        msg := &storeType{}
    -   if err := protojson.Unmarshal(data, msg); err != nil {
    +   if err := (protojson.UnmarshalOptions{DiscardUnknown: true}).Unmarshal(data, msg); err != nil {
            return nil, err
        }
        return msg, nil
    }

    func scanRows(rows pgx.Rows) (*storeType, error) {
        var data []byte
        if err := rows.Scan(&data); err != nil {
            return nil, err
        }
        msg := &storeType{}
    -   if err := protojson.Unmarshal(data, msg); err != nil {
    +   if err := (protojson.UnmarshalOptions{DiscardUnknown: true}).Unmarshal(data, msg); err != nil {
            return nil, err
        }
        return msg, nil
    }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/generate-helpers/pg-table-bindings/store.go.tpl` around lines 381 -
403, The JSONB unmarshalling currently uses protojson.Unmarshal which rejects
unknown fields; update both scanRow and scanRows to use a tolerant unmarshal by
calling protojson.UnmarshalOptions{DiscardUnknown:true}.Unmarshal(data, msg)
(replace protojson.Unmarshal(data, msg)), referencing the functions scanRow and
scanRows and the storeType message so the JSONB rows tolerate unknown future
fields during rolling upgrades.
    for _, obj := range objs {
        {{- range $field := $fields }}
        {{- if canUnnest $field }}
        arr_{{ scanVarName $field }} = append(arr_{{ scanVarName $field }}, {{ unnestAppendExpr $field }})
        {{- end }}
        {{- end }}
        {{- range $child := $schema.Children }}
        for childIdx, child := range obj.{{ getterToSetter $child.ObjectGetter }} {
            _ = child
            {{- range $field := $child.DBColumnFields }}
            {{- if eq $field.ColumnName "idx" }}
            arr_child_{{ scanVarName $field }} = append(arr_child_{{ scanVarName $field }}, childIdx)
            {{- else if $field.ObjectGetter.IsVariable }}
            arr_child_{{ scanVarName $field }} = append(arr_child_{{ scanVarName $field }}, {{ $singlePK.Getter "obj" }})
            {{- else }}
            arr_child_{{ scanVarName $field }} = append(arr_child_{{ scanVarName $field }}, child.{{ joinPath (setterPath $field) }})
            {{- end }}
            {{- end }}
        }
        {{- end }}
Keep the bulk child path nil-safe.
The unnest code dereferences both the parent submessage and each child pointer directly. A batch containing a nil nested message or a nil child entry will panic here, while the per-row insert path still succeeds because it goes through getters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/generate-helpers/pg-table-bindings/store.go.tpl` around lines 425 -
444, The bulk unnest loop can panic when a parent nested message or an
individual child is nil because it dereferences obj.{{ getterToSetter
$child.ObjectGetter }} and child.* directly; update the loop that iterates over
objs and for childIdx, child := range obj.{{ getterToSetter $child.ObjectGetter
}} to first check the parent submessage for nil (skip or append nil-equivalents
for all child arrays when parent is nil) and for each child check if child ==
nil before accessing child.{{ joinPath (setterPath $field) }}; when child is
nil, append the appropriate zero/NULL placeholder (or use the $singlePK.Getter
"obj" for parent references) instead of dereferencing so arr_child_{{
scanVarName $field }} remains safe for batch inserts.
    func FetchChildren(ctx context.Context, db postgres.DB, objs []*storeType) error {
        if len(objs) == 0 {
            return nil
        }
        objsByID := make(map[string]*storeType, len(objs))
        ids := make([]string, 0, len(objs))
        for _, obj := range objs {
            id := pkGetter(obj)
            objsByID[id] = obj
            ids = append(ids, id)
        }
        {{- range $child := .Schema.Children }}
        {{- $parentPKs := $.Schema.PrimaryKeys }}
        {
            q := "SELECT {{ range $index, $field := $child.DBColumnFields }}{{if $index}}, {{end}}{{$field.ColumnName}}{{end}} FROM {{$child.Table}} WHERE {{range $index, $field := $child.FieldsReferringToParent}}{{if $index}} AND {{end}}{{$field.ColumnName}} = ANY(${{add $index 1}}::uuid[]){{end}} ORDER BY {{range $index, $field := $child.FieldsReferringToParent}}{{if $index}}, {{end}}{{$field.ColumnName}}{{end}}, idx"
            rows, err := db.Query(ctx, q, ids)
FetchChildren only works for a single UUID parent key.
This helper hard-codes map[string]*storeType, []string, ::uuid[], and one query argument even though the WHERE clause is generated from FieldsReferringToParent. Composite parents or non-UUID parent IDs will generate invalid SQL here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/generate-helpers/pg-table-bindings/store.go.tpl` around lines 524 -
539, FetchChildren assumes a single UUID parent key and builds objsByID as
map[string]*storeType, ids []string, and the SQL uses ANY($1::uuid[]) with only
one query argument; to fix, make FetchChildren handle arbitrary primary key
types and composite keys by (1) changing the in-memory key map to use the actual
primary key tuple type returned by pkGetter (or serialize composite keys
consistently), (2) building a parameter slice matching the number of parent key
columns in FieldsReferringToParent and passing one argument per column to
db.Query, (3) removing the hard-coded ::uuid[] cast and instead use appropriate
type casts or parameter binding based on each parent PK column type, and (4)
update the SQL q construction and the call to db.Query (the q variable and rows,
err := db.Query(ctx, q, ...)) to supply the correct number and form of arguments
so composite and non-UUID parents work correctly.
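One way to generalize the helper is to emit one `ANY($n)` parameter per parent-key column instead of a single hard-coded `$1::uuid[]`. A sketch, with illustrative column names and cast rules rather than the template's actual data model:

```go
package main

import (
	"fmt"
	"strings"
)

// buildParentFilter builds a WHERE clause with one ANY($n) parameter per
// parent-key column. casts maps a column to an optional array cast
// (empty string means no cast), and vals holds the per-column key arrays
// to pass as query arguments in the same order.
func buildParentFilter(cols []string, casts map[string]string, vals map[string][]any) (string, []any) {
	var clauses []string
	var args []any
	for i, col := range cols {
		clauses = append(clauses, fmt.Sprintf("%s = ANY($%d%s)", col, i+1, casts[col]))
		args = append(args, vals[col])
	}
	return strings.Join(clauses, " AND "), args
}

func main() {
	where, args := buildParentFilter(
		[]string{"deployments_id", "container_name"},
		map[string]string{"deployments_id": "::uuid[]"},
		map[string][]any{
			"deployments_id": {"3f2c0e1a-0000-0000-0000-000000000000"},
			"container_name": {"nginx"},
		},
	)
	fmt.Println(where) // deployments_id = ANY($1::uuid[]) AND container_name = ANY($2)
	fmt.Println(len(args))
}
```

The in-memory lookup map would need an analogous change, keyed on the full key tuple (or a stable serialization of it) rather than a single string ID.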
feat: add --no-serialized store generator option
Add a --no-serialized flag to the pg-table-bindings generator that
produces stores without the serialized bytea column. All proto fields
become individual DB columns, and reads reconstruct the proto from
column values instead of unmarshaling a blob.
Key design decisions:
- Writes use unnest()-based bulk INSERT for parent tables (single SQL
statement for all rows) with per-row UPDATE fallback for array columns
that can't be unnested (e.g., text[]). Child table rows are also
bulk-inserted via unnest.
- Reads scan individual columns into Go variables and construct the
proto directly. Child table fetching is opt-in via the exported
FetchChildren function — callers must explicitly request it since
analysis showed no existing ProcessIndicator caller reads child data.
- The search framework's GET query path selects individual columns
instead of the serialized blob when NoSerialized is set.
- A new NoSerializedStore type in pkg/search/postgres handles the
store lifecycle without requiring the ClonedUnmarshaler constraint.
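The unnest()-based bulk insert mentioned above can be sketched as SQL-text generation. The table and column names below are illustrative, not the generated store's actual output:

```go
package main

import (
	"fmt"
	"strings"
)

// bulkInsertSQL builds a single INSERT that expands one array parameter per
// column via unnest(), so N rows need only one SQL statement: the caller
// binds one Go slice per column instead of N parameter tuples.
func bulkInsertSQL(table string, cols []string, types []string) string {
	unnests := make([]string, len(cols))
	for i := range cols {
		unnests[i] = fmt.Sprintf("$%d::%s[]", i+1, types[i])
	}
	return fmt.Sprintf(
		"INSERT INTO %s (%s) SELECT * FROM unnest(%s)",
		table, strings.Join(cols, ", "), strings.Join(unnests, ", "),
	)
}

func main() {
	fmt.Println(bulkInsertSQL(
		"process_indicators",
		[]string{"id", "container_name", "signal_time"},
		[]string{"uuid", "text", "timestamptz"},
	))
}
```

Multi-argument `unnest(...)` in a FROM clause yields one output column per argument, which is what makes the single-statement form possible; the text[] limitation in the commit message arises because an array-typed column would need an array-of-arrays parameter.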
Includes a copy of ProcessIndicator as ProcessIndicatorNoSerialized
for integration testing and benchmarking. Benchmark results at 1000
objects on Apple M3 Pro:
Writes (UpsertMany 1000):
Serialized: 11.7 ms
NoSerialized: 30.9 ms (2.6x, driven by child table inserts)
Reads (GetMany 100):
Serialized: 181 µs
NoSerialized: 260 µs (1.4x)
Reads (Get single):
Serialized: 48 µs
NoSerialized: 62 µs (1.3x)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
feat: add --jsonb store generator option
Add a --jsonb flag to the pg-table-bindings generator that stores the
serialized column as jsonb instead of bytea. The data is marshaled with
protojson.Marshal and unmarshaled with protojson.Unmarshal, making it
human-readable and SQL-queryable while preserving the single-column
read/write pattern of the existing stores.
Key changes:
- Walker: Jsonb bool on Schema, getSerializedField returns jsonb type
- Generator: --jsonb CLI flag, WithJsonb() walk option
- Templates: conditional protojson marshal/unmarshal, scanRow/scanRows
for jsonb that scan []byte and call protojson.Unmarshal
- Reuses NoSerializedStore infrastructure (no new store type needed)
Includes ProcessIndicatorJsonb example with integration tests and
benchmarks comparing bytea vs jsonb vs no-serialized performance.
Partially generated by AI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
perf: scan directly into proto fields in NoSerialized store reads
Eliminate the intermediate buildFromScan function and scan directly into
proto struct fields where possible. Fields that need type conversion
(datetime→timestamppb, enum→int32, uuid strings) still use temp vars,
but all other fields (strings, uints, bools, slices) scan straight into
&obj.Field, skipping the intermediate copy.
For ProcessIndicatorNoSerialized (20 columns), this reduces temp vars
from 20 to 6.
Partially generated by AI.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs: add design doc and benchmark analysis for --jsonb store option
Comprehensive comparison of bytea vs jsonb vs no-serialized store
strategies covering Go-side performance (writes, reads, allocations),
Postgres-side performance (storage, buffer hits, EXPLAIN ANALYZE),
migration tradeoffs, and the direct-scan optimization.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Description
change me!
User-facing documentation
Testing and quality
Automated testing
How I validated my change
change me!