
research: add --no-serialized store generator option #19669

Draft
dashrews78 wants to merge 4 commits into master from dashrews/experiment-non-serialized

Conversation

@dashrews78
Contributor

feat: add --no-serialized store generator option

Add a --no-serialized flag to the pg-table-bindings generator that
produces stores without the serialized bytea column. All proto fields
become individual DB columns, and reads reconstruct the proto from
column values instead of unmarshaling a blob.

Key design decisions:

  • Writes use unnest()-based bulk INSERT for parent tables (single SQL
    statement for all rows) with per-row UPDATE fallback for array columns
    that can't be unnested (e.g., text[]). Child table rows are also
    bulk-inserted via unnest.
  • Reads scan individual columns into Go variables and construct the
    proto directly. Child table fetching is opt-in via the exported
    FetchChildren function — callers must explicitly request it since
    analysis showed no existing ProcessIndicator caller reads child data.
  • The search framework's GET query path selects individual columns
    instead of the serialized blob when NoSerialized is set.
  • A new NoSerializedStore type in pkg/search/postgres handles the
    store lifecycle without requiring the ClonedUnmarshaler constraint.

Includes a copy of ProcessIndicator as ProcessIndicatorNoSerialized
for integration testing and benchmarking. Benchmark results at 1000
objects on Apple M3 Pro:

Writes (UpsertMany 1000):
  Serialized:   11.7 ms
  NoSerialized: 30.9 ms (2.6x, driven by child table inserts)

Reads (GetMany 100):
  Serialized:   181 µs
  NoSerialized: 260 µs (1.4x)

Reads (Get single):
  Serialized:   48 µs
  NoSerialized: 62 µs (1.3x)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add --jsonb store generator option

Add a --jsonb flag to the pg-table-bindings generator that stores the
serialized column as jsonb instead of bytea. The data is marshaled with
protojson.Marshal and unmarshaled with protojson.Unmarshal, making it
human-readable and SQL-queryable while preserving the single-column
read/write pattern of the existing stores.

Key changes:

  • Walker: Jsonb bool on Schema, getSerializedField returns jsonb type
  • Generator: --jsonb CLI flag, WithJsonb() walk option
  • Templates: conditional protojson marshal/unmarshal, scanRow/scanRows
    for jsonb that scan []byte and call protojson.Unmarshal
  • Reuses NoSerializedStore infrastructure (no new store type needed)

Includes ProcessIndicatorJsonb example with integration tests and
benchmarks comparing bytea vs jsonb vs no-serialized performance.
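The jsonb read path keeps the single-column scan shape but swaps the unmarshal step. A minimal sketch of that scanRow pattern is below — it uses encoding/json on a plain struct as a stand-in, since the real template calls protojson.Unmarshal on the generated proto message, which can't be reproduced self-contained here:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indicator is a plain struct standing in for the generated proto message;
// the real store unmarshals into storage.ProcessIndicatorJsonb via protojson.
type indicator struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// scanRow mirrors the template's jsonb read path: the serialized column is
// scanned as []byte and decoded from JSON, rather than proto.Unmarshal on
// a bytea blob.
func scanRow(data []byte) (*indicator, error) {
	msg := &indicator{}
	if err := json.Unmarshal(data, msg); err != nil {
		return nil, err
	}
	return msg, nil
}

func main() {
	obj, err := scanRow([]byte(`{"id":"a","name":"sh"}`))
	fmt.Println(err == nil, obj.Name)
}
```

Because the column is jsonb rather than bytea, the same data can also be queried directly in SQL (e.g. with `->>` operators), which is the human-readability benefit described above.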

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

perf: scan directly into proto fields in NoSerialized store reads

Eliminate the intermediate buildFromScan function and scan directly into
proto struct fields where possible. Fields that need type conversion
(datetime→timestamppb, enum→int32, uuid strings) still use temp vars,
but all other fields (strings, uints, bools, slices) scan straight into
&obj.Field, skipping the intermediate copy.

For ProcessIndicatorNoSerialized (20 columns), this reduces temp vars
from 20 to 6. Benchmarks show:

  • GetMany/500: 31% fewer allocs (17,654→12,154), 9% faster
  • GetMany/100: 30% fewer allocs (3,647→2,547), 4% faster
  • GetSingle: 7% fewer allocs (166→155)

Partially generated by AI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add design doc and benchmark analysis for --jsonb store option

Comprehensive comparison of bytea vs jsonb vs no-serialized store
strategies covering Go-side performance (writes, reads, allocations),
Postgres-side performance (storage, buffer hits, EXPLAIN ANALYZE),
migration tradeoffs, and the direct-scan optimization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Description

change me!

User-facing documentation

Testing and quality

  • the change is production ready: the change is GA, or otherwise the functionality is gated by a feature flag
  • CI results are inspected

Automated testing

  • added unit tests
  • added e2e tests
  • added regression tests
  • added compatibility tests
  • modified existing tests

How I validated my change

change me!

dashrews78 and others added 4 commits March 27, 2026 10:55
@dashrews78
Contributor Author

This change is part of the following stack:

Change managed by git-spice.

@openshift-ci

openshift-ci bot commented Mar 27, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@rhacs-bot
Contributor

Images are ready for the commit at 25e3e07.

To use with deploy scripts, first export MAIN_IMAGE_TAG=4.11.x-464-g25e3e0720a.

@coderabbitai

coderabbitai bot commented Mar 27, 2026

📝 Walkthrough

Summary by CodeRabbit

Release Notes

  • New Features

    • Added JSONB and No-Serialized storage formats for ProcessIndicator data.
    • New CLI flags to select storage format during table generation.
    • Added search fields for new storage variants.
  • Tests

    • Added benchmarks comparing CRUD performance across storage formats.
    • Added integration tests for storage functionality and access control.
  • Documentation

    • Added design documentation for JSONB storage implementation.

Walkthrough

This pull request introduces two new storage variants for ProcessIndicator: JSONB-backed storage and no-serialized (column-based) storage. It adds protocol buffer definitions, schema generation infrastructure enhancements, store implementations with custom scanning logic, comprehensive tests, and benchmarks. The changes include updates to the walker and schema modules to support configurable serialization modes, new generic store implementations via NoSerializedStore, and code generation templates that conditionally wire the appropriate storage backends.

Changes

Cohort / File(s) Summary
Design & Protobuf Definitions
central/processindicator_jsonb/DESIGN.md, proto/storage/process_indicator_jsonb.proto, proto/storage/process_indicator_no_serialized.proto
Added design documentation and protocol buffer message definitions for JSONB-backed and no-serialized ProcessIndicator variants, including signal metadata and lineage information structures.
Schema & Walker Infrastructure
pkg/postgres/walker/schema.go, pkg/postgres/walker/walker.go, pkg/postgres/schema/process_indicator_jsonbs.go, pkg/postgres/schema/process_indicator_no_serializeds.go
Extended walker/schema infrastructure with NoSerialized and Jsonb configuration fields, added WalkOption functional pattern for conditional schema building, updated serialized column type selection and Field.Include() logic, and generated schema files for both new storage variants.
JSONB Store Implementation
central/processindicator_jsonb/store/postgres/gen.go, central/processindicator_jsonb/store/postgres/store.go, central/processindicator_jsonb/store/postgres/store_test.go
Implemented JSONB-backed store using NoSerializedStore with protojson.Marshal/Unmarshal for serialization, custom row scanning, and integration tests validating CRUD, access control, and bulk operations.
No-Serialized Store Implementation
central/processindicator_no_serialized/store/postgres/gen.go, central/processindicator_no_serialized/store/postgres/store.go, central/processindicator_no_serialized/store/postgres/store_test.go
Implemented column-based no-serialized store with parent/child table support, unnest-based bulk upserts, explicit lineage fetching, and integration tests covering CRUD, access control, and bulk operations.
Search & Query Infrastructure
pkg/search/postgres/common.go, pkg/search/postgres/no_serialized_store.go, pkg/search/options.go
Added generic NoSerializedStore[T] interface with comprehensive read/write/delete operations, RowScanner/RowsScanner types for custom unmarshaling, updated GROUP BY logic for no-serialized queries, and new field label constants for both storage variants.
Code Generation Templates & Tools
tools/generate-helpers/pg-table-bindings/main.go, tools/generate-helpers/pg-table-bindings/schema.go.tpl, tools/generate-helpers/pg-table-bindings/store.go.tpl, tools/generate-helpers/pg-table-bindings/funcs.go, tools/generate-helpers/pg-table-bindings/list.go
Added --no-serialized and --jsonb CLI flags, enhanced code generation templates to conditionally wire walker options, added helper functions for type-aware scanning and unnest mapping, and registered new storage types in the resource registry.
Benchmarks
central/processindicator_jsonb/store/postgres/bench_test.go, central/processindicator_no_serialized/store/postgres/bench_test.go, central/processindicator_no_serialized/store/postgres/pg_internals_test.go
Added comprehensive integration benchmarks comparing CRUD performance across bytea/jsonb/no-serialized storage modes, and PostgreSQL internals inspection test measuring table size, row size, and query plan metrics.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 56.25%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.
  • Description check — ❓ Inconclusive: the PR description provides detailed context for all three features, including commit messages, benchmarks, and design decisions, but the standard template's required sections (CHANGELOG, documentation PR, testing validation, CI inspection) remain unchecked with placeholder text. Resolution: complete the template sections — update CHANGELOG.md, confirm documentation/feature-flag status, document CI inspection, and provide explicit validation details for testing performed.

✅ Passed checks (1 passed)

  • Title check — ✅ Passed: the title covers one of the main features (--no-serialized) but omits two other significant features (the --jsonb flag and the performance optimization), presenting a partially but not fully representative overview of the changeset.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

🧹 Nitpick comments (1)
tools/generate-helpers/pg-table-bindings/main.go (1)

181-188: Consider adding mutual exclusion validation for --no-serialized and --jsonb.

Both flags can currently be set simultaneously, but they represent mutually exclusive storage strategies. --jsonb uses a jsonb serialized column while --no-serialized eliminates the serialized column entirely.

Proposed fix
 		var walkOpts []walker.WalkOption
+		if props.NoSerialized && props.Jsonb {
+			log.Fatal("--no-serialized and --jsonb are mutually exclusive")
+		}
 		if props.NoSerialized {
 			walkOpts = append(walkOpts, walker.WithNoSerialized())
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/generate-helpers/pg-table-bindings/main.go` around lines 181 - 188, The
flags props.NoSerialized and props.Jsonb are mutually exclusive but currently
both can be set; validate and error early before building walkOpts: in main
(where walkOpts is constructed and walker.Walk is called) add a check that if
props.NoSerialized && props.Jsonb then return an error/exit with a clear
message; reference the variables props.NoSerialized, props.Jsonb and the walk
option helpers walker.WithNoSerialized / walker.WithJsonb so the validation
occurs before creating walkOpts and calling walker.Walk.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@central/processindicator_jsonb/store/postgres/bench_test.go`:
- Around line 237-247: The NoSerialized benchmark currently uses
noSerializedStore.New(db.DB) which creates a store without a ChildFetcher so
nSt.Get and GetMany only return parent records; change the benchmark to either
instantiate the no-serialized store with the same ChildFetcher used in the other
benchmarks (i.e., construct the store with the child fetcher and use that
instance in the NoSerialized sub-bench) or add a new sub-benchmark (e.g.,
"NoSerialized+Children") that uses noSerializedStore.New(db.DB, ChildFetcher:
<existing child fetcher>) and performs the same Get/GetMany loops (using
noSerializedIDs and seedCount) so the LineageInfo is populated and the
comparison to Serialized/Jsonb is apples-to-apples.

In `@central/processindicator_jsonb/store/postgres/pg_internals_test.go`:
- Around line 65-98: The test is only measuring the parent table
"process_indicator_no_serializeds" and omits its child table
"process_indicator_no_serializeds_lineage_infos", underreporting sizes/counts;
update the queries in pg_internals_test.go (the loop that builds sizeResult and
the separate block for lineage) so that when tbl ==
"process_indicator_no_serializeds" you aggregate both tables (e.g., use
pg_total_relation_size on each table and sum, or use a single query that SELECTs
pg_total_relation_size('parent')+pg_total_relation_size('parent_lineage_infos')
AS total and similarly sum pg_table_size and pg_indexes_size and COUNT(*) from
both tables) and populate r.totalSize/r.tableSize/r.indexSize/r.rowCount
accordingly, or alternatively change the logged label to explicitly state
"parent-table-only" where you do not aggregate; apply the same change to the
later block that currently queries
process_indicator_no_serializeds_lineage_infos so the reported footprint
represents the complete no-serialized footprint.

In `@central/processindicator_no_serialized/store/postgres/bench_test.go`:
- Around line 495-507: The benchmark is measuring parent-only reads twice
because nSt (created via New(db.DB)) lacks a ChildFetcher, so the
"WithChildFetch" sub-benchmarks end up using the same no-child semantics as
nStNoChildren; fix by ensuring one store is constructed with a real child
fetcher: create the child-enabled store by passing the ChildFetcher (e.g.,
NoSerializedStoreOpts{ChildFetcher: FetchChildren} or the equivalent option)
when constructing the store used for "WithChildFetch" (symbols to update: New,
nSt, pgSearch.NewNoSerializedStore, nStNoChildren,
NoSerializedStoreOpts{ChildFetcher: FetchChildren}, FetchChildren), and apply
the same change in the other affected block around lines 550-560.

In `@central/processindicator_no_serialized/store/postgres/store.go`:
- Around line 231-244: The COPY generator increments idx before writing it to
the row, causing an off-by-one shift (first lineage row written with idx=1). In
the pgx.CopyFromFunc closure (inputRows) change the order so the current index
is used for the row (use idx or a local rowIdx := idx) and only then increment
idx (or increment after returning the row) so COPY produces the same 0-based
ordering as the batched/un-nest paths; update references to
procesIndicatorNoSerializedID, idx, and pgx.CopyFromFunc accordingly.

In `@pkg/postgres/schema/process_indicator_jsonbs.go`:
- Around line 32-35: The code is using an anonymous numeric cast
v1.SearchCategory(86) in the SetOptionsMap and RegisterCategoryToTable calls,
which lacks a corresponding named enum value in the proto definitions and makes
the code brittle. Add a new named SearchCategory constant to
proto/api/v1/search_service.proto file, then replace both instances of the raw
v1.SearchCategory(86) cast in the SetOptionsMap and RegisterCategoryToTable
function calls within the process_indicator_jsonbs.go file with the new named
constant instead of the magic number.

In `@pkg/postgres/schema/process_indicator_no_serializeds.go`:
- Around line 37-41: The code registers ProcessIndicatorNoSerialized schema
under an unnamed enum value v1.SearchCategory(85); update the registration to
use a real declared enum constant (e.g., replace v1.SearchCategory(85) with the
appropriate generated enum like v1.SearchCategory_<NAME>) or, if this is only
for tests, remove mapping.RegisterCategoryToTable and the SetOptionsMap call for
production code and keep the schema confined to test-only wiring; look for
symbols ProcessIndicatorNoSerialized, v1.SearchCategory(85),
schema.SetOptionsMap, and mapping.RegisterCategoryToTable to make the change and
ensure the chosen enum exists in generated/api/v1/search_service.pb.go before
committing.

In `@pkg/postgres/walker/walker.go`:
- Around line 422-429: The InlinedSubMessage being appended only stores
FieldName as structField.Name, which loses the full nested access path and
breaks initializer generation; change the appended data so FieldName (or a new
Path field) contains the full assignment path using ctx.Getter(structField.Name)
(or combine GetterPrefix and FieldName) instead of just structField.Name, so
schema.Root().InlinedSubMessages entries provide the complete nested access
expression for codegen (refer to InlinedSubMessage, FieldName, GetterPrefix,
ctx.Getter, and structField.Name).

In `@pkg/search/postgres/no_serialized_store.go`:
- Around line 44-45: fetchChildren currently invokes the fetcher with s.db which
causes child queries to bypass any transaction stored in ctx; update
fetchChildren (and the similar calls around the 159-164 region) to extract the
transactional DB from ctx (the same DB used by Get/GetMany/GetByQuery) and pass
that DB into the ChildFetcher[T] instead of s.db so child queries run inside the
active transaction; locate usages of fetchChildren and the call site where
ChildFetcher is invoked and replace s.db with the DB obtained from the ctx
transaction helper (or fall back to s.db only if no tx exists).

In `@tools/generate-helpers/pg-table-bindings/funcs.go`:
- Around line 329-344: The file has formatting issues around the template helper
block (notably the getterToSetter anonymous function and nearby helpers like
joinPath and stripPointer) causing golangci-lint/gofmt failures; fix by running
gofmt (or go fmt) on tools/generate-helpers/pg-table-bindings/funcs.go to
reformat the block so the anonymous functions (getterToSetter, joinPath,
stripPointer) and their surrounding commas/indentation conform to gofmt rules,
then commit the modified file.

In `@tools/generate-helpers/pg-table-bindings/list.go`:
- Around line 99-100: The map entries using &storage.ProcessIndicatorJsonb{} and
&storage.ProcessIndicatorNoSerialized{} currently contain extra alignment spaces
that break gofmt/golangci-lint; open the map in list.go and remove the extra
spaces so each key:value pair uses normal Go spacing (e.g.
"&storage.ProcessIndicatorJsonb{}: resources.DeploymentExtension," and
"&storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,") then
run gofmt/golangci-lint to confirm formatting is fixed.

In `@tools/generate-helpers/pg-table-bindings/store.go.tpl`:
- Around line 524-539: FetchChildren assumes a single UUID parent key and builds
objsByID as map[string]*storeType, ids []string, and the SQL uses
ANY($1::uuid[]) with only one query argument; to fix, make FetchChildren handle
arbitrary primary key types and composite keys by (1) changing the in-memory key
map to use the actual primary key tuple type returned by pkGetter (or serialize
composite keys consistently), (2) building a parameter slice matching the number
of parent key columns in FieldsReferringToParent and passing one argument per
column to db.Query, (3) removing the hard-coded ::uuid[] cast and instead use
appropriate type casts or parameter binding based on each parent PK column type,
and (4) update the SQL q construction and the call to db.Query (the q variable
and rows, err := db.Query(ctx, q, ...)) to supply the correct number and form of
arguments so composite and non-UUID parents work correctly.
- Around line 425-444: The bulk unnest loop can panic when a parent nested
message or an individual child is nil because it dereferences obj.{{
getterToSetter $child.ObjectGetter }} and child.* directly; update the loop that
iterates over objs and for childIdx, child := range obj.{{ getterToSetter
$child.ObjectGetter }} to first check the parent submessage for nil (skip or
append nil-equivalents for all child arrays when parent is nil) and for each
child check if child == nil before accessing child.{{ joinPath (setterPath
$field) }}; when child is nil, append the appropriate zero/NULL placeholder (or
use the $singlePK.Getter "obj" for parent references) instead of dereferencing
so arr_child_{{ scanVarName $field }} remains safe for batch inserts.
- Around line 381-403: The JSONB unmarshalling currently uses
protojson.Unmarshal which rejects unknown fields; update both scanRow and
scanRows to use a tolerant unmarshal by calling
protojson.UnmarshalOptions{DiscardUnknown:true}.Unmarshal(data, msg) (replace
protojson.Unmarshal(data, msg)), referencing the functions scanRow and scanRows
and the storeType message so the JSONB rows tolerate unknown future fields
during rolling upgrades.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Central YAML (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 65ada43b-debc-4691-b942-b11f5de4471b

📥 Commits

Reviewing files that changed from the base of the PR and between 0737966 and 25e3e07.

⛔ Files ignored due to path filters (4)
  • generated/storage/process_indicator_jsonb.pb.go is excluded by !**/*.pb.go, !**/generated/**
  • generated/storage/process_indicator_jsonb_vtproto.pb.go is excluded by !**/*.pb.go, !**/generated/**
  • generated/storage/process_indicator_no_serialized.pb.go is excluded by !**/*.pb.go, !**/generated/**
  • generated/storage/process_indicator_no_serialized_vtproto.pb.go is excluded by !**/*.pb.go, !**/generated/**
📒 Files selected for processing (24)
  • central/processindicator_jsonb/DESIGN.md
  • central/processindicator_jsonb/store/postgres/bench_test.go
  • central/processindicator_jsonb/store/postgres/gen.go
  • central/processindicator_jsonb/store/postgres/pg_internals_test.go
  • central/processindicator_jsonb/store/postgres/store.go
  • central/processindicator_jsonb/store/postgres/store_test.go
  • central/processindicator_no_serialized/store/postgres/bench_test.go
  • central/processindicator_no_serialized/store/postgres/gen.go
  • central/processindicator_no_serialized/store/postgres/store.go
  • central/processindicator_no_serialized/store/postgres/store_test.go
  • pkg/postgres/schema/process_indicator_jsonbs.go
  • pkg/postgres/schema/process_indicator_no_serializeds.go
  • pkg/postgres/walker/schema.go
  • pkg/postgres/walker/walker.go
  • pkg/search/options.go
  • pkg/search/postgres/common.go
  • pkg/search/postgres/no_serialized_store.go
  • proto/storage/process_indicator_jsonb.proto
  • proto/storage/process_indicator_no_serialized.proto
  • tools/generate-helpers/pg-table-bindings/funcs.go
  • tools/generate-helpers/pg-table-bindings/list.go
  • tools/generate-helpers/pg-table-bindings/main.go
  • tools/generate-helpers/pg-table-bindings/schema.go.tpl
  • tools/generate-helpers/pg-table-bindings/store.go.tpl

Comment on lines +237 to +247
b.Run("NoSerialized", func(b *testing.B) {
i := 0
for b.Loop() {
id := noSerializedIDs[i%seedCount]
obj, exists, err := nSt.Get(ctx, id)
require.NoError(b, err)
require.True(b, exists)
require.NotNil(b, obj)
i++
}
})

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Generated no-serialized store:"
sed -n '1,260p' central/processindicator_no_serialized/store/postgres/store.go

echo
echo "Shared no-serialized store implementation:"
sed -n '1,320p' pkg/search/postgres/no_serialized_store.go

echo
echo "FetchChildren and read-path call sites:"
rg -n -C3 'FetchChildren|GetMany\(|Get\(' central/processindicator_no_serialized/store/postgres pkg/search/postgres

Repository: stackrox/stackrox

Length of output: 50373


🏁 Script executed:

cat -n central/processindicator_jsonb/store/postgres/bench_test.go | sed -n '213,315p'

Repository: stackrox/stackrox

Length of output: 3446


Fix the NoSerialized benchmarks to include child fetches for fair comparison.

The benchmark at lines 237–247 and 305–312 uses noSerializedStore.New(db.DB), which creates a store without a ChildFetcher. This means Get() and GetMany() return parent records only, without populating LineageInfo from the child table. In contrast, the Serialized and Jsonb benchmarks reconstruct the full proto including all child fields. The comparison is not apples-to-apples.

Either use a store instance configured with ChildFetcher (as available in the no-serialized test setup), or clearly document that this benchmark measures parent-only reads and add a separate sub-benchmark for the full-fetch case for completeness.


Comment on lines +65 to +98
for _, tbl := range []string{
"process_indicators",
"process_indicator_jsonbs",
"process_indicator_no_serializeds",
} {
var r sizeResult
r.table = tbl
err := db.DB.QueryRow(ctx, fmt.Sprintf(`
SELECT
pg_size_pretty(pg_total_relation_size('%s')) AS total,
pg_size_pretty(pg_table_size('%s')) AS tbl,
pg_size_pretty(pg_indexes_size('%s')) AS idx,
(SELECT count(*) FROM %s)::int AS cnt
`, tbl, tbl, tbl, tbl)).Scan(&r.totalSize, &r.tableSize, &r.indexSize, &r.rowCount)
require.NoError(t, err)
t.Logf(" %-40s rows=%-5d total=%-10s table=%-10s indexes=%s",
r.table, r.rowCount, r.totalSize, r.tableSize, r.indexSize)
}

// Also show child table for no-serialized
{
var total, tbl, idx string
var cnt int
err := db.DB.QueryRow(ctx, `
SELECT
pg_size_pretty(pg_total_relation_size('process_indicator_no_serializeds_lineage_infos')),
pg_size_pretty(pg_table_size('process_indicator_no_serializeds_lineage_infos')),
pg_size_pretty(pg_indexes_size('process_indicator_no_serializeds_lineage_infos')),
(SELECT count(*) FROM process_indicator_no_serializeds_lineage_infos)::int
`).Scan(&total, &tbl, &idx, &cnt)
require.NoError(t, err)
t.Logf(" %-40s rows=%-5d total=%-10s table=%-10s indexes=%s",
" └─ _lineage_infos (child)", cnt, total, tbl, idx)
}

⚠️ Potential issue | 🟠 Major

Compare the complete no-serialized footprint, not just the parent table.

These sections use only process_indicator_no_serializeds for the no-serialized side, while the seeded objects also write child rows to process_indicator_no_serializeds_lineage_infos. That makes the no-serialized storage and EXPLAIN numbers systematically lower than the bytea/jsonb full-object numbers, and those figures appear to feed the tables in central/processindicator_jsonb/DESIGN.md. Either aggregate the child table into the no-serialized results or label the output as parent-table-only.

Also applies to: 149-239

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@central/processindicator_jsonb/store/postgres/pg_internals_test.go` around
lines 65 - 98, The test is only measuring the parent table
"process_indicator_no_serializeds" and omits its child table
"process_indicator_no_serializeds_lineage_infos", underreporting sizes/counts;
update the queries in pg_internals_test.go (the loop that builds sizeResult and
the separate block for lineage) so that when tbl ==
"process_indicator_no_serializeds" you aggregate both tables (e.g., use
pg_total_relation_size on each table and sum, or use a single query that SELECTs
pg_total_relation_size('parent')+pg_total_relation_size('parent_lineage_infos')
AS total and similarly sum pg_table_size and pg_indexes_size and COUNT(*) from
both tables) and populate r.totalSize/r.tableSize/r.indexSize/r.rowCount
accordingly, or alternatively change the logged label to explicitly state
"parent-table-only" where you do not aggregate; apply the same change to the
later block that currently queries
process_indicator_no_serializeds_lineage_infos so the reported footprint
represents the complete no-serialized footprint.

Comment on lines +495 to +507
nSt := New(db.DB)
// Also create a store without child fetcher to isolate parent-only reads
nStNoChildren := pgSearch.NewNoSerializedStore[storeType](
db.DB,
schema,
pkGetter,
insertIntoProcessIndicatorNoSerializeds,
nil,
scanRow,
scanRows,
nil, nil, nil,
targetResource,
)

⚠️ Potential issue | 🟠 Major

The child-fetch breakdown is measuring parent-only reads twice.

New(db.DB) does not wire ChildFetcher, so the later "WithChildFetch" sub-benchmarks are using the same no-child store semantics as nStNoChildren. The reported child-fetch delta will stay at zero until the second store is constructed with NoSerializedStoreOpts{ChildFetcher: FetchChildren}.

Also applies to: 550-560

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@central/processindicator_no_serialized/store/postgres/bench_test.go` around
lines 495 - 507, The benchmark is measuring parent-only reads twice because nSt
(created via New(db.DB)) lacks a ChildFetcher, so the "WithChildFetch"
sub-benchmarks end up using the same no-child semantics as nStNoChildren; fix by
ensuring one store is constructed with a real child fetcher: create the
child-enabled store by passing the ChildFetcher (e.g.,
NoSerializedStoreOpts{ChildFetcher: FetchChildren} or the equivalent option)
when constructing the store used for "WithChildFetch" (symbols to update: New,
nSt, pgSearch.NewNoSerializedStore, nStNoChildren,
NoSerializedStoreOpts{ChildFetcher: FetchChildren}, FetchChildren), and apply
the same change in the other affected block around lines 550-560.
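The wiring the review asks for can be sketched with a generic options pattern: a store only fetches child rows when a fetcher is supplied at construction, so the child-enabled benchmark must pass one explicitly. All names here (`newStore`, `storeOpts`, `childFetcher`, `getMany`) are hypothetical stand-ins for the PR's `New`, `NoSerializedStoreOpts`, and `FetchChildren`, not the actual StackRox API.

```go
package main

import "fmt"

type obj struct{ id, lineage string }

// childFetcher populates child-table data (e.g. LineageInfo) on fetched objects.
type childFetcher func(objs []*obj) error

type store struct{ fetchChildren childFetcher }

type storeOpts struct{ ChildFetcher childFetcher }

// newStore leaves child fetching disabled unless an option supplies a fetcher,
// mirroring the constructor behavior the review describes.
func newStore(opts ...storeOpts) *store {
	s := &store{}
	for _, o := range opts {
		if o.ChildFetcher != nil {
			s.fetchChildren = o.ChildFetcher
		}
	}
	return s
}

func (s *store) getMany(ids []string) []*obj {
	out := make([]*obj, 0, len(ids))
	for _, id := range ids {
		out = append(out, &obj{id: id}) // parent columns only
	}
	if s.fetchChildren != nil {
		_ = s.fetchChildren(out) // second round trip for child rows
	}
	return out
}

func main() {
	fetch := func(objs []*obj) error {
		for _, o := range objs {
			o.lineage = "child-row"
		}
		return nil
	}
	withChildren := newStore(storeOpts{ChildFetcher: fetch})
	parentOnly := newStore()
	fmt.Println(withChildren.getMany([]string{"a"})[0].lineage)
	fmt.Println(parentOnly.getMany([]string{"a"})[0].lineage)
}
```

A benchmark that constructs both variants this way would actually measure the child-fetch delta, instead of comparing two parent-only stores.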

Comment on lines +231 to +244
idx := 0
inputRows := pgx.CopyFromFunc(func() ([]any, error) {
if idx >= len(objs) {
return nil, nil
}
obj := objs[idx]
idx++

return []interface{}{
pgutils.NilOrUUID(procesIndicatorNoSerializedID),
idx,
obj.GetParentUid(),
obj.GetParentExecFilePath(),
}, nil

⚠️ Potential issue | 🟠 Major

COPY FROM shifts lineage indexes by one.

idx is incremented before it is written into the row, so the first lineage element is stored as idx = 1. That makes the COPY-based path disagree with the batched insert and bulk-unnest paths on row keys and ordering. This needs to be fixed in the generator too.

Possible fix
 	idx := 0
 	inputRows := pgx.CopyFromFunc(func() ([]any, error) {
 		if idx >= len(objs) {
 			return nil, nil
 		}
-		obj := objs[idx]
-		idx++
+		currentIdx := idx
+		obj := objs[idx]
+		idx++
 
 		return []interface{}{
 			pgutils.NilOrUUID(procesIndicatorNoSerializedID),
-			idx,
+			currentIdx,
 			obj.GetParentUid(),
 			obj.GetParentExecFilePath(),
 		}, nil
 	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@central/processindicator_no_serialized/store/postgres/store.go` around lines
231 - 244, The COPY generator increments idx before writing it to the row,
causing an off-by-one shift (first lineage row written with idx=1). In the
pgx.CopyFromFunc closure (inputRows) change the order so the current index is
used for the row (use idx or a local rowIdx := idx) and only then increment idx
(or increment after returning the row) so COPY produces the same 0-based
ordering as the batched/un-nest paths; update references to
procesIndicatorNoSerializedID, idx, and pgx.CopyFromFunc accordingly.
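The corrected closure can be exercised without a database. The sketch below mimics `pgx.CopyFromFunc`'s `func() ([]any, error)` contract (pgx is not imported); `rowSource` and the two-column rows are illustrative, not the generator's real column list. The key point is capturing the index before the increment so rows are keyed 0-based, matching the batched and bulk-unnest paths.

```go
package main

import "fmt"

// rowSource returns a row generator in the shape pgx.CopyFromFunc expects:
// it yields one row per call and nil when exhausted.
func rowSource(objs []string) func() ([]any, error) {
	idx := 0
	return func() ([]any, error) {
		if idx >= len(objs) {
			return nil, nil // no more rows
		}
		currentIdx := idx // use the pre-increment value as the row key
		obj := objs[idx]
		idx++
		return []any{currentIdx, obj}, nil
	}
}

func main() {
	next := rowSource([]string{"first", "second"})
	for {
		row, _ := next()
		if row == nil {
			break
		}
		fmt.Println(row[0], row[1])
	}
	// Rows come out keyed 0 and 1 — the original code returned 1 and 2
	// because it incremented idx before building the row.
}
```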

Comment on lines +32 to +35
schema.SetOptionsMap(search.Walk(v1.SearchCategory(86), "processindicatorjsonb", (*storage.ProcessIndicatorJsonb)(nil)))
schema.ScopingResource = resources.DeploymentExtension
RegisterTable(schema, CreateTableProcessIndicatorJsonbsStmt)
mapping.RegisterCategoryToTable(v1.SearchCategory(86), schema)

⚠️ Potential issue | 🟠 Major

Don't register an anonymous search category here.

v1.SearchCategory(86) is not a named enum value in the provided proto/api/v1/search_service.proto, so the schema registration and options map are hanging off an undefined category. That is brittle for any code that depends on generated enum names or shared category mappings. Please add a real SearchCategory constant in the proto and use it here instead of a raw cast.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/postgres/schema/process_indicator_jsonbs.go` around lines 32 - 35, The
code is using an anonymous numeric cast v1.SearchCategory(86) in the
SetOptionsMap and RegisterCategoryToTable calls, which lacks a corresponding
named enum value in the proto definitions and makes the code brittle. Add a new
named SearchCategory constant to proto/api/v1/search_service.proto file, then
replace both instances of the raw v1.SearchCategory(86) cast in the
SetOptionsMap and RegisterCategoryToTable function calls within the
process_indicator_jsonbs.go file with the new named constant instead of the
magic number.

Comment on lines +329 to +344
"scanVarType": scanVarType,
"scanVarName": scanVarName,
"setterPath": setterPath,
"joinPath": func(parts []string) string { return strings.Join(parts, ".") },
"stripPointer": func(s string) string { return strings.TrimPrefix(s, "*") },
"getterToSetter": func(getter string) string {
// Convert "GetSignal().GetLineageInfo()" to "Signal.LineageInfo"
parts := strings.Split(getter, ".")
var result []string
for _, part := range parts {
part = strings.TrimPrefix(part, "Get")
part = strings.TrimSuffix(part, "()")
result = append(result, part)
}
return strings.Join(result, ".")
},

⚠️ Potential issue | 🟡 Minor

Fix formatting to pass CI.

The golangci-lint check is failing due to improper formatting around line 333.

Proposed fix - run gofmt
 	"scanVarType":                  scanVarType,
 	"scanVarName":                  scanVarName,
 	"setterPath":                   setterPath,
-	"joinPath":                     func(parts []string) string { return strings.Join(parts, ".") },
-	"stripPointer":                func(s string) string { return strings.TrimPrefix(s, "*") },
+	"joinPath": func(parts []string) string { return strings.Join(parts, ".") },
+	"stripPointer": func(s string) string { return strings.TrimPrefix(s, "*") },
 	"getterToSetter": func(getter string) string {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"scanVarType": scanVarType,
"scanVarName": scanVarName,
"setterPath": setterPath,
"joinPath": func(parts []string) string { return strings.Join(parts, ".") },
"stripPointer": func(s string) string { return strings.TrimPrefix(s, "*") },
"getterToSetter": func(getter string) string {
// Convert "GetSignal().GetLineageInfo()" to "Signal.LineageInfo"
parts := strings.Split(getter, ".")
var result []string
for _, part := range parts {
part = strings.TrimPrefix(part, "Get")
part = strings.TrimSuffix(part, "()")
result = append(result, part)
}
return strings.Join(result, ".")
},
🧰 Tools
🪛 GitHub Actions: Style

[error] 333-333: golangci-lint reported file is not properly formatted (gofmt) at tools/generate-helpers/pg-table-bindings/funcs.go:333:1.

🪛 GitHub Check: golangci-lint

[failure] 333-333:
File is not properly formatted (gofmt)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/generate-helpers/pg-table-bindings/funcs.go` around lines 329 - 344,
The file has formatting issues around the template helper block (notably the
getterToSetter anonymous function and nearby helpers like joinPath and
stripPointer) causing golangci-lint/gofmt failures; fix by running gofmt (or go
fmt) on tools/generate-helpers/pg-table-bindings/funcs.go to reformat the block
so the anonymous functions (getterToSetter, joinPath, stripPointer) and their
surrounding commas/indentation conform to gofmt rules, then commit the modified
file.

Comment on lines +99 to +100
&storage.ProcessIndicatorJsonb{}: resources.DeploymentExtension,
&storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,

⚠️ Potential issue | 🟡 Minor

Fix formatting to pass CI.

The golangci-lint check is failing due to improper formatting. The extra spaces for alignment don't conform to gofmt standards.

Proposed fix
-		&storage.ProcessIndicatorJsonb{}:                         resources.DeploymentExtension,
-		&storage.ProcessIndicatorNoSerialized{}:                  resources.DeploymentExtension,
+		&storage.ProcessIndicatorJsonb{}:        resources.DeploymentExtension,
+		&storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
&storage.ProcessIndicatorJsonb{}: resources.DeploymentExtension,
&storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,
🧰 Tools
🪛 GitHub Check: golangci-lint

[failure] 99-99:
File is not properly formatted (gofmt)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/generate-helpers/pg-table-bindings/list.go` around lines 99 - 100, The
map entries using &storage.ProcessIndicatorJsonb{} and
&storage.ProcessIndicatorNoSerialized{} currently contain extra alignment spaces
that break gofmt/golangci-lint; open the map in list.go and remove the extra
spaces so each key:value pair uses normal Go spacing (e.g.
"&storage.ProcessIndicatorJsonb{}: resources.DeploymentExtension," and
"&storage.ProcessIndicatorNoSerialized{}: resources.DeploymentExtension,") then
run gofmt/golangci-lint to confirm formatting is fixed.

Comment on lines +381 to +403
func scanRow(row pgx.Row) (*storeType, error) {
var data []byte
if err := row.Scan(&data); err != nil {
return nil, err
}
msg := &storeType{}
if err := protojson.Unmarshal(data, msg); err != nil {
return nil, err
}
return msg, nil
}

func scanRows(rows pgx.Rows) (*storeType, error) {
var data []byte
if err := rows.Scan(&data); err != nil {
return nil, err
}
msg := &storeType{}
if err := protojson.Unmarshal(data, msg); err != nil {
return nil, err
}
return msg, nil
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

git ls-files | head -20

Repository: stackrox/stackrox

Length of output: 858


🏁 Script executed:

fd -t f "store.go.tpl" 2>/dev/null

Repository: stackrox/stackrox

Length of output: 114


🏁 Script executed:

if [ -f "tools/generate-helpers/pg-table-bindings/store.go.tpl" ]; then
  echo "File exists"
  wc -l "tools/generate-helpers/pg-table-bindings/store.go.tpl"
fi

Repository: stackrox/stackrox

Length of output: 130


🌐 Web query:

protojson.Unmarshal default behavior unknown fields golang protobuf

💡 Result:

The default behavior of protojson.Unmarshal (from google.golang.org/protobuf/encoding/protojson) when encountering unknown fields in the input JSON is to return an error, such as "unknown field %v". Unknown fields are not preserved in the protobuf message's unknown fields set (unlike binary protobuf unmarshaling), and there is no option to preserve them; they can only be rejected (default) or discarded. To discard unknown fields instead of erroring, use UnmarshalOptions{DiscardUnknown: true}: go import "google.golang.org/protobuf/encoding/protojson" err := protojson.UnmarshalOptions{DiscardUnknown: true}.Unmarshal(jsonData, msg) This behavior is confirmed in the official Go package documentation and source code. It differs from standard encoding/json, which silently ignores unknown fields, and from binary proto.Unmarshal, which preserves unknowns by default.


🏁 Script executed:

sed -n '381,403p' tools/generate-helpers/pg-table-bindings/store.go.tpl

Repository: stackrox/stackrox

Length of output: 619


🏁 Script executed:

sed -n '375,410p' tools/generate-helpers/pg-table-bindings/store.go.tpl

Repository: stackrox/stackrox

Length of output: 921


🏁 Script executed:

rg -A 5 "scanRow\|scanRows" tools/generate-helpers/pg-table-bindings/store.go.tpl | head -50

Repository: stackrox/stackrox

Length of output: 43


🏁 Script executed:

grep -n "protojson" tools/generate-helpers/pg-table-bindings/store.go.tpl

Repository: stackrox/stackrox

Length of output: 359


Use tolerant JSONB unmarshalling on persisted rows.

protojson.Unmarshal rejects unknown fields by default. This means an older binary cannot read JSONB rows written by a newer binary that added fields—a critical rolling-upgrade regression compared to binary protobuf storage, which preserves unknown fields.

Fix
 func scanRow(row pgx.Row) (*storeType, error) {
     var data []byte
     if err := row.Scan(&data); err != nil {
         return nil, err
     }
     msg := &storeType{}
-    if err := protojson.Unmarshal(data, msg); err != nil {
+    if err := (protojson.UnmarshalOptions{DiscardUnknown: true}).Unmarshal(data, msg); err != nil {
         return nil, err
     }
     return msg, nil
 }

 func scanRows(rows pgx.Rows) (*storeType, error) {
     var data []byte
     if err := rows.Scan(&data); err != nil {
         return nil, err
     }
     msg := &storeType{}
-    if err := protojson.Unmarshal(data, msg); err != nil {
+    if err := (protojson.UnmarshalOptions{DiscardUnknown: true}).Unmarshal(data, msg); err != nil {
         return nil, err
     }
     return msg, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/generate-helpers/pg-table-bindings/store.go.tpl` around lines 381 -
403, The JSONB unmarshalling currently uses protojson.Unmarshal which rejects
unknown fields; update both scanRow and scanRows to use a tolerant unmarshal by
calling protojson.UnmarshalOptions{DiscardUnknown:true}.Unmarshal(data, msg)
(replace protojson.Unmarshal(data, msg)), referencing the functions scanRow and
scanRows and the storeType message so the JSONB rows tolerate unknown future
fields during rolling upgrades.

Comment on lines +425 to +444
for _, obj := range objs {
{{- range $field := $fields }}
{{- if canUnnest $field }}
arr_{{ scanVarName $field }} = append(arr_{{ scanVarName $field }}, {{ unnestAppendExpr $field }})
{{- end }}
{{- end }}
{{- range $child := $schema.Children }}
for childIdx, child := range obj.{{ getterToSetter $child.ObjectGetter }} {
_ = child
{{- range $field := $child.DBColumnFields }}
{{- if eq $field.ColumnName "idx" }}
arr_child_{{ scanVarName $field }} = append(arr_child_{{ scanVarName $field }}, childIdx)
{{- else if $field.ObjectGetter.IsVariable }}
arr_child_{{ scanVarName $field }} = append(arr_child_{{ scanVarName $field }}, {{ $singlePK.Getter "obj" }})
{{- else }}
arr_child_{{ scanVarName $field }} = append(arr_child_{{ scanVarName $field }}, child.{{ joinPath (setterPath $field) }})
{{- end }}
{{- end }}
}
{{- end }}

⚠️ Potential issue | 🟠 Major

Keep the bulk child path nil-safe.

The unnest code dereferences both the parent submessage and each child pointer directly. A batch containing a nil nested message or a nil child entry will panic here, while the per-row insert path still succeeds because it goes through getters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/generate-helpers/pg-table-bindings/store.go.tpl` around lines 425 -
444, The bulk unnest loop can panic when a parent nested message or an
individual child is nil because it dereferences obj.{{ getterToSetter
$child.ObjectGetter }} and child.* directly; update the loop that iterates over
objs and for childIdx, child := range obj.{{ getterToSetter $child.ObjectGetter
}} to first check the parent submessage for nil (skip or append nil-equivalents
for all child arrays when parent is nil) and for each child check if child ==
nil before accessing child.{{ joinPath (setterPath $field) }}; when child is
nil, append the appropriate zero/NULL placeholder (or use the $singlePK.Getter
"obj" for parent references) instead of dereferencing so arr_child_{{
scanVarName $field }} remains safe for batch inserts.
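The nil guards the review asks for fall out naturally if the bulk path goes through generated getters, which are nil-safe on protobuf messages. The sketch below uses hand-written stand-ins (`indicator`, `signal`, `lineageInfo`) rather than the real `storage.ProcessIndicator` types; the point is that a nil nested message contributes no child rows and a nil child entry yields zero values instead of panicking.

```go
package main

import "fmt"

type lineageInfo struct{ ParentUid string }

// GetParentUid is nil-safe, matching protobuf-generated getter semantics.
func (l *lineageInfo) GetParentUid() string {
	if l == nil {
		return ""
	}
	return l.ParentUid
}

type signal struct{ LineageInfo []*lineageInfo }

func (s *signal) GetLineageInfo() []*lineageInfo {
	if s == nil {
		return nil
	}
	return s.LineageInfo
}

type indicator struct{ Signal *signal }

func (i *indicator) GetSignal() *signal {
	if i == nil {
		return nil
	}
	return i.Signal
}

func main() {
	objs := []*indicator{
		{Signal: nil}, // nil nested message: ranging over nil yields no rows
		{Signal: &signal{LineageInfo: []*lineageInfo{
			{ParentUid: "u1"},
			nil, // nil child: getter returns the zero value, no panic
		}}},
	}
	var arrIdx []int
	var arrUid []string
	for _, obj := range objs {
		for childIdx, child := range obj.GetSignal().GetLineageInfo() {
			arrIdx = append(arrIdx, childIdx)
			arrUid = append(arrUid, child.GetParentUid())
		}
	}
	fmt.Println(arrIdx, arrUid)
}
```

Direct field access (`obj.Signal.LineageInfo`, `child.ParentUid`) would panic on the same inputs, which is exactly the hazard in the generated unnest loop.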

Comment on lines +524 to +539
func FetchChildren(ctx context.Context, db postgres.DB, objs []*storeType) error {
if len(objs) == 0 {
return nil
}
objsByID := make(map[string]*storeType, len(objs))
ids := make([]string, 0, len(objs))
for _, obj := range objs {
id := pkGetter(obj)
objsByID[id] = obj
ids = append(ids, id)
}
{{- range $child := .Schema.Children }}
{{- $parentPKs := $.Schema.PrimaryKeys }}
{
q := "SELECT {{ range $index, $field := $child.DBColumnFields }}{{if $index}}, {{end}}{{$field.ColumnName}}{{end}} FROM {{$child.Table}} WHERE {{range $index, $field := $child.FieldsReferringToParent}}{{if $index}} AND {{end}}{{$field.ColumnName}} = ANY(${{add $index 1}}::uuid[]){{end}} ORDER BY {{range $index, $field := $child.FieldsReferringToParent}}{{if $index}}, {{end}}{{$field.ColumnName}}{{end}}, idx"
rows, err := db.Query(ctx, q, ids)

⚠️ Potential issue | 🟠 Major

FetchChildren only works for a single UUID parent key.

This helper hard-codes map[string]*storeType, []string, ::uuid[], and one query argument even though the WHERE clause is generated from FieldsReferringToParent. Composite parents or non-UUID parent IDs will generate invalid SQL here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tools/generate-helpers/pg-table-bindings/store.go.tpl` around lines 524 -
539, FetchChildren assumes a single UUID parent key and builds objsByID as
map[string]*storeType, ids []string, and the SQL uses ANY($1::uuid[]) with only
one query argument; to fix, make FetchChildren handle arbitrary primary key
types and composite keys by (1) changing the in-memory key map to use the actual
primary key tuple type returned by pkGetter (or serialize composite keys
consistently), (2) building a parameter slice matching the number of parent key
columns in FieldsReferringToParent and passing one argument per column to
db.Query, (3) removing the hard-coded ::uuid[] cast and instead use appropriate
type casts or parameter binding based on each parent PK column type, and (4)
update the SQL q construction and the call to db.Query (the q variable and rows,
err := db.Query(ctx, q, ...)) to supply the correct number and form of arguments
so composite and non-UUID parents work correctly.
