SUMMARY
StackQL intermittently fails when listing S3 bucket objects, but only in GitHub Actions; the same credentials and queries work reliably from a local laptop.
Example failure:
Get "https://s3.ap-southeast-2.amazonaws.com/stackql-trial-bucket-02?max-keys=1000": EOF
This is not an AWS service defect and not a credentials issue.
The root cause is the connection reuse behavior of Go’s default HTTP client (keep-alive / HTTP/2), which interacts poorly with the GitHub Actions network environment when using raw HTTP + SigV4 signing (no AWS SDK).
ENVIRONMENT
StackQL using any-sdk HTTP client
AWS SigV4 signing only (no AWS SDK HTTP client usage)
Same AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY locally and in GitHub Actions
Region explicitly provided to SigV4 signer
Fails only in GitHub Actions
Error manifests as immediate EOF while reading response
WHY THIS HAPPENS
In GitHub Actions, pooled TCP connections are frequently stale by the time they are reused
  NAT, proxying, and idle connection reaping are common
  Server-side connection close is legal and frequent
Go net/http default transport enables:
  HTTP keep-alives
  HTTP/2
  Idle connection reuse
A reused but half-dead connection results in:
  Connection closed by server
  Client receives EOF before response body
AWS S3 is behaving within specification
  Early connection close is permitted
  This is a client-side transport robustness issue
This does not reproduce locally
  Local networks tolerate connection reuse
  CI networking often does not
Retrying is not a viable solution
  StackQL fans out across tens or hundreds of endpoints
  Retrying on EOF causes request amplification and latency explosion
  This is a transport correctness problem, not a transient API failure
EVIDENCE
Same credentials, same query, same endpoint succeed locally
Failures occur only in GitHub Actions
Disabling HTTP/2 via GODEBUG (for example GODEBUG=http2client=0) reduces failures
Eliminating connection reuse eliminates failures entirely
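The reuse behavior behind this evidence can be observed from the client side with net/http/httptrace, which reports for each request whether the connection came from the idle pool. The following is a minimal sketch, not StackQL or any-sdk code: the countReused helper and the throwaway local test server are hypothetical, and it covers plain HTTP/1.1 only.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
	"sync/atomic"
)

// countReused sends n sequential GETs to a throwaway local server and returns
// how many of them were served on a connection taken from the idle pool.
// The helper name and the local server are illustrative assumptions.
func countReused(disableKeepAlives bool, n int) (int64, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{DisableKeepAlives: disableKeepAlives}}
	defer client.CloseIdleConnections()

	var reused int64
	trace := &httptrace.ClientTrace{
		// GotConn fires once a connection has been obtained for the request.
		GotConn: func(info httptrace.GotConnInfo) {
			if info.Reused {
				atomic.AddInt64(&reused, 1)
			}
		},
	}

	for i := 0; i < n; i++ {
		req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
		if err != nil {
			return 0, err
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
		resp, err := client.Do(req)
		if err != nil {
			return 0, err
		}
		// Drain and close the body so the connection is eligible for reuse.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&reused), nil
}

func main() {
	for _, disable := range []bool{false, true} {
		reused, err := countReused(disable, 5)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("DisableKeepAlives=%v reused=%d of 5\n", disable, reused)
	}
}
```

With keep-alives disabled the reused count should stay at zero. Attaching the same trace to the real S3 requests in a CI run would show whether the EOF failures line up with reused connections.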
CORRECT FIX (VENDOR-AGNOSTIC)
Explicitly control the Go HTTP transport used by any-sdk / StackQL and disable connection reuse for AWS endpoints.
Proposed transport configuration:
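// Requires imports: "crypto/tls" and "net/http".
transport := &http.Transport{
    // One request per TCP connection; idle sockets are never reused.
    DisableKeepAlives: true,
    ForceAttemptHTTP2: false,
    // ForceAttemptHTTP2: false does not switch HTTP/2 off by itself;
    // net/http disables it when TLSNextProto is a non-nil, empty map.
    TLSNextProto: map[string]func(string, *tls.Conn) http.RoundTripper{},
}
client := &http.Client{
    Transport: transport,
}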
Effects:
One request per TCP connection
No reuse of half-closed sockets
Deterministic behavior in CI
No AWS SDK dependency
No retries required
SCOPE
Apply to AWS providers (or make configurable per provider)
Does not require AWS environment variables
Preserves StackQL vendor-independence
Acceptable performance tradeoff for control-plane style queries
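One way the per-provider scoping could look is a RoundTripper that sends AWS hosts through a no-reuse transport and leaves everything else on the default one. This is a sketch under assumptions: hostRoutingTransport, isAWSHost and newProviderAwareClient are hypothetical names, not existing any-sdk APIs, and matching on the amazonaws.com suffix is only one possible way to identify an AWS provider. HTTP/2 is disabled here with a non-nil, empty TLSNextProto map, which is the mechanism net/http documents for that purpose.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"strings"
)

// isAWSHost reports whether a hostname (without port) belongs to AWS.
// Suffix matching is an assumption made for this sketch.
func isAWSHost(hostname string) bool {
	h := strings.ToLower(strings.TrimSuffix(hostname, "."))
	for _, suffix := range []string{"amazonaws.com", "amazonaws.com.cn"} {
		if h == suffix || strings.HasSuffix(h, "."+suffix) {
			return true
		}
	}
	return false
}

// hostRoutingTransport picks a transport per request based on the target host.
type hostRoutingTransport struct {
	noReuse  http.RoundTripper // used for AWS endpoints
	fallback http.RoundTripper // used for everything else
}

func (t *hostRoutingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if isAWSHost(req.URL.Hostname()) {
		return t.noReuse.RoundTrip(req)
	}
	return t.fallback.RoundTrip(req)
}

// newProviderAwareClient builds a client that disables connection reuse and
// HTTP/2 for AWS hosts only.
func newProviderAwareClient() *http.Client {
	noReuse := &http.Transport{
		DisableKeepAlives: true,
		TLSNextProto:      map[string]func(string, *tls.Conn) http.RoundTripper{},
	}
	return &http.Client{
		Transport: &hostRoutingTransport{noReuse: noReuse, fallback: http.DefaultTransport},
	}
}

func main() {
	_ = newProviderAwareClient()
	for _, h := range []string{"s3.ap-southeast-2.amazonaws.com", "api.github.com"} {
		fmt.Printf("%s -> no-reuse transport: %v\n", h, isAWSHost(h))
	}
}
```

Keeping the decision inside the transport means callers and the SigV4 signer stay unchanged, and other providers keep pooled connections.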
NON-SOLUTIONS (INTENTIONALLY AVOIDED)
Adding retries on EOF
Introducing AWS SDK HTTP client
Requiring AWS_REGION / AWS_DEFAULT_REGION
CI-specific bash hacks
Treating this as an AWS outage or service bug
ACTION ITEMS
Add explicit HTTP transport ownership in any-sdk
Disable keep-alives and HTTP/2 for AWS providers
Document rationale (CI + NAT behavior)
Add CI regression coverage if possible
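For the regression coverage item, one possible check is to count connections on the server side and assert that a client with keep-alives disabled opens one connection per request. The sketch below is an assumption about how such a check might look, written as a small program rather than a real test in the any-sdk suite; connectionsOpened is a hypothetical helper, and it exercises plain HTTP/1.1 against a local server, so it does not cover HTTP/2 or the GitHub Actions network path.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// connectionsOpened sends n sequential GETs to a local server and returns how
// many TCP connections the server accepted. Hypothetical helper for this sketch.
func connectionsOpened(disableKeepAlives bool, n int) (int64, error) {
	var opened int64

	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	// StateNew is reported once for every accepted connection.
	srv.Config.ConnState = func(c net.Conn, state http.ConnState) {
		if state == http.StateNew {
			atomic.AddInt64(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	client := &http.Client{Transport: &http.Transport{DisableKeepAlives: disableKeepAlives}}
	defer client.CloseIdleConnections()

	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&opened), nil
}

func main() {
	const requests = 5
	got, err := connectionsOpened(true, requests)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if got != requests {
		fmt.Printf("FAIL: %d requests used %d connections\n", requests, got)
		return
	}
	fmt.Println("PASS: one connection per request")
}
```

A check like this guards against the transport configuration being silently replaced by a pooled client in a later refactor.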