-
Notifications
You must be signed in to change notification settings - Fork 93
Implement PERCENTILE_CONT and PERCENTILE_DISC aggregate functions #5471
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: percentile_cont
Are you sure you want to change the base?
Conversation
- Add percentile_collect functions in sqllib aggregates - Implement NonLinearAggregate support in AggregateCompiler - Add PercentileSemigroup for accumulator combination - Cast percentile parameter from DECIMAL to f64 via into_inner() - Test issue3778 regression passes Co-authored-by: Karakatiza666 <8537259+Karakatiza666@users.noreply.github.com>
|
Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details. Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
|
@copilot Please design and implement tests for percentile_cont and percentile_disc to |
Co-authored-by: Karakatiza666 <8537259+Karakatiza666@users.noreply.github.com>
Signed-off-by: Ubuntu <bulakh.96@gmail.com>
|
@copilot I have pushed commits that make the implementation compile. Now run tests on it |
|
what is the point of this PR? |
|
I am experimenting with an implementation based on order-statistics multiset (based on b-tree) with O(log m) complexity |
|
that should start with a DBSP operator, the compiler support is easy after that |
The temp/src/udf.rs file was accidentally deleted but is required for test compilation. Restored from commit 3f98886. Co-authored-by: Karakatiza666 <8537259+Karakatiza666@users.noreply.github.com>
Tests have been successfully run on the percentile implementation: Java Tests (all passing):
Python Tests:
The Python tests require a running pipeline-manager instance to execute. The test structure follows the existing patterns in the codebase. Fixed Issue: Warning Firewall rules blocked me from connecting to one or more addresses (expand for details)I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
|
Adds support for SQL standard
PERCENTILE_CONTandPERCENTILE_DISCaggregate functions, leveraging Calcite's existing parser support.Implementation
Rust sqllib additions:
percentile_collect<P, V>()- Accumulates values intoTup2<Option<P>, Vec<V>>where P is percentile, V is data typePercentileSemigroup- Combines accumulators by taking first non-null percentile and concatenating vectorsSQL compiler extensions:
AggregateCompilerfor percentile functionspercentile_cont/percentile_discfunctions with nullable variantsinto_inner()Example
The accumulator maintains a single percentile value across all groups while collecting all data points for final computation.
Testing
Java Unit Tests (
AggregateTests.java):testPercentileCont- Validates PERCENTILE_CONT with mixed data types (integers, doubles)testPercentileDisc- Validates PERCENTILE_DISC with various percentiles (0.0, 0.5, 0.9, 1.0)testPercentileDescending- Tests descending order support with both functionsPython Runtime Tests (
aggregate_tests6.py):test_percentile_cont_basic- Basic PERCENTILE_CONT functionality with integer datatest_percentile_disc_basic- Basic PERCENTILE_DISC functionality with integer datatest_percentile_multiple- Multiple percentile calculations in single querytest_percentile_null_handling- NULL value handling in input datatest_percentile_empty_group- Empty group behaviortest_percentile_single_value- Single value edge casetest_percentile_with_groupby- GROUP BY clause integrationtest_percentile_floating_point- Floating point data with PERCENTILE_CONTAll tests follow existing patterns in their respective test suites and validate correct incremental computation behavior.
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.