
gh-144888: JIT executor bloom filter wide-type optimization and function inlining #146114

Merged
Fidget-Spinner merged 1 commit into python:main from cocolato:gh-144888
Mar 18, 2026

@cocolato
Contributor

@cocolato cocolato commented Mar 18, 2026

This PR refactors and optimizes the Bloom filter used for JIT executor invalidation. Key improvements include:

  1. Replacing uint32_t with wider word types (__uint128_t or uint64_t) to reduce the number of memory accesses
  2. Implementing core functions as static inline to eliminate function call overhead
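
The two changes listed above can be illustrated with a minimal sketch. It is not the code from this PR: the names (`bloom_word`, `bloom_filter`, `bloom_add`, `bloom_may_contain`), the 256-bit filter size, the four probes per key, and the mixing function are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters, chosen for illustration only. */
#define BLOOM_BITS 256   /* total filter size in bits */
#define BLOOM_K 4        /* bits set per key */

/* Use the widest word available so a scan touches fewer words. */
#if defined(__SIZEOF_INT128__)
typedef __uint128_t bloom_word;   /* 2 words for 256 bits */
#else
typedef uint64_t bloom_word;      /* 4 words; uint32_t would need 8 */
#endif

#define BLOOM_WORD_BITS (sizeof(bloom_word) * 8)
#define BLOOM_WORDS (BLOOM_BITS / BLOOM_WORD_BITS)

typedef struct {
    bloom_word bits[BLOOM_WORDS];
} bloom_filter;

static_assert(sizeof(bloom_filter) * 8 == BLOOM_BITS,
              "filter must be exactly BLOOM_BITS wide");

static inline void bloom_clear(bloom_filter *f)
{
    memset(f, 0, sizeof(*f));
}

/* splitmix64-style mixer; an assumed stand-in for the real hash. */
static inline uint64_t bloom_mix64(uint64_t x)
{
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

/* Set BLOOM_K bits, each selected by one byte of the mixed key. */
static inline void bloom_add(bloom_filter *f, uint64_t key)
{
    uint64_t h = bloom_mix64(key);
    for (int i = 0; i < BLOOM_K; i++) {
        size_t bit = (size_t)(h & 0xff);   /* 0..255 */
        f->bits[bit / BLOOM_WORD_BITS] |=
            (bloom_word)1 << (bit % BLOOM_WORD_BITS);
        h >>= 8;
    }
}

/* True if every bit set in `needle` is also set in `haystack`.
   Static inline lets the compiler fold this loop into the caller. */
static inline bool bloom_may_contain(const bloom_filter *haystack,
                                     const bloom_filter *needle)
{
    for (size_t i = 0; i < BLOOM_WORDS; i++) {
        if ((haystack->bits[i] & needle->bits[i]) != needle->bits[i]) {
            return false;
        }
    }
    return true;
}
```

With `__uint128_t` the containment check in this sketch compares 2 words instead of the 8 a `uint32_t` array of the same size would need, which is the memory-access reduction the first item describes.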

Microbenchmark results on my virtual machine (AMD Ryzen 5 5600X (6C12T), 16 GB RAM, Debian 13 (trixie), Linux 6.12, GCC 14.2, x86_64):

=== Baseline ===
round 0: 1024.1 us  (1024 ns/scan)
round 1: 1065.8 us  (1065 ns/scan)
round 2: 1041.8 us  (1041 ns/scan)
round 3: 949.2 us  (949 ns/scan)
round 4: 1153.6 us  (1153 ns/scan)
round 5: 926.1 us  (926 ns/scan)
round 6: 982.2 us  (982 ns/scan)
round 7: 917.8 us  (917 ns/scan)
round 8: 1125.7 us  (1125 ns/scan)
round 9: 1158.8 us  (1158 ns/scan)

=== Optimized (128 bit) ===
round 0: 845.4 us  (845 ns/scan)
round 1: 824.7 us  (824 ns/scan)
round 2: 906.5 us  (906 ns/scan)
round 3: 843.8 us  (843 ns/scan)
round 4: 920.5 us  (920 ns/scan)
round 5: 1008.2 us  (1008 ns/scan)
round 6: 829.4 us  (829 ns/scan)
round 7: 855.9 us  (855 ns/scan)
round 8: 794.8 us  (794 ns/scan)
round 9: 855.3 us  (855 ns/scan)
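
For a quick summary of the two runs above, the per-round times can be averaged. The sketch below is plain arithmetic over the posted numbers; the array and function names are hypothetical, and it makes no claim about statistical significance.

```c
#include <stddef.h>

/* Per-round times in microseconds, copied from the results above. */
static const double baseline_us[10] = {
    1024.1, 1065.8, 1041.8, 949.2, 1153.6,
    926.1, 982.2, 917.8, 1125.7, 1158.8,
};
static const double optimized_us[10] = {
    845.4, 824.7, 906.5, 843.8, 920.5,
    1008.2, 829.4, 855.9, 794.8, 855.3,
};

/* Arithmetic mean of n samples. */
static inline double mean_us(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += samples[i];
    }
    return sum / (double)n;
}

/* Fractional reduction of the optimized mean relative to the baseline. */
static inline double relative_reduction(void)
{
    double base = mean_us(baseline_us, 10);
    double opt = mean_us(optimized_us, 10);
    return (base - opt) / base;
}
```

On these figures the means come out at about 1034.5 us for the baseline and 868.5 us for the 128-bit version, a reduction of roughly 16%.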

Member

@Fidget-Spinner Fidget-Spinner left a comment

Thanks!

@kumaraditya303
Contributor

Drive by: You might be able to speed this up further by using GCC's vector extensions, see https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
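
As a rough illustration of what this suggestion could look like, here is a minimal sketch of a subset check written with the GCC/Clang `vector_size` attribute. The type name `v4u64`, the function `vec_may_contain`, and the 256-bit width are assumptions; nothing here is taken from the PR, and whether it would beat the 128-bit version is untested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* GCC/Clang vector extension: four 64-bit lanes, 256 bits in total. */
typedef uint64_t v4u64 __attribute__((vector_size(32)));

static_assert(sizeof(v4u64) == 32, "expected a 256-bit vector type");

/* True if every bit set in *needle is also set in *haystack.
   Pointers are used so no wide vector is passed by value. */
static inline bool vec_may_contain(const v4u64 *haystack, const v4u64 *needle)
{
    /* Element-wise: bits the needle requires but the haystack lacks. */
    v4u64 missing = *needle & ~*haystack;
    /* Reduce the four lanes to a single scalar test. */
    return (missing[0] | missing[1] | missing[2] | missing[3]) == 0;
}
```

The compiler decides how to lower the vector operations: with AVX2 enabled it may use 256-bit instructions, and without it the same source falls back to narrower operations.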

@cocolato
Contributor Author

> Drive by: You might be able to speed this up further by using GCC's vector extensions, see https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

Thanks for the advice! I'll give it a try.

@Fidget-Spinner
Member

> > Drive by: You might be able to speed this up further by using GCC's vector extensions, see https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
>
> Thanks for the advice! I'll give it a try.

I think I'm not too keen on this. We're probably going to start reaching diminishing returns soon. So any more effort put into this might not show up on any real benchmarks.

Maybe in the future we can revisit this.

@Fidget-Spinner Fidget-Spinner merged commit 6fe91a9 into python:main Mar 18, 2026
74 checks passed