
[Stdlib] Add SIMD-optimized List.__contains__() for scalar types #5918

Open
msaelices wants to merge 13 commits into modular:main from msaelices:contains-optimization

Conversation

Contributor

@msaelices msaelices commented Feb 7, 2026

Summary

  • Delegate List.__contains__ for scalar types to Span.__contains__ instead of duplicating SIMD logic
  • Fix Span.__contains__ SIMD width cascade to include widths 4 and 2, enabling SIMD for 64-bit types on AVX2
  • Add benchmarks for Int, Float64, and UInt8 across sizes 10-100k (50 repetitions)
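For readers unfamiliar with the technique, here is a minimal Python sketch of the chunked-scan idea behind Span.__contains__: try the widest chunk size first, fall through to smaller widths, and finish any remainder element by element. The widths tuple and function name are illustrative, not the stdlib's actual code, and the `any(...)` stands in for a SIMD lane-wise compare plus reduce-or.

```python
def simd_style_contains(items, target, widths=(16, 8, 4, 2)):
    """Scan `items` in chunks, cascading down through smaller chunk
    widths, then check any leftover elements one at a time."""
    n = len(items)
    i = 0
    for w in widths:
        # Process as many full chunks of width w as still fit.
        while n - i >= w:
            chunk = items[i:i + w]
            # Stand-in for a vectorized lane compare + reduce_or.
            if any(x == target for x in chunk):
                return True
            i += w
    # Scalar tail for the last few elements.
    while i < n:
        if items[i] == target:
            return True
        i += 1
    return False
```

Including widths 4 and 2 in the cascade is what lets 64-bit element types use chunked scanning at all when the hardware only offers 4 lanes per register.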

Benchmark results (main vs this branch)

Machine: x86_64, Intel i7-12700H (AVX2, 256-bit SIMD), 50 repetitions

Benchmark    Size     main (ms)  branch (ms)  Speedup
Float64      10       0.000924   0.000404     2.3x
Float64      50       0.003033   0.001032     2.9x
Float64      100      0.005506   0.000785     7.0x
Float64      1,000    0.034644   0.010380     3.3x
Float64      10,000   0.355334   0.073229     4.9x
Float64      100,000  3.227314   1.003895     3.2x
UInt8        50       0.008404   0.001952     4.3x
UInt8        100      0.011563   0.000948     12.2x
UInt8        1,000    0.032597   0.001294     25.2x
UInt8        10,000   0.369522   0.012411     29.8x
UInt8        100,000  2.731753   0.098097     27.8x
Int (hit)    100      0.006290   0.003364     1.9x
Int (miss)   100      0.015014   0.006682     2.2x
Int (hit)    100,000  3.515771   3.039424     1.2x

Add a specialized `__contains__` overload for `List[Scalar[dtype]]` that
uses SIMD vectorized comparison instead of element-by-element scanning.
This yields 4-32x speedups depending on element size and SIMD width.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
Add benchmarks covering Int (hit/miss), Float64, and UInt8 element
types across sizes from 10 to 100,000 elements.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
@msaelices msaelices requested a review from a team as a code owner February 7, 2026 22:50
Copilot AI review requested due to automatic review settings February 7, 2026 22:50

Copilot AI left a comment


Pull request overview

This PR adds a SIMD-specialized List.__contains__ implementation for scalar element types to speed up membership checks, along with tests and a new benchmark to measure performance across common scalar types.

Changes:

  • Added a SIMD-vectorized __contains__ overload for List[Scalar[dtype]] in the standard library.
  • Extended test_list_contains to exercise scalar list membership behavior on larger inputs.
  • Introduced a new collections benchmark for List.__contains__ across Int, Float64, and UInt8 sizes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
mojo/stdlib/std/collections/list.mojo Adds SIMD-optimized List.__contains__ overload for scalar element lists.
mojo/stdlib/test/collections/test_list.mojo Adds tests intended to cover the SIMD-optimized membership path for scalar types.
mojo/stdlib/benchmarks/collections/bench_list_contains.mojo Adds a new benchmark suite for list membership across types and sizes.


Comment on lines 909 to 917
# Test with UInt8.
var bytes = List[UInt8]()
for i in range(256):
    bytes.append(UInt8(i))

assert_true(UInt8(0) in bytes)
assert_true(UInt8(127) in bytes)
assert_true(UInt8(255) in bytes)


Copilot AI Feb 7, 2026


The UInt8 tests only exercise the hit path: bytes contains every value 0..255, so there’s no way for x in bytes to return False. Add a UInt8 miss case (e.g., build a list missing one value and assert it’s not contained) to cover the negative path of the SIMD overload.


Comment on lines 34 to 38
fn make_list_uint8[size: Int]() -> List[UInt8]:
    var l = List[UInt8](capacity=size)
    for i in range(size):
        l.append(UInt8(i % 256))
    return l^

Copilot AI Feb 7, 2026


make_list_uint8 repeats values for sizes > 256 (UInt8(i % 256)), so target in items will typically succeed very early (first occurrence within the first 256 elements) rather than scanning ~size/2 elements as intended. This makes the UInt8 benchmark results not representative; consider constructing data where the first match occurs near the middle (or add an explicit miss benchmark) to measure the real traversal cost.
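The early-match problem flagged here is easy to verify with a quick Python check mirroring the benchmark's `UInt8(i % 256)` construction (sizes and names are illustrative):

```python
size = 10_000
items = [i % 256 for i in range(size)]       # mirrors UInt8(i % 256)
target = (size // 2) % 256                   # value the benchmark searches for

# The target's first occurrence is inside the first 256 elements,
# nowhere near the middle, so a scan terminates almost immediately.
first = items.index(target)
print(first)  # → 136, not 5000
```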


Comment on lines 105 to 109
fn bench_list_contains_uint8[size: Int](mut b: Bencher) raises:
    """Search for a UInt8 element present in the middle of the list."""
    var items = make_list_uint8[size]()
    var target = UInt8((size // 2) % 256)

Copilot AI Feb 7, 2026


The UInt8 benchmark docstring says the element is “present in the middle of the list”, but with the current data generation (repeating 0..255), the target’s first occurrence is near the beginning for large sizes. Update the benchmark setup (and/or docstring) so the measured access pattern matches the description.


Signed-off-by: Manuel Saelices <msaelices@gmail.com>
Ensure value 0 first appears at position size // 2 so the benchmark
measures scanning half the list, not finding a match early due to
repeating values.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
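The midpoint construction described in the commit above can be sketched in Python. The helper name and exact fill pattern are hypothetical (the real helper is in the Mojo benchmark file); the point is that 0 never occurs before position size // 2, so the scan must traverse half the list:

```python
def make_uint8_data(size):
    """Fill with repeating nonzero bytes 1..255, then place the
    target value 0 at the midpoint."""
    items = [1 + (i % 255) for i in range(size)]  # values 1..255, never 0
    if size > 0:
        items[size // 2] = 0
    return items

data = make_uint8_data(10_000)
assert data.index(0) == 5_000       # first match is exactly at the middle
assert 0 not in data[:5_000]        # no accidental early hit
```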
Replace the SIMD-optimized List.__contains__ overload for scalar types
with a simple delegation to Span.__contains__, which already has an
equivalent (and more sophisticated) SIMD implementation. This eliminates
code duplication and addresses PR review feedback.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
Add widths 4 and 2 to the SIMD width cascade in Span.__contains__.
Previously the smallest width was 8, which meant no SIMD code was
generated for 64-bit types (Int, Float64) on AVX2 where
simd_width_of returns 4. This gives 2-3x speedups for Float64
containment checks.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
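To see why the extra widths matter: simd_width_of returns the lane count of the target's vector registers, and a cascade whose smallest width is 8 never fires when the native width is only 4. A small Python check of the arithmetic, assuming 256-bit AVX2 registers (the helper is illustrative, not a Mojo API):

```python
REGISTER_BITS = 256  # AVX2 vector register width

def simd_width(dtype_bits, register_bits=REGISTER_BITS):
    """Lane count for a given element size, i.e. what Mojo's
    simd_width_of reports on this target (illustrative helper)."""
    return register_bits // dtype_bits

# 64-bit types (Int, Float64) get only 4 lanes on AVX2, so a cascade
# bottoming out at width 8 fell back to scalar code for them.
assert simd_width(64) == 4
# 8-bit types already had plenty of lanes for the old width-8 minimum.
assert simd_width(8) == 32
```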
Increase num_repetitions from 5 to 50 for more stable and
representative benchmark results.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
@msaelices msaelices requested a review from NathanSWard February 8, 2026 22:26
@parameter
fn call_fn() raises:
    for _ in range(100):
        var res = target in items
Contributor


I'm not totally sure we need these benchmarks as this PR is not doing any performance improvements, it's merely delegating the call to Span.__contains__.
If we do keep the benchmarks updating them to Span feels like a better approach.
However, if you just want to remove this benchmark, I'd be okay with that too :)


Comment on lines 74 to 81
fn call_fn() raises:
    for _ in range(100):
        var res = target in items
        keep(res)

b.iter[call_fn]()
keep(Bool(items))
keep(target)
Contributor


A few style suggestions on the benchmarks if we do keep them (IMO it'll be easier to just remove them):

  1. Let's use black_box in the testing function
  2. Prefer the overload of .iter that works with unified closures - this allows you to remove the items/target at the end of this function as they were only needed because @parameter closures emit a false-positive warning about "unused variables".
  3. No need to mark the function raises unless it does actually raise an error.
Suggested change

Before:

fn call_fn() raises:
    for _ in range(100):
        var res = target in items
        keep(res)

b.iter[call_fn]()
keep(Bool(items))
keep(target)

After:

fn call_fn() unified {read}:
    for _ in range(100):
        var res = black_box(target) in items
        keep(res)

b.iter(call_fn)

The PR now just delegates to Span.__contains__, so separate
List-specific benchmarks are unnecessary.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>