[Stdlib] Use SIMD-accelerated brace scanning in format string parsing by msaelices · Pull Request #5919 · modular/modular

msaelices · 2026-02-08T00:12:14Z

Summary

- Generalize `_memchr` in `string_slice.mojo` into a single variadic function that accepts N needle characters via a `VariadicPack` and uses SIMD to search for all of them in a single pass.
- Inner loops use `@parameter for` with `rebind[Scalar[dtype]]` for compile-time unrolling.
- Add a `_find_next_brace()` helper in `format.mojo` that uses `_memchr` to locate `{`/`}` characters.
- Replace the byte-by-byte `for` loop in `compile_entries_runtime()` with a `while` loop that jumps directly to the next brace.
- Add a `bench_format.mojo` benchmark and SIMD boundary tests.
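
As a rough model of the single-pass, multi-needle scan described in the summary, the Python sketch below walks the buffer in fixed-width chunks, ORs one comparison mask per needle into a combined mask, and finishes with a byte-by-byte tail. It is not the Mojo stdlib code: `memchr_any` and `WIDTH` are hypothetical names, and the inner lane loop only imitates what a SIMD compare would do in one instruction.

```python
WIDTH = 16  # assumed chunk width; the PR mentions 16-64 byte vectors


def memchr_any(data: bytes, start: int, *needles: int) -> int:
    """Return the index of the first byte at or after `start` equal to any needle, or -1."""
    n = len(data)
    i = start
    # Vector-style body: examine WIDTH bytes per step.
    while i + WIDTH <= n:
        mask = 0
        for needle in needles:  # this is the loop the PR unrolls at compile time
            for lane in range(WIDTH):
                if data[i + lane] == needle:
                    mask |= 1 << lane  # OR this needle's hits into the shared mask
        if mask:
            # The lowest set bit is the first matching lane in this chunk.
            return i + (mask & -mask).bit_length() - 1
        i += WIDTH
    # Scalar tail for the remaining (fewer than WIDTH) bytes.
    while i < n:
        if data[i] in needles:
            return i
        i += 1
    return -1
```

For example, `memchr_any(b"hello {world}", 0, ord("{"), ord("}"))` returns the position of the opening brace, and calling it again from one past that position finds the closing brace.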

Benchmark results (`main` vs this PR)

Median of 5 repetitions, 1000 format calls per iteration:

| Benchmark | main (ms) | PR (ms) | Speedup |
|---|---|---|---|
| `literal[4B x 4]` | 0.46 | 0.39 | 1.2x |
| `literal[64B x 4]` | 0.99 | 0.55 | 1.8x |
| `literal[512B x 4]` | 4.46 | 0.67 | 6.7x |
| `literal[4096B x 4]` | 28.59 | 2.20 | 13.0x |
| `runtime_short` | 0.59 | 0.24 | 2.5x |
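
The speedup column is simply the `main` median divided by the PR median. A small Python check of that arithmetic, using only the timings reported above (the `speedup` helper is a hypothetical name):

```python
# Median timings in milliseconds as reported in the PR: (main, PR).
timings = {
    "literal[4B x 4]": (0.46, 0.39),
    "literal[64B x 4]": (0.99, 0.55),
    "literal[512B x 4]": (4.46, 0.67),
    "literal[4096B x 4]": (28.59, 2.20),
    "runtime_short": (0.59, 0.24),
}


def speedup(main_ms: float, pr_ms: float) -> float:
    """Ratio of baseline time to PR time; values above 1 mean the PR is faster."""
    return main_ms / pr_ms


for name, (main_ms, pr_ms) in timings.items():
    print(f"{name}: {speedup(main_ms, pr_ms):.1f}x")
```

Rounded to one decimal place, the ratios agree with the reported column.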

Replace the byte-by-byte loop in `compile_entries_runtime()` with a SIMD-vectorized `_find_next_brace()` helper that scans 16-64 bytes at once to locate `{` and `}` characters, skipping over literal text in chunks. Falls back to scalar at compile time. Signed-off-by: Manuel Saelices <msaelices@gmail.com>

Copilot

Pull request overview

This PR improves runtime format-string parsing performance in std::collections::string by adding a SIMD-based scanner to locate {/} bytes more efficiently, reducing per-byte branching in the main parsing loop.

Changes:

Added _find_next_brace() helper that scans for {/} using SIMD masks at runtime, with scalar fallback for compile-time evaluation and short strings.
Updated compile_entries_runtime() to use a brace-jumping while loop driven by _find_next_brace() instead of byte-by-byte iteration.
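
To make the brace-jumping loop concrete, here is a simplified Python model. It is a sketch under stated assumptions, not the actual `compile_entries_runtime()`: `find_next_brace` and `split_entries` are hypothetical names, the scanner is a plain loop standing in for the SIMD helper, and field contents are returned verbatim rather than parsed.

```python
def find_next_brace(fmt: bytes, start: int) -> int:
    """Hypothetical stand-in for the helper: index of the next '{' or '}', or -1."""
    for i in range(start, len(fmt)):
        if fmt[i] in b"{}":
            return i
    return -1


def split_entries(fmt: bytes) -> list:
    """Split a format string into ("lit", text) and ("field", name) entries."""
    entries = []
    literal = bytearray()
    i = 0
    while True:
        nxt = find_next_brace(fmt, i)
        if nxt == -1:
            literal += fmt[i:]  # no more braces: the rest is literal text
            break
        literal += fmt[i:nxt]  # copy the skipped literal run in one step
        pair = fmt[nxt:nxt + 2]
        if pair in (b"{{", b"}}"):  # escaped brace: emit one literal brace
            literal += pair[:1]
            i = nxt + 2
        elif fmt[nxt] == ord("{"):
            close = fmt.find(b"}", nxt)
            if close == -1:
                raise ValueError("unmatched '{' in format string")
            if literal:
                entries.append(("lit", bytes(literal)))
                literal.clear()
            entries.append(("field", fmt[nxt + 1:close]))
            i = close + 1
        else:
            raise ValueError("single '}' in format string")
    if literal:
        entries.append(("lit", bytes(literal)))
    return entries
```

The point of the structure is that the loop body runs once per brace rather than once per byte, so long literal runs cost one scanner call and one slice copy.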


Copilot · 2026-02-08T00:19:00Z

mojo/stdlib/std/collections/string/format.mojo

+        var i = 0
+        while True:
+            var next = _find_next_brace(fmt_ptr, i, fmt_len)
+            if next == -1:
+                break
+            i = next
            if fmt_ptr[i] == `{`:


The new SIMD-accelerated control flow is only taken when fmt_len is at least one SIMD vector wide, but existing format-string tests appear to primarily cover short inputs. Please add tests that exercise the SIMD path (e.g., long format strings with {/} located at and across vector boundaries, including escaped {{ / }} split across a boundary) to ensure correctness of the vectorized scan + scalar tail fallback.
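
One way to generate the boundary inputs requested here is sketched below in Python. The names `reference_scan`, `boundary_cases`, and `check` are hypothetical, the vector widths are the 16/32/64 values mentioned elsewhere in this PR, and the scanner under test is passed in as a callable so the harness makes no assumption about its implementation.

```python
def reference_scan(data: bytes, start: int) -> int:
    """Naive byte-by-byte oracle: first '{' or '}' at or after start, else -1."""
    for i in range(start, len(data)):
        if data[i] in b"{}":
            return i
    return -1


def boundary_cases(widths=(16, 32, 64)):
    """Yield inputs with braces just before, at, and just after each vector boundary."""
    for w in widths:
        for pos in (w - 1, w, w + 1):
            yield b"a" * pos + b"{}" + b"a" * w  # replacement field near the boundary
            yield b"a" * pos + b"{{" + b"a" * w  # escaped pair; straddles it when pos == w - 1
        yield b"a" * (2 * w + 3) + b"}"  # brace in the last few bytes, past two full vectors


def check(scan) -> int:
    """Compare `scan` with the oracle from every start offset; return the number of checks."""
    count = 0
    for data in boundary_cases():
        for start in range(len(data) + 1):
            assert scan(data, start) == reference_scan(data, start), (data, start)
            count += 1
    return count
```

Sweeping every start offset also shifts where chunk boundaries fall relative to the braces, which covers alignments the fixed positions alone would miss.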

NathanSWard · 2026-02-08T00:16:41Z

mojo/stdlib/std/collections/string/format.mojo

-                continue
+        var i = 0
+        while True:
+            var next = _find_next_brace(fmt_ptr, i, fmt_len)


question: This seems like a more general useful facility than just finding braces for this specific context. Something along the lines of StringSlice.find_one_of(codepoints...) or something like that.

Oops, I just read this. Actually, I found we can use the _memchr function and I did it here: 2270a30

But, it is slower than before, so I've created a new _memchr2 to find any of the two chars passed: msaelices@724458c

It could be generalized to support n-char searches, I agree, but I’m not sure whether that would be within the scope of this PR.

I've added a TODO line for generalizing the _memchr and _memchr2 functions: 7990e34
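
The thread does not say why two `_memchr` calls were slower. One plausible contributor, stated here as an assumption rather than a measured result, is that each independent scan keeps going until it finds its own needle, so a needle that is absent forces a full pass over the rest of the input. The Python sketch below counts bytes examined under that model; both function names are hypothetical.

```python
def bytes_examined_two_calls(data: bytes, start: int) -> int:
    """Bytes touched when '{' and '}' are searched with two independent scans."""
    total = 0
    for needle in (b"{", b"}"):
        hit = data.find(needle, start)
        end = len(data) if hit == -1 else hit + 1
        total += end - start
    return total


def bytes_examined_one_pass(data: bytes, start: int) -> int:
    """Bytes touched when a single scan stops at the first '{' or '}'."""
    for i in range(start, len(data)):
        if data[i] in b"{}":
            return i - start + 1
    return len(data) - start


# An escaped '{{' followed by a long literal run and no '}' anywhere.
sample = b"x" * 100 + b"{{" + b"x" * 900
print(bytes_examined_two_calls(sample, 0), bytes_examined_one_pass(sample, 0))
# prints "1103 101"
```

In this model the combined scan does the work of the shorter search only, which is consistent with the direction of the reported result but does not by itself explain its size.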

NathanSWard · 2026-02-08T00:19:37Z

mojo/stdlib/std/collections/string/format.mojo

+
+
+@always_inline
+fn _find_next_brace(


comment: This is a nice optimization for sure, however a few things to consider:

For format strings, and when we eventually have f/t-strings, this code will run at compile time, meaning the vectorized approach isn't saving us much: there are no runtime performance benefits, and it may end up slowing down compile times.

If you do add benchmarks for this, we really only need to test small strings, where I'm not sure how much perf improvement we'll get, since most format strings are relatively small, e.g. "Hello, {}! I am {}".format(...). So I'd be interested to see these cases, as long format strings aren't really a very common thing 👀

I did not know it. I guess we can parametrize the code to differentiate regular Strings vs f-ones, right?

Added benchmarks.


Add _memchr2/_memchr2_impl to string_slice.mojo for finding the first occurrence of either of two byte values in a single SIMD pass. Use it in _find_next_brace instead of two separate _memchr calls. Signed-off-by: Manuel Saelices <msaelices@gmail.com>


martinvuyk · 2026-02-10T00:31:51Z

mojo/stdlib/std/collections/string/string_slice.mojo

+# TODO: Generalize _memchr/_memchr2 into a single variadic _memchr_any that
+# accepts N needle characters and builds the SIMD mask with a parameter loop,
+# similar to Rust's memchr crate which provides memchr, memchr2, and memchr3.
+
+
+@always_inline
+fn _memchr2[
+    dtype: DType, //
+](
+    source: Span[mut=False, Scalar[dtype], ...],
+    char1: Scalar[dtype],
+    char2: Scalar[dtype],
+) -> source.UnsafePointerType:


I think this is already generalizable nowadays. It should just receive a variadic pack, iterate over each element, and OR each comparison into a result variable.

Sure. Done: 4582c8e

…t parsing Add tests for format strings with braces at SIMD vector boundaries (16, 32, 64 bytes), escaped braces straddling boundaries, and braces in scalar tail positions. Add bench_format_runtime_short for typical short format strings like "Hello, {}! I am {} years old and I like {}.". Signed-off-by: Manuel Saelices <msaelices@gmail.com>


martinvuyk · 2026-02-10T18:10:13Z

mojo/stdlib/std/collections/string/string_slice.mojo

+            for j in range(len(chars)):
+                if ptr[i] == chars[j]:
+                    return ptr + i


IMO you should unroll the loop at compile time here. Similarly in other places

We might leave this more complicated version for another time, up to you:

I'm not 100% sure how well the compiler might optimize this but it's best if you give it a shove in the right direction. Not sure if this optimization is faster or not either, just eyeballing it

Suggested change

-            for j in range(len(chars)):
-                if ptr[i] == chars[j]:
-                    return ptr + i
+            var data = SIMD[dtype, next_power_of_two(len(chars))](chars[0])
+            @parameter
+            for j in range(1, len(chars)):
+                data[j] = chars[j]
+            if data.eq(ptr[i]).reduce_or():
+                return ptr + i
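
The idea behind this suggestion can be modeled in Python as follows. This is an illustrative sketch, not Mojo: `matches_any` is a hypothetical name, a list stands in for the SIMD vector, and `any(...)` stands in for the compare followed by `reduce_or()`.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two greater than or equal to n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def matches_any(byte: int, *needles: int) -> bool:
    """Compare one byte against a small padded vector of needles."""
    # Splat the first needle into every lane, padded to a power-of-two width.
    lanes = [needles[0]] * next_power_of_two(len(needles))
    for j in range(1, len(needles)):  # fixed trip count, so it can be unrolled
        lanes[j] = needles[j]
    # Padding lanes repeat needles[0], so they can never add a false match.
    return any(lane == byte for lane in lanes)
```

Filling the padding lanes with a real needle is what keeps the comparison exact: with three needles the fourth lane duplicates the first, so the OR-reduction is true only when the byte equals one of the three.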

I think this does not work because `@parameter for` requires compile-time bounds, and the length of a VariadicList is not known at compile time in Mojo.


Yeah, that limitation has irked me for a while now (see #4144). It's why I originally meant for this to use a VariadicPack, but while I know of a way to do it currently, it will be ugly. We can polish this later on (i.e. just leave the runtime loop for now).

This actually gave me an idea of how we could allow some nicer version of it, I'll go play around and hopefully figure it out

I've checked that now the speed-up is even better than before 🚀

Will try the VariadicPack approach

If variadic pack is too verbose and since variadic list doesn't have a comptime size, you could use InlineArray as an alternative too.

it works like a charm: msaelices@c4a3796

The benchmarks are similar or slightly better, especially in the runtime_short bench

Nice! FYI I opened #5935 to see if we ever get a nicer way to implement this without the rebind workaround. If this were a public function we'd have to comptime assert that the type of each of the VariadicPack elements is Scalar[dtype], IMO we can leave it as is since this is private

@NathanSWard @martinvuyk Does this PR look good to go?


msaelices requested a review from a team as a code owner February 8, 2026 00:12

Copilot AI review requested due to automatic review settings February 8, 2026 00:12

Copilot started reviewing on behalf of msaelices February 8, 2026 00:12

msaelices force-pushed the format-optimization branch from 67ce5ad to bcf41f3 February 8, 2026 00:13

msaelices marked this pull request as draft February 8, 2026 00:15

Copilot AI reviewed Feb 8, 2026


NathanSWard reviewed Feb 8, 2026


[Stdlib] Add benchmark for format string parsing

3269532

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices force-pushed the format-optimization branch from 1d8cc74 to 3269532 February 8, 2026 00:30

msaelices added 3 commits February 8, 2026 01:41

[Stdlib] Refactor _find_next_brace to reuse _memchr from StringSlice

2270a30

Replace custom SIMD scanning with two _memchr calls (one for each brace character), reusing existing SIMD-optimized infrastructure instead of duplicating it. Signed-off-by: Manuel Saelices <msaelices@gmail.com>

[Stdlib] Add TODO for generalizing _memchr/_memchr2 into variadic _me…

7990e34

…mchr_any Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices requested a review from NathanSWard February 8, 2026 01:02

msaelices marked this pull request as ready for review February 8, 2026 01:03

Merge branch 'main' into format-optimization

ac59ac5

martinvuyk reviewed Feb 10, 2026


msaelices added 2 commits February 10, 2026 11:42

[Stdlib] Generalize _memchr into a single variadic function

4582c8e

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices requested a review from martinvuyk February 10, 2026 17:37

Merge branch 'main' into format-optimization

e40eb3f

martinvuyk reviewed Feb 10, 2026


martinvuyk mentioned this pull request Feb 10, 2026

[Feature Request] [stdlib] [mojo-lang] A way to initialize a type from an arguments list at the function callsite #5935


[Stdlib] Use VariadicPack for compile-time unrolled _memchr loops

c4a3796

Signed-off-by: Manuel Saelices <msaelices@gmail.com>

msaelices requested a review from martinvuyk February 10, 2026 22:21

msaelices added 2 commits February 11, 2026 16:44

Merge branch 'main' into format-optimization

63a9dca

Merge branch 'main' into format-optimization

a6d6a80


