Add pcre_u guard to functions using u flag #9724

USERSATOSHI · 2025-09-03T14:56:42Z

This PR adds _wp_can_use_pcre_u() guards to all functions that use the PCRE u (UTF-8) modifier in a regex.

Currently WordPress assumes that the u flag is always available. When PCRE UTF-8 support isn't present, that assumption falls apart and functions like shortcode_parse_atts() break, returning NULL.
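For readers unfamiliar with the guard, here is a minimal sketch of the pattern. The names can_use_pcre_u() and normalize_spaces() are hypothetical stand-ins for illustration; the real _wp_can_use_pcre_u() helper in core may differ in detail.

```php
<?php
// Hypothetical stand-in for core's _wp_can_use_pcre_u(); an assumption
// about its general shape, not a copy of it.
function can_use_pcre_u() {
	static $supported = null;
	if ( null === $supported ) {
		// preg_match() returns false (with a warning) when a pattern cannot
		// be compiled, which is what happens when the u flag is unsupported.
		$supported = ( 1 === @preg_match( '/^./u', 'a' ) );
	}
	return $supported;
}

// Guarded call, shaped like the shortcodes.php hunk in this PR.
function normalize_spaces( $text ) {
	if ( can_use_pcre_u() ) {
		$result = preg_replace( '/[\x{00a0}\x{200b}]+/u', ' ', $text );
		if ( null !== $result ) {
			return $result;
		}
		// preg_replace() returned null (for example on invalid UTF-8),
		// so fall through to the byte-level replacement.
	}
	return str_replace( array( "\xc2\xa0", "\xe2\x80\x8b" ), ' ', $text );
}
```

Without a guard of this kind, preg_replace() returns null on a build that lacks UTF-8 support, and that null propagates to callers.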

Trac ticket: https://core.trac.wordpress.org/ticket/63913

This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

github-actions · 2025-09-03T14:56:50Z

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props tusharbharti.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

github-actions · 2025-09-03T15:12:37Z

Test using WordPress Playground

The changes in this pull request can be previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

The Plugin and Theme Directories cannot be accessed within Playground.
All changes will be lost when closing a tab with a Playground instance.
All changes will be lost when refreshing the page.
A fresh instance is created each time the link below is clicked.
Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance, it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

dmsnell · 2025-09-22T22:51:48Z

src/wp-includes/class-wp-plugin-dependencies.php

+			} else {
+				if ( preg_match( '/^[a-z0-9]+(-[a-z0-9]+)*$/m', $slug ) ) {
+					$sanitized_slugs[] = $slug;
+				}


this PCRE pattern is only looking at US-ASCII characters and doesn’t even need the UTF-8 flag. do you see any reason not to update this simply to remove the flag?
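A quick way to check this is to run the same slug pattern with and without the flag. This is only a sketch: slug_matches() is a hypothetical helper and the sample slugs are made up.

```php
<?php
// Hypothetical helper: runs the slug pattern from the hunk above,
// optionally with the u modifier appended.
function slug_matches( $slug, $use_u ) {
	$pattern = '/^[a-z0-9]+(-[a-z0-9]+)*$/m' . ( $use_u ? 'u' : '' );
	return 1 === preg_match( $pattern, $slug );
}

// Made-up sample slugs; "b\xC3\xBCcher" is "bücher" encoded as UTF-8.
$samples = array( 'hello-dolly', 'akismet', "b\xC3\xBCcher", 'Bad_Slug', '' );

foreach ( $samples as $slug ) {
	// Only ASCII letters, digits, and hyphens can ever match, so the
	// byte-wise result does not depend on how non-ASCII text is decoded.
	printf( "%-12s => %s\n", $slug, slug_matches( $slug, false ) ? 'valid' : 'invalid' );
}
```

One small difference remains: with the u flag, preg_match() returns false rather than 0 for a subject that is not valid UTF-8. Both are rejected by the strict comparison here.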

dmsnell · 2025-09-23T07:54:24Z

src/wp-includes/comment.php

+				$pattern = "#$word#iu";
+			} else {
+				$pattern = "#$word#i";
+			}


I suspect this could be another case where it’s okay to remove the UTF-8 flag, because whatever $word is, it’s going to appear here as bytes, not as source code. that means it’s already matching sequences of the requested bytes/text.

it would be good to verify this. one setup would be to have PHP using an internal_encoding of latin1 (if that’s even possible; I can’t remember if changing the internal encoding has been removed) and then testing "b\xC3\xBCch" against "#b\xFCch#i". if these match then the PCRE functions are converting text before matching. if they don’t match, I think we can probably remove the flag.
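The experiment could be sketched roughly as follows. match_bytes() is a hypothetical helper, and whether the internal_encoding setting influences the PCRE functions at all is exactly the open question, so the ini_set() call is an assumption about how to set up the test, not a known requirement.

```php
<?php
// Assumption: ask PHP for a Latin-1 internal encoding, as suggested above.
// If PCRE ignores this setting, the call simply has no effect on matching.
@ini_set( 'internal_encoding', 'ISO-8859-1' );

// Hypothetical helper: wraps raw bytes in the same "#...#i" pattern shape
// used in comment.php, without the u flag.
function match_bytes( $word_bytes, $subject ) {
	return preg_match( '#' . $word_bytes . '#i', $subject );
}

$utf8_subject = "b\xC3\xBCch"; // "büch" encoded as UTF-8.

// Pattern carries the single Latin-1 byte 0xFC for "ü".
$latin1_vs_utf8 = match_bytes( "b\xFCch", $utf8_subject );

// Pattern carries the same UTF-8 bytes as the subject.
$utf8_vs_utf8 = match_bytes( "b\xC3\xBCch", $utf8_subject );

// Separate question worth checking: an uppercase "Ü" (0xC3 0x9C) in the
// subject. Byte-wise case folding only knows single-byte case pairs.
$uppercase_subject = match_bytes( "b\xC3\xBCch", "B\xC3\x9CCH" );

var_dump( $latin1_vs_utf8, $utf8_vs_utf8, $uppercase_subject );
```

If the first result is 0 and the second is 1, no conversion is happening and the bytes are compared as-is. The third check is there because case-insensitive matching of non-ASCII letters is one place where dropping the flag could plausibly change behavior.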

dmsnell · 2025-09-23T07:54:53Z

src/wp-includes/comment.php

+			$pattern = "#$word#iu";
+		} else {
+			$pattern = "#$word#i";
+		}


same as above: is the u flag necessary here given that we’re injecting runtime bytes into the pattern and not attempting to translate source code?

dmsnell · 2025-09-23T07:56:30Z

src/wp-includes/formatting.php

-	if ( 1 === @preg_match( '/^./us', $text ) ) {
+	if ( 1 === preg_match( '/^./us', $text ) ) {
 		return $text;
 	}


this whole function has been updated in trunk. these changes are no longer relevant.

dmsnell · 2025-09-23T07:57:07Z

src/wp-includes/formatting.php

+			} else {
+				$words_array = array( str_split( $text ) );
+			}
+		}


I have this function slated for much bigger updates. I would recommend against updating the PCRE usage here because of that.

dmsnell · 2025-09-23T07:57:56Z

src/wp-includes/functions.php

+				$decline = preg_match( '#\b\d{1,2}\.? [^\d ]+\b#u', $date );
+			} else {
+				$decline = preg_match( '#\b\d{1,2}\.? [^\d ]+\b#', $date );
+			}


what is the difference between the \b with and without the UTF-8 flag?
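As a starting point for answering that, the two patterns can be compared on a date whose month name contains no ASCII letters. This is a sketch: date_matches() is a hypothetical helper and the sample dates are made up. As far as I know, PHP's u modifier also turns on Unicode character properties, so \w (and therefore \b) recognizes non-ASCII letters only when the flag is present.

```php
<?php
// Hypothetical helper: runs the pattern from the hunk above,
// with or without the u modifier.
function date_matches( $date, $use_u ) {
	$pattern = '#\b\d{1,2}\.? [^\d ]+\b#' . ( $use_u ? 'u' : '' );
	return preg_match( $pattern, $date );
}

$english = '5 May';
$russian = "5 \u{043C}\u{0430}\u{044F}"; // "5 мая", Cyrillic month name.

// ASCII month name: a word boundary exists after "y" either way.
var_dump( date_matches( $english, false ), date_matches( $english, true ) );

// Cyrillic month name: without u, none of the bytes count as word
// characters (in the default C locale), so the trailing \b never
// finds a boundary.
var_dump( date_matches( $russian, false ), date_matches( $russian, true ) );
```

If this holds, the fallback branch would silently stop matching dates in locales whose month names are written entirely in a non-Latin script.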

dmsnell · 2025-09-23T08:01:42Z

src/wp-includes/pomo/po.php

+					$chars = array( mb_str_split( $line, 1, 'UTF-8' ) );
+				} else {
+					$chars = array( str_split( $line ) );
+				}


this splitting of lines into characters and looking for a backslash is something we can probably do away with entirely: a streaming approach with strpos( '\\' ) would suffice because all of the escapes are US-ASCII. this means we don’t need to split the lines and we don’t need a million string concatenations.
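A rough sketch of what that streaming approach might look like. unescape_po_line() is a hypothetical name, and the escape table is an assumption about what po.php handles, so treat it as an illustration and not a drop-in replacement.

```php
<?php
// Hypothetical sketch: decode backslash escapes by jumping from one
// backslash to the next instead of walking the line character by character.
function unescape_po_line( $line ) {
	// Assumed escape table; anything else after a backslash is kept as-is.
	$escapes = array(
		't'  => "\t",
		'n'  => "\n",
		'r'  => "\r",
		'\\' => '\\',
	);

	$out = '';
	$at  = 0;
	$end = strlen( $line );

	while ( $at < $end ) {
		// The byte 0x5C never occurs inside a multibyte UTF-8 sequence,
		// so a byte-level search is safe on UTF-8 text.
		$slash = strpos( $line, '\\', $at );
		if ( false === $slash ) {
			$out .= substr( $line, $at );
			break;
		}

		// Copy the whole run before the backslash in one step.
		$out .= substr( $line, $at, $slash - $at );

		if ( $slash + 1 >= $end ) {
			break; // Lone trailing backslash: dropped in this sketch.
		}

		$next = $line[ $slash + 1 ];
		$out .= isset( $escapes[ $next ] ) ? $escapes[ $next ] : $next;
		$at   = $slash + 2;
	}

	return $out;
}
```

This performs one concatenation per escape sequence instead of one per character, and it needs neither the u flag nor mb_str_split().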

dmsnell · 2025-09-23T09:16:34Z

src/wp-includes/shortcodes.php

+		$text = preg_replace( "/[\x{00a0}\x{200b}]+/u", ' ', $text );
+	} else {
+		$text = str_replace( array( "\xc2\xa0", "\xe2\x80\x8b" ), ' ', $text );
+	}


if the regex isn’t necessary, it would seem fine to replace it directly with the str_replace(), but two ideas:

use strtr( $text, array( … ) )

use the Unicode string literals like "\u{00A0}" and "\u{200B}"

although it would be good to verify that all supported versions of PHP support that Unicode syntax without any extensions. I think they do.
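Combining both ideas could look like the sketch below. replace_invisible_spaces() is a hypothetical name. The "\u{...}" escape is part of the language since PHP 7.0 and needs no extension, which the assertions in the usage note can confirm on any given build.

```php
<?php
// Hypothetical helper: replace NO-BREAK SPACE and ZERO WIDTH SPACE with a
// plain space using strtr() and Unicode codepoint escapes.
function replace_invisible_spaces( $text ) {
	return strtr(
		$text,
		array(
			"\u{00A0}" => ' ', // NO-BREAK SPACE, bytes 0xC2 0xA0.
			"\u{200B}" => ' ', // ZERO WIDTH SPACE, bytes 0xE2 0x80 0x8B.
		)
	);
}

// The escapes produce the same bytes as the hand-written sequences.
var_dump( "\u{00A0}" === "\xC2\xA0", "\u{200B}" === "\xE2\x80\x8B" );
```

One behavioral difference to keep in mind: the regex uses a + quantifier and collapses a run of these characters into a single space, while strtr() and str_replace() emit one space per character.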

fix: add guard to all pcre_u usage

2752085

fix: add function_exists check for mb_str_split

eef7d73

dmsnell reviewed Sep 23, 2025
