gh-105636: Add re.Pattern.compile_template() #135992

Open
serhiy-storchaka wants to merge 5 commits into python:main from serhiy-storchaka:re-compile-template3

@serhiy-storchaka (Member) commented Jun 26, 2025

cben added a commit to cben/cpython that referenced this pull request Feb 11, 2026
- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Extended "beans and spam" example to demonstrate both string & re.compile
  flags usage, `\1` templating, and moved it close to start.
- Discuss all how-we-match parameters before what-we-do-with-matches.
  TODO: Is important info close enough to start?

- Explain callback before backslash notation because it's shorter but also
  to promote it. IMHO, people fear it as a "last-resort escape hatch"
  while it's actually *simpler* than backslashes (python#128138 is one example).
  TODO: Will this order make sense after python#135992 ?

- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
  - Added a note promoting raw string notation for `repl` too.
cben added a commit to cben/cpython that referenced this pull request Feb 11, 2026
- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Moved simplest "beans and spam" example from under "repl is a function"
  (which it wasn't!) close to start, extended to demonstrate
  both string & re.compile flags usage, and `\1` templating.
- Discuss all how-we-match parameters before what-we-do-with-matches.
  TODO: Is important info close enough to start?

- Explain callback before backslash notation because it's shorter but also
  to promote it. IMHO, people fear it as a "last-resort escape hatch"
  while it's actually *simpler* than backslashes (python#128138 is one example).
  TODO: Will this order make sense after python#135992 ?

- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
  - Added a note promoting raw string notation for `repl` too.
cben added a commit to cben/cpython that referenced this pull request Feb 12, 2026

@cben cben left a comment

I like this. It fleshes out the concept of template strings being a notation in their own right.

Q: Why is compile_template a method on Pattern vs. being global? IIUC it allows validation that backreferences are valid. There are really two separate questions here:

  1. Is it a compile_template-time error to refer to groups missing in pattern? [trying re._compile_template on 3.14, YES] EDIT: ah right, that was a major motivation in #105636 to do this.
  2. Is it a sub()/expand()/call-time error to use a template prepared for a different RE (which has the relevant groups)?
    [No strong opinion, but this being stdlib, I guess you either enforce strictness for now, or have to assume somebody will come to rely on it (even if undocumented).]

Terminology: now that we have t'template strings', is introducing yet another Template confusing?
I think until now the word "template" has been in the docs but not in API yet(?)
If the functions are named sub()/subn(), perhaps Substitution is a sane candidate?

  • Crazier Q: could re module work with t-strings directly?! re.sub(r'RE (1) (?P<name>2)', t'{1}st {name}d', s) or something.
    Not really, because {those} are evaluated immediately, right?

... 'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

If *repl* is a function, it is called for every non-overlapping occurrence of

Consider s/is a function/is callable/.

P.S. What happens if it's both? Currently on 3.14.2:

>>> class CallMeMaybe(str):
...     def __call__(self, match): return 'call back'
...
>>> re.sub('@', CallMeMaybe('a string'), 'Did @')
'Did call back'

Dunno if docs should commit either way, but maybe worth a test case 🤷
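
A minimal sketch of what such a test case could look like, assuming it only pins down the behaviour observed above. The class and test names are hypothetical and not taken from the PR:

```python
import re
import unittest


class CallMeMaybe(str):
    """A str subclass that is also callable (same as the snippet above)."""

    def __call__(self, match):
        return 'call back'


class CallableStrReplTest(unittest.TestCase):
    # Hypothetical test: records that a repl which is both a str and a
    # callable is treated as a callable by re.sub().
    def test_callable_str_subclass(self):
        repl = CallMeMaybe('a string')
        self.assertEqual(re.sub('@', repl, 'Did @'), 'Did call back')
```

Whether this precedence should be guaranteed is a separate question; the test would merely make a change in behaviour visible.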

Member Author

This is not directly related to this PR. Functions (callables) always were supported.

True. There are now 3 allowed types and only 2 "if it is a string" / "If repl is a function" descriptions, and I was trying to sweep the Template case under the "callable" rug.
(Reader doesn't need to care that internally Template takes a faster path bypassing python call overhead.)

However I see the fact Templates are callable is not even mentioned until "Template Objects" section, so many readers would miss the implication. And your added paragraph + example below explains the 3rd case clearly.
=> keeping "function" LGTM.

@@ -1247,7 +1248,9 @@ pattern_subx(_sremodulestate* module_state,
     if (PyCallable_Check(ptemplate)) {
         /* sub/subn takes either a function or a template */

"a template" now sounds ambiguous, making this comment less helpful. Maybe:

Suggested change
- /* sub/subn takes either a function or a template */
+ /* sub/subn takes either a function/Template or a string */

cben added a commit to cben/cpython that referenced this pull request Feb 16, 2026
@serhiy-storchaka
Member Author

  • could re module work with t-strings directly?!

AFAIK, all names referred to in the t-string must be defined in the current scope.

@serhiy-storchaka
Member Author

Why is compile_template a method on Pattern vs. being global?

Because _parser.parse_template() requires the pattern object: it needs it to translate named group references into references by group number. Resolving them dynamically in sub() would be much slower. We could add a global function in re, but it would require passing a compiled pattern as an argument.

On the positive side, we could make the compiled pattern argument optional. If it is omitted, names would be resolved dynamically. But this would make the code more complex and the default case slower. I am not sure there is a need for such a feature.
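
To illustrate the name-to-number translation described above, here is a rough sketch. It is not the actual _parser.parse_template() code: `resolve_named_refs` is a hypothetical helper, and raising IndexError for an unknown name is a choice made for this sketch. It relies only on the existing `Pattern.groupindex` mapping to rewrite `\g<name>` as `\g<number>` once, ahead of time:

```python
import re

# A backslash followed by either another backslash (an escaped backslash,
# which is left untouched) or a named reference of the form g<name>.
_REF = re.compile(r'\\(?:\\|g<([A-Za-z_]\w*)>)')


def resolve_named_refs(pattern, template):
    """Hypothetical helper: rewrite \\g<name> as \\g<number> for `pattern`."""
    def to_number(m):
        name = m.group(1)
        if name is None:
            # Matched an escaped backslash; keep it exactly as written.
            return m.group(0)
        try:
            number = pattern.groupindex[name]
        except KeyError:
            raise IndexError(f'unknown group name {name!r}') from None
        return rf'\g<{number}>'
    return _REF.sub(to_number, template)


p1 = re.compile(r'(?P<first>\w+) (?P<last>\w+)')
resolved = resolve_named_refs(p1, r'First name:\g<first>\nLast name:\g<last>')
print(resolved)
print(p1.sub(resolved, 'James Bond'))
```

Because the names are replaced by positions up front, an unknown name fails before any substitution runs, and the result is only meaningful for the pattern that was passed in.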

Terminology: now that we have t'template strings', is introducing yet another Template confusing?

There is also the much older string.Template. "Template" is a general term used in many contexts. We could use "compiled replacement string" or "compiled replacement object"; would that be better?

cben commented Feb 18, 2026

Ah cool, I was thinking of "optimize for a specific pattern" as a hypothetical future benefit; I didn't realize it's already done for named references.
My Q was more curiosity than criticism; IMHO these are sufficient reasons to keep it a method on Pattern 👍 (This is, after all, a validation+performance feature, so it'd be silly to pick an API that prevents good performance.)

But that means if one re-uses a Template with a different pattern, the result can be surprisingly permuted, right?

p1 = re.compile(r'(?P<first>\w+) (?P<last>\w+)')
t1 = p1.compile_template(r'First name:\g<first>\nLast name:\g<last>')
p2 = re.compile(r'(?P<last>\w+), (?P<first>\w+)')  # same group names, different positions
p2.sub(t1, 'Bond, James')

Should docs warn against changing the pattern? Should Template store the Pattern and assert it's the same? (Even Template.__call__(match) could do it too, by comparing match.re.)
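
One way the "store the Pattern and compare match.re" idea could look, sketched on today's re API. `BoundTemplate` is a hypothetical stand-in, not the PR's Template type; it resolves names through `Match.expand()` at call time, so it demonstrates only the guard, not the precompilation:

```python
import re


class BoundTemplate:
    """Hypothetical Match -> str callable tied to a single pattern."""

    def __init__(self, pattern, template):
        self.pattern = pattern
        self.template = template

    def __call__(self, match):
        # Refuse matches produced by any other compiled pattern.
        if match.re is not self.pattern:
            raise ValueError('template was prepared for a different pattern')
        return match.expand(self.template)


p1 = re.compile(r'(?P<first>\w+) (?P<last>\w+)')
p2 = re.compile(r'(?P<last>\w+), (?P<first>\w+)')
t1 = BoundTemplate(p1, r'First name:\g<first>\nLast name:\g<last>')

print(p1.sub(t1, 'James Bond'))
try:
    p2.sub(t1, 'Bond, James')
except ValueError as exc:
    print('rejected:', exc)
```

An identity check like this is strict: an equal but separately compiled pattern would also be rejected, which may or may not be the desired behaviour.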

There is a tiny tension here with how I hoped this helps explaining:
"We take Match->str callables, and we have this shorthand notation which gets compiled into callables."
I mean yes, just the callables are more specific than the notation would suggest 😉
