gh-105636: Add re.Pattern.compile_template() #135992
serhiy-storchaka wants to merge 5 commits into python:main
- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Extended "beans and spam" example to demonstrate both string & re.compile flags usage, `\1` templating, and moved it close to start.
- Discuss all how-we-match parameters before what-we-do-with-matches. TODO: Is important info close enough to start?
- Explain callback before backslash notation because it's shorter but also to promote it. IMHO, people fear it as a "last-resort escape hatch" while it's actually *simpler* than backslashes (python#128138 is one example). TODO: Will this order make sense after python#135992 ?
- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
- Added a note promoting raw string notation for `repl` too.
- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Moved simplest "beans and spam" example from under "repl is a function" (which it wasn't!) close to start, extended to demonstrate both string & re.compile flags usage, and `\1` templating.
- Discuss all how-we-match parameters before what-we-do-with-matches. TODO: Is important info close enough to start?
- Explain callback before backslash notation because it's shorter but also to promote it. IMHO, people fear it as a "last-resort escape hatch" while it's actually *simpler* than backslashes (python#128138 is one example). TODO: Will this order make sense after python#135992 ?
- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
- Added a note promoting raw string notation for `repl` too.
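The replacement-string rules listed in the commit message above can be checked against a released `re` module. This is a minimal sketch, assuming Python 3.7 or later (where unknown ASCII-letter escapes in `repl` are errors); the sample patterns and strings are made up for illustration.

```python
import re

# \g<1>0 means "group 1, then a literal 0"; \10 would be read as group 10.
assert re.sub(r'(a)', r'\g<1>0', 'a') == 'a0'

# A doubled backslash in the template yields one literal backslash.
assert re.sub(r'a', r'\\', 'a') == '\\'

# Most string escapes, such as \n, are processed in the template...
assert re.sub(r'a', r'\n', 'a') == '\n'

# ...but \x is not a supported template escape and is rejected.
try:
    re.sub(r'a', r'\x41', 'a')
except re.error as exc:
    bad_escape = str(exc)
else:
    bad_escape = None
```

Writing the templates as raw strings keeps Python's own string-literal escaping out of the picture, so only the `re` template rules apply.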
I like this. It fleshes out the concept of template strings being a notation in their own right.
Q: Why is compile_template a method on Pattern vs. being global? IIUC it allows validation that backreferences are valid. There are really two separate questions here:
- Is it a compile_template-time error to refer to groups missing in pattern? [trying re._compile_template on 3.14, YES] EDIT: ah right, that was a major motivation in #105636 to do this.
- Is it a sub()/expand()/call-time error to use a template prepared for a different RE (which has the relevant groups)?
[No strong opinion, but this being stdlib, I guess you either enforce strictness for now, or have to assume somebody will come to rely on it (even if undocumented).]
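The validation discussed in the first question can already be observed through the public `sub()` API, which rejects templates that refer to groups the pattern lacks. This is a small sketch using only released `re` calls; `template_error` is a hypothetical helper name, and it does not use the proposed `compile_template()` method.

```python
import re

pattern = re.compile(r'(?P<word>\w+)')

def template_error(repl):
    """Return the exception raised for a bad template, or None if it is accepted."""
    try:
        pattern.sub(repl, 'spam')
    except (re.error, IndexError) as exc:
        return exc
    return None

numbered = template_error(r'\2')         # the pattern has only one group
named = template_error(r'\g<missing>')   # no group has this name
valid = template_error(r'<\g<word>>')    # refers to an existing group
```

Both exception types are caught because numbered and named references may fail with different exceptions.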
Terminology: now that we have t'template strings', is introducing yet another Template confusing?
I think until now the word "template" has been in the docs but not in API yet(?)
If the functions are named sub()/subn(), perhaps Substitution is a sane candidate?
- Crazier Q: could re module work with t-strings directly?!
re.sub(r'RE (1) (?P<name>2)', t'{1}st {name}d', s) or something.
Not really, because {those} are evaluated immediately, right?
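For comparison, deferred group lookup is already available through a callable `repl`, which receives each match and can format it with an f-string. A minimal sketch, reusing the pattern from the comment above; the function name `ordinal` and the input string are made up for illustration.

```python
import re

def ordinal(match):
    # Called once per match, so group values are looked up at substitution time.
    return f"{match[1]}st {match['name']}d"

result = re.sub(r'RE (1) (?P<name>2)', ordinal, 'RE 1 2')
```

Here the braces are evaluated inside the callback, after the match exists, which is the timing a directly passed t-string would lack.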
...        'def myfunc():')
'static PyObject*\npy_myfunc(void)\n{'

If *repl* is a function, it is called for every non-overlapping occurrence of
Consider s/is a function/is callable/.
P.S. What happens if it's both? Currently on 3.14.2:
>>> class CallMeMaybe(str):
... def __call__(self, match): return 'call back'
...
>>> re.sub('@', CallMeMaybe('a string'), 'Did @')
'Did call back'
Dunno if docs should commit either way, but maybe worth a test case 🤷
This is not directly related to this PR. Functions (callables) always were supported.
True. There are now 3 allowed types and only 2 "if it is a string" / "If repl is a function" descriptions, and I was trying to sweep the Template case under the "callable" rug.
(The reader doesn't need to care that internally Template takes a faster path bypassing Python call overhead.)
However I see the fact Templates are callable is not even mentioned until "Template Objects" section, so many readers would miss the implication. And your added paragraph + example below explains the 3rd case clearly.
=> keeping "function" LGTM.
Modules/_sre/sre.c
Outdated
@@ -1247,7 +1248,9 @@ pattern_subx(_sremodulestate* module_state,
    if (PyCallable_Check(ptemplate)) {
        /* sub/subn takes either a function or a template */
"a template" now sounds ambiguous, making this comment less helpful. maybe
-        /* sub/subn takes either a function or a template */
+        /* sub/subn takes either a function/Template or a string */
AFAIK, all names referred to in the t-string should be defined in the current scope.
Because On the positive side, we can make the compiled pattern argument optional. If it is omitted, we will resolve names dynamically. But this will make the code more complex and the default case slower. I am not sure there is a need for such a feature.
There is also much older
Ah cool, I was thinking of "optimize for specific pattern" as a hypothetical future benefit; I didn't realize it's already done for named references. But that means if one re-uses a Template with a different pattern, the result can be surprisingly permuted, right? Should docs warn against changing pattern? Should Template store the Pattern and assert it's the same? (even There is a tiny tension here with how I hoped this helps explaining:
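The permutation concern can be illustrated with public `re` APIs only. This sketch assumes, as the discussion suggests, that a compiled template resolves group names to group numbers once, against the pattern it was compiled for; `index_of_a` is a hypothetical stand-in for that stored number, not the actual Template internals.

```python
import re

first = re.compile(r'(?P<a>\w)-(?P<b>\w)')
second = re.compile(r'(?P<b>\w)-(?P<a>\w)')  # same names, swapped order

# Assumption: the name 'a' was resolved once, against `first`.
index_of_a = first.groupindex['a']

match = second.match('x-y')
by_name = match.group('a')                 # looked up by name in `second`
by_stale_index = match.group(index_of_a)   # group 1 in `second` is 'b'
```

Under that assumption, `by_name` is `'y'` while `by_stale_index` is `'x'`, which is the kind of silent swap a stored-Pattern check would catch.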
📚 Documentation preview 📚: https://cpython-previews--135992.org.readthedocs.build/