gh-98836: Extend PyUnicode_FromFormat()#98838
Conversation
* Support for conversion specifiers o (octal) and X (uppercase hexadecimal). * Support for length modifiers j (intmax_t) and t (ptrdiff_t). * Length modifiers are now applied to all integer conversions. * Support for wchar_t C strings (%ls and %lV). * Support for variable width and precision (*). * Support for flag - (left alignment).
Doc/whatsnew/3.12.rst
Outdated
| :c:func:`PyUnicode_FromFormatV` now sets a :exc:`SystemError`. | ||
| In previous versions it caused all the rest of the format string to be | ||
| copied as-is to the result string, and any extra arguments discarded. | ||
| (Contributed by Serhiy Storchaka in :gh:`95781`.) |
There was a problem hiding this comment.
It seems like this paragraph was duplicated by mistake?
Lib/test/test_unicode.py
Outdated
| c_int, c_long, c_longlong, c_ssize_t, | ||
| c_uint, c_ulong, c_ulonglong, c_size_t, c_void_p) | ||
| c_uint, c_ulong, c_ulonglong, c_size_t, c_void_p, | ||
| sizeof, c_wchar, c_wchar_p) |
There was a problem hiding this comment.
Can you please a comment somewhere to list formats which are not tested and the reason why?
I see at least %j and %t which are not tested. Please mention that _testcapi.test_string_from_format() has a wider coverage of all formats, and test these formats.
Lib/test/test_unicode.py
Outdated
| check_format(' 0000123', b'%10.7i', c_int(123)) | ||
| check_format('0000000123', b'%010.7i', c_int(123)) | ||
| check_format('0000123 ', b'%-10.7i', c_int(123)) | ||
| check_format('0000123 ', b'%-010.7i', c_int(123)) |
There was a problem hiding this comment.
Would it be possible to generate these tests? The code is similar for the 6 groups of tests.
There was a problem hiding this comment.
It is not so similar. It is different for signed and unsigned types, but I'll try to generalize it.
There was a problem hiding this comment.
I just suggested to generate these tests. It's ok to leave them as they are if it's too complicated to generate them.
Lib/test/test_unicode.py
Outdated
| b'%V', None, b'xyz') | ||
|
|
||
| # test %ls | ||
| check_format('abc', b'%ls', c_wchar_p('abc')) |
There was a problem hiding this comment.
You can please add one non-ASCII character in wchar tests, %ls and %lV? In addition to tests truncating to 2 wchar_t.
There was a problem hiding this comment.
What do you mean? Isn't the following line enough?
There was a problem hiding this comment.
I'm thinking at adding a test with non-ASCII characters which fit into UCS-2 (16-bit wchar_t) characters. Something like:
check_format('a\xe9\u20ac', b'%ls', c_wchar_p('a\xe9\u20ac'))
Some bugs are only triggered depending on the maximum code point, and wchar_t code can take different code path depending if there is a surrogate character or not.
Lib/test/test_unicode.py
Outdated
| @@ -2880,10 +2989,11 @@ def check_format(expected, format, *args): | |||
| # check for crashes | |||
There was a problem hiding this comment.
this comment is misleading. Python doesn't crash but reject invalid format string with a SyntaxError. Would you mind to rephrase the comment? Like: # test invalid format strings?
There was a problem hiding this comment.
Done. Restored the part of the older comment.
| | | | :c:func:`PyObject_Repr`. | | ||
| +-------------------+---------------------+----------------------------------+ | ||
| ASCII-encoded string. | ||
|
|
There was a problem hiding this comment.
Can you please add something like [[fill]align][sign][z][#][0][width][grouping_option][.precision][type]? Example taken from: https://docs.python.org/dev/library/string.html#format-specification-mini-language
The printf format is complex, and it's not obvious in which order each part should be written. I understand that it's something like:
%[conversion flags][minimum width][.precision][length modifier][conversion type]
Or if you prefer a shorter version:
%[conversion][width][.precision][modifier][conversion]
There was a problem hiding this comment.
I copied the following paragraphs from the description of the print-like formatting in Python: https://docs.python.org/3/library/stdtypes.html#printf-style-string-formatting.
I think that it also can benefit from from similar changes, but let leave it to other issue.
Lib/test/test_unicode.py
Outdated
| check_format(' 0000123', b'%10.7i', c_int(123)) | ||
| check_format('0000000123', b'%010.7i', c_int(123)) | ||
| check_format('0000123 ', b'%-10.7i', c_int(123)) | ||
| check_format('0000123 ', b'%-010.7i', c_int(123)) |
There was a problem hiding this comment.
I just suggested to generate these tests. It's ok to leave them as they are if it's too complicated to generate them.
Lib/test/test_unicode.py
Outdated
| # Length modifiers "j" and "t" are not tested here because ctypes does | ||
| # not expose types for intmax_t and ptrdiff_t. | ||
| # _testcapi.test_string_from_format() has a wider coverage of all | ||
| # formats. |
There was a problem hiding this comment.
Thank you, this comment is useful to me at least :-)
Lib/test/test_unicode.py
Outdated
| b'%V', None, b'xyz') | ||
|
|
||
| # test %ls | ||
| check_format('abc', b'%ls', c_wchar_p('abc')) |
There was a problem hiding this comment.
I'm thinking at adding a test with non-ASCII characters which fit into UCS-2 (16-bit wchar_t) characters. Something like:
check_format('a\xe9\u20ac', b'%ls', c_wchar_p('a\xe9\u20ac'))
Some bugs are only triggered depending on the maximum code point, and wchar_t code can take different code path depending if there is a surrogate character or not.
|
Feel free to ignore my suggestions, the PR LGTM. |
Uh oh!
There was an error while loading. Please reload this page.