gh-148241: Fix json serialization for str subclasses#148249
gh-148241: Fix json serialization for str subclasses#148249vstinner wants to merge 3 commits intopython:mainfrom
Conversation
Fix json serialization: no longer call str(obj) on str subclasses. Replace PyUnicodeWriter_WriteStr() with PyUnicodeWriter_WriteASCII() and private _PyUnicodeWriter_WriteStr().
| if (PyUnicodeWriter_WriteStr(writer, pystr) < 0) { | ||
| // gh-148241: Avoid PyUnicodeWriter_WriteStr() which calls str(obj) | ||
| // on str subclasses | ||
| if (_PyUnicodeWriter_WriteStr((_PyUnicodeWriter*)writer, pystr) < 0) { |
There was a problem hiding this comment.
An alternative to using the private API (which is bad!) is to call PyUnicode_FromObject(). But PyUnicode_FromObject() has to copy the string, it's less efficient.
There was a problem hiding this comment.
See also the issue gh-148250 that I just created for this issue.
There was a problem hiding this comment.
using the private API
It would raise a deprecated warning.
_Py_DEPRECATED_EXTERNALLY(3.14) PyAPI_FUNC(int) _PyUnicodeWriter_WriteStr(
_PyUnicodeWriter *writer,
PyObject *str); There was a problem hiding this comment.
Is it safe to cast PyUnicodeWriter* to _PyUnicodeWriter*?
There was a problem hiding this comment.
Yes, it's safe. That's how PyUnicodeWriter_WriteStr() is implemented: by casting the first argument to _PyUnicodeWriter* and calling _PyUnicodeWriter_WriteStr().
Do you prefer calling PyUnicode_FromObject() to avoid the private API?
It would raise a deprecated warning.
_Py_DEPRECATED_EXTERNALLY() only emits a deprecation warning if the Py_BUILD_CORE macro is not defined. This macro is defined in Modules/_json.c, so no warning is emitted.
There was a problem hiding this comment.
Is it safe to cast PyUnicodeWriter* to _PyUnicodeWriter*?
Yes. I checked, PyUnicodeWriter is an opaque type that is always cast to _PyUnicodeWriter before use.
There was a problem hiding this comment.
I am in favor of public API.
Is PyUnicodeWriter_WriteSubstring applicable for this?
PyUnicodeWriter_WriteSubstring the only PyUnicodeWriter public API that treats str subclasses as str.
There was a problem hiding this comment.
It is fine. We already use private API in this file.
|
cc @nineteendo @methane @serhiy-storchaka (who worked on PR gh-133239 which introduced the regession) |
|
"Tests / CIFuzz / python3-libraries (address)" failure was already fixed by google/oss-fuzz#15317 but the fix was not deployed yet. |
|
Re-running, it should be OK now. |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
There is test_enum specially for such cases. I think it is better to add a test here.
Ok, I also added a test to |
serhiy-storchaka
left a comment
There was a problem hiding this comment.
LGTM. 👍
I am not sure we need so many tests, but this is up to you.
| if (PyUnicodeWriter_WriteStr(writer, pystr) < 0) { | ||
| // gh-148241: Avoid PyUnicodeWriter_WriteStr() which calls str(obj) | ||
| // on str subclasses | ||
| if (_PyUnicodeWriter_WriteStr((_PyUnicodeWriter*)writer, pystr) < 0) { |
There was a problem hiding this comment.
It is fine. We already use private API in this file.
Fix json serialization: no longer call str(obj) on str subclasses.
Replace PyUnicodeWriter_WriteStr() with PyUnicodeWriter_WriteASCII() and private _PyUnicodeWriter_WriteStr().