gh-145261: multiprocessing.shared_memory: fix ShareableList corruption for multi-byte strings and null bytes #145266

Open
zetzschest wants to merge 2 commits into python:main from zetzschest:fix/multiprocessing_shareable_list_utf8

zetzschest commented Feb 26, 2026

Issue

ShareableList has two issues:

  1. It sizes string slots with len(item) (a character count) instead of len(item.encode('utf-8')) (a byte count), so strings containing multi-byte UTF-8 characters are truncated when packed and raise UnicodeDecodeError when read back.
  2. It recovers bytes values with rstrip(b'\x00'), which also strips legitimate trailing null bytes.
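Both failure modes can be reproduced without any shared memory. The sketch below is a simplified model, not the stdlib code: it assumes that ShareableList rounds each str/bytes slot up to the next multiple of 8 bytes and packs items with struct's "%ds" format, and the helper names old_slot_size and roundtrip_old are hypothetical.

```python
import struct

ALIGNMENT = 8  # assumption: str/bytes slots are padded to a multiple of 8 bytes


def old_slot_size(item):
    # Pre-fix sizing: len(item) counts characters for str, not encoded bytes.
    return ALIGNMENT * (len(item) // ALIGNMENT + 1)


def roundtrip_old(item):
    """Hypothetical model: pack and unpack one item the way the buggy code does."""
    fmt = "%ds" % old_slot_size(item)
    buf = bytearray(struct.calcsize(fmt))
    raw = item.encode("utf-8") if isinstance(item, str) else item
    struct.pack_into(fmt, buf, 0, raw)  # struct silently truncates an over-long value
    (stored,) = struct.unpack_from(fmt, buf, 0)
    stored = stored.rstrip(b"\x00")  # removes padding, but also real trailing nulls
    return stored.decode("utf-8") if isinstance(item, str) else stored


s = "0\U00010000\U00010000"
print(len(s), len(s.encode("utf-8")))  # 3 9
try:
    roundtrip_old(s)
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
print(roundtrip_old(b"\x00"))  # b''
```

With only 8 bytes allocated for a 9-byte encoding, the second 4-byte character is cut short and decoding fails; the lone b'\x00' is indistinguishable from padding, so rstrip removes it.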

Reproducer

```python
from multiprocessing.shared_memory import ShareableList

# String corruption: 3 characters, but 9 bytes once UTF-8 encoded
sl = ShareableList(['0\U00010000\U00010000'])
sl[0]  # UnicodeDecodeError

# Bytes corruption: the trailing null byte is stripped on read
sl = ShareableList([b'\x00'])
sl[0]  # b'' instead of b'\x00'
```

Fix

Use len(item.encode('utf-8')) for string slot allocation. For bytes, store the actual byte length in the format metadata so retrieval reads exactly the right number of bytes without needing rstrip(b'\x00').
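A minimal sketch of the fixed layout, again as a hypothetical model rather than the patch itself: string slots are sized from the UTF-8 byte length, and each bytes item records its exact length in its format string (for example "1s" for b'\x00'), so reading it back needs no rstrip. Strings are still read from their full slot and stripped, as described above. The 8-byte alignment and the helper names encode_item, slot_size, store and load are assumptions for illustration.

```python
import struct

ALIGNMENT = 8  # assumption: slots are still padded to a multiple of 8 bytes


def encode_item(item):
    # Convert str to bytes up front so every size below is a byte count.
    return item.encode("utf-8") if isinstance(item, str) else item


def slot_size(raw):
    # Size the slot from the encoded length, not the character count.
    return ALIGNMENT * (len(raw) // ALIGNMENT + 1)


def store(items):
    """Hypothetical model: pack str/bytes items into one buffer.

    Returns the buffer plus one (offset, format, is_str) entry per item.
    """
    raws = [encode_item(item) for item in items]
    sizes = [slot_size(raw) for raw in raws]
    buf = bytearray(sum(sizes))
    meta = []
    offset = 0
    for item, raw, size in zip(items, raws, sizes):
        struct.pack_into("%ds" % size, buf, offset, raw)  # null-padded slot
        is_str = isinstance(item, str)
        # str: the format covers the whole slot; bytes: the exact byte length.
        fmt = "%ds" % (size if is_str else len(raw))
        meta.append((offset, fmt, is_str))
        offset += size
    return buf, meta


def load(buf, meta, index):
    offset, fmt, is_str = meta[index]
    (raw,) = struct.unpack_from(fmt, buf, offset)
    if is_str:
        # Strip the padding, then decode the full, untruncated encoding.
        return raw.rstrip(b"\x00").decode("utf-8")
    return raw  # exactly the stored length, trailing nulls preserved


items = ["0\U00010000\U00010000", b"\x00", b"ab\x00\x00", "plain"]
buf, meta = store(items)
print([load(buf, meta, i) for i in range(len(items))] == items)  # True
```

Sizing from len(item.encode('utf-8')) guarantees the slot can hold every encoded byte, and recording the exact length for bytes is what lets a trailing b'\x00' survive the round trip.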

Test updates

Two assertions in test_shared_memory_ShareableList_basics needed adjustments since they were based on the previous behavior:

  • Format string assertions updated to reflect actual byte lengths stored for bytes values.
  • Removed format comparison between original and copy, as copies may allocate differently with byte-accurate lengths.

…and bytes with trailing nulls

ShareableList had two bugs:
1. Used character count len(item) instead of byte count
   len(item.encode('utf-8')) for string slot allocation, causing
   UnicodeDecodeError with multi-byte UTF-8 characters.
2. Used rstrip(b'\x00') to recover bytes values, which stripped
   legitimate trailing null bytes.

Fix uses UTF-8 byte length for string allocation and stores the actual
byte length in the format metadata for bytes values, so retrieval reads
exactly the right number of bytes without needing rstrip.

bedevere-app bot commented Feb 26, 2026

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

zetzschest marked this pull request as ready for review February 26, 2026 17:41
zetzschest requested a review from gpshead as a code owner February 26, 2026 17:41