Description
Bug report
Bug description:
A regression caused by #88887
The best context for this issue comes from two places: (1) my report in #88887 (comment), and (2) independent confirmation from @itamaro in #88887 (comment).
(1) """a regression in processes using fork() where a reference to the resource_tracker's pipe remains alive in another process. https://github.com/gpshead/cpython/blob/00d16dca6e911fb69c055aa874a2d25cb5e5fe6a/Lib/test/_test_multiprocessing.py#L6293-L6306 has an example of a regression test that demonstrates it.
Basically, at process shutdown the new __del__ finalizer is called and can hang in waitpid() on the resource tracker child process, which is not exiting.
We could sever that relationship so the fd isn't inherited. A resource_tracker shared by multiple sub-child processes under the "fork" start_method would then no longer be a feature. That would undo #80849's #5172, which added it as a feature (cc: @pitrou & @tomMoral). Then again, "fork" as a start_method is rather frowned upon these days, and people are better off avoiding it. But the default only just changed away from it in 3.14, so a lot of people still use it. I encountered this in 3.13.9 and 3.13.11.
I would not undo a feature in a bugfix regardless.
For anyone actually hitting this, one "easy" workaround for now is probably to restore the previous behavior and accept the original #88887 issue again, which seems to have been uncommon:
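```python
import multiprocessing.resource_tracker

if hasattr(multiprocessing.resource_tracker.ResourceTracker, "__del__"):
    # Drop the finalizer that closes the tracker fd and waitpid()s at shutdown.
    del multiprocessing.resource_tracker.ResourceTracker.__del__
```

A fix forward could basically be to undo #5172's feature."""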
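The hang comes down to ordinary pipe semantics. The tracker process reads its pipe until EOF, and a pipe's read end reports EOF only once every copy of the write end is closed. If a forked child still holds an inherited copy, closing the parent's copy is not enough, and a waitpid() on the tracker blocks. The following is a minimal POSIX-only sketch of that behavior using plain os.pipe() and os.fork() rather than the real resource tracker. The pipe only stands in for the tracker's pipe, and the helper names `sees_eof` and `demo_inherited_write_end` are hypothetical. A second "control" pipe makes the child's exit deterministic instead of relying on sleeps.

```python
import os
import select


def sees_eof(read_fd: int, timeout: float) -> bool:
    """Return True if read_fd reports EOF within `timeout` seconds."""
    # A pipe's read end becomes readable with EOF (b"") only after
    # every copy of the write end, in every process, has been closed.
    ready, _, _ = select.select([read_fd], [], [], timeout)
    return bool(ready) and os.read(read_fd, 1) == b""


def demo_inherited_write_end() -> tuple[bool, bool]:
    r, w = os.pipe()          # stands in for the resource tracker's pipe
    ctl_r, ctl_w = os.pipe()  # lets the parent decide when the child exits
    pid = os.fork()
    if pid == 0:
        # Child: keep the inherited copy of `w` open until released.
        os.close(r)
        os.close(ctl_w)
        os.read(ctl_r, 1)     # blocks until the parent writes a byte
        os._exit(0)
    os.close(ctl_r)
    os.close(w)               # the parent drops its own write end
    before = sees_eof(r, timeout=0.2)  # child still holds `w`: no EOF
    os.write(ctl_w, b"x")     # release the child
    os.waitpid(pid, 0)
    after = sees_eof(r, timeout=5.0)   # last write end is gone: EOF
    os.close(ctl_w)
    os.close(r)
    return before, after
```

Calling `demo_inherited_write_end()` returns `(False, True)`: no EOF while the child is alive, EOF once it exits. In the real bug, the parent's `waitpid()` on the tracker plays the role of waiting for `after`, with nothing ever releasing the child.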
(2) """hey @gpshead, I believe I ran into this at least twice now, while migrating Meta to 3.12.
Trying to create a minimal reproducer, here's what I got:
```python
import os
import sys
import time
from multiprocessing.resource_tracker import ensure_running

# Step 1: Start the resource tracker (creates the pipe with fds r, w).
ensure_running()
print("Resource tracker started.", flush=True)

# Step 2: Fork. The child inherits the write-end fd of the tracker pipe.
pid = os.fork()
if pid == 0:
    # Child: stay alive so the inherited write-end fd remains open,
    # preventing the tracker from seeing EOF.
    print(f"[child {os.getpid()}] sleeping (holds write fd open)...", flush=True)
    time.sleep(100.0)
    print(f"[child {os.getpid()}] exiting...", flush=True)
    sys.exit(0)
else:
    # Parent: exit normally. During shutdown, ResourceTracker.__del__
    # closes the write fd and calls waitpid() on the tracker process.
    # The tracker never exits because the child still has the fd open.
    print(f"[parent {os.getpid()}] exiting normally (child={pid})...", flush=True)
```

And here's what I ended up doing in our global sitecustomize.py to work around it:
https://github.com/facebook/buck2/blob/271de04a2a00041cee2e9e18d896fcd24f241598/prelude/python/tools/make_par/sitecustomize.py#L203-L246
(briefly: register an at-fork callback that runs in the child after fork and resets the resource tracker inherited from the parent, if one was started)"""
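Here is a minimal sketch of that kind of at-fork workaround, written independently rather than copied from the linked buck2 file. It assumes CPython's private module-level instance `multiprocessing.resource_tracker._resource_tracker` and its `_fd` and `_pid` attributes. These are implementation details that may change between versions. The helper name `_reset_inherited_resource_tracker` is hypothetical.

```python
import os
from multiprocessing import resource_tracker


def _reset_inherited_resource_tracker() -> None:
    """In a freshly forked child, forget the tracker inherited from the parent."""
    # ASSUMPTION: relies on CPython private internals (`_resource_tracker`,
    # `_fd`, `_pid`); guarded with getattr in case they are absent.
    tracker = getattr(resource_tracker, "_resource_tracker", None)
    fd = getattr(tracker, "_fd", None)
    if fd is None:
        return  # The parent never started a tracker; nothing was inherited.
    try:
        # Drop this process's copy of the tracker pipe's write end so the
        # tracker can see EOF once the parent closes its own copy.
        os.close(fd)
    except OSError:
        pass
    # With no fd recorded, a later ensure_running() in this child starts a
    # fresh tracker. Clearing _pid also stops this child from waiting on
    # the parent's tracker, which is not its own child process.
    tracker._fd = None
    tracker._pid = None


if hasattr(os, "register_at_fork"):  # POSIX only
    os.register_at_fork(after_in_child=_reset_inherited_resource_tracker)
```

Because the callback fires after every os.fork(), including multiprocessing's "fork" start method, this sketch gives up the tracker sharing that #5172 added for forked children. A child that later needs resource tracking starts its own tracker instead.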
My first draft of a regression test that tries to reproduce this, along with a fix, was in main...gpshead:cpython:claude/fix-resource-tracker-hang-XZw5P from January.
I'll turn something here into a real fix.
CPython versions tested on:
3.12, 3.13
Operating systems tested on:
Linux