Skip to content

Comments

src: fix deadlock in NodePlatform::DrainTasks on process shutdown#61963

Open
RajeshKumar11 wants to merge 2 commits intonodejs:mainfrom
RajeshKumar11:fix/54918-drain-tasks-deadlock
Open

src: fix deadlock in NodePlatform::DrainTasks on process shutdown#61963
RajeshKumar11 wants to merge 2 commits intonodejs:mainfrom
RajeshKumar11:fix/54918-drain-tasks-deadlock

Conversation

@RajeshKumar11
Copy link
Contributor

Summary

Fixes #54918.

NodePlatform::DrainTasks() was calling BlockingDrain() which waits
indefinitely for outstanding kUserBlocking worker tasks to complete.
This creates a deadlock when:

  1. A kUserBlocking worker task (e.g. Maglev JIT compilation) completes
    its work and posts a foreground task, then waits for that foreground task
    to run (e.g. via a std::promise<> resolve).
  2. The main thread (in DrainTasks) flushes the foreground queue — missing
    the newly-posted task because it was posted just after the flush.
  3. The main thread enters BlockingDrain()'s indefinite sleep.
  4. Both threads are now waiting on each other: a classic mutual deadlock.

Fix

Replace BlockingDrain() with TimedBlockingDrain(1ms). On timeout, flush
foreground tasks before retrying. This breaks the deadlock because the worker
task's foreground task will be flushed in the next 1ms window.

In the common (non-deadlock) case, tasks complete before the timeout and
NotifyOfOutstandingCompletion() wakes the wait immediately — no
performance regression.

New APIs added:

  • ConditionVariableBase::TimedWait() — wraps uv_cond_timedwait()
  • TaskQueue::Locked::TimedBlockingDrain() — timed version of BlockingDrain()
  • WorkerThreadsTaskRunner::TimedBlockingDrain() — delegates to TaskQueue

Test

Stress-tested locally with 50 (and 1000) iterations of
test/parallel/test-http2-large-write-multiple-requests.js — all pass:

[00:02|% 100|+  50|-   0]: Done
All tests passed.

Previously this test was a known reproducer for the deadlock on some machines.

Related

Replace the unconditional BlockingDrain() in DrainTasks() with a 1ms
timed drain loop that periodically flushes foreground tasks while
waiting for outstanding kUserBlocking worker tasks to complete.

This fixes a deadlock where a kUserBlocking worker task (e.g. Maglev
JIT compilation) posts a foreground task and waits for it. If the
foreground task is posted after the main thread flushes the foreground
queue but before BlockingDrain() enters its indefinite sleep, both
threads wait on each other — a mutual deadlock.

With the timed drain, the main thread wakes every 1ms and flushes any
foreground tasks that were posted in that window. In the common case,
tasks complete before the timeout and the wait is woken immediately by
NotifyOfOutstandingCompletion(), so there is no performance regression.

Adds ConditionVariableBase::TimedWait() backed by uv_cond_timedwait()
and TaskQueue::Locked::TimedBlockingDrain() to support this.

Fixes: nodejs#54918
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run. labels Feb 24, 2026
Fix clang-format alignment issues in node_mutex.h:
- Split cond_timedwait parameters onto separate lines
- Correct continuation alignment in TimedWait implementation
@codecov
Copy link

codecov bot commented Feb 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.77%. Comparing base (ad945c5) to head (01323a3).
⚠️ Report is 47 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #61963      +/-   ##
==========================================
+ Coverage   89.75%   89.77%   +0.02%     
==========================================
  Files         675      674       -1     
  Lines      204721   205619     +898     
  Branches    39344    39422      +78     
==========================================
+ Hits       183751   184600     +849     
- Misses      13230    13275      +45     
- Partials     7740     7744       +4     
Files with missing lines Coverage Δ
src/node_mutex.h 97.91% <100.00%> (+0.94%) ⬆️
src/node_platform.cc 75.32% <100.00%> (-0.92%) ⬇️
src/node_platform.h 91.66% <ø> (ø)

... and 126 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ronag ronag requested a review from joyeecheung February 24, 2026 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

c++ Issues and PRs that require attention from people who are familiar with C++. needs-ci PRs that need a full CI run.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Deadlock at process shutdown

2 participants