Skip to content

Conversation

@youknowone
Copy link
Member

@youknowone youknowone commented Dec 11, 2025

Summary by CodeRabbit

  • New Features

    • Enhanced cross-platform fault handler with richer crash diagnostics and user-facing controls (enable/disable, query status).
    • Stack-trace dumping improvements, including scheduled (timeout-based) dumps, cancellation, and per-signal configurable behavior.
  • Behavior

    • Error/report labeling changed from "" to "" for command-run outputs.
  • Chores

    • Added Windows diagnostic dependency to support improved Windows crash handling.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 11, 2025

Walkthrough

Replaces the minimal Rust faulthandler with a full multi-threaded implementation: global enable/FD state, watchdog for delayed tracebacks, cross-platform fatal-signal/exception handlers (Unix and Windows), per-signal registration, signal-safe I/O/formatting utilities, many new public APIs and diagnostic entry points; also renames RunMode::Command source label to "".

Changes

Cohort / File(s) Summary
Faulthandler Core Rewrite
crates/stdlib/src/faulthandler.rs
Full reimplementation: added global state (ENABLED, FATAL_ERROR_FD), watchdog primitives (WatchdogState, WatchdogHandle, WATCHDOG) with scheduling (dump_traceback_later, cancel_dump_traceback_later), cross-platform fatal handlers (Unix signals and Windows exceptions), per-signal user registration/unregistration and chaining, signal-safe I/O/formatting helpers (write_str_noraise, truncate_name, collect_frame_info, format_timeout), revamped public API (dump_traceback with DumpTracebackArgs, enable with EnableArgs, disable, is_enabled, register, unregister), numerous test/diagnostic entry points (_sigsegv, _sigabrt, _sigfpe, _fatal_error_c_thread, _read_null, Windows _raise_exception), and platform-specific previous-handler plumbing (PREVIOUS_HANDLERS, user_signals).
Run Mode Source Label
src/lib.rs
Changed the source identifier for RunMode::Command from "<stdin>" to "<string>".
Windows Cargo Feature
crates/stdlib/Cargo.toml
Added Win32_System_Diagnostics_Debug to Windows-specific windows-sys feature list.

Sequence Diagram

sequenceDiagram
    participant OS as Operating System
    participant Handler as Faulthandler (signal/exception)
    participant VM as Virtual Machine
    participant Frame as Frame Inspector
    participant Output as Signal-safe Writer
    participant Prev as Previous Handler

    OS->>Handler: Deliver fatal signal/exception
    activate Handler
    Handler->>VM: Snapshot execution state
    activate VM
    VM->>Frame: Collect frames & locals
    activate Frame
    Frame-->>VM: Frame data
    deactivate Frame
    VM-->>Handler: Traceback data
    deactivate VM
    Handler->>Output: Write header and frames (async-signal-safe)
    Output-->>Handler: Write complete
    alt Chain to previous handler
        Handler->>Prev: Invoke previous handler
        Prev-->>Handler: Return
    end
    deactivate Handler
Loading
sequenceDiagram
    participant API as dump_traceback_later()
    participant Watchdog as Watchdog Thread
    participant Cond as Condvar/Mutex
    participant Timer as Timeout
    participant Handler as Faulthandler Trigger
    participant Output as Writer

    API->>Watchdog: Start/notify with timeout
    Watchdog->>Cond: Wait (with timeout)
    activate Watchdog
    Timer->>Watchdog: Timeout elapses
    Watchdog->>Handler: Request delayed traceback dump
    Handler->>Output: Write delayed traceback
    deactivate Watchdog
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

  • Review signal-safety and async-signal-safe calls in handler paths (faulthandler_fatal_error, faulthandler_user_signal).
  • Unix vs Windows handler registration/unregistration and PREVIOUS_HANDLERS correctness.
  • Watchdog concurrency: WatchdogState, WATCHDOG Mutex/Condvar interactions, cancellation semantics.
  • Global mutable/atomic state correctness (ENABLED, FATAL_ERROR_FD) and FD handling.
  • Public API argument parsing/types (DumpTracebackArgs, EnableArgs, RegisterArgs) and VM interactions.
  • Diagnostic/test entry points that intentionally trigger crashes require careful review.

Poem

🐰 I nibbled logs and chased the thread,
Signals calm where chaos spread,
Watchdogs hum and trace the night,
Frames unfold in careful light,
"" now sings — hop, debug, delight 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 78.38% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title 'Faulthandler' is generic and does not clearly summarize the main changes. While it relates to the subject matter, it lacks specificity about what was implemented or changed. Consider using a more descriptive title that highlights the key change, such as 'Implement comprehensive multi-threaded faulthandler with cross-platform signal handling' or 'Add faulthandler module with watchdog-based traceback scheduling'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@youknowone youknowone marked this pull request as ready for review December 11, 2025 13:25
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🧹 Nitpick comments (2)
crates/stdlib/src/faulthandler.rs (2)

130-146: Implementation correctly dumps traceback in reverse order.

The single-thread implementation is correct. The all_threads parameter is marked TODO—consider tracking this as a follow-up issue if multi-thread support is needed.

Would you like me to open an issue to track the all_threads implementation?


534-540: Hardcoded NSIG = 64 may not be portable.

NSIG varies across Unix platforms (32 on some older systems, 128+ on Linux). Consider using libc::NSIG if available, or document the assumption.

-        const NSIG: usize = 64;
+        // Use libc::NSIG where available, fallback to 64
+        #[cfg(target_os = "linux")]
+        const NSIG: usize = libc::NSIG as usize;
+        #[cfg(not(target_os = "linux"))]
+        const NSIG: usize = 64;
📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0054394 and cb3d676.

⛔ Files ignored due to path filters (3)
  • Lib/test/test_dtrace.py is excluded by !Lib/**
  • Lib/test/test_faulthandler.py is excluded by !Lib/**
  • Lib/test/test_subprocess.py is excluded by !Lib/**
📒 Files selected for processing (2)
  • crates/stdlib/src/faulthandler.rs (1 hunks)
  • src/lib.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • src/lib.rs
  • crates/stdlib/src/faulthandler.rs
🧠 Learnings (3)
📚 Learning: 2025-11-29T12:17:28.606Z
Learnt from: CR
Repo: RustPython/RustPython PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-29T12:17:28.606Z
Learning: Use `cargo run -- script.py` instead of the standard `python` command to run and test Python code with RustPython

Applied to files:

  • src/lib.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/stdlib/src/faulthandler.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/stdlib/src/faulthandler.rs
🧬 Code graph analysis (1)
crates/stdlib/src/faulthandler.rs (1)
crates/vm/src/frame.rs (1)
  • current_location (173-175)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
🔇 Additional comments (6)
src/lib.rs (1)

208-212: LGTM! Correct alignment with CPython behavior.

CPython uses "<string>" as the source identifier for code executed via -c command, while "<stdin>" is reserved for interactive input. This change correctly reflects that distinction.

crates/stdlib/src/faulthandler.rs (5)

58-79: Signal-safe write implementation looks correct.

The implementation correctly uses raw libc::write for async-signal-safe output. Ignoring the return value is appropriate in signal context. Minor nit: the fd as libc::c_int cast is redundant since fd is already i32.


458-515: Watchdog setup looks reasonable.

The timeout conversion and validation are appropriate. The detached thread pattern is acceptable for a watchdog. Good practice calling cancel_dump_traceback_later() before starting a new watchdog to avoid orphaned threads.


625-635: Signal chaining is non-atomic but acceptable.

The signal→raise→re-register sequence at lines 628-632 has a race window where another signal could interrupt. This is a known limitation of POSIX signal chaining and is acceptable given the constraints. Consider adding a comment documenting this behavior.


846-859: Test utility _fatal_error_c_thread has fixed sleep.

The 1-second sleep (line 857) may be insufficient on slow systems. Consider using thread join with timeout or accept that this is best-effort for testing purposes.


53-56: The static mut PREVIOUS_HANDLERS pattern is correct for signal handler safety.

Zero-initializing libc::sigaction is portable and safe across all POSIX systems—all fields have safe zero values (sa_handler=SIG_DFL, sa_mask as empty set, sa_flags=0). More importantly, the current pattern correctly satisfies signal handler constraints: the array is initialized once before signal registration, and signal_handler only performs read-only access. This is the idiomatic approach for signal-safe Rust code.

The suggested Mutex refactor would break the code—mutexes cannot be acquired inside signal handlers and would cause deadlocks, violating async-signal-safe requirements.

Likely an incorrect or invalid review comment.

Comment on lines 210 to 222
#[cfg(unix)]
fn install_fatal_handlers(_vm: &VirtualMachine) {
for (idx, (signum, _)) in FATAL_SIGNALS.iter().enumerate() {
let mut action: libc::sigaction = unsafe { std::mem::zeroed() };
action.sa_sigaction = faulthandler_fatal_error as usize;
action.sa_flags = libc::SA_NODEFER;

unsafe {
libc::sigaction(*signum, &action, &mut PREVIOUS_HANDLERS[idx]);
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Incorrect signal handler setup: mixing sa_sigaction with simple handler.

Line 214 assigns to sa_sigaction but the handler signature is extern "C" fn(c_int), which is the sa_handler signature. Without SA_SIGINFO flag, you should use sa_handler instead. This is undefined behavior.

     #[cfg(unix)]
     fn install_fatal_handlers(_vm: &VirtualMachine) {
         for (idx, (signum, _)) in FATAL_SIGNALS.iter().enumerate() {
             let mut action: libc::sigaction = unsafe { std::mem::zeroed() };
-            action.sa_sigaction = faulthandler_fatal_error as usize;
+            action.sa_sigaction = faulthandler_fatal_error as libc::sighandler_t;
             action.sa_flags = libc::SA_NODEFER;

             unsafe {
                 libc::sigaction(*signum, &action, &mut PREVIOUS_HANDLERS[idx]);
             }
         }
     }

Note: On some platforms, sa_sigaction and sa_handler are a union. Ensure the cast is correct for your target platforms. The handler function signature extern "C" fn(c_int) is correct for a simple handler.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In crates/stdlib/src/faulthandler.rs around lines 210 to 221, the signal handler
is being assigned to sa_sigaction while the function has the simple handler
signature (extern "C" fn(c_int)); change the assignment to use sa_handler
(casting the function pointer to the correct type) and keep sa_flags as-is, or
if you truly need sa_sigaction then change the handler signature to the
SA_SIGINFO form and set SA_SIGINFO in sa_flags; ensure the cast respects
platforms where sa_handler/sa_sigaction are a union so the pointer type is
correct for the target.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
crates/stdlib/src/faulthandler.rs (2)

334-370: get_fd_from_file_opt is robust, but consider sharing it with enable to avoid duplicated file resolution paths.

The helper correctly handles:

  • integer file descriptors (with negative checks),
  • file objects via fileno() and flush(), and
  • falling back to sys.stderr.

Since you already have this logic, it’s a natural fit for enable(file=...) as well, instead of resolving the file in two different ways. Reusing this helper will keep the behavior of fatal-signal output consistent with watchdog/user-signal output and reduce drift if the resolution logic is tweaked later.


725-771: Infinite loop raising SIGSEGV in _sigsegv can hang if the process survives the signal.

On Windows, _sigsegv resets SIGSEGV to default and then does:

#[cfg(windows)]
{
    // On Windows, we need to raise SIGSEGV multiple times
    loop {
        unsafe {
            libc::raise(libc::SIGSEGV);
        }
    }
}

If, for any reason (environment, test harness, or future handler changes), the process doesn’t terminate from SIGSEGV, this becomes a tight infinite loop that pegs a core and hangs tests.

You already bounded the loop and added a fallback exit in the Windows fatal handler; consider mirroring that pattern here, e.g., a fixed number of raises followed by a hard exit/abort.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb3d676 and 4df4555.

⛔ Files ignored due to path filters (2)
  • Lib/test/test_faulthandler.py is excluded by !Lib/**
  • Lib/test/test_subprocess.py is excluded by !Lib/**
📒 Files selected for processing (3)
  • crates/stdlib/Cargo.toml (1 hunks)
  • crates/stdlib/src/faulthandler.rs (1 hunks)
  • src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib.rs
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/faulthandler.rs
🧠 Learnings (2)
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/stdlib/src/faulthandler.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/stdlib/src/faulthandler.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Check the WASM package and demo
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run snippets and cpython tests (macos-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Run tests under miri
🔇 Additional comments (6)
crates/stdlib/Cargo.toml (1)

144-157: windows-sys feature set now correctly covers Diagnostics::Debug APIs.

Adding "Win32_System_Diagnostics_Debug" here matches the windows_sys::Win32::System::Diagnostics::Debug usages in faulthandler.rs and should fix the previous Windows build break from missing features. Please re-run a Windows-target build (cargo check --target x86_64-pc-windows-msvc in crates/stdlib) to confirm.

crates/stdlib/src/faulthandler.rs (5)

112-124: collect_frame_info and dump_traceback look correct and avoid previous lasti == 0 underflow.

The explicit lasti() == 0 branch in collect_frame_info nicely sidesteps the earlier risk of indexing with lasti - 1 on a fresh frame, and pre-collecting frame_lines before constructing PyWriter avoids borrowing conflicts on vm.frames. This looks solid and matches the expected traceback formatting.

Also applies to: 140-148


177-225: Unix fatal handler setup appears consistent, but please double-check sigaction handler type/flags across supported platforms.

Using a plain extern "C" fn(c_int) handler, assigning it to sa_sigaction as libc::sighandler_t, and omitting SA_SIGINFO should result in a simple 1-arg handler on typical Unix platforms (where sa_handler/sa_sigaction share storage). However, libc::sigaction’s layout and sighandler_t alias vary by platform/ABI.

Please verify on all supported Unix targets that:

  • the libc::sigaction field you assign is the correct one for a 1-arg handler,
  • the cast to libc::sighandler_t is type-correct (no UB from mismatched function pointer types), and
  • SA_NODEFER alone matches the intended behavior (you’re manually re-installing the previous handler before re-raising).

Also applies to: 296-303


372-437: Watchdog thread locking pattern now avoids the earlier race; looks good.

Holding the mutex across wait_timeout, re-checking cancel after the wait, and only then extracting (repeat, exit, fd, header) before dropping the lock is the right pattern here. It removes the earlier race window and avoids double-checking a cached cancelled flag.


817-845: Windows crash-report suppression and _raise_exception wiring look consistent with Cargo features.

suppress_crash_report() and _raise_exception now use:

  • windows_sys::Win32::System::Diagnostics::Debug::{SEM_NOGPFAULTERRORBOX, SetErrorMode, RaiseException}, and
  • The Win32_System_Diagnostics_Debug feature is enabled for windows-sys in crates/stdlib/Cargo.toml.

This is a coherent setup for controlling crash dialogs and raising structured exceptions from tests. Once you’ve run cargo clippy and Windows CI, this part should be in good shape.

Also applies to: 847-886


319-332: The .transpose() usage is correct and matches the official pattern in RustPython.

The actual return type of try_index_opt is Option<PyResult<PyIntRef>>, not PyResult<Option<PyIntRef>>. The .transpose()? pattern correctly converts Option<PyResult<T>> to PyResult<Option<T>>, allowing the ? operator to work at the PyResult level. This is the same pattern used in the official try_index() method in crates/vm/src/protocol/number.rs and is consistently applied in other stdlib modules like select.rs. The code is correct as-is and requires no changes.

Comment on lines +14 to +29
static ENABLED: AtomicBool = AtomicBool::new(false);
static FATAL_ERROR_FD: AtomicI32 = AtomicI32::new(2); // stderr by default

// Watchdog thread state for dump_traceback_later
struct WatchdogState {
cancel: bool,
fd: i32,
timeout_us: u64,
repeat: bool,
exit: bool,
header: String,
}

type WatchdogHandle = Arc<(Mutex<WatchdogState>, Condvar)>;
static WATCHDOG: Mutex<Option<WatchdogHandle>> = Mutex::new(None);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

wc -l crates/stdlib/src/faulthandler.rs

Repository: RustPython/RustPython

Length of output: 102


🏁 Script executed:

cat -n crates/stdlib/src/faulthandler.rs | sed -n '1,50p'

Repository: RustPython/RustPython

Length of output: 2055


🏁 Script executed:

cat -n crates/stdlib/src/faulthandler.rs | sed -n '160,180p'

Repository: RustPython/RustPython

Length of output: 814


🏁 Script executed:

cat -n crates/stdlib/src/faulthandler.rs | sed -n '280,300p'

Repository: RustPython/RustPython

Length of output: 738


🏁 Script executed:

cat -n crates/stdlib/src/faulthandler.rs | sed -n '100,160p'

Repository: RustPython/RustPython

Length of output: 2528


🏁 Script executed:

grep -n "FATAL_ERROR_FD" crates/stdlib/src/faulthandler.rs

Repository: RustPython/RustPython

Length of output: 268


🏁 Script executed:

cat -n crates/stdlib/src/faulthandler.rs | sed -n '179,250p'

Repository: RustPython/RustPython

Length of output: 2971


🏁 Script executed:

rg "fn get_file_for_output" crates/stdlib/src/faulthandler.rs -A 15

Repository: RustPython/RustPython

Length of output: 731


🏁 Script executed:

rg "fn get_fd_from_file" crates/stdlib/src/faulthandler.rs

Repository: RustPython/RustPython

Length of output: 164


🏁 Script executed:

rg "PyObject_AsFileDescriptor\|get_file_descriptor\|fileno" crates/stdlib/src/faulthandler.rs

Repository: RustPython/RustPython

Length of output: 47


🏁 Script executed:

cat -n crates/stdlib/src/faulthandler.rs | sed -n '90,115p'

Repository: RustPython/RustPython

Length of output: 1239


🏁 Script executed:

rg "fn get_fd_from_file_opt" crates/stdlib/src/faulthandler.rs -A 20

Repository: RustPython/RustPython

Length of output: 1026


enable(file=...) does not affect FATAL_ERROR_FD; the file argument is effectively ignored for fatal signals.

FATAL_ERROR_FD is initialized to 2 (stderr) and is read by both the Unix and Windows fatal signal handlers, but it is never updated when enable() is called. The function only validates the file argument via get_file_for_output() and then discards it, meaning fatal-signal messages always write to stderr regardless of the file parameter.

To mirror CPython's faulthandler.enable(file=...) behavior, resolve the file argument to a file descriptor and store it in FATAL_ERROR_FD:

 fn enable(args: EnableArgs, vm: &VirtualMachine) -> PyResult<()> {
-    // Check that file is valid (if provided) or sys.stderr is not None
-    let _file = get_file_for_output(args.file, vm)?;
-
-    ENABLED.store(true, Ordering::Relaxed);
+    // Resolve and validate output; also update fatal-error fd.
+    let fd = get_fd_from_file_opt(args.file, vm)?;
+    FATAL_ERROR_FD.store(fd, Ordering::Relaxed);
+    ENABLED.store(true, Ordering::Relaxed);

Consider also resetting FATAL_ERROR_FD back to 2 in disable().

Committable suggestion skipped: line range outside the PR's diff.

Copy link
Collaborator

@ShaharNaveh ShaharNaveh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! IIRC this is a blocker from updating libregrtest

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
crates/stdlib/src/faulthandler.rs (2)

162-175: enable(file=...) does not update FATAL_ERROR_FD; the file argument is effectively ignored for fatal signals.

The file argument is validated via get_file_for_output() but the resulting file descriptor is discarded. FATAL_ERROR_FD remains at its default value of 2 (stderr), so fatal signal messages always go to stderr regardless of the file parameter. To match CPython's behavior:

 #[pyfunction]
 fn enable(args: EnableArgs, vm: &VirtualMachine) -> PyResult<()> {
-    // Check that file is valid (if provided) or sys.stderr is not None
-    let _file = get_file_for_output(args.file, vm)?;
-
-    ENABLED.store(true, Ordering::Relaxed);
+    // Resolve file to fd and update fatal error output target
+    let fd = get_fd_from_file_opt(args.file, vm)?;
+    FATAL_ERROR_FD.store(fd, Ordering::Relaxed);
+    ENABLED.store(true, Ordering::Relaxed);

Also consider resetting FATAL_ERROR_FD back to 2 in disable().


526-593: Critical async-signal-safety violation: Mutex used in signal handler path.

faulthandler_user_signal (line 596) calls user_signals::get_user_signal which acquires USER_SIGNALS via Mutex::lock().unwrap(). This violates POSIX async-signal-safety requirements and creates a deadlock risk:

  1. register() or unregister() acquires the lock
  2. A signal arrives while the lock is held
  3. The handler tries to acquire the same non-reentrant Mutex → deadlock

Replace the Mutex-backed storage with lock-free alternatives. For example, use a static array of atomic fields that the signal handler can read safely:

use std::sync::atomic::{AtomicBool, AtomicI32, AtomicUsize, Ordering};

struct AtomicUserSignal {
    enabled: AtomicBool,
    fd: AtomicI32,
    chain: AtomicBool,
    previous: AtomicUsize, // libc::sighandler_t as usize
}

static USER_SIGNALS: [AtomicUserSignal; NSIG] = /* init with atomics */;

The signal handler can then perform atomic loads without any locking.

🧹 Nitpick comments (3)
crates/stdlib/src/faulthandler.rs (3)

754-767: Minor concern: test function has infinite loop, but acceptable given SIG_DFL reset.

The _sigsegv test function resets SIGSEGV to SIG_DFL before the loop, so the first raise(SIGSEGV) should terminate the process. The loop is a Windows workaround. Consider adding a brief comment explaining why the loop is needed on Windows.

             #[cfg(windows)]
             {
-                // On Windows, we need to raise SIGSEGV multiple times
+                // On Windows, the signal may need multiple raises before the process terminates.
+                // With SIG_DFL restored, one of these raises will terminate the process.
                 loop {
                     unsafe {
                         libc::raise(libc::SIGSEGV);
                     }
                 }
             }

470-475: Consider overflow handling for very large timeout values.

The conversion (timeout * 1_000_000.0) as u64 could overflow for extremely large timeout values (though unlikely in practice). Consider using a saturating conversion or explicit check.

         // Convert timeout to microseconds
-        let timeout_us = (timeout * 1_000_000.0) as u64;
+        let timeout_us = (timeout * 1_000_000.0).min(u64::MAX as f64) as u64;
         if timeout_us == 0 {
             return Err(vm.new_value_error("timeout must be greater than 0".to_owned()));
         }

629-645: Consider using platform-specific signal limits instead of hardcoded range.

The valid signal range is hardcoded as 1..64, but NSIG varies by platform (32 on macOS, 64+ on Linux). While the codebase defines its own signal::NSIG constant (currently 64), relying on a fixed value across all platforms is not portable. Per POSIX guidance, use libc::NSIG to determine the correct limit at compile time, or ensure the constant is platform-aware using #[cfg] attributes.

Additionally, the user_signals module also defines const NSIG: usize = 64; locally, creating duplication. Both should reference a single, platform-aware source.

📜 Review details

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4df4555 and 96f44ac.

📒 Files selected for processing (2)
  • crates/stdlib/Cargo.toml (1 hunks)
  • crates/stdlib/src/faulthandler.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.rs: Follow the default rustfmt code style by running cargo fmt to format Rust code
Always run clippy to lint Rust code (cargo clippy) before completing tasks and fix any warnings or lints introduced by changes
Follow Rust best practices for error handling and memory management
Use the macro system (pyclass, pymodule, pyfunction, etc.) when implementing Python functionality in Rust

Files:

  • crates/stdlib/src/faulthandler.rs
🧠 Learnings (2)
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration using #ifdef checks rather than providing fallback values for other platforms.

Applied to files:

  • crates/stdlib/src/faulthandler.rs
📚 Learning: 2025-06-27T14:47:28.810Z
Learnt from: moreal
Repo: RustPython/RustPython PR: 5847
File: vm/src/stdlib/stat.rs:547-567
Timestamp: 2025-06-27T14:47:28.810Z
Learning: In RustPython's stat module implementation, platform-specific constants like SF_SUPPORTED and SF_SYNTHETIC should be conditionally declared only for the platforms where they're available (e.g., macOS), following CPython's approach of optional declaration rather than providing fallback values for other platforms.

Applied to files:

  • crates/stdlib/src/faulthandler.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Run rust tests (windows-latest)
  • GitHub Check: Run tests under miri
  • GitHub Check: Ensure compilation on various targets
  • GitHub Check: Check Rust code with clippy
  • GitHub Check: Run snippets and cpython tests (windows-latest)
  • GitHub Check: Run snippets and cpython tests (ubuntu-latest)
  • GitHub Check: Run rust tests (ubuntu-latest)
  • GitHub Check: Run rust tests (macos-latest)
  • GitHub Check: Run snippets and cpython tests on wasm-wasi
🔇 Additional comments (8)
crates/stdlib/Cargo.toml (1)

144-157: LGTM!

The addition of Win32_System_Diagnostics_Debug correctly enables the Windows APIs (SetErrorMode, SEM_NOGPFAULTERRORBOX, RaiseException) used by the new faulthandler implementation.

crates/stdlib/src/faulthandler.rs (7)

58-79: LGTM!

Signal-safe write implementation correctly uses raw libc::write and handles platform differences. Ignoring the return value is appropriate for best-effort output in signal context.


112-124: LGTM!

The lasti == 0 edge case is now properly handled by checking before accessing current_location() and using first_line_number with a fallback to 1.


211-222: LGTM!

The signal handler installation correctly casts to libc::sighandler_t and uses SA_NODEFER to allow immediate re-delivery after re-raising.


224-278: LGTM!

The Windows signal handler correctly handles the SIGSEGV case by returning (letting the faulting instruction re-execute with the restored handler) and re-raising for other signals.


369-435: LGTM!

The watchdog thread now correctly holds the mutex across wait_timeout to prevent race conditions, checks for cancellation after waking, and releases the lock before performing I/O operations.


814-842: LGTM!

The suppress_crash_report implementation correctly handles platform differences: Windows uses SetErrorMode with SEM_NOGPFAULTERRORBOX, while Unix disables core dumps via setrlimit(RLIMIT_CORE, ...).


874-883: LGTM!

The Windows _raise_exception test helper correctly uses RaiseException from the Windows API, consistent with CPython's implementation.

@youknowone youknowone merged commit 04d55fa into RustPython:main Dec 11, 2025
11 of 13 checks passed
@youknowone youknowone deleted the faulthandler branch December 11, 2025 15:08
@youknowone
Copy link
Member Author

Sounds great, could you give a look to libregrtest?

@fanninpm
Copy link
Contributor

@youknowone @ShaharNaveh I can take a look this evening.

@fanninpm fanninpm mentioned this pull request Dec 11, 2025
@fanninpm
Copy link
Contributor

@youknowone @ShaharNaveh I can take a look this evening.

Good news: Libregrtest from CPython 3.13.10 is no longer crashing on faulthandler. Bad (but not that bad) news: it's now gated on inspect._shadowed_dict_from_weakref_mro_tuple. Also, we should investigate how to implement _struct._clearcache().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants