-
Notifications
You must be signed in to change notification settings - Fork 156
Richer errors for HyperlightVm #1162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
| use crate::hypervisor::virtual_machine::RegisterError; | ||
|
|
||
| /// Errors that can occur when determining the vCPU stop reason | ||
| #[derive(Debug, thiserror::Error, displaydoc::Display)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use displaydoc and not display from thiserror? At first glance this looks redundant as one could just write:
pub enum VcpuStopReasonError {
#[error(Failed to get registers: {0})]
GetRegs(#[from] RegisterError),
...
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, however each variant also requires a rustdoc comment, otherwise we get error: missing documentation for a variant. Because the comment was generally the same as what's passed to #[error()], I thought that this makes it a bit cleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a comprehensive richer error type system for HyperlightVm operations, enabling more accurate determination of when a sandbox should be poisoned. The changes maintain backwards compatibility by converting internal errors to existing error types where appropriate, while providing much more detailed error information through a tree-like error structure.
Changes:
- Introduces new error enums with displaydoc for all HyperlightVm operations (dispatch, initialize, create, map/unmap regions, run, I/O handling, debug operations)
- Updates all virtual machine implementations (KVM, MSHV, WHP) to use new error types
- Adds poison detection logic based on error types to properly track sandbox state
- Removes old hypervisor-specific error variants from HyperlightError in favor of the new HyperlightVmError hierarchy
- Adds displaydoc dependency for better error message generation
Reviewed changes
Copilot reviewed 14 out of 17 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/hyperlight_host/src/hypervisor/hyperlight_vm.rs | Defines new error types (DispatchGuestCallError, InitializeError, RunVmError, HandleIoError, MapRegionError, UnmapRegionError, CreateHyperlightVmError, HandleDebugError, SendDbgMsgError, RecvDbgMsgError, HyperlightVmError) and updates function signatures |
| src/hyperlight_host/src/hypervisor/virtual_machine/mod.rs | Adds VmError, CreateVmError, RunVcpuError, RegisterError, MapMemoryError, UnmapMemoryError, HypervisorImplError enums |
| src/hyperlight_host/src/hypervisor/virtual_machine/whp.rs | Updates Windows Hypervisor Platform implementation to use new error types |
| src/hyperlight_host/src/hypervisor/virtual_machine/mshv.rs | Updates MSHV implementation to use new error types |
| src/hyperlight_host/src/hypervisor/virtual_machine/kvm.rs | Updates KVM implementation to use new error types |
| src/hyperlight_host/src/hypervisor/gdb/mod.rs | Adds DebugError, DebugMemoryAccessError and updates debug trait methods |
| src/hyperlight_host/src/hypervisor/gdb/arch.rs | Adds VcpuStopReasonError enum |
| src/hyperlight_host/src/sandbox/initialized_multi_use.rs | Updates dispatch call handling with error promotion and poison detection |
| src/hyperlight_host/src/sandbox/uninitialized_evolve.rs | Updates to use new HyperlightVmError wrapping |
| src/hyperlight_host/src/sandbox/outb.rs | Introduces HandleOutbError enum for outb operation errors |
| src/hyperlight_host/src/sandbox/trace/mem_profile.rs | Updates to return HandleOutbError instead of Result |
| src/hyperlight_host/src/error.rs | Adds HyperlightVmError variant, removes old hypervisor-specific errors, updates is_poison_error logic |
| src/hyperlight_host/Cargo.toml | Adds displaydoc 0.2.5 dependency |
| Cargo.lock | Updates dependencies (displaydoc, flatbuffers, proc-macro2, quote, syn) |
| src/tests/rust_guests/witguest/Cargo.lock | Updates test guest dependencies |
| src/tests/rust_guests/simpleguest/Cargo.lock | Updates test guest dependencies |
| src/hyperlight_host/src/hypervisor/mod.rs | Updates test to use unwrap instead of ? operator |
Signed-off-by: Ludvig Liljenberg <4257730+ludfjig@users.noreply.github.com>
This pr adds a new types of Errors for all possible operations possible on struct
HyperlightVm. The main purpose for doing so if to more accurately being able to determine whether a sandbox should be poisoned, since any error that doesn't let the guest properly unwind should poison a sandbox. However, another benefit with this PR is that errors surfaced to the user will be much richer in information about what went wrong. Due to these errors having tree-like structure, they almost serve as a kind of backtrace. To accomplish this, new richer errors types have been added to all possible code paths taken by allHyperlightVm, and existing HyperlightErrors are entirely avoided in these codepaths (with a couple of small exceptions).To remain backwards compatible (with tests and not to break users relying on certain existing errors), some of these internal errors are surfaced as existing errors. For example, stack overflow errors will be turned into the existing top-level error
HyperlightError::StackOverflow, and when guest execution is canceleld by host, the old error will be returned similarly.This PR does not change any logic at all, it only modifies returned error types.
My eventual (far away) goal is to add richer errors to all hyperlight modules, and very carefully distinguish between internal errors (that are not supposed to happen) and other errors that indeed are supposed to be able to happen. All new errors in this PR are "internal errors" in that they should not be relied upon to adhere to semver, etc, and may change heavily between versions. That said, they are still
pubsince they are returned to users. The errors in this PR can be visualized like this:Spoiler warning
HyperlightVmError ├── DispatchGuestCall(DispatchGuestCallError) │ ├── ConvertRspPointer(String) │ ├── SetupRegs(RegisterError) ──► [A] │ └── Run(RunVmError) ──► [B] │ ├── Initialize(InitializeError) │ ├── ConvertPointer(String) │ ├── SetupRegs(RegisterError) ──► [A] │ └── Run(RunVmError) ──► [B] │ ├── Create(CreateHyperlightVmError) │ ├── NoHypervisorFound │ ├── Vm(VmError) ──► [C] │ ├── ConvertRspPointer(Box) │ ├── SendDbgMsg(SendDbgMsgError) ──► [D] [gdb] │ └── AddHwBreakpoint(DebugError) ──► [G] [gdb] │ ├── MapRegion(MapRegionError) │ ├── NotPageAligned { page_size, region } │ └── MapMemory(MapMemoryError) │ └── UnmapRegion(UnmapRegionError) ├── RegionNotFound(MemoryRegion) └── UnmapMemory(UnmapMemoryError)[A] RegisterError
├── GetRegs(HypervisorImplError)
├── SetRegs(HypervisorImplError)
├── GetFpu(HypervisorImplError)
├── SetFpu(HypervisorImplError)
├── GetSregs(HypervisorImplError)
├── SetSregs(HypervisorImplError)
├── GetDebugRegs(HypervisorImplError)
├── SetDebugRegs(HypervisorImplError)
├── GetXsave(HypervisorImplError)
└── XsaveSizeMismatch { expected, actual }
[B] RunVmError
├── StackOverflow
├── MemoryAccessViolation { addr, access_type, region_flags }
├── MmioReadUnmapped(u64)
├── MmioWriteUnmapped(u64)
├── ExecutionCancelledByHost
├── UnexpectedVmExit(String)
├── RunVcpu(RunVcpuError)
├── HandleIo(HandleIoError) ──► [E]
├── GetRegs(RegisterError) ──► [A] [trace_guest]
├── CheckStackGuard(Box)
├── DebugHandler(HandleDebugError) ──► [F] [gdb]
├── VcpuStopReason(VcpuStopReasonError) [gdb]
└── CrashdumpGeneration(Box) [crashdump]
[C] VmError
├── CreateVm(CreateVmError)
├── RunVcpu(RunVcpuError)
├── Register(RegisterError) ──► [A]
├── MapMemory(MapMemoryError)
├── UnmapMemory(UnmapMemoryError)
└── Debug(DebugError) ──► [G] [gdb]
[D] SendDbgMsgError [gdb]
├── DebugNotEnabled
└── SendFailed(GdbTargetError)
[E] HandleIoError
├── NoData
├── GetRegs(RegisterError) ──► [A] [mem_profile]
└── Outb(HandleOutbError)
├── StackOverflow
├── GuestAborted { code, message }
├── InvalidPort(String)
├── ReadLogData(String)
├── TraceFormat(String)
├── ReadHostFunctionCall(String)
├── LockFailed(&str, u32, String)
├── WriteHostFunctionResponse(String)
├── InvalidDebugPrintChar(u32)
└── MemProfile(String) [mem_profile]
[F] HandleDebugError [gdb]
├── DebugNotEnabled
├── SendMessage(SendDbgMsgError) ──► [D]
├── ReceiveMessage(RecvDbgMsgError)
│ ├── DebugNotEnabled
│ └── RecvFailed(GdbTargetError)
└── ProcessRequest(ProcessDebugRequestError)
├── DebugNotEnabled
├── TryLockError(&str, u32)
├── Vm(VmError) ──► [C]
├── Debug(DebugError) ──► [G]
├── SwBreakpointNotFound(u64)
├── ReadMemory(DebugMemoryAccessError)
└── WriteMemory(DebugMemoryAccessError)
[G] DebugError [gdb]
├── HwBreakpointNotFound(u64)
├── TooManyHwBreakpoints(usize)
├── Register(RegisterError) ──► [A]
├── TranslateGva(u64)
└── Intercept { enable, inner: HypervisorImplError }
Partially addresses #998 (fixes the bug, but doesn't update the errors everywhere)