/* $Id: Docs-RawMode.cpp 98103 2023-01-17 14:15:46Z vboxsync $ */
/** @file
 * This file contains the documentation of the raw-mode execution.
 */

/*
 * Copyright (C) 2006-2023 Oracle and/or its affiliates.
 *
 * This file is part of VirtualBox base platform packages, as
 * available from https://www.virtualbox.org.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation, in version 3 of the
 * License.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, see <https://www.gnu.org/licenses>.
 *
 * SPDX-License-Identifier: GPL-3.0-only
 */



/** @page pg_raw Raw-mode Code Execution
 *
 * VirtualBox 0.0 thru 6.0 implemented a mode of guest code execution that
 * allowed executing mostly raw guest code directly on the host CPU but without
 * any support from VT-x or AMD-V. It was implemented before AMD64, AMD-V and
 * VT-x were available (former) or even specified (latter two). This mode was
 * removed in 6.1 (code ripped out) as it was mostly unused by that point and
 * not worth the effort of maintaining.
 *
 * A future VirtualBox version may reintroduce a new kind of raw-mode for
 * emulating non-x86 architectures, making use of the host MMU to efficiently
 * emulate the target MMU. This is just a wild idea at this point.
 *
 *
 * @section sec_old_rawmode Old Raw-mode
 *
 * Running guest code unmodified on the host CPU is reasonably unproblematic for
 * ring-3 code when it runs without IOPL=3. There will be some information
 * leaks thru CPUID, a bunch of 286-era unprivileged instructions revealing
 * privileged information (like SGDT, SIDT, SLDT, STR, SMSW), and hypervisor
 * selectors can probably be identified using VERR, VERW and such instructions.
 * However, it generally works fine for half-friendly software when the CPUID
 * difference between the target and host isn't too big.
 *
 * Kernel code can be executed on the host CPU too, however it needs to be
 * pushed up a ring (guest ring-0 to ring-1, guest ring-1 to ring-2) to let the
 * hypervisor (VMMRC.rc) be in charge of ring-0. Ring compression causes
 * issues when CS or SS are pushed and inspected by the guest, since the values
 * will have bit 0 set whereas the guest expects that bit to be cleared. In
 * addition there are problematic instructions like POPF and IRET that the guest
 * code uses to restore/modify EFLAGS.IF state, however the CPU just silently
 * ignores changes to EFLAGS.IF when the code isn't running in ring-0 (or with
 * an appropriate IOPL), which causes major headaches. The SIDT, SGDT, STR,
 * SLDT and SMSW instructions also cause problems since they will return
 * information about the hypervisor rather than the guest state and cannot be
 * trapped.
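 *
 * As a rough illustration of the selector issue above, the sketch below models
 * ring compression on the requested privilege level (RPL, the low two bits of
 * a selector). The helper names are made up for this example and are not taken
 * from the old raw-mode code; the sketch only assumes the ring mapping
 * described above.
 *
```cpp
#include <cstdint>

// Illustrative sketch only; the names below are hypothetical.

// Requested privilege level (RPL): bits 0 and 1 of a segment selector.
constexpr unsigned selectorRpl(uint16_t uSel)
{
    return uSel & 3u;
}

// Models the ring compression described above: a guest ring-0 selector gets
// RPL 1 and a guest ring-1 selector gets RPL 2. Other selectors are returned
// unchanged.
constexpr uint16_t compressSelectorRing(uint16_t uSel)
{
    return selectorRpl(uSel) <= 1u ? static_cast<uint16_t>(uSel + 1u) : uSel;
}

// A typical guest kernel check: does the pushed CS value say ring-0?
constexpr bool looksLikeRing0(uint16_t uPushedCs)
{
    return selectorRpl(uPushedCs) == 0;
}
```
 *
 * With a guest kernel CS of 0x08, compressSelectorRing() yields 0x09, so a
 * guest that pushes CS and tests the low bits no longer sees ring-0.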
 *
 * So, guest kernel code needed to be scanned (by CSAM) and problematic
 * instructions or sequences patched or recompiled (by PATM).
 *
 * The raw-mode execution operated in a slightly modified guest memory context,
 * so memory accesses could be done directly without any checking or masking.
 * The modification was to insert the hypervisor in an unused portion of the
 * page tables, which made it float around and required it to be relocated
 * whenever the guest mapped code into the area it was occupying.
 *
 * The old raw-mode code was 32-bit only because its inception predates the
 * availability of the AMD64 architecture and the promise of AMD-V and VT-x made
 * it unnecessary to do a 64-bit version of the mode. (A long-mode port of the
 * raw-mode execution hypervisor could in theory have been used for both 32-bit
 * and 64-bit guests, making the relocating unnecessary for 32-bit guests,
 * however the fact that v8086 mode does not work when the CPU is operating in
 * long mode made it a little less attractive.)
 *
 *
 * @section sec_rawmode_v2 Raw-mode v2
 *
 * The vision for the reinvention of raw-mode execution is to put it inside
 * VT-x/AMD-V and run non-native instruction sets via a recompiler.
 *
 * The main motivation is TLB emulation using the host MMU. An added benefit
 * would be that the non-native instruction sets would be add-ons put on top of
 * the existing x86/AMD64 virtualization product and therefore not require a
 * completely separate product build.
 *
 *
 * Outline:
 *
 *  - Plug-in based, so the target architecture specific stuff is mostly in
 *    separate modules (ring-3, ring-0 (optional) and raw-mode images).
 *
 *  - Only 64-bit mode code (no problem since VirtualBox has required a 64-bit
 *    host since 6.0). So, not reintroducing structure alignment pain from old
 *    RC.
 *
 *  - Map the RC-hypervisor modules as ROM, using the shadowing feature for the
 *    data sections.
 *
 *  - Use MMIO2-like regions for all the memory that the RC-hypervisor needs,
 *    all shared with the associated host side plug-in components.
 *
 *  - The ROM and MMIO2 regions do not directly end up in the saved state, the
 *    state is instead saved by the ring-3 architecture module.
 *
 *  - Device access thru MMIO mappings could be passed transparently thru to
 *    the x86/AMD64 core VMM. It would however be possible to reintroduce the
 *    RC side device handling, as that will not be removed in the old-RC
 *    cleanup.
 *
 *  - Virtual memory managed by the RC-hypervisor, optionally with help of the
 *    ring-3 and/or ring-0 architecture modules.
 *
 *  - The mapping of the RC modules and memory will probably have to be runtime
 *    relocatable again, like it was in the old RC. Though initially and for
 *    32-bit target architectures, we will probably use a fixed mapping.
 *
 *  - Memory accesses must unfortunately be range checked before being issued,
 *    in order to prevent the guest code from accessing the hypervisor. The
 *    recompiled code must be able to run, modify state, call ROM code, update
 *    statistics and such, so we cannot use page table protection to shield the
 *    hypervisor code & data. (If long mode implemented segment limits, we
 *    could've used them, but it doesn't.)
 *
 *  - The RC-hypervisor will make hypercalls to communicate with the ring-0 and
 *    ring-3 host code.
 *
 *  - The host side should be able to dig out the current guest state from
 *    information (think AMD64 unwinding) stored in translation blocks.
 *
 *  - Non-atomic state updates outside TBs could be flagged so the host knows
 *    how to roll them back.
 *
 *  - SMP must be taken into account early on.
 *
 *  - As must existing IEM-based recompiler ideas, preferably sharing code
 *    (basically compiling IEM targeting the other architecture).
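 *
 * The range checking mentioned in the outline could look roughly like the
 * following sketch. All names are hypothetical, and it assumes the hypervisor
 * occupies a single contiguous, non-wrapping address range, which the outline
 * does not actually specify.
 *
```cpp
#include <cstdint>

// Illustrative sketch only; all names are hypothetical.

// Address range [uFirst, uLast] (inclusive) reserved for the hypervisor.
struct HyperRange
{
    uint64_t uFirst;
    uint64_t uLast;
};

// Returns true if a guest access of cb bytes at uAddr stays clear of the
// hypervisor range. Zero-sized accesses and accesses that wrap around the
// end of the address space are rejected.
constexpr bool isGuestAccessAllowed(HyperRange Hyper, uint64_t uAddr, uint64_t cb)
{
    return cb != 0
        && cb - 1 <= UINT64_MAX - uAddr         // last byte does not wrap
        && (   uAddr + (cb - 1) < Hyper.uFirst  // entirely below the range
            || uAddr > Hyper.uLast);            // entirely above the range
}
```
 *
 * In recompiled code a check along these lines would have to precede every
 * guest load and store, which is the cost the outline calls unfortunate.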
 *
 * The actual implementation will depend a lot on which architectures are
 * targeted and how they can be mapped onto AMD64/x86. It is even possible that
 * there are some significant roadblocks preventing us from using the host MMU
 * efficiently. AMD64 is for instance rather low on virtual address space
 * compared to several other 64-bit architectures, which means we'll generate a
 * lot of \#GPs when the guest tries to access address space that is reserved
 * on AMD64. The proposed 5-level page tables will help with this, of course,
 * but they need to get into silicon and into user computers to be really
 * helpful.
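 *
 * To make the address space limitation concrete, here is a small sketch of the
 * canonical address rule on AMD64. The function name is made up for this
 * example; the 48 and 57 bit widths correspond to 4-level and 5-level paging.
 *
```cpp
#include <cstdint>

// Illustrative sketch only; the function name is hypothetical.

// Returns true if uAddr is canonical for a virtual address width of cBits
// (48 with 4-level paging, 57 with 5-level paging), i.e. if all bits from
// bit cBits - 1 up to bit 63 are equal. cBits must be in the range 1..64.
constexpr bool isCanonicalAddress(uint64_t uAddr, unsigned cBits)
{
    return (uAddr >> (cBits - 1)) == 0
        || (uAddr >> (cBits - 1)) == (UINT64_MAX >> (cBits - 1));
}
```
 *
 * With 4-level paging, any guest address between 0x0000800000000000 and
 * 0xFFFF7FFFFFFFFFFF falls into the non-canonical hole and cannot be mapped
 * directly by the host MMU.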
 *
 * One thing that helps a lot is that we don't have to consider 32-bit x86 any
 * more, meaning that the recompiler only needs to generate 64-bit code and can
 * assume having 15-16 GPRs at its disposal.
 *
 */
