Chapter 10. Technical Background

Table of Contents

10.1. Where Oracle VM VirtualBox Stores its Files
10.1.1. Machines Created by Oracle VM VirtualBox Version 4.0 or Later
10.1.2. Machines Created by Oracle VM VirtualBox Versions Before 4.0
10.1.3. Global Configuration Data
10.1.4. Summary of 4.0 Configuration Changes
10.1.5. Oracle VM VirtualBox XML Files
10.2. Oracle VM VirtualBox Executables and Components
10.3. Hardware vs. Software Virtualization
10.4. Paravirtualization Providers
10.5. Details About Software Virtualization
10.6. Details About Hardware Virtualization
10.7. Nested Paging and VPIDs

This chapter provides additional information for readers who are familiar with computer architecture and technology and wish to find out more about how Oracle VM VirtualBox works under the hood. The contents of this chapter are not required reading in order to use Oracle VM VirtualBox successfully.

10.1. Where Oracle VM VirtualBox Stores its Files

In Oracle VM VirtualBox, a virtual machine and its settings are described in a virtual machine settings file in XML format. In addition, most virtual machines have one or more virtual hard disks, which are typically represented by disk images, such as those in VDI format. Where all these files are stored depends on which version of Oracle VM VirtualBox created the machine.

10.1.1. Machines Created by Oracle VM VirtualBox Version 4.0 or Later

By default, each virtual machine has one directory on your host computer where all the files of that machine are stored: the XML settings file, with a .vbox file extension, and its disk images.

By default, this machine folder is placed in a common folder called VirtualBox VMs, which Oracle VM VirtualBox creates in the current system user's home directory. The location of this home directory depends on the conventions of the host operating system, as follows:

  • On Windows, this is the location returned by the SHGetFolderPath function of the Windows system library Shell32.dll, asking for the user profile. On very old Windows versions which do not have this function or where it unexpectedly returns an error, there is a fallback based on environment variables. First, %USERPROFILE% is checked. If it does not exist then an attempt with %HOMEDRIVE%%HOMEPATH% is made. A typical location is C:\Users\username.

  • On Linux, Mac OS X, and Oracle Solaris, this is generally taken from the environment variable $HOME, except for the user root where it is taken from the account database. This is a workaround for the frequent trouble caused by users using Oracle VM VirtualBox in combination with the tool sudo which by default does not reset the environment variable $HOME. A typical location on Linux and Oracle Solaris is /home/username and on Mac OS X /Users/username.

For simplicity, we will abbreviate the location of the home directory as $HOME. Using that convention, the common folder for all virtual machines is $HOME/VirtualBox VMs.
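The lookup order described above can be sketched as a small function. This is only an illustration, not the actual Oracle VM VirtualBox code: the function names and parameters are hypothetical, the Windows branch models only the environment-variable fallback (it does not call SHGetFolderPath), and falling back to the account database when $HOME is unset is an assumption.

```python
import posixpath


def resolve_home(platform, env, account_db_home=None, is_root=False):
    """Return the home directory following the lookup order in the text.

    platform        -- "windows" or "unix" (Linux, Mac OS X, Oracle Solaris)
    env             -- mapping of environment variable names to values
    account_db_home -- home directory from the account database (Unix only)
    is_root         -- True when running as the user root
    """
    if platform == "windows":
        # Fallback path only: %USERPROFILE% first, then %HOMEDRIVE%%HOMEPATH%.
        if env.get("USERPROFILE"):
            return env["USERPROFILE"]
        if env.get("HOMEDRIVE") and env.get("HOMEPATH"):
            return env["HOMEDRIVE"] + env["HOMEPATH"]
        return None
    # Unix-like hosts: root ignores $HOME, because sudo may leave the
    # invoking user's $HOME in place.
    if is_root:
        return account_db_home
    # Assumption: use the account database if $HOME is not set.
    return env.get("HOME") or account_db_home


def default_machine_folder(home):
    """Return the common folder for all virtual machines."""
    return posixpath.join(home, "VirtualBox VMs")
```

For example, resolve_home("unix", {"HOME": "/home/alice"}, "/root", is_root=True) yields /root rather than /home/alice, which mirrors the sudo workaround described above.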

As an example, when you create a virtual machine called "Example VM", Oracle VM VirtualBox creates the following:

  • A machine folder $HOME/VirtualBox VMs/Example VM/

  • In the machine folder, a settings file: Example VM.vbox

  • In the machine folder, a virtual disk image: Example VM.vdi.

This is the default layout if you use the Create New Virtual Machine wizard described in Section 1.8, “Creating Your First Virtual Machine”. Once you start working with the VM, additional files are added. Log files are in a subfolder called Logs, and if you have taken snapshots, they are in a Snapshots subfolder. For each VM, you can change the location of its snapshots folder in the VM settings.
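As a rough illustration of this default layout, the following sketch computes the expected paths for a given VM name. The function name and the dictionary keys are hypothetical, and the code only builds path strings; it does not inspect a real installation.

```python
from pathlib import PurePosixPath


def machine_layout(home, vm_name):
    """Return the default file locations for a VM, as described in the text."""
    folder = PurePosixPath(home) / "VirtualBox VMs" / vm_name
    return {
        "folder": folder,
        "settings": folder / f"{vm_name}.vbox",  # XML settings file
        "disk": folder / f"{vm_name}.vdi",       # default virtual disk image
        "logs": folder / "Logs",                 # created once the VM is used
        "snapshots": folder / "Snapshots",       # default snapshots folder
    }
```

Calling machine_layout("/home/alice", "Example VM") gives a settings path of /home/alice/VirtualBox VMs/Example VM/Example VM.vbox.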

You can change the default machine folder by selecting Preferences from the File menu in the Oracle VM VirtualBox main window. Then, in the displayed window, click on the General tab. Alternatively, use VBoxManage setproperty machinefolder. See Section 8.31, “VBoxManage setproperty”.

10.1.2. Machines Created by Oracle VM VirtualBox Versions Before 4.0

If you have upgraded to Oracle VM VirtualBox 4.0 from an earlier version of Oracle VM VirtualBox, you probably have settings files and disks in the earlier file system layout.

Before version 4.0, Oracle VM VirtualBox separated the machine settings files from virtual disk images. The machine settings files had an .xml file extension and resided in a folder called Machines under the global Oracle VM VirtualBox configuration directory. See Section 10.1.3, “Global Configuration Data”. On Linux, for example, this was the hidden directory $HOME/.VirtualBox/Machines. The default hard disks folder was called HardDisks and was also located in the .VirtualBox folder. Both locations could be changed by the user in the global preferences. The concept of a default hard disk folder was abandoned with Oracle VM VirtualBox 4.0, since disk images now reside in each machine's folder by default.

The old layout had the following severe disadvantages:

  • It was very difficult to move a virtual machine from one host to another because the files involved did not reside in the same folder. In addition, the virtual media of all machines were registered with a global registry in the central Oracle VM VirtualBox settings file, $HOME/.VirtualBox/VirtualBox.xml.

    To move a machine to another host, it was therefore not enough to move the XML settings file and the disk images, which were in different locations, but the hard disk entries from the global media registry XML had to be meticulously copied as well. This was close to impossible if the machine had snapshots and therefore differencing images.

  • Storing virtual disk images, which can grow very large, under the hidden .VirtualBox directory, at least on Linux and Oracle Solaris hosts, made many users wonder where their disk space had gone.

Whereas new VMs created with Oracle VM VirtualBox 4.0 or later will conform to the new layout, for maximum compatibility, old VMs are not converted to the new layout. Otherwise machine settings would be irrevocably broken if a user downgraded from 4.0 back to an older version of Oracle VM VirtualBox.

10.1.3. Global Configuration Data

In addition to the files of the virtual machines, Oracle VM VirtualBox maintains global configuration data in the following directory:

  • Linux and Oracle Solaris: $HOME/.config/VirtualBox.

    $HOME/.VirtualBox is used if it exists, for compatibility with legacy versions before Oracle VM VirtualBox 4.3.

  • Windows: $HOME/.VirtualBox.

  • Mac OS X: $HOME/Library/VirtualBox.

Oracle VM VirtualBox creates this configuration directory automatically, if necessary. Optionally, you can specify an alternate configuration directory by setting the VBOX_USER_HOME environment variable, or additionally on Linux or Oracle Solaris by using the standard XDG_CONFIG_HOME variable. Since the global VirtualBox.xml settings file points to all other configuration files, this enables switching between several Oracle VM VirtualBox configurations.
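The selection of the configuration directory can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the function and parameter names are hypothetical, and the precedence of the legacy $HOME/.VirtualBox directory over XDG_CONFIG_HOME is an assumption that the text above does not spell out.

```python
import posixpath


def global_config_dir(platform, home, env, legacy_exists=False):
    """Return the global configuration directory.

    platform      -- "linux", "solaris", "windows" or "macos"
    home          -- the home directory, abbreviated as $HOME in the text
    env           -- mapping of environment variable names to values
    legacy_exists -- True if $HOME/.VirtualBox already exists (Linux/Solaris)
    """
    # An explicit override applies on every platform.
    if env.get("VBOX_USER_HOME"):
        return env["VBOX_USER_HOME"]
    if platform == "macos":
        return posixpath.join(home, "Library", "VirtualBox")
    if platform == "windows":
        return posixpath.join(home, ".VirtualBox")
    # Linux and Oracle Solaris: keep using the legacy directory if present.
    # Assumption: the legacy directory wins over XDG_CONFIG_HOME.
    if legacy_exists:
        return posixpath.join(home, ".VirtualBox")
    base = env.get("XDG_CONFIG_HOME") or posixpath.join(home, ".config")
    return posixpath.join(base, "VirtualBox")
```

Pointing VBOX_USER_HOME at different directories is what makes it possible to switch between several independent configurations.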

Most importantly, in this directory, Oracle VM VirtualBox stores its global settings file, another XML file called VirtualBox.xml. This includes global configuration options and the list of registered virtual machines with pointers to their XML settings files. Neither the location of this file nor its directory has changed with Oracle VM VirtualBox 4.0.

Before Oracle VM VirtualBox 4.0, all virtual media, such as disk image files, were also contained in a global registry in this settings file. For compatibility, this media registry still exists if you upgrade Oracle VM VirtualBox and there are media from machines which were created with a version before 4.0. If you have no such machines, then there will be no global media registry. With Oracle VM VirtualBox 4.0, each machine XML file has its own media registry.

Also before Oracle VM VirtualBox 4.0, the default Machines folder and the default HardDisks folder resided under the Oracle VM VirtualBox configuration directory, such as $HOME/.VirtualBox/Machines on Linux. If you are upgrading from an Oracle VM VirtualBox version before 4.0, files in these directories are not automatically moved in order not to break backwards compatibility.

10.1.4. Summary of 4.0 Configuration Changes

The following table gives a brief overview of the configuration changes between legacy versions and version 4.0 or later.

Table 10.1. Configuration Changes in Version 4.0 or Above

Setting                           Before 4.0                     4.0 or above

Default machines folder           $HOME/.VirtualBox/Machines     $HOME/VirtualBox VMs
Default disk image location       $HOME/.VirtualBox/HardDisks    In each machine's folder
Machine settings file extension   .xml                           .vbox
Media registry                    Global VirtualBox.xml file     Each machine settings file
Media registration                Explicit open/close required   Automatic on attach

10.1.5. Oracle VM VirtualBox XML Files

Oracle VM VirtualBox uses XML for both the machine settings files and the global configuration file, VirtualBox.xml.

All Oracle VM VirtualBox XML files are versioned. When a new settings file is created, for example because a new virtual machine is created, Oracle VM VirtualBox automatically uses the settings format of the current Oracle VM VirtualBox version. These files may not be readable if you downgrade to an earlier version of Oracle VM VirtualBox. However, when Oracle VM VirtualBox encounters a settings file from an earlier version, such as after upgrading Oracle VM VirtualBox, it attempts to preserve the settings format as much as possible. It will only silently upgrade the settings format if the current settings cannot be expressed in the old format, for example because you enabled a feature that was not present in an earlier version of Oracle VM VirtualBox.

As an example, before Oracle VM VirtualBox 3.1, it was only possible to enable or disable a single DVD drive in a virtual machine. If it was enabled, then it would always be visible as the secondary master of the IDE controller. With Oracle VM VirtualBox 3.1, DVD drives can be attached to arbitrary slots of arbitrary controllers, so they could be the secondary slave of an IDE controller or in a SATA slot. If you have a machine settings file from an earlier version and upgrade Oracle VM VirtualBox to 3.1 and then move the DVD drive from its default position, this cannot be expressed in the old settings format; the XML machine file would get written in the new format, and a backup file of the old format would be kept.

In such cases, Oracle VM VirtualBox backs up the old settings file in the virtual machine's configuration directory. If you need to go back to the earlier version of Oracle VM VirtualBox, then you will need to manually copy these backup files back.
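The "upgrade only when necessary" rule can be illustrated with a short sketch. Settings versions are modeled as hypothetical (major, minor) tuples, and the function name is invented for this example; the real settings format numbers are deliberately not documented here.

```python
def version_to_write(file_version, required_versions):
    """Decide which settings format to write and whether a backup is needed.

    file_version      -- format version of the existing settings file
    required_versions -- minimum format version needed by each setting in use
    Returns a (version, backup_needed) tuple.
    """
    needed = max(required_versions, default=file_version)
    if needed > file_version:
        # At least one setting cannot be expressed in the old format:
        # write the newer format and keep a backup of the old file.
        return needed, True
    # Everything still fits in the old format, so preserve it.
    return file_version, False
```

With these hypothetical numbers, a file in format (1, 8) whose settings all fit into (1, 8) is left as it is, while using a feature that needs (1, 9) causes the file to be rewritten as (1, 9) with a backup.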

We intentionally do not document the specifications of the Oracle VM VirtualBox XML files, as we must reserve the right to modify them in the future. We therefore strongly suggest that you do not edit these files manually. Oracle VM VirtualBox provides complete access to its configuration data through the VBoxManage command line tool (see Chapter 8, VBoxManage) and through its API (see Chapter 11, Oracle VM VirtualBox Programming Interfaces).

10.2. Oracle VM VirtualBox Executables and Components

Oracle VM VirtualBox was designed to be modular and flexible. When the Oracle VM VirtualBox graphical user interface (GUI) is opened and a VM is started, at least the following three processes are running:

  • VBoxSVC, the Oracle VM VirtualBox service process which always runs in the background. This process is started automatically by the first Oracle VM VirtualBox client process and exits a short time after the last client exits. The first Oracle VM VirtualBox client can be the GUI, VBoxManage, VBoxHeadless, or the web service, among others. The service is responsible for bookkeeping, maintaining the state of all VMs, and for providing communication between Oracle VM VirtualBox components. This communication is implemented using COM/XPCOM.

    Note

    When we refer to clients here, we mean the local clients of a particular VBoxSVC server process, not clients in a network. Oracle VM VirtualBox employs its own client/server design to allow its processes to cooperate, but all these processes run under the same user account on the host operating system, and this is totally transparent to the user.

  • The GUI process, VirtualBoxVM, a client application based on the cross-platform Qt library. When started without the --startvm option, this application acts as the VirtualBox Manager, displaying the VMs and their settings. It then communicates settings and state changes to VBoxSVC and also reflects changes effected through other means, such as the VBoxManage command.

  • If the VirtualBoxVM client application is started with the --startvm argument, it loads the VMM library which includes the actual hypervisor and then runs a virtual machine and provides the input and output for the guest.

Any Oracle VM VirtualBox front-end, or client, will communicate with the service process and can both control and reflect the current state. For example, either the VM selector or the VM window or VBoxManage can be used to pause the running VM, and other components will always reflect the changed state.

The Oracle VM VirtualBox GUI application is only one of several available front ends, or clients. The complete list shipped with Oracle VM VirtualBox is as follows:

  • VirtualBoxVM: The Qt front end implementing the VirtualBox Manager and running VMs.

  • VBoxManage: A less user-friendly but more powerful alternative. See Chapter 8, VBoxManage.

  • VBoxHeadless: A VM front end which does not directly provide any video output and keyboard or mouse input, but enables redirection through the VirtualBox Remote Desktop Extension. See Section 7.1.2, “VBoxHeadless, the Remote Desktop Server”.

  • vboxwebsrv: The Oracle VM VirtualBox web service process which enables control of an Oracle VM VirtualBox host remotely. This is described in detail in the Oracle VM VirtualBox Software Development Kit (SDK) reference. See Chapter 11, Oracle VM VirtualBox Programming Interfaces.

  • The Oracle VM VirtualBox Python shell: A Python alternative to VBoxManage. This is also described in the SDK reference.

Internally, Oracle VM VirtualBox consists of many more or less separate components. You may encounter these when analyzing Oracle VM VirtualBox internal error messages or log files. These include the following:

  • IPRT: A portable runtime library which abstracts file access, threading, and string manipulation. Whenever Oracle VM VirtualBox accesses host operating system features, it does so through this library for cross-platform portability.

  • VMM (Virtual Machine Monitor): The heart of the hypervisor.

  • EM (Execution Manager): Controls execution of guest code.

  • REM (Recompiled Execution Monitor): Provides software emulation of CPU instructions.

  • TRPM (Trap Manager): Intercepts and processes guest traps and exceptions.

  • HM (Hardware Acceleration Manager): Provides support for VT-x and AMD-V.

  • GIM (Guest Interface Manager): Provides support for various paravirtualization interfaces to the guest.

  • PDM (Pluggable Device Manager): An abstract interface between the VMM and emulated devices which separates device implementations from VMM internals and makes it easy to add new emulated devices. Through PDM, third-party developers can add new virtual devices to Oracle VM VirtualBox without having to change Oracle VM VirtualBox itself.

  • PGM (Page Manager): A component that controls guest paging.

  • PATM (Patch Manager): Patches guest code to improve and speed up software virtualization.

  • TM (Time Manager): Handles timers and all aspects of time inside guests.

  • CFGM (Configuration Manager): Provides a tree structure which holds configuration settings for the VM and all emulated devices.

  • SSM (Saved State Manager): Saves and loads VM state.

  • VUSB (Virtual USB): A USB layer which separates emulated USB controllers from the controllers on the host and from USB devices. This component also enables remote USB.

  • DBGF (Debug Facility): A built-in VM debugger.

  • Oracle VM VirtualBox emulates a number of devices to provide the hardware environment that various guests need. Most of these are standard devices found in many PC compatible machines and widely supported by guest operating systems. For network and storage devices in particular, there are several options for the emulated devices to access the underlying hardware. These devices are managed by PDM.

  • Guest Additions for various guest operating systems. This is code that is installed from within a virtual machine. See Chapter 4, Guest Additions.

  • The "Main" component is special. It ties all the above bits together and is the only public API that Oracle VM VirtualBox provides. All the client processes listed above use only this API and never access the hypervisor components directly. As a result, third-party applications that use the Oracle VM VirtualBox Main API can rely on the fact that it is always well-tested and that all capabilities of Oracle VM VirtualBox are fully exposed. It is this API that is described in the Oracle VM VirtualBox SDK. See Chapter 11, Oracle VM VirtualBox Programming Interfaces.

10.3. Hardware vs. Software Virtualization

Oracle VM VirtualBox enables software in the virtual machine to run directly on the processor of the host, but an array of complex techniques is employed to intercept operations that would interfere with your host. Whenever the guest attempts to do something that could be harmful to your computer and its data, Oracle VM VirtualBox steps in and takes action. In particular, for lots of hardware that the guest believes to be accessing, Oracle VM VirtualBox simulates a certain "virtual" environment according to how you have configured a virtual machine. For example, when the guest attempts to access a hard disk, Oracle VM VirtualBox redirects these requests to whatever you have configured to be the virtual machine's virtual hard disk. This is normally an image file on your host.

Unfortunately, the x86 platform was never designed to be virtualized. Detecting situations in which Oracle VM VirtualBox needs to take control over the guest code that is executing, as described above, is difficult. There are two ways in which to achieve this:

  • Since 2006, Intel and AMD processors have had support for so-called hardware virtualization. This means that these processors can help Oracle VM VirtualBox to intercept potentially dangerous operations that a guest operating system may be attempting and also makes it easier to present virtual hardware to a virtual machine.

    These hardware features differ between Intel and AMD processors. Intel named its technology VT-x. AMD calls theirs AMD-V. The Intel and AMD support for virtualization is very different in detail, but not very different in principle.

    Note

    On many systems, the hardware virtualization features first need to be enabled in the BIOS before Oracle VM VirtualBox can use them.

  • As opposed to other virtualization software, for many usage scenarios, Oracle VM VirtualBox does not require hardware virtualization features to be present. Through sophisticated techniques, Oracle VM VirtualBox virtualizes many guest operating systems entirely in software. This means that you can run virtual machines even on older processors which do not support hardware virtualization.

Even though Oracle VM VirtualBox does not always require hardware virtualization, enabling it is required in the following scenarios:

  • Certain rare guest operating systems like OS/2 make use of very esoteric processor instructions that are not supported with our software virtualization. For virtual machines that are configured to contain such an operating system, hardware virtualization is enabled automatically.

  • Oracle VM VirtualBox's 64-bit guest support, added with version 2.0, and multiprocessing (SMP), added with version 3.0, both require hardware virtualization to be enabled. This is not much of a limitation since the vast majority of today's 64-bit and multicore CPUs ship with hardware virtualization anyway. The exceptions to this rule are older Intel Celeron and AMD Opteron CPUs, for example.
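The two scenarios above amount to a simple rule, sketched below. The function name, its parameters and the contents of the set of "esoteric" guests are hypothetical; only OS/2 is named in the text.

```python
# Guest operating systems that use instructions not supported by software
# virtualization. Only OS/2 is named in the text; the set is illustrative.
ESOTERIC_GUESTS = {"OS/2"}


def requires_hardware_virtualization(guest_os, is_64bit, cpu_count):
    """Return True if the VM cannot run with software virtualization alone."""
    return guest_os in ESOTERIC_GUESTS or is_64bit or cpu_count > 1
```

A 32-bit single-CPU Linux guest returns False here, while the same guest configured with two virtual CPUs returns True.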

Warning

Do not run other hypervisors, either open source or commercial virtualization products, together with Oracle VM VirtualBox. While several hypervisors can normally be installed in parallel, do not attempt to run several virtual machines from competing hypervisors at the same time. Oracle VM VirtualBox cannot track what another hypervisor is currently attempting to do on the same host, and especially if several products attempt to use hardware virtualization features such as VT-x, this can crash the entire host. Also, within Oracle VM VirtualBox, you can mix software and hardware virtualization when running multiple VMs. In certain cases a small performance penalty will be unavoidable when mixing VT-x and software virtualization VMs. We recommend not mixing virtualization modes if maximum performance and low overhead are essential. This does not apply to AMD-V.

10.4. Paravirtualization Providers

Oracle VM VirtualBox enables the exposure of a paravirtualization interface, to facilitate accurate and efficient execution of software within a virtual machine. These interfaces require the guest operating system to recognize their presence and make use of them in order to leverage the benefits of communicating with the Oracle VM VirtualBox hypervisor.

Most modern mainstream guest operating systems, including Windows and Linux, ship with support for one or more paravirtualization interfaces. Hence, there is typically no need to install additional software in the guest to take advantage of this feature.

Exposing a paravirtualization provider to the guest operating system does not rely on the choice of host platforms. For example, the Hyper-V paravirtualization provider can be used for VMs to run on any host platform supported by Oracle VM VirtualBox and not just Windows.

Oracle VM VirtualBox provides the following interfaces:

  • Minimal: Announces the presence of a virtualized environment. Additionally, reports the TSC and APIC frequency to the guest operating system. This provider is mandatory for running any Mac OS X guests.

  • KVM: Presents a Linux KVM hypervisor interface which is recognized by Linux kernels version 2.6.25 or later. Oracle VM VirtualBox's implementation currently supports paravirtualized clocks and SMP spinlocks. This provider is recommended for Linux guests.

  • Hyper-V: Presents a Microsoft Hyper-V hypervisor interface which is recognized by Windows 7 and newer operating systems. Oracle VM VirtualBox's implementation currently supports paravirtualized clocks, APIC frequency reporting, guest debugging, guest crash reporting and relaxed timer checks. This provider is recommended for Windows guests.
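The recommendations in this list can be summarized as a lookup table. The sketch below is illustrative only: the guest family labels and the function name are hypothetical, and the lowercase provider names are chosen to resemble the values accepted by the paravirtualization provider setting of VBoxManage modifyvm, which should be checked against Chapter 8 before relying on them.

```python
# Provider recommended (or, for Mac OS X, required) per guest family.
RECOMMENDED_PROVIDER = {
    "macos": "minimal",   # mandatory for Mac OS X guests
    "linux": "kvm",       # recognized by Linux kernels 2.6.25 or later
    "windows": "hyperv",  # recognized by Windows 7 and newer
}


def recommended_provider(guest_family):
    """Return the provider suggested by the text, or None if none is named."""
    return RECOMMENDED_PROVIDER.get(guest_family.lower())
```

The choice depends only on the guest, not on the host platform, so the same table applies on every supported host.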

10.5. Details About Software Virtualization

Implementing virtualization on x86 CPUs with no hardware virtualization support is an extraordinarily complex task because the CPU architecture was not designed to be virtualized. The problems can usually be solved, but at the cost of reduced performance. Thus, there is a constant clash between virtualization performance and accuracy.

The x86 instruction set was originally designed in the 1970s and underwent significant changes with the addition of protected mode in the 1980s with the 286 CPU architecture and then again with the Intel 386 and its 32-bit architecture. Whereas the 386 did have limited virtualization support for real mode operation with V86 mode, as used by the "DOS Box" of Windows 3.x and OS/2 2.x, no support was provided for virtualizing the entire architecture.

In theory, software virtualization is not overly complex. There are four privilege levels, called rings, provided by the hardware. Typically only two rings are used: ring 0 for kernel mode and ring 3 for user mode. Additionally, one needs to differentiate between host context and guest context.

In host context, everything is as if no hypervisor was active. This might be the active mode if another application on your host has been scheduled CPU time. In that case, there is a host ring 3 mode and a host ring 0 mode. The hypervisor is not involved.

In guest context, however, a virtual machine is active. So long as the guest code is running in ring 3, this is not much of a problem since a hypervisor can set up the page tables properly and run that code natively on the processor. The problems mostly lie in how to intercept what the guest's kernel does.

There are several possible solutions to these problems. One approach is full software emulation, usually involving recompilation. That is, all code to be run by the guest is analyzed, transformed into a form which will not allow the guest to either modify or see the true state of the CPU, and only then executed. This process is obviously highly complex and costly in terms of performance. Oracle VM VirtualBox contains a recompiler based on QEMU which can be used for pure software emulation, but the recompiler is only activated in special situations, described below.

Another possible solution is paravirtualization, in which only specially modified guest OSes are allowed to run. This way, most of the hardware access is abstracted and any functions which would normally access the hardware or privileged CPU state are passed on to the hypervisor instead. Paravirtualization can achieve good functionality and performance on standard x86 CPUs, but it can only work if the guest OS can actually be modified, which is obviously not always the case.

Oracle VM VirtualBox chooses a different approach. When starting a virtual machine, through its ring-0 support kernel driver, Oracle VM VirtualBox sets up the host system so that it can run most of the guest code natively, but inserts itself at the "bottom" of the picture. It can then assume control when needed: when a privileged instruction is executed, when the guest traps (in particular because an I/O register was accessed and a device needs to be virtualized), or when external interrupts occur. Oracle VM VirtualBox may then handle the event and either route a request to a virtual device or possibly delegate handling to the guest or host OS. In guest context, Oracle VM VirtualBox can therefore be in one of three states:

  • Guest ring 3 code is run unmodified, at full speed, as much as possible. The number of faults will generally be low, unless the guest allows port I/O from ring 3. This is something we cannot do as we do not want the guest to be able to access real ports. This is also referred to as raw mode, as the guest ring-3 code runs unmodified.

  • For guest code in ring 0, Oracle VM VirtualBox employs a clever trick. It actually reconfigures the guest so that its ring-0 code is run in ring 1 instead, which is normally not used in x86 operating systems. As a result, when guest ring-0 code (actually running in ring 1), such as a guest device driver, attempts to write to an I/O register or execute a privileged instruction, the Oracle VM VirtualBox hypervisor in the "real" ring 0 can take over.

  • The hypervisor (VMM) can be active. Every time a fault occurs, Oracle VM VirtualBox looks at the offending instruction and can relegate it to a virtual device or the host OS or the guest OS or run it in the recompiler.

    In particular, the recompiler is used when guest code disables interrupts and Oracle VM VirtualBox cannot figure out when they will be switched back on. In these situations, Oracle VM VirtualBox actually analyzes the guest code using its own disassembler. Also, certain privileged instructions such as LIDT need to be handled specially. Finally, any real-mode or protected-mode code, such as BIOS code, a DOS guest, or any operating system startup, is run in the recompiler entirely.
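The choice between these execution paths can be sketched as a small decision function. This is a heavily simplified model with hypothetical names and return labels; the real Execution Manager considers far more state than the three inputs used here.

```python
def execution_mode(guest_ring, startup_code=False, interrupts_off_unresolved=False):
    """Pick how a piece of guest code is executed under software virtualization.

    guest_ring                -- ring the guest believes it runs in (0 or 3)
    startup_code              -- real-mode or startup code such as BIOS or DOS
    interrupts_off_unresolved -- interrupts disabled and it is unknown when
                                 they will be switched back on
    """
    if startup_code or interrupts_off_unresolved:
        return "recompiler"  # full software emulation
    if guest_ring == 3:
        return "native"      # raw mode: unmodified ring-3 code at full speed
    if guest_ring == 0:
        return "ring1"       # guest kernel code is deprivileged to ring 1
    raise ValueError("only rings 0 and 3 are modeled in this sketch")
```

Faults raised while running in the "native" or "ring1" paths hand control to the hypervisor, which may in turn fall back to the recompiler.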

Unfortunately this only works to a degree. Among others, the following situations require special handling:

  • Running ring 0 code in ring 1 causes a lot of additional instruction faults, as ring 1 is not allowed to execute any privileged instructions, of which the guest's ring-0 code contains plenty. With each of these faults, the VMM must step in and emulate the code to achieve the desired behavior. While this works, emulating thousands of these faults is very expensive and severely hurts the performance of the virtualized guest.

  • There are certain flaws in the implementation of ring 1 in the x86 architecture that were never fixed. Certain instructions that should trap in ring 1 do not. This affects, for example, the LGDT/SGDT, LIDT/SIDT, or POPF/PUSHF instruction pairs. Whereas the "load" operation is privileged and can therefore be trapped, the "store" instruction always succeeds. If the guest is allowed to execute these, it will see the true state of the CPU, not the virtualized state. The CPUID instruction also has the same problem.

  • A hypervisor typically needs to reserve some portion of the guest's address space, both linear address space and selectors, for its own use. This is not entirely transparent to the guest OS and may cause clashes.

  • The SYSENTER instruction, used for system calls, executed by an application running in a guest OS always transitions to ring 0. But that is where the hypervisor runs, not the guest OS. In this case, the hypervisor must trap and emulate the instruction even when it is not desirable.

  • The CPU segment registers contain a "hidden" descriptor cache which is not software-accessible. The hypervisor cannot read, save, or restore this state, but the guest OS may use it.

  • Some resources must, and can, be trapped by the hypervisor, but the access is so frequent that this creates a significant performance overhead. An example is the TPR (Task Priority) register in 32-bit mode. Accesses to this register must be trapped by the hypervisor. But certain guest operating systems, notably Windows and Oracle Solaris, write this register very often, which adversely affects virtualization performance.

To fix these performance and security issues, Oracle VM VirtualBox contains a Code Scanning and Analysis Manager (CSAM), which disassembles guest code, and the Patch Manager (PATM), which can replace it at runtime.

Before executing ring 0 code, CSAM scans it recursively to discover problematic instructions. PATM then performs in-situ patching. It replaces the instruction with a jump to hypervisor memory where an integrated code generator has placed a more suitable implementation. In reality, this is a very complex task as there are lots of odd situations to be discovered and handled correctly. So, with its current complexity, one could argue that PATM is an advanced in-situ recompiler.

In addition, every time a fault occurs, Oracle VM VirtualBox analyzes the offending code to determine if it is possible to patch it in order to prevent it from causing more faults in the future. This approach works well in practice and dramatically improves software virtualization performance.
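The division of labor between scanning and patching can be illustrated with a toy model. Guest code is represented as a list of mnemonic strings, which is nothing like real machine code, and all names below are hypothetical. The sketch scans linearly rather than recursively and ignores every hard case that the real CSAM and PATM have to handle.

```python
# Instructions that do not trap in ring 1 and would reveal the true CPU state
# (taken from the examples given earlier in this section).
PROBLEMATIC = {"SGDT", "SIDT", "PUSHF", "CPUID"}


def scan(code):
    """Scanning step: return the indexes of problematic instructions."""
    return [i for i, insn in enumerate(code) if insn in PROBLEMATIC]


def patch(code, patch_memory):
    """Patching step: replace each problematic instruction with a jump.

    A replacement routine is appended to patch_memory, which stands in for
    hypervisor memory, and the original instruction becomes a jump to it.
    """
    patched = list(code)
    for index in scan(code):
        slot = len(patch_memory)
        patch_memory.append(f"emulate {code[index]}")
        patched[index] = f"JMP patch[{slot}]"
    return patched
```

After patching, the guest no longer executes the non-trapping instruction directly, so it sees the virtualized state without causing a fault each time.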

10.6. Details About Hardware Virtualization

With Intel VT-x, there are two distinct modes of CPU operation: VMX root mode and non-root mode.

  • In root mode, the CPU operates much like older generations of processors without VT-x support. There are four privilege levels, called rings, and the same instruction set is supported, with the addition of several virtualization-specific instructions. Root mode is what a host operating system without virtualization uses, and it is also used by a hypervisor when virtualization is active.

  • In non-root mode, CPU operation is significantly different. There are still four privilege rings and the same instruction set, but a new structure called VMCS (Virtual Machine Control Structure) now controls the CPU operation and determines how certain instructions behave. Non-root mode is where guest systems run.

Switching from root mode to non-root mode is called "VM entry"; the switch back is called "VM exit". The VMCS includes a guest state area and a host state area, which are saved and restored at VM entry and VM exit. Most importantly, the VMCS controls which guest operations will cause VM exits.

The VMCS provides fairly fine-grained control over what the guests can and cannot do. For example, a hypervisor can allow a guest to write certain bits in shadowed control registers, but not others. This enables efficient virtualization in cases where guests can be allowed to write control bits without disrupting the hypervisor, while preventing them from altering control bits over which the hypervisor needs to retain full control. The VMCS also provides control over interrupt delivery and exceptions.
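
As a rough illustration of per-bit control over a shadowed control register, the following Python sketch models a guest/host mask and a read shadow for CR0. It is a simplified model under stated assumptions, not Oracle VM VirtualBox code: the function names are hypothetical, and only three CR0 bits are named.

```python
# Architectural CR0 bit positions used in this example.
CR0_PE = 1 << 0   # protection enable
CR0_TS = 1 << 3   # task switched
CR0_PG = 1 << 31  # paging


def guest_read_cr0(real_cr0, host_mask, read_shadow):
    """Value the guest sees: host-owned bits come from the read shadow,
    all other bits come from the real register."""
    return (real_cr0 & ~host_mask) | (read_shadow & host_mask)


def write_causes_vm_exit(new_value, host_mask, read_shadow):
    """A guest write exits only if it tries to give a host-owned bit a
    value that differs from the read shadow."""
    return ((new_value ^ read_shadow) & host_mask) != 0


# Hypothetical setup: the hypervisor owns PE and PG, the guest owns TS.
host_mask = CR0_PE | CR0_PG
read_shadow = CR0_PE | CR0_PG

toggles_ts = write_causes_vm_exit(read_shadow | CR0_TS, host_mask, read_shadow)
clears_pg = write_causes_vm_exit(CR0_PE, host_mask, read_shadow)
```

Here toggles_ts is False, because TS is guest-owned and the write can proceed without hypervisor involvement, while clears_pg is True, because the guest attempts to change a bit that the hypervisor retains.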

Whenever an instruction or event causes a VM exit, the VMCS contains information about the exit reason, often with accompanying detail. For example, if a write to the CR0 register causes an exit, the offending instruction is recorded, along with the fact that a write access to a control register caused the exit, and information about the source and destination registers. Thus the hypervisor can efficiently handle the condition without needing advanced techniques such as CSAM and PATM described above.
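
The resulting control flow can be pictured as a loop that enters the guest, waits for a VM exit, and dispatches on the recorded exit reason. The Python sketch below is a toy simulation under stated assumptions: the ExitReason members, the VmExit record, and its detail fields are hypothetical stand-ins, not the hardware encodings or the Oracle VM VirtualBox implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class ExitReason(Enum):
    # Illustrative names only; the values are not the hardware encodings.
    CR_ACCESS = auto()
    IO_INSTRUCTION = auto()
    HLT = auto()


@dataclass
class VmExit:
    reason: ExitReason
    detail: dict = field(default_factory=dict)


def handle_exit(exit_info, log):
    """Handle one VM exit. Return True if the guest should be re-entered."""
    if exit_info.reason is ExitReason.CR_ACCESS:
        detail = exit_info.detail
        log.append(f"emulate write of {detail['source']} to CR{detail['cr']}")
        return True
    if exit_info.reason is ExitReason.IO_INSTRUCTION:
        log.append(f"emulate I/O on port {exit_info.detail['port']:#x}")
        return True
    log.append("guest halted")
    return False


def run(exits):
    """Each loop iteration stands for: VM entry, guest runs, VM exit."""
    log = []
    for exit_info in exits:
        if not handle_exit(exit_info, log):
            break
    return log


trace = run([
    VmExit(ExitReason.CR_ACCESS, {"cr": 0, "source": "eax"}),
    VmExit(ExitReason.IO_INSTRUCTION, {"port": 0x3C0}),
    VmExit(ExitReason.HLT),
    VmExit(ExitReason.IO_INSTRUCTION, {"port": 0x60}),
])
```

Because the exit record already names the register and operands involved, the handler does not need to disassemble guest code to decide what to emulate.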

VT-x inherently avoids several of the problems which software virtualization faces. The guest has its own completely separate address space not shared with the hypervisor, which eliminates potential clashes. Additionally, guest OS kernel code runs at privilege ring 0 in VMX non-root mode, obviating the problems caused by running ring 0 code at less privileged levels. For example, the SYSENTER instruction can transition to ring 0 without causing problems. Naturally, even at ring 0 in VMX non-root mode, any I/O access by guest code still causes a VM exit, allowing for device emulation.

The biggest difference between VT-x and AMD-V is that AMD-V provides a more complete virtualization environment. VT-x requires the VMX non-root code to run with paging enabled, which precludes hardware virtualization of real-mode code and non-paged protected-mode software. This typically only includes firmware and OS loaders, but nevertheless complicates VT-x hypervisor implementation. AMD-V does not have this restriction.

Of course, hardware virtualization is not perfect. Compared to software virtualization, the overhead of VM exits is relatively high. This causes problems for devices whose emulation requires a high number of traps. One example is the VGA device in 16-color modes, where not only every I/O port access but also every access to the framebuffer memory must be trapped.

10.7. Nested Paging and VPIDs

In addition to normal hardware virtualization, your processor may also support the following additional sophisticated techniques:

  • Nested paging implements some memory management in hardware, which can greatly accelerate hardware virtualization since these tasks no longer need to be performed by the virtualization software.

    With nested paging, the hardware provides another level of indirection when translating linear to physical addresses. Page tables function as before, but linear addresses are now translated to "guest physical" addresses first and not physical addresses directly. A new set of paging registers now exists under the traditional paging mechanism and translates from guest physical addresses to host physical addresses, which are used to access memory.

    Nested paging eliminates the overhead caused by VM exits and page table accesses. In essence, with nested page tables the guest can handle paging without intervention from the hypervisor. Nested paging thus significantly improves virtualization performance.

    On AMD processors, nested paging has been available starting with the Barcelona (K10) architecture. AMD now calls it rapid virtualization indexing (RVI). Intel added support for nested paging, which it calls extended page tables (EPT), with its Core i7 (Nehalem) processors.

    If nested paging is enabled, the Oracle VM VirtualBox hypervisor can also use large pages to reduce TLB usage and overhead. This can yield a performance improvement of up to 5%. To enable this feature for a VM, use the --largepages option of the VBoxManage modifyvm command. See Section 8.8, “VBoxManage modifyvm”.

    If you have an Intel CPU with EPT, please consult Section 13.4.1, “CVE-2018-3646” for security concerns regarding EPT.

  • On Intel CPUs, a hardware feature called Virtual Processor Identifiers (VPIDs) can greatly accelerate context switching by reducing the need for expensive flushing of the processor's Translation Lookaside Buffers (TLBs).

    To enable these features for a VM, use the --vtxvpid and --largepages options of the VBoxManage modifyvm command. See Section 8.8, “VBoxManage modifyvm”.
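
The two-stage address translation described for nested paging above can be sketched in a few lines of Python. This is a deliberately simplified model: each page table is a single-level dictionary of 4 KiB pages rather than a multi-level hardware structure, and the names lookup, translate, guest_page_table, and nested_page_table are assumptions made for illustration.

```python
PAGE_SHIFT = 12                  # 4 KiB pages
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def lookup(table, address):
    """Translate one address through a single-level table that maps
    page numbers to page numbers, preserving the page offset."""
    page = address >> PAGE_SHIFT
    offset = address & PAGE_MASK
    if page not in table:
        raise LookupError(f"page fault at {address:#x}")
    return (table[page] << PAGE_SHIFT) | offset


def translate(guest_page_table, nested_page_table, linear_address):
    """Stage 1: guest linear -> guest physical (guest-controlled table).
    Stage 2: guest physical -> host physical (hypervisor-controlled table)."""
    guest_physical = lookup(guest_page_table, linear_address)
    return lookup(nested_page_table, guest_physical)


guest_page_table = {0x10: 0x2}    # maintained by the guest OS
nested_page_table = {0x2: 0x7F}   # maintained by the hypervisor

host_physical = translate(guest_page_table, nested_page_table, 0x10ABC)
```

In this example, the linear address 0x10ABC first maps to the guest physical address 0x2ABC and then to the host physical address 0x7FABC. The guest can change its own table freely without involving the hypervisor, because the second stage is applied independently by the hardware.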