VirtualBox

Ticket #16643 (closed defect: fixed)

Opened 5 months ago

Last modified 4 weeks ago

rdtsc is not reset on CPU reset => Fixed in SVN

Reported by: axeld
Owned by:
Priority: major
Component: other
Version: VirtualBox 5.1.18
Keywords: cpu rdtsc reset
Cc:
Guest type: other
Host type: Windows

Description

According to the official Intel documentation, this counter should be reset when the CPU is reset (chapter 17.15 in Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 from September 2016):

"The time-stamp counter (as implemented in the P6 family, Pentium, Pentium M, Pentium 4, Intel Xeon, Intel Core Solo and Intel Core Duo processors and later processors) is a 64-bit counter that is set to 0 following a RESET of the processor."

You can easily reproduce this using Haiku, which uses rdtsc to compute the uptime of the system.

Attachments

VBox.log.2 (130.5 KB) - added by jeffcourteau 7 weeks ago.
Log file of a VM not booting
VBox.log.1 (115.0 KB) - added by jeffcourteau 7 weeks ago.
VBox.log.3 (157.5 KB) - added by jeffcourteau 7 weeks ago.
SRVWEB02.VBox.log.2 (144.3 KB) - added by jeffcourteau 7 weeks ago.

Change History

comment:1 Changed 4 months ago by klaus

It's clear that the TSC is set to 0 following a CPU reset - but what situations trigger a CPU reset? Does a warm/cold reboot by jumping to the BIOS reset the CPU (I wouldn't expect it to)? Which way(s) is Haiku using to reboot the system?

comment:2 Changed 4 months ago by axeld

Haiku tries to reboot via ACPI first, and if that fails, uses the keyboard controller to reset the machine. If that fails, too, it overwrites the local descriptor table with null:  http://code.metager.de/source/xref/haiku/src/system/kernel/arch/x86/arch_cpu.cpp#1212

I'm not sure which method is actually used here, I'd guess it's ACPI. In any case, the machine reboots just fine otherwise :-)

comment:3 Changed 4 months ago by michaln

Yes, the TSC is set to zero on CPU reset, but an OS does not take control at CPU reset. The system could be sitting at some boot menu for an hour or a day or a month. In addition, the firmware / boot loader is free to set the TSC to any specific or random value before the OS boots.

The upshot is that an OS cannot extrapolate anything from the value the TSC has when the OS boots. An OS can only compare the current TSC value with the value the TSC had when the OS first read it.
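A minimal sketch of that delta-based approach, in Python for brevity. The `UptimeClock` class, the `read_tsc` callable and the fixed `tsc_hz` frequency are hypothetical names used for illustration, not Haiku or VirtualBox code; a real kernel would read the counter with the rdtsc instruction and would have to calibrate the frequency.

```python
class UptimeClock:
    """Uptime derived from TSC deltas rather than from the raw TSC value."""

    def __init__(self, read_tsc, tsc_hz):
        self._read_tsc = read_tsc  # hypothetical callable returning the raw 64-bit counter
        self._tsc_hz = tsc_hz      # assumed constant TSC frequency in Hz
        self._base = read_tsc()    # value the TSC had when the OS first read it

    def uptime_seconds(self):
        # Mask to 64 bits so the subtraction stays correct if the counter wraps.
        delta = (self._read_tsc() - self._base) & 0xFFFFFFFFFFFFFFFF
        return delta / self._tsc_hz


# Simulated counter: the firmware left the TSC at an arbitrary nonzero value.
ticks = [5_000_000_000]
clock = UptimeClock(lambda: ticks[0], tsc_hz=1_000_000_000)

ticks[0] += 3_000_000_000  # three seconds' worth of ticks at 1 GHz
print(clock.uptime_seconds())  # 3.0
```

With this scheme the starting value of the counter drops out of the result, so it does not matter whether a reset cleared the TSC or the firmware changed it.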

comment:4 Changed 4 months ago by axeld

One could argue that the time spent in the boot loader counts toward the system uptime as well, but this ticket is not about whether Haiku could compute the system uptime differently (you're free to report a bug at the project's bug tracker, though :-)).

It just reports a bug in VirtualBox, and Haiku merely offers a way to reproduce it conveniently.

comment:5 Changed 4 months ago by michaln

If you could find some official document stating that the firmware is not allowed to manipulate the TSC and that every reboot must result in a hard reset of the CPU, that would help. Otherwise it just looks like Haiku making invalid assumptions.

comment:6 Changed 4 months ago by axeld

I'm afraid there is no such document. However, there is the Intel specification that VirtualBox clearly violates, no matter what kind of assumptions Haiku makes. Again, it's just a convenient test case.

If you want to ignore this, fine, I just reported it to improve the software. But please don't come up with excuses why VirtualBox behaves within the spec here. It does not.

comment:7 Changed 7 weeks ago by jeffcourteau

I may have found a bug caused by this. On multi-CPU VBox VMs that have been running for quite some time (sometimes I do not need to reboot for a month or two) and whose guest OS is Windows (2012R2 in my case, but I think this applies to any Windows version), Windows will not load past the splash screen (spinning dots on W2K12R2). I have to issue a power off / power on to solve the problem.

I have found a VMware bug ( https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2092807&src=vmw_so_vex_mbrad_895) that has been corrected in VMware. It says that the TSC (Time Stamp Counter) is incremented as CPU cycles go. But cycles are not always equally distributed for all CPUs, and when you ask Windows to reboot, it does a "soft reset", not a complete CPU reset, which does not clear the TSC (at least in VMware). When TSC is not nearly equal on all CPUs, Windows refuses to boot (though I do not see how Windows should be concerned by this at boot time...)

Maybe a patch should be issued in VBox to reset the TSC, or at least set it equal on all CPUs, when an ACPI soft reset is asked by the OS.

comment:8 Changed 7 weeks ago by frank

jeffcourteau, did you really experience such a problem with VirtualBox? If so, can you provide a VBox.log file of such a VM session?

Changed 7 weeks ago by jeffcourteau

Log file of a VM not booting

Changed 7 weeks ago by jeffcourteau

comment:9 Changed 7 weeks ago by jeffcourteau

VBox.log.1 is the log file from when I rebooted the VM and it hung at the Windows splash screen. Even a reset would not do the trick; I had to power off / power on.

VBox.log.2 is the log file after I powered off / powered on.

comment:10 Changed 7 weeks ago by frank

Thanks for the log files, but you probably mixed something up. VBox.log.2 shows RESETTING after 12:58, and the following events hint that the guest was not stuck: the guest driver was loaded and there are several screen resize events up to 1600x900. So it is unlikely that this guest hung at the splash screen. VBox.log.1 has a long uptime of 327:42h but does not show a single RESETTING line.

Did you attach the wrong files?

Changed 7 weeks ago by jeffcourteau

comment:11 Changed 7 weeks ago by jeffcourteau

Here is the oldest log file, VBox.log.3. I know I encountered the situation in this one. You can see the first reset I issued at the 1279:29:59.186625 timestamp, after 24 minutes stuck on the splash screen...

Changed 7 weeks ago by jeffcourteau

comment:12 Changed 7 weeks ago by michaln

Yes, that log is much more interesting. The VM reset happened after about 53.3 days of uptime. The calculations in the VMware KB article indicate that on a 4 GHz system the Windows bug will be triggered after about 52 days (slower TSC means longer uptime before the bug triggers).

There is good evidence that you really did hit this Windows bug. In the VBox.log.3 file, you can see at the end that /TM/TSC/offCPU1 is wildly different from /TM/TSC/offCPU0. That's because Windows changed the TSC on CPU0 but left it alone on CPU1. In VBox.log.1 the two are identical after almost two weeks of uptime.

Your summary of the problem in comment 7 is not at all what VMware describes. The problem isn't that the TSC is out of sync, the problem is that Windows makes the TSC out of sync if the TSC value at boot time is relatively high (greater than 0x40000000000000).
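The arithmetic behind the "about 52 days" figure can be checked with a short sketch. It assumes the TSC starts at 0 and counts at a constant rate; the function name and the sample frequencies are illustrative choices, and only the threshold value comes from this ticket.

```python
TSC_THRESHOLD = 0x40000000000000  # value quoted in this ticket; equals 2**54
SECONDS_PER_DAY = 86_400


def days_until_threshold(tsc_hz, start_tsc=0):
    """Days of uptime before a TSC counting at tsc_hz reaches the threshold."""
    remaining = max(TSC_THRESHOLD - start_tsc, 0)
    return remaining / tsc_hz / SECONDS_PER_DAY


# Slower TSC frequencies take longer to reach the threshold.
for ghz in (2.0, 3.0, 4.0):
    print(f"{ghz:.1f} GHz: {days_until_threshold(ghz * 1e9):.1f} days")
```

At 4 GHz this comes out at roughly 52 days, in line with the figure from the KB article, and halving the frequency doubles the time.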

comment:13 Changed 7 weeks ago by jeffcourteau

OK so basically it would be a Windows bug when it runs inside a VM, but not on bare metal? If the bug is not there on bare metal, isn't it a VBox / VMware bug, that should mimic bare metal behavior?

comment:14 follow-up: ↓ 16 Changed 7 weeks ago by frank

We will probably change VBox to behave like most bare metal (i.e. reset the TSC on reset).

But so far there is no evidence for any VBox bug -- only for a Windows bug! The different behavior is not necessarily evidence of a VBox bug. If bare metal resets the TSC just out of paranoia (without any clearly documented need), then users of Windows are just lucky that the Windows bug does not hit them.

comment:15 Changed 7 weeks ago by michaln

It's not virtual vs. physical system, it's more a question of platform/firmware behavior.

I have to assume that it's trivial (if tedious) to reproduce the same behavior on a physical system. Power on the system, let it sit at some boot prompt for 2-4 months, then continue booting. That should let the TSC advance enough that Windows will get confused.

comment:16 in reply to: ↑ 14 Changed 7 weeks ago by axeld

Replying to frank:

But so far there is no evidence for any VBox bug -- only for a Windows bug!

How do you interpret the Intel CPU spec then? Do you generally not care about it that much, i.e. is it not a reliable source of information?

comment:17 Changed 7 weeks ago by michaln

Does the Intel SDM say somewhere that firmware must perform a hard reset of the CPU on system reboot? If so, I'd like to know where exactly.

And no, in general the SDM is not an entirely reliable source of information. There are quite a few errors and omissions. Is it a good source of information? Absolutely. Is it 100% reliable? Absolutely not.

comment:18 Changed 4 weeks ago by michaln

  • Status changed from new to closed
  • Resolution set to fixed
  • Summary changed from rdtsc is not reset on CPU reset to rdtsc is not reset on CPU reset => Fixed in SVN

We confirmed that Windows 10 is buggy (as VMware found) and will therefore reset the TSC on VM reset in the next VirtualBox maintenance update.
