VirtualBox

Ticket #6250 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

Virtual clock synchronization giving up on new kernels

Reported by: zerem Owned by:
Priority: critical Component: VMM
Version: VirtualBox 3.1.4 Keywords: TM time sync lag
Cc: Guest type: Windows
Host type: Linux

Description (last modified by frank) (diff)

Hello, last references for lines like following i found for VirtualBox version 2.x.x, so i am reporting as new defect (unsuccesfully have been fighting with it for few weeks).

199591902:03:47.066 TM: u64DeltaPrev=-1326594272658
    u64PrevNanoTS=0x0000016f3d7e0d13 u64NanoTS=0x0000003a5e4fdf81
199591902:03:47.066 TM: Not bothering to attempt catching up a
    18 446 742 747 087 831 003 ns lag; new total: 18 446 744 073 654 626 217

These lines filling Virtual Machine (Windows 7, Ubuntu) log file after powering, machine is so slow it won't boot.

Source of this error is in  http://www.virtualbox.org/svn/vbox/trunk/src/VBox/VMM/TM.cpp (tmR3TimerQueueRunVirtualSync).

Enabling/disabling in VM VT-x has no effect, (VT-x is enabled in bios, cpu is cappable).

I tried several kernels (ubuntu karmic) so result is following

2.6.30-5 - works excellent
2.6.31-17 - failing
2.6.32-13 - failing
2.6.32-02063208 - failing

I also used few versions of VirtualBox, all failing with new kernels Official 3.1.4, 3.0.12, Opensouce 3.1.2

Attachments

Windows 7 A-2010-02-20-10-00-39 - Failing.log Download (141.6 KB) - added by zerem 4 years ago.
zerem-lshw Download (25.0 KB) - added by zerem 4 years ago.
Used HW
zerem-kernel Download (2.6 KB) - added by zerem 4 years ago.
Kernel info
zerem-cpuinfo Download (3.0 KB) - added by zerem 4 years ago.
Cpuinfo
dmesg Download (51.4 KB) - added by zerem 4 years ago.
fresh start + run vm
ose-dmesg Download (58.1 KB) - added by zerem 4 years ago.
ose test dmesg
ose-vbox-working.log Download (59.7 KB) - added by zerem 4 years ago.
ose test - 1st run
ose-vbox-failing.log Download (89.9 KB) - added by zerem 4 years ago.
ose test - 2nd run
testcase-vbox.log Download (299.4 KB) - added by zerem 4 years ago.
testcase log (file trimmed to 6k lines)
c-state-disable.jpg Download (99.5 KB) - added by zerem 4 years ago.
i had to disable this option
second-dmesg Download (52.1 KB) - added by zerem 4 years ago.
second-timer_list Download (10.9 KB) - added by zerem 4 years ago.
working-dmesg Download (53.2 KB) - added by zerem 4 years ago.
working-timer_list Download (12.5 KB) - added by zerem 4 years ago.
working-vbox.log Download (63.4 KB) - added by zerem 4 years ago.

Change History

Changed 4 years ago by zerem

Changed 4 years ago by zerem

Used HW

Changed 4 years ago by zerem

Kernel info

Changed 4 years ago by zerem

Cpuinfo

comment:1 Changed 4 years ago by frank

Please attach the complete output of dmesg of the host after you run such a VM.

Changed 4 years ago by zerem

fresh start + run vm

comment:2 Changed 4 years ago by zerem

before that, i did fresh download vbox 3.1.4 and new kernel module. same results

comment:3 Changed 4 years ago by zerem

also found out that vbox is running fine on kernel 6.31-9-rt (ubuntu realtime kernel)

comment:4 Changed 4 years ago by frank

Hmm, is that dmesg output really from your VBox host? I don't see any output of the VBox kernel module within this log ...

comment:5 Changed 4 years ago by zerem

Uploaded the dmesg right after powering off vbox.

ondrej@hnedojed:~$ grep vbox /proc/modules vboxnetadp 6528 0 - Live 0xffffffffa0063000 vboxnetflt 14288 0 - Live 0xffffffffa0595000 vboxdrv 1777740 2 vboxnetadp,vboxnetflt, Live 0xffffffffa03e0000 ondrej@hnedojed:~$ grep vbox /var/log/dmesg [ 33.424971] vboxdrv: Trying to deactivate the NMI watchdog permanently... [ 33.424974] vboxdrv: Warning: 2.6.31+ kernel detected. Most likely the hardware performance [ 33.424975] vboxdrv: counter framework which can generate NMIs is active. You have to prevent [ 33.424976] vboxdrv: the usage of hardware performance counters by [ 33.424977] vboxdrv: echo 2 > /proc/sys/kernel/perf_counter_paranoid [ 33.424980] vboxdrv: Found 4 processor cores. [ 33.425059] vboxdrv: fAsync=1 offMin=0x5f824 offMax=0x5f824 [ 33.425098] vboxdrv: TSC mode is 'asynchronous', kernel timer mode is 'normal'. [ 33.425100] vboxdrv: Successfully loaded version 3.1.4 (interface 0x00100001).

comment:6 Changed 4 years ago by zerem

Sorry, I accidentally hit submit instead of preview

ondrej@hnedojed:~$ grep vbox /proc/modules 
vboxnetadp 6528 0 - Live 0xffffffffa0063000
vboxnetflt 14288 0 - Live 0xffffffffa0595000
vboxdrv 1777740 2 vboxnetadp,vboxnetflt, Live 0xffffffffa03e0000
ondrej@hnedojed:~$ grep vbox /var/log/dmesg
[   33.424971] vboxdrv: Trying to deactivate the NMI watchdog permanently...
[   33.424974] vboxdrv: Warning: 2.6.31+ kernel detected. Most likely the hardware performance
[   33.424975] vboxdrv: counter framework which can generate NMIs is active. You have to prevent
[   33.424976] vboxdrv: the usage of hardware performance counters by
[   33.424977] vboxdrv:   echo 2 > /proc/sys/kernel/perf_counter_paranoid
[   33.424980] vboxdrv: Found 4 processor cores.
[   33.425059] vboxdrv: fAsync=1 offMin=0x5f824 offMax=0x5f824
[   33.425098] vboxdrv: TSC mode is 'asynchronous', kernel timer mode is 'normal'.
[   33.425100] vboxdrv: Successfully loaded version 3.1.4 (interface 0x00100001).

Changed 4 years ago by zerem

ose test dmesg

Changed 4 years ago by zerem

ose test - 1st run

Changed 4 years ago by zerem

ose test - 2nd run

comment:7 Changed 4 years ago by zerem

I did another test. Cleaned virtualbox-3.1, restart, installed virtualbox-ose. First run was ok, speed and response of guest was excellent. Windows downloaded and installed some updates, then i did clean shutdown. Second run was failing from begining (filling log with u64DeltaPrev, vbox bios hasn't even shown). then i had to do "pkill VirtualBox".

Thanks for cooperation :)

comment:8 Changed 4 years ago by frank

We still cannot reproduce this issue. So we will probably send you some test build which could help debugging this problem.

comment:9 Changed 4 years ago by zerem

ok, i'll wait & try my best finding this issue.

Changed 4 years ago by zerem

testcase log (file trimmed to 6k lines)

comment:10 Changed 4 years ago by zerem

Issue was fixed by disabling Intel C-STATE tech in bios.

Changed 4 years ago by zerem

i had to disable this option

comment:11 Changed 4 years ago by frank

Thank you for this valuable information! We will try to find a workaround which doesn't require this change.

comment:12 Changed 4 years ago by harmscon

Thanks from me too. We have a similar issue in that the time in the WinXP guest is runs slowly on Linux host, with the same "TM: Not bothering" message. Every now and then it gets synchronised.. not sure if that is from Guest additions or windows time service - whatever.. it shouldn't run slowly in the first place. This is a good lead for me to follow. I looked up C-state and got this nice summary:  http://www.tomshardware.com/forum/217331-28-state

We have an AMD system with Cool and Quiet enabled so when I get a quiet moment I'll reboot the system, disable that and report back here. Thanks to all.

comment:13 Changed 4 years ago by frank

Btw, another workaround should be to boot the host Linux with processor.max_cstate=1.

comment:14 Changed 4 years ago by frank

Everyone who is experiencing such a problem: Could you give me the output of

dmesg | grep vbox

and

cat /proc/timer_list > ~/timer_list.log

? Please attach the file timer_list.log to this ticket. Make sure that you don't use any of the workarounds as suggested above.

comment:15 follow-up: ↓ 20 Changed 4 years ago by frank

I think I found the correct fix which does not depend on a workaround. To fix the problem you have to edit /usr/src/vboxdrv/r0drv/linux/timer-r0drv-linux.c and replace the two mod_timer() calls by mod_timer_pinned(). After that, recompile the host kernel module (/etc/init.d/vboxdrv setup).

comment:16 Changed 4 years ago by joy

Frank has sent me here from #5620. So here's my report on Ubuntu 9.10 AMD 64bit with CPU manually set to Performance mode (i.e. 100% speed all the time):

1) VirtualBox 3.0.12 (that I've been using for months and was happy with it) is _slow_ on stock 2.6.31 Ubuntu kernel - about three times slower than on 2.6.32-02063204 kernel.

2) VirtualBox 3.0.12 with the above mentioned fix (mod_timer() -> mod_timer_pinned()) on stock kernel 2.6.31 - VEERY SLOW (about 10 times slower than without the _pinned fix)

3) fresh install of VirtualBox 3.1.4 - does NOT even boot up the WindowsXP on dual CPU. Black screen. Stuck.

4) VirtualBox 3.1.4 with the above mentioned fix (mod_timer() -> mod_timer_pinned()) on stock kernel 2.6.31 - boots up the WindowsXP guest but is as slow as the good old 3.0.12 without the pinned() fix.

My summary is that 3.1.x doesn't work until you manually modify the timer-r0drv-linux.c and even after that it's not faster than 3.0.12, i.e. it's still very slow.

If the problem is indeed in the Ubuntu stock 2.6.31 kernel (as upgrading the kernel to 2.6.32 fixes the 3.0.12 slowness) and there is really no application workaround then I would suggest to name that problem clearly and offer Ubuntu users a good kernel to upgrade to (or better yet let Ubuntu developers know there's a problem with their kernels).

comment:17 Changed 4 years ago by frank

First, we still don't know where the problem with the standard Karmic kernel is. Some users report slowness, for other users (including myself) the standard 2.6.31 Karmic kernel works well. Second, your report that with the pinned fix you were not able to boot on a stock kernel is probably either a strange coincidence or perhaps you did something wrong. The pinned fix is valid for Linux kernels >= 2.6.31 and it is required for CPUs which are handled in asynchronous timer mode (see the VBox.log file). I had a look at the VBox.log files in various tickets and redirected users to this ticket where these two requirements are fulfilled. The reason for the pinned fix is a change in the semantics of a Linux kernel function and the fix is most likely correct.

Again, this fix is probably not related to the usual Karmic slowness problem users reported in #5501.

comment:18 follow-up: ↓ 19 Changed 4 years ago by frank

  • Description modified (diff)

Changed 4 years ago by zerem

Changed 4 years ago by zerem

comment:19 in reply to: ↑ 18 Changed 4 years ago by msteffens@…

Replying to frank: Frank also redirected me here from #4392. I tried changing the lines of source you mentioned and recompiling. My SMP guest still uses > 50% CPU when idle. Not sure if relevant but I also checked the BIOS on my machine and there aren't any C-STATE settings there.

Changed 4 years ago by zerem

Changed 4 years ago by zerem

Changed 4 years ago by zerem

comment:20 in reply to: ↑ 15 Changed 4 years ago by zerem

Replying to frank: With c-state tech enabled I applied your suggested patch, recompiled kernel; working perfect. Subjectively with same speed, when guest is idle vbox process on host is eating 0-1% of CPU.

comment:21 Changed 4 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Fixed in 3.1.6.

Note: See TracTickets for help on using tickets.

www.oracle.com
ContactPrivacy policyTerms of Use