VirtualBox

Ticket #6842 (reopened defect)

Opened 4 years ago

Last modified 2 months ago

Guests clocks not kept synchronized after upgrade to VirtualBox 3.2.0, Ubuntu Server 10.04

Reported by: BACONputing
Owned by:
Priority: major
Component: other
Version: VirtualBox 3.2.0
Keywords: clock time sync synchronization ubuntu x64 AMD64
Cc:
Guest type: Windows
Host type: Linux

Description (last modified by frank)

I have a computer with an AMD Athlon64 X2 3600+, 6 GB RAM, and Cool'n'Quiet currently disabled in an attempt to solve this problem. It runs 64-bit Ubuntu Server and hosts two 64-bit Ubuntu Server guests and one 64-bit Windows Server 2003 guest. Back when I was using VirtualBox 3.1.x and the host and two Linux guests were running Ubuntu Server 9.10, I had a problem where the clocks on the two Ubuntu guests were kept in sync, but the entire Windows guest (including its clock) would run slow after any significant load was placed on that system. (More information can be found in this forum post. I will not detail that problem in this bug unless asked to do so.)

I recently upgraded the host and two Linux guests to Ubuntu Server 10.04 and also upgraded to VirtualBox 3.2.0 using Aptitude. Since then, none of the guests' clocks are kept in sync. Sometimes the Ubuntu guests' clocks are closer to the correct time than the Windows guest's, sometimes it's the other way around, and sometimes they're off by different amounts. Currently, the Windows guest is 607 seconds behind, one Ubuntu guest is 1,127 seconds behind, and the other is 183 seconds behind. They seem to speed up and slow down with no rhyme or reason. All guests have the latest version of VirtualBox Guest Additions installed. The two Ubuntu guests are using the linux-virtual kernel package from Aptitude, and the host uses the linux-server kernel; all are currently version 2.6.32-22-server. None of the guests see very heavy usage; they're idle most of the time and are mainly file servers that provide a few other services.

With VirtualBox 3.1.x/Ubuntu Server 9.10, I noticed long runs of "TM: Giving up catch-up attempt at a..." and "TM: u64DeltaPrev=... u64PrevNanoTS=... u64NanoTS=..." entries in the Windows guest's log; now those entries appear in the logs of all three guests.

Attachments

dmesg-2010-05-30 (54.2 KB) - added by BACONputing 4 years ago.
dmesg [2010-05-30]
Subvuntu-VBox-2010-05-30.log (175.4 KB) - added by BACONputing 4 years ago.
Subvuntu guest VBox.log [2010-05-30]
Win2003Std64-VBox-2010-05-30.log (86.9 KB) - added by BACONputing 4 years ago.
Win2003Std64 guest VBox.log [2010-05-30]
Loop.bat (33 bytes) - added by BACONputing 4 years ago.
Script for simulating load on Windows guest
Win2003Std64-VBox-2010-06-13.log (390.2 KB) - added by BACONputing 4 years ago.
Win2003Std64 guest VBox.log [2010-06-13]; two guests powered on; VBoxService.exe -vvv

Change History

Changed 4 years ago by BACONputing

dmesg [2010-05-30]

Changed 4 years ago by BACONputing

Subvuntu guest VBox.log [2010-05-30]

Changed 4 years ago by BACONputing

Win2003Std64 guest VBox.log [2010-05-30]

comment:1 Changed 4 years ago by BACONputing

With the Windows guest's clock over 1,200 seconds behind, I shut down both Ubuntu guests, at which point the Windows guest's clock instantly jumped to only 16 seconds behind and began catching up. Within a minute of wall time, that guest's clock was in sync, and the "Giving up catch-up attempt at a..." entries stopped appearing in the log; the last two lines are:

281:51:32.360 Guest Log: VBOXNP: DLL loaded.
281:51:32.376 Guest Log: VBOXNP: DLL unloaded.

The problem now is that running top on the host reports that guest's VBoxHeadless process as causing ~85% CPU usage even though the guest is idle. As soon as I power another guest back on, the Windows guest's VBoxHeadless process goes back to negligible CPU usage but the guest is rapidly losing time again and the "TM: u64DeltaPrev=..." and "TM: Giving up catch-up attempt at a..." entries start re-appearing in the log. I am using the w32tm utility to monitor the guest's clock from a Windows 7 workstation.

comment:2 Changed 4 years ago by BACONputing

After I powered the two Ubuntu guests back on as described in my previous comment, with all three guests running, the Windows guest was still rapidly losing time while both Ubuntu guests' clocks were in sync. Within 10 or 20 minutes of powering on the Ubuntu guests, the Windows guest was already 900 seconds behind, although when I checked it five or ten minutes later it was somehow back to 26 seconds behind, though still rapidly losing time. The three VBoxHeadless processes on the host were not showing any appreciable processor usage.

I then shut down the Windows guest. After about 20 minutes, the two Ubuntu guests' clocks are still in sync and processor usage on the host is minimal. Now that the Windows guest has been powered back on and has fully booted, its clock is slowly fluctuating between several seconds fast and several seconds slow. 30 minutes later, all three guests are perfectly in sync. If I log in to the Windows guest and run Microsoft Update or perform some other light tasks, all three guests stay in sync. If I do something that causes a lot of load (such as running the attached batch file), however, the entire guest becomes extremely sluggish (the Start Menu may take several seconds to open, and the clock seems to tick about once every three seconds of wall time). The two Ubuntu guests stay in sync, but the Windows guest starts losing time again. Even after the load on the guest stops, the Windows guest stays stuck in this sluggish state until that virtual machine is powered off and back on again (rebooting the guest doesn't seem to help), and sometimes even that doesn't fix it; in this case it didn't.

This is very similar to the problem I saw with VirtualBox 3.1.x/Ubuntu Server 9.10, except I've never seen the Ubuntu guests get out of sync like this.

Changed 4 years ago by BACONputing

Script for simulating load on Windows guest

comment:3 in reply to: ↑ description Changed 4 years ago by sergebass

I also have the same problem with guest clock going out of sync: host=Ubuntu 10.04; guest=WinXPSP3

comment:4 Changed 4 years ago by frank

First of all, the time synchronization of a guest is done by the VBoxService process. This process requests the current host time and tries to slowly adapt the guest time to the host time. If the guest time drifts too fast, the adaptation will not succeed. If the drift is bigger than 20 minutes, the VBoxService process in the guest will hard-set the time.
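The policy described above (adapt gradually, hard-set once the drift exceeds 20 minutes) can be illustrated with a small simulation. This is a hypothetical sketch, not the VBoxService implementation: the function names, the 60-second sync interval and the `max_slew_s` cap on each gradual correction are assumptions made purely for illustration.

```python
# Hypothetical model of the sync policy described above; not VBoxService code.
HARD_SET_THRESHOLD_S = 20 * 60  # drift beyond 20 minutes triggers a hard set
SYNC_INTERVAL_S = 60.0          # assumed wall time between sync passes


def next_adjustment(host_s: float, guest_s: float, max_slew_s: float = 0.5):
    """Return ("set", new_guest_time) or ("slew", correction_in_seconds)."""
    offset = host_s - guest_s  # positive: guest clock is behind the host
    if abs(offset) > HARD_SET_THRESHOLD_S:
        # Too far off to adapt gradually: jump straight to the host time.
        return ("set", host_s)
    # Otherwise correct at most max_slew_s of the offset in this pass.
    return ("slew", max(-max_slew_s, min(max_slew_s, offset)))


def simulate(lost_per_pass_s: float, passes: int, max_slew_s: float = 0.5):
    """Apply the policy to a guest that loses lost_per_pass_s every pass."""
    host = guest = 0.0
    actions = []
    for _ in range(passes):
        host += SYNC_INTERVAL_S
        guest += SYNC_INTERVAL_S - lost_per_pass_s  # slow-running guest clock
        action, value = next_adjustment(host, guest, max_slew_s)
        guest = value if action == "set" else guest + value
        actions.append(action)
    return actions
```

A guest that loses only 0.1 s per pass is fully corrected every time, so `simulate(0.1, 1000)` never hard-sets. A guest running at one-third speed loses 40 s per 60 s pass; a 0.5 s correction cannot keep up, so `simulate(40.0, 50)` slews for 30 passes and hard-sets on the 31st.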

You can make the synchronization visible with a Windows guest if you start regedit and search for VBoxService.exe. Then change the entry to VBoxService.exe -vvv and reboot the guest. Now have a look at the VBox.log file of that guest; it should contain detailed information from the VBoxService process, including information about the time synchronization. The same can be done for a Linux guest by stopping the VBoxService daemon (sudo /etc/init.d/vboxadd-service stop) and starting it manually (sudo /usr/sbin/VBoxService -fvvv).

Furthermore, as your host has only two cores, it does not make much sense to start VMs that together would need three cores. This will work, but as soon as the load in one VM or on the host increases, you have to expect problems.

comment:5 Changed 4 years ago by BACONputing

One of the Ubuntu guests is a Samba file server that doesn't see much load, and the other hosts Subversion repositories via Apache and, frankly, just sits around doing nothing all day. So, I don't think it's an issue of the host being overloaded. The Windows guest is a domain controller and file server, so it does see load from that, but nothing excessive. I do think it's not so much an issue of its clock running at a different speed, but rather that, as I described in previous comments, something happens on that guest where it gets permanently "stuck" running at one-third speed, and the slow clock is just a symptom of that. When it gets "stuck" like that it can take several seconds just to open the Start Menu, and if I try to restart Windows, even with no applications running, it can take up to 30 seconds before the logout/shutdown process even begins. Again, there is no other load on the host when it's running slow like this, it's just that at some point in the past there was load on that Windows guest that caused it to enter this slow state.

Just to eliminate it as a possibility, though, I have left my Subversion guest powered off, so only the Windows and Samba guests are currently powered on. Last night, even with only those two guests running, the Windows guest still managed to get wildly out of sync. I have enabled -vvv logging for the VirtualBox service on the Windows guest and powered it off and back on again, so its clock is back in sync and staying in sync. I will take a look at the log when it gets out of sync, which I'm sure it eventually will.

Changed 4 years ago by BACONputing

Win2003Std64 guest VBox.log [2010-06-13]; two guests powered on; VBoxService.exe -vvv

comment:6 Changed 4 years ago by BACONputing

@sergebass, is your Windows guest migrated from a VMware product, by any chance?

My 64-bit Windows Server 2003 guest that is having slowness and clock synchronization problems was created with, and ran for several years on, VMware Server 1.x. The same is true of the other two 64-bit Ubuntu Server guests. All three guests were migrated to VirtualBox by attaching the same .vmdk files from VMware to their new VirtualBox guests as SCSI drives.

I finally got around to creating another similarly configured guest and installing 64-bit Windows Server 2003 on it, and it is not exhibiting the slowness or the clock falling behind that would happen after the problem system was under load. If I run the Loop.bat test script, there is no deviation of the clock at all. I installed .NET Framework 2.0, 3.0, and 3.5, which is probably a better test of the system under load, because after each version installs, numerous instances of NGEN (mscorvw.exe) run one after the other. During this extended period of 100% processor usage, the guest's clock was slowly losing time (about a tenth of a second for each second of wall time). However, once all of the NGEN instances were finished, the clock would start catching up again until it was roughly in sync (oscillating within +/- 10 seconds of actual time). I also tried installing SQL Server 2008 Express Edition, since the other virtual machine had it as well, and the clock still only got out of sync during prolonged periods of processor usage, then caught up again when the processor was idle. At no point did this test virtual machine get "stuck" in the permanent slow state I'm accustomed to on the other guest.

The only problem I would note is that, after letting this guest run overnight, it sometimes seemed unable to get itself quite in sync. While I watched, it would fall up to 50 seconds behind, then catch up to about 10 seconds behind, then slip to 50 seconds behind again, back and forth, again and again. This was all while that guest was completely idle (nothing was installed on it aside from an instance of SQL Server with no databases), as was the entire host. I also noticed the usual "TM: u64DeltaPrev=..." and "TM: Giving up catch-up attempt at a..." entries in the log, although they weren't as frequent as with the other guest. So this test machine seems to exhibit shades of the problem I'm having on the other guest, though it is still a great improvement. Obviously, I'd like its clock to be almost perfectly in sync, but less than a minute behind isn't bad at all; as long as it's within five minutes of the actual time I'm happy, because beyond that systems start having authentication problems. I have since deleted the test guest and hope to try 64-bit Windows Server 2008 R2 in a new guest within the next few days to see if that works just as well.
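The drift figures reported in this ticket can be turned into a rough "time until authentication trouble" estimate. A minimal sketch, assuming the drift is steady (the reports here suggest it often is not); the function names are illustrative and the five-minute default is simply the window mentioned above.

```python
def drift_rate(lost_s: float, wall_s: float) -> float:
    """Seconds of guest time lost per second of wall time (0.1 = 10% slow)."""
    return lost_s / wall_s


def seconds_until_skew(rate: float, limit_s: float = 5 * 60,
                       start_offset_s: float = 0.0) -> float:
    """Wall-clock seconds until a steady drift pushes the offset past limit_s.

    A rate of zero or less never reaches the limit; an offset already past
    the limit returns 0.
    """
    if rate <= 0:
        return float("inf")
    return max(0.0, limit_s - start_offset_s) / rate
```

At the tenth of a second per second observed while NGEN was running, `seconds_until_skew(0.1)` gives about 3000 s (50 minutes). At the "12 hours in 24" reported in comment:8, `drift_rate(12 * 3600, 24 * 3600)` is 0.5, and the five-minute window is exceeded after 600 s (10 minutes).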

Anyway, it seems the problem might be that my problematic guest came from VMware. The only difference between that guest and the one I was testing with is that the test guest used a single growable VDI disk connected via SCSI, whereas the problem guest has a half-dozen VMDK disks connected via SCSI, some backed by LVM volumes using raw disk access and some being regular disk files. The test guest also defaulted to having Nested Paging and VT-x VPID enabled, even though neither of those should be supported by my processor, although I'd tried toggling both of those settings a while back in an effort to troubleshoot this problem and they didn't seem to make any difference.

comment:7 Changed 4 years ago by sergebass

is your Windows guest migrated from a VMware product, by any chance?

No, not at all. This was a clean installation from scratch under VirtualBox. By the way, I forgot to add that my Ubuntu host system is also 64-bit but the guest XP is 32-bit (in case that may be relevant).

comment:8 Changed 4 years ago by caryb

Can confirm the problem on Ubuntu 10.04 (Lucid) 64-bit with a Windows XP SP3 32-bit guest. I have checked and reinstalled the Guest Additions, but the issue is not resolved. I lose approximately 12 hours in 24.

comment:9 Changed 4 years ago by BACONputing

In the past few weeks I've reinstalled the operating system on the 64-bit Windows Server 2003 guest (upgrading to 64-bit Windows Server 2008 in the process) and one of the 64-bit Ubuntu Server guests. The Windows guest's clock seems to fluctuate between 15-45 seconds behind real time, and the Ubuntu guest's clock is perfectly in-sync. In both cases the old guest and new guest are configured the same, except the old guest had VMDK disks attached via SCSI and the new guest has fixed VDI disks attached via SATA, and also uses virtio network adapters. The clock on the other Ubuntu guest I've yet to reinstall is still all over the place. So, not the best solution because I would hope that migrating from VMware to VirtualBox would not require a complete guest reinstall, but at least it seems to finally fix my problem.

comment:10 Changed 4 years ago by pfaf

I also have similar problems with a Linux host running Debian Lenny v5.04 and two Windows guests. Guest (A) starts to lose time after 12 hours of operation!!! The other guest (B) is almost always in sync with the host. Both are running MS W2K3 Std with all service packs installed. The VBox version is v3.2.4. More can be found at http://forums.virtualbox.org/viewtopic.php?f=7&t=32975, where I have also attached the logs of the two machines and the machine XML files.

What is very interesting is that once guest (A) has started to lose time, if I restart it without powering it off, it immediately starts to deviate from the host clock time.

If I power it off and start it up again, then the time deviation begins after 12 hours.

This leads me to the conclusion that it must be a problem with VirtualBox, not with the guest's OS.

What do you think?

comment:11 Changed 4 years ago by sbrokerag

Problem still present with VirtualBox 3.2.6-63112~Ubuntu~lucid (latest Ubuntu updates installed). Guest is Windows XP SP3. There are many entries like:

02:37:29.485 TM: Giving up catch-up attempt at a 60 001 520 988 ns lag; new total: 1 320 045 030 155 ns
02:39:44.060 TM: Giving up catch-up attempt at a 60 001 425 426 ns lag; new total: 1 380 046 455 581 ns
02:41:30.315 TM: Giving up catch-up attempt at a 60 006 308 593 ns lag; new total: 1 440 052 764 174 ns

Windows XP loses Active Directory access, so the system becomes unusable.
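The TM give-up messages above share a fixed shape, so they can be extracted from a VBox.log for counting or plotting. A minimal sketch, assuming exactly the wording and space-grouped digits shown in this comment (other VirtualBox versions may format the message differently); `parse_give_ups` and `check_running_total` are illustrative names.

```python
import re

# Matches e.g. "02:37:29.485 TM: Giving up catch-up attempt at a
# 60 001 520 988 ns lag; new total: 1 320 045 030 155 ns"
# (numbers use spaces as thousands separators, as in the log above).
GIVE_UP_RE = re.compile(
    r"(?P<ts>\d+:\d{2}:\d{2}\.\d{3}) TM: Giving up catch-up attempt at a "
    r"(?P<lag>[\d ]+?) ns lag; new total: (?P<total>[\d ]+?) ns"
)


def parse_give_ups(text):
    """Yield (timestamp, lag_ns, total_ns) for each give-up message in text."""
    for m in GIVE_UP_RE.finditer(text):
        lag = int(m["lag"].replace(" ", ""))
        total = int(m["total"].replace(" ", ""))
        yield m["ts"], lag, total


def check_running_total(records):
    """True if each new total equals the previous total plus that line's lag."""
    return all(prev[2] + cur[1] == cur[2]
               for prev, cur in zip(records, records[1:]))
```

For the three lines quoted above, each new total is exactly the previous total plus the lag in that line, and the last total, 1 440 052 764 174 ns, is roughly 1440 s, or 24 minutes of accumulated lag.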

comment:12 Changed 4 years ago by togume

This is still an issue in Ubuntu 10.04 with 3.2.8 and Windows 7 guest. I'm getting the same log entries as @sbrokerag above.

Please let me know if there is anything we can provide to help diagnose and fix this problem.

comment:13 Changed 4 years ago by togume

FYI - I tried the solution suggested in ticket 6250 (http://www.virtualbox.org/ticket/6250):

"edit /usr/src/vboxdrv/r0drv/linux/timer-r0drv-linux.c and replace the two mod_timer() calls by mod_timer_pinned()"

Problem persists.
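The edit quoted above is a plain textual substitution in the driver source. A minimal sketch of applying it, assuming the file path quoted in that comment (the directory name may differ between VirtualBox versions). As reported here, it did not fix the problem, and the kernel module has to be rebuilt before any source change takes effect.

```python
import re
from pathlib import Path

# Path as quoted in the comment above; adjust it to your installation.
TIMER_SOURCE = "/usr/src/vboxdrv/r0drv/linux/timer-r0drv-linux.c"

# \b keeps identifiers that merely end in "mod_timer" from matching, and
# existing "mod_timer_pinned(" calls never match because "(" must follow.
CALL_RE = re.compile(r"\bmod_timer\(")


def pin_timers(source: str):
    """Return (patched_source, number_of_calls_replaced)."""
    return CALL_RE.subn("mod_timer_pinned(", source)


def patch_file(path: str = TIMER_SOURCE) -> int:
    """Patch the file in place, keeping a .orig backup; return the count."""
    p = Path(path)
    original = p.read_text()
    patched, count = pin_timers(original)
    if count:
        p.with_name(p.name + ".orig").write_text(original)  # backup first
        p.write_text(patched)
    return count
```

The comment expects exactly two calls to be replaced, so a return value other than 2 from `patch_file()` is a hint that the file differs from the one discussed.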

comment:14 Changed 4 years ago by togume

FYI - Opened a new ticket (http://www.virtualbox.org/ticket/7520) to see if we can get traction on this.

comment:15 Changed 4 years ago by frank

We think we know the reason for these timing problems, but the fix is invasive. Therefore it will probably be included in the next but one maintenance release.

comment:16 Changed 4 years ago by togume

Ok. Thanks for the response. I'm glad this is getting attention. Please let me know if there is anything else I can provide.

comment:17 Changed 4 years ago by frank

Linux users compiling VirtualBox OSE themselves from trunk should try r32798 or later. Code for the other hosts will follow. These fixes should also make 1000 Hz Linux kernels run more smoothly.

comment:18 Changed 4 years ago by lkraav

experiencing this as well, cc-ing

comment:19 Changed 3 years ago by KenHagan

I'm seeing something similar on both Win2k3 and Ubuntu 10.10 hosts (various Windows guests), with 3.2.12. (I presume that "next but one" has become "soon".) Like the OP, I have more guests than cores but mostly the guests are inactive, and the problem is only obvious when one or more guests is heavily loaded. Of course, once VB starts wheel-spinning during its catch-up attempts, loading is 100%, all guests become unusable and the problem spirals out of control. Occasionally I have a VM abort on me.

comment:20 Changed 3 years ago by trcoley

Similar symptoms with the following setup. VirtualBox 3.2.12 r68302

HOST: Windows 7 Ultimate x64 (hardware noted below in case it is relevant)
GUEST 1: Windows XP x32 - NO time sync problem
GUEST 2: Windows 2003 Server x32 - NO time sync problem
GUEST 3: Windows XP Pro x64 - PROBLEM, see notes.

Notes: This guest seems to stay 7-8 minutes behind the host. After syncing to an Internet time server, it rapidly loses time. The amount of time lost is not constant and can exceed 7-8 minutes, but the conditions leading to this are not clear.

Hardware:
Intel Core 2 Duo E6600 Conroe 2.4GHz LGA 775 65W Dual-Core Processor
Intel BOXD975XBX2KR LGA 775 Intel 975X ATX Intel Motherboard

comment:21 Changed 3 years ago by jamiegau

I moved a VMware virtual machine (Ubuntu 10.04 LTS) to an Ubuntu 10.10 host running VirtualBox.
I run it headless.
It seemed to work great until cron seemed to be skipping tasks.
Then I noticed the clocks are running super slow.
I installed the Guest Additions etc.; it's better, but still not right.
So I set up a cron job that simply writes the date into a file every minute.
I then got this interesting output:
---
Fri Feb 11 17:25:01 EST 2011
Fri Feb 11 17:26:01 EST 2011
Fri Feb 11 17:27:01 EST 2011
Fri Feb 11 17:28:01 EST 2011
Fri Feb 11 17:29:01 EST 2011
Fri Feb 11 17:30:01 EST 2011 <------ next job happens 20 mins later.
Fri Feb 11 17:51:59 EST 2011
Fri Feb 11 17:52:01 EST 2011
---
This explains why my MRTG graphs have blank areas appearing periodically. What can I do to fix this? Should I re-create the guest from scratch? (I have another guest and it "seems" to be OK since I installed the VirtualBox Guest Additions, but it's not very busy.) Should I go to version 4 of VirtualBox, which is available but not supported by Ubuntu?

Advice really appreciated.
James
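A per-minute date log like the one above is a simple way to spot stalls. A minimal sketch that flags consecutive entries whose spacing differs from the expected minute by more than a tolerance; the function names, the 30-second tolerance and the assumption that all stamps share one time zone are illustrative choices.

```python
from datetime import datetime, timedelta


def parse_stamp(line: str) -> datetime:
    """Parse `date` output such as 'Fri Feb 11 17:25:01 EST 2011'.

    Only the first six fields are used, so trailing notes are ignored. The
    time-zone token is dropped because strptime's %Z accepts only a few zone
    names; all stamps are assumed to be in the same zone.
    """
    wd, mon, day, hms, _tz, year = line.split()[:6]
    return datetime.strptime(f"{wd} {mon} {day} {hms} {year}",
                             "%a %b %d %H:%M:%S %Y")


def find_gaps(lines, expected=timedelta(minutes=1),
              tolerance=timedelta(seconds=30)):
    """Return (before, after) pairs whose spacing is off by more than tolerance."""
    stamps = [parse_stamp(line) for line in lines if line.strip()]
    gaps = []
    for before, after in zip(stamps, stamps[1:]):
        if abs((after - before) - expected) > tolerance:
            gaps.append((before, after))
    return gaps
```

Run on the eight date lines above (the arrow note is ignored), it reports two anomalies: the 21 minutes 58 seconds between 17:30:01 and 17:51:59, and the 2 seconds between 17:51:59 and 17:52:01.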

comment:22 Changed 3 years ago by KenHagan

(Mainly directed @James.)

I've moved to 4.0.x and I'm very happy with it. (I run one VM almost permanently on Ubuntu 10.10 at home and a dozen or so intermittently on Windows at work.)

I think I may have seen the "giving up catch-up attempt" at some point in the past month or so, but it is certainly rare. That might simply reflect the fact that my work pattern has changed since the end of last year, but it could equally mean that the problem has been (more or less) fixed.

comment:23 Changed 3 years ago by jamiegau

@KenHagan yes.

4.0 seems to fix my problems...

all seems well now..

Thank god..

I am now quite happy with my cheap as chips VM server. This is the way to go if you want to do some VMs on the cheap..

Great work.

comment:24 Changed 3 years ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Please reopen if still relevant with VBox 4.1.4.

comment:25 Changed 22 months ago by JRR

  • Status changed from closed to reopened
  • Resolution fixed deleted

VirtualBox 4.1.14

I am running a Windows 7 x64 guest VM and also have a guest clock that runs extremely slowly. The guest stays in sync for about 20 minutes and then drifts to almost 5 minutes out of sync before being resynced (I am trying w32tm /resync via the Task Scheduler), or it needs to be resynced manually because it never reaches the next scheduled resync. The applications I need to run are very time-sensitive (i.e., decorating incoming frames from a video source) and require quite a bit of CPU. The VM has approximately 4 cores at 2.53GHz and 8 GB RAM allocated. The host has a total of 24 cores and 48 GB RAM, and this is the only VM. However, the host is also running some moderately CPU- and RAM-intensive software.

Logs Sample:

21:14:23.684 Display::handleDisplayResize(): uScreenId = 0, pvVRAM=00007fb7bb5f8000 w=1448 h=1008 bpp=32 cbLine=0x16A0, flags=0x1

21:14:26.562 RTC: period=0x20 (32) 1024 Hz

21:17:25.913 Starting host clipboard service

21:17:25.913 ClipConstructX11: X11 DISPLAY variable not set -- disabling shared clipboard

21:17:25.913 Guest Additions capability report: (0x4) seamless: no, hostWindowMapping: no, graphics: yes

21:17:25.931 Guest Additions capability report: (0x5) seamless: yes, hostWindowMapping: no, graphics: yes

21:48:53.398 TM: Giving up catch-up attempt at a 62 487 574 789 ns lag; new total: 10 081 566 875 307 ns

21:49:57.450 TM: Giving up catch-up attempt at a 62 477 836 648 ns lag; new total: 10 144 044 711 955 ns

21:50:58.670 TM: Giving up catch-up attempt at a 60 028 971 891 ns lag; new total: 10 204 073 683 846 ns

21:52:02.263 TM: Giving up catch-up attempt at a 62 865 881 874 ns lag; new total: 10 266 939 565 720 ns

Last edited 22 months ago by JRR

comment:26 Changed 22 months ago by JRR

A correction, and a note.

Correction: Host has 16 cores.

Note: Host OS: Scientific Linux 6.1

comment:27 Changed 22 months ago by frank

  • Description modified

Please attach the VBox.log file, your log example is unreadable.

comment:28 follow-up: ↓ 29 Changed 22 months ago by frank

  • Status changed from reopened to closed
  • Resolution set to worksforme

Please reopen once you provided the complete log file.

comment:29 in reply to: ↑ 28 Changed 22 months ago by JRR

Replying to frank:

Please reopen once you provided the complete log file.

I'm no longer able to get the log file for this box. However, when/if I hit the issue again, I will make sure to get the actual log files to provide. As a note, I only hit this issue once I placed the host machine in a static network. The guest VM did not appear to lose time before I did this, or I was not watching close enough at that time to see if it was happening.

comment:30 Changed 20 months ago by DanKegel

  • Status changed from closed to reopened
  • Resolution worksforme deleted

I'm seeing this now. Host is Ubuntu 10.04 64 bit, guest is Ubuntu 12.04 64 bit. I'll try to attach a log file Monday.

comment:31 Changed 2 months ago by mizzao

I'm still seeing this on 4.3.6. Are there more duplicates of this issue somewhere?
