VirtualBox

Ticket #8511 (closed defect: fixed)

Opened 3 years ago

Last modified 22 months ago

Regular crashes or freezing

Reported by: java_artisan Owned by:
Priority: critical Component: other
Version: VirtualBox 4.0.4 Keywords: freeze crash
Cc: Guest type: Linux
Host type: Linux

Description (last modified by frank)

Host: Xubuntu 10.10 amd64, extension pack installed

Guest: same as the host, guest additions installed

Since I installed the 4.0.4 release, the guest VM crashes. Or maybe it freezes, because I can still see what was on the screen, even after switching back and forth between host and guest. Either way, the guest VM no longer accepts mouse or keyboard input. In a time span of 8 hours it usually crashes two or three times.

The guest is not consuming much CPU when frozen.

Unfortunately I have no indication as to what is causing the freeze. It just happens at seemingly random moments.

What I can say is that I'm using the VM to develop Java applications. I usually have these open:

  • Chrome
  • Terminal
  • Intellij
  • Netbeans
  • Thunderbird
  • Skype
  • Filezilla

Maybe one of those is a known source of interference ...

Attachments

VBox.Logs.tar.gz (58.1 KB) - added by java_artisan 3 years ago.
All the log files I found for this VM.
Logs.tar.gz (46.2 KB) - added by java_artisan 3 years ago.
Logs.tar.2.gz (48.3 KB) - added by java_artisan 3 years ago.
Even more logs.
VBox.log.1 (59.7 KB) - added by jeyk 3 years ago.
Log file of test build crashing
Vbox.sirkubax_apollo.log (54.5 KB) - added by sirkubax 2 years ago.
Vbox.log from host with errors
HOST.dmesg.sirkubax_moon.log (135.7 KB) - added by sirkubax 2 years ago.
Host side dmesg (with disk error)
Guest.dmesg.sirkubax_apollo.log (48.8 KB) - added by sirkubax 2 years ago.
Guest side dmesg with disk errors

Change History

Changed 3 years ago by java_artisan

All the log files I found for this VM.

comment:1 Changed 3 years ago by frank

A core dump would help to debug this issue. If you can provide one please contact me at frank _dot_ mehnert _at_ oracle _dot_ com.

comment:2 Changed 3 years ago by java_artisan

Is it possible the crashes/freezes are related to the sound board ?

It was disabled, but I had to enable it a few days ago. It may be a coincidence, but I haven't had any crashes since.

comment:3 Changed 3 years ago by java_artisan

Nope. Not related to the sound board. I'm crashing way less often though.

I've set up the VM for a dump when frozen. I'll send it as soon as it freezes.

Thanks !

Jan

comment:4 Changed 3 years ago by java_artisan

Since I enabled the sound card I have almost no crashes.

I had one yesterday and tried the "kill -4" option to force the VM to dump its core. Unfortunately I couldn't find any core.* file.

The process id was 2005. So I should look for a file named "core.2005", right ?

comment:5 Changed 3 years ago by frank

To get a core dump you need to enable them first because core dumps are normally disabled. See my first comment in this defect. The core dump is normally written into the directory which was current when the VBox process started. This is normally the home directory.
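For readers following along, here is a minimal sketch of what enabling core dumps can look like on a Linux host. It assumes a Bourne-style shell and that the VM is started directly from that same shell so it inherits the limit. The VM name "devbox" is hypothetical, and other host settings may also matter.

```shell
# Sketch only: raise the soft core-file size limit to the hard limit for this
# shell and everything started from it. Core dumps are commonly disabled
# because the soft limit defaults to 0.
enable_core_dumps() {
    ulimit -S -c "$(ulimit -H -c)"
}

enable_core_dumps
echo "core file size limit is now: $(ulimit -S -c)"

# Then start the VM from this same shell so it inherits the limit, e.g.
#   VirtualBox --startvm "devbox" &     # "devbox" is a hypothetical VM name
# Once the VM is frozen, send it signal 4 (SIGILL), as tried in comment 4:
#   kill -4 <pid of the VM process>
# The core file should then appear in the directory the process was started in.
```

If the hard limit itself is 0, the soft limit cannot be raised this way and the limit has to be changed by the administrator first.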

comment:6 Changed 3 years ago by mtwomey

I am experiencing nearly identical behavior with Ubuntu Desktop 10.10 amd64. I also have the extension pack installed. I have set up a test with 8 identical guests (all clones of the same drive), each with a different combination of settings. I will attempt to get a core dump when they hang/crash. Please let me know if I should continue here or open a different bug.

comment:7 Changed 3 years ago by mtwomey

Also I'm experiencing this behavior with Arch Linux amd64 (2010/05) as well.

comment:8 Changed 3 years ago by java_artisan

I had FAR fewer crashes after enabling the sound card. Maybe that'll help for you too ?

comment:9 Changed 3 years ago by frank

If enabling the sound card helps reduce these crashes, there might be a lot of reasons for that, and every reason is hard to debug. The cause could be memory corruption on the host, memory corruption on the guest and so on. Note that VBox 4.0.4 has some known issues which are fixed in the next maintenance release, but after looking at the core dump of java_artisan I didn't get a hint where the problem is. Disabling 3D could be a first step to check if 3D is the reason (perhaps a buggy host driver).

comment:10 Changed 3 years ago by java_artisan

I am pretty sure it made most of the crashes go away. It's now set to Hostdriver: pulseaudio and Controller: Intel HD Audio.

I'm betting on a freeze of the input, because the screen is redrawn each time I switch between the host and the guest. But it doesn't respond to the mouse or keyboard any more.

The 3D acceleration was enabled. I've switched it off. Let's see if that changes something. 2D acceleration would help me more anyway, as I'm mainly doing Java development with the VM. But alas, there is no such thing for Linux.

comment:11 Changed 3 years ago by java_artisan

For the past few days it has definitely been crashing again. Already twice today.

But this time there's something interesting going on: I think the keyboard input ceased to work. Suddenly the mouse started to select text. The left mouse button click does not work any more, although the software reacts to hovering the cursor over buttons, images, etc... The right mouse button worked completely. I tried several things: resetting the mouse integration, resizing the screen, pausing/resuming the VM, etc...

There's no apparent response to keyboard input. Even when sending ACPI, CTRL-ALT-BACKSPACE, ... via the VM menu.

A logical explanation would be that the keyboard integration stopped working all of a sudden.

I've let the VM dump its core. I'm sending it now.

comment:12 Changed 3 years ago by java_artisan

Including the log files.

Changed 3 years ago by java_artisan

comment:13 Changed 3 years ago by mtwomey

I have the same symptom regarding the screen redrawing while input seems hung. I have actually always had both 2D and 3D acceleration off, so for me that doesn't appear related. I just fired up my testbed of 8 VMs last night, so I will monitor those. None of them had audio, so I will create a few more with audio to see if that helps.

comment:14 Changed 3 years ago by mtwomey

I've had another crash with these same symptoms - RDP will still paint the screen, but the guest is unresponsive. No input (keyboard or mouse). I can ping it, but if I try to ssh in, I don't get past the TCP handshake (per tcpdump). This leads me to believe that some low-level things are still functioning in the guest. CPU utilization on the host is normal for this guest (pretty low).

This time I was able to get a core dump with gcore. I've got it tarred/gzipped at approximately 216 MB along with the logs and the .vbox file. Please let me know the preferred method of getting this to you (if you think it will help). Thanks!

comment:15 Changed 3 years ago by java_artisan

There's another core dump coming your way (core.1978.gz). Contrary to the previous cases, this time the input response didn't go away all at once. For some time I was able to click around. But unfortunately I could not open a terminal window to see what the thing was doing.

I've stopped the VM by saving its state and resuming it. But unfortunately that didn't help. It was still stuck.

What was new is that the repaint of certain applications didn't work either.

So the VM looks very busy with something that is gradually paralyzing it. From the host standpoint the guest is not consuming much CPU. A mere 2%. So are we looking at an I/O problem ?

Changed 3 years ago by java_artisan

Even more logs.

comment:16 Changed 3 years ago by java_artisan

Does everybody here use ext4 for the file system ? Or are some people on others ? ext3 ? reiserfs ?

comment:17 Changed 3 years ago by mtwomey

I am indeed using ext4. I have a crashed guest right now, again with nearly identical symptoms to what you've described. I can switch windows, but one of them refuses to repaint. I have a terminal window open in the guest which does repaint properly, but I can't type in it. I can also get a TCP handshake for an ssh session, but it goes no further than the handshake. I do think an I/O problem would exhibit similar symptoms.

I am now running a self-compiled version of VirtualBox from SVN (hoping this might resolve things, but it hasn't). I have two core dumps, but how are you getting them to Frank? Email? They're quite large.

comment:18 Changed 3 years ago by java_artisan

If I'm not mistaken you should contact Frank, as described here:  http://www.virtualbox.org/wiki/Core_dump

First gzip them, then send them to their FTP server. Luckily this is the broadband age. Sending a gig is no problem any more. :-)

Have you been fiddling with other parameters ? Chipsets ? Acceleration ?

comment:19 Changed 3 years ago by Technologov

Some people say that using "clocksource=jiffies" as kernel parameter solves the problem for them.

Can you confirm?

-Technologov

comment:20 Changed 3 years ago by Technologov

"clocksource=jiffies" must be applied to Linux guest bootloader.

comment:21 Changed 3 years ago by java_artisan

I can try - but what's the logic ? Is it a timing issue ?

comment:22 Changed 3 years ago by mtwomey

I will try this. One of my guests started exhibiting problems last night (it was graphing netflow traffic). When I got up this morning, I saw that the latest time on the graph was 3:09am. I had left a console open on the guest, so I checked the time at the command line and it reported 3:09am (while in reality it was 10:48am). Then I did a little testing:

####################################
mtwomey@netflow:~$ date +%k:%M:%S.%N
3:09:01.823660423
mtwomey@netflow:~$ date +%k:%M:%S.%N
3:09:02.486918205
mtwomey@netflow:~$ date +%k:%M:%S.%N
3:09:03.787365263
mtwomey@netflow:~$ date +%k:%M:%S.%N
3:08:59.742212587
mtwomey@netflow:~$ date +%k:%M:%S.%N
3:09:00.354516674
mtwomey@netflow:~$ date +%k:%M:%S.%N
3:09:01.801733692
mtwomey@netflow:~$
####################################

This goes on - time "looping" between 3:08:59.xxxx and 3:09:03.xxxx. So there definitely does appear to be something time related going on here.

I have put this machine in a saved state in case anyone wants me to run any other commands before it crashes completely.
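The manual check above can be automated. The following is a rough sketch, not a tested diagnostic: it reads timestamps (seconds, optionally with a fraction) one per line and reports every sample that is earlier than the previous one. The function names are made up for illustration, and `date +%s.%N` assumes GNU date in the guest.

```shell
# Read one timestamp per line on stdin and report every sample that is
# earlier than the one before it.
# Exit status: 0 if time never went backwards, 1 otherwise.
find_backward_steps() {
    awk '
        NR > 1 && ($1 + 0) < (prev + 0) {
            printf "backward step: %s -> %s\n", prev, $1
            bad = 1
        }
        { prev = $1 }
        END { exit bad }
    '
}

# Take N samples of the wall clock, roughly one per second.
# (%N is a GNU date extension.)
sample_clock() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        date +%s.%N
        sleep 1
        i=$((i + 1))
    done
}
```

Running `sample_clock 60 | find_backward_steps` in an open guest terminal would then flag a loop like the one shown above. Note that `sleep` itself depends on the guest clock, so on an affected guest the sampling may stall.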

comment:23 Changed 3 years ago by mtwomey

Also for what it's worth here's the current clocksource:

mtwomey@netflow:/sys/devices/system/clocksource/clocksource0$ cat available_clocksource 
acpi_pm 
mtwomey@netflow:/sys/devices/system/clocksource/clocksource0$ cat current_clocksource
acpi_pm
mtwomey@netflow:

comment:24 Changed 3 years ago by frank

mtwomey, do you have VBox guest additions installed in this VM?

comment:25 Changed 3 years ago by mtwomey

@Frank: Yes, I have guest additions installed on nearly every VM I have, including this one.

As it turns out, setting my clock source to jiffies has indeed completely solved the crashing issue for me (every guest on which I've changed that setting has remained up). I don't know what this means really. Some basic googling has told me that jiffies is not the most desirable clocksource, but I couldn't really get any specifics. I'm not clear if setting it this way may cause me other issues down the line?
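For anyone who wants to try the same setting, here is a sketch of how the parameter could be added to the guest's kernel command line. It assumes the guest boots with GRUB 2 and keeps its defaults in /etc/default/grub, as Ubuntu-style guests do; other distributions differ. The helper name add_kernel_param is made up for this example.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB 2
# defaults file, unless it is already there.
add_kernel_param() {
    file=$1
    param=$2
    # Do nothing if the parameter is already present.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*$param" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote of the line.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $param\"/" "$file" > "$file.new" &&
        cat "$file.new" > "$file" &&
        rm -f "$file.new"
}

# Typical use inside the guest, as root (paths are assumptions):
#   add_kernel_param /etc/default/grub clocksource=jiffies
#   update-grub
#   reboot
# After the reboot, check what the kernel actually uses:
#   cat /sys/devices/system/clocksource/clocksource0/current_clocksource
```

Keeping a backup copy of the defaults file before editing it is a sensible precaution.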

comment:26 Changed 3 years ago by jeyk

I am not sure if I am experiencing exactly the same problem, but it feels very similar.

I am running VirtualBox 4.0.4 (headless) on an Ubuntu 10.04 64-bit host. The guests are currently 8.04 64-bit, however, I have also had this issue with the current 10.10-desktop LiveCD and other LiveCDs. Whenever I put the virtual hard disk under heavy load, the operation freezes. The guest is still responding via RDP (console switching, simple commands) and answering to pings. However, apparently everything that involves timing and/or the disk, fails. That includes running commands, SSHing in (it connects, but doesn't spawn a shell), shutdown etc. Only resetting or powering off the VM helps.

Based on several suggestions in the forum, I have reduced the VMs to 1 core each. So far, I have not had any more freezes. However, this can not be a permanent solution as the guests really need more than one core.

Please let me know if I can do anything to help debug this problem.

comment:27 follow-up: ↓ 29 Changed 3 years ago by java_artisan

Is everybody in this discussion using ext4 file systems ? Maybe everybody is using ubuntu 10.10 ?

comment:28 Changed 3 years ago by mtwomey

@java_artisan: I was primarily using ext4, but since you pointed that out earlier in the thread I built a new test system with ext3 and had the exact same issue.

@jeyk: Your symptoms sound identical to mine. If you can still get into the console and run basic commands, try running 'date' a few times and see if the clock is incrementing and not looping (see my post above).

Setting my clock source to jiffies, has completely solved the issue for me (not a single crash on any guest VM I've set that way and they're all ext4 with 8 cores). I'm not sure if using jiffies has any unrelated negative ramifications though (it's described as the "lowest common denominator" time setting).

comment:29 in reply to: ↑ 27 Changed 3 years ago by jeyk

Replying to java_artisan:

Is everybody in this discussion using ext4 file systems ? Maybe everybody is using ubuntu 10.10 ?

No, I am using ext3 and Ubuntu 10.04 as host. I have had this issue with many different guests, including Ubuntu 8.04 and 10.10 and the SystemRescueCD from sysresccd.org.

Replying to mtwomey:

@jeyk: Your symptoms sound identical to mine. If you can still get into the console and run basic commands, try running 'date' a few times and see if the clock is incrementing and not looping (see my post above).

I tried that for a few (about 15) seconds. My clock was advancing with what looked like normal speed. It did not loop. Also, the system behaved quite normally as long as I didn't do anything that involved disk access. Because this particular system is already in production use, I did not play around for long. I simply reduced the number of virtual CPUs to 1 and it has been running fine since.

I am going to set up a test machine to try out different solutions to this problem. Running single-core is not a long-term option for these machines.

comment:30 Changed 3 years ago by java_artisan

@everybody Thanks ! At least I know it's useless to reformat my file systems into ext3. :-)

Tomorrow I'm back in the office. I'll write a cron script to check the system time. And report when the clock was set to an earlier time. When that happens I'll reset the clock source and run the same script.

For the not so system-savvy (like me) here's how to set the clock source:  http://www.vr.org/knowledgebase/161/My-server-clock-or-date-and-time-is-wrong-or-drifts.html

And some more documentation:  http://blog.jolexa.net/2010/03/05/virtual-machine-clocksource-issue/

comment:31 follow-up: ↓ 33 Changed 3 years ago by java_artisan

Has anybody tried VirtualBox 4.0.6 yet ?

comment:32 Changed 3 years ago by mtwomey

I haven't, no - although I did build the latest SVN version a few weeks ago and had the same problem. Also - your cron job might not catch it, because it may stop firing when the time stops progressing.

comment:33 in reply to: ↑ 31 Changed 3 years ago by jeyk

Replying to java_artisan:

Has anybody tried VirtualBox 4.0.6 yet ?

I had the same problem with 4.0.6. It killed one of my MySQL databases :-(

comment:34 Changed 3 years ago by jeyk

I managed to reliably reproduce this problem. Actually it is quite easy: run this command on both the guest and the host:

dd bs=4M if=/dev/zero of=testfile
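Note that the command above keeps writing until it is interrupted or the disk is full. A bounded variant might look like the sketch below; the function name and the block count are arbitrary choices for illustration, not part of the original reproduction.

```shell
# Write COUNT blocks of 4 MiB of zeros to FILE, then flush to disk.
# Unlike an open-ended dd, this stops after COUNT blocks.
write_load() {
    file=$1
    count=$2
    dd bs=4M count="$count" if=/dev/zero of="$file" 2>/dev/null &&
        sync
}

# Example: roughly 4 GiB of writes, started on host and guest at the same time.
#   write_load testfile 1024
```

A larger count keeps the load going longer, which may be needed if the errors take a few minutes to appear.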

During my tests, it did not even take five minutes until the guest started showing error messages and finally hanging.

I will now try different approaches to solve the problem.

comment:35 follow-up: ↓ 36 Changed 3 years ago by java_artisan

Have you seen the warning when creating a new VM with ext4 ? Maybe it's related to that...

comment:36 in reply to: ↑ 35 Changed 3 years ago by jeyk

Replying to java_artisan:

Have you seen the warning when creating a new VM with ext4 ? Maybe it's related to that...

I am using ext3 on both host and guest...

comment:37 Changed 3 years ago by jeyk

I did some more testing. I removed the disks from the SATA controller of my test machine and instead attached them to the IDE controller. The machine has been running fine for about an hour of heavy disk I/O on the host and guest. There was only one error message in the guest kernel log:

May  7 23:30:35 test kernel: [  209.010367] ata2: lost interrupt (Status 0x48)
May  7 23:30:35 test kernel: [  209.010439] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
May  7 23:30:35 test kernel: [  209.023409] ata2.00: failed command: WRITE DMA
May  7 23:30:35 test kernel: [  209.031144] ata2.00: cmd ca/00:00:30:7b:8c/00:00:00:00:00/e0 tag 0 dma 131072 out
May  7 23:30:35 test kernel: [  209.031148]          res 40/00:00:00:00:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
May  7 23:30:35 test kernel: [  209.041033] ata2.00: status: { DRDY }
May  7 23:30:35 test kernel: [  209.044757] ata2: soft resetting link
May  7 23:30:40 test kernel: [  214.240136] ata2: link is slow to respond, please be patient (ready=0)
May  7 23:30:42 test kernel: [  215.442218] ata2.00: configured for UDMA/33
May  7 23:30:42 test kernel: [  215.442225] ata2.00: device reported invalid CHS sector 0
May  7 23:30:42 test kernel: [  215.442235] ata2: EH complete

The VBox.log file showed one of these messages every few minutes:

00:05:34.432 PIIX3 ATA: execution time for ATA command 0xca was 8 seconds

Apart from that, the guest (and host) ran fine and were quite responsive.

So, there either seems to be a problem with the SATA emulation, or the IDE emulation is sufficiently slow (used 30-40% CPU on the host) so that the problem is mitigated.
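For those who want to repeat the controller swap from the command line, the sketch below prints the VBoxManage commands rather than running them. It assumes the disk sits on port 0 of the SATA controller, that an IDE controller already exists with its primary master slot free, and that the VM is powered off. The VM name, disk path and controller names are hypothetical; `VBoxManage showvminfo` shows the real ones.

```shell
# Print (not run) the VBoxManage commands that would move one disk image from
# a SATA controller to an IDE controller. Review the output before running it.
print_move_to_ide() {
    vm=$1
    disk=$2
    sata=$3
    ide=$4
    # Detach whatever is on SATA port 0.
    printf 'VBoxManage storageattach "%s" --storagectl "%s" --port 0 --device 0 --medium none\n' \
        "$vm" "$sata"
    # Attach the same image as IDE primary master.
    printf 'VBoxManage storageattach "%s" --storagectl "%s" --port 0 --device 0 --type hdd --medium "%s"\n' \
        "$vm" "$ide" "$disk"
}

# All four arguments below are made-up example values.
print_move_to_ide "test" "/path/to/test.vdi" "SATA Controller" "IDE Controller"
```

Printing first makes it easy to check the controller names against the VM's actual configuration before changing anything.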

comment:38 Changed 3 years ago by frank

In that case this sounds like a duplicate of #8826. Would you like to test a test build? For which Linux distribution?

comment:39 Changed 3 years ago by jeyk

Sure, a test build would be great. The host is Ubuntu 10.04.2 LTS x64.

comment:40 Changed 3 years ago by frank

jeyk,  here is the requested test build.

comment:41 Changed 3 years ago by jeyk

Nope, sorry, didn't work. The guest started to show error messages about SATA timeouts not even a minute after I started the test. A few seconds later the whole virtual machine crashed. It did not even give a reason in the log file.

comment:42 Changed 3 years ago by frank

Could you attach a VBox.log file of such a VM session?

Changed 3 years ago by jeyk

Log file of test build crashing

comment:43 Changed 3 years ago by jeyk

@frank: I hope this is the correct one (with SATA). I also tried SAS and different chipsets. They all crashed sooner or later.

comment:44 Changed 3 years ago by jeyk

I just decided to look into my syslog and found these messages (line breaks added for readability):

May  9 22:15:27 hive kernel: [1118755.707271] vboxdrv: Found 8 processor cores.
May  9 22:15:27 hive kernel: [1118755.707324] VBoxDrv: dbg - g_abExecMemory=ffffffffa06c1a40
May  9 22:15:27 hive kernel: [1118755.707619] vboxdrv: fAsync=0 offMin=0x3bd offMax=0x34d5e
May  9 22:15:27 hive kernel: [1118755.707693] vboxdrv: TSC mode is 'synchronous', kernel timer mode is 'normal'.
May  9 22:15:27 hive kernel: [1118755.707697] vboxdrv: Successfully loaded version 4.0.7 (interface 0x00180000).
May  9 22:16:16 hive kernel: [1118804.870571] device vboxnet0 entered promiscuous mode
May  9 22:16:33 hive kernel: [1118822.130195] device vboxnet0 left promiscuous mode
May  9 22:17:30 hive kernel: [1118878.726724] device vboxnet0 entered promiscuous mode
May  9 22:20:48 hive kernel: [1119077.335820] __ratelimit: 9 callbacks suppressed
May  9 22:20:48 hive kernel: [1119077.335825] VBoxHeadless[9229]: segfault at 0 ip 00007f4a92b4b894
  sp 00007f4aa18f0a30 error 4 in VBoxDD.so[7f4a92b1f000+217000]
May  9 22:20:49 hive kernel: [1119078.079689] device vboxnet0 left promiscuous mode

So the crash was caused by a segfault.
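As a small worked example, the segfault line gives enough to compute where inside VBoxDD.so the fault happened: subtract the start of the mapping from the instruction pointer. On x86, "segfault at 0" with "error 4" usually means a user-mode read from an unmapped address, which would fit a NULL pointer dereference. That reading is an interpretation, not something the log states outright.

```shell
ip=0x7f4a92b4b894      # "ip" field of the segfault line
base=0x7f4a92b1f000    # start of the VBoxDD.so mapping
size=0x217000          # size of the mapping

# Offset of the faulting instruction relative to the start of the mapping.
offset=$(($ip - $base))
printf 'offset into VBoxDD.so: 0x%x\n' "$offset"    # prints 0x2c894

# Sanity check: the offset must be smaller than the mapping size.
if [ "$offset" -lt "$(($size))" ]; then
    echo "ip lies inside the mapping"
fi
```

An offset like this is what a developer with matching debug symbols would need to locate the faulting function.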

comment:45 Changed 3 years ago by mtwomey

@jeyk - did you ever try changing your clock source?

comment:46 Changed 3 years ago by jeyk

Yes, I tried jiffies. It didn't help, unfortunately.

comment:47 Changed 3 years ago by jeyk

I just tried switching the host I/O-scheduler from "deadline" to "cfq". The guest did not report any disk errors during the test, however, it became quite unresponsive while the host had I/O load.
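For reference, a sketch of how the active scheduler can be inspected on a Linux host. The kernel exposes it in sysfs with the active entry in square brackets. The device name "sda" is only an assumption and the helper name is made up.

```shell
# Extract the active scheduler from the sysfs format, where the active entry
# is shown in square brackets, e.g. "noop [deadline] cfq".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Show the active scheduler for a block device ("sda" is an assumption).
dev=${1:-sda}
if [ -r "/sys/block/$dev/queue/scheduler" ]; then
    active_scheduler < "/sys/block/$dev/queue/scheduler"
fi

# Switching needs root, for example:
#   echo cfq | sudo tee /sys/block/sda/queue/scheduler
```

A change made this way lasts only until the next reboot.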

comment:48 Changed 3 years ago by java_artisan

VB 4.0.8 is released. Maybe this will solve your trouble ?

comment:49 Changed 3 years ago by mtwomey

@java_artisan Just curious, have you found a work around for the trouble you were experiencing? I've been up and running for weeks with no issues with my guest clock sources set to jiffies, but I know this didn't work for jeyk. What's your status?

comment:50 Changed 3 years ago by java_artisan

Well... to be honest I stopped using it. I was using the VM as a development environment. But once the filesystem got corrupted I lost quite some work and stepped away from it. Pity... but it's much faster though.

Anyway, I'm keeping an eye on this to see if this gets fixed someday.

comment:51 Changed 3 years ago by Technologov

You can do development using Shared Folder, so you save all your work to host every few minutes...

This way, even if VBox crashes, your work won't be gone.

-Technologov

comment:52 Changed 3 years ago by jeyk

@java_artisan Upgrading to 4.0.8 seemed to solve the main problem. I did not get any more I/O errors during my tests.

However, high host I/O load now completely starves the guest I/O. And I mean completely. Using dd with bs=4M or 1M got me around 120-140 MB/s on the host and NONE on the guest when both were running at the same time. When the host had no I/O, the guest managed about 75 MB/s. I tried different combinations of I/O schedulers on the host and guest, but all showed the same result: When the host I/O was saturated, the guest I/O starved.

I guess that's OK though. I do not expect to have such high I/O loads over extended periods of time. When the host I/O was not fully saturated, all was well.

comment:53 Changed 3 years ago by codeslingercompsalot

this sounds like the same bug I am experiencing. Here are a few more data points for you.

VBox 4.0.6
CPU: AMD Phenom 960 with 4 cores
Host OS: Ubuntu 10.10 32bit, with kernel 2.6.35-28-generic-pae

Guest OS: 3D and 2D acceleration is off except for the win2k guest; all file systems are EXT3

  • win2k 32bit 1cpu IDE -- no crashes/freezes
  • ubuntu 8 32bit 1cpu IDE -- no crashes/freezes
  • ubuntu 9 32bit 1cpu IDE -- no crashes/freezes
  • ubuntu 10 32bit 4cpu PAE, SATA -- frequent freeze
  • ubuntu 10 64bit 4cpu PAE, SATA -- frequent freeze

I am frequently switching focus between host and guest os. often I experience flaky keyboard / lost keystrokes just before it turns into full freeze.

The freezes are very frequent, making the vbox effectively unusable

after reading various bugs, I have now turned off PAE and switched to 1 cpu and changed drive to IDE on the two guests that are crashing. I will test with this new config for awhile and see what happens.

comment:54 Changed 3 years ago by codeslingercompsalot

ouch, sorry, should have previewed first, this form made a mess out of my formatting, making it very hard to read the above.

In summary, I have 5 different guest machines with various config options. In answer to questions above, I am running EXT3 file system.

The fundamental difference between the two systems that freeze so often as to be unusable and the three systems that never freeze are:

good: 1 cpu, no-PAE, IDE

bad: 4 cpu, PAE, SATA


none of the other differences seem relevant.

comment:55 Changed 3 years ago by codeslingercompsalot

the following bugs appear to be similar: #8628 #8826 #7919

comment:56 Changed 3 years ago by lefticus

I'm experiencing the same symptoms that others have reported here and in #8628 - slow degradation of input quality followed by unusability. In rare cases the system recovers.

It's always Linux guests that have the problem (I am on quad core Windows 7 Ultimate 64bit host). Ubuntu 10.04 runs with no problem at all. Anything later than 10.04 has this problem.

I have managed to fairly conclusively determine that the issues only happen if guest additions are installed - ANY part of the guest additions installed causes the issue (I've tried selectively removing parts, kernel modules, X11 drivers). Also, the issues appear to be exacerbated by "absolute pointing device" turned on.

I'm going to attempt the "jiffies" fix mentioned above.

comment:57 Changed 3 years ago by lefticus

The jiffies fix is working stably for me so far. If anyone needs more info about my platform / settings / log files, let me know what you need.

comment:58 follow-up: ↓ 59 Changed 3 years ago by frank

I want to summarize these comments a bit to get a better picture about the problem.

@java_artisan, as you used a lot of guest memory you were probably affected by the corruption bug which was mentioned in the 4.0.8 release (for guest with more than 2GB guest memory and the guest attached by SATA).

So @all, before reporting any more I/O errors or guest corruption, make sure you are using VBox 4.0.8.

Regarding clocksource=jiffies: What does it solve for you in particular? Some users wrote that the guest crashes without this parameter, does the whole VM crash or does some guest application crash?

comment:59 in reply to: ↑ 58 ; follow-up: ↓ 62 Changed 3 years ago by lefticus

Replying to frank:

So @all, before reporting any more I/O errors or guest corruption, make sure you are using VBox 4.0.8.

Regarding clocksource=jiffies: What does it solve for you in particular? Some users wrote that the guest crashes without this parameter, does the whole VM crash or does some guest application crash?

VBox 4.0.8 with latest guest additions and latest extension pack.

Prior to setting clocksource=jiffies I would have a degradation of mouse and keyboard responsiveness. Specifically, I use the VM for lots of VI programming, I would nearly always notice the errors in this order:

  1. Everything works perfectly fine as expected
  2. Occasional lags in keyboard response
  3. Unable to type into vim
    1. still able to alt-tab to another application
    2. still able to type into other application
    3. able to interact with Console application - change tabs, etc
  4. After switching apps a few times
    1. Unable to switch apps any longer
    2. Mouse cursor moves but becomes useless (no click responsiveness)
  5. Complete UI unresponsiveness. Screen redraws but will not respond to VirtualBox resize events
  6. Forced to kill and restart VM. Often a reset does not work (the OS does not restart)
  7. Occasionally get a crash of VirtualBox itself during the reboot process.

After setting the clocksource=jiffies I have not seen any of these symptoms at all, yet.

I work almost exclusively over rdesktop to the VM's and notice the problem VERY quickly. Usually with the guest OS becoming unusable within an hour.

My host:

  • AMD Athlon II X4 3.2 Ghz
  • AMD Radeon Graphics
  • 8GB Ram
  • 1 TB WD Caviar Blue (Host OS drive / Linux Guest virtual drives)
  • 1 TB WD Caviar Black (Windows Guest drives)
  • Windows 7 Ultimate 64bit
  • PCI Wifi Nic

This might seem completely random, but since the issues seem to be keyboard/mouse related, and in my experience are exacerbated by "absolute pointing device" I think I should point out that my host has a touch pad connected to it.

Thanks, Jason

comment:60 Changed 3 years ago by topse

Especially the last description from lefticus is exactly the same as in ticket 7619  http://www.virtualbox.org/ticket/7619.

comment:61 Changed 3 years ago by mtwomey

Also replying to Frank:

My experience is nearly identical to the course of events described by lefticus. In addition to this, the last time this happened, before things degraded completely and while I was still able to type in a shell window (on the guest), I noticed the clock looping (as I described earlier). As with lefticus, this occurs on guests without clocksource set to jiffies.

My suspicion is that the clock ticks start having issues and the other problems manifest from there. I've also definitely noticed that the more I use a guest in these circumstances, the faster it will crash.

comment:62 in reply to: ↑ 59 Changed 3 years ago by xaz0r

Replying to lefticus:

This is an exact description of my problem. I didn't have any problems until I checked the "Enable VT-x/AMD-V" and "Enable Nested paging" checkboxes and bumped the CPUs up to 2. After that I started having the degradation/freezing all day until I found this page. I changed the clocksource to jiffies and now it's working just fine. I have no idea what difference that makes, but I hope it's only a temporary measure.

comment:63 Changed 3 years ago by mtwomey

Yes, I agree and hope this is solved soon - my guests use significantly more host CPU while idling when the clocksource is set to jiffies.

comment:64 Changed 3 years ago by frank

Please try VBox 4.0.10, the ACPI timer should now return monotonic values so clocksource=jiffies should not be necessary anymore.

comment:65 Changed 3 years ago by darknight670

Well. My Ubuntu Server 11.04 under Solaris still crashes on 4.0.10

comment:66 Changed 3 years ago by darknight670

I switched from ICH9 to PIIX3 so that Ubuntu accepts "clocksource=acpi_pm"

I will see if it hangs. Is there any performance degradation due to PIIX3?

comment:67 Changed 3 years ago by frank

No. The benefit of the ICH9 chip is mostly that it supports more PCI devices and that some guests (e.g. OSX server) don't work with the older PIIX chipsets.

comment:68 Changed 3 years ago by codeslingercompsalot

it is hard to prove a negative... however, I have now used to different vm's enough to feel comfortable saying that I've gone from frequent crashes to zero crashes after upgrading to 4.0.10. So "it works for me". Thanks!!!

But, there are still some problems:

1) Somebody has made a "big bloody mess" out of the Guest Installer. It is quite broken, in multiple ways. That should probably be a separate bug though.

2) An even worse problem is with installing this update itself. When it is installed as an upgrade onto a host computer (Ubuntu 10.10) via synaptic, then you see some scary error messages about an invalid DKMS configuration. But despite this, Vbox does run after the update. On the other hand doing a clean install of 4.0.10 onto a newly installed Ubuntu FAILS with the same error messages about an invalid DKMS config. A clean install of Vbox won't run, to work-around this I installed an older version of Vbox and then installed the 4.0.10 update over it and then it worked.

comment:69 Changed 3 years ago by codeslingercompsalot

to different vm's == two (2) different vm's (32 bit and 64 bit Ubuntu 10.10 guests on a 32 bit Ubuntu 10.10 host)

comment:70 Changed 3 years ago by frank

Seems that your DKMS config is somehow messed up, this could be even a bug of an older VirtualBox release which was fixed in the meantime. I assume that 'dkms status' shows some error messages. You can clean up missing modules by removing the wrong directories in /var/lib/dkms yourself. If you post the output of 'dkms status' I can tell you what you need to do.

I don't know what your problem with the guest installer is. It runs fine here, and some more information is required to understand where your problem is, but yes, please do this in a new ticket.

comment:71 Changed 3 years ago by java_artisan

I had a cryptic error message about DKMS with 4.1beta1 when starting a VM created with 4.8. It advised running "/etc/init.d/vboxdrv setup", which solved the problem. Maybe it's related to a Linux kernel update?

I never had a problem with the installer, though.

comment:72 Changed 3 years ago by codeslingercompsalot

On a different computer, also running Ubuntu 10.10, I just did an automatic update from 4.0.8? to 4.0.12 and I got all the same messages about corrupt and missing DKMS config files. I would love to be able to make a copy of these errors, but there are too many to type by hand and they scroll quickly... for some strange reason, the auto-update program does not support "copy" from its status window -- very frustrating. However, after the update everything ran fine despite the warnings.

The 4.0.12 Guest works great; it is a previous version (10 ?) that I had the problems with (root user permissions etc).

Beyond that, I want to say how fantastically amazing this latest version is. I did a clean install of Ubuntu 11.04 32 bit, followed by a clean install of VirtualBox 4.0.12 followed by creating a new vm running Ubuntu 11.04 32 bit guest plus Guest additions. (VTx, PAE, 4 cpus, 1 gig RAM, 48meg video, defaults for everything else).

The install was flawless, no error messages encountered and the performance was the best I have ever experienced. I was actually able to play a video in 480p without any audio stuttering and with good video quality. I have never ever been able to do that before. No crashes, hangs, stutters etc.

THANK YOU!!!

comment:73 Changed 3 years ago by mtwomey

I've finally upgraded to 4.1.2. I was able to switch my clock source from jiffies to ACPI and everything has been running for 2 weeks with zero issues. The host processor is also no longer "bogged down" by the jiffies setting. As far as I'm concerned, the clocksource issues have been addressed by the newer VirtualBox versions. Thanks!
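For anyone checking the same thing in their own guest, here is a small sketch that reports the active and available clock sources. It assumes the standard Linux sysfs files under /sys/devices/system/clocksource/clocksource0; the function names are made up for illustration. On most kernels the ACPI source appears as "acpi_pm".

```python
from pathlib import Path

# Standard sysfs location on Linux; adjust if your kernel differs.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(current_text, available_text):
    """Turn the raw file contents into (current, [available, ...])."""
    return current_text.strip(), available_text.split()


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Read both sysfs files and parse them."""
    base = Path(base)
    return parse_clocksources(
        (base / "current_clocksource").read_text(),
        (base / "available_clocksource").read_text(),
    )


if __name__ == "__main__":
    try:
        current, available = read_clocksources()
    except OSError as exc:
        print(f"could not read clocksource info: {exc}")
    else:
        print(f"current:   {current}")
        print(f"available: {' '.join(available)}")
```

Changing the source is a separate step: as root, write one of the available names to current_clocksource, or set the clocksource= kernel boot parameter to make it permanent.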

comment:74 Changed 2 years ago by sirkubax

Hi,

This is still true under version 4.1.8.

System: Gentoo, partition in RAID-1, filesystem: ext4

Errors occur under high load (rsync of 1TB of data).

I've noticed that almost everyone has problems with ext4 + VBox, so I'm thinking about formatting the partition to xfs or reiserfs.

The errors are (guest system):

uname -a
Linux apollo 2.6.35-vs2.3.0.36.32-gentoo #4 SMP Fri Feb 17 14:45:08 CET 2012 x86_64 Intel(R) Xeon(R) CPU W3520 @ 2.67GHz GenuineIntel GNU/Linux

[ 3250.748007] cron used greatest stack depth: 2960 bytes left
[47695.713250] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[47695.713254] ata2.00: failed command: WRITE FPDMA QUEUED
[47695.713258] ata2.00: cmd 61/13:00:ad:e0:42/00:00:1f:00:00/40 tag 0 ncq 9728 out
[47695.713259]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[47695.713261] ata2.00: status: { DRDY }
[47695.713270] ata2: hard resetting link
[47696.038956] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[47701.040849] ata2.00: qc timeout (cmd 0xec)
[47701.040885] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[47701.040890] ata2.00: revalidation failed (errno=-5)
[47701.040902] ata2: hard resetting link
[47701.351852] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[47711.352255] ata2.00: qc timeout (cmd 0xec)
[47711.352288] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[47711.352291] ata2.00: revalidation failed (errno=-5)
[47711.352298] ata2: limiting SATA link speed to 1.5 Gbps
[47711.352306] ata2: hard resetting link
[47711.658240] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[47720.491124] ata2.00: configured for UDMA/133
[47720.491138] ata2.00: device reported invalid CHS sector 0
[47720.491143] ata2: EH complete
[56619.745644] ata2.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[56619.745653] ata2.00: failed command: WRITE FPDMA QUEUED
[56619.745662] ata2.00: cmd 61/40:00:06:16:43/00:00:1f:00:00/40 tag 0 ncq 32768 out
[56619.745667]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

[56743.013103] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[56748.013744] ata2.00: qc timeout (cmd 0xec)
[56748.013796] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[56748.013814] ata2.00: revalidation failed (errno=-5)
[56748.013838] ata2: hard resetting link
[56748.331958] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[56758.331096] ata2.00: qc timeout (cmd 0xec)
[56758.331129] ata2.00: failed to IDENTIFY (I/O error, err_mask=0x4)
[56758.331131] ata2.00: revalidation failed (errno=-5)
[56758.331145] ata2: hard resetting link
[56758.646370] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[56767.742769] ata2.00: configured for UDMA/133
[56767.742774] ata2.00: device reported invalid CHS sector 0
[56767.742776] ata2.00: device reported invalid CHS sector 0
[56767.742778] ata2.00: device reported invalid CHS sector 0
[56767.742779] ata2.00: device reported invalid CHS sector 0
[56767.742781] ata2.00: device reported invalid CHS sector 0

So there is still something to be done...
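Long dmesg dumps like the one above are easier to compare between runs when the ATA events are counted per port. This is a rough sketch only: the regular expression and the event labels are my own choice, based purely on the message texts visible in this log, and other kernels may word things differently.

```python
import re
from collections import Counter

# Matches "[47695.713250] ata2.00: ..." with or without a syslog prefix.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<port>ata\d+(?:\.\d+)?): (?P<msg>.*)"
)


def summarize_ata_events(lines):
    """Count selected libata messages per port (ata2.00 is folded into ata2)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # continuation lines ("res 40/00...") and unrelated output
        port = match.group("port").split(".")[0]
        msg = match.group("msg").strip()
        if msg.startswith("exception Emask"):
            counts[(port, "exception")] += 1
        elif msg.startswith("failed command:"):
            counts[(port, "failed command")] += 1
        elif msg == "hard resetting link":
            counts[(port, "link reset")] += 1
        elif msg.startswith("limiting SATA link speed"):
            counts[(port, "speed limited")] += 1
        elif msg.startswith("device reported invalid CHS sector"):
            counts[(port, "invalid CHS sector")] += 1
    return counts


if __name__ == "__main__":
    import sys

    for (port, event), n in sorted(summarize_ata_events(sys.stdin).items()):
        print(f"{port:6} {event:20} {n}")
```

Usage would be something like `dmesg | python ata_summary.py` (the script name is arbitrary), run in the guest and on the host, so the two sides can be compared.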

comment:75 Changed 2 years ago by frank

  • Description modified (diff)

sirkubax, please add a VBox.log file of such a VM session.

Changed 2 years ago by sirkubax

Vbox.log from host with errors

Changed 2 years ago by sirkubax

Host side dmesg (with disk error)

Changed 2 years ago by sirkubax

Guest side dmesg with disk errors

comment:76 follow-up: ↓ 77 Changed 2 years ago by sirkubax

Hi,

I thought it was a problem with the ext4 partition on the host side, so I ran mkfs.xfs, but the errors occurred again. Now there are also some errors on the host side (not only in the VBox guest). The errors happen under high disk load (rsync of many GB of data).

I have attached logs.

Some people who hit "device reported invalid CHS sector 0" say it indicates a disk, motherboard, disk cable or voltage issue, although no one provides a solution.

I was 99% sure it is a VBox issue with the disk writing pool, but now I'm not sure... It might also be a kernel issue... grhhh... I'm not sure anymore... I'm waiting for my backup to complete and then I'm going to reboot and test again.

comment:77 in reply to: ↑ 76 Changed 2 years ago by Hachiman

Replying to sirkubax: 4.1.8 is quite old. Could you please try with 4.1.12 and attach the log from the newer version?

comment:78 Changed 2 years ago by sirkubax

Hi,

Thanks for the reply. I will try to update, although VBox 4.1.8 is the current stable version on Gentoo, and I'm usually against installing software that is not from portage (the package system) on a production server (yes, it's part of my prod).

I can't rule it out, but a hardware error is hardly possible, because I've found the same error on my other, 2-year-old VBox server with 200 days of uptime.

Jul  8 18:11:12 localhost kernel: [    0.000000] Linux version 2.6.37-gentoo-r4 (root@eywa) (gcc version 4.4.4 (Gentoo 4.4.4-r2 p1.3, pie-0.4.5) ) #1 SMP Mon May 16 20:27:20 CEST 201

Aug 22 05:49:24 localhost kernel: [3833981.422390] ata5.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x6 frozen
Aug 22 05:49:24 localhost kernel: [3833981.422393] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422398] ata5.00: cmd 60/20:00:c9:c7:a0/00:00:53:00:00/40 tag 0 ncq 16384 in
Aug 22 05:49:24 localhost kernel: [3833981.422399]          res 40/00:01:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422402] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422404] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422408] ata5.00: cmd 60/38:08:89:b3:42/00:00:57:00:00/40 tag 1 ncq 28672 in
Aug 22 05:49:24 localhost kernel: [3833981.422409]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422411] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422413] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422417] ata5.00: cmd 60/38:10:c9:b3:42/00:00:57:00:00/40 tag 2 ncq 28672 in
Aug 22 05:49:24 localhost kernel: [3833981.422418]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422420] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422421] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422425] ata5.00: cmd 60/28:18:09:b4:42/00:00:57:00:00/40 tag 3 ncq 20480 in
Aug 22 05:49:24 localhost kernel: [3833981.422426]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422428] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422430] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422434] ata5.00: cmd 60/40:20:39:b4:42/00:00:57:00:00/40 tag 4 ncq 32768 in
Aug 22 05:49:24 localhost kernel: [3833981.422435]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422437] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422438] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422443] ata5.00: cmd 60/48:28:91:b4:42/00:00:57:00:00/40 tag 5 ncq 36864 in
Aug 22 05:49:24 localhost kernel: [3833981.422444]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422446] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422448] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422452] ata5.00: cmd 60/08:30:e9:b4:42/00:00:57:00:00/40 tag 6 ncq 4096 in
Aug 22 05:49:24 localhost kernel: [3833981.422453]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422455] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422457] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422460] ata5.00: cmd 60/08:38:f9:b4:42/00:00:57:00:00/40 tag 7 ncq 4096 in
Aug 22 05:49:24 localhost kernel: [3833981.422461]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422464] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422466] ata5.00: failed command: READ FPDMA QUEUED
Aug 22 05:49:24 localhost kernel: [3833981.422470] ata5.00: cmd 60/18:40:09:b5:42/00:00:57:00:00/40 tag 8 ncq 12288 in
Aug 22 05:49:24 localhost kernel: [3833981.422471]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Aug 22 05:49:24 localhost kernel: [3833981.422473] ata5.00: status: { DRDY }
Aug 22 05:49:24 localhost kernel: [3833981.422477] ata5: hard resetting link
Aug 22 05:49:24 localhost kernel: [3833981.726670] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 22 05:49:25 localhost kernel: [3833981.805128] ata5.00: configured for UDMA/133
Aug 22 05:49:25 localhost kernel: [3833981.805135] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805139] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805142] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805145] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805147] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805151] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805153] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805156] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805159] ata5.00: device reported invalid CHS sector 0
Aug 22 05:49:25 localhost kernel: [3833981.805172] ata5: EH complete


Part of guest Vbox.log

00:00:00.240 VirtualBox 4.0.10 r72479 linux.amd64 (Jun 24 2011 15:48:02) release log
00:00:00.240 Log opened 2012-04-02T23:14:26.648753000Z
00:00:08.393 OS Product: Linux
00:00:08.393 OS Release: 2.6.37-gentoo-r4
00:00:08.393 OS Version: #1 SMP Mon May 16 20:27:20 CEST 2011
00:00:08.393 DMI Product Name: X8STi


comment:79 Changed 2 years ago by sirkubax

I switched to 4.1.10, and so far there have been no errors, but I have not stressed the disk fully yet. I will let you know after the weekend :-)

Here is an interesting quotation:

 https://forums.virtualbox.org/viewtopic.php?f=3&t=46722&p=210902&hilit=invalid+chs+sector#p211916

One of the most important answers is right there - if you're on a Linux host and doing heavy disk I/O, do not use the host cache for the VMs, ever. The Linux I/O subsystem is not very smart: it batches gobs of dirty pages in the filesystem cache, and when it runs out of free memory, flushes out everything to disk. That can take quite a long time (minutes) and there's nothing VirtualBox can do about it.

The asynchronous I/O in VirtualBox was designed explicitly to work around this host OS deficiency. The I/O doesn't go through the host's cache and is written to disk much more frequently in smaller chunks. However, VirtualBox isn't necessarily the only process running on the host and something else still may trigger the undesirable behavior.

The corollary to the above is obvious: If your host can't cope with the I/O load generated by the VMs plus the rest of the system, there will be trouble. Virtualization isn't magic and can't turn a slow disk into a fast one.

There might be a case of some sectors being dropped when the disk bandwidth limit is reached.
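The advice quoted above (do not use the host cache on a Linux host under heavy disk I/O) maps to the `--hostiocache` option of `VBoxManage storagectl`. The sketch below only builds and prints the command line so it can be reviewed before running it. The VM name "apollo" and the controller name "SATA Controller" are placeholders and must be replaced with the names from your own VM settings; the change probably has to be made while the VM is powered off.

```python
import shlex


def hostiocache_command(vm_name, controller_name, enabled=False):
    """Build the argv for switching the host I/O cache of one storage controller."""
    return [
        "VBoxManage", "storagectl", vm_name,
        "--name", controller_name,
        "--hostiocache", "on" if enabled else "off",
    ]


if __name__ == "__main__":
    # Placeholder names; nothing is executed here, the command is only printed.
    argv = hostiocache_command("apollo", "SATA Controller")
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing instead of executing is deliberate: on a production host it seems safer to look at the exact command first and run it by hand.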

comment:80 Changed 22 months ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Please reopen if still relevant with VBox 4.1.18.
