#10031 reopened defect

High I/O causing filesystem corruption

Reported by:	Indigo42	Owned by:
Component:	virtual disk	Version:	VirtualBox 4.1.4
Keywords:		Cc:
Guest type:	Linux	Host type:	Linux

Description (last modified by Klaus Espenlaub)

Hello, We are running an Ubuntu 10.04 32bit guest on a RHEL 64 bit host. The guest storage control is SATA AHCI with host I/O caching enabled. Host info: OS Type: Linux (2.6.18-194.el5) VirtualBox: 4.1.4 (74291) Processors: Intel(R) Xeon(R) CPU 3060 @ 2.40GHz (2) HWVirtEx, PAE, Long Mode (64-bit)

It seems during times of high I/O, like backups and etc, the guest filesystem resets to read-only. We have this issue with both ext3 and ext4 filesystems. Switching to ext3 seemed to lessen the occurrences though.

Here is a snip of syslog around the time of the corruption..

Dec 15 08:01:48 vault sm-mta[1437]: rejecting connections on daemon MTA-v4: load average: 12
Dec 15 08:01:48 vault sm-mta[1437]: rejecting connections on daemon MSP-v4: load average: 12
Dec 15 08:02:03 vault sm-mta[1437]: rejecting connections on daemon MTA-v4: load average: 12
Dec 15 08:02:03 vault sm-mta[1437]: rejecting connections on daemon MSP-v4: load average: 12
Dec 15 08:02:18 vault sm-mta[1437]: rejecting connections on daemon MTA-v4: load average: 12
Dec 15 08:02:18 vault sm-mta[1437]: rejecting connections on daemon MSP-v4: load average: 12
Dec 15 08:02:29 vault kernel: [ 2338.011698] ata4.00: exception Emask 0x0 SAct 0xffff SErr 0x0 action 0x6 frozen
Dec 15 08:02:29 vault kernel: [ 2338.011714] ata4.00: failed command: WRITE FPDMA QUEUED
Dec 15 08:02:29 vault kernel: [ 2338.011721] ata4.00: cmd 61/00:00:b7:5c:9d/04:00:09:00:00/40 tag 0 ncq 524288 out
Dec 15 08:02:29 vault kernel: [ 2338.011723]          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
Dec 15 08:02:29 vault kernel: [ 2338.011726] ata4.00: status: { DRDY }
Dec 15 08:02:29 vault kernel: [ 2338.011729] ata4.00: failed command: WRITE FPDMA QUEUED
Dec 15 08:02:29 vault kernel: [ 2338.011736] ata4.00: cmd 61/00:08:77:31:9d/04:00:09:00:00/40 tag 1 ncq 524288 out
Dec 15 08:02:29 vault kernel: [ 2338.011737]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 15 08:02:29 vault kernel: [ 2338.011740] ata4.00: status: { DRDY }
Dec 15 08:02:29 vault kernel: [ 2338.011743] ata4.00: failed command: WRITE FPDMA QUEUED
Dec 15 08:02:29 vault kernel: [ 2338.011749] ata4.00: cmd 61/00:10:77:35:9d/04:00:09:00:00/40 tag 2 ncq 524288 out
Dec 15 08:02:29 vault kernel: [ 2338.011751]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

And then later on in the file....

Dec 15 08:02:29 vault kernel: [ 2338.011969] ata4: hard resetting link
Dec 15 08:02:29 vault kernel: [ 2338.332268] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 15 08:02:33 vault kernel: [ 2342.108804] ata4.00: configured for UDMA/133
Dec 15 08:02:33 vault kernel: [ 2342.108811] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108815] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108818] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108820] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108823] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108826] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108829] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108831] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108834] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108837] ata4.00: device reported invalid CHS sector 0
Dec 15 08:02:33 vault kernel: [ 2342.108840] ata4.00: device reported invalid CHS sector 0

At this point the guest needs to be shut down and the drive mounted and fsck, which complains about allot of IO handle stuff.

Attachments (8)

VBox.log (54.0 KB ) - added by Indigo42 13 years ago.: Logfile for the guest
VBox.log.1 (93.5 KB ) - added by Indigo42 13 years ago.: Second Logfile for the Guest.
VBox.2.log (153.6 KB ) - added by eguerin 7 years ago.
VBoxHardening.log (340.4 KB ) - added by eguerin 7 years ago.
VBox.log.2.1 (83.2 KB ) - added by eguerin 7 years ago.
VBox.log.2 (86.1 KB ) - added by eguerin 7 years ago.
VBox.log.3 (491.2 KB ) - added by eguerin 7 years ago.
messages (149.1 KB ) - added by eguerin 7 years ago.

Download all attachments as: .zip

Change History (33)

by Indigo42, 13 years ago

Attachment:	VBox.log added

Logfile for the guest

by Indigo42, 13 years ago

Attachment:	VBox.log.1 added

Second Logfile for the Guest.

comment:1 by localhostargentina, 13 years ago

This bug happens long time ago, since 3.x versions, i tried to 4.1.6 and kernel 3.1.5 and it still happens. All the guest linux when have a lot of disk usage, they filesystem crash and go to read only. We have to restart, repair filesystem and reboot again to make it work. We tried AMD and Intel (with VT extension) too and all the flags like ioapic, pae, nestedpaging, vtxvpid, different chipsets, hwvirtex, hwvirtexexcl, hpet, largepages, synthcpu and pagefusion on and off, and with/without virtualbox-guest-additions and nothing changes.

Please to reproduce, use various virtualbox virtual machines on the same server and one vm use for a while too much disk resources and the filesystem will crash.

In the host, i have a lot of "page allocation failure" like this:

VBoxHeadless:

page allocation failure: order:9, mode:0x344d2
Pid: 3776, comm: VBoxHeadless Not tainted 3.1.5-gentoo #1
Call Trace:
 [<ffffffff8109e887>] ? warn_alloc_failed+0x106/0x130
 [<ffffffff810a18be>] ? __alloc_pages_nodemask+0x6bc/0x6f9
 [<ffffffffa0020a48>] ? rtR0MemAllocEx+0xbe/0x116 [vboxdrv]
 [<ffffffffa0020dde>] ? rtR0TermNative+0x264/0x54c [vboxdrv]
 [<ffffffffa0020f0f>] ? rtR0TermNative+0x395/0x54c [vboxdrv]
 [<ffffffffa0033edf>] ? g_abExecMemory+0x3e3f/0x180000 [vboxdrv]
 [<ffffffffa003874d>] ? g_abExecMemory+0x86ad/0x180000 [vboxdrv]
 [<ffffffffa0044367>] ? g_abExecMemory+0x142c7/0x180000 [vboxdrv]
 [<ffffffffa0046179>] ? g_abExecMemory+0x160d9/0x180000 [vboxdrv]
 [<ffffffffa0046c30>] ? g_abExecMemory+0x16b90/0x180000 [vboxdrv]
 [<ffffffffa003aac7>] ? g_abExecMemory+0xaa27/0x180000 [vboxdrv]
 [<ffffffffa001cc75>] ? supdrvIOCtl+0x128b/0x23d2 [vboxdrv]
 [<ffffffffa0020a48>] ? rtR0MemAllocEx+0xbe/0x116 [vboxdrv]
 [<ffffffffa001834a>] ? SUPR0Printf+0x1bb/0x235 [vboxdrv]
 [<ffffffff810e3692>] ? do_vfs_ioctl+0x3fd/0x43e
 [<ffffffff8107a99e>] ? sys_futex+0x132/0x144
 [<ffffffff810e370f>] ? sys_ioctl+0x3c/0x61
 [<ffffffff810e36d5>] ? sys_ioctl+0x2/0x61
 [<ffffffff813f293b>] ? system_call_fastpath+0x16/0x1b
Mem-Info:
Node 0 DMA per-cpu:
CPU    0: hi:    0, btch:   1 usd:   0
CPU    1: hi:    0, btch:   1 usd:   0
CPU    2: hi:    0, btch:   1 usd:   0
CPU    3: hi:    0, btch:   1 usd:   0
Node 0 DMA32 per-cpu:
CPU    0: hi:  186, btch:  31 usd:  88
CPU    1: hi:  186, btch:  31 usd:   0
CPU    2: hi:  186, btch:  31 usd:   0
CPU    3: hi:  186, btch:  31 usd:   0
Node 0 Normal per-cpu:
CPU    0: hi:  186, btch:  31 usd:  35
CPU    1: hi:  186, btch:  31 usd:   0
CPU    2: hi:  186, btch:  31 usd:   0
CPU    3: hi:  186, btch:  31 usd:   0
active_anon:55213 inactive_anon:13423 isolated_anon:0
 active_file:591922 inactive_file:592035 isolated_file:0
 unevictable:0 dirty:38310 writeback:280 unstable:0
 free:227453 slab_reclaimable:39250 slab_unreclaimable:26638
 mapped:83791 shmem:0 pagetables:6131 bounce:0
Node 0 DMA free:15924kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15700kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 3254 16132 16132
Node 0 DMA32 free:219600kB min:13620kB low:17024kB high:20428kB active_anon:57196kB inactive_anon:35036kB active_file:755084kB inactive_file:755128kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3332896kB mlocked:0kB dirty:55196kB writeback:220kB mapped:8040kB shmem:0kB slab_reclaimable:43016kB slab_unreclaimable:9960kB kernel_stack:8kB pagetables:2396kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:208 all_unreclaimable? no
lowmem_reserve[]: 0 0 12877 12877
Node 0 Normal free:674288kB min:53896kB low:67368kB high:80844kB active_anon:163656kB inactive_anon:18656kB active_file:1612428kB inactive_file:1613012kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13186560kB mlocked:0kB dirty:98044kB writeback:900kB mapped:327124kB shmem:0kB slab_reclaimable:113984kB slab_unreclaimable:96592kB kernel_stack:2096kB pagetables:22128kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0 0
Node 0 DMA: 1*4kB 0*8kB 1*16kB 1*32kB 2*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15924kB
Node 0 DMA32: 3024*4kB 2090*8kB 1462*16kB 875*32kB 528*64kB 249*128kB 96*256kB 48*512kB 22*1024kB 1*2048kB 0*4096kB = 219600kB
Node 0 Normal: 4490*4kB 5913*8kB 4506*16kB 3159*32kB 1876*64kB 965*128kB 313*256kB 107*512kB 54*1024kB 1*2048kB 0*4096kB = 674288kB
1184083 total pagecache pages
130 pages in swap cache
Swap cache stats: add 9947, delete 9817, find 594/648
Free swap  = 13349756kB
Total swap = 13386492kB
4194288 pages RAM
2629507 pages reserved
893107 pages shared
433355 pages non-shared

and in the guest, they log things like this:

Buffer I/O error on device sda2, logical block 4099963
sd 0:0:0:0: [sda] Unhandled error code
sd 0:0:0:0: [sda]  Result: hostbyte=0x00 driverbyte=0x06
sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 69 65 3e 00 00 08 00
end_request: I/O error, dev sda, sector 6907198
Buffer I/O error on device sda2, logical block 835286
EXT4-fs warning (device sda2): ext4_end_bio:242: I/O error writing to inode 2341288 (offset 0 size 4096 starting block 863400)
JBD2: Detected IO errors while flushing file data on sda2-8

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: failed command: WRITE DMA
ata1.00: cmd ca/00:08:d6:a0:1b/00:00:00:00:00/e0 tag 0 dma 4096 out
        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: hard resetting link
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: configured for UDMA/133
ata1.00: device reported invalid CHS sector 0
ata1: EH complete

ata1.00: qc timeout (cmd 0xec)
ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata1.00: revalidation failed (errno=-5)
ata1: hard resetting link

ata1.00: status: { DRDY }
ata1.00: failed command: WRITE FPDMA QUEUED
ata1.00: cmd 61/08:30:a6:a1:44/00:00:04:00:00/40 tag 6 ncq 4096 out
        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

ata1.00: device reported invalid CHS sector 0

Please fix this problem, we are still waiting for a long time for this fix.

Regards;

comment:2 by msurkova, 13 years ago

We also have similar messages at guest logs sometimes during high I/O load (sometimes caused by guests, sometimes caused by host applications):

Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771243] ata4.00: status: { DRDY } Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771247] ata4.00: failed command: WRITE FPDMA QUEUED Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771253] ata4.00: cmd 61/08:08:a7:43:48/00:00:07:00:00/40 tag 1 ncq 4096 out Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771257] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771261] ata4.00: status: { DRDY } Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771265] ata4.00: failed command: WRITE FPDMA QUEUED Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771270] ata4.00: cmd 61/10:10:2f:1b:87/00:00:07:00:00/40 tag 2 ncq 8192 out Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771273] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771277] ata4.00: status: { DRDY } Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771280] ata4.00: failed command: WRITE FPDMA QUEUED Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771285] ata4.00: cmd 61/b0:18:87:a4:87/00:00:07:00:00/40 tag 3 ncq 90112 out Dec 28 17:09:07 xxxxxxxxx kernel: [182026.771288] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

However we do not have drastic consequences like filesystem corruption maybe because problem goes away before guest OS retries operation. We are running 64-bit Ubuntu 10.04 server on 24-CPU server with 64Gb RAM as host and mix of OpenSuse 11.3 and Ubuntu 10.10 as guests. host cache is enabled since VM disk images are on ext4 filesystem. After digging around it seems that it may be not a bug in VirtualBox itself, but rather I/O timeout issue when host OS flushes dirty disk buffers to disk storage. This may result in VirtualBox disk I/O subsystem to stop on write() call for some time and in turn cause timeouts in guest SATA code. So this may be problem for any type 2 virtualizer running on top of host file system cache.

If this is sensible, I would like to get advice how to prevent such undesired behavior. Following approaches are possible: 1). Upgrade host OS kernel to 2.6.36+ and disable "host I/O cache" for virtual SATA adapters. This will allow VirtualBox to bypass host file system buffers, not pollute them with dirty pages and not be stopped during buffer flush. In fact this is even recommended in VirtualBox manual (except for mentioning issue with ext4 host file system) 2). Try to tune host os page flusher behavior to avoid accumulating of significant amount of dirty pages in host file system buffers. For example, set vm.dirty_background_ratio to 5 or below (this is still large size - about 1Gb) or set vm.dirty_background_bytes to something smaller like 20971520 to make page flusher more active. Note that this may make host file system cache less effective, but it prevents write operation stuck during buffer flush. 3). Tune VirtualBox to flush VM image files more often to prevent making lot of pages in host buffer cache dirty as described here: http://www.virtualbox.org/manual/ch12.html#configPeriodicFlush

Could VirtualBox developers tell if this makes sense and if so, which approach to take ?

Thanks a lot !

follow-up: 4 comment:3 by g.andrew.stone@…, 13 years ago

I am seeing the exact same thing. Host Ubuntu 11.10/64 guest ubuntu 10.04/64 virtual box 4.1.2

in reply to: 3 comment:4 by Frank Mehnert, 13 years ago

Replying to g.andrew.stone@gmail.com:

I am seeing the exact same thing. Host Ubuntu 11.10/64 guest ubuntu 10.04/64 virtual box 4.1.2

Please try VirtualBox 4.1.8.

comment:5 by Olegch, 13 years ago

I am experiencing exactly same problem with VirtualBox 4.1.8

Host: Ubuntu Linux 3.0.0-15-generic #26-Ubuntu SMP Fri Jan 20 17:23:00 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Guest: CentOS 5 Linux 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

comment:6 by FrozenCow, 13 years ago

I seem to have exactly the same problem. The host machine runs Ubuntu 10.04 with VirtualBox 4.1.8r75467. It runs 5 virtual machines, both Arch Linux and Ubuntu 10.04. I've been running VirtualBox for a year now in this setup and it has giving me this error a number of times over the year. When this happens most guests output lines with "failed command: WRITE FPDMA QUEUED". The host machine outputs following to dmesg:

[2583901.668803] VBoxHeadless: page allocation failure. order:4, mode:0x4020
[2583901.668808] Pid: 1482, comm: VBoxHeadless Not tainted 2.6.32-21-generic #32-Ubuntu
[2583901.668811] Call Trace:
[2583901.668822]  [<ffffffff810f9d7e>] __alloc_pages_slowpath+0x56e/0x580
[2583901.668827]  [<ffffffff810f9eee>] __alloc_pages_nodemask+0x15e/0x1a0
[2583901.668832]  [<ffffffff81131c12>] kmalloc_large_node+0x62/0xb0
[2583901.668837]  [<ffffffff81136265>] __kmalloc_node_track_caller+0x125/0x180
[2583901.668842]  [<ffffffff81452e7d>] ? dev_alloc_skb+0x1d/0x40
[2583901.668845]  [<ffffffff81452b90>] __alloc_skb+0x80/0x190
[2583901.668849]  [<ffffffff81452e7d>] dev_alloc_skb+0x1d/0x40
[2583901.668856]  [<ffffffffa02ccb0a>] vboxNetFltLinuxSkBufFromSG+0x3a/0x410 [vboxnetflt]
[2583901.668861]  [<ffffffffa02ccf85>] vboxNetFltPortOsXmit+0xa5/0xd0 [vboxnetflt]
[2583901.668866]  [<ffffffffa02ce109>] vboxNetFltPortXmit+0x69/0x1d0 [vboxnetflt]
[2583901.668871]  [<ffffffffa02ce0a1>] ? vboxNetFltPortXmit+0x1/0x1d0 [vboxnetflt]
[2583901.668886]  [<ffffffffa0155029>] g_abExecMemory+0x4ec89/0x180000 [vboxdrv]
[2583901.668891]  [<ffffffff81052a60>] ? update_cpu_power+0x80/0xf0
[2583901.668896]  [<ffffffff81540bbf>] ? _spin_lock_irqsave+0x2f/0x40
[2583901.668910]  [<ffffffffa00f5985>] ? VBoxHost_RTSpinlockReleaseNoInts+0x15/0x20 [vboxdrv]
[2583901.668922]  [<ffffffffa0153b77>] ? g_abExecMemory+0x4d7d7/0x180000 [vboxdrv]
[2583901.668934]  [<ffffffffa0159b45>] g_abExecMemory+0x537a5/0x180000 [vboxdrv]
[2583901.668938]  [<ffffffff81540c9e>] ? _spin_lock+0xe/0x20
[2583901.668951]  [<ffffffffa00f5942>] ? VBoxHost_RTSpinlockRelease+0x12/0x20 [vboxdrv]
[2583901.668961]  [<ffffffffa00e9da0>] ? SUPR0ObjAddRefEx+0x130/0x240 [vboxdrv]
[2583901.668976]  [<ffffffffa00fd4ac>] ? VBoxHost_RTHandleTableLookupWithCtx+0x9c/0x130 [vboxdrv]
[2583901.668988]  [<ffffffffa015aa49>] g_abExecMemory+0x546a9/0x180000 [vboxdrv]
[2583901.668999]  [<ffffffffa01876ec>] g_abExecMemory+0x8134c/0x180000 [vboxdrv]
[2583901.669011]  [<ffffffffa01886d0>] g_abExecMemory+0x82330/0x180000 [vboxdrv]
[2583901.669022]  [<ffffffffa0189985>] g_abExecMemory+0x835e5/0x180000 [vboxdrv]
[2583901.669033]  [<ffffffffa0125a5b>] g_abExecMemory+0x1f6bb/0x180000 [vboxdrv]
[2583901.669044]  [<ffffffffa0126770>] ? g_abExecMemory+0x203d0/0x180000 [vboxdrv]
[2583901.669055]  [<ffffffffa0117e81>] g_abExecMemory+0x11ae1/0x180000 [vboxdrv]
[2583901.669067]  [<ffffffffa011511e>] ? g_abExecMemory+0xed7e/0x180000 [vboxdrv]
[2583901.669078]  [<ffffffffa0152907>] g_abExecMemory+0x4c567/0x180000 [vboxdrv]
[2583901.669088]  [<ffffffffa011c2bc>] g_abExecMemory+0x15f1c/0x180000 [vboxdrv]
[2583901.669098]  [<ffffffffa00e899a>] supdrvIOCtlFast+0x8a/0xa0 [vboxdrv]
[2583901.669110]  [<ffffffffa00e82ec>] VBoxDrvLinuxIOCtl_4_1_8+0x4c/0x1d0 [vboxdrv]
[2583901.669114]  [<ffffffff81053980>] ? __dequeue_entity+0x30/0x50
[2583901.669119]  [<ffffffff8101179c>] ? __switch_to+0x1ac/0x320
[2583901.669126]  [<ffffffff81152a92>] vfs_ioctl+0x22/0xa0
[2583901.669129]  [<ffffffff8153e738>] ? thread_return+0x48/0x420
[2583901.669133]  [<ffffffff81152d41>] do_vfs_ioctl+0x81/0x380
[2583901.669137]  [<ffffffff811530c1>] sys_ioctl+0x81/0xa0
[2583901.669141]  [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[2583901.669144] Mem-Info:
[2583901.669146] Node 0 DMA per-cpu:
[2583901.669149] CPU    0: hi:    0, btch:   1 usd:   0
[2583901.669151] CPU    1: hi:    0, btch:   1 usd:   0
[2583901.669154] CPU    2: hi:    0, btch:   1 usd:   0
[2583901.669157] CPU    3: hi:    0, btch:   1 usd:   0
[2583901.669159] CPU    4: hi:    0, btch:   1 usd:   0
[2583901.669161] CPU    5: hi:    0, btch:   1 usd:   0
[2583901.669164] CPU    6: hi:    0, btch:   1 usd:   0
[2583901.669166] CPU    7: hi:    0, btch:   1 usd:   0
[2583901.669168] Node 0 DMA32 per-cpu:
[2583901.669171] CPU    0: hi:  186, btch:  31 usd: 123
[2583901.669173] CPU    1: hi:  186, btch:  31 usd: 172
[2583901.669175] CPU    2: hi:  186, btch:  31 usd:  29
[2583901.669178] CPU    3: hi:  186, btch:  31 usd:  41
[2583901.669180] CPU    4: hi:  186, btch:  31 usd: 138
[2583901.669182] CPU    5: hi:  186, btch:  31 usd:  11
[2583901.669185] CPU    6: hi:  186, btch:  31 usd: 153
[2583901.669187] CPU    7: hi:  186, btch:  31 usd: 147
[2583901.669189] Node 0 Normal per-cpu:
[2583901.669191] CPU    0: hi:  186, btch:  31 usd:  44
[2583901.669194] CPU    1: hi:  186, btch:  31 usd: 166
[2583901.669196] CPU    2: hi:  186, btch:  31 usd:  64
[2583901.669198] CPU    3: hi:  186, btch:  31 usd:  58
[2583901.669200] CPU    4: hi:  186, btch:  31 usd:  71
[2583901.669203] CPU    5: hi:  186, btch:  31 usd:  14
[2583901.669205] CPU    6: hi:  186, btch:  31 usd:  31
[2583901.669207] CPU    7: hi:  186, btch:  31 usd: 168
[2583901.669213] active_anon:115139 inactive_anon:15245 isolated_anon:8
[2583901.669214]  active_file:2290459 inactive_file:2298973 isolated_file:48
[2583901.669215]  unevictable:0 dirty:474987 writeback:30553 unstable:0
[2583901.669217]  free:28813 slab_reclaimable:136963 slab_unreclaimable:20844
[2583901.669218]  mapped:888150 shmem:86 pagetables:3706 bounce:0
[2583901.669221] Node 0 DMA free:15800kB min:12kB low:12kB high:16kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15216kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[2583901.669232] lowmem_reserve[]: 0 2991 24201 24201
[2583901.669237] Node 0 DMA32 free:85700kB min:2460kB low:3072kB high:3688kB active_anon:2152kB inactive_anon:4308kB active_file:1202816kB inactive_file:1203848kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3063520kB mlocked:0kB dirty:224284kB writeback:13476kB mapped:50360kB shmem:0kB slab_reclaimable:137644kB slab_unreclaimable:8784kB kernel_stack:24kB pagetables:140kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[2583901.669250] lowmem_reserve[]: 0 0 21210 21210
[2583901.669254] Node 0 Normal free:13752kB min:17440kB low:21800kB high:26160kB active_anon:458404kB inactive_anon:56672kB active_file:7959020kB inactive_file:7992044kB unevictable:0kB isolated(anon):32kB isolated(file):192kB present:21719040kB mlocked:0kB dirty:1675664kB writeback:108736kB mapped:3502240kB shmem:344kB slab_reclaimable:410208kB slab_unreclaimable:74592kB kernel_stack:2672kB pagetables:14684kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[2583901.669267] lowmem_reserve[]: 0 0 0 0
[2583901.669270] Node 0 DMA: 2*4kB 2*8kB 2*16kB 2*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15800kB
[2583901.669282] Node 0 DMA32: 65*4kB 10*8kB 3*16kB 4*32kB 3*64kB 202*128kB 155*256kB 18*512kB 6*1024kB 0*2048kB 1*4096kB = 85700kB
[2583901.669292] Node 0 Normal: 2672*4kB 116*8kB 19*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 13744kB
[2583901.669302] 4595822 total pagecache pages
[2583901.669305] 6272 pages in swap cache
[2583901.669307] Swap cache stats: add 12918, delete 6646, find 494548/494821
[2583901.669309] Free swap  = 48799272kB
[2583901.669311] Total swap = 48829432kB
[2583901.749131] 6291440 pages RAM
[2583901.749133] 992618 pages reserved
[2583901.749135] 3897567 pages shared
[2583901.749136] 1382636 pages non-shared

Each time this happened all virtual machines needed a reboot, since they remounted their root-partition as read-only. 3 of those times the disk of one particular (Ubuntu) virtual machine became corrupted: Ubuntu was not able to find some of its files when it rebooted afterwards. Using fsck I was able to get the important files back and needed to reinstall all packages to be able to boot again.

I don't know whether it is a bug in my hardware, Linux/Ubuntu or VirtualBox, but it is quite a severe problem for me.

What msurkova said (disk caching) is what I have been been thinking too after some more digging. I'd also -really- like some advice to try, so that I'm doing what is most sensible and not screw with settings I don't fully understand myself.

Thank you in advance.

comment:7 by V-Alexeev, 13 years ago

We're having exactly the same problem here. VirtualBox 4.1.8, Ubuntu 10.10 host, Ubuntu 11.10 guest.

Although it is much more severe for us: our server has a lot of i/o intensive tasks, and now the problem started to happen to us every 30-40 minutes, rendering guests completely unusable.

follow-up: 9 comment:8 by V-Alexeev, 13 years ago

Configuring periodic flush seems to fix the problem for us (https://www.virtualbox.org/manual/ch12.html#configPeriodicFlush)

in reply to: 8 comment:9 by fade2gray, 13 years ago

Replying to V-Alexeev:

Configuring periodic flush seems to fix the problem for us (https://www.virtualbox.org/manual/ch12.html#configPeriodicFlush)

Thanks for posting this - also worked for me running Ubuntu sever guest on low spec Windows host.

comment:10 by Klaus Espenlaub, 13 years ago

Description:	modified (diff)

Forcing periodic flushes can only be a workaround for some bug in the Linux kernel (losing control over I/O scheduling when lots of unwritten data is in the cache). So I'm not convinced at all that this is a good idea or the final word. VirtualBox does normal user land reads/writes from/to image files. No kernel level trickery involved at all. There have been other complaints about Linux becoming unresponsive when lots of data is written, e.g. https://bugzilla.kernel.org/show_bug.cgi?id=43094 - that one has been closed on formal grounds, but it's highly likely that the nvidia kernel module makes any difference. We'll see if we can provide a very very simple testcase for the Linux folks to reproduce.

comment:11 by m.s.s., 13 years ago

Yes, VirtualBox does normal user land read/writes. But most of normal user land applications are not too sensitive to read/write operation timing while guest kernel running in VirtualBox is. Therefore if write operation in the host takes too long (what can happen if host flushes too much dirty pages to disk) then guest kernel may consider it as SATA timeout and trigger error handling procedure. There are several ways to overcome this issue: 1). Configure virtualbox to bypass host cache (use direct I/O). This may cause issues with ext4 in some kernel versions (like in Ubuntu 10.04) and also may not be enough if host itself performs heavy write operations. 2). Use "raw" VMDK on top of LVM partitions to store VM disks rather than files on file system (this also avoids double file system overhead). 3). Configure host kernel to flush buffer cache more frequently to not create heavy write batches. 4). Use periodic flush in VirtualBox to limit amount of data in host buffer cache.

In general I think that may be only type 1 hypervisors with para-virtual disk drivers are completely free from this issue. Emulating of real hardware on top of user-land processes may always cause such issues.

comment:12 by aeichner, 9 years ago

Resolution:	→ obsolete
Status:	new → closed

Please reopen if still relevant with a recent VirtualBox release.

comment:13 by Steffen Moser, 7 years ago

Resolution:	obsolete
Status:	closed → reopened

Today, I observed file system corruption to two Ubuntu 16.04.3 LTS guests (all of them ext4 filesystems) simultaneously during high load on the host's I/O subsystem. The host is a Oracle Solaris 11.3 (SRU 28) server running VirtualBox 5.2.6. My question: Are there any known regressions in 5.2, maybe especially with Solaris hosts?

I've been running this setup for several years, but never observed these problems before upgrading to VirtualBox 5.2 about a month ago.

The host's typical I/O load patterns haven't changed since the VirtualBox upgrade. Today, I was copying some large files (> 30 GB) from one directory of the host's filesystem to another directory on the same filesystem. The filesystem is ZFS located on a SAS RAID. I recognized that during the copying process the I/O throughput on the host filesystem has been almost stalled. While I anticipated a slowdown of the VirtualBox VMs' throughput during the copy process, I didn't expect any corruption of their filesystems.

by eguerin, 7 years ago

Attachment:	VBox.2.log added

by eguerin, 7 years ago

Attachment:	VBoxHardening.log added

by eguerin, 7 years ago

Attachment:	VBox.log.2.1 added

by eguerin, 7 years ago

Attachment:	VBox.log.2 added

by eguerin, 7 years ago

Attachment:	VBox.log.3 added

by eguerin, 7 years ago

Attachment:	messages added

comment:14 by eguerin, 7 years ago

I have the same problem since 2/3 month.

Host: Windows 10
Guest: Red Hat 7.5 (7.4 too)
4 CPU cores
20 GB RAM
Controller SATA

When I'm decompressing an archive (>2GB) there is many I/O and then it's crash with DRDY message. See attach file "messages" file.

Tested on VirtualBox 5.2.8 and 5.2.6

Last edited 7 years ago by eguerin (previous) (diff)

comment:15 by idar, 7 years ago

I am also having this issue in 5.2.8 host ubuntu 16.04. Guest SLES 12sp3

We are still using 5.1, it works there.

This is stopping us from upgrading to ubuntu 18.04 as there is no package for virtualbox 5.1 on ubuntu 18.04 yet.

comment:16 by eguerin, 7 years ago

Could you, please, reopen this ticket for latest version? That's a critical bug: all vm with high I/O could be impacted. Currently my VM is dead.

comment:17 by cordway, 7 years ago

Same issue on FreeBSD 11.1 with virtualbox-ose-nox11-5.2.12_1.

comment:18 by Fingon Palantir, 7 years ago

Same issue on Ubuntu 18.04 with virtualbox-5.2.16-123759~Ubuntu~bionic

comment:19 by Armandzhao, 7 years ago

Seeing this happening with virtualbox 5.1.26r117224 VMs. Please help fix this.

comment:20 by coyotte, 6 years ago

Same here.

Host : Ubuntu 18.04 + Virtualbox 5.2.18-123745

A big impact each night at backup time

comment:21 by coyotte, 6 years ago

Any hope to have a fix or workaround ?

comment:22 by HonkingTrucker, 6 years ago

I have been suffering from this a lot with FreeBSD 11.2 and FreeBSD 12 hosts and Centos 7 guest using VBox version 5.2.26_3. I have tried different disks, individual vs raid, different file systems and dynamic vs fixed disk image sizes and this error was happening every single time during apt update.

Just out of coincidence I noticed useHostIOCache setting by glancing through the .vbox file. It was set false. I turned it on, rebooted, ran apt update and it went through. There has been zero issues ever since.

Hopefully it helps somebody, but YMMV.

Last edited 6 years ago by HonkingTrucker (previous) (diff)

comment:23 by Persistent_Annoyance, 5 years ago

Just to throw in my 2 cents. Having pretty must all the same issues on both ParrotOs (Debian) and Arch. On startup of the Windows 10 guest filesystem becomes read-only, tonnes of I/O errors, requires running fsck repair in order to boot back into host.

Not really looking for a solution seeing as this has been going on for the better part of a decade. I’m opting to rather ditch virtualbox for either another tool or just dual boot. Thought I’d add this in case anyone is struggling to find recent info on all the errors and problems

comment:24 by Graham Perrin, 4 years ago

Cc: grahamperrin

comment:25 by adrelanos, 4 years ago

Just randomly suggesting something that might help.

The following settings cannot fix already existing filesystem corruption inside the virtual harddrive. However, these settings might prevent future filesystem corruption. Therefore the best time to experiment with these settings is before starting a virtual machine for the first time or at least before issues happened.

VirtualBox -> VM Settings -> Storage -> controller for harddrive -> Name: virtio-scsi -> OK

Or.

VirtualBox -> VM Settings -> Storage -> controller for harddrive -> Name: experiment with other settings -> OK

Does that help?

Note: See TracTickets for help on using tickets.

Download in other formats: