#9975 closed defect (fixed)
Virtual HDD becomes unavailable for guest after a Canceled write is logged
Reported by: | Ariel | Owned by: | |
---|---|---|---|
Component: | virtual disk | Version: | VirtualBox 4.1.6 |
Keywords: | | Cc: | |
Guest type: | Windows | Host type: | Linux |
Description
The guest is a Windows SBS Server 2008 with 2 virtual CPUs, 4 GB of virtual RAM and two virtual HDDs (C: with 40 GB and D: with 20 GB, both with lots of free space). This SBS server acts as an SBS primary server (i.e., AD controller); it has had its MS Exchange and MS SharePoint removed and an SQL Server 2008 has been installed (besides the SQL Server 2005 Express that comes with SBS and which cannot be removed). This is a supported configuration for SBS, even if it's a bit unusual. It has not been customized significantly yet because we were just testing feasibility. The guest is running the latest Guest Additions (matching the host's VirtualBox version).
The problem is the guest works fine... until it doesn't. When the problem appears, guest applications start hanging pretty fast. A few tests have shown that the (virtual) C: drive has stopped responding, and applications die only when they try to access it. The failing drive can also be the D: drive instead. The guest system cannot be shut down in this situation: the only option is powering it off or waiting for the Windows kernel to bluescreen and reset.
When the guest system is hung like that, one or more messages like this one can be seen in the VBox.log file:
53:11:46.112 AHCI#0: Canceled write at offset 29189742592 (4096 bytes left) returned rc=VINF_SUCCESS
The return code, if that's what it is, is always VINF_SUCCESS. The offset and the number of bytes left may vary.
Thinking it could be a problem with flushing, I have tried setting:
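Since the offset, byte count and return code vary between occurrences, it can help to pull them out of the log in one pass. The following is only a sketch: `canceled_writes` is a hypothetical helper name, and the pattern assumes the log lines look exactly like the one quoted above (optionally with a port suffix such as `AHCI#0P2`).

```shell
# Hypothetical helper: list "Canceled write" events from a VBox.log-style file.
# Prints one line per event: <timestamp> <offset> <bytes_left> <rc>
# Assumes the line format quoted in this report; other formats are skipped.
canceled_writes() {
    sed -n -E 's/^([0-9:.]+) AHCI#[0-9]+(P[0-9]+)?: Canceled write at offset ([0-9]+) \(([0-9]+) bytes left\) returned rc=([A-Z_]+).*$/\1 \3 \4 \5/p' "$1"
}
```

Calling `canceled_writes VBox.log` prints one summary line per matching event, which makes it easier to compare offsets and timestamps across several hangs.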
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#1/Config/IgnoreFlush" 0
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#0/Config/FlushInterval" 1000000
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#1/Config/FlushInterval" 1000000
And:
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#0/Config/IgnoreFlush" 0
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#1/Config/IgnoreFlush" 0
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#0/Config/FlushInterval" 1
VBoxManage setextradata "VM002" "VBoxInternal/Devices/ahci/0/LUN#1/Config/FlushInterval" 1
But both ended up failing with the same error.
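The two attempts above differ only in the FlushInterval value, so the commands can be generated per LUN instead of typed by hand. This is a minimal sketch: `print_flush_commands` is a hypothetical helper that only prints the commands for review and does not run VBoxManage; the key paths are the ones quoted above.

```shell
# Sketch: print (not run) the setextradata commands for a set of AHCI LUNs.
# print_flush_commands is a hypothetical helper name; the extradata keys are
# copied from the report. Arguments: <vm name> <flush interval> <lun>...
print_flush_commands() {
    vm=$1
    interval=$2
    shift 2
    for lun in "$@"; do
        base="VBoxInternal/Devices/ahci/0/LUN#${lun}/Config"
        printf 'VBoxManage setextradata "%s" "%s/IgnoreFlush" 0\n' "$vm" "$base"
        printf 'VBoxManage setextradata "%s" "%s/FlushInterval" %s\n' "$vm" "$base" "$interval"
    done
}
```

For example, `print_flush_commands VM002 1000000 0 1` prints the four commands of the first attempt (grouped per LUN rather than per key), and `print_flush_commands VM002 1 0 1` prints those of the second.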
The problem is not easily reproducible. It sometimes fails several times in a row, then sometimes it works for a couple of days before failing again. It might be a VirtualBox bug somehow related to host load (but this server is nowhere near full utilization).
This problem has been found on an HP ML370 G6 dual Quad Xeon host with lots of RAM and 4 SAS 15000 RPM HDDs working in a RAID10 array. The host runs CentOS 5.7 and this host is, as you might guess, unusually fast. This host also runs another VirtualBox guest: it's a Windows Server 2008 R2 system that works fine and hasn't suffered from this problem (fingers crossed). This other guest runs under a different host user.
Attachments (4)
Change History (23)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
No, no messages in dmesg at that time, and nothing suspicious before that.
comment:3 by , 13 years ago
In case you're interested, I have a core dump of VBoxHeadless while the guest is locked up.
comment:4 by , 13 years ago
More info obtained through further testing:
The "Canceled write" messages sometimes appear a while before any symptom appears in the guest. Other times they appear simultaneously. On one occasion I left it hung and the Windows guest showed a blue screen with error 0x1000009f. On a different occasion I had seen a different error code which I don't have at hand now, but I researched it at the time and it was clearly related to a "missing" C: drive.
comment:5 by , 13 years ago
This problem has now been confirmed with the same virtual machine on a different host (an HP ML110 G7 with SATA disks).
comment:6 by , 13 years ago
On only one occasion I got the following error message *after* all the "Canceled write" ones:
00:01:40.232 VRDP: shadowBufferOrder: pointer 00002aab91f36981/606183713 is out of range [00002aaab5f3b344; 00002aaab6150aa2) after op 7, 00002aaab5f3b364, 2185022
Also, a couple of times I saw the same guest symptoms even though there were no "Canceled write" messages in the log. I now think the canceled write messages may be just a symptom and not close to the cause of the problem; they are merely correlated with it.
I have a couple of snapshots where the problem arises quickly, almost every time, upon doing some operation. Let me know if any of this is useful to you.
comment:7 by , 13 years ago
This bug is still present in VirtualBox 4.1.8. Please let me know how I can help get this resolved. Thanks.
comment:8 by , 13 years ago
Try setting your HDD as an IDE drive rather than SATA; I was having the same issue until I did so.
comment:9 by , 13 years ago
I'm hit by the same bug. Under Ubuntu 8.04 Linux 64-bit, with VirtualBox 4.1.6 and 4.1.8, AHCI suddenly started causing this error (it had been running OK for a long time). I had to switch to IDE. The guest OS is Ubuntu 8.04 Server.
On another server, where I have 8 VMs (all Windows XP guests), I was hit by this bug randomly on each of those VMs.
comment:10 by , 12 years ago
I am having issues similar to this; switching to IDE *seems* to resolve the issue.
follow-up: 13 comment:11 by , 12 years ago
Description: | modified (diff) |
---|
Then I assume that enabling the host I/O cache for the SATA controller would have the same effect (VM settings).
comment:12 by , 11 years ago
I have the same problem with 4.3.8-92456:
313:49:08.022482 AHCI#0P2: Cancelled task 5
313:49:08.069531 AHCI#0: Port 2 reset
313:49:08.425542 AHCI#0P2: Canceled write at offset 13813657088 (4096 bytes left) returned rc=VINF_SUCCESS
313:49:09.082520 AHCI#0: Port 2 reset
This is easily reproducible: you need a slow disk system on the host (in my case a RAID6 array), AHCI in the guest, and a heavy stream of writes to the disk (it need not last long).
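When checking whether a reproduction attempt triggered the problem, counting the port resets per port in the log gives a quick answer. This is a sketch only: `port_resets` is a hypothetical helper, and it assumes one log entry per line in the `AHCI#<n>: Port <m> reset` form shown in the excerpt above.

```shell
# Hypothetical helper: count "Port N reset" entries per port in a VBox.log-style file.
# Prints lines of the form "port <N>: <count>", sorted by port number.
port_resets() {
    sed -n -E 's/.*AHCI#[0-9]+: Port ([0-9]+) reset.*/\1/p' "$1" \
        | sort -n \
        | uniq -c \
        | awk '{ print "port " $2 ": " $1 }'
}
```

Running `port_resets VBox.log` before and after a heavy write test shows whether new resets were logged and on which port.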
For me it's a real problem. I first ran into it about half a year ago, searched for information for a long time, and tried many configuration variants, all without result.
I have now moved the critically important VDI files to an SSD, but sometimes, like today, I hit this problem on a non-SSD drive and need to reboot the guest. It's very, very poor.
What should I do to resolve or work around this problem? iSCSI?
comment:13 by , 10 years ago
Replying to frank:
Then I assume that enabling the host I/O cache for the SATA controller would have the same effect (VM settings).
Nope. I have seen this problem with host I/O cache enabled. BTW, I am the original post author (karmapolis). I couldn't recover that account after they merged the bug tracker with the Oracle authentication system.
comment:15 by , 10 years ago
I have the same problem; it occurs almost every hour when copying files. The host system is Ubuntu Linux, the guest is Solaris 11, and VirtualBox is v4.3.20. When copying a large number of files on the guest system via NFS, the virtual hard disks fail in the guest about once per hour. VirtualBox logs these errors:
AHCI#0: Port 0 reset
AHCI#0: Port 1 reset
AHCI#0: Port 2 reset
AHCI#0: Port 3 reset
AHCI#0: Port 4 reset
AHCI#0: Port 5 reset
The guest system logs these errors:
sata: WARNING: /pci@0,0/pci8086,2829@d:
SATA device detached at port 0
sata: WARNING: /pci@0,0/pci8086,2829@d:
SATA device detached at port 1
sata: WARNING: /pci@0,0/pci8086,2829@d:
SATA device detached at port 2
sata: WARNING: /pci@0,0/pci8086,2829@d:
SATA device detached at port 3
sata: WARNING: /pci@0,0/pci8086,2829@d:
SATA device detached at port 4
sata: WARNING: /pci@0,0/pci8086,2829@d:
SATA device detached at port 5
The order of the errors varies from case to case, but all 6 disks always fail. The guest system then does not shut down normally. VirtualBox crashes during shutdown of the host system. This error has occurred in each of the last 20 attempts at copying files. I am attaching the VirtualBox logs.
comment:16 by , 10 years ago
I have the same problem, these are the last lines of my syslog ... https://gist.github.com/schmunk42/bb6dafcd4636c827102e
VM is running from an iSCSI device.
comment:17 by , 9 years ago
I keep having this problem with a Windows host with a Linux guest. System runs fine for a few days, then all the SATA ports reset and the load skyrockets. I can magic-sysrq reboot, but the guest never reboots. Resetting the guest locks up the virtualbox window, and I have to abort/end task.
comment:18 by , 7 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Please reopen if still relevant with VBox 5.1.22.
comment:19 by , 2 years ago
I have the same problem in version 6.1.26_Ubuntur145957. It is running 2 VMs with high I/O usage; the host has a RAID10 array of 4 SATA 3 disks. From time to time (every 2 or 3 days) a VM fails, losing its disk.
In VBox.log I see:
152692-15:22:34.884927 VD#1: Write request was active for 30 seconds
152754:15:22:34.884928 VD#1: Aborted write (0 bytes left) returned rc=VERR_PDM_MEDIAEX_IOREQ_CANCELED
152849:15:22:34.884931 AHCI#0P1: Canceled write at offset 144357367808 (1310720 bytes left) returned rc=VERR_PDM_MEDIAEX_IOREQ_CANCELED
I have tried using isci and SATA controllers in the guest. I have also tried reducing the CPUs assigned to the VMs, leaving a physical one free for VBox.
Did you see any messages in the host kernel log (dmesg) when this happens?