VirtualBox

Ticket #10040 (new defect)

Opened 2 years ago

Last modified 2 years ago

Periodic crash of Windows Guests on Linux Host

Reported by: z5d69
Owned by:
Priority: major
Component: other
Version: VirtualBox 4.1.6
Keywords: Periodic Crash
Cc: david.e.lawrencejr@…
Guest type: Windows
Host type: Linux

Description

This has been going on for a while and it's somewhat sporadic: it happens roughly once a month. I have 3 guests running on an openSUSE 11.4 host. The host has 8 GB of RAM and 4 AMD cores, and roughly 6 GB is in use at any given time (free -m typically shows between 1.7 and 2 GB of RAM free on the buffers/cache line). The host also has 5 GB of swap, which is basically unused (maybe 30k out of 5 GB).

One of the 2 Windows guests on the host crashes periodically. I don't believe it has ever happened with the Linux guest. This latest time, the guest (an SMP install, with a cpucap of 4 and the limit set at 80%) crashed while idle. I had rebooted it about 3 weeks earlier, and it was just sitting there when it crashed.

I start this guest from the command line using nohup, so I have that output as well. Strangely, the nohup output appears to contain more information than the log file. I'm attaching all of it in the hope that you can tell me what is wrong.

I have searched through the forums and don't see anything like this.

Attachments

nohup.out (135.6 KB) - added by z5d69 2 years ago.
Nohup output file.
all_output.tar.gz (103.5 KB) - added by z5d69 2 years ago.
This contains the nohup.out, .vbox config, and vbox.log files.

Change History

Changed 2 years ago by z5d69

Nohup output file.

Changed 2 years ago by z5d69

This contains the nohup.out, .vbox config, and vbox.log files.

comment:1 follow-up: ↓ 2 Changed 2 years ago by frank

According to the log file it seems that the drive where you stored your virtual disk image is very slow. Also, are you running the three guests in parallel?

comment:2 in reply to: ↑ 1 ; follow-up: ↓ 3 Changed 2 years ago by z5d69

Replying to frank:

According to the log file it seems that the drive where you stored your virtual disk image is very slow. Also, are you running the three guests in parallel?

For whatever reason, I had an issue with my RAID 1 implementation (one disk working, the other not syncing), which is potentially the source of that problem. I had to rebuild the RAID array the VM is on, and it is no longer degraded. Since restarting I haven't seen the async IO log message that I sent (not sure if that's the basis for the slow-disk remark); however, I have also only run 2 VMs at a time, not 3.

Yes, typically I run 3 at a time but have cut back to 2 because of this issue.
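A degraded mirror like the one described above can often be spotted before it causes trouble. The following is a minimal sketch, assuming the array is Linux software RAID (md), which the ticket does not actually state; a hardware controller would need its vendor tool instead. The function name and the sample text are made up for illustration. It looks for a member-status field such as [U_], where an underscore marks a missing or failed member.

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status field shows a missing member.

    Assumes the usual /proc/mdstat layout: an 'mdN : ...' line followed by a
    line containing a status field such as [UU] (healthy) or [U_] (degraded).
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        status = re.search(r"\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None  # status line seen; wait for the next array header
    return degraded

# Illustrative sample only, not taken from the reporter's host.
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      104320 blocks [2/1] [U_]

md1 : active raid1 sdb2[1] sda2[0]
      488279488 blocks [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a real host the input would come from reading /proc/mdstat, for example with open("/proc/mdstat").read().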

comment:3 in reply to: ↑ 2 Changed 2 years ago by z5d69

How do I determine the time of an entry in the log? I'm a little confused as to how to map log time to clock time.

comment:4 Changed 2 years ago by frank

The log file shows the time since the VM started. 287:41:24.759 means 287 hours, 41 minutes and 24.759 seconds since the VM started. If necessary, this format can be changed by setting an environment variable.
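The conversion from log time to clock time can be sketched as follows. This is a minimal illustration, assuming the VM's start time is known from somewhere (the value used here is hypothetical, as are the function names); it simply adds the elapsed stamp to that start time.

```python
from datetime import datetime, timedelta

def parse_elapsed(stamp):
    """Parse an 'HHH:MM:SS.mmm' elapsed-time stamp into a timedelta."""
    hours, minutes, seconds = stamp.split(":")
    secs, _, frac = seconds.partition(".")
    # Pad or trim the fraction to exactly three digits (milliseconds).
    millis = int(frac.ljust(3, "0")[:3]) if frac else 0
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(secs), milliseconds=millis)

def wall_clock(vm_start, stamp):
    """Return the clock time of a log entry, given when the VM started."""
    return vm_start + parse_elapsed(stamp)

# Hypothetical start time; substitute the real start time of the VM.
vm_start = datetime(2012, 1, 1, 9, 0, 0)
print(wall_clock(vm_start, "287:41:24.759"))
```

The hours field is not capped at 24, which is why it is parsed separately rather than with a time-of-day parser.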

comment:5 Changed 2 years ago by z5d69

Okay, I don't know if this is the end-all/be-all of this issue; however, I determined that one of the drives underlying the RAID 1 that these VMs run on was defective. Once the drive was replaced, no further issues appeared. That said, WHY would a degraded array cause this? Admittedly, IO performance suffered because of the defective drive. Once the bad drive "kicked out", however, I would expect the VMs to continue running fine; I could still start and run them, since the array was still up. In a nutshell, I'm wondering if the "solution" is just a workaround. That is, one drive was slowing down IO for the VMs, and now that IO is running at full speed the problem no longer shows up, BUT if IO latency increased again (i.e., more VMs running at once or a slower set of drives), would the problem show up again?

Does it make sense that a degraded, but still functional, array would cause these VM crashes? Alternatively, is this actually independent of VirtualBox and related to a problem in Windows where high IO latency causes a crash? I've never had this happen on a bare-metal install of Windows, but maybe that was just luck with hard drives, until now?
