VirtualBox

Opened 8 years ago

Closed 8 years ago

#15223 closed defect (fixed)

VBoxNetNAT high CPU utilisation => Fixed in SVN

Reported by: planck_length Owned by:
Component: network/NAT Version: VirtualBox 5.0.16
Keywords: Cc:
Guest type: other Host type: Mac OS X

Description

I'm using NAT network on my Linux guests. Host is OS X 10.11.3

When I run Activity Monitor on OS X, VBoxNetNAT is always at 100% CPU utilisation (or close to it).

This only happens while there is at least one guest running.

Attachments (6)

VBox.log (167.3 KB ) - added by planck_length 8 years ago.
VBoxNetNAT_2016-03-10_161727_UfsN.sample.txt (46.3 KB ) - added by planck_length 8 years ago.
VBoxNetNAT_2016-03-11_112554_9uPT.sample.txt (54.9 KB ) - added by planck_length 8 years ago.
Screen Shot 2016-03-12 at 8.08.33 AM.png (95.4 KB ) - added by planck_length 8 years ago.
Screen Shot 2016-03-14 at 9.25.22 AM.png (99.5 KB ) - added by planck_length 8 years ago.
nat1.log.gz (199.1 KB ) - added by planck_length 8 years ago.

Download all attachments as: .zip

Change History (28)

by planck_length, 8 years ago

Attachment: VBox.log added

comment:1 by Valery Ushakov, 8 years ago

Host type: otherMac OS X

comment:2 by Valery Ushakov, 8 years ago

Please, can you sample the VBoxNetNAT process in that state?

by planck_length, 8 years ago

comment:3 by Valery Ushakov, 8 years ago

I don't see how that sample can correspond to 100% cpu load. Almost all out of 8908 samples are in some blocking/waiting primitive, so most of the time threads wait for stuff to happen. Only a few dozens (i.e. less than 1%) of samples are elsewhere. There's one call to tcp timer on lwIP2 thread (they are called every 250ms, but only when there are active tcp pcbs). There's some activity - a dozen of samples on lwIP3 thread that polls the host-side sockets. IOW, this sampling looks pretty idle to me.

in reply to:  3 ; comment:4 by planck_length, 8 years ago

Replying to vushakov:

I don't see how that sample can correspond to 100% cpu load...

When took the sample, Activity Monitor said it didn't work, and I had to do it from the command line using sudo. I don't know if that has anything to do with it.

Nevertheless, Activity Monitor shows it consistently at 100%. This only starts happening a few hours after I spin up the first guest, but once it starts, it stays at 100% until I shutdown down all guests.

Judging by the temperature of my laptop, something is obviously chewing up CPU cycles, even if it isn't this process.

Is there anything else I can send you? The output of top or ps perhaps? Or a screenshot of Activity Monitor?

in reply to:  4 comment:5 by planck_length, 8 years ago

Here's the output of ps. It shows the CPU at around 100%

Also note at the end how I can't run sample on this process unless I use sudo. This is despite the fact the process is owned by me.

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  28:41.29 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  28:51.97 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196  99.9  0.1  2541024   6752   ??  R    10:23am  28:53.48 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196  99.3  0.1  2541024   6752   ??  R    10:23am  29:00.67 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  29:01.42 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  29:01.99 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  29:02.64 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ sample VBoxNetNAT
sample[6363]: sample cannot find any existing process you have access to which has a name like 'VBoxNetNAT'; try running with `sudo`.

I've taken another sample, this time for 100 seconds. Interestingly the CPU utilisation of VBoxNetNAT drops to 90% while sample is running, presumably because it is being suspended every 1ms.

by planck_length, 8 years ago

comment:6 by Frank Mehnert, 8 years ago

planck_length, any chance to find out which of the VBoxNetNAT threads is responsible for the CPU load? The 'htop' tool might help.

in reply to:  6 comment:7 by planck_length, 8 years ago

Replying to frank:

planck_length, any chance to find out which of the VBoxNetNAT threads is responsible for the CPU load? The 'htop' tool might help.

I installed htop using homebrew, but I can't find any way of showing individual thread usage. Pressing the H key doesn't do anything. Any ideas?

I'm going to attach a screen shot of htop. It looks like most of the CPU usage is occurring in the kernel (red CPU bars). So perhaps there's a system call that's responsible?

by planck_length, 8 years ago

comment:8 by Valery Ushakov, 8 years ago

Hmm, system ps(1) has -M option to show information for individual threads, but there seems to be no way to identify individual threads...

Aside - for htop(1) output, please, use text display instead of bars in the future. I have mild deuteranomaly, so telling apart red and green lines 1-2 pixels wide is not impossible, but is very hard.

in reply to:  8 comment:9 by planck_length, 8 years ago

Replying to vushakov:

Hmm, system ps(1) has -M option to show information for individual threads, but there seems to be no way to identify individual threads...

How frustrating! Would it take long for one of you guys to write some code that I could compile and run here, that will list each thread ID for a given process ID, along with its CPU usage time? Then I could run it here and send you the results.

Aside - for htop(1) output, please, use text display instead of bars in the future. I have mild deuteranomaly, so telling apart red and green lines 1-2 pixels wide is not impossible, but is very hard.

Do you know how I can get htop to do this?

I'm keen to help you nail this bug, so if there's anything I can do to help, please let me know.

Version 0, edited 8 years ago by planck_length (next)

by planck_length, 8 years ago

comment:10 by Valery Ushakov, 8 years ago

You switched htop to monochrome color scheme. What I meant was switching it to <F2> (Setup) -> Meters -> CPUs (1/1) [Text] - from [Bar]. You can toggle the display with space or enter, different versions seems to use different keys. This is just for future reference, don't attach another screenshot.

comment:11 by Valery Ushakov, 8 years ago

To get kernel stack traces, please, can you run

$ sudo spindump VBoxNetNAT

in reply to:  11 comment:12 by planck_length, 8 years ago

Replying to vushakov:

To get kernel stack traces, please, can you run

$ sudo spindump VBoxNetNAT

The sample file is too large to upload to here, so you'll need to download it from http://buildoneforme.com/VBoxNetNAT_6196.spindump.txt.gz

comment:13 by Valery Ushakov, 8 years ago

Please, can you download and install the latest 5.0 test build (revision 106028+).

Start VirtualBox manually with something like

$ (export VBOXNET_nat1_RELEASE_LOG="+nat_service.e.l.l2.l3.f -nat_service.restrict"; \
   export VBOXNET_nat1_RELEASE_LOG_FLAGS="thread timeprog"; \
   VirtualBox &)

where nat1 in the variable names is the name of your NAT Network as per the VBox.log file you've attached.

This will generate quite a bit of extra info in the ~/Library/VirtualBox/nat1.log log file.

To stop generating those extra logs, save or shutdown all VMs and quit VirtualBox manager. Wait for VBoxSVC process to terminate automatically after a small timeout. Restart VirtualBox without those environment variables.

Thanks.

by planck_length, 8 years ago

Attachment: nat1.log.gz added

in reply to:  13 comment:14 by planck_length, 8 years ago

Replying to vushakov:

This will generate quite a bit of extra info in the ~/Library/VirtualBox/nat1.log log file.

You don't say!

It took 2 hours before the problem started again, and another hour before I realised it was happening. By then the log file was already 17G!! I truncated it down to 2MB and gzipped that. It's easy to see where the problem starts: it's "sock 54: HUP" all the way down.

comment:15 by Valery Ushakov, 8 years ago

Thanks! I have reproduced the problem now. My guess is that some external client connects to VM's http server via port-forwarding, requests a lot of data, half-closes its tx side and/then aborts the transfer. Or something along this lines. You can probably match this with whatever is really happening in your setup so that you can reproduce this easier.

I'm working on a fix.

in reply to:  15 comment:16 by planck_length, 8 years ago

Replying to vushakov:

Thanks! I have reproduced the problem now. My guess is that some external client connects to VM's http server via port-forwarding, requests a lot of data, half-closes its tx side and/then aborts the transfer. Or something along this lines. You can probably match this with whatever is really happening in your setup so that you can reproduce this easier.

I'm working on a fix.

Fantastic! Yes, I'm using a browser on the host to request data from a server running on the VM.

comment:17 by Valery Ushakov, 8 years ago

Please, can you give test build 106083+ a try?

in reply to:  17 comment:18 by planck_length, 8 years ago

Replying to vushakov:

Please, can you give test build 106083+ a try?

Apologies for the late reply. I just installed it, and so far no problem. However, I'm not sure whether I know exactly how to reproduce the problem, so I'll keep it running for a few days and see if it reoccurs.

comment:19 by Valery Ushakov, 8 years ago

Does it still hold out?

in reply to:  19 comment:20 by planck_length, 8 years ago

Replying to vushakov:

Does it still hold out?

No problems so far - all good here.

comment:21 by Valery Ushakov, 8 years ago

Summary: VBoxNetNAT high CPU utilisationVBoxNetNAT high CPU utilisation => Fixed in SVN

comment:22 by Frank Mehnert, 8 years ago

Resolution: fixed
Status: newclosed

Fix is part of VBox 5.0.18.

Note: See TracTickets for help on using tickets.

© 2023 Oracle
ContactPrivacy policyTerms of Use