VirtualBox

Ticket #15223 (closed defect: fixed)

Opened 21 months ago

Last modified 19 months ago

VBoxNetNAT high CPU utilisation => Fixed in SVN

Reported by: planck_length Owned by:
Priority: major Component: network/NAT
Version: VirtualBox 5.0.16 Keywords:
Cc: Guest type: other
Host type: Mac OS X

Description

I'm using NAT network on my Linux guests. Host is OS X 10.11.3

When I run Activity Monitor on OS X, VBoxNetNAT is always at 100% CPU utilisation (or close to it).

This only happens while there is at least one guest running.

Attachments

VBox.log Download (167.3 KB) - added by planck_length 21 months ago.
VBoxNetNAT_2016-03-10_161727_UfsN.sample.txt Download (46.3 KB) - added by planck_length 21 months ago.
VBoxNetNAT_2016-03-11_112554_9uPT.sample.txt Download (54.9 KB) - added by planck_length 21 months ago.
Screen Shot 2016-03-12 at 8.08.33 AM.png Download (95.4 KB) - added by planck_length 21 months ago.
Screen Shot 2016-03-14 at 9.25.22 AM.png Download (99.5 KB) - added by planck_length 21 months ago.
nat1.log.gz Download (199.1 KB) - added by planck_length 20 months ago.

Change History

Changed 21 months ago by planck_length

comment:1 Changed 21 months ago by vushakov

  • Host type changed from other to Mac OS X

comment:2 Changed 21 months ago by vushakov

Please, can you sample the VBoxNetNAT process in that state?

Changed 21 months ago by planck_length

comment:3 follow-up: ↓ 4 Changed 21 months ago by vushakov

I don't see how that sample can correspond to 100% cpu load. Almost all out of 8908 samples are in some blocking/waiting primitive, so most of the time threads wait for stuff to happen. Only a few dozens (i.e. less than 1%) of samples are elsewhere. There's one call to tcp timer on lwIP2 thread (they are called every 250ms, but only when there are active tcp pcbs). There's some activity - a dozen of samples on lwIP3 thread that polls the host-side sockets. IOW, this sampling looks pretty idle to me.

comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 21 months ago by planck_length

Replying to vushakov:

I don't see how that sample can correspond to 100% cpu load...

When took the sample, Activity Monitor said it didn't work, and I had to do it from the command line using sudo. I don't know if that has anything to do with it.

Nevertheless, Activity Monitor shows it consistently at 100%. This only starts happening a few hours after I spin up the first guest, but once it starts, it stays at 100% until I shutdown down all guests.

Judging by the temperature of my laptop, something is obviously chewing up CPU cycles, even if it isn't this process.

Is there anything else I can send you? The output of top or ps perhaps? Or a screenshot of Activity Monitor?

comment:5 in reply to: ↑ 4 Changed 21 months ago by planck_length

Here's the output of ps. It shows the CPU at around 100%

Also note at the end how I can't run sample on this process unless I use sudo. This is despite the fact the process is owned by me.

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  28:41.29 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  28:51.97 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196  99.9  0.1  2541024   6752   ??  R    10:23am  28:53.48 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196  99.3  0.1  2541024   6752   ??  R    10:23am  29:00.67 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  29:01.42 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  29:01.99 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ ps aux | grep VBoxNetNAT
na          6196 100.0  0.1  2541024   6752   ??  R    10:23am  29:02.64 /Applications/VirtualBox.app/Contents/MacOS/VBoxNetNAT --ip-address 10.0.2.1 --netmask 255.255.255.0 --network nat1 --trunk-type whatever

$ sample VBoxNetNAT
sample[6363]: sample cannot find any existing process you have access to which has a name like 'VBoxNetNAT'; try running with `sudo`.

I've taken another sample, this time for 100 seconds. Interestingly the CPU utilisation of VBoxNetNAT drops to 90% while sample is running, presumably because it is being suspended every 1ms.

Changed 21 months ago by planck_length

comment:6 follow-up: ↓ 7 Changed 21 months ago by frank

planck_length, any chance to find out which of the VBoxNetNAT threads is responsible for the CPU load? The 'htop' tool might help.

comment:7 in reply to: ↑ 6 Changed 21 months ago by planck_length

Replying to frank:

planck_length, any chance to find out which of the VBoxNetNAT threads is responsible for the CPU load? The 'htop' tool might help.

I installed htop using homebrew, but I can't find any way of showing individual thread usage. Pressing the H key doesn't do anything. Any ideas?

I'm going to attach a screen shot of htop. It looks like most of the CPU usage is occurring in the kernel (red CPU bars). So perhaps there's a system call that's responsible?

Changed 21 months ago by planck_length

comment:8 follow-up: ↓ 9 Changed 21 months ago by vushakov

Hmm, system ps(1) has -M option to show information for individual threads, but there seems to be no way to identify individual threads...

Aside - for htop(1) output, please, use text display instead of bars in the future. I have mild deuteranomaly, so telling apart red and green lines 1-2 pixels wide is not impossible, but is very hard.

comment:9 in reply to: ↑ 8 Changed 21 months ago by planck_length

Replying to vushakov:

Hmm, system ps(1) has -M option to show information for individual threads, but there seems to be no way to identify individual threads...

How frustrating! Would it take long for one of you guys to write some code that I could compile and run here, that will list each thread ID for a given process ID, along with its CPU usage time? Then I could run it here and send you the results.

Aside - for htop(1) output, please, use text display instead of bars in the future. I have mild deuteranomaly, so telling apart red and green lines 1-2 pixels wide is not impossible, but is very hard.

Do you know how I can get htop to do this? EDIT: I figured it out - attaching screenshot now. '*' is kernel and '#' is user,

I'm keen to help you nail this bug, so if there's anything I can do to help, please let me know.

Last edited 21 months ago by planck_length (previous) (diff)

Changed 21 months ago by planck_length

comment:10 Changed 21 months ago by vushakov

You switched htop to monochrome color scheme. What I meant was switching it to <F2> (Setup) -> Meters -> CPUs (1/1) [Text] - from [Bar]. You can toggle the display with space or enter, different versions seems to use different keys. This is just for future reference, don't attach another screenshot.

comment:11 follow-up: ↓ 12 Changed 21 months ago by vushakov

To get kernel stack traces, please, can you run

$ sudo spindump VBoxNetNAT

comment:12 in reply to: ↑ 11 Changed 21 months ago by planck_length

Replying to vushakov:

To get kernel stack traces, please, can you run

$ sudo spindump VBoxNetNAT

The sample file is too large to upload to here, so you'll need to download it from  http://buildoneforme.com/VBoxNetNAT_6196.spindump.txt.gz

comment:13 follow-up: ↓ 14 Changed 20 months ago by vushakov

Please, can you download and install the latest 5.0 test build (revision 106028+).

Start VirtualBox manually with something like

$ (export VBOXNET_nat1_RELEASE_LOG="+nat_service.e.l.l2.l3.f -nat_service.restrict"; \
   export VBOXNET_nat1_RELEASE_LOG_FLAGS="thread timeprog"; \
   VirtualBox &)

where nat1 in the variable names is the name of your NAT Network as per the VBox.log file you've attached.

This will generate quite a bit of extra info in the ~/Library/VirtualBox/nat1.log log file.

To stop generating those extra logs, save or shutdown all VMs and quit VirtualBox manager. Wait for VBoxSVC process to terminate automatically after a small timeout. Restart VirtualBox without those environment variables.

Thanks.

Changed 20 months ago by planck_length

comment:14 in reply to: ↑ 13 Changed 20 months ago by planck_length

Replying to vushakov:

This will generate quite a bit of extra info in the ~/Library/VirtualBox/nat1.log log file.

You don't say!

It took 2 hours before the problem started again, and another hour before I realised it was happening. By then the log file was already 17G!! I truncated it down to 2MB and gzipped that. It's easy to see where the problem starts: it's "sock 54: HUP" all the way down.

comment:15 follow-up: ↓ 16 Changed 20 months ago by vushakov

Thanks! I have reproduced the problem now. My guess is that some external client connects to VM's http server via port-forwarding, requests a lot of data, half-closes its tx side and/then aborts the transfer. Or something along this lines. You can probably match this with whatever is really happening in your setup so that you can reproduce this easier.

I'm working on a fix.

comment:16 in reply to: ↑ 15 Changed 20 months ago by planck_length

Replying to vushakov:

Thanks! I have reproduced the problem now. My guess is that some external client connects to VM's http server via port-forwarding, requests a lot of data, half-closes its tx side and/then aborts the transfer. Or something along this lines. You can probably match this with whatever is really happening in your setup so that you can reproduce this easier.

I'm working on a fix.

Fantastic! Yes, I'm using a browser on the host to request data from a server running on the VM.

comment:17 follow-up: ↓ 18 Changed 20 months ago by vushakov

Please, can you give test build 106083+ a try?

comment:18 in reply to: ↑ 17 Changed 20 months ago by planck_length

Replying to vushakov:

Please, can you give test build 106083+ a try?

Apologies for the late reply. I just installed it, and so far no problem. However, I'm not sure whether I know exactly how to reproduce the problem, so I'll keep it running for a few days and see if it reoccurs.

comment:19 follow-up: ↓ 20 Changed 20 months ago by vushakov

Does it still hold out?

comment:20 in reply to: ↑ 19 Changed 20 months ago by planck_length

Replying to vushakov:

Does it still hold out?

No problems so far - all good here.

comment:21 Changed 20 months ago by vushakov

  • Summary changed from VBoxNetNAT high CPU utilisation to VBoxNetNAT high CPU utilisation => Fixed in SVN

comment:22 Changed 19 months ago by frank

  • Status changed from new to closed
  • Resolution set to fixed

Fix is part of VBox 5.0.18.

Note: See TracTickets for help on using tickets.

www.oracle.com
ContactPrivacy policyTerms of Use