VirtualBox

Opened 3 years ago

Last modified 3 months ago

#20131 new defect

rcu_sched detected stalls on CPUs/tasks: linux guest and host ?ryzen issue

Reported by: RT_db
Owned by:
Component: other
Version: VirtualBox 6.1.16
Keywords:
Cc:
Guest type: Linux
Host type: Linux

Description

Debian Linux guest on a Debian host, VirtualBox 6.1.16.

Host: Debian 10, Linux 5.6.0-0.bpo.2-amd64 #1 SMP Debian 5.6.14-2~bpo10+1 (2020-06-09) x86_64 GNU/Linux, AMD Ryzen 9 3900X 12-Core, 32 GB RAM.

Guest: 4 cores, 4 GB RAM. dmesg output:

[ 225.539622] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:

[ 225.540724] rcu: 0-...!: (27 ticks this GP) idle=c50/0/0x0 softirq=2529/2530 fqs=0

[ 225.541761] rcu: 1-...!: (0 ticks this GP) idle=ac8/0/0x0 softirq=2101/2101 fqs=0

[ 225.542767] rcu: 3-...!: (26 ticks this GP) idle=99c/0/0x0 softirq=2173/2174 fqs=0

[ 225.543767] (detected by 2, t=5271 jiffies, g=2529, q=2)

[ 225.543770] Sending NMI from CPU 2 to CPUs 0:

[ 225.543825] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10

[ 225.544771] Sending NMI from CPU 2 to CPUs 1:

[ 225.544797] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x10

[ 225.545769] Sending NMI from CPU 2 to CPUs 3:

[ 225.545796] NMI backtrace for cpu 3 skipped: idling at native_safe_halt+0xe/0x10

[ 225.546769] rcu: rcu_sched kthread starved for 5272 jiffies! g2529 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1

[ 225.547792] rcu: RCU grace-period kthread stack dump:

[ 225.548812] rcu_sched I 0 11 2 0x80004000

[ 225.548816] Call Trace:

[ 225.548836] ? schedule+0x2d8/0x760

[ 225.548838] ? switch_to_asm+0x40/0x70

[ 225.548840] ? switch_to_asm+0x40/0x70

[ 225.548842] schedule+0x4a/0xb0

[ 225.548843] schedule_timeout+0x15e/0x300

[ 225.548850] ? next_timer_interrupt+0xd0/0xd0

[ 225.548853] rcu_gp_kthread+0x452/0x8d0

[ 225.548864] kthread+0xf9/0x130

[ 225.548869] ? kfree_call_rcu+0x10/0x10

[ 225.548870] ? kthread_park+0x90/0x90

[ 225.548872] ret_from_fork+0x22/0x40

[ 285.421645] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:

[ 285.422242] rcu: 0-...!: (0 ticks this GP) idle=da0/0/0x0 softirq=2535/2535 fqs=0

[ 285.422780] rcu: 1-...!: (37 ticks this GP) idle=d4c/0/0x0 softirq=2103/2103 fqs=0

[ 285.423305] rcu: 3-...!: (46 ticks this GP) idle=b84/0/0x0 softirq=2178/2179 fqs=0

[ 285.423815] (detected by 2, t=5268 jiffies, g=2541, q=866)

[ 285.423817] Sending NMI from CPU 2 to CPUs 0:

[ 285.423858] NMI backtrace for cpu 0 skipped: idling at native_safe_halt+0xe/0x10

[ 285.424818] Sending NMI from CPU 2 to CPUs 1:

[ 285.424845] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xe/0x10

[ 285.425817] Sending NMI from CPU 2 to CPUs 3:

[ 285.425847] NMI backtrace for cpu 3 skipped: idling at native_safe_halt+0xe/0x10

[ 285.426815] rcu: rcu_sched kthread starved for 5268 jiffies! g2541 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0

[ 285.427228] rcu: RCU grace-period kthread stack dump:

[ 285.427620] rcu_sched I 0 11 2 0x80004000

[ 285.427623] Call Trace:

[ 285.427629] ? schedule+0x2d8/0x760

[ 285.427630] ? switch_to_asm+0x40/0x70

[ 285.427631] ? switch_to_asm+0x40/0x70

[ 285.427633] schedule+0x4a/0xb0

[ 285.427634] schedule_timeout+0x15e/0x300

[ 285.427637] ? next_timer_interrupt+0xd0/0xd0

[ 285.427640] rcu_gp_kthread+0x452/0x8d0

[ 285.427643] kthread+0xf9/0x130

[ 285.427645] ? kfree_call_rcu+0x10/0x10

[ 285.427647] ? kthread_park+0x90/0x90

[ 285.427648] ret_from_fork+0x22/0x40
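
For anyone comparing their own logs against the dump above, here is a minimal Python sketch that pulls the "detected by" summary line out of each stall report. The regular expression and the function name `find_rcu_stalls` are illustrative assumptions based only on the line format shown in this ticket; other kernel versions may word the message differently.

```python
import re

# Matches summary lines such as:
#   [ 225.543767] (detected by 2, t=5271 jiffies, g=2529, q=2)
# The pattern is an assumption derived from the log in this ticket.
DETECTED_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\(detected by (?P<cpu>\d+), t=(?P<jiffies>\d+) jiffies"
)


def find_rcu_stalls(dmesg_text):
    """Return one dict per RCU stall summary line found in dmesg_text."""
    stalls = []
    for line in dmesg_text.splitlines():
        match = DETECTED_RE.search(line)
        if match:
            stalls.append({
                "timestamp": float(match.group("ts")),
                "detected_by_cpu": int(match.group("cpu")),
                "jiffies": int(match.group("jiffies")),
            })
    return stalls


# Two summary lines copied from the dmesg output in this ticket.
sample = """\
[ 225.543767] (detected by 2, t=5271 jiffies, g=2529, q=2)
[ 285.423815] (detected by 2, t=5268 jiffies, g=2541, q=866)
"""

for stall in find_rcu_stalls(sample):
    print(stall)
```

In a guest, the same function could be fed the text captured from `dmesg` to see how often the stall recurs; in the log above the two reports are roughly 60 seconds apart.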

System is unusable.

The guest was transferred from an Intel Core i5 system, where it worked perfectly. The same error occurs in Debian 9, 10 and bullseye guests. I tried multiple combinations of cores and RAM, with no effect.

Interestingly, an Alpine 3.12 guest on the same system has no problem.

Current workaround: installing the Guest Additions in the guest makes the error messages go away.

I've attached the VirtualBox logs. host1 corresponds to the dmesg output above; host2 and host3 are logs from the same system, just running longer.

Many thanks for your help.

Attachments (2)

host1.log (96.9 KB ) - added by RT_db 3 years ago.
host log 1
host2.log (340.3 KB ) - added by RT_db 3 years ago.
host log 2


Change History (5)

by RT_db, 3 years ago

Attachment: host1.log added

host log 1

by RT_db, 3 years ago

Attachment: host2.log added

host log 2

comment:1 by RT_db, 3 years ago

Adding the Guest Additions didn't ultimately fix the problem; it only delayed its onset.

comment:2 by rincebrain, 2 years ago

I ran into this or something very much like it on my new Ryzen 5900x, Win10 host and Debian 11 guest (though I reproduced it with a Fedora 35 guest too, for example...), and with some minimal experimenting, found a workaround that's worked for me so far.

I found that "perf top" was good at stalling it out a bit, and doing a "vboxmanage modifyvm foo --hpet on" on the host made the problem occur virtually never or not at all for that VM, even while every other VM without that change was stalling.

Hopefully that's helpful to people who stumble onto this until whatever wacky root cause is run down (or it's just...made the default on AMD systems, heh).
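
As a rough illustration of the workaround described above, the following Python sketch builds the `VBoxManage modifyvm <vmname> --hpet on` command and checks a VM's HPET setting. It rests on two assumptions: that `VBoxManage showvminfo <vmname> --machinereadable` prints a line of the form `hpet="on"` or `hpet="off"`, and that the VM is powered off before `modifyvm` is run. The function names are hypothetical, and nothing here invokes VirtualBox by itself.

```python
import shlex


def hpet_command(vm_name, enable=True):
    """Build the argument list for the HPET workaround from this ticket."""
    return ["VBoxManage", "modifyvm", vm_name, "--hpet", "on" if enable else "off"]


def hpet_enabled(showvminfo_output):
    """Read the HPET state from machine-readable showvminfo text.

    Assumes a line such as hpet="on"; returns None if no such line exists.
    """
    for line in showvminfo_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "hpet":
            return value.strip().strip('"') == "on"
    return None


# "foo" is the placeholder VM name used in the comment above.
print(shlex.join(hpet_command("foo")))
```

To actually apply the change, the list returned by `hpet_command` could be passed to `subprocess.run(..., check=True)` on the host.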

comment:3 by normanb2, 3 months ago

This issue did not appear to affect me until I upgraded from a Ryzen 9 3900X to an AMD Ryzen 9 5950X. I applied your 'VBoxManage modifyvm <vmname> --hpet on' fix and it seems to have gone away. Thank you for your post.
