id summary reporter owner description type status component version resolution keywords cc guest host 3619 PXE problems with multiple DHCP servers Tobias Evert "We've recently been having problems PXE booting VMs using VB 2.1.4. We're simulating a cluster by running many VMs connected through Internal Networks. The network has two servers which facilitate DHCP and PXE. When booting a VM through PXE there is a high probability (50%) that the boot fails. The network layout:[[BR]] Server 1 IP: 192.168.236.1[[BR]] Server 2 IP: 192.168.236.2[[BR]] Client N IP: 192.168.236.N + 2[[BR]] The PXE output when the boot fails for Client 1: ----Start---- Searching for server (DHCP)......[[BR]] Me: 192.168.236.3, DHCP: 192.168.236.2[[BR]] Loading 192.168.236.2:nodes/3/boot/boot.0 ...(PXE).......done[[BR]] PXELINUX 3.11 0x4639e5ce Copyright (C) 1994-2005 H. Peter Anvin[[BR]] UNDI data segment at: 0009E000[[BR]] UNDI data segment size: 1000[[BR]] UNDI code segment at: 0009F000[[BR]] UNDI code segment size: 0B1D[[BR]] PXE entry point found (we hope) at 9F00:0680 [[BR]] My IP address seems to be C0A8EC03 192.168.236.3 [[BR]] ip=192.168.236.3:192.168.236.1:0.0.0.0:255:255:255:0 [[BR]] TFTP prefix: nodes/3/boot/[[BR]] Trying to load: pxelinux.conf[[BR]] 192.168.236.1 is not in my arp table! [[BR]] 192.168.236.1 is not in my arp table! [[BR]] 192.168.236.1 is not in my arp table! [[BR]] 192.168.236.1 is not in my arp table! [[BR]] ----End----- From what I gather, the problem seems to start on the ip-line: [[BR]] ""ip=192.168.236.3:192.168.236.1:0.0.0.0:255:255:255:0"" [[BR]] The client shouldn't even know about the 192.168.236.1 server, since it got its address from the other server.
On the occasions that the boot actually works, the ip-line looks like: [[BR]] ""ip=192.168.236.3:192.168.236.2:0.0.0.0:255:255:255:0"" [[BR]] Having done packet analysis during the boot-up stage, I see that Client 1 sends out a DHCP request (twice), then receives replies from both servers (Server 2 first). It selects the address it received in the first reply (actually, it gets the same address from both servers), advertises its choice and gets ACKs back from both servers. Maybe the problem for Etherboot is that the replies carry the exact same IP address and the same Transaction ID. I'm guessing there has to be something special about this setup, since multi-DHCP-server environments aren't exactly uncommon. I see that the version of Etherboot in VB 2.1.4 isn't the very newest. Maybe there is a fix for this in a newer version. Are there plans to go over to gPXE? I know that it's probably non-trivial, and that you have some local patches against Etherboot in your source, but apart from maintenance, development on Etherboot has stopped, so going over to gPXE has to be done at some point. Attached are two dumps in tcpdump format, one where the boot worked, and one where it failed." defect closed network VirtualBox 2.1.4 obsolete PXE Linux Linux
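
The mismatch described in the ticket can be checked mechanically when comparing boot logs. The following is a minimal sketch, not part of the original report: the helper names (`hex_to_ipv4`, `parse_ip_line`, `boot_server_mismatch`) are hypothetical, and the field order of the ip-line is assumed to be client, boot server, gateway, netmask, with the netmask possibly printed with colons as in the output above.

```python
import ipaddress


def hex_to_ipv4(hex_word):
    # Decode a 32-bit hex word such as the one PXELINUX prints
    # after 'My IP address seems to be' into dotted-quad form.
    return str(ipaddress.IPv4Address(int(hex_word, 16)))


def parse_ip_line(line):
    # Assumed layout: ip=<client>:<boot server>:<gateway>:<netmask>.
    # The log in the ticket prints the netmask with colons, so every
    # field after the third is joined back together with dots.
    prefix = 'ip='
    line = line.strip()
    if not line.startswith(prefix):
        raise ValueError('not an ip= line: ' + line)
    fields = line[len(prefix):].split(':')
    if len(fields) < 4:
        raise ValueError('expected at least 4 fields: ' + line)
    client, server, gateway = fields[:3]
    netmask = '.'.join(fields[3:])
    return {
        'client': client,
        'server': server,
        'gateway': gateway,
        'netmask': netmask,
    }


def boot_server_mismatch(ip_line, dhcp_server):
    # True when the boot server on the ip-line differs from the DHCP
    # server reported earlier in the boot output ('Me: ..., DHCP: ...').
    return parse_ip_line(ip_line)['server'] != dhcp_server


failing = 'ip=192.168.236.3:192.168.236.1:0.0.0.0:255:255:255:0'
working = 'ip=192.168.236.3:192.168.236.2:0.0.0.0:255:255:255:0'

print(hex_to_ipv4('C0A8EC03'))
print(boot_server_mismatch(failing, '192.168.236.2'))
print(boot_server_mismatch(working, '192.168.236.2'))
```

Run against the two ip-lines quoted in the ticket, the check flags the failing boot (boot server 192.168.236.1 while DHCP reported 192.168.236.2) and passes the working one.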