Don't bind to 255.255.255.255
I was debugging a problem netbooting a Sun system at the start of a JumpStart install. The process uses RARP and then fetches the boot loader via tftp from the system that answered the RARP.
1: 07:02:23.142790 rarp who-is 00:03:ba:1a:fa:43 tell 00:03:ba:1a:fa:43
2: 07:02:23.143043 rarp reply 00:03:ba:1a:fa:43 at 172.21.14.15
3: 07:02:23.143446 IP 172.21.14.15.32769 > 255.255.255.255.tftp: 17 RRQ "AC150E0F" octet
4: 07:02:23.144024 IP 172.21.14.17.32806 > 172.21.14.15.32769: UDP, length 516
5: 07:02:23.144540 IP 172.21.14.15.32769 > 172.21.14.17.32806: UDP, length 4
6: 07:02:23.144553 IP 172.21.14.17 > 172.21.14.15: icmp 40: 172.21.14.17 udp port 32806 unreachable
7: 07:02:27.162049 IP 172.21.14.15.32769 > 172.21.14.17.32806: UDP, length 4
8: 07:02:27.162073 IP 172.21.14.17 > 172.21.14.15: icmp 40: 172.21.14.17 udp port 32806 unreachable
What in the world could be going on here? The server, 172.21.14.17, sent the first data block from port 32806 in packet 4, but when the booting client acknowledged it in packet 5, the server answered in packet 6 with an ICMP error saying that same port was unreachable.
The key is that the tftp server binds its reply socket to the address to which the request was sent (packet 3). This matters on a multi-homed host or a system with multiple IP addresses for virtual hosting: if the server replies from a different IP address, the tftp client may not accept the packet for security reasons.
However, if you bind to 255.255.255.255, apparently Linux will send the packet out with the interface's address, but will fail to demux the incoming packet to the socket, since that packet is addressed to the interface's address and not to 255.255.255.255. With no matching socket, the kernel answers with the ICMP port unreachable errors seen in packets 6 and 8.
The tftp server in question (atftp) used the following code sequence:
bind(s, &to, sizeof(to));
getsockname(s, &bound, ...);
connect(s, &client, sizeof(client));
The right thing to do here is to bind only if the destination address is not the broadcast address, and to move the getsockname after the connect, so that if the socket wasn't explicitly bound, the implicit bind performed by connect still gives it an address and port.
if (to.sin_addr.s_addr != htonl(INADDR_BROADCAST))
    bind(s, &to, sizeof(to));
connect(s, &client, sizeof(client));
getsockname(s, &bound, ...);
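Spelled out a little more fully, the corrected sequence might look like the sketch below. This is not atftp's actual code: the function name open_reply_socket and its parameters are made up for illustration, and I am assuming the reply socket should get a fresh ephemeral port (the trace shows the server answering from 32806 rather than from the tftp port), so the port is zeroed before the bind.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from atftp): open the UDP socket used to
 * answer one request.
 *   to     - address the request was sent to (may be 255.255.255.255)
 *   client - address the request came from
 *   bound  - receives the local address the socket ended up with
 * Returns the socket descriptor, or -1 on error with errno set.
 */
int open_reply_socket(const struct sockaddr_in *to,
                      const struct sockaddr_in *client,
                      struct sockaddr_in *bound)
{
    socklen_t len = sizeof(*bound);
    int saved_errno;
    int s;

    assert(to != NULL && client != NULL && bound != NULL);

    s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    /* Bind explicitly only when the request went to a non-broadcast address. */
    if (to->sin_addr.s_addr != htonl(INADDR_BROADCAST)) {
        struct sockaddr_in local = *to;
        local.sin_port = 0; /* assumption: let the kernel pick the port */
        if (bind(s, (const struct sockaddr *)&local, sizeof(local)) < 0)
            goto fail;
    }

    /* If bind() was skipped, connect() performs the implicit bind. */
    if (connect(s, (const struct sockaddr *)client, sizeof(*client)) < 0)
        goto fail;

    /* Ask for the local address only now, so an implicit bind shows up too. */
    if (getsockname(s, (struct sockaddr *)bound, &len) < 0)
        goto fail;

    return s;

fail:
    saved_errno = errno; /* close() may overwrite errno */
    close(s);
    errno = saved_errno;
    return -1;
}
```

A caller would fill in to from the destination address of the incoming request and client from its source address, then use the returned descriptor for the rest of the transfer. Checking the return value of bind also means that on systems where binding to the broadcast address fails outright, the failure gets reported instead of silently ignored.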
This is one of the more subtle items in the socket interface. On some systems, the bind would have failed immediately, causing the error to be reported by the tftp server; on others, you get this hard-to-debug behavior.
