OpenBSD bug: CARP + Arpbalance on i386 and IPv4

Introduction

This is a brief article outlining the bugs found in the arpbalance code in every release of OpenBSD since 3.5. The problem in 3.5 was replaced with a different problem from 3.6 onwards. I have written patches (below) which correct these problems.

Please note: the original subnet addresses in this documentation have been replaced with "10.0.0." for privacy purposes. This article only addresses issues in the IPv4 portion of the code, and no assumptions are made as to whether the problems outlined below affect the IPv6 subsystem.

The systems used to detect and diagnose these problems were both Intel machines. The first setup is a load-balancing firewall with 3 physical interfaces. For load balancing, this requires a total of 6 CARP interfaces. The second setup consisted of two basic PIII machines with single interfaces, requiring only two CARP interfaces for load balancing. DMESGs for these systems can be found at the end of this article.

Acknowledgements

I would like to acknowledge Jason Dixon for his initial support and for the time he gave me at the beginning to help me try to fix my issue. I would also like to acknowledge Sorot Panichprecha for his assistance with C and for his help in determining the problem.

Problem Description (3.5)

For /25 and /24 subnets, the arpbalance code in the OpenBSD 3.5 kernel does not function as documented. Since this isn't a current problem, it will not be explained in detail here; suffice it to say that the source address of the incoming packet was not being converted from network byte order to host byte order. On i386, which is little-endian, the order of the bytes in the address needs to be reversed before the host uses it as a number. Since they were not being reversed, the numbers did not represent their proper addresses, and for the subnets mentioned above ALL source addresses were interpreted by the system as either all-odd or all-even numbers. For an arp-balancing system of only two hosts, this means that only one host was ever selected. Note that if the CARP interfaces had been facing the internet, this problem would not have surfaced, as many different subnets would have accessed the CARP host.

It was discovered that the last digit of the first portion of the IP address (i.e. __x.___.___.___) was the number that decided which host serviced the request. This was corrected by calling ntohl() on the source address in carp_iamatch() before taking the modulus. This orders the number correctly for i386 architectures, so that it is the last digit of the last portion (i.e. ___.___.___.__x) which determines which host services the request, which is more desirable for subnets. A patch to apply this to the 3.5 kernel source can be downloaded below.

Problem Description (3.6+)

For /25 and /24 subnets, the arpbalance option did not work as documented. In addition, the logs frequently reported that the CARP IP was duplicated on the network.

The carp_iamatch() function in sys/netinet/ip_carp.c had been rewritten and differs from the 3.5 kernel, so the 3.5 problem no longer applies to this source. The function prototype had also changed, passing the source MAC address instead of the source packet header, amongst other things.

Inserting the following line into carp_iamatch() revealed the problem:

    log(LOG_DEBUG, "Address being modulated: %u (%s)\n", ia->ia_addr.sin_addr.s_addr, inet_ntoa(ia->ia_addr.sin_addr));

where ia->ia_addr.sin_addr.s_addr is the value whose modulus is taken, further down in the code, to select which virtual host should handle the request. The debug log showed the following:

    /bsd: Address being modulated: 2114368899 (10.0.0.126)
    /bsd: Address being modulated: 2114368899 (10.0.0.126)
    /bsd: Address being modulated: 2114368899 (10.0.0.126)
    /bsd: duplicate IP address 10.0.0.126 sent from ethernet address 00:00:5e:00:01:15

Although this doesn't seem out of the ordinary, the IP addresses being used for the modulus are actually the addresses of the CARP interfaces receiving the packet, NOT the source IP of the packet. For proof:

    [root@secure-master ~]# ifconfig carp
    ...
    carp11: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            carp: MASTER carpdev em1 vhid 11 advbase 1 advskew 0
            inet 10.0.0.126 netmask 0xffffff80 broadcast 10.0.0.127
    ...
    carp21: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
            carp: BACKUP carpdev em1 vhid 21 advbase 1 advskew 100
            inet 10.0.0.126 netmask 0xffffff80 broadcast 10.0.0.127

(only relevant carp entries shown)

The actual sources for the three log entries were 10.0.0.73, 10.0.0.4 and 10.0.0.23, generated by SSHing from those hosts to 10.0.0.126.

This means that both hosts were taking the modulus of a fixed IP, once again resulting in only one host being chosen to handle requests, regardless of the source IP. This also seems to have caused the duplicate IP address message shown in the log above.

Solution 1: MAC-Based Balancing

It was noted that, even though the source MAC address was being passed to carp_iamatch(), it was never actually used. Since the source IP of the packet is no longer passed to carp_iamatch(), the obvious first approach is to take the modulus of the source MAC. This was implemented easily enough, and a patch which works on 3.6 through 3.8 can be downloaded below. Only the last octet of the MAC address (i.e. __:__:__:__:__:xx) is considered, which makes this fairly efficient. Anything more than this would probably just waste cycles: for two hosts (or any power-of-two count), it is only the low-order bits that determine the result of the modulus.

There are caveats with this solution. Firstly, since the last octet is only 8 bits, this limits the number of arp-balancing virtual hosts to 256. Although this isn't a problem in practice (using 256 virtual hosts is probably not the best solution to begin with), it is a limitation that the IP-based solution does not suffer from. Secondly, in my environments I happened, by chance, to have a large number of MAC addresses whose last octet was even. This resulted in an uneven distribution of traffic between the load-balancing hosts.

Solution 2: IP-Based Balancing

I preferred to balance based on IP, because the traffic is distributed fairly evenly if the range of IPs in use is filled consecutively. Unfortunately, this required modifying the carp_iamatch() function prototype, along with some of its source, and modifying sys/netinet/if_ether.c so that carp_iamatch() is called correctly. The function now takes the modulus of the source IP again, as the original function in the 3.5 kernel did. This also removes the 256-virtual-host limit of the first solution. Downloads for both of these patches can be found below. A different patch is required for IP-based balancing on 3.8, because changes in if_ether.c prevent the 3.6-3.7 patch from applying correctly.

Patches

Source should be downloaded and patches applied as follows:

    # wget http://some.mirror.com/pub/OpenBSD/3.8/sys.tar.gz
    # tar -zxvf sys.tar.gz
    # patch -p0 < patchname.patch

  • OpenBSD 3.5 kernel - Source IP-Based Balancing
  • OpenBSD 3.6-3.8 kernel - Source MAC-Based Balancing
  • OpenBSD 3.6-3.7 kernel - Source IP-Based Balancing
  • OpenBSD 3.8 kernel - Source IP-Based Balancing

    DMESG - P4 Blade Firewalls

    OpenBSD 3.7 (GENERIC) #1: Wed Nov 2 09:05:34 EST 2005
        root@secure-master:/usr/src/sys/arch/i386/compile/GENERIC
    cpu0: Intel(R) Pentium(R) 4 CPU 3.00GHz ("GenuineIntel" 686-class) 3 GHz
    cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,PNI,MWAIT,CNXT-ID
    real mem = 536387584 (523816K)
    avail mem = 482652160 (471340K)
    using 4278 buffers containing 26923008 bytes (26292K) of memory
    mainbus0 (root)
    bios0 at mainbus0: AT/286+(00) BIOS, date 12/03/04, BIOS32 rev. 0 @ 0xf0010
    pcibios0 at bios0: rev 2.1 @ 0xf0000/0x10000
    pcibios0: PCI IRQ Routing Table rev 1.0 @ 0xf51b0/208 (11 entries)
    pcibios0: no compatible PCI ICU found: ICU vendor 0x8086 product 0x25a1
    pcibios0: PCI bus #3 is the last bus
    bios0: ROM list: 0xc0000/0x8000
    cpu0 at mainbus0
    pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
    pchb0 at pci0 dev 0 function 0 "Intel 82875P Host" rev 0x02
    ppb0 at pci0 dev 3 function 0 "Intel 82875P PCI-CSA" rev 0x02
    pci1 at ppb0 bus 1
    em0 at pci1 dev 1 function 0 "Intel PRO/1000CT (82547EI)" rev 0x00: irq 3, address: 00:04:23:b3:6a:f6
    ppb1 at pci0 dev 28 function 0 "Intel 6300ESB PCIX" rev 0x02
    pci2 at ppb1 bus 2
    em1 at pci2 dev 2 function 0 "Intel PRO/1000MT DP (82546EB)" rev 0x03: irq 9, address: 00:04:23:b0:4c:e8
    em2 at pci2 dev 2 function 1 "Intel PRO/1000MT DP (82546EB)" rev 0x03: irq 9, address: 00:04:23:b0:4c:e9
    uhci0 at pci0 dev 29 function 0 "Intel 6300ESB USB" rev 0x02: irq 5
    usb0 at uhci0: USB revision 1.0
    uhub0 at usb0
    uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub0: 2 ports with 2 removable, self powered
    uhci1 at pci0 dev 29 function 1 "Intel 5300ESB USB" rev 0x02: irq 9
    usb1 at uhci1: USB revision 1.0
    uhub1 at usb1
    uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub1: 2 ports with 2 removable, self powered
    "Intel 6300ESB WDT" rev 0x02 at pci0 dev 29 function 4 not configured
    "Intel 6300ESB APIC" rev 0x02 at pci0 dev 29 function 5 not configured
    ehci0 at pci0 dev 29 function 7 "Intel 6300ESB USB" rev 0x02: irq 7
    ehci0: EHCI version 1.0
    ehci0: companion controllers, 2 ports each: uhci0 uhci1
    usb2 at ehci0: USB revision 2.0
    uhub2 at usb2
    uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
    uhub2: single transaction translator
    uhub2: 4 ports with 4 removable, self powered
    ppb2 at pci0 dev 30 function 0 "Intel 82801BA AGP" rev 0x0a
    pci3 at ppb2 bus 3
    vga1 at pci3 dev 0 function 0 "ATI Rage XL" rev 0x27
    wsdisplay0 at vga1: console (80x25, vt100 emulation)
    wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
    fxp0 at pci3 dev 1 function 0 "Intel 82557" rev 0x10, i82550: irq 11, address 00:04:23:b3:6a:f7
    inphy0 at fxp0 phy 1: i82555 10/100 PHY, rev. 4
    ichpcib0 at pci0 dev 31 function 0 "Intel 6300ESB LPC" rev 0x02
    pciide0 at pci0 dev 31 function 2 "Intel 6300ESB SATA" rev 0x02: DMA, channel 0 configured to compatibility, channel 1 configured to compatibility
    atapiscsi0 at pciide0 channel 0 drive 1
    scsibus0 at atapiscsi0: 2 targets
    cd0 at scsibus0 targ 0 lun 0: <TOSHIBA, DVD-ROM SD-C2102, 1139> SCSI0 5/cdrom removable
    cd0(pciide0:0:1): using PIO mode 4, DMA mode 2
    wd0 at pciide0 channel 1 drive 0: <WDC WD800JD-23JNA1>
    wd0: 16-sector PIO, LBA, 76324MB, 156312576 sectors
    wd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 5
    "Intel 6300ESB SMBus" rev 0x02 at pci0 dev 31 function 3 not configured
    isa0 at ichpcib0
    isadma0 at isa0
    pckbc0 at isa0 port 0x60/5
    pckbd0 at pckbc0 (kbd slot)
    pckbc0: using irq 1 for kbd slot
    wskbd0 at pckbd0 (mux 1 ignored for console): console keyboard, using wsdisplay0
    pcppi0 at isa0 port 0x61
    midi0 at pcppi0: <PC speaker>
    sysbeep0 at pcppi0
    lm0 at isa0 port 0x290/8: W83627HF
    npx0 at isa0 port 0xf0/16: using exception 16
    pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
    fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
    fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
    biomask f7e5 netmask ffed ttymask ffef
    pctr: user-level cycle counter enabled
    dkcsum: wd0 matched BIOS disk 80
    root on wd0a
    rootdev=0x0 rrootdev=0x300 rawdev=0x302

    DMESG - PIII Simple Configuration

    OpenBSD 3.5 (GENERIC) #6: Fri Oct 28 17:07:09 EST 2005
        root@faker-dmz:/usr/src/sys/arch/i386/compile/GENERIC
    cpu0: Intel Pentium III ("GenuineIntel" 686-class) 599 MHz
    cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
    real mem = 133791744 (130656K)
    avail mem = 117886976 (115124K)
    using 1658 buffers containing 6791168 bytes (6632K) of memory
    mainbus0 (root)
    bios0 at mainbus0: AT/286+(f1) BIOS, date 02/23/00, BIOS32 rev. 0 @ 0xfd7a0
    apm0 at bios0: Power Management spec V1.2
    apm0: AC on, battery charge unknown
    pcibios0 at bios0: rev. 2.1 @ 0xfd7a0/0x860
    pcibios0: PCI IRQ Routing Table rev. 1.0 @ 0xfdf30/176 (9 entries)
    pcibios0: PCI Interrupt Router at 000:07:0 ("Intel 82371FB ISA" rev 0x00)
    pcibios0: PCI bus #1 is the last bus
    bios0: ROM list: 0xc0000/0x8000 0xc8000/0x800 0xe0000/0x4000! 0xe4000/0xc000
    pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
    pchb0 at pci0 dev 0 function 0 "Intel 82443BX AGP" rev 0x03
    ppb0 at pci0 dev 1 function 0 "Intel 82443BX AGP" rev 0x03
    pci1 at ppb0 bus 1
    vga1 at pci1 dev 0 function 0 "Matrox MGA G400/G450 AGP" rev 0x04
    wsdisplay0 at vga1: console (80x25, vt100 emulation)
    wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
    pcib0 at pci0 dev 7 function 0 "Intel 82371AB PIIX4 ISA" rev 0x02
    pciide0 at pci0 dev 7 function 1 "Intel 82371AB IDE" rev 0x01: DMA, channel 0 wired to compatibility, channel 1 wired to compatibility
    wd0 at pciide0 channel 0 drive 0: <ST310212A>
    wd0: 32-sector PIO, LBA, 9768MB, 20005650 sectors
    wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
    atapiscsi0 at pciide0 channel 1 drive 0
    scsibus0 at atapiscsi0: 2 targets
    cd0 at scsibus0 targ 0 lun 0: <ATAPI, CD-ROM DRIVE-40X, N0CP> SCSI0 5/cdrom removable
    cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
    uhci0 at pci0 dev 7 function 2 "Intel 82371AB USB" rev 0x01: irq 9
    usb0 at uhci0: USB revision 1.0
    uhub0 at usb0
    uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
    uhub0: 2 ports with 2 removable, self powered
    "Intel 82371AB Power Mgmt" rev 0x02 at pci0 dev 7 function 3 not configured
    yds0 at pci0 dev 12 function 0 "Yamaha 740C" rev 0x03: irq 10
    ac97: codec id 0x41445303 (Analog Devices AD1819)
    ac97: codec features Analog Devices Phat Stereo
    audio0 at yds0
    xl0 at pci0 dev 14 function 0 "3Com 3c905C 100Base-TX" rev 0x74: irq 5 address 00:01:02:53:4f:04
    exphy0 at xl0 phy 24: Broadcom 3C905C internal PHY, rev. 6
    isa0 at pcib0
    isadma0 at isa0
    pckbc0 at isa0 port 0x60/5
    pckbd0 at pckbc0 (kbd slot)
    pckbc0: using irq 1 for kbd slot
    wskbd0 at pckbd0: console keyboard, using wsdisplay0
    pcppi0 at isa0 port 0x61
    midi0 at pcppi0: <PC speaker>
    sysbeep0 at pcppi0
    lpt0 at isa0 port 0x378/4 irq 7
    npx0 at isa0 port 0xf0/16: using exception 16
    pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
    pccom1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
    fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
    fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
    opl0 at yds0: model OPL3
    midi1 at opl0: <DS-1 integrated Yamaha OPL3>
    mpu at yds0 not configured
    mpu at yds0 not configured
    mpu at yds0 not configured
    mpu at yds0 not configured
    biomask c240 netmask c260 ttymask c2e2
    pctr: 686-class user-level performance counters enabled
    mtrr: Pentium Pro MTRR support
    dkcsum: wd0 matched BIOS disk 80
    root on wd0a
    rootdev=0x0 rrootdev=0x300 rawdev=0x302

  • Author: Matt Bradford <openbsd at mattbradford dot com>
    Created: 2 Nov, 2005
    Modified: 2 Jan, 2006