This started off as a simple architecture change from Fast Ethernet to Gigabit Ethernet, but it has snowballed into the network from hell. Adding to the confusion, the provider is having issues of its own. I have two issues that make no sense, so bear with me a bit.
We are using redundant F5 BigIPs in an active/standby configuration for local load balancing at our datacenters. We have successfully upgraded one of our datacenters to Gigabit. We went to do the same at another datacenter and the BigIP had physical problems with the upgrade, so we rolled back to Fast Ethernet, which was not an issue. The next day the provider started having all sorts of router (Cisco 12000 Series) issues, and still is to this day. Our HSRP Layer 3 interface is on the two Cisco 12000s, but our physical interface is on a set of 6509s.
Issue 1: Since we rolled back to the original Fast Ethernet architecture and machines, we are experiencing the following. Approximately 50-60% of all incoming packets are replayed or echoed. This is causing a duplicate IP error message to show up. When using a sniffer, no internal traffic shows this issue. In fact, today I had the provider shut down our interface on the 12000s and the issue disappeared; internal traffic flowed fine. When he enabled just one interface, the first approximately 400 packets were all duplicated, with the error messages. I pulled some of the packets apart hoping to find at least a different MAC address, but the packets are 100% identical down to the checksums.
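For anyone wanting to put a number on the duplication from a capture, here is a minimal Python sketch. It assumes the frames have already been exported from the sniffer as raw byte strings; the function name duplicate_ratio and the sample frames are made up for illustration and are not from our actual captures. It counts a frame as a duplicate only if it is byte-for-byte identical to an earlier one, which matches what I am seeing (identical down to the checksums).

```python
import hashlib
from collections import Counter


def duplicate_ratio(frames):
    """Return the fraction of frames that exactly repeat an earlier frame.

    `frames` is a list of raw frame bytes (hypothetical input; how they are
    exported from the sniffer is up to the tool being used).
    """
    seen = Counter()
    duplicates = 0
    for raw in frames:
        # Hash the whole frame so headers, payload and checksums all count.
        digest = hashlib.sha256(raw).hexdigest()
        if seen[digest]:
            duplicates += 1
        seen[digest] += 1
    return duplicates / len(frames) if frames else 0.0


# Made-up sample: five frames, two of which repeat an earlier frame.
sample = [b"\x00\x01aaa", b"\x00\x01aaa", b"\x00\x02bbb", b"\x00\x03ccc", b"\x00\x03ccc"]
print(duplicate_ratio(sample))
```

Running the same count separately on external and internal captures would show whether the duplication really is limited to traffic arriving from the provider side.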
When attempting the upgrade to the Gigabit architecture, we change the following. The redundant drops move from a 6509 to a meshed set of Cisco 3508Gs. The external side of the BigIPs moves from a Layer 2 VLAN on the 6509 to the 3508Gs. The internal side of the BigIPs moves from a Fast Ethernet port to a Gigabit port in the same Layer 2 VLAN. The drops from the provider move from the Fast Ethernet ports to Gigabit ports on the provider's 6509s (one drop per 6509, two drops in total) in the same Layer 2 VLAN. The provider's 6509s trunk all the VLANs to the redundant 12000s for Layer 3 traffic.
With the architecture change out of the way, Issue 2 is this: everything works fine until failover. At failover everything breaks. The BigIPs switch roles from active to standby and vice versa, but the traffic still goes to the formerly active BigIP. At first we thought this was an ARP issue, so we had the provider lower the ARP cache timeout on the 12000s to 5 seconds. No dice; traffic still flows to the standby (formerly active) box. F5, Cisco and the provider all claim their gear is working correctly, too. We ran tcpdump on both BigIPs and here is the kicker: the destination MAC address is the correct MAC address, but the traffic goes to the other BigIP, which has a different MAC address. Also, we see Issue 1 in this setup, so getting around Issue 1 by fixing Issue 2 won't work.
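To make the tcpdump observation concrete, here is a rough Python sketch of the check: read the destination MAC out of each captured Ethernet frame and list the unicast frames that arrived on a box whose own MAC is different. The function names and the MAC addresses are invented for illustration; the real ones are not posted here. It assumes plain Ethernet II frames where the first 6 bytes are the destination MAC.

```python
def format_mac(raw6):
    """Render 6 raw bytes as a lowercase colon-separated MAC string."""
    return ":".join(f"{b:02x}" for b in raw6)


def misdelivered(frames, local_mac):
    """Return destination MACs of unicast frames not addressed to local_mac.

    `frames` is a list of raw Ethernet frame bytes captured on one box;
    `local_mac` is that box's own MAC (both hypothetical inputs).
    """
    local = local_mac.lower()
    found = []
    for raw in frames:
        if len(raw) < 14:
            continue  # too short to hold an Ethernet header
        dst = raw[0:6]
        if dst[0] & 0x01:
            continue  # group bit set: broadcast or multicast, expected everywhere
        mac = format_mac(dst)
        if mac != local:
            found.append(mac)
    return found


# Made-up frames: one for this box, one for the peer, one broadcast.
to_me = bytes.fromhex("02000000000b" "020000000001" "0800") + b"payload"
to_peer = bytes.fromhex("02000000000a" "020000000001" "0800") + b"payload"
bcast = bytes.fromhex("ffffffffffff" "020000000001" "0806") + b"payload"
print(misdelivered([to_me, to_peer, bcast], "02:00:00:00:00:0b"))
```

Any unicast MAC that shows up in that list on the standby box is a frame the switching layer delivered to a port where that MAC does not live, which is exactly the behavior we cannot explain.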
If anyone has any ideas, let me know. I'm all out. One of the things we tried was using just one 3508G and having everything go through it; that didn't work. Just about every variation has been attempted. I'm at home now. There is too much security info to edit out to post packet captures, but I may, if I get the time, post an edited Visio tomorrow.
I haven't found the right solution on the internet.