I recently got the opportunity to take over an HA FortiGate cluster that has over 5900 policies, thousands of objects, hundreds of IP Pools, even more VIPs, and a plethora of configuration issues. Today the firewall's engineering team performed a firmware update to take it from 5.2.9 to 5.2.10. This is a routine task for these guys.
I hopped out of the shower this morning, cold because I ran out of hot water, and groggy. I glanced at my phone and noticed that I had 900 million missed calls. The world had ended at this organization and I was the last to know.
I book it to the office and plop down to assess the damage. The firmware upgrade took just fine but no traffic was traversing the Gate. The team had rolled back the configuration AND the firmware by the time I arrived. Still no traffic moving.
I checked the usual suspects and was not immediately able to determine the cause. After all, nothing had changed. After the rollback the device was back to the way it was previously, but the issue persisted.
I verified the gateway was up (I could ping it from the Gate) and that the inside interfaces were up. That drove us to have a colleague run a continuous ping while we ran diag debug flow on the traffic and watched. I'll be damned. It wasn't routing out. The default route was there, though, so the next-hop IP should have been easy to locate. Yet the router details for the gateway failed.
I decided to grep the full config for the IP of the gateway, and guess what: someone had placed an IP Pool on the Gate a very long time ago that used the gateway IP. Apparently, during the upgrade this Pool was FINALLY realized by the Gate and caused ARP to think the gateway of the Gate was the Gate itself. Take note: configured this way, things should never have worked (before or after the firmware upgrade).
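A plain grep only catches a pool that lists the gateway IP literally; a pool defined as a range that merely contains it would slip through. Here is a rough Python sketch of a range-aware check. It assumes a config backup saved as text with the usual `config firewall ippool` / `edit` / `set startip` / `set endip` / `next` / `end` layout. The function name, the sample pool names, and the 192.0.2.x addresses are all made up for illustration; this is not what we ran that morning.

```python
import ipaddress
import re


def find_pools_containing(config_text, target_ip):
    """Return (pool_name, startip, endip) for each IP pool whose range covers target_ip.

    Assumes IPv4 pools under 'config firewall ippool' with no nested config blocks.
    """
    target = ipaddress.ip_address(target_ip)
    hits = []
    in_section = False
    name = start = end = None

    for raw in config_text.splitlines():
        line = raw.strip()

        if line == "config firewall ippool":
            in_section = True
            continue
        if not in_section:
            continue
        if line == "end":
            in_section = False
            continue

        # 'edit "POOL_NAME"' opens a new pool entry.
        m = re.match(r'edit\s+"?([^"]+)"?$', line)
        if m:
            name, start, end = m.group(1), None, None
            continue

        # Collect the range boundaries for the current entry.
        m = re.match(r"set\s+(startip|endip)\s+(\S+)$", line)
        if m:
            addr = ipaddress.ip_address(m.group(2))
            if m.group(1) == "startip":
                start = addr
            else:
                end = addr
            continue

        # 'next' closes the entry; test the range once both ends are known.
        if line == "next":
            if start is not None and end is not None and start <= target <= end:
                hits.append((name, str(start), str(end)))
            name = start = end = None

    return hits


# Hypothetical sample config using documentation addresses.
SAMPLE_CONFIG = """
config firewall ippool
    edit "OLD_POOL"
        set startip 192.0.2.1
        set endip 192.0.2.1
    next
    edit "NAT_RANGE"
        set startip 192.0.2.100
        set endip 192.0.2.120
    next
end
config firewall vip
    edit "WEB"
        set extip 192.0.2.50
    next
end
"""

if __name__ == "__main__":
    for pool, first, last in find_pools_containing(SAMPLE_CONFIG, "192.0.2.1"):
        print(f"gateway IP falls inside pool {pool}: {first}-{last}")
```

Running the same check against VIP external IPs would be a sensible extension, since those can claim addresses the same way, but I have left that out to keep the sketch short.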
Talk about an interesting day!