High Availability
FortiGate-7000 supports a variation of active-passive FortiGate Clustering Protocol (FGCP) high availability between two identical FortiGate-7000 chassis. With active-passive FortiGate-7000 HA, you create redundant network connections to two identical FortiGate-7000s and add redundant HA heartbeat connections. Then you configure the FIM interface modules for HA. A cluster forms and a primary chassis is selected.
Example FortiGate-7040
All traffic is processed by the primary (or master) chassis. The backup chassis operates in hot standby mode. The configuration, active sessions, routing information, and so on is synchronized to the backup chassis. If the primary chassis fails, traffic automatically fails over to the backup chassis.
The primary chassis is selected based on a number of criteria including the configured priority, the bandwidth, the number of FIM interface failures, and the number of FPM or FIM modules that have failed. As part of the HA configuration you assign each chassis a chassis ID and you can set the priority of each FIM interface module and configure module failure tolerances and the link failure thresholds.
Before you begin configuring HA
Before you begin, both chassis should be running the same FortiOS firmware version, and interfaces should not be configured to get their addresses from DHCP or PPPoE. Register and apply licenses to each FortiGate-7000 before setting up the HA cluster. This includes licensing for FortiCare, IPS, AntiVirus, Web Filtering, Mobile Malware, FortiClient, FortiCloud, and additional virtual domains (VDOMs). Both FortiGate-7000s in the cluster must have the same level of licensing for FortiGuard, FortiCloud, FortiClient, and VDOMs. FortiToken licenses can be added at any time because they are synchronized to all cluster members.
If required, you should configure split ports on the FIMs of both chassis before configuring HA, because the modules have to reboot after split ports are configured. For example, to split the C1, C2, and C4 interfaces of an FIM-7910E in slot 1, enter the following command:
config system global
set split-port 1-C1 1-C2 1-C4
end
After configuring split ports, the chassis reboots and the configuration is synchronized.
On each chassis, make sure the configurations of the modules are synchronized before starting to configure HA. You can use the following command to verify that the configurations of all of the modules are synchronized:

diagnose sys confsync chsum | grep all
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e

If the modules are synchronized, the checksums displayed should all be the same.
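If you script this check, comparing the checksums is straightforward. The following Python sketch (a hypothetical helper, not part of FortiOS) takes the raw output of diagnose sys confsync chsum | grep all and reports whether every module shows the same checksum:

```python
def confsync_in_sync(chsum_output: str) -> bool:
    """Return True if every 'all:' checksum line in the output is identical."""
    checksums = [
        line.split(":", 1)[1].strip()
        for line in chsum_output.splitlines()
        if line.strip().startswith("all:")
    ]
    # All modules are synchronized only when at least one checksum was
    # found and every checksum matches.
    return len(checksums) > 0 and len(set(checksums)) == 1

output = """\
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
all: c0 68 d2 67 e1 23 d9 3a 10 50 45 c5 50 f1 e6 8e
"""
print(confsync_in_sync(output))  # True: all checksums match
```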
You can also use the following command to list the modules that are synchronized. The example output shows all four FIM modules have been configured for HA and added to the cluster.
diagnose sys confsync status | grep in_sync
Master, uptime=692224.19, priority=1, slot_id=1:1, idx=0, flag=0x0, in_sync=1
Slave, uptime=676789.70, priority=2, slot_id=1:2, idx=1, flag=0x0, in_sync=1
Slave, uptime=692222.01, priority=17, slot_id=1:4, idx=2, flag=0x64, in_sync=1
Slave, uptime=692271.30, priority=16, slot_id=1:3, idx=3, flag=0x64, in_sync=1

In this command output, in_sync=1 means the module is synchronized with the primary unit and in_sync=0 means the module is not synchronized.
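When checking many modules, the in_sync fields can also be parsed programmatically. This Python sketch (a hypothetical helper, not part of FortiOS) extracts the slot IDs of any modules reporting in_sync=0:

```python
import re

def modules_not_synchronized(status_output: str) -> list:
    """Return the slot IDs of modules that report in_sync=0."""
    stale = []
    for line in status_output.splitlines():
        slot = re.search(r"slot_id=([^,]+)", line)
        sync = re.search(r"in_sync=(\d)", line)
        if slot and sync and sync.group(1) == "0":
            stale.append(slot.group(1))
    return stale

status_output = """\
Master, uptime=692224.19, priority=1, slot_id=1:1, idx=0, flag=0x0, in_sync=1
Slave, uptime=676789.70, priority=2, slot_id=1:2, idx=1, flag=0x0, in_sync=1
Slave, uptime=692222.01, priority=17, slot_id=1:4, idx=2, flag=0x64, in_sync=1
Slave, uptime=692271.30, priority=16, slot_id=1:3, idx=3, flag=0x64, in_sync=1
"""
print(modules_not_synchronized(status_output))  # []: every module is in sync
```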
Connect the M1 and M2 interfaces for HA heartbeat communication
HA heartbeat communication between chassis happens over the 10Gbit M1 and M2 interfaces of the FIM modules in each chassis. To set up HA heartbeat connections:
- Connect the M1 interfaces of all FIM modules together using a switch.
- Connect the M2 interfaces of all FIM modules together using another switch.
All of the M1 interfaces must be connected together with a switch and all of the M2 interfaces must be connected together with another switch. Connecting M1 interfaces or M2 interfaces directly is not supported as each FIM needs to communicate with all other FIMs.
Heartbeat packets are VLAN packets with VLAN ID 999 and ethertype 9890. The MTU value for the M1 and M2 interfaces is 1500. You can use the following commands to change the HA heartbeat packet VLAN ID and ethertype values if required for your switches. You must change these settings on each FIM interface module. By default the M1 and M2 interface heartbeat packets use the same VLAN IDs and ethertypes.
config system ha
set hbdev-vlan-id <vlan>
set hbdev-second-vlan-id <vlan>
set ha-eth-type <eth-type>
end
Using separate switches for M1 and M2 is recommended for redundancy. It is also recommended that these switches be dedicated to HA heartbeat communication and not used for other traffic.
If you use the same switch for both M1 and M2, separate the M1 and M2 traffic on the switch and set the heartbeat traffic on the M1 and M2 Interfaces to have different VLAN IDs. For example, use the following command to set the heartbeat traffic on M1 to use VLAN ID 777 and the heartbeat traffic on M2 to use VLAN ID 888:
config system ha
set hbdev-vlan-id 777
set hbdev-second-vlan-id 888
end
If you don't set different VLAN IDs for the M1 and M2 heartbeat packets, q-in-q must be enabled on the switch.
Sample switch configuration for a Cisco Catalyst switch. This configuration sets the interface speeds, configures the switch to allow VLAN 999, and enables trunk mode:

##interface config
interface TenGigabitEthernet1/0/5
 description Chassis1 FIM1 M1
 switchport trunk allowed vlan 999
 switchport mode trunk
If you are using one switch for both M1 and M2 connections, the configuration would be the same except you would add q-in-q support and two different VLANs, one for M1 traffic and one for M2 traffic.
For the M1 connections:
interface Ethernet1/5
 description QinQ Test
 switchport mode dot1q-tunnel
 switchport access vlan 888
 spanning-tree port type edge
For the M2 connections:
interface Ethernet1/5
 description QinQ Test
 switchport mode dot1q-tunnel
 switchport access vlan 880
 spanning-tree port type edge
HA packets must have the configured VLAN tag (default 999). If the switch removes or changes this tag, HA heartbeat communication will not work and the cluster will form a split brain configuration. In effect two clusters will form, one in each chassis, and network traffic will be disrupted.
HA configuration
Use the following steps to set up the configuration for HA between two chassis (chassis 1 and chassis 2). These steps are written for a pair of FortiGate-7040Es or 7060Es. The steps are similar for the FortiGate-7030E, except that each FortiGate-7030E has only one FIM interface module.
Each FIM interface module has to be configured for HA separately. The HA configuration is not synchronized between FIMs. Begin by setting up chassis 1 and configuring HA on both of the FIM interface modules in it. Then do the same for chassis 2.
Each of the FortiGate-7000s is assigned a chassis ID (1 and 2). These numbers just allow you to identify the chassis and do not influence primary unit selection.
Setting up HA on the FIM interface modules in the first FortiGate-7000 (chassis 1)
- Log into the CLI of the FIM interface module in slot 1 (FIM01) and enter the following command:

config system ha
set mode a-p
set password <password>
set group-id <id>
set chassis-id 1
set hbdev M1/M2
end
This adds basic HA settings to this FIM interface module.
- Repeat this configuration on the FIM interface module in slot 2 (FIM02):

config system ha
set mode a-p
set password <password>
set group-id <id>
set chassis-id 1
set hbdev M1/M2
end
- From either FIM interface module, enter the following command to confirm that the FortiGate-7000 is in HA mode:
diagnose sys ha status
The password and group-id must be the same on all FIM modules and are unique for each HA cluster. If a cluster does not form, one of the first things to check is the group-id; also re-enter the password on both FIM interface modules.
Configure HA on the FIM interface modules in the second FortiGate-7000 (chassis 2)
- Repeat the same HA configuration on the FIM interface modules in the second chassis, except set the chassis ID to 2:

config system ha
set mode a-p
set password <password>
set group-id <id>
set chassis-id 2
set hbdev M1/M2
end
- From any FIM interface module, enter the following command to confirm that the cluster has formed and all of the FIM modules have been added to it:
diagnose sys ha status
The cluster has now formed and you can add the configuration and connect network equipment and start operating the cluster. You can also modify the HA configuration depending on your requirements.
Verifying that the cluster is operating correctly
Enter the following CLI command to view the status of the cluster. You can enter this command from any module’s CLI. The HA members can be in a different order depending on the module CLI from which you enter the command.
If the cluster is operating properly, the following command output should show the primary and backup (master and slave) chassis as well as the primary and backup (master and slave) modules. For each module, the state portion of the output shows all of the parameters used to select the primary FIM module. These parameters include the number of failed FPM modules that the FIM module connects to, the status of any link aggregation group (LAG) interfaces in the configuration, the state of the interfaces in the FIM module, the traffic bandwidth score for the FIM module (the higher the traffic bandwidth score, the more interfaces are connected to networks), and the status of the management links.
diagnose sys ha status
==========================================================================
Current slot: 1 Module SN: FIM04E3E16000085
Chassis HA mode: a-p
Chassis HA information:
[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FG74E83E16000015: Slave, serialno_prio=1, usr_priority=128, hostname=CH15
FG74E83E16000016: Master, serialno_prio=0, usr_priority=127, hostname=CH16
HA member information:
CH16(FIM04E3E16000085), Master(priority=0), uptime=78379.78, slot=1, chassis=2(2)
slot: 1, chassis_uptime=145358.97, more: cluster_id:0, flag:1, local_priority:0, usr_priority:127, usr_override:0
state: worker_failure=0/2, lag=(total/good/down/bad-score)=5/5/0/0, intf_state=(port up)=0, force-state(1:force-to-master) traffic-bandwidth-score=120, mgmt-link=1
hbdevs: local_interface= 1-M1 best=yes
        local_interface= 1-M2 best=no
ha_elbc_master: 3, local_elbc_master: 3
CH15(FIM04E3E16000074), Slave(priority=2), uptime=145363.64, slot=1, chassis=1(2)
slot: 1, chassis_uptime=145363.64, more: cluster_id:0, flag:0, local_priority:2, usr_priority:128, usr_override:0
state: worker_failure=0/2, lag=(total/good/down/bad-score)=5/5/0/0, intf_state=(port up)=0, force-state(-1:force-to-slave) traffic-bandwidth-score=120, mgmt-link=1
hbdevs: local_interface= 1-M1 last_hb_time=145640.39 status=alive
        local_interface= 1-M2 last_hb_time=145640.39 status=alive
CH15(FIM10E3E16000040), Slave(priority=3), uptime=145411.85, slot=2, chassis=1(2)
slot: 2, chassis_uptime=145638.51, more: cluster_id:0, flag:0, local_priority:3, usr_priority:128, usr_override:0
state: worker_failure=0/2, lag=(total/good/down/bad-score)=5/5/0/0, intf_state=(port up)=0, force-state(-1:force-to-slave) traffic-bandwidth-score=100, mgmt-link=1
hbdevs: local_interface= 1-M1 last_hb_time=145640.62 status=alive
        local_interface= 1-M2 last_hb_time=145640.62 status=alive
CH16(FIM10E3E16000062), Slave(priority=1), uptime=76507.11, slot=2, chassis=2(2)
slot: 2, chassis_uptime=145641.75, more: cluster_id:0, flag:0, local_priority:1, usr_priority:127, usr_override:0
state: worker_failure=0/2, lag=(total/good/down/bad-score)=5/5/0/0, intf_state=(port up)=0, force-state(-1:force-to-slave) traffic-bandwidth-score=100, mgmt-link=1
hbdevs: local_interface= 1-M1 last_hb_time=145640.39 status=alive
        local_interface= 1-M2 last_hb_time=145640.39 status=alive
HA management configuration
In HA mode, you should connect the interfaces in the mgmt 802.3 static aggregate interfaces of both chassis to the same switch. You can create one aggregate interface on the switch and connect both chassis management interfaces to it.
Managing individual modules in HA mode
When you browse to the system management IP address you connect to the primary FIM interface module in the primary chassis. Only the primary FIM interface module responds to management connections using the system management IP address. If a failover occurs you can connect to the new primary FIM interface module using the same system management IP address.
In some cases you may want to connect to an individual FIM or FPM module in a specific chassis. For example, you may want to view the traffic being processed by the FPM module in slot 3 of chassis 2. You can connect to the GUI or CLI of individual modules in the chassis using the system management IP address with a special port number.
For example, if the system management IP address is 1.1.1.1 you can browse to https://1.1.1.1:44323 to connect to the FPM module in chassis 2 slot 3. The special port number (in this case 44323) is a combination of the service port, chassis ID, and slot number. The following table lists the special ports for common admin protocols:
FortiGate-7000 HA special administration port numbers
Chassis and Slot Number | Slot Address | HTTP (80) | HTTPS (443) | Telnet (23) | SSH (22) | SNMP (161)
Ch1 slot 5 | FPM05 | 8005 | 44305 | 2305 | 2205 | 16105
Ch1 slot 3 | FPM03 | 8003 | 44303 | 2303 | 2203 | 16103
Ch1 slot 1 | FIM01 | 8001 | 44301 | 2301 | 2201 | 16101
Ch1 slot 2 | FIM02 | 8002 | 44302 | 2302 | 2202 | 16102
Ch1 slot 4 | FPM04 | 8004 | 44304 | 2304 | 2204 | 16104
Ch1 slot 6 | FPM06 | 8006 | 44306 | 2306 | 2206 | 16106
Ch2 slot 5 | FPM05 | 8025 | 44325 | 2325 | 2225 | 16125
Ch2 slot 3 | FPM03 | 8023 | 44323 | 2323 | 2223 | 16123
Ch2 slot 1 | FIM01 | 8021 | 44321 | 2321 | 2221 | 16121
Ch2 slot 2 | FIM02 | 8022 | 44322 | 2322 | 2222 | 16122
Ch2 slot 4 | FPM04 | 8024 | 44324 | 2324 | 2224 | 16124
Ch2 slot 6 | FPM06 | 8026 | 44326 | 2326 | 2226 | 16126
For example:
- To connect to the GUI of the FPM module in chassis 1 slot 3 using HTTPS, browse to https://1.1.1.1:44303.
- To send an SNMP query to the FPM module in chassis 2 slot 6, use port 16126.
The special port number is calculated from the service port, the chassis ID (CH1 = chassis ID 1, CH2 = chassis ID 2), and the slot number, using the formula: service_port x 100 + (chassis_id - 1) x 20 + slot_id.
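The calculation can be expressed directly in code. This Python sketch implements the formula above (the function name is illustrative):

```python
def special_admin_port(service_port: int, chassis_id: int, slot_id: int) -> int:
    """Special administration port number for a FortiGate-7000 module:
    service_port x 100 + (chassis_id - 1) x 20 + slot_id."""
    return service_port * 100 + (chassis_id - 1) * 20 + slot_id

# HTTPS (443) to the FPM module in chassis 2 slot 3:
print(special_admin_port(443, 2, 3))  # 44323
# SNMP (161) query to the FPM module in chassis 2 slot 6:
print(special_admin_port(161, 2, 6))  # 16126
```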
Firmware upgrade
All of the modules in a FortiGate-7000 HA cluster run the same firmware image. You upgrade the firmware from the GUI or CLI by logging into the primary FIM interface module using the system management IP address and uploading the firmware image.
If uninterruptable-upgrade and session-pickup are enabled, firmware upgrades should only cause a minimal traffic interruption. Use the following command to enable these settings (they should be enabled by default). These settings are synchronized to all modules in the cluster.

config system ha
set uninterruptable-upgrade enable
set session-pickup enable
end
When enabled, the primary FIM interface module uploads the firmware to all modules, but the modules in the backup chassis install their new firmware, reboot, rejoin the cluster, and resynchronize first. Then all traffic fails over to the backup chassis, which becomes the new primary chassis. The modules in the new backup chassis then upgrade their firmware and rejoin the cluster. Unless override is enabled, the new primary chassis continues to operate as the primary chassis.
Normally you would want to enable uninterruptable-upgrade to minimize traffic interruptions. However, uninterruptable-upgrade does not have to be enabled. In fact, if a traffic interruption is not going to cause any problems, you can disable uninterruptable-upgrade so that the firmware upgrade process takes less time.
Session failover (session-pickup)
Session failover means that after a failover, communication sessions resume on the new primary FortiGate-7000 with minimal or no interruption. Two categories of sessions need to be resumed after a failover:
- Sessions passing through the cluster
- Sessions terminated by the cluster
Session failover (also called session pickup) is not enabled by default for FortiGate-7000 HA. If session pickup is enabled, while the FortiGate-7000 HA cluster is operating, the primary FortiGate-7000 informs the backup FortiGate-7000 of changes to the primary FortiGate-7000 connection and state tables for TCP and UDP sessions passing through the cluster, keeping the backup FortiGate-7000 up to date with the traffic currently being processed by the cluster.
After a failover the new primary FortiGate-7000 recognizes open sessions that were being handled by the cluster. The sessions continue to be processed by the new primary FortiGate-7000 and are handled according to their last known state.
Session pickup has some limitations. For example, session failover is not supported for sessions being scanned by proxy-based security profiles. Session failover is supported for sessions being scanned by flow-based security profiles; however, flow-based sessions that fail over are not inspected after they fail over.
Sessions terminated by the cluster include management sessions (such as HTTPS connections to the FortiGate GUI or SSH connections to the CLI, as well as SNMP, logging, and so on). Also included in this category are IPsec VPN, SSL VPN, and explicit proxy sessions. In general, whether or not session pickup is enabled, these sessions do not fail over and have to be restarted.
Enabling session pickup for TCP and UDP
To enable session-pickup, from the CLI enter:
config system ha set session-pickup enable
end
When session-pickup is enabled, sessions in the primary FortiGate-7000 TCP and UDP session tables are synchronized to the backup FortiGate-7000. As soon as a new TCP or UDP session is added to the primary FortiGate-7000 session table, that session is synchronized to the backup FortiGate-7000. This synchronization happens as quickly as possible to keep the session tables synchronized.
If the primary FortiGate-7000 fails, the new primary FortiGate-7000 uses its synchronized session tables to resume all TCP and UDP sessions that were being processed by the former primary FortiGate-7000 with only minimal interruption. Under ideal conditions all TCP and UDP sessions should be resumed. This is not guaranteed though and under less than ideal conditions some sessions may need to be restarted.
If session pickup is disabled
If you disable session pickup, the FortiGate-7000 HA cluster does not keep track of sessions, and after a failover, active sessions have to be restarted or resumed. Most sessions can be resumed as a normal result of how TCP and UDP resume communication after any routine network interruption.
If you do not require session failover protection, leaving session pickup disabled may reduce CPU usage and reduce HA heartbeat network bandwidth usage. Also if your FortiGate-7000 HA cluster is mainly being used for traffic that is not synchronized (for example, for proxy-based security profile processing) enabling session pickup is not recommended since most sessions will not be failed over anyway.
Primary unit selection and failover criteria
Once two FortiGate-7000s recognize that they can form a cluster, they negotiate to select a primary chassis. Primary selection occurs automatically based on the criteria shown below. After the cluster selects the primary, the other chassis becomes the backup.
Negotiation and primary chassis selection also take place if one of the criteria for selecting the primary chassis changes, for example, if an interface becomes disconnected or a module fails. After this happens, the cluster renegotiates to select a new primary chassis, again using the criteria shown below.
If there are no failures and you haven't configured any settings to influence primary chassis selection, the chassis with the highest serial number becomes the primary chassis.
Using the serial number is a convenient way to differentiate FortiGate-7000 chassis, so basing primary chassis selection on the serial number is predictable and easy to understand and interpret. Also, the chassis with the highest serial number is usually the newest chassis with the most recent hardware version. In many cases you may not need active control over primary chassis selection, so basic primary chassis selection based on serial number is sufficient.
In some situations you may want to have control over which chassis becomes the primary chassis. You can control primary chassis selection by setting the priority of one chassis higher than the priority of the other. If you change the priority of one of the chassis, during negotiation the chassis with the highest priority becomes the primary chassis. As shown above, FortiGate-7000 FGCP selects the primary chassis based on priority before serial number. For more information about how to use priorities, see High Availability on page 57.
Chassis uptime is also a factor. Normally, when two chassis start up, their uptimes are similar and do not affect primary chassis selection. However, during operation, if one of the chassis goes down, the other will have a much higher uptime and will be selected as the primary chassis before priority and serial number are tested.
Verifying primary chassis selection
You can use the diagnose sys ha status command to verify which chassis has become the primary chassis as shown by the following command output example. This output also shows that the chassis with the highest serial number was selected to be the primary chassis.
diagnose sys ha status
==========================================================================
Current slot: 1 Module SN: FIM04E3E16000085 Chassis HA mode: a-p
Chassis HA information:
[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FG74E83E16000015: Slave, serialno_prio=1, usr_priority=128, hostname=CH15
FG74E83E16000016: Master, serialno_prio=0, usr_priority=127, hostname=CH16
How link and module failures affect primary chassis selection
The total number of connected data interfaces in a chassis has a higher priority than the number of failed modules in determining which chassis in a FortiGate-7000 HA configuration is the primary chassis. For example, if one chassis has a failed FPM module and the other has a disconnected or failed data interface, the chassis with the failed processor module becomes the primary unit.
For another example, the following diagnose sys ha status command shows the HA status for a cluster where one chassis has a disconnected or failed data interface and the other chassis has a failed FPM module.
diagnose sys ha status
==========================================================================
Slot: 2 Module SN: FIM01E3E16000088 Chassis HA mode: a-p
Chassis HA information:
[Debug_Zone HA information]
HA group member information: is_manage_master=1.
FG74E33E16000027: Master, serialno_prio=0, usr_priority=128, hostname=Chassis-K
FG74E13E16000072: Slave, serialno_prio=1, usr_priority=128, hostname=Chassis-J
HA member information:
Chassis-K(FIM01E3E16000088), Slave(priority=1), uptime=2237.46, slot=2, chassis=1(1)
slot: 2, chassis_uptime=2399.58,
state: worker_failure=1/2, lag=(total/good/down/bad-score)=2/2/0/0, intf_state=(port up)=0, force-state(0:none) traffic-bandwidth-score=20, mgmt-link=1
hbdevs: local_interface= 2-M1 best=yes
        local_interface= 2-M2 best=no
Chassis-J(FIM01E3E16000031), Slave(priority=2), uptime=2151.75, slot=2, chassis=2(1)
slot: 2, chassis_uptime=2151.75,
state: worker_failure=0/2, lag=(total/good/down/bad-score)=2/2/0/0, intf_state=(port up)=0, force-state(0:none) traffic-bandwidth-score=20, mgmt-link=1
hbdevs: local_interface= 2-M1 last_hb_time= 2399.81 status=alive
        local_interface= 2-M2 last_hb_time= 0.00 status=dead
Chassis-J(FIM01E3E16000033), Slave(priority=3), uptime=2229.63, slot=1, chassis=2(1)
slot: 1, chassis_uptime=2406.78,
state: worker_failure=0/2, lag=(total/good/down/bad-score)=2/2/0/0, intf_state=(port up)=0, force-state(0:none) traffic-bandwidth-score=20, mgmt-link=1
hbdevs: local_interface= 2-M1 last_hb_time= 2399.81 status=alive
        local_interface= 2-M2 last_hb_time= 0.00 status=dead
Chassis-K(FIM01E3E16000086), Master(priority=0), uptime=2203.30, slot=1, chassis=1(1)
slot: 1, chassis_uptime=2203.30,
state: worker_failure=1/2, lag=(total/good/down/bad-score)=2/2/0/0, intf_state=(port up)=1, force-state(0:none) traffic-bandwidth-score=30, mgmt-link=1
hbdevs: local_interface= 2-M1 last_hb_time= 2399.74 status=alive
        local_interface= 2-M2 last_hb_time= 0.00 status=dead
This output shows that chassis 1 (hostname Chassis-K) is the primary or master chassis. The reason is that chassis 1 has a total traffic-bandwidth-score of 30 + 20 = 50, while the total traffic-bandwidth-score for chassis 2 (hostname Chassis-J) is 20 + 20 = 40.
The output also shows that both FIM modules in chassis 1 are detecting a worker failure (worker_failure=1/2) while both FIM modules in chassis 2 are not detecting a worker failure (worker_failure=0/2). The intf_state=(port up)=1 field shows that the FIM module in slot 1 of chassis 1 has one more interface connected than the FIM module in slot 1 of chassis 2. It is this extra connected interface that gives the FIM module in chassis 1 slot 1 a higher traffic bandwidth score than the FIM module in slot 1 of chassis 2.
One of the interfaces on the FIM module in slot 1 of chassis 2 must have failed. In a normal HA configuration the FIM modules in matching slots of each chassis should have redundant interface connections. So if one module has fewer connected interfaces this indicates a link failure.
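The comparison in this example can be summarized in a short sketch (assuming, as described above, that a chassis total is simply the sum of the traffic-bandwidth-score values of its FIM modules):

```python
def chassis_score(fim_scores):
    """Total traffic bandwidth score for a chassis: the sum of the
    traffic-bandwidth-score values reported by its FIM modules."""
    return sum(fim_scores)

# Scores taken from the diagnose output above:
chassis_k = chassis_score([30, 20])  # Chassis-K (chassis 1): slots 1 and 2
chassis_j = chassis_score([20, 20])  # Chassis-J (chassis 2): slots 1 and 2
print(chassis_k, chassis_j)  # 50 40
# Chassis-K has the higher total score, so it wins primary selection.
```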
FIM module failures
If an FIM module fails, not only does HA recognize this as a module failure, it also gives the chassis with the failed FIM module a much lower traffic bandwidth score. So an FIM module failure is more likely to cause an HA failover than an FPM module failure.
Also, the traffic bandwidth score for an FIM module with more connected interfaces would be higher than the score for an FIM module with fewer connected interfaces. So if a different FIM module failed in each chassis, the chassis with the functioning FIM module with the most connected data interfaces would have the highest traffic bandwidth score and would become the primary chassis.
Management link failures
Management connections to a chassis can affect primary chassis selection. If the management connection to one chassis becomes disconnected, a failover occurs and the chassis that still has management connections becomes the primary chassis.
Link failure threshold and board failover tolerance
The default settings of the link failure threshold and the board failover tolerance produce the default link and module failure behavior described above. You can change these settings to adjust how many link or module failures are tolerated before a failover occurs.
Link failure threshold
The link failure threshold determines how many interfaces in a link aggregation group (LAG) interface can be lost before the LAG interface is considered down. The chassis with the most connected LAGs becomes the primary chassis. If a LAG goes down, the cluster negotiates and may select a new primary chassis. You can use the following command to change the link failure threshold:
config system ha set link-failure-threshold <threshold>
end
The threshold range is 0 to 80 and 0 is the default.
A threshold of 0 means that if a single interface in any LAG fails, the LAG is considered down. A higher failure threshold means that more interfaces in a LAG can fail before the LAG is considered down. For example, if the threshold is set to 1, at least two interfaces have to fail before the LAG is considered down.
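The threshold logic can be sketched as follows (a simplified illustrative model of the behavior described above, not FortiOS code):

```python
def lag_is_down(failed_members: int, link_failure_threshold: int = 0) -> bool:
    """A LAG is considered down once more member interfaces than the
    threshold have failed. With the default threshold of 0, a single
    failed member takes the LAG down; with a threshold of 1, at least
    two members must fail."""
    return failed_members > link_failure_threshold

print(lag_is_down(1, 0))  # True: default threshold, one failure downs the LAG
print(lag_is_down(1, 1))  # False: one failure is tolerated
print(lag_is_down(2, 1))  # True: two failures exceed a threshold of 1
```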
Board failover tolerance
You can use the following command to configure board failover tolerance.
config system ha set board-failover-tolerance <tolerance>
end
The tolerance range is 0 to 12, and 0 is the default.
A tolerance of 0 means that if a single module fails in the primary chassis a failover occurs and the chassis with the fewest failed modules becomes the new primary chassis. A higher failover tolerance means that more modules can fail before a failover occurs. For example, if the tolerance is set to 1, at least two modules in the primary chassis will have to fail before a failover occurs.
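The tolerance works the same way as the link failure threshold, and can be sketched as follows (a simplified illustrative model, not FortiOS code):

```python
def failover_required(failed_modules: int, board_failover_tolerance: int = 0) -> bool:
    """A failover occurs once more modules than the tolerance have failed
    in the primary chassis. With the default tolerance of 0, a single
    module failure triggers a failover; with a tolerance of 1, at least
    two modules must fail."""
    return failed_modules > board_failover_tolerance

print(failover_required(1, 0))  # True: default tolerance, one failure triggers failover
print(failover_required(1, 1))  # False: a single failure is tolerated
print(failover_required(2, 1))  # True: two failures exceed a tolerance of 1
```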
Priority and primary chassis selection
You can select a chassis to become the primary chassis by setting the HA priority of one or more of its FIM modules (for example, the FIM module in slot 1) higher than the priority of the FIM modules in the other chassis. Enter the following command to set the HA priority:
config system ha set priority <number>
end
The default priority is 128.
The chassis with the highest total FIM module HA priority becomes the primary chassis.
Override and primary chassis selection
Enabling override changes the order of primary chassis selection. If override is enabled, primary chassis selection considers priority before chassis uptime and serial number. This means that if you set the device priority of one chassis higher, with override enabled that chassis becomes the primary chassis even if its uptime and serial number are lower than those of the other chassis.
Enter the following command to enable override.
config system ha set override enable
end
Even when override is enabled, primary unit selection checks the traffic bandwidth score, aggregate interface state, management interface links, and FPM module failures first. So any of these factors can affect primary chassis selection, even if override is enabled.
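The overall selection order described in this section can be sketched as a comparison key (a simplified illustrative model, not the actual FGCP implementation; the field names are assumptions, and in practice uptime only matters when the difference is large, such as after one chassis restarts):

```python
def selection_key(chassis, override=False):
    """Comparison key for primary chassis selection (simplified model).

    Failure-related criteria are always checked first: fewer failed
    modules, a higher traffic bandwidth score, and working management
    links win. Without override, the tiebreak order is uptime, then
    priority, then serial number; with override enabled, priority is
    considered before uptime and serial number.
    """
    failure_part = (
        -chassis["failed_modules"],
        chassis["traffic_bandwidth_score"],
        chassis["mgmt_links"],
    )
    if override:
        tiebreak = (chassis["priority"], chassis["uptime"], chassis["serial"])
    else:
        tiebreak = (chassis["uptime"], chassis["priority"], chassis["serial"])
    return failure_part + tiebreak

def select_primary(chassis_list, override=False):
    """Return the chassis that wins primary selection under this model."""
    return max(chassis_list, key=lambda c: selection_key(c, override))

ch1 = {"failed_modules": 0, "traffic_bandwidth_score": 120, "mgmt_links": 1,
       "priority": 200, "uptime": 1000.0, "serial": "FG74E83E16000015"}
ch2 = {"failed_modules": 0, "traffic_bandwidth_score": 120, "mgmt_links": 1,
       "priority": 128, "uptime": 5000.0, "serial": "FG74E83E16000016"}

# Without override, ch2 wins on its much higher uptime; with override,
# ch1 wins because its higher priority is considered before uptime.
print(select_primary([ch1, ch2])["serial"])                 # FG74E83E16000016
print(select_primary([ch1, ch2], override=True)["serial"])  # FG74E83E16000015
```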