distribute_HA_keepalived

Posted on 2021-06-15 Edited on 2023-09-14 In distribute , keepalived

Introduction

Load balancing is a method of distributing IP traffic across a cluster of real servers, providing one or more highly available virtual services. When designing load balanced topologies, it is important to account for the availability of the load balancer itself as well as the real servers behind it.

Keepalived provides frameworks for both load balancing and high availability. The load balancing framework relies on the well-known and widely used Linux Virtual Server (IPVS) kernel module, which provides Layer 4 load balancing. Keepalived implements a set of health checkers to dynamically and adaptively maintain and manage load balanced server pools according to their health.

High availability is achieved by the Virtual Router Redundancy Protocol (VRRP). VRRP is a fundamental brick for router failover. In addition, Keepalived implements a set of hooks to the VRRP finite state machine, providing low-level and high-speed protocol interactions. To offer the fastest network failure detection, Keepalived implements the Bidirectional Forwarding Detection (BFD) protocol. VRRP state transitions can take BFD hints into account to drive fast state transitions. Keepalived frameworks can be used independently or all together to provide resilient infrastructures.

In short, Keepalived provides two main functions:

Health checking for LVS systems
Implementation of the VRRPv2 stack to handle load balancer failover

This article covers only the high-availability side: load balancer failover.

Inside keepalived

VRRP

The Virtual Router Redundancy Protocol (VRRP) is a computer networking protocol that provides for automatic assignment of available Internet Protocol (IP) routers to participating hosts. This increases the availability and reliability of routing paths via automatic default gateway selections on an IP subnetwork.

The protocol achieves this by creation of virtual routers, which are an abstract representation of multiple routers, i.e. Primary/Active and Secondary/Standby routers, acting as a group. The virtual router is assigned to act as a default gateway of participating hosts, instead of a physical router. If the physical router that is routing packets on behalf of the virtual router fails, another physical router is selected to automatically replace it. The physical router that is forwarding packets at any given time is called the Primary/Active router.

VRRP provides information on the state of a router, not the routes processed and exchanged by that router.

Physical routers within the virtual router must communicate with each other using packets sent to the multicast IP address 224.0.0.18 with IP protocol number 112. Newer implementations also support unicast heartbeats to peers.

Routers have a priority of between 1 and 254 and the router with the highest priority will become the Primary/Active. The default priority is 100.

Elections of Primary/Active routers

A failure to receive a multicast packet from the Primary/Active router for a period longer than three times the advertisement timer causes the Secondary/Standby routers to assume that the Primary/Active router is dead. The virtual router then transitions into an unsteady state and an election process is initiated to select the next Primary/Active router from the Secondary/Standby routers. This is fulfilled through the use of multicast packets.
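
The "three times the advertisement timer" rule can be made concrete. In VRRPv2 (RFC 3768) a backup waits for Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time, where Skew_Time = (256 - Priority) / 256 seconds, so a higher-priority backup times out slightly earlier. The sketch below only evaluates that formula; the function name master_down_interval and the sample priorities are illustrative assumptions, not part of keepalived.

```shell
# master_down_interval ADVERT_INT PRIORITY
# Prints the VRRPv2 Master_Down_Interval in seconds:
#   3 * advert_int + (256 - priority) / 256
master_down_interval() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.2f\n", 3 * a + (256 - p) / 256 }'
}

# Illustrative values: a 1-second advertisement interval, two priorities.
master_down_interval 1 100   # prints 3.61
master_down_interval 1 80    # prints 3.69
```

With a 1-second interval, failover therefore starts after roughly 3.6 seconds of silence from the Primary/Active router.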

Secondary/Standby router(s) are only supposed to send multicast packets during an election process. One exception to this rule is when a physical router is configured with a higher priority than the current Primary/Active, which means that on connection to the network it will preempt the Primary/Active status. This allows a system administrator to force a physical router to the Primary/Active state immediately after booting, for example when that particular router is more powerful than others within the virtual router. The Secondary/Standby router with the highest priority becomes the Primary/Active router by raising its priority above that of the current Primary/Active. It will then take responsibility for routing packets sent to the virtual gateway’s MAC address. In cases where Secondary/Standby routers all have the same priority, the Secondary/Standby router with the highest IP address becomes the Primary/Active router.
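
The selection rule described above (highest priority wins, and equal priorities are broken by the highest IP address) can be sketched as a small shell function. This is only an illustration of the rule, not how keepalived implements it. The function name elect_master, the "PRIORITY IP" input format, and the sample addresses are assumptions made for the example.

```shell
# elect_master: reads "PRIORITY IPV4" lines on stdin and prints the IP of
# the router that would win: highest priority first, then highest address.
elect_master() {
    awk '{
        prio = $1 + 0
        # Turn a.b.c.d into one number so addresses compare numerically.
        split($2, o, ".")
        ip = ((o[1] * 256 + o[2]) * 256 + o[3]) * 256 + o[4]
        if (NR == 1 || prio > best_prio || (prio == best_prio && ip > best_ip)) {
            best_prio = prio; best_ip = ip; winner = $2
        }
    } END { if (NR > 0) print winner }'
}

# Two standby routers share priority 80, so the higher address wins.
printf '%s\n' '80 10.117.5.21' '80 10.117.5.22' '50 10.117.5.30' | elect_master
# prints 10.117.5.22
```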

All physical routers acting as a virtual router must be in the same local area network (LAN) segment, although newer implementations also support unicast. Communication within the virtual router takes place periodically. This period can be adjusted by changing advertisement interval timers. The shorter the advertisement interval, the shorter the black hole period, though at the expense of more traffic in the subnet.

Once the new master has been elected, it sends out a “gratuitous ARP”. Every host has an ARP table that ties IP addresses to Ethernet addresses. A gratuitous ARP is an unsolicited message with an IP address to Ethernet address mapping. All hosts receiving the gratuitous ARP update their tables, which effectively means that the virtual IP address is owned by a new device on the network.

Note that whether we use VRRP in multicast or unicast mode, we are not using UDP/IP or TCP/IP. VRRP is its own protocol on top of IP, independent of either of those.

keepalived cases

Although keepalived supports nodes located on different subnets, the best choice is to run them on the same subnet.

Nodes on different subnets

# VRRP advertisements ordinarily go out over multicast. This
# configuration parameter causes keepalived to send them
# as unicasts. This specification can be useful in environments
# where multicast isn't supported or in instances where you want
# to limit which devices see your VRRP announcements. The IP
# address(es) can be IPv4 or IPv6, and indicate the real IP of
# other members.
unicast_peer {
    10.5.132.122
}

Nodes on the same subnet, with no router in between

Two nodes run keepalived.

# Ubuntu
node1# apt-get install -y keepalived
node2# apt-get install -y keepalived

# CentOS 7
node1# yum install -y keepalived
node2# yum install -y keepalived

# node1 keepalived conf

node1# cat /etc/keepalived/keepalived.conf
global_defs {                                                                   
   notification_email {                                                         
      jason_lkm@163.com
   }                                                                            
   notification_email_from keepalived@cyun.tech                        
   smtp_server 192.168.100.1                                                   
   smtp_connect_timeout 30                                                      
   router_id LVS_DEVEL                                                          
}
vrrp_instance VI_1 {
  state MASTER
  interface eth0
  virtual_router_id 51
  priority 100
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 12345
  }
  virtual_ipaddress {
    10.117.5.123 dev eth0
    #10.117.5.111 dev eth0
    # This should be the nginx VIP (a public IP in production); nginx should run on this node as well.
    # (With net.ipv4.ip_nonlocal_bind=1, nginx can listen on an address that is not yet configured on any interface.)
  }
}

node1# service keepalived restart

# node2 keepalived conf
node2# cat /etc/keepalived/keepalived.conf

global_defs {                                                                   
   notification_email {                                                         
     jason_lkm@163.com
   }                                                                            
   notification_email_from keepalived@cyun.tech                        
   smtp_server 192.168.100.1                                                   
   smtp_connect_timeout 30                                                      
   router_id LVS_DEVEL                                                          
}
vrrp_instance VI_1 {
  state BACKUP
  interface eth0
  virtual_router_id 51
  priority 80
  advert_int 1
  authentication {
    auth_type PASS
    auth_pass 12345
  }
  virtual_ipaddress {
    10.117.5.123 dev eth0
  }
}
node2# service keepalived restart

vrrp_instance defines an individual instance of the VRRP protocol running on an interface.
state defines the initial state that the instance starts in; the final state may differ because of the master election.
interface defines the interface that VRRP runs on.
virtual_router_id is the unique identifier of the virtual router; it must be the same on all nodes of the instance.
priority is the advertised priority used for master/backup election.
advert_int specifies the frequency that advertisements are sent at (1 second, in this case).
authentication specifies the information necessary for servers participating in VRRP to authenticate with each other. In this case, a simple password is defined.
virtual_ipaddress defines the IP addresses (there can be multiple) that VRRP is responsible for.

If you’re using a host-based firewall, such as firewalld or iptables, then you need to add the necessary rules to permit IP protocol 112 traffic.
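
As a rough sketch of such rules, the helper below prints (but does not run) commands that permit IP protocol 112 for either firewalld or iptables, so they can be reviewed before being executed as root. The function name vrrp_firewall_cmds is hypothetical. The rich rule assumes that "vrrp" is listed in /etc/protocols, and the iptables rule is deliberately broad (any source); tighten both to your environment.

```shell
# vrrp_firewall_cmds BACKEND
# Prints the commands that allow VRRP (IP protocol 112) for the given
# backend: "firewalld" or "iptables". Nothing is executed.
vrrp_firewall_cmds() {
    case "$1" in
        firewalld)
            echo "firewall-cmd --permanent --add-rich-rule='rule protocol value=\"vrrp\" accept'"
            echo "firewall-cmd --reload"
            ;;
        iptables)
            echo "iptables -I INPUT -p 112 -j ACCEPT"
            ;;
        *)
            echo "usage: vrrp_firewall_cmds firewalld|iptables" >&2
            return 1
            ;;
    esac
}

vrrp_firewall_cmds iptables
# prints: iptables -I INPUT -p 112 -j ACCEPT
```

Run the printed commands on both nodes, then confirm with tcpdump that advertisements arrive on each side.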

Debug keepalived

# check whether the virtual IP is configured on the master
$ ip addr show eth0
$ service keepalived status
$ tcpdump -i eth0 vrrp

# allow services to bind to the virtual IP when it is not configured locally
$ sysctl -w net.ipv4.ip_nonlocal_bind=1

# run keepalived in the foreground with detailed logs
$ keepalived -d -D -l -n

split-brain

In a highly available (HA) system, when the "heartbeat" link between the two nodes is lost, the system, which used to act as one coordinated whole, splits into two independent nodes. Because they have lost contact with each other, each assumes that the other has failed. The HA software on the two nodes then behaves like a split brain: both sides compete for the "shared resources" and the "application services", with serious consequences. Either the shared resources are divided and the service comes up on neither side, or the service comes up on both sides and both read and write the "shared storage" at the same time, resulting in data corruption (a common example is damage to a database's online logs).

The symptom is two active nodes, with the same virtual IP configured on different nodes.

Why it happens

The heartbeat link between the pair of highly available servers fails, which prevents normal communication, for example because the heartbeat cable is broken or has aged.
The network card or its driver is faulty, or there is an IP misconfiguration or conflict (when the network cards are directly connected).
Equipment along the heartbeat path (network card, switch) fails.
The arbitration machine has a problem (when an arbitration scheme is used).
The iptables firewall on a high-availability server is enabled and blocks heartbeat messages.
The virtual_router_id values for the same VRRP instance differ between the two nodes in the Keepalived configuration.
The vrrp_instance names differ between the nodes and their priorities are the same.
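
The virtual_router_id mismatch listed above is easy to check mechanically once the peer's configuration has been copied next to the local one. The following is a minimal sketch; the function names vrid_of and same_vrid are hypothetical, and the demo uses temporary files rather than real configurations.

```shell
# vrid_of FILE: print every virtual_router_id value found in a keepalived
# configuration file (commented-out lines are ignored).
vrid_of() {
    awk '$1 == "virtual_router_id" { print $2 }' "$1"
}

# same_vrid FILE_A FILE_B: succeed only if both files declare the same IDs
# in the same order.
same_vrid() {
    [ "$(vrid_of "$1")" = "$(vrid_of "$2")" ]
}

# Demo with two throwaway configs that disagree (51 versus 52).
a=$(mktemp); b=$(mktemp)
printf 'vrrp_instance VI_1 {\n  virtual_router_id 51\n}\n' > "$a"
printf 'vrrp_instance VI_1 {\n  virtual_router_id 52\n}\n' > "$b"
same_vrid "$a" "$b" || echo "virtual_router_id mismatch"
rm -f "$a" "$b"
```

In practice the two arguments would be /etc/keepalived/keepalived.conf and a copy of the peer's file.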

avoid it

Add redundant heartbeat links, for example two cables (so that the heartbeat itself is also HA), to minimize the occurrence of “split brain”.
Enable a disk lock. The node that is serving locks the shared disk, so that when “split brain” occurs the other node cannot take the shared disk resources away. Locking has its own problem: if the node holding the shared disk does not actively “unlock” it, the other node can never get the shared disk. In reality, if the serving node suddenly crashes, it cannot execute the unlock command, and the backup node cannot take over the shared resources and application services. So some HA software implements a “smart” lock: the serving node enables the disk lock only when it finds that all heartbeat links are down (the peer cannot be detected), and leaves the disk unlocked otherwise.
Set up an arbitration mechanism. For example, designate a reference IP (such as the gateway IP). When the heartbeat link is completely down, both nodes ping the reference IP. A node that cannot reach it knows that the fault is on its own side: not only the “heartbeat” but also its link to the outside is broken, so starting (or continuing) the application service there is useless. That node gives up the competition and lets the node that can ping the reference IP run the service. To be safer, the node that cannot ping the reference IP can simply reboot itself to release any shared resources it may still hold.
Use a script to detect split-brain and raise an alarm.

The last two are commonly used in production environments.
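
The arbitration idea can be sketched as a small check function: a node keeps (or takes) the service only while the reference IP answers. This is a minimal sketch under stated assumptions. The function name should_serve, the PROBE variable, and the address 10.117.5.1 are illustrative, and the ping flags assume Linux iputils. keepalived can run a script like this periodically through a vrrp_script block referenced from track_script.

```shell
# Probe command; override PROBE to exercise the logic without a network.
# -c 3: send three probes; -W 1: wait at most one second for each reply.
PROBE="${PROBE:-ping -c 3 -W 1}"

# should_serve REFERENCE_IP
# Succeeds when the reference IP answers, meaning this node still has a
# working uplink and may keep (or take) the service.
should_serve() {
    $PROBE "$1" >/dev/null 2>&1
}

# Example (assumed gateway address): give up the VIP when the uplink is gone.
#   should_serve 10.117.5.1 || service keepalived stop
```

Stopping keepalived on the node that has lost its uplink makes it release the virtual IP, which matches the "actively give up competition" step described above.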

troubleshooting

Check why

First, make sure the configuration is correct: check /etc/keepalived/keepalived.conf.
Check that routing between the nodes is OK.
Check that iptables allows VRRP.