Linux High Availability and Load Balancing in Practice — From Keepalived to Performance Tuning
Introduction
In modern enterprise application architectures, high availability and load balancing are key technologies for ensuring stable system operation. This article provides a detailed introduction to achieving dual-machine hot standby with Keepalived, building internal network service load balancing with HAProxy, and resolving NIC soft interrupt issues for network performance optimization. Through real-world cases and detailed configuration explanations, this article helps readers understand the core principles and practical applications of these technologies.
Keepalived Dual-Machine Hot Standby Deployment
VRRP Principles
The Virtual Router Redundancy Protocol (VRRP) provides router high availability. In a VRRP architecture, multiple routers form a virtual router group: one acts as the MASTER router and handles the actual network traffic, and the others act as BACKUP routers. When the MASTER router fails, the BACKUP routers stop receiving its advertisements and, after a short timeout of roughly three advertisement intervals, the highest-priority BACKUP takes over, preserving continuity of network services.
Core VRRP concepts include:
- Virtual Router ID (VRID): Identifies a virtual router group
- Virtual IP Address: The address clients use to reach the virtual router; it is held by whichever node is currently MASTER
- Priority: Determines the router’s role in the group; higher values indicate higher priority
- Advertisement Interval: The time interval between heartbeat messages sent by routers
- Authentication Mechanism: Ensures only authorized routers can join the virtual router group
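The interplay of priority and advertisement interval determines how quickly a BACKUP reacts. As a rough sketch, assuming VRRPv2 timing as defined in RFC 3768 (the function name and the example priorities 100 and 90 are illustrative, not taken from the configuration), the failover delay can be estimated like this:

```shell
#!/bin/sh
# Sketch: how long a VRRPv2 BACKUP waits before declaring the MASTER dead.
# RFC 3768: Skew_Time            = (256 - Priority) / 256 seconds
#           Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time
# Args: priority (1-254), advertisement interval in whole seconds.
# Prints the interval in milliseconds (integer arithmetic, rounded down).
master_down_ms() {
    prio=$1
    advert_s=$2
    skew_ms=$(( (256 - prio) * 1000 / 256 ))
    echo $(( 3 * advert_s * 1000 + skew_ms ))
}

master_down_ms 100 1   # prints 3609
master_down_ms 90 1    # prints 3648
```

Because the skew shrinks as priority grows, the BACKUP with the highest priority times out first and wins the election.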
In our architecture, multiple VRRP instances are configured to achieve high availability for different network segments:
flowchart TD
A@{ shape: rounded, label: "Virtual Router Group" } --> B@{ shape: rounded, label: "External Network VRRP Instance" }
A --> C@{ shape: rounded, label: "Internal Network VRRP Instance" }
B --> D@{ shape: rounded, label: "Server-1 Master" }
B --> E@{ shape: rounded, label: "Server-2 Backup" }
C --> F@{ shape: rounded, label: "Server-1 Master" }
C --> G@{ shape: rounded, label: "Server-2 Backup" }
D --> H@{ shape: hex, label: "Floating IP1: 192.168.1.66" }
D --> I@{ shape: hex, label: "Floating IP2: 192.168.1.67" }
E --> H
E --> I
F --> J@{ shape: hex, label: "Floating IP3: 192.168.2.12" }
F --> K@{ shape: hex, label: "Floating IP4: 192.168.2.16" }
G --> J
G --> K
classDef primary fill:#e3f2fd,stroke:#1976d2
classDef network fill:#fff3e0,stroke:#ff9800
classDef alert fill:#ffebee,stroke:#f44336
class A,B,C primary
class D,E,F,G network
class H,I,J,K alert
Complete Configuration Details
Server-1 Configuration
Server-2 Configuration
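The two servers' configurations mirror each other, so one way to sketch them is a small template that prints the four instances for either node. This is an illustration only: the interface names, VRIDs, the backup priority of 90, the password, and the mapping of floating IPs to instances are all assumptions; only the instance names, the priority of 100, and the addresses themselves come from the text and diagram.

```shell
#!/bin/sh
# Sketch only: eth0/eth1, VRIDs 51-62, priority 90 and the password are
# placeholders, not values from the original configuration.

# Print one vrrp_instance stanza.
# Args: name interface vrid priority vip
vrrp_instance() {
    cat <<EOF
vrrp_instance $1 {
    state BACKUP            # let priority decide; also required by nopreempt
    interface $2
    virtual_router_id $3
    priority $4
    advert_int 1
    #nopreempt
    authentication {
        auth_type PASS
        auth_pass changeme
    }
    virtual_ipaddress {
        $5
    }
}
EOF
}

# Print all four instances for node 1 (Server-1) or node 2 (Server-2).
render_node() {
    if [ "$1" = 1 ]; then p1=100 p2=90; else p1=90 p2=100; fi
    vrrp_instance EX_1  eth0 51 "$p1" 192.168.1.66
    vrrp_instance EX_2  eth0 52 "$p2" 192.168.1.67
    vrrp_instance INT_1 eth1 61 "$p1" 192.168.2.12
    vrrp_instance INT_2 eth1 62 "$p2" 192.168.2.16
}

render_node 1
```

Running `render_node 2` prints the mirrored file for Server-2. The output is meant to be reviewed and adapted before it goes anywhere near `/etc/keepalived/keepalived.conf`.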
Configuration Key Points
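- Priority Settings: Server-1 has priority 100 in the external network instance EX_1 and the internal network instance INT_1 and is their master node; Server-2 has priority 100 in EX_2 and INT_2 and is their master node. Each server is therefore master for one pair of instances and backup for the other, so both carry traffic in normal operation while still covering for each other.
- Non-Preempt Mode: The configuration carries a `#nopreempt` line. In non-preempt mode a recovered master does not immediately reclaim its virtual IPs, which avoids network flapping. Note that the directive only takes effect once the leading `#` is removed, and Keepalived additionally requires the instance's initial `state` to be `BACKUP`; while the line stays commented out, the default preempt behavior applies.
- Dual Network Architecture: Separately handles external network (EX instances) and internal network (INT instances) traffic, achieving network isolation and independent high availability.
- Authentication Mechanism: Uses simple password authentication so that only routers sharing the password join the group. The password is sent in clear text, so this guards against misconfiguration rather than against an attacker on the same segment.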
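To confirm which node currently holds a floating IP, the address list can be checked on each server. A minimal sketch, assuming the one-line output format of `ip -o -4 addr show` (the helper name is made up for illustration):

```shell
#!/bin/sh
# Sketch: report whether a floating IP appears in `ip -o -4 addr show` output.
# Reads that output on stdin so it can also be tried against captured text.
# Arg: the virtual IP to look for.
holds_vip() {
    awk -v vip="$1" '
        # In one-line mode, field 3 is "inet" and field 4 is address/prefix.
        $3 == "inet" { split($4, a, "/"); if (a[1] == vip) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `ip -o -4 addr show | holds_vip 192.168.1.66 && echo "this node holds 192.168.1.66"` prints the message only on the node that currently owns that address.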
flowchart TD
A@{ shape: rounded, label: "Business System Client" } --> B@{ shape: rounded, label: "External Network" }
B --> C@{ shape: hex, label: "Floating IP 192.168.1.66-69" }
C --> D@{ shape: hex, label: "Keepalived VIP" }
E@{ shape: rounded, label: "Internal Business System" } --> F@{ shape: rounded, label: "Internal Network" }
F --> G@{ shape: hex, label: "Floating IP 192.168.2.12-23" }
G --> H@{ shape: hex, label: "Keepalived VIP" }
D --> I@{ shape: rounded, label: "Server-1/Server-2" }
H --> I
I --> J@{ shape: rounded, label: "Business System Services" }
classDef primary fill:#e3f2fd,stroke:#1976d2
classDef network fill:#fff3e0,stroke:#ff9800
classDef alert fill:#ffebee,stroke:#f44336
class A,E,B,F,I,J primary
class C,G,D,H alert
HAProxy Internal Network Service Load Balancing
TCP Mode Load Balancing
Deployment Architecture
For the two core services (antivirus service and delivery service) of the business system, we deployed HAProxy on Server-1 and Server-2 for load balancing. The advantages of this architecture include:
- Service Availability: A single gateway failure does not affect the client business systems
- Load Distribution: Multiple servers share the service load
- Transparent Failover: Client configurations point at the load balancer and do not need to track changes to individual servers
flowchart TD
A@{ shape: rounded, label: "Client System" } --> B@{ shape: hex, label: "HAProxy Load Balancer" }
B --> C@{ shape: rounded, label: "Server-1" }
B --> D@{ shape: rounded, label: "Server-2" }
B --> E@{ shape: rounded, label: "Server-3" }
B --> F@{ shape: rounded, label: "Server-4" }
C --> G@{ shape: rounded, label: "Antivirus 6600" }
D --> G
E --> G
F --> G
C --> H@{ shape: rounded, label: "Delivery 8025" }
D --> H
E --> H
F --> H
C --> I@{ shape: rounded, label: "Anti-Spam 8070" }
D --> I
classDef primary fill:#e3f2fd,stroke:#1976d2
classDef network fill:#fff3e0,stroke:#ff9800
classDef process fill:#f3e5f5,stroke:#9c27b0
class A,B network
class C,D,E,F primary
class G,H,I process
HAProxy Configuration Details
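As a hedged illustration of what TCP-mode sections for these services might look like, the sketch below prints one `listen` block per service. The ports and the `check inter 10s` setting follow the text and diagram; the bind address, backend IPs, server names, `balance roundrobin`, and the `maxconn` and `weight` values are placeholders chosen for the example.

```shell
#!/bin/sh
# Sketch only: bind address, backend IPs, maxconn and weight are placeholders.

# Print one TCP-mode listen section.
# Args: name port backend-ip...
tcp_listen() {
    name=$1
    port=$2
    shift 2
    printf 'listen %s\n' "$name"
    printf '    bind 192.168.2.12:%s\n' "$port"
    printf '    mode tcp\n'
    printf '    balance roundrobin\n'
    n=1
    for ip in "$@"; do
        printf '    server srv%d %s:%s check inter 10s maxconn 1000 weight 10\n' \
            "$n" "$ip" "$port"
        n=$((n + 1))
    done
}

tcp_listen antivirus 6600 192.168.2.101 192.168.2.102 192.168.2.103 192.168.2.104
tcp_listen delivery  8025 192.168.2.101 192.168.2.102 192.168.2.103 192.168.2.104
tcp_listen antispam  8070 192.168.2.101 192.168.2.102
```

The `global` and `defaults` sections, including the timeouts HAProxy expects, are left out of this sketch. Binding to a floating IP that the node does not currently hold also typically requires `net.ipv4.ip_nonlocal_bind=1`.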
Health Checks and Statistics
Health Check Mechanism
HAProxy provides a comprehensive health check mechanism:
- Connection Check: Enabled via the `check` parameter
- Check Interval: `inter 10s` means checking every 10 seconds
- Maximum Connections: `maxconn` limits the maximum connections per server
- Weight Settings: The `weight` parameter is used to assign traffic weights
Statistics Feature
HAProxy has a built-in web statistics interface that can be enabled through configuration:
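A statistics section could look roughly like the one printed below. The `stats` directives are standard HAProxy keywords, while the port 8404, the URI, and the credentials are placeholders for this sketch.

```shell
#!/bin/sh
# Sketch only: port 8404, the URI and the credentials are placeholders.
# Args: bind-address user password
stats_section() {
    cat <<EOF
listen stats
    bind $1:8404
    mode http
    stats enable
    stats uri /haproxy-stats
    stats refresh 10s
    stats auth $2:$3
EOF
}

stats_section 192.168.2.12 admin CHANGE_ME
```

The statistics page needs `mode http` even when the proxied services run in TCP mode, and it is best bound to an internal address only.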
Status Monitoring
Monitor HAProxy status with the following commands:
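One possible approach, assuming a runtime socket has been enabled with `stats socket` in the `global` section (the socket path used in the usage note is an assumption), is to filter the CSV that the `show stat` command returns. The helper name is illustrative.

```shell
#!/bin/sh
# Sketch: list servers that HAProxy does not report as UP.
# Reads the CSV produced by the runtime command "show stat" on stdin.
# Column positions are looked up from the "# pxname,svname,..." header row.
not_up_servers() {
    awk -F, '
        /^# / {
            sub(/^# /, "")                  # strip the marker; fields re-split
            for (i = 1; i <= NF; i++) col[$i] = i
            px = col["pxname"]; sv = col["svname"]; st = col["status"]
            next
        }
        st && NF >= st && $sv != "FRONTEND" && $sv != "BACKEND" &&
        $st !~ /^(UP|no check)/ {
            print $px "/" $sv, $st
        }
    '
}
```

Typical usage would be `echo "show stat" | socat stdio /run/haproxy/admin.sock | not_up_servers`. Before a reload, `haproxy -c -f /etc/haproxy/haproxy.cfg` checks the configuration file for errors.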
Network Performance Tuning: NIC Soft Interrupt Optimization
Hard Interrupts and Soft Interrupts Principles
What is an Interrupt?
An interrupt is a signal that makes the CPU suspend its current work and run a handler. It can be an asynchronous signal from peripheral hardware (peripheral relative to the CPU and memory) or a synchronous signal raised by software. Issuing such a signal is called an interrupt request (IRQ).
Difference Between Hard Interrupts and Soft Interrupts
Hard Interrupts:
- Asynchronous signals sent by peripheral hardware to the CPU
- Delivered through an interrupt controller
- Handled quickly and triggered directly by hardware
- Maskable interrupts can be blocked by setting the CPU’s interrupt mask bit
Soft Interrupts:
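- Signals raised by software itself rather than by a device
- In the Linux kernel, usually raised by hard interrupt handlers to defer the bulk of the work, for example `NET_RX` processing after a NIC interrupt
- Distinct from software interrupts issued as CPU instructions, the mechanism historically used for system calls
- Do not pass through the interrupt controller, so the CPU’s mask bit does not block them; the kernel itself decides when they run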
Interrupt Processing Flow
flowchart TD
A@{ shape: hex, label: "Hardware Event" } --> B@{ shape: rounded, label: "Hard Interrupt Triggered" }
B --> C@{ shape: rounded, label: "Save CPU Context" }
C --> D@{ shape: rounded, label: "Execute Hard Interrupt Handler" }
D --> E@{ shape: rounded, label: "Trigger Soft Interrupt" }
E --> F@{ shape: rounded, label: "Soft Interrupt Processing" }
F --> G@{ shape: rounded, label: "Restore CPU Context" }
G --> H@{ shape: stadium, label: "Return to Original Program" }
classDef primary fill:#e3f2fd,stroke:#1976d2
classDef process fill:#f3e5f5,stroke:#9c27b0
classDef network fill:#fff3e0,stroke:#ff9800
class A network
class B,C,D,E,F,G primary
class H process
Problem Diagnosis Process
Symptom Identification
The business gateway experienced network packet loss during peak hours, with soft interrupt processing on CPU0 reaching 90% (the `%soft` column in `mpstat`, `si` in `top`), indicating that network receive processing concentrated on a single CPU had become the system bottleneck.
Monitoring Tools Usage
Viewing Interrupt Distribution:
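A small helper can make the distribution easier to read than the raw file. The sketch below totals the per-CPU counts for all IRQ lines that mention a given device; the function name and the `eth0` device name in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch: total the per-CPU interrupt counts for every IRQ line whose
# description contains a device name. Reads /proc/interrupts text on stdin.
# Arg: device name to match, e.g. the NIC name.
irq_per_cpu() {
    awk -v dev="$1" '
        NR == 1 { ncpu = NF; next }        # header row: CPU0 CPU1 ...
        index($0, dev) {
            # Field 1 is the IRQ number; fields 2..ncpu+1 are per-CPU counts.
            for (i = 1; i <= ncpu; i++) sum[i] += $(i + 1)
        }
        END { for (i = 1; i <= ncpu; i++) printf "CPU%d %d\n", i - 1, sum[i] }
    '
}
```

Running `irq_per_cpu eth0 < /proc/interrupts` shows whether one CPU handles nearly all NIC interrupts. `mpstat -P ALL 1` and `grep NET_RX /proc/softirqs` give the matching per-CPU view of soft interrupt load.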
Detailed Analysis Steps:
- Identify the Problem CPU: Use `/proc/interrupts` to see which CPU handles the most interrupts
- Locate the Interrupt Source: Analyze which NIC or device is generating high interrupts
- Analyze Network Traffic: Use tools like `iftop` and `nethogs` to view traffic patterns
- Check Drivers: Confirm whether the NIC driver version supports optimization
RPS/RFS Optimization Solution
RPS (Receive Packet Steering)
RPS distributes the protocol processing of received packets across multiple CPU cores in software, choosing the target CPU from a hash of each packet's flow, so that a single CPU is not overloaded.
Configuration Method:
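RPS is configured per receive queue by writing a hexadecimal CPU bitmask to the queue's `rps_cpus` file in sysfs. A minimal sketch for building that mask follows; the helper name is illustrative and it only covers systems with up to 32 CPUs.

```shell
#!/bin/sh
# Sketch: build the hex CPU bitmask that rps_cpus expects.
# Args: CPU numbers in the range 0-31. Wider masks use the comma-separated
# form described in the kernel documentation and are not handled here.
rps_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

rps_mask 0 1 2 3      # prints f
rps_mask 1 2 3        # prints e (keeps CPU0 out of the RPS set)
```

The result is then written as root, for example `rps_mask 0 1 2 3 > /sys/class/net/eth0/queues/rx-0/rps_cpus`, where `eth0` stands in for the real interface name.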
RFS (Receive Flow Steering)
RFS builds on RPS: instead of choosing a CPU from the flow hash alone, it steers each flow's packets to the CPU where the application consuming that flow is running, which improves cache hit rates.
Configuration Method:
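RFS has two settings: a global flow table size in `rps_sock_flow_entries` and a per-queue `rps_flow_cnt`, which the kernel scaling documentation suggests setting to the global size divided by the number of receive queues. The sketch below only prints the writes it would perform; the interface name and queue count are placeholders, and 32768 is the starting value the kernel documentation suggests.

```shell
#!/bin/sh
# Sketch: print (not execute) the writes that enable RFS.
# Args: interface, number of RX queues, global flow table size.
rfs_plan() {
    iface=$1
    nqueues=$2
    entries=$3
    per_queue=$(( entries / nqueues ))
    echo "echo $entries > /proc/sys/net/core/rps_sock_flow_entries"
    q=0
    while [ "$q" -lt "$nqueues" ]; do
        echo "echo $per_queue > /sys/class/net/$iface/queues/rx-$q/rps_flow_cnt"
        q=$(( q + 1 ))
    done
}

rfs_plan eth0 4 32768
```

Printing the plan first makes it easy to review the values before running them as root.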
Comprehensive Optimization Configuration
Optimization Script Example:
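A combined script might look like the following sketch, which is not the original script. It assumes at most 32 CPUs and a flow table of 32768 entries, and it takes an optional root directory (a hypothetical convenience) so the function can be tried against a scratch tree before it touches the real `/sys` and `/proc`.

```shell
#!/bin/sh
# Sketch of a combined RPS/RFS setup; values and the root argument are
# assumptions made for illustration.
# Args: interface cpu-mask-hex [root-directory]
apply_rps_rfs() {
    iface=$1
    mask=$2
    root=${3:-}
    entries=32768
    qdir="$root/sys/class/net/$iface/queues"

    # Count the RX queues that exist for this interface.
    queues=0
    for q in "$qdir"/rx-*; do
        if [ -d "$q" ]; then queues=$(( queues + 1 )); fi
    done
    if [ "$queues" -eq 0 ]; then
        echo "no RX queues found for $iface" >&2
        return 1
    fi

    # Global RFS flow table, then per-queue RPS mask and RFS share.
    echo "$entries" > "$root/proc/sys/net/core/rps_sock_flow_entries"
    for q in "$qdir"/rx-*; do
        echo "$mask" > "$q/rps_cpus"
        echo $(( entries / queues )) > "$q/rps_flow_cnt"
    done
}
```

On a real host this would be run as root, for example `apply_rps_rfs eth0 f`. The settings do not survive a reboot, so they need to be reapplied at boot time by whatever mechanism the distribution provides.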
Optimization Results Verification
flowchart TD
A@{ shape: rounded, label: "Before Optimization" } --> B@{ shape: rounded, label: "Soft IRQ 90%" }
A --> C@{ shape: rounded, label: "Packet Loss" }
A --> D@{ shape: rounded, label: "Single CPU Overload" }
E@{ shape: rounded, label: "After Optimization" } --> F@{ shape: rounded, label: "Soft IRQ 30%" }
E --> G@{ shape: rounded, label: "Zero Packet Loss" }
E --> H@{ shape: rounded, label: "Multi-CPU Load Balance" }
B --> I@{ shape: rounded, label: "RPS/RFS Optimization" }
C --> I
D --> I
I --> F
I --> G
I --> H
classDef primary fill:#e3f2fd,stroke:#1976d2
classDef alert fill:#ffebee,stroke:#f44336
classDef success fill:#e8f5e9,stroke:#4caf50
classDef process fill:#f3e5f5,stroke:#9c27b0
class A,B,C,D alert
class E,F,G,H success
class I process
Verification Commands:
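One way to check for remaining packet loss is the second column of `/proc/net/softnet_stat`, which counts packets dropped per CPU because the input backlog was full. The helper below is a sketch with an illustrative name; it sums that column across all CPUs.

```shell
#!/bin/sh
# Sketch: sum the "dropped" column of /proc/net/softnet_stat.
# Each row is one CPU and all columns are hexadecimal:
# column 1 = packets processed, column 2 = packets dropped.
softnet_dropped() {
    total=0
    while read -r processed dropped rest; do
        [ -n "$dropped" ] || continue       # skip blank lines
        total=$(( total + 0x$dropped ))
    done
    echo "$total"
}
```

Comparing `softnet_dropped < /proc/net/softnet_stat` before and after a peak period shows whether drops are still accumulating, and `mpstat -P ALL 1` confirms whether soft interrupt load is now spread across CPUs.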
Conclusion
This article has detailed practical solutions for Linux high availability and load balancing, covering the following key points:
1. Keepalived Dual-Machine Hot Standby Architecture
High-availability network architecture achieved through VRRP protocol, with key configuration points including:
- Multi-instance design for separate external and internal network traffic handling
- Priority settings for active-standby load balancing
- Non-preempt mode to avoid network flapping
- Authentication mechanisms for secure communication
2. HAProxy Load Balancing Implementation
High-availability load balancing for core business system services:
- TCP mode load balancing ensures service availability
- Health check mechanisms automatically remove failed nodes
- Statistics features facilitate operations monitoring
- Unified multi-service management simplifies configuration
3. Network Performance Optimization Practices
Solutions for excessive soft interrupt issues:
- RPS technology for multi-CPU distribution of received packets
- RFS technology for optimized network flow processing
- Comprehensive configuration to resolve network bottlenecks
- Real-time monitoring to verify optimization results
4. Operations Recommendations
- Regular Monitoring: Establish a comprehensive monitoring system to detect system bottlenecks in a timely manner
- Capacity Planning: Plan capacity expansion in advance based on business growth forecasts
- Failure Drills: Conduct regular failover drills to ensure high-availability mechanisms are effective
- Documentation Maintenance: Keep configuration documents up to date for troubleshooting and team collaboration
Through the comprehensive application of these technical solutions, a stable, efficient, and scalable Linux network service architecture can be built, providing reliable infrastructure support for business systems.