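AlmaLinux Performance Optimization Guide: From System Tuning to Application Acceleration, Resolving Performance Bottlenecks and Common Problems in Enterprise Deployments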

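Introduction

As a replacement for CentOS, AlmaLinux has become a widely used choice for enterprise Linux deployments. As workloads grow, however, performance bottlenecks start to surface. This article covers kernel tuning, resource management, application acceleration, and monitoring and diagnostics, and offers an end-to-end optimization approach for the performance problems most often seen in enterprise deployments.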

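1. Kernel Parameter Tuning

1.1 File System Optimization

File system performance directly affects I/O efficiency. On AlmaLinux, XFS (the installer default) or ext4 is recommended.

XFS file system optimization example: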

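# Specify tuning parameters when creating the XFS file system (this erases /dev/sdb1)
mkfs.xfs -f -i size=512 -l size=128m,lazy-count=1 /dev/sdb1

# Add performance-oriented options at mount time (noatime already implies nodiratime)
mount -o noatime,logbufs=8,logbsize=256k /dev/sdb1 /data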

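ext4 file system optimization:

# Initialize the inode tables and journal at creation time instead of lazily in the background
mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/sdb1

# Mount options. data=writeback and barrier=0 trade crash safety for speed: after a power
# loss, recently written files may contain stale or corrupt data. Use them only for
# disposable data or behind a battery-backed write cache.
mount -o noatime,data=writeback,barrier=0 /dev/sdb1 /data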

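Kernel parameter adjustments:

# Append the following parameters to /etc/sysctl.conf.
# Keep comments on their own lines; sysctl does not parse trailing comments.
cat >> /etc/sysctl.conf << EOF
# Page cache writeback tuning
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs = 3000
vm.dirty_writeback_centisecs = 500

# Memory management tuning
vm.swappiness = 10
vm.vfs_cache_pressure = 50
# Mode 1 never refuses an allocation; it suits some caches and databases but raises the risk of OOM kills
vm.overcommit_memory = 1
# vm.overcommit_ratio is only consulted when vm.overcommit_memory = 2, so it has no effect in mode 1
vm.overcommit_ratio = 80

# Network tuning
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
EOF

# Apply the configuration
sysctl -p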
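
To confirm that the running kernel actually picked up these values, the file can be compared against /proc/sys. The following is a minimal Python sketch rather than part of the original procedure; the function names parse_sysctl_conf, read_live_value, and find_mismatches are illustrative, and it assumes plain `key = value` lines.

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse sysctl.conf-style text into a {key: value} dict (later lines win)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep:
            # Collapse runs of whitespace so "4096  87380" equals "4096 87380"
            settings[key.strip()] = " ".join(value.split())
    return settings


def read_live_value(key, proc_root="/proc/sys"):
    """Return the running kernel's value for a key, or None if the key is absent."""
    path = Path(proc_root) / key.replace(".", "/")
    try:
        return " ".join(path.read_text().split())
    except OSError:
        return None


def find_mismatches(desired, proc_root="/proc/sys"):
    """Return {key: (wanted, live)} for every key whose live value differs."""
    mismatches = {}
    for key, wanted in desired.items():
        live = read_live_value(key, proc_root)
        if live != wanted:
            mismatches[key] = (wanted, live)
    return mismatches
```

Passing the parsed contents of /etc/sysctl.conf to find_mismatches returns an empty dict when every listed key is live with the requested value; a key reported with None usually means the corresponding kernel module is not loaded.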

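1.2 Process Scheduling Optimization

On heavily loaded servers, choosing an appropriate CPU frequency governor and tuned profile can noticeably improve performance.

CPU frequency governor selection:

# Show the current governor (the cpufreq directory is absent on many virtual machines)
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# Switch every core to performance mode (for compute-intensive workloads).
# A shell redirect cannot target a glob that matches several files, so pipe through tee.
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# Or use tuned for more fine-grained tuning
dnf install tuned -y
systemctl enable --now tuned
tuned-adm profile throughput-performance  # high-throughput workloads
# tuned-adm profile latency-performance  # low-latency workloads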

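Limiting resources with cgroups (example):

# The paths below are for cgroup v1 (the AlmaLinux 8 default). AlmaLinux 9 defaults to
# cgroup v2, where the equivalent is: echo "50000 100000" > /sys/fs/cgroup/myapp/cpu.max
# (provided the cpu controller is enabled in the parent's cgroup.subtree_control).
mkdir -p /sys/fs/cgroup/cpu/myapp
echo 100000 > /sys/fs/cgroup/cpu/myapp/cpu.cfs_period_us
echo 50000 > /sys/fs/cgroup/cpu/myapp/cpu.cfs_quota_us  # 50000/100000 = 50% of one CPU

# Add the process to the cgroup (cgroup v2 uses cgroup.procs instead of tasks)
echo <PID> > /sys/fs/cgroup/cpu/myapp/tasks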
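
The quota is simply the period multiplied by the desired share of one CPU. As a small illustration, the Python sketch below computes the value for both cgroup versions; the helper names cfs_quota_us and cpu_max_line are hypothetical and exist only for this example.

```python
def cfs_quota_us(cpu_percent, period_us=100_000):
    """Quota in microseconds for a limit given as a percentage of one CPU.

    Values above 100 are valid and mean more than one full CPU.
    """
    if cpu_percent <= 0:
        raise ValueError("cpu_percent must be positive")
    return int(round(period_us * cpu_percent / 100))


def cpu_max_line(cpu_percent, period_us=100_000):
    """The 'quota period' string that cgroup v2 expects in cpu.max."""
    return f"{cfs_quota_us(cpu_percent, period_us)} {period_us}"


# Example: the 50% limit used above, and a limit of two and a half CPUs
print(cfs_quota_us(50))
print(cpu_max_line(250))
```

Keeping the period at 100000 µs and changing only the quota makes limits easy to compare across services.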

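2. Memory Management Optimization

2.1 Memory Allocation Strategy

Transparent Huge Pages (THP) configuration:

# Check the THP status
cat /sys/kernel/mm/transparent_hugepage/enabled

# For database workloads, disabling THP is recommended
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

# Make it persistent. On AlmaLinux, /etc/rc.local is a symlink to /etc/rc.d/rc.local,
# and the target must be executable for rc-local.service to run it at boot.
cat >> /etc/rc.d/rc.local << EOF
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
EOF
chmod +x /etc/rc.d/rc.local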

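NUMA optimization:

# Install numactl
dnf install numactl -y

# Show the NUMA topology
numactl --hardware

# Bind a process to a specific NUMA node
numactl --cpunodebind=0 --membind=0 /path/to/application

# For databases such as MySQL, interleave the buffer pool across NUMA nodes.
# innodb_numa_interleave is a MySQL 5.7+ option; if your server (for example MariaDB)
# rejects it, start mysqld under "numactl --interleave=all" instead.
cat >> /etc/my.cnf.d/server.cnf << EOF
[mysqld]
innodb_numa_interleave = ON
EOF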

2.2 Application Memory Management

Java application memory tuning:

# Example JVM options. -XX:+DisableExplicitGC is left out because it turns System.gc()
# into a no-op, which would make -XX:+ExplicitGCInvokesConcurrent meaningless.
java -Xms4g -Xmx4g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=200 \
     -XX:InitiatingHeapOccupancyPercent=35 \
     -XX:+ParallelRefProcEnabled \
     -XX:+ExplicitGCInvokesConcurrent \
     -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/heapdump.hprof \
     -XX:ErrorFile=/tmp/hs_err_pid%p.log \
     -jar myapp.jar

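Python application memory optimization:

# Profile memory usage with memory_profiler (pip install memory_profiler)
from memory_profiler import profile

@profile
def process_large_data():
    # Use a generator instead of a list to reduce memory usage
    def data_generator():
        for i in range(1000000):
            yield i * 2

    # Process the data
    result = sum(data_generator())
    return result

# Use NumPy for numerical work; vectorized operations are often one to two orders of
# magnitude faster than pure-Python loops, depending on the workload
import numpy as np

def numpy_optimized():
    # Create a large array
    arr = np.arange(1000000)
    # Vectorized operation
    result = np.sum(arr * 2)
    return result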

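3. I/O Performance Optimization

3.1 Disk I/O Optimization

I/O scheduler selection:

# List the available schedulers; the active one is shown in brackets
cat /sys/block/sda/queue/scheduler

# For SSDs, use none or mq-deadline
echo none > /sys/block/sda/queue/scheduler

# For HDDs, use mq-deadline or bfq. The legacy deadline and cfq schedulers are not
# available on the multi-queue block layer used by AlmaLinux 8 and 9.
echo mq-deadline > /sys/block/sda/queue/scheduler

# Make it persistent with udev rules
cat >> /etc/udev/rules.d/60-scheduler.rules << EOF
# SSDs use the none scheduler
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
# HDDs use the mq-deadline scheduler
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
EOF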
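
The scheduler file lists every scheduler the kernel offers and marks the active one with brackets. The Python sketch below parses that format and applies the SSD/HDD preference described above; parse_scheduler and recommend_scheduler are illustrative names, and the preference order is an assumption taken from this section rather than a kernel rule.

```python
def parse_scheduler(text):
    """Split the contents of queue/scheduler into (active, available)."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available


def recommend_scheduler(rotational, available):
    """Pick the first preferred scheduler that the kernel offers, or None."""
    preferred = ["mq-deadline", "bfq"] if rotational else ["none", "mq-deadline"]
    for name in preferred:
        if name in available:
            return name
    return None


# Example with a typical line from /sys/block/sda/queue/scheduler
active, available = parse_scheduler("[mq-deadline] kyber bfq none\n")
print(active, recommend_scheduler(rotational=False, available=available))
```

On a real host, the text would come from /sys/block/DEVICE/queue/scheduler and the rotational flag from /sys/block/DEVICE/queue/rotational.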

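RAID configuration optimization:

# Create a RAID 10 array (recommended for databases)
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[b-e]

# Record the array so that it is assembled at boot. mdadm.conf has no OPTIONS keyword;
# a write journal applies only to RAID 4/5/6 and is chosen at creation time with --write-journal.
mdadm --detail --scan >> /etc/mdadm.conf

# Bound the resync/rebuild speed (KiB/s per device); normal read/write behavior is unaffected
echo 1024 > /sys/block/md0/md/sync_speed_min
echo 20000 > /sys/block/md0/md/sync_speed_max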

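3.2 Network I/O Optimization

Network interface tuning:

# Install ethtool
dnf install ethtool -y

# Show NIC information (replace eth0 with the real interface name shown by "ip link")
ethtool eth0

# Tune NIC parameters; check the hardware maximums first with "ethtool -g eth0"
ethtool -G eth0 rx 4096 tx 4096  # ring buffer sizes
ethtool -C eth0 rx-usecs 100 tx-usecs 100  # interrupt coalescing
ethtool -K eth0 gro on gso on tso on  # receive/segmentation offloads

# Persistent configuration for legacy network-scripts (AlmaLinux 8); separate the ethtool
# calls with semicolons. On AlmaLinux 9, set the equivalent ethtool.* properties on the
# NetworkManager connection with nmcli instead.
cat >> /etc/sysconfig/network-scripts/ifcfg-eth0 << EOF
ETHTOOL_OPTS="-G eth0 rx 4096 tx 4096; -C eth0 rx-usecs 100 tx-usecs 100; -K eth0 gro on gso on tso on"
EOF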

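TCP/IP stack tuning:

# Append to /etc/sysctl.conf
cat >> /etc/sysctl.conf << EOF
# TCP buffer sizes (min, default, max in bytes)
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.ipv4.tcp_rmem = 4096 87380 134217728
net.ipv4.tcp_wmem = 4096 65536 134217728

# TCP congestion control (BBR needs the tcp_bbr module and is usually paired with the fq qdisc)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

# Connection tracking (these keys exist only while the nf_conntrack module is loaded)
net.netfilter.nf_conntrack_max = 1000000
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
EOF

sysctl -p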
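
Whether a 128 MiB maximum buffer is appropriate depends on the bandwidth-delay product (BDP) of the paths being served: the buffer must hold at least bandwidth × round-trip time to keep the link full. The Python sketch below works through two cases; the link speeds and RTTs are illustrative assumptions, not measurements.

```python
def bdp_bytes(bandwidth_bits_per_s, rtt_ms):
    """Bandwidth-delay product in bytes: the data in flight needed to fill the link."""
    # bits/s * ms / 1000 gives bits in flight; divide by 8 for bytes
    return bandwidth_bits_per_s * rtt_ms // 8000


TCP_MAX_BUFFER = 134217728  # 128 MiB, the maximum configured above

examples = [
    ("1 Gbit/s link, 20 ms RTT", 1_000_000_000, 20),
    ("10 Gbit/s link, 100 ms RTT", 10_000_000_000, 100),
]
for label, bandwidth, rtt in examples:
    need = bdp_bytes(bandwidth, rtt)
    print(f"{label}: BDP = {need} bytes, fits in max buffer: {need <= TCP_MAX_BUFFER}")
```

For LAN-only services with sub-millisecond RTTs the BDP is far smaller, so much lower maximums are sufficient and use less memory per connection.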

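4. Application-Layer Optimization

4.1 Web Server Optimization (Nginx)

Nginx configuration tuning:

# /etc/nginx/nginx.conf
worker_processes auto;  # one worker per CPU core
worker_rlimit_nofile 65535;  # max open file descriptors per worker process

events {
    worker_connections 65535;  # max simultaneous connections per worker (a proxied request uses two)
    use epoll;  # use the epoll event model
    multi_accept on;  # accept all pending connections at once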
}

http {
    # Buffer sizes
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;
    
    # Timeouts (seconds)
    client_body_timeout 12;
    client_header_timeout 12;
    keepalive_timeout 15;
    send_timeout 10;
    
    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1024;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types
        text/plain
        text/css
        text/xml
        text/javascript
        application/javascript
        application/xml+rss
        application/json;
    
    # Open file cache
    open_file_cache max=10000 inactive=30s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
    
    # Transmission optimization
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    
    # Virtual host configuration
    server {
        listen 80;
        server_name example.com;
        
        # Static file caching
        location ~* \.(jpg|jpeg|png|gif|ico|css|js)$ {
            expires 1y;
            add_header Cache-Control "public, immutable";
        }
        
        # PHP-FPM proxy tuning
        location ~ \.php$ {
            fastcgi_pass unix:/var/run/php-fpm/www.sock;
            fastcgi_index index.php;
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include fastcgi_params;
            
            # Buffer sizes
            fastcgi_buffers 16 16k;
            fastcgi_buffer_size 32k;
            fastcgi_busy_buffers_size 64k;
            fastcgi_temp_file_write_size 64k;
            
            # Timeouts (seconds)
            fastcgi_connect_timeout 300;
            fastcgi_send_timeout 300;
            fastcgi_read_timeout 300;
        }
    }
}
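
The worker settings above bound how many clients the server can hold open at once. As a rough capacity check, the Python sketch below multiplies the two values and halves the result for reverse-proxy use, where each client also consumes an upstream connection. The function names are hypothetical, and the worker count of 8 is an assumed core count.

```python
def max_clients(worker_processes, worker_connections, proxied=False):
    """Upper bound on simultaneous clients for one nginx instance."""
    per_client = 2 if proxied else 1  # a proxied client also holds an upstream connection
    return worker_processes * worker_connections // per_client


def fd_limit_ok(worker_connections, worker_rlimit_nofile):
    """Each connection needs a file descriptor, so the limit must cover them all."""
    return worker_rlimit_nofile >= worker_connections


# Example: 8 workers (assumed core count) with the values from the config above
print(max_clients(8, 65535))
print(max_clients(8, 65535, proxied=True))
print(fd_limit_ok(65535, 65535))
```

Because served files and log files also consume descriptors, it is common to leave headroom by setting worker_rlimit_nofile above worker_connections.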

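4.2 Database Optimization (MySQL/MariaDB)

MySQL configuration tuning:

# /etc/my.cnf.d/server.cnf
[mysqld]
# Memory
# Roughly 70% of RAM on a dedicated database host. The value must be a literal size;
# 11G below assumes a 16 GB server.
innodb_buffer_pool_size = 11G
innodb_buffer_pool_instances = 8  # adjust to the number of CPU cores
innodb_log_file_size = 2G
innodb_log_buffer_size = 64M

# I/O
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2  # faster, but about one second of commits can be lost if the OS crashes
innodb_io_capacity = 2000  # 2000-4000 for SSDs
innodb_io_capacity_max = 4000

# Connections
max_connections = 500
thread_cache_size = 100
table_open_cache = 2000

# Query cache: removed in MySQL 8.0, where this line must be deleted or the server will not start.
# On 5.7 and earlier, keep it disabled and cache at the application layer instead.
query_cache_type = 0

# Slow query log (the directory must exist and be writable by the mysql user)
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2

# Replication (primary/replica)
server_id = 1
log_bin = /var/log/mysql/mysql-bin
binlog_format = ROW
sync_binlog = 1
expire_logs_days = 7  # deprecated in MySQL 8.0 in favor of binlog_expire_logs_seconds

# InnoDB
innodb_file_per_table = 1
innodb_flush_neighbors = 0  # SSD setting
innodb_read_io_threads = 8
innodb_write_io_threads = 8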
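
Since the buffer pool size has to be written as a literal value, it helps to derive it from the machine's memory. The Python sketch below does this from the text of /proc/meminfo; the 70% figure follows the guideline above and assumes a dedicated database host, and the function names are illustrative.

```python
def mem_total_kib(meminfo_text):
    """Extract MemTotal (in KiB) from the contents of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def buffer_pool_setting(total_kib, percent=70):
    """Return a my.cnf line sizing the InnoDB buffer pool as a share of RAM."""
    mib = total_kib * percent // 100 // 1024  # integer math, rounded down to whole MiB
    return f"innodb_buffer_pool_size = {mib}M"


# Example with a sample meminfo excerpt (16,384,000 KiB of RAM)
sample = "MemTotal:       16384000 kB\nMemFree:         1234567 kB\n"
print(buffer_pool_setting(mem_total_kib(sample)))
```

On a host that also runs a web server or application runtime, a smaller percentage leaves room for those processes and for the page cache.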

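PostgreSQL configuration tuning:

# /var/lib/pgsql/data/postgresql.conf
# Memory (values must be literal sizes; the first two assume a 16 GB server)
shared_buffers = 4GB  # about 25% of total RAM
effective_cache_size = 12GB  # about 75% of total RAM
work_mem = 64MB  # per sort or hash operation, so concurrent queries multiply it
maintenance_work_mem = 1GB  # memory for maintenance operations such as VACUUM and CREATE INDEX

# I/O
wal_buffers = 16MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB

# Connections
max_connections = 200
shared_preload_libraries = 'pg_stat_statements'  # takes effect after a restart

# Planner
random_page_cost = 1.1  # SSD setting
effective_io_concurrency = 200  # SSD setting

# Logging
log_min_duration_statement = 1000  # log statements slower than 1 second (value in ms)
log_checkpoints = on
log_connections = on
log_disconnections = on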
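
Because work_mem applies per sort or hash operation, it is worth checking what the settings above could add up to under load. The Python sketch below derives the percentage-based values and estimates memory use if every connection ran one sort at the same time; pg_memory_plan is a hypothetical helper, and real queries can use several work_mem allocations each.

```python
def pg_memory_plan(total_mib, max_connections=200, work_mem_mib=64):
    """Derive PostgreSQL memory settings (MiB) from total RAM.

    The 25% and 75% ratios follow the guideline in the config above.
    """
    return {
        "shared_buffers_mib": total_mib // 4,
        "effective_cache_size_mib": total_mib * 3 // 4,
        # Memory used if every connection runs exactly one sort at once
        "work_mem_all_connections_mib": max_connections * work_mem_mib,
    }


# Example: a server with 16 GiB of RAM and the values from the config above
plan = pg_memory_plan(16 * 1024)
for name, value in plan.items():
    print(f"{name} = {value}")
```

With 200 connections and 64 MB of work_mem, the combined figure approaches the total RAM of a 16 GiB host, which suggests lowering work_mem or placing a connection pooler in front of the database.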

4.3 Application Server Optimization (Tomcat)

Tomcat configuration tuning:

# /etc/tomcat/tomcat.conf
# -XX:+DisableExplicitGC is left out because it would make -XX:+ExplicitGCInvokesConcurrent a no-op
CATALINA_OPTS="-Xms4g -Xmx4g \
               -XX:+UseG1GC \
               -XX:MaxGCPauseMillis=200 \
               -XX:InitiatingHeapOccupancyPercent=35 \
               -XX:+ParallelRefProcEnabled \
               -XX:+ExplicitGCInvokesConcurrent \
               -XX:+HeapDumpOnOutOfMemoryError \
               -XX:HeapDumpPath=/tmp/heapdump.hprof \
               -XX:ErrorFile=/tmp/hs_err_pid%p.log \
               -Djava.awt.headless=true \
               -Djava.net.preferIPv4Stack=true \
               -Djava.security.egd=file:/dev/./urandom"

server.xml tuning: edit the existing Connector element inside the Service element of /etc/tomcat/server.xml rather than appending to the file, because anything placed after the closing Server tag makes the XML invalid. Recent Tomcat versions spell the MIME type attribute compressibleMimeType:

<Connector port="8080" protocol="org.apache.coyote.http11.Http11Nio2Protocol"
           connectionTimeout="20000"
           redirectPort="8443"
           maxThreads="200"
           minSpareThreads="25"
           acceptCount="100"
           enableLookups="false"
           disableUploadTimeout="true"
           compression="on"
           compressionMinSize="2048"
           compressibleMimeType="text/html,text/xml,text/plain,text/css,text/javascript,application/javascript,application/json"
           URIEncoding="UTF-8"
           useBodyEncodingForURI="true"/>

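5. Monitoring and Diagnostics

5.1 System Monitoring Tools

Installing the monitoring tools:

# Install sysstat (provides sar, iostat, mpstat, and pidstat)
dnf install sysstat -y
systemctl enable --now sysstat

# Install performance analysis tools
dnf install perf -y
dnf install bcc-tools -y
dnf install sysdig -y  # not in the default repositories; requires the vendor's repository

# Install Prometheus + Grafana. Package availability depends on the release and the enabled
# repositories; if dnf cannot find prometheus, use the upstream release binaries instead.
dnf install epel-release -y
dnf install prometheus -y
dnf install grafana -y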

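Common monitoring commands:

# CPU
mpstat -P ALL 1  # usage of each CPU core
pidstat -u 1  # per-process CPU usage

# Memory
vmstat 1  # system-wide memory, swap, and run queue
pidstat -r 1  # per-process memory usage

# I/O
iostat -x 1  # detailed disk I/O statistics
iotop  # live per-process I/O

# Network (iftop and nethogs are packaged in EPEL)
iftop  # live network traffic
nethogs  # network traffic by process

# Overall (htop and glances are packaged in EPEL)
top  # live system overview
htop  # enhanced top
glances  # all-in-one monitoring tool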

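5.2 Performance Analysis Tools

Profiling with perf:

# Record CPU samples with call graphs for 30 seconds
perf record -g -p <PID> sleep 30
perf report  # view the report

# Analyze system calls
perf trace -p <PID>  # trace system calls

# Sample cache misses to study memory access patterns
perf record -e cache-misses -g -p <PID> sleep 30
perf report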

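Dynamic tracing with bcc tools:

# Install bcc-tools
dnf install bcc-tools -y

# Count system calls. (The trace tool expects one typed probe per argument, such as
# t:syscalls:sys_enter_openat, so a bare wildcard pattern does not work.)
/usr/share/bcc/tools/syscount

# Show the block I/O latency distribution
/usr/share/bcc/tools/biolatency

# Log TCP session lifespans and byte counts (use tcpconnlat for connection latency)
/usr/share/bcc/tools/tcplife

# Trace process creation
/usr/share/bcc/tools/execsnoop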

5.3 Log Analysis

Log rotation tuning:

# Configure logrotate
cat >> /etc/logrotate.d/myapp << EOF
/var/log/myapp/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 644 root root
    sharedscripts
    postrotate
        systemctl reload myapp.service > /dev/null 2>&1 || true
    endscript
}
EOF

Example log analysis script:

#!/usr/bin/env python3
import re
from collections import Counter
from datetime import datetime

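def analyze_logs(log_file):
    """Scan a log file and count lines that point to performance problems."""
    # Patterns are matched case-insensitively; \b keeps OOM from matching words such as "room"
    error_patterns = {
        'timeout': r'timeout',
        'slow_query': r'slow query|long query',
        'connection_error': r'connection refused|too many connections',
        'memory_error': r'out of memory|\bOOM\b',
        'disk_full': r'disk full|no space left'
    }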
    
    counters = Counter()
    with open(log_file, 'r') as f:
        for line in f:
            for error_type, pattern in error_patterns.items():
                if re.search(pattern, line, re.IGNORECASE):
                    counters[error_type] += 1
    
    # Print the results
    print(f"Log analysis report - {datetime.now()}")
    print("=" * 50)
    for error_type, count in counters.most_common():
        print(f"{error_type}: {count} occurrences")
    
    return counters

if __name__ == "__main__":
    analyze_logs("/var/log/myapp/application.log")

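6. Common Performance Problems and Solutions

6.1 High CPU Usage

Diagnosis:

# 1. Find the processes using the most CPU
top -o %CPU

# 2. Profile CPU hot spots with perf
perf top -p <PID>

# 3. Compare user-mode (%usr) and kernel-mode (%system) CPU time for the process
pidstat -u 1 -p <PID>

# 4. Break the same figures down per thread
pidstat -u 1 -p <PID> -t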

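Solutions:

# 1. Adjust process priority
renice -n -10 -p <PID>  # raise the priority of a critical process

# 2. Cap CPU usage if needed (cpulimit is packaged in EPEL)
cpulimit -l 80 -p <PID>  # limit to 80% of one CPU

# 3. Limit with cgroups (v1 paths; on cgroup v2 write "80000 100000" to cpu.max and the PID to cgroup.procs)
mkdir -p /sys/fs/cgroup/cpu/myapp
echo 80000 > /sys/fs/cgroup/cpu/myapp/cpu.cfs_quota_us
echo 100000 > /sys/fs/cgroup/cpu/myapp/cpu.cfs_period_us
echo <PID> > /sys/fs/cgroup/cpu/myapp/tasks

# 4. Code-level optimization (Python example)
# Use multiple processes instead of threads for CPU-bound work (the GIL serializes threads)
from multiprocessing import Pool

def process_data(data):
    # CPU-bound task
    return sum(x**2 for x in data)

if __name__ == "__main__":
    # Example input: four chunks of 250,000 integers each
    data_chunks = [range(i, i + 250_000) for i in range(0, 1_000_000, 250_000)]
    with Pool(processes=4) as pool:
        results = pool.map(process_data, data_chunks)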

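6.2 Out of Memory (OOM)

Diagnosis:

# 1. Check memory usage
free -h
cat /proc/meminfo

# 2. Look for OOM killer messages
dmesg | grep -i oom
journalctl -k | grep -i oom

# 3. Check per-process memory usage (smem is packaged in EPEL)
smem -k
pmap -x <PID>

# 4. Check for memory leaks
valgrind --tool=memcheck --leak-check=full ./myapp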

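Solutions:

# 1. Adjust kernel parameters.
# Note: vm.overcommit_memory = 1 lets every allocation succeed, which avoids allocation
# failures but does not prevent OOM kills, and vm.overcommit_ratio only applies in mode 2.
cat >> /etc/sysctl.conf << EOF
vm.overcommit_memory = 1
vm.overcommit_ratio = 80
vm.swappiness = 10
EOF
sysctl -p

# 2. Add swap space
dd if=/dev/zero of=/swapfile bs=1M count=8192
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile none swap sw 0 0' >> /etc/fstab

# 3. Application memory tuning (Java example)
# Adjust the JVM options
java -Xms2g -Xmx2g -XX:MaxMetaspaceSize=256m -XX:MaxDirectMemorySize=512m ...

# 4. Memory monitoring script (save it as a separate file)
#!/bin/bash
while true; do
    mem_used=$(free | grep Mem | awk '{print $3/$2 * 100.0}')
    if (( $(echo "$mem_used > 90" | bc -l) )); then
        # Send an alert (requires a working mail command)
        echo "Memory usage above 90%: $mem_used%" | mail -s "Memory alert" admin@example.com
        # Stop the leaking process. leaky_process is a placeholder; pkill -f matches the
        # full command line, so choose the pattern carefully.
        pkill -f "leaky_process"
    fi
    sleep 60
done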
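
The "used" column of free counts memory differently from what applications can still obtain, since page cache is reclaimable. A common alternative is to base the alert on MemAvailable. The Python sketch below shows that calculation on the text of /proc/meminfo; the function names are illustrative, and the 90% threshold mirrors the script above.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo contents into {field: value}; most values are in KiB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def memory_used_percent(text):
    """Share of RAM that is NOT available to applications, based on MemAvailable."""
    info = parse_meminfo(text)
    total = info["MemTotal"]
    return round(100 * (total - info["MemAvailable"]) / total, 1)


def should_alert(text, threshold=90.0):
    """True when used memory exceeds the threshold percentage."""
    return memory_used_percent(text) > threshold


# Example with a small synthetic meminfo excerpt
sample = "MemTotal:        1000 kB\nMemFree:          100 kB\nMemAvailable:     250 kB\n"
print(memory_used_percent(sample), should_alert(sample))
```

On a live system the input would be the contents of /proc/meminfo, read once per check interval.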

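6.3 Disk I/O Bottlenecks

Diagnosis:

# 1. Check disk utilization and latency (%util, await, aqu-sz)
iostat -x 1

# 2. See which processes are doing I/O
iotop

# 3. Show the configured request queue size (the live queue length is the aqu-sz column of iostat)
cat /sys/block/sda/queue/nr_requests

# 4. Trace the I/O path with blktrace (-w 30 stops after 30 seconds)
blktrace -d /dev/sda -o mytrace -w 30
blkparse -i mytrace | head -100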

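Solutions:

# 1. Switch the I/O scheduler
echo mq-deadline > /sys/block/sda/queue/scheduler

# 2. Increase the request queue size
echo 1024 > /sys/block/sda/queue/nr_requests

# 3. Use RAID
mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[b-e]

# 4. Application-level tuning (database example)
# MySQL settings
cat >> /etc/my.cnf.d/server.cnf << EOF
[mysqld]
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2
EOF

# 5. Use an SSD as a cache
# bcache: -B marks the backing device and -C the cache device; this example assumes
# /dev/sdb is the HDD and /dev/sdc is the SSD. RHEL-family kernels typically do not
# include bcache, in which case LVM cache (lvmcache) is the supported alternative.
dnf install bcache-tools -y
make-bcache -B /dev/sdb  # HDD as the backing device
make-bcache -C /dev/sdc  # SSD as the cache device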

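6.4 High Network Latency

Diagnosis:

# 1. Measure network latency
ping -c 10 example.com
mtr example.com

# 2. Check connection state and retransmissions (netstat comes from the net-tools package)
ss -s
netstat -s | grep -i retrans

# 3. Check NIC statistics
ethtool -S eth0

# 4. Capture packets with tcpdump for analysis
tcpdump -i eth0 -w capture.pcap
wireshark capture.pcap  # analyze in the GUI, typically on a workstation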

Solutions:

# 1. Adjust TCP parameters
cat >> /etc/sysctl.conf << EOF
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_mtu_probing = 1
net.ipv4.tcp_congestion_control = bbr
EOF
sysctl -p

# 2. Tune the network interface
ethtool -G eth0 rx 4096 tx 4096
ethtool -C eth0 rx-usecs 100 tx-usecs 100
ethtool -K eth0 gro on gso on tso on

# 3. Shape traffic with tc so that bulk flows cannot starve latency-sensitive ones.
# The default class must exist, so unclassified traffic is sent to 1:10 here.
tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:1 htb rate 1000mbit
tc class add dev eth0 parent 1:1 classid 1:10 htb rate 500mbit ceil 1000mbit

# 4. Application-level tuning (Nginx example)
# Enable keepalive
keepalive_timeout 65;
keepalive_requests 100;

# Enable HTTP/2 (inside a server block that also configures ssl_certificate and ssl_certificate_key)
listen 443 ssl http2;

7. Automation Scripts

7.1 System Performance Optimization Script

#!/bin/bash
# AlmaLinux performance optimization script (run as root)
# Usage: ./optimize_alma.sh [profile]

PROFILE=${1:-"general"}

echo "Starting AlmaLinux performance optimization..."
echo "Selected profile: $PROFILE"

# Function: apply general optimizations
apply_general_optimizations() {
    echo "Applying general optimizations..."

    # Update the system
    dnf update -y

    # Install the required tools
    dnf install -y tuned sysstat perf bcc-tools

    # Enable sysstat data collection
    systemctl enable --now sysstat

    # Kernel parameters: write a drop-in file so that re-running the script does not append duplicates
    cat > /etc/sysctl.d/99-performance.conf << EOF
# General
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5

# Network
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 15
net.ipv4.tcp_congestion_control = bbr

# File handles
fs.file-max = 2097152
fs.nr_open = 2097152
EOF

    sysctl --system

    # File descriptor and process limits (drop-in file, safe to re-run)
    cat > /etc/security/limits.d/99-performance.conf << EOF
* soft nofile 65535
* hard nofile 65535
* soft nproc 65535
* hard nproc 65535
EOF

    # Disable transparent huge pages
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    # I/O scheduler: none for SSDs, mq-deadline for rotational disks
    for queue in /sys/block/sd*/queue; do
        [[ -e $queue/scheduler ]] || continue
        if [[ $(cat "$queue/rotational") == "0" ]]; then
            echo none > "$queue/scheduler"
        else
            echo mq-deadline > "$queue/scheduler"
        fi
    done

    echo "General optimizations complete"
}

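# Function: database optimizations
apply_database_optimizations() {
    echo "Applying database optimizations..."

    # Install the database if it is missing
    if ! rpm -q mariadb-server &>/dev/null; then
        dnf install -y mariadb-server
    fi

    # Size the buffer pool at about 70% of physical RAM (assumes a dedicated database host).
    # The config file needs a literal value, so compute it here.
    local pool_mb=$(( $(awk '/^MemTotal:/ {print $2}' /proc/meminfo) * 70 / 100 / 1024 ))

    # The slow query log directory must exist and belong to the mysql user
    mkdir -p /var/log/mysql && chown mysql:mysql /var/log/mysql

    # Write a drop-in file so that re-running the script does not append duplicates
    cat > /etc/my.cnf.d/99-tuning.cnf << EOF
[mysqld]
# Memory
innodb_buffer_pool_size = ${pool_mb}M
innodb_buffer_pool_instances = 8
innodb_log_file_size = 2G
innodb_log_buffer_size = 64M

# I/O
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

# Connections
max_connections = 500
thread_cache_size = 100
table_open_cache = 2000

# Slow query log
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 2
EOF

    systemctl restart mariadb

    echo "Database optimizations complete"
}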

# Function: web server optimizations
apply_webserver_optimizations() {
    echo "Applying web server optimizations..."

    # Install Nginx if it is missing
    if ! rpm -q nginx &>/dev/null; then
        dnf install -y nginx
    fi

    # Files in conf.d are included inside the http block of /etc/nginx/nginx.conf, so this
    # file may contain only http-level directives. Set worker_processes, worker_rlimit_nofile,
    # and the events block in nginx.conf itself. sendfile, tcp_nopush, and keepalive_timeout
    # are omitted because the stock nginx.conf typically sets them and duplicates are rejected.
    cat > /etc/nginx/conf.d/optimized.conf << EOF
# Buffer sizes
client_body_buffer_size 128k;
client_max_body_size 10m;
client_header_buffer_size 1k;
large_client_header_buffers 4 8k;

# Timeouts
client_body_timeout 12;
client_header_timeout 12;
send_timeout 10;

# Gzip compression
gzip on;
gzip_vary on;
gzip_min_length 1024;
gzip_proxied any;
gzip_comp_level 6;

# Open file cache
open_file_cache max=10000 inactive=30s;
open_file_cache_valid 60s;
open_file_cache_min_uses 2;
open_file_cache_errors on;
EOF

    # Validate the configuration before restarting
    nginx -t && systemctl restart nginx

    echo "Web server optimizations complete"
}

# Function: application optimizations
apply_application_optimizations() {
    echo "Applying application-layer optimizations..."

    # Install Java if needed
    if ! rpm -q java-11-openjdk &>/dev/null; then
        dnf install -y java-11-openjdk
    fi

    # Set the Java environment variables (confirm the JAVA_HOME path with: ls /usr/lib/jvm)
    cat > /etc/profile.d/java.sh << EOF
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
export PATH=\$JAVA_HOME/bin:\$PATH
export CLASSPATH=.:\$JAVA_HOME/lib
EOF

    source /etc/profile.d/java.sh

    echo "Application-layer optimizations complete"
}

# Function: monitoring setup
apply_monitoring_optimizations() {
    echo "Configuring monitoring..."

    # Install Prometheus and Grafana (package availability depends on the enabled repositories)
    dnf install -y epel-release
    dnf install -y prometheus grafana

    # Configure Prometheus. The targets assume that node_exporter (9100), mysqld_exporter
    # (9104), and nginx-prometheus-exporter (9113) are installed separately on their default ports.
    cat > /etc/prometheus/prometheus.yml << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']
  - job_name: 'mysql'
    static_configs:
      - targets: ['localhost:9104']
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']
EOF

    # Start the services
    systemctl enable --now prometheus
    systemctl enable --now grafana-server

    echo "Monitoring setup complete"
}

# Main logic
case $PROFILE in
    "general")
        apply_general_optimizations
        ;;
    "database")
        apply_general_optimizations
        apply_database_optimizations
        ;;
    "web")
        apply_general_optimizations
        apply_webserver_optimizations
        ;;
    "application")
        apply_general_optimizations
        apply_application_optimizations
        ;;
    "monitoring")
        apply_general_optimizations
        apply_monitoring_optimizations
        ;;
    "full")
        apply_general_optimizations
        apply_database_optimizations
        apply_webserver_optimizations
        apply_application_optimizations
        apply_monitoring_optimizations
        ;;
    *)
        echo "Unknown profile: $PROFILE"
        echo "Available profiles: general, database, web, application, monitoring, full"
        exit 1
        ;;
esac

echo "Optimization complete. Reboot the system so that all changes take effect."
echo "Reboot command: reboot"

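7.2 Performance Monitoring Script

#!/usr/bin/env python3
"""
AlmaLinux performance monitoring script.
Monitors system performance metrics in a loop and logs a report each cycle.
Requires the psutil package (dnf install python3-psutil).
"""

import psutil
import time
import json
import smtplib
from email.mime.text import MIMEText
from datetime import datetime
import logging

# Configure logging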
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/performance_monitor.log'),
        logging.StreamHandler()
    ]
)

class PerformanceMonitor:
    def __init__(self, threshold_cpu=80, threshold_memory=85, threshold_disk=90):
        self.threshold_cpu = threshold_cpu
        self.threshold_memory = threshold_memory
        self.threshold_disk = threshold_disk

    def get_cpu_usage(self):
        """Return system-wide CPU utilization (%), sampled over one second."""
        return psutil.cpu_percent(interval=1)

    def get_memory_usage(self):
        """Return memory utilization (%)."""
        mem = psutil.virtual_memory()
        return mem.percent

    def get_disk_usage(self, path='/'):
        """Return disk space utilization (%) for the given mount point."""
        disk = psutil.disk_usage(path)
        return disk.percent

    def get_network_stats(self):
        """Return cumulative network counters since boot."""
        net_io = psutil.net_io_counters()
        return {
            'bytes_sent': net_io.bytes_sent,
            'bytes_recv': net_io.bytes_recv,
            'packets_sent': net_io.packets_sent,
            'packets_recv': net_io.packets_recv
        }

    def get_process_stats(self, top_n=5):
        """Return the top_n processes by CPU usage.

        psutil reports 0.0 the first time a process is sampled, so the
        ranking becomes meaningful from the second monitoring cycle onward.
        """
        processes = []
        for proc in psutil.process_iter(['pid', 'name', 'cpu_percent', 'memory_percent']):
            try:
                processes.append(proc.info)
            except (psutil.NoSuchProcess, psutil.AccessDenied):
                pass

        # Sort by CPU usage; fields psutil could not read are None, so treat them as 0
        processes.sort(key=lambda x: x['cpu_percent'] or 0.0, reverse=True)
        return processes[:top_n]

    def check_thresholds(self, cpu=None, memory=None, disk=None):
        """Compare readings with the thresholds; sample any reading not supplied."""
        cpu = self.get_cpu_usage() if cpu is None else cpu
        memory = self.get_memory_usage() if memory is None else memory
        disk = self.get_disk_usage() if disk is None else disk

        alerts = []
        if cpu > self.threshold_cpu:
            alerts.append(f"CPU usage too high: {cpu}% > {self.threshold_cpu}%")

        if memory > self.threshold_memory:
            alerts.append(f"Memory usage too high: {memory}% > {self.threshold_memory}%")

        if disk > self.threshold_disk:
            alerts.append(f"Disk usage too high: {disk}% > {self.threshold_disk}%")

        return alerts

    def generate_report(self):
        """Build a performance report from a single set of readings."""
        # Sample once so that the reported values and the alerts agree
        cpu = self.get_cpu_usage()
        memory = self.get_memory_usage()
        disk = self.get_disk_usage()

        report = {
            'timestamp': datetime.now().isoformat(),
            'system': {
                'cpu_usage': cpu,
                'memory_usage': memory,
                'disk_usage': disk,
                'network': self.get_network_stats()
            },
            'top_processes': self.get_process_stats(),
            'alerts': self.check_thresholds(cpu, memory, disk)
        }

        return report
    
    def send_alert(self, message):
        """Send an alert email (the SMTP calls are commented out until configured)."""
        try:
            # Mail settings (adjust to your environment)
            sender = 'monitor@example.com'
            receivers = ['admin@example.com']

            msg = MIMEText(message)
            msg['Subject'] = 'Performance alert - AlmaLinux'
            msg['From'] = sender
            msg['To'] = ', '.join(receivers)

            # Send the email (requires a configured SMTP server)
            # server = smtplib.SMTP('smtp.example.com', 587)
            # server.starttls()
            # server.login(sender, 'password')
            # server.send_message(msg)
            # server.quit()

            logging.warning(f"Alert raised: {message}")

        except Exception as e:
            logging.error(f"Failed to send alert: {e}")
    
    def run_monitoring(self, interval=60):
        """Run the monitoring loop until interrupted."""
        logging.info("Starting performance monitoring...")

        while True:
            try:
                # Generate the report
                report = self.generate_report()

                # Log the report
                logging.info(f"Performance report: {json.dumps(report, indent=2)}")

                # Raise alerts if any threshold was exceeded
                if report['alerts']:
                    alert_message = "\n".join(report['alerts'])
                    self.send_alert(alert_message)

                # Wait for the next check
                time.sleep(interval)

            except KeyboardInterrupt:
                logging.info("Monitoring stopped")
                break
            except Exception as e:
                logging.error(f"Monitoring error: {e}")
                time.sleep(interval)

if __name__ == "__main__":
    # Create the monitor
    monitor = PerformanceMonitor(
        threshold_cpu=80,
        threshold_memory=85,
        threshold_disk=90
    )

    # Start monitoring (one check every 60 seconds)
    monitor.run_monitoring(interval=60)

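8. Best Practices

8.1 Optimization Principles

Measure first: before optimizing, measure current performance with monitoring tools
Optimize incrementally: change one parameter at a time and observe the effect
Back up configuration: save the original files before modifying system configuration
Validate in a test environment: verify changes there before applying them to production
Document: record every optimization and its measured effect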
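
The first two principles amount to comparing a baseline measurement with a measurement taken after a single change. A minimal Python sketch of that comparison follows; percent_change is a hypothetical helper, and the latency samples are invented numbers used purely for illustration.

```python
from statistics import median


def percent_change(baseline, after):
    """Relative change of the median, in percent (negative means the metric dropped)."""
    base = median(baseline)
    return round((median(after) - base) / base * 100, 1)


# Illustrative request latencies in milliseconds, before and after one parameter change
baseline_ms = [100, 102, 98]
after_ms = [80, 82, 78]
print(f"median latency change: {percent_change(baseline_ms, after_ms)}%")
```

Using the median rather than the mean keeps a single outlier run from dominating the comparison; collecting more samples per run makes the result more trustworthy.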

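8.2 Recommended Optimization Order

System level: kernel parameters, file systems, I/O schedulers
Resource management: memory, CPU, disk quotas
Application level: web servers, databases, application servers
Monitoring and alerting: build a complete monitoring setup
Automation: script the optimizations and the recovery steps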

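8.3 Performance Optimization Checklist

[ ] Kernel parameters tuned
[ ] File systems configured for performance
[ ] Appropriate I/O scheduler selected
[ ] Memory management parameters adjusted
[ ] Network parameters tuned
[ ] Application server configuration tuned
[ ] Database configuration tuned
[ ] Monitoring deployed
[ ] Alerting in place
[ ] Optimization scripts written
[ ] Backup strategy defined
[ ] Documentation updated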

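9. Frequently Asked Questions

Q1: How do I identify the system's performance bottleneck?

A: Use the following tools together:

top/htop - overall resource usage
iostat - disk I/O
vmstat - memory and processes
netstat/ss - network connections
perf - in-depth analysis of performance hot spots
bcc-tools - dynamic tracing of system behavior

Q2: How does AlmaLinux performance differ from CentOS?

A: AlmaLinux aims for binary compatibility with RHEL, as CentOS Linux did, so performance on the same hardware and kernel version is essentially the same. The main differences are:

Package origin: AlmaLinux and CentOS Linux both derive from RHEL sources, whereas CentOS Stream is developed ahead of RHEL
Update cadence may differ slightly
Community support and documentation may differ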

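Q3: How do I balance performance and security?

A: Follow these practices:

Use firewalld to open only the ports that are needed
Keep SELinux enabled (do not disable it)
Update the system and applications regularly
Use auditd to record critical operations
Avoid lowering the security level for the sake of performance

Q4: What should I watch out for when optimizing a production environment?

Backup: back up all configuration before making changes
Testing: verify the effect of each optimization in a test environment
Gradual rollout: apply optimizations step by step and observe the impact
Monitoring: watch system behavior closely after each change
Rollback plan: prepare a way to roll back quickly
Documentation: record the optimization process and its results in detail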

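10. Summary

AlmaLinux performance optimization is a systems engineering effort that spans the kernel, the operating system, and the applications. This article presented an approach that runs from the basics to more advanced techniques:

Kernel tuning: file systems, process scheduling, memory management
I/O optimization: disk and network I/O strategies
Application-layer optimization: web server, database, and application server configuration
Monitoring and diagnostics: tool usage and troubleshooting methods
Automation scripts: one-step optimization and monitoring scripts
Best practices: optimization principles and a checklist

Performance optimization is not a one-off task but a continuous process. Evaluate system performance regularly and adjust the tuning as the workload changes. A solid monitoring setup helps detect and resolve performance problems early and keeps the system stable and efficient.

With the guidance in this article, you should be able to optimize an AlmaLinux system methodically, resolve performance bottlenecks in enterprise deployments, and improve application response times and system throughput.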