Introduction

As an open-source replacement for RHEL, AlmaLinux plays an increasingly important role in enterprise server environments. Installing the system, however, does not by itself guarantee the best performance. This article takes a deep look at AlmaLinux performance optimization, from low-level system tuning to application-layer practice, and offers comprehensive strategies and hands-on techniques.

1. Baseline System Performance Assessment

1.1 Overview of Performance Monitoring Tools

Before starting any optimization, establish a performance baseline. AlmaLinux provides a set of powerful monitoring tools:

# Install commonly used performance monitoring tools
sudo dnf install sysstat htop iotop perf

# Enable sysstat to collect system performance data
sudo systemctl enable --now sysstat

1.2 Key Performance Metrics

  • CPU usage: top, htop, mpstat
  • Memory usage: free -h, vmstat
  • Disk I/O: iostat, iotop
  • Network traffic: iftop, nload
  • Per-process monitoring: pidstat

1.3 Establishing a Performance Baseline

# Generate a full system performance report
sudo sar -A > system_baseline_$(date +%Y%m%d).txt

# Monitor CPU usage continuously (5-second interval, 10 samples)
mpstat -P ALL 5 10

# Monitor memory usage trends
vmstat 1 10

2. Kernel-Level Tuning

2.1 Kernel Parameter Optimization

2.1.1 Virtual Memory Management

# View the current virtual-memory parameters
sysctl vm.swappiness
sysctl vm.vfs_cache_pressure

# Tune virtual-memory settings (example for a database server)
sudo tee /etc/sysctl.d/99-vm-optimization.conf << EOF
# Reduce the tendency to swap; prefer physical memory
vm.swappiness = 10

# How aggressively the kernel reclaims dentry/inode caches
vm.vfs_cache_pressure = 50

# Raise the maximum number of memory-mapped areas per process
vm.max_map_count = 262144

# Memory overcommit policy (1 = always allow; vm.overcommit_ratio only takes effect with mode 2)
vm.overcommit_memory = 1
vm.overcommit_ratio = 80
EOF

# Apply the settings
sudo sysctl -p /etc/sysctl.d/99-vm-optimization.conf
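
After writing the drop-in it is worth confirming that the running kernel actually reports the new values; a quick check using only the parameters set above:

# Confirm the values the kernel now reports match the drop-in
sysctl vm.swappiness vm.vfs_cache_pressure vm.max_map_count vm.overcommit_memory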

2.1.2 Filesystem Optimization

# View current filesystem mount options
mount | grep -E "(ext4|xfs)"

# Tune an ext4 filesystem (example for a web server)
sudo tune2fs -o journal_data_writeback /dev/sda1

# XFS: view the UUID and set a label (example for a database server)
sudo xfs_admin -u /dev/sda1
sudo xfs_admin -L "database" /dev/sda1

# Append tuned mount options to /etc/fstab (use -a; overwriting fstab would discard existing entries)
sudo tee -a /etc/fstab << EOF
/dev/sda1 /data ext4 defaults,noatime,nodiratime,data=writeback 0 2
/dev/sdb1 /var/lib/mysql xfs defaults,noatime,nodiratime,logbufs=8,logbsize=256k 0 2
EOF

2.2 I/O Scheduler Optimization

# View the current I/O scheduler
cat /sys/block/sda/queue/scheduler

# Set an appropriate scheduler per device type
# SSDs: none or mq-deadline
echo none | sudo tee /sys/block/sda/queue/scheduler

# Rotational disks: mq-deadline (the legacy "deadline" scheduler no longer exists on blk-mq kernels)
echo mq-deadline | sudo tee /sys/block/sdb/queue/scheduler

# Persist the choice with a udev rule
sudo tee /etc/udev/rules.d/60-io-scheduler.rules << EOF
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="none"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="mq-deadline"
EOF
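
To have the new rule take effect without a reboot, reloading udev and re-triggering the block devices is enough; a minimal sketch using stock udevadm commands:

# Reload udev rules and re-trigger block devices so the scheduler rule applies immediately
sudo udevadm control --reload
sudo udevadm trigger --subsystem-match=block

# Verify the active scheduler (shown in brackets)
cat /sys/block/sda/queue/scheduler
cat /sys/block/sdb/queue/scheduler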

2.3 Network Stack Optimization

# Network tuning (example for a high-concurrency web server)
sudo tee /etc/sysctl.d/99-network-optimization.conf << EOF
# Larger accept and SYN backlogs
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# TCP buffer sizes
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# TCP congestion control (requires the tcp_bbr kernel module; see below)
net.ipv4.tcp_congestion_control = bbr

# Reuse TIME_WAIT sockets for outgoing connections
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30

# Connection tracking (only applies once the nf_conntrack module is loaded)
net.netfilter.nf_conntrack_max = 2000000
net.netfilter.nf_conntrack_tcp_timeout_established = 7200
EOF

# Apply the settings
sudo sysctl -p /etc/sysctl.d/99-network-optimization.conf
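
The bbr and conntrack settings only take effect once the corresponding kernel modules are loaded; a minimal sketch for loading them now and at every boot, then confirming BBR is active:

# Load the modules now
sudo modprobe tcp_bbr
sudo modprobe nf_conntrack

# Load them automatically at boot
echo tcp_bbr | sudo tee /etc/modules-load.d/tcp_bbr.conf
echo nf_conntrack | sudo tee /etc/modules-load.d/nf_conntrack.conf

# Confirm BBR is available and selected
sysctl net.ipv4.tcp_available_congestion_control
sysctl net.ipv4.tcp_congestion_control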

3. Storage System Optimization

3.1 Disk I/O Optimization

3.1.1 RAID Configuration

# Check RAID status
cat /proc/mdstat

# Create a RAID 10 array (well suited to databases)
sudo mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[b-e]

# Tune RAID parameters
sudo mdadm --grow /dev/md0 --bitmap=internal
sudo mdadm --grow /dev/md0 --chunk=512

# Persist the RAID configuration
sudo mdadm --detail --scan | sudo tee -a /etc/mdadm.conf
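
While the array resyncs, the rebuild can saturate the disks; the standard md sysctls let you watch progress and bound the resync bandwidth so production I/O is not starved (the values below are only illustrative):

# Watch rebuild/resync progress
watch -n 5 cat /proc/mdstat

# Keep resync between ~10 MB/s and ~50 MB/s per device
echo 10000 | sudo tee /proc/sys/dev/raid/speed_limit_min
echo 50000 | sudo tee /proc/sys/dev/raid/speed_limit_max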

3.1.2 LVM Optimization

# Create a tuned LVM layout
sudo pvcreate /dev/sdb
sudo vgcreate -s 16M vg_data /dev/sdb
sudo lvcreate -L 100G -n lv_mysql vg_data

# Discard/zeroing note: the lvchange --discards and --zero flags apply to thin pools only;
# for plain LVs on SSD-backed PVs, set "issue_discards = 1" in /etc/lvm/lvm.conf and rely on fstrim.timer

# Add an LVM cache (helps frequently accessed small files); in practice the cache pool
# should live on a faster PV added to the same VG
sudo lvcreate --type cache-pool -L 1G -n lv_cache vg_data /dev/sdb
sudo lvconvert --type cache --cachepool vg_data/lv_cache vg_data/lv_mysql

3.2 Filesystem Tuning in Practice

3.2.1 XFS

# Create an XFS filesystem with tuned geometry (su/sw should match the RAID chunk size and data-disk count)
sudo mkfs.xfs -f -d su=128k,sw=4 -l size=128m /dev/vg_data/lv_mysql

# View the UUID and set a label
sudo xfs_admin -u /dev/vg_data/lv_mysql
sudo xfs_admin -L "database" /dev/vg_data/lv_mysql

# Mount with tuned options
sudo mount -o noatime,nodiratime,logbufs=8,logbsize=256k /dev/vg_data/lv_mysql /var/lib/mysql

3.2.2 ext4

# Create an ext4 filesystem (disable lazy init so first-use performance is predictable)
sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vg_data/lv_web

# Tune ext4 parameters (writeback journaling, disable periodic fsck)
sudo tune2fs -o journal_data_writeback /dev/vg_data/lv_web
sudo tune2fs -i 0 -c 0 /dev/vg_data/lv_web

# Mount with tuned options
sudo mount -o noatime,nodiratime,data=writeback /dev/vg_data/lv_web /var/www/html
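
To confirm that the scheduler, RAID and filesystem changes actually pay off, run a before/after benchmark rather than trusting intuition; a minimal fio sketch (fio is in the AlmaLinux repos; point --directory at the filesystem you are testing):

sudo dnf install fio

# 70/30 random read/write test with 4K blocks against the tuned filesystem
fio --name=randrw --directory=/var/www/html --rw=randrw --rwmixread=70 \
    --bs=4k --size=1G --numjobs=4 --iodepth=32 --ioengine=libaio \
    --direct=1 --runtime=60 --time_based --group_reporting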

4. Application-Layer Optimization

4.1 Web Server Optimization (Nginx)

4.1.1 Nginx Configuration

# Tuned settings for /etc/nginx/nginx.conf
user nginx;
worker_processes auto;  # one worker per CPU core
worker_rlimit_nofile 65535;

events {
    worker_connections 65535;
    use epoll;  # high-performance event model on Linux
    multi_accept on;
}

http {
    # Core settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 1000;

    # Buffers
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;

    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1024;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types
        text/plain
        text/css
        text/xml
        text/javascript
        application/javascript
        application/xml+rss
        application/json;

    # Open-file cache
    open_file_cache max=10000 inactive=30s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    # Access logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    'rt=$request_time uct="$upstream_connect_time" '
                    'uht="$upstream_header_time" urt="$upstream_response_time"';

    access_log /var/log/nginx/access.log main buffer=64k flush=5m;

    # Virtual host configs
    include /etc/nginx/conf.d/*.conf;
}

4.1.2 Nginx Process Management

# Raise resource limits for the Nginx service with a systemd drop-in
# (quoted 'EOF' keeps the shell from expanding $MAINPID; Exec* lines must be cleared before being redefined in a drop-in)
sudo mkdir -p /etc/systemd/system/nginx.service.d
sudo tee /etc/systemd/system/nginx.service.d/override.conf << 'EOF'
[Service]
LimitNOFILE=65535
LimitNPROC=65535
ExecStartPre=
ExecStart=
ExecReload=
ExecStartPre=/usr/sbin/nginx -t
ExecStart=/usr/sbin/nginx
ExecReload=/bin/kill -HUP $MAINPID
KillMode=mixed
KillSignal=SIGQUIT
TimeoutStopSec=5
PrivateTmp=true
EOF

# Reload the systemd configuration
sudo systemctl daemon-reload
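
After the reload, it is worth checking that the higher limits really apply to the running service; a quick check with stock systemd tooling:

# Restart nginx and confirm the new limits took effect
sudo systemctl restart nginx
systemctl show nginx --property=LimitNOFILE,LimitNPROC
grep 'open files' /proc/$(systemctl show -p MainPID --value nginx)/limits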

4.2 Database Optimization (MySQL/MariaDB)

4.2.1 MySQL Configuration

# Tuned settings for /etc/my.cnf.d/server.cnf (MariaDB paths shown)
[mysqld]
# Basics
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
pid-file=/run/mariadb/mariadb.pid

# Memory
innodb_buffer_pool_size = 4G  # typically 50-70% of RAM on a dedicated DB host
innodb_buffer_pool_instances = 8  # roughly match the CPU core count
innodb_log_file_size = 512M
innodb_log_buffer_size = 64M

# I/O
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2  # trades a little durability for performance
innodb_file_per_table = 1
innodb_io_capacity = 2000  # 2000-4000 for SSDs
innodb_io_capacity_max = 4000

# Connections
max_connections = 500
thread_cache_size = 50
table_open_cache = 2000
table_definition_cache = 1400

# Query cache (removed in MySQL 8.0; still available in MariaDB)
query_cache_type = 1
query_cache_size = 128M
query_cache_limit = 2M

# Logging
slow_query_log = 1
slow_query_log_file = /var/log/mariadb/slow.log
long_query_time = 2
log_queries_not_using_indexes = 1

# Replication (if running primary/replica)
server_id = 1
log_bin = /var/log/mariadb/mariadb-bin
binlog_format = ROW
expire_logs_days = 7

4.2.2 MySQL Performance Monitoring and Tuning

# Install Percona Toolkit (requires the Percona repository to be set up first)
sudo dnf install percona-toolkit

# Analyze the slow query log
sudo pt-query-digest /var/log/mariadb/slow.log > slow_query_report.txt

# Summarize server status and configuration
sudo pt-mysql-summary --user=root --ask-pass

# Capture InnoDB engine status
mysql -e "SHOW ENGINE INNODB STATUS\G" > innodb_status.txt

4.3 Application Server Optimization (Java/Python)

4.3.1 Java Applications

# JVM tuning (example for a Spring Boot application)
# - Fixed heap (-Xms = -Xmx) avoids resize pauses
# - G1 collector with a 200 ms max-pause target
# - AlwaysPreTouch pre-faults heap pages at startup
# - String deduplication trims duplicate strings from the heap
# - Heap dump on OutOfMemoryError for post-mortem analysis
# Note: on JDK 10+ container memory limits are honored by default (UseContainerSupport);
# the old experimental -XX:+UseCGroupMemoryLimitForHeap flag is only needed on JDK 8/9.
JAVA_OPTS="-Xms4G -Xmx4G \
-XX:+UseG1GC \
-XX:MaxGCPauseMillis=200 \
-XX:+AlwaysPreTouch \
-XX:+UseStringDeduplication \
-XX:MaxMetaspaceSize=256m \
-XX:+HeapDumpOnOutOfMemoryError \
-XX:HeapDumpPath=/var/log/java/heapdump.hprof"

# Manage the Java application with systemd
# (the unquoted heredoc expands $JAVA_OPTS from the shell variable above into Environment=;
#  the escaped \$JAVA_OPTS in ExecStart is expanded by systemd at runtime)
sudo tee /etc/systemd/system/myapp.service << EOF
[Unit]
Description=My Java Application
After=network.target

[Service]
Type=simple
User=myapp
WorkingDirectory=/opt/myapp
Environment="JAVA_OPTS=$JAVA_OPTS"
ExecStart=/usr/bin/java \$JAVA_OPTS -jar /opt/myapp/app.jar
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
SyslogIdentifier=myapp

[Install]
WantedBy=multi-user.target
EOF
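
A short usage sketch to enable the unit and follow the application logs through journald (myapp.service is the unit defined above):

sudo systemctl daemon-reload
sudo systemctl enable --now myapp.service
journalctl -u myapp.service -f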

4.3.2 Python Applications

# Gunicorn tuning (example for a Django/Flask application)
# gunicorn_config.py
import multiprocessing

# Basics
bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # commonly recommended formula
worker_class = "gevent"  # asynchronous workers (requires the gevent package)
worker_connections = 1000  # max concurrent connections per worker
timeout = 30
keepalive = 2

# Performance
preload_app = True  # load the application before forking workers
max_requests = 1000  # recycle a worker after 1000 requests
max_requests_jitter = 50  # random jitter so workers do not restart simultaneously

# Request size limits
limit_request_line = 4096
limit_request_fields = 100
limit_request_field_size = 8190

# Logging
accesslog = "/var/log/gunicorn/access.log"
errorlog = "/var/log/gunicorn/error.log"
loglevel = "info"

# Process management
daemon = False
pidfile = "/var/run/gunicorn.pid"
umask = 0o007
user = "myapp"
group = "myapp"

# Monitoring (StatsD; statsd_host takes host:port)
statsd_host = "localhost:8125"
statsd_prefix = "gunicorn"
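
A hedged usage sketch for starting Gunicorn with this configuration; the WSGI entry point myproject.wsgi:application is a placeholder for the real module path:

# Install the async worker dependency and start Gunicorn with the config above
pip install gunicorn gevent
gunicorn -c gunicorn_config.py myproject.wsgi:application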

5. Optimizing Containerized Applications

5.1 Docker

5.1.1 Docker Daemon Tuning

# Tuned settings for /etc/docker/daemon.json
# (the old overlay2.override_kernel_check storage option is not needed on EL kernels and is rejected by recent Docker releases)
{
  "data-root": "/var/lib/docker",
  "storage-driver": "overlay2",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "default-ulimits": {
    "nofile": {
      "Name": "nofile",
      "Hard": 65535,
      "Soft": 65535
    }
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "max-concurrent-uploads": 5,
  "registry-mirrors": ["https://mirror.gcr.io"]
}

# Restart the Docker daemon
sudo systemctl restart docker
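
After the restart, docker info shows whether the daemon actually picked up the new storage, cgroup and logging settings:

# Verify the storage driver, cgroup driver and logging driver now in use
docker info | grep -Ei 'storage driver|cgroup driver|logging driver'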

5.1.2 Container Resource Limits

# Tuned docker-compose.yml
version: '3.8'
services:
  web:
    image: nginx:alpine
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
        reservations:
          cpus: '1'
          memory: 1G
    ulimits:
      nofile:
        soft: 65535
        hard: 65535
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

5.2 Kubernetes Optimization

5.2.1 Node-Level Configuration

# kubelet configuration tuning
apiVersion: v1
kind: ConfigMap
metadata:
  name: kubelet-config
  namespace: kube-system
data:
  kubelet.config: |
    {
      "kind": "KubeletConfiguration",
      "apiVersion": "kubelet.config.k8s.io/v1beta1",
      "address": "0.0.0.0",
      "port": 10250",
      "readOnlyPort": 0,
      "cgroupDriver": "systemd",
      "clusterDNS": ["10.96.0.10"],
      "clusterDomain": "cluster.local",
      "resolvConf": "/etc/resolv.conf",
      "maxPods": 110,
      "podsPerCore": 10,
      "kubeAPIQPS": 50,
      "kubeAPIBurst": 100,
      "evictionHard": {
        "memory.available": "100Mi",
        "nodefs.available": "10%",
        "nodefs.inodesFree": "5%",
        "imagefs.available": "15%",
        "imagefs.inodesFree": "5%"
      },
      "evictionSoft": {
        "memory.available": "200Mi",
        "nodefs.available": "15%",
        "nodefs.inodesFree": "10%",
        "imagefs.available": "20%",
        "imagefs.inodesFree": "10%"
      },
      "evictionSoftGracePeriod": {
        "memory.available": "2m",
        "nodefs.available": "2m",
        "nodefs.inodesFree": "2m",
        "imagefs.available": "2m",
        "imagefs.inodesFree": "2m"
      },
      "evictionMaxPodGracePeriod": 120,
      "evictionPressureTransitionPeriod": "5m",
      "kubeReserved": {
        "cpu": "200m",
        "memory": "256Mi"
      },
      "systemReserved": {
        "cpu": "100m",
        "memory": "128Mi"
      },
      "enforceNodeAllocatable": ["pods", "system-reserved", "kube-reserved"],
      "featureGates": {
        "RotateKubeletServerCertificate": true
      }
    }

5.2.2 Pod Resource Tuning

# Tuned Deployment manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:alpine
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3
        lifecycle:
          postStart:
            exec:
              command: ["/bin/sh", "-c", "echo 'Container started'"]
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]
      nodeSelector:
        node-type: web
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Exists"
        effect: "NoSchedule"
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - web
              topologyKey: kubernetes.io/hostname
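
A short usage sketch for rolling this manifest out and comparing actual consumption with the declared requests/limits; the file name is an assumption and kubectl top requires metrics-server to be installed:

# Apply the Deployment and watch the rollout
kubectl apply -f web-app-deployment.yaml
kubectl rollout status deployment/web-app

# Compare real usage with the declared requests/limits (needs metrics-server)
kubectl top pods -l app=web
kubectl describe nodes | grep -A5 "Allocated resources"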

6. Monitoring and Alerting Optimization

6.1 The Prometheus + Grafana Monitoring Stack

6.1.1 Prometheus Configuration

# Tuned prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'alma-cluster'
    environment: 'production'

rule_files:
  - "rules/*.yml"

scrape_configs:
  - job_name: 'alma-linux'
    static_configs:
      - targets: ['localhost:9100']
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
        regex: '([^:]+)(?::\d+)?'
        replacement: '${1}'
  
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node1:9100', 'node2:9100', 'node3:9100']
    scrape_interval: 15s
    scrape_timeout: 5s
    metrics_path: /metrics
  
  - job_name: 'mysql'
    static_configs:
      - targets: ['mysql-server:9104']
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics
  
  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-server:9113']
    scrape_interval: 30s
    scrape_timeout: 10s
    metrics_path: /metrics

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093

# Remote write (long-term storage)
remote_write:
  - url: "http://remote-storage:9201/api/v1/write"
    queue_config:
      capacity: 10000
      max_samples_per_send: 1000
      batch_send_deadline: 5s
      max_shards: 200
      min_shards: 1
      max_backoff: 100ms
      min_backoff: 50ms
      retry_on_http_429: true
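
Before reloading Prometheus it pays to validate the file; promtool ships with Prometheus and catches YAML and field errors early:

# Validate the configuration, then reload it without a full restart
promtool check config /etc/prometheus/prometheus.yml
curl -X POST http://localhost:9090/-/reload   # requires --web.enable-lifecycle on the Prometheus command line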

6.1.2 Grafana Dashboard Optimization

{
  "dashboard": {
    "title": "AlmaLinux Performance Dashboard",
    "panels": [
      {
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
            "legendFormat": "{{instance}}"
          }
        ],
        "thresholds": [
          {
            "value": 80,
            "color": "red",
            "op": "gt"
          }
        ],
        "alert": {
          "conditions": [
            {
              "evaluator": {
                "params": [80],
                "type": "gt"
              },
              "operator": {
                "type": "and"
              },
              "query": {
                "params": ["A", "5m", "now"]
              },
              "reducer": {
                "params": [],
                "type": "avg"
              },
              "type": "query"
            }
          ],
          "executionErrorState": "alerting",
          "frequency": "1m",
          "handler": 1,
          "name": "High CPU Usage",
          "noDataState": "no_data",
          "notifications": []
        }
      }
    ]
  }
}

6.2 Log Pipeline Optimization

6.2.1 Log Collection

# Install and configure Fluentd as the log collector
# (fluentd is not in the base AlmaLinux repos; it is usually installed from the upstream
#  fluent-package/td-agent repository or as a Ruby gem)
sudo dnf install fluentd fluent-plugin-elasticsearch

# Configure Fluentd (quoted 'EOF' keeps the shell from touching $ and \ in the regexps)
sudo tee /etc/fluentd/fluent.conf << 'EOF'
<source>
  @type tail
  path /var/log/nginx/access.log
  pos_file /var/log/fluentd/nginx.access.pos
  tag nginx.access
  format nginx
  time_format %d/%b/%Y:%H:%M:%S %z
</source>

<source>
  @type tail
  path /var/log/mariadb/mariadb.log
  pos_file /var/log/fluentd/mariadb.pos
  tag mariadb
  format multiline
  format_firstline /^\d{4}-\d{2}-\d{2}/
  format1 /^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?<level>\w+) (?<message>.*)/
  time_format %Y-%m-%d %H:%M:%S
</source>

<filter nginx.access>
  @type parser
  key_name message
  reserve_data true
  <parse>
    @type regexp
    expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)"(?: (?<request_time>[^ ]*))?)?$/
    time_format %d/%b/%Y:%H:%M:%S %z
  </parse>
</filter>

<match **>
  @type elasticsearch
  host elasticsearch
  port 9200
  logstash_format true
  logstash_prefix fluentd
  logstash_dateformat %Y%m%d
  include_tag_key true
  tag_key @log_name
  flush_interval 1s
  request_timeout 30s
  reload_connections true
  reconnect_on_error true
  reload_on_failure true
  sniffer_class_name Fluent::Plugin::ElasticsearchSimpleSniffer
  enable_ilm true
  ilm_policy_id fluentd-policy
  ilm_policy_overwrite true
</match>
EOF
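
fluentd can parse the configuration without starting the collector, which catches regexp and plugin errors before a restart:

# Validate the configuration only (no inputs/outputs are started)
fluentd --dry-run -c /etc/fluentd/fluent.conf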

7. Case Study: Optimizing a High-Concurrency Web Application

7.1 Scenario

Suppose we run an e-commerce site on AlmaLinux that faces the following challenges (a quick load-test sketch to validate these numbers follows the list):

  • Daily page views: 5 million
  • Peak QPS: 1,000
  • Database: MySQL 8.0
  • Web server: Nginx + PHP-FPM
  • Cache: Redis cluster
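
A minimal load-test sketch to capture a baseline before each optimization round; the URL is a placeholder and ab comes from the httpd-tools package:

# Baseline load test; repeat after every change and compare requests/sec and latency percentiles
sudo dnf install httpd-tools
ab -n 100000 -c 500 -k https://shop.example.com/

# Alternatively, with wrk if it is installed:
# wrk -t8 -c500 -d60s https://shop.example.com/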

7.2 Optimization Steps

7.2.1 System Layer

# 1. Kernel parameters
sudo tee /etc/sysctl.d/99-web-optimization.conf << EOF
# Network
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30

# Memory
vm.swappiness = 10
vm.vfs_cache_pressure = 50
vm.max_map_count = 262144

# File handles
fs.file-max = 2097152
fs.nr_open = 2097152
EOF

sudo sysctl -p /etc/sysctl.d/99-web-optimization.conf

# 2. Per-user resource limits
sudo tee /etc/security/limits.d/99-web-limits.conf << EOF
* soft nofile 65535
* hard nofile 65535
* soft nproc 65535
* hard nproc 65535
webuser soft nofile 65535
webuser hard nofile 65535
EOF

7.2.2 Nginx

# /etc/nginx/nginx.conf
user webuser;
worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 65535;
    use epoll;
    multi_accept on;
}

http {
    # Core settings
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 10000;
    
    # Buffers
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 8k;
    
    # Gzip compression
    gzip on;
    gzip_vary on;
    gzip_min_length 1024;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_types
        text/plain
        text/css
        text/xml
        text/javascript
        application/javascript
        application/xml+rss
        application/json;
    
    # Open-file cache
    open_file_cache max=10000 inactive=30s;
    open_file_cache_valid 60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;
    
    # Access logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    'rt=$request_time uct="$upstream_connect_time" '
                    'uht="$upstream_header_time" urt="$upstream_response_time"';
    
    access_log /var/log/nginx/access.log main buffer=64k flush=5m;
    
    # Virtual host configs
    include /etc/nginx/conf.d/*.conf;
}

7.2.3 PHP-FPM

; /etc/php-fpm.d/www.conf
[www]
user = webuser
group = webuser

; Process manager
pm = dynamic
pm.max_children = 200
pm.start_servers = 50
pm.min_spare_servers = 30
pm.max_spare_servers = 100
pm.max_requests = 1000

; Memory limits
php_admin_value[memory_limit] = 256M
php_admin_value[post_max_size] = 10M
php_admin_value[upload_max_filesize] = 10M

; Performance
php_admin_value[max_execution_time] = 30
php_admin_value[max_input_time] = 60
php_admin_value[realpath_cache_size] = 4096K
php_admin_value[realpath_cache_ttl] = 600

; Error logging
php_admin_value[error_log] = /var/log/php-fpm/error.log
php_admin_value[log_errors] = on

; Idle timeout and status/ping endpoints
pm.process_idle_timeout = 10s
pm.status_path = /status
ping.path = /ping
ping.response = pong
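
PHP-FPM can validate the pool configuration before a reload, which avoids taking the site down with a typo:

# Test the configuration, then reload without dropping active requests
sudo php-fpm -t
sudo systemctl reload php-fpm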

7.2.4 MySQL

# /etc/my.cnf.d/server.cnf
[mysqld]
# Basics
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-error=/var/log/mariadb/mariadb.log
pid-file=/run/mariadb/mariadb.pid

# Memory
innodb_buffer_pool_size = 8G  # ~80% of RAM on a dedicated 10 GB DB host
innodb_buffer_pool_instances = 8
innodb_log_file_size = 2G
innodb_log_buffer_size = 256M

# I/O
innodb_flush_method = O_DIRECT
innodb_flush_log_at_trx_commit = 2
innodb_file_per_table = 1
innodb_io_capacity = 4000
innodb_io_capacity_max = 8000

# Connections
max_connections = 500
thread_cache_size = 100
table_open_cache = 4000
table_definition_cache = 2000

# Query tuning
query_cache_type = 0  # disabled; the query cache was removed entirely in MySQL 8.0 (omit this line there)
join_buffer_size = 256K
sort_buffer_size = 256K
read_buffer_size = 256K
read_rnd_buffer_size = 512K

# Logging
slow_query_log = 1
slow_query_log_file = /var/log/mariadb/slow.log
long_query_time = 1
log_queries_not_using_indexes = 1

# Replication (if using primary/replica)
server_id = 1
log_bin = /var/log/mariadb/mariadb-bin
binlog_format = ROW
expire_logs_days = 7

7.2.5 Redis

# Tuned settings for /etc/redis.conf
# (0.0.0.0 listens on all interfaces; restrict to internal addresses and enable auth in production)
bind 0.0.0.0
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300

# Memory
maxmemory 4gb
maxmemory-policy allkeys-lru
maxmemory-samples 5

# RDB persistence
save 900 1
save 300 10
save 60 10000
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /var/lib/redis

# AOF persistence
appendonly yes
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

# Performance
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
aof-rewrite-incremental-fsync yes
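
The bundled redis-benchmark tool gives a quick read on whether these settings help; a minimal sketch against the local instance:

# Measure basic GET/SET throughput
redis-benchmark -h 127.0.0.1 -p 6379 -c 100 -n 100000 -t get,set -q

# Watch memory usage and evictions while under load
redis-cli info memory | grep -E 'used_memory_human|maxmemory_policy'
redis-cli info stats | grep evicted_keys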

7.2.6 Monitoring and Alerting

# Install the monitoring components
# (grafana and the Prometheus packages come from their own upstream repositories or EPEL,
#  not from the base AlmaLinux repos)
sudo dnf install node_exporter prometheus grafana

# Prometheus alert rules (quoted 'EOF' stops the shell from expanding {{ $labels }} / {{ $value }})
sudo tee /etc/prometheus/rules/web-app.yml << 'EOF'
groups:
- name: web-app
  rules:
  - alert: HighCPUUsage
    expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.instance }}"
      description: "CPU usage is {{ $value }}% for more than 5 minutes"
  
  - alert: HighMemoryUsage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on {{ $labels.instance }}"
      description: "Memory usage is {{ $value }}% for more than 5 minutes"
  
  - alert: HighDiskIO
    expr: rate(node_disk_io_time_seconds_total[5m]) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High disk I/O on {{ $labels.instance }}"
      description: "Disk I/O is {{ $value }}% for more than 5 minutes"
  
  - alert: HighNetworkTraffic
    expr: rate(node_network_receive_bytes_total[5m]) + rate(node_network_transmit_bytes_total[5m]) > 100000000
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High network traffic on {{ $labels.instance }}"
      description: "Network traffic is {{ $value }} bytes/s for more than 5 minutes"
  
  - alert: MySQLSlowQueries
    expr: rate(mysql_global_status_slow_queries[5m]) > 10
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High number of slow queries in MySQL"
      description: "Slow queries rate is {{ $value }}/s for more than 5 minutes"
  
  - alert: NginxHighErrorRate
    expr: rate(nginx_http_requests_total{status=~"5.."}[5m]) / rate(nginx_http_requests_total[5m]) * 100 > 5
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "High error rate in Nginx"
      description: "Error rate is {{ $value }}% for more than 5 minutes"
EOF
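
Before restarting Prometheus, check that the rule file parses and that the node exporter is actually serving metrics (9100 is the exporter's default port):

# Validate the new rule file and confirm the node exporter responds
promtool check rules /etc/prometheus/rules/web-app.yml
curl -s http://localhost:9100/metrics | grep -m1 node_cpu_seconds_total
sudo systemctl restart prometheus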

8. Performance Optimization Best Practices

8.1 Guiding Principles

  1. Measure first: establish a baseline before optimizing and verify the effect afterwards
  2. Change one thing at a time: adjust a single parameter per iteration and observe the result
  3. Understand the workload: choose optimization strategies that match the business characteristics
  4. Monitor continuously: maintain a solid monitoring stack so performance problems surface early
  5. Document everything: record every change and its effect so it can be rolled back and reviewed (see the snapshot sketch after this list)
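
In the spirit of principles 1 and 5, the small sketch below snapshots the current kernel tunables and a few key metrics into a dated directory before any change, so there is always something to diff against and roll back to (the snapshot path is an assumption):

# Snapshot tunables and a quick baseline before touching anything
SNAP_DIR="/var/tmp/perf-baseline-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$SNAP_DIR"
sysctl -a > "$SNAP_DIR/sysctl.txt" 2>/dev/null
cp -a /etc/sysctl.d "$SNAP_DIR/sysctl.d"
sar -A > "$SNAP_DIR/sar.txt"
free -h > "$SNAP_DIR/memory.txt"
iostat -x 1 3 > "$SNAP_DIR/iostat.txt"
echo "Baseline saved to $SNAP_DIR"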

8.2 A Quick Triage Workflow for Common Performance Problems

#!/bin/bash
# Quick performance triage script
echo "=== Quick system performance triage ==="
echo "Time: $(date)"
echo ""

echo "1. CPU usage:"
mpstat -P ALL 1 1 | tail -n +4
echo ""

echo "2. Memory usage:"
free -h
echo ""

echo "3. Disk I/O:"
iostat -x 1 3 | tail -n +4
echo ""

echo "4. Network connections:"
ss -s
echo ""

echo "5. Top 10 processes by CPU:"
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%cpu | head -11
echo ""

echo "6. Top 10 processes by memory:"
ps -eo pid,ppid,cmd,%mem,%cpu --sort=-%mem | head -11
echo ""

echo "7. Load average:"
uptime
echo ""

echo "8. Filesystem usage:"
df -h
echo ""

echo "9. Network traffic (adjust the interface name as needed):"
ifstat -t -i eth0 1 3
echo ""

echo "10. Recent errors in the system log:"
journalctl -p err -n 20 --no-pager

8.3 Performance Optimization Checklist

  • [ ] Kernel parameters tuned for the workload
  • [ ] Filesystem mount options optimized
  • [ ] I/O scheduler set according to storage type
  • [ ] Network stack parameters tuned
  • [ ] Application server configuration optimized
  • [ ] Database configuration optimized
  • [ ] Cache layer optimized
  • [ ] Monitoring stack in place
  • [ ] Alert rules configured
  • [ ] Log pipeline optimized
  • [ ] Backup strategy defined
  • [ ] Performance tests executed
  • [ ] Optimization changes documented

9. Performance Optimization Toolbox

9.1 System-Level Tools

# Profiling and tracing
sudo dnf install perf strace ltrace systemtap

# Network analysis (iftop and nload come from EPEL)
sudo dnf install tcpdump wireshark-cli nload iftop

# Disk analysis
sudo dnf install iotop ioping hdparm

# Memory analysis (smem is in EPEL; pmap is part of procps-ng)
sudo dnf install smem procps-ng valgrind

# Process monitors (atop and glances are in EPEL)
sudo dnf install htop atop glances

9.2 Application-Level Tools

# Web server log analysis (goaccess is in EPEL; ngxtop is installed with pip)
sudo dnf install goaccess
pip install ngxtop

# Database analysis (percona-toolkit requires the Percona repo; mytop is in EPEL)
sudo dnf install percona-toolkit mytop

# Application profiling
pip install py-spy                        # Python
sudo dnf install java-17-openjdk-devel    # the JDK provides jstack and jmap for Java

9.3 Monitoring Tools

# Monitoring stack (these come from the Prometheus/Grafana upstream repositories or EPEL,
# not the base AlmaLinux repos)
sudo dnf install prometheus grafana alertmanager
sudo dnf install node_exporter mysqld_exporter nginx_exporter

# Log analysis (Loki/Promtail from the Grafana repo; the Elastic stack from the Elastic repo)
sudo dnf install loki promtail
sudo dnf install elasticsearch logstash kibana

10. Conclusion

AlmaLinux performance optimization is a systems effort that has to consider the kernel, the operating system, and the applications together. With the strategies and hands-on techniques covered in this article you can:

  1. Establish a performance baseline: benchmark the system with monitoring tools
  2. Optimize at the system level: tune kernel parameters, filesystems, and the network stack
  3. Optimize at the application level: tune the web server, the database, and the application servers
  4. Optimize containerized workloads: tune Docker and Kubernetes configuration
  5. Monitor and alert: build a complete monitoring stack
  6. Improve continuously: use monitoring data to keep refining system performance

Remember that performance optimization is not a one-off task but an ongoing process. Re-evaluate performance regularly and adjust the strategy as the business changes so the system always stays in good shape.

A final reminder: always validate changes in a test environment first, and have backups and a rollback plan ready. Production optimizations call for caution; a careless change can leave the system less stable rather than faster.