引言:为什么MongoDB备份至关重要

在当今数据驱动的世界中,数据库备份是保障业务连续性的关键环节。MongoDB作为最流行的NoSQL数据库之一,虽然具有高可用性和容错能力,但仍然面临着数据丢失的风险。根据行业统计,超过60%的数据丢失事件源于人为错误、软件缺陷或硬件故障,而非自然灾害。因此,制定完善的备份策略对于任何使用MongoDB的企业或个人开发者来说都是必不可少的。

MongoDB的备份不仅仅是简单地复制数据文件,它涉及到对数据一致性、恢复时间目标(RTO)和恢复点目标(RPO)的综合考量。一个优秀的备份策略应该能够在最坏的情况下快速恢复数据,同时在日常运维中保持高效和可靠。

本文将从基础备份方法讲起,逐步深入到高级备份方案,涵盖各种场景下的最佳实践,帮助您构建一个全面的MongoDB备份和恢复体系。

第一部分:MongoDB基础备份方法

1.1 mongodump工具详解

mongodump是MongoDB官方提供的最基本也是最常用的备份工具。它能够以BSON格式导出数据库中的数据,适用于各种存储引擎和部署模式。

1.1.1 mongodump基本用法

# 备份整个数据库
mongodump --host localhost --port 27017 --db mydb --out /backup/mongodb/$(date +%F)

# 备份指定集合
mongodump --host localhost --port 27017 --db mydb --collection users --out /backup/mongodb/$(date +%F)

# 使用认证备份
mongodump --host localhost --port 27017 --username backupuser --password "backupPass" --authenticationDatabase admin --db mydb --out /backup/mongodb/$(date +%F)

# 压缩备份(MongoDB 3.2+)
mongodump --host localhost --port 27017 --db mydb --gzip --out /backup/mongodb/$(date +%F)

1.1.2 mongodump高级选项

# 排除某些集合(MongoDB 3.2+)
mongodump --host localhost --port 27017 --db mydb --excludeCollection=logs --excludeCollection=sessions --out /backup/mongodb/$(date +%F)

# 查询备份(只备份符合条件的文档)
mongodump --host localhost --port 27017 --db mydb --collection users --query '{ "created_at": { "$gte": { "$date": "2023-01-01T00:00:00Z" } } }' --out /backup/mongodb/$(date +%F)

# 备份到标准输出(可用于管道处理)
mongodump --host localhost --port 27017 --db mydb --collection users --out - | gzip > /backup/mongodb/users_$(date +%F).gz

1.1.3 mongodump工作原理

mongodump通过连接到MongoDB服务器,执行查询操作来获取数据。对于副本集,mongodump会从secondary节点读取数据(默认情况下),以减轻primary节点的压力。它的工作流程如下:

  1. 建立与MongoDB的连接
  2. 获取数据库和集合的元数据
  3. 读取数据并导出为BSON格式
  4. 同时导出索引定义(如果指定)

1.1.4 mongodump的优缺点

优点:

  • 跨平台兼容性好
  • 支持选择性备份
  • 支持压缩
  • 操作简单直观

缺点:

  • 备份速度相对较慢(需要遍历所有数据)
  • 恢复时需要重建索引
  • 对于大型数据库,资源消耗较大
  • 不支持增量备份

1.2 文件系统快照备份

文件系统快照是另一种基础备份方法,它利用操作系统的快照功能(如LVM、ZFS、EBS快照等)来备份整个数据目录。

1.2.1 LVM快照备份示例

# 1. 确保MongoDB使用单独的LVM卷
# 假设MongoDB数据目录在 /dev/mongodb_vg/mongodb_lv

# 2. 创建LVM快照(需要先锁库或确保数据一致性)
mongod --dbpath /var/lib/mongodb --shutdown
lvcreate --size 10G --snapshot --name mongodb_snap /dev/mongodb_vg/mongodb_lv

# 3. 重新启动MongoDB
systemctl start mongod

# 4. 挂载快照并复制数据
mkdir /mnt/mongodb_snap
mount /dev/mongodb_vg/mongodb_snap /mnt/mongodb_snap
rsync -av /mnt/mongodb_snap/ /backup/mongodb/$(date +%F)/

# 5. 卸载并删除快照
umount /mnt/mongodb_snap
lvremove /dev/mongodb_vg/mongodb_snap

1.2.2 AWS EBS快照备份示例

# 1. 获取MongoDB数据卷的Volume ID
aws ec2 describe-instances --instance-ids i-1234567890abcdef0 --query "Reservations[0].Instances[0].BlockDeviceMappings[?DeviceName=='/dev/sdf'].Ebs.VolumeId" --output text

# 2. 创建EBS快照
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "MongoDB Backup $(date +%F)"

# 3. 等待快照完成
aws ec2 describe-snapshots --snapshot-ids snap-0123456789abcdef0 --query "Snapshots[0].State"

# 4. 为快照添加标签(便于管理)
aws ec2 create-tags --resources snap-0123456789abcdef0 --tags Key=Name,Value=MongoDB-Backup Key=Date,Value=$(date +%F)

1.2.3 文件系统快照的优缺点

优点:

  • 备份速度极快(几乎瞬间完成)
  • 对生产系统影响最小
  • 恢复速度快
  • 包含所有数据和索引

缺点:

  • 需要特定的文件系统支持
  • 通常需要停机或锁库才能保证一致性
  • 备份粒度较粗(整个数据目录)
  • 跨平台恢复可能有问题

1.3 mongorestore恢复工具

mongorestore是与mongodump配对的恢复工具,用于将BSON格式的备份数据导入到MongoDB中。

1.3.1 mongorestore基本用法

# 恢复整个数据库
mongorestore --host localhost --port 27017 --db mydb /backup/mongodb/2023-10-01/mydb/

# 恢复指定集合
mongorestore --host localhost --port 27017 --db mydb --collection users /backup/mongodb/2023-10-01/mydb/users.bson

# 使用认证恢复
mongorestore --host localhost --port 27017 --username restoreuser --password "restorePass" --authenticationDatabase admin --db mydb /backup/mongodb/2023-10-01/mydb/

# 恢复时删除原有数据(谨慎使用)
mongorestore --host localhost --port 27017 --db mydb --drop /backup/mongodb/2023-10-01/mydb/

# 恢复压缩的备份
mongorestore --host localhost --port 27017 --db mydb --gzip /backup/mongodb/2023-10-01/mydb/

1.3.2 mongorestore高级选项

# 恢复时重命名数据库
mongorestore --host localhost --port 27017 --db newdb --nsFrom 'mydb.*' --nsTo 'newdb.*' /backup/mongodb/2023-10-01/mydb/

# 并行恢复(MongoDB 3.4+)
mongorestore --host localhost --port 27017 --db mydb --numInsertionWorkersPerCollection=4 /backup/mongodb/2023-10-01/mydb/

# 恢复索引(默认会恢复)
mongorestore --host localhost --port 27017 --db mydb --noIndexRestore /backup/mongodb/2023-10-01/mydb/  # 跳过索引恢复,后续手动创建

1.3.3 恢复过程中的注意事项

  1. 版本兼容性:通常建议使用与备份时相同或更高版本的mongorestore
  2. 存储空间:恢复时需要足够的磁盘空间
  3. 索引重建:恢复后索引需要重建,可能耗时较长
  4. 数据一致性:对于副本集,建议在secondary节点恢复然后同步
  5. 权限问题:确保目标数据库有足够的权限

第二部分:MongoDB高级备份策略

2.1 副本集备份策略

MongoDB副本集提供了天然的备份优势,可以从secondary节点进行备份,避免影响primary节点的性能。

2.1.1 从secondary节点备份

# 1. 选择一个延迟secondary节点(如果有)
# 或者临时将一个secondary节点设为隐藏节点

# 2. 从secondary节点执行mongodump
mongodump --host secondary_host --port 27017 --db mydb --out /backup/mongodb/$(date +%F)

# 3. 如果secondary节点没有延迟,可以临时停止同步(谨慎使用)
rs.secondaryOk()  # 在mongo shell中允许从secondary读取
# 然后执行备份

2.1.2 使用延迟节点进行备份

// 在MongoDB shell中配置延迟secondary节点
cfg = rs.conf()
cfg.members[2].priority = 0  // 降低优先级
cfg.members[2].hidden = true // 设为隐藏节点
cfg.members[2].slaveDelay = 3600 // 延迟1小时
rs.reconfig(cfg)

// 然后从这个延迟节点备份
// 这样即使发生误删除操作,也有1小时的时间窗口来恢复

2.1.3 副本集备份脚本示例

#!/bin/bash
# MongoDB副本集备份脚本

# 配置
BACKUP_DIR="/backup/mongodb"
DATE=$(date +%F_%H%M)
REPLSET="rs0"
PRIMARY="mongodb-primary.example.com:27017"
SECONDARY="mongodb-secondary.example.com:27017"
DB_NAME="mydb"
USER="backupuser"
PASS="backuppass"

# 创建备份目录
mkdir -p $BACKUP_DIR/$DATE

# 检查哪个节点是secondary
PRIMARY_NODE=$(mongo --quiet --eval "db.isMaster().primary" $PRIMARY)
SECONDARY_NODE=$(mongo --quiet --eval "rs.isMaster().ismaster ? 'none' : rs.isMaster().primary" $SECONDARY)

# 从secondary节点备份
if [ "$SECONDARY_NODE" != "none" ]; then
    echo "从secondary节点备份: $SECONDARY"
    mongodump --host $SECONDARY --username $USER --password $PASS --authenticationDatabase admin --db $DB_NAME --out $BACKUP_DIR/$DATE
else
    echo "没有可用的secondary节点,从primary备份"
    mongodump --host $PRIMARY --username $USER --password $PASS --authenticationDatabase admin --db $DB_NAME --out $BACKUP_DIR/$DATE
fi

# 压缩备份
tar -czf $BACKUP_DIR/mongodb_backup_$DATE.tar.gz -C $BACKUP_DIR $DATE
rm -rf $BACKUP_DIR/$DATE

# 清理旧备份(保留最近7天)
find $BACKUP_DIR -name "mongodb_backup_*.tar.gz" -mtime +7 -delete

echo "备份完成: $BACKUP_DIR/mongodb_backup_$DATE.tar.gz"

2.2 分片集群备份策略

分片集群的备份需要协调多个分片和配置服务器,确保所有组件的数据一致性。

2.2.1 分片集群备份步骤

# 1. 锁定所有分片的primary节点(或使用均衡器状态)
# 2. 从每个分片的secondary节点备份
# 3. 备份配置服务器
# 4. 记录balancer状态
# 5. 解锁节点

# 具体步骤示例:

# 步骤1:停止均衡器
mongos --eval "sh.stopBalancer()"

# 步骤2:等待正在进行的迁移完成
mongos --eval "while(db.adminCommand({balancerStatus:1}).inFlightMigrationCount > 0) { sleep(1000) }"

# 步骤3:备份每个分片
for shard in shard1 shard2 shard3; do
    mongodump --host ${shard}-secondary --port 27017 --db mydb --out /backup/mongodb/shards/$shard_$(date +%F)
done

# 步骤4:备份配置服务器
mongodump --host config-secondary --port 27017 --db config --out /backup/mongodb/config_$(date +%F)

# 步骤5:备份mongos元数据(如果需要)
# 通常只需要备份config数据库即可

# 步骤6:重新启动均衡器
mongos --eval "sh.startBalancer()"

2.2.2 分片集群恢复步骤

# 1. 停止所有mongos进程
# 2. 停止所有分片和配置服务器
# 3. 清空所有数据目录
# 4. 恢复每个分片
for shard in shard1 shard2 shard3; do
    mongorestore --host ${shard}-primary --port 27017 --db mydb /backup/mongodb/shards/${shard}_$(date +%F)/mydb
done

# 5. 恢复配置服务器
mongorestore --host config-primary --port 27017 --db config /backup/mongodb/config_$(date +%F)/config

# 6. 重新启动所有服务
# 7. 启动mongos
# 8. 启动均衡器

2.3 增量备份方案

MongoDB本身不提供原生的增量备份功能,但可以通过一些技巧实现。

2.3.1 基于oplog的增量备份

#!/bin/bash
# 基于oplog的增量备份

# 配置
BACKUP_DIR="/backup/mongodb/incremental"
OPLOG_SIZE=50  # GB
LAST_BACKUP_FILE="$BACKUP_DIR/last_backup.txt"

# 获取上次备份的oplog时间戳
if [ -f "$LAST_BACKUP_FILE" ]; then
    LAST_TS=$(cat $LAST_BACKUP_FILE)
else
    # 第一次全量备份
    echo "执行全量备份..."
    mongodump --host mongodb-primary --port 27017 --db mydb --out $BACKUP_DIR/full_$(date +%F)
    # 记录当前oplog时间戳
    mongo --quiet --eval "db.adminCommand({getLog:'oplog'}).logs[0].lastWriteOpDate.t" > $LAST_TS
    exit 0
fi

# 备份从上次到现在的oplog
mongodump --host mongodb-primary --port 27017 --db local --collection oplog.rs --query '{ "ts": { "$gt": Timestamp($LAST_TS, 0) } }' --out $BACKUP_DIR/oplog_$(date +%F_%H%M)

# 更新时间戳
mongo --quiet --eval "db.adminCommand({getLog:'oplog'}).logs[0].lastWriteOpDate.t" > $LAST_BACKUP_FILE

2.3.2 使用WiredTiger快照实现增量备份

#!/bin/bash
# WiredTiger增量备份(MongoDB 3.2+)

BACKUP_DIR="/backup/mongodb/wt_incremental"
FULL_BACKUP_DIR="$BACKUP_DIR/full"
INC_BACKUP_DIR="$BACKUP_DIR/incremental"

# 检查是否需要全量备份
if [ ! -d "$FULL_BACKUP_DIR" ]; then
    echo "执行全量备份..."
    mkdir -p $FULL_BACKUP_DIR
    
    # 创建WiredTiger快照
    mongod --dbpath /var/lib/mongodb --shutdown
    cp -r /var/lib/mongodb/WiredTiger* $FULL_BACKUP_DIR/
    cp -r /var/lib/mongodb/diagnostic.data $FULL_BACKUP_DIR/
    
    # 启动MongoDB
    systemctl start mongod
    
    # 记录快照ID
    mongo --quiet --eval "db.adminCommand({createBackup:1}).backupId" > $FULL_BACKUP_DIR/backup_id.txt
else
    echo "执行增量备份..."
    mkdir -p $INC_BACKUP_DIR/$(date +%F_%H%M)
    
    # WiredTiger增量备份需要特定的配置
    # 这里使用文件系统级别的增量备份作为示例
    rsync -av --link-dest=$FULL_BACKUP_DIR /var/lib/mongodb/ $INC_BACKUP_DIR/$(date +%F_%H%M)/
fi

2.4 云原生备份方案

2.4.1 MongoDB Atlas备份

MongoDB Atlas提供了自动化的备份服务:

// Atlas API 创建快照
const axios = require('axios');

const clusterName = 'myCluster';
const groupId = 'your-group-id';
const snapshotName = `manual-snapshot-${new Date().toISOString()}`;

axios.post(
    `https://cloud.mongodb.com/api/atlas/v1.0/groups/${groupId}/clusters/${clusterName}/backup/snapshots`,
    {
        "snapshotType": "manual",
        "retentionDays": 7,
        "description": `Manual backup for ${snapshotName}`
    },
    {
        auth: {
            username: 'your-api-key',
            password: 'your-api-secret'
        }
    }
).then(response => {
    console.log('Snapshot created:', response.data.id);
}).catch(error => {
    console.error('Error:', error.response.data);
});

2.4.2 AWS DocumentDB备份

# AWS DocumentDB自动备份
aws docdb create-db-cluster-snapshot \
    --db-cluster-identifier my-docdb-cluster \
    --db-cluster-snapshot-identifier docdb-snapshot-$(date +%F)

# 从快照恢复
aws docdb restore-db-cluster-from-snapshot \
    --db-cluster-identifier my-docdb-cluster-restored \
    --db-cluster-snapshot-identifier docdb-snapshot-2023-10-01 \
    --db-instance-class db.r5.large \
    --vpc-security-group-ids sg-0123456789abcdef0

第三部分:备份自动化与监控

3.1 自动化备份脚本

3.1.1 完整的自动化备份脚本

#!/bin/bash
# MongoDB自动化备份脚本(支持副本集和分片集群)

set -e

# 配置
BACKUP_BASE="/backup/mongodb"
DATE=$(date +%F_%H%M)
LOG_FILE="/var/log/mongodb_backup.log"
RETENTION_DAYS=7
COMPRESS=true
NOTIFY_EMAIL="admin@example.com"

# MongoDB连接配置
MONGO_HOST="mongodb-primary.example.com"
MONGO_PORT="27017"
MONGO_USER="backupuser"
MONGO_PASS="backuppass"
AUTH_DB="admin"

# 日志函数
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a $LOG_FILE
}

# 错误处理
error_exit() {
    log "ERROR: $1"
    echo "MongoDB backup failed on $(hostname)" | mail -s "MongoDB Backup Failed" $NOTIFY_EMAIL
    exit 1
}

# 检查MongoDB连接
check_mongodb() {
    if ! mongo --host $MONGO_HOST --port $MONGO_PORT --username $MONGO_USER --password $MONGO_PASS --authenticationDatabase $AUTH_DB --eval "db.adminCommand('ping')" > /dev/null 2>&1; then
        error_exit "Cannot connect to MongoDB"
    fi
}

# 检查磁盘空间
check_disk_space() {
    local required_gb=10
    local available_gb=$(df -BG $BACKUP_BASE | awk 'NR==2 {print $4}' | sed 's/G//')
    if [ $available_gb -lt $required_gb ]; then
        error_exit "Insufficient disk space. Required: ${required_gb}GB, Available: ${available_gb}GB"
    fi
}

# 执行备份
perform_backup() {
    local backup_dir="$BACKUP_BASE/$DATE"
    mkdir -p $backup_dir
    
    log "Starting backup to $backup_dir"
    
    # 检查是否为副本集
    is_replset=$(mongo --quiet --host $MONGO_HOST --port $MONGO_PORT --username $MONGO_USER --password $MONGO_PASS --authenticationDatabase $AUTH_DB --eval "db.isMaster().replSet" 2>/dev/null)
    
    if [ -n "$is_replset" ]; then
        log "Detected replica set: $is_replset"
        # 从secondary备份
        SECONDARY=$(mongo --quiet --host $MONGO_HOST --port $MONGO_PORT --username $MONGO_USER --password $MONGO_PASS --authenticationDatabase $AUTH_DB --eval "rs.isMaster().ismaster ? 'none' : rs.isMaster().primary" 2>/dev/null)
        if [ "$SECONDARY" != "none" ]; then
            MONGO_HOST=$SECONDARY
            log "Using secondary: $SECONDARY"
        fi
    fi
    
    # 执行mongodump
    if mongodump --host $MONGO_HOST --port $MONGO_PORT --username $MONGO_USER --password $MONGO_PASS --authenticationDatabase $AUTH_DB --out $backup_dir 2>> $LOG_FILE; then
        log "Backup completed successfully"
    else
        error_exit "mongodump failed"
    fi
    
    # 压缩
    if [ "$COMPRESS" = true ]; then
        log "Compressing backup..."
        tar -czf $BACKUP_BASE/mongodb_backup_$DATE.tar.gz -C $BACKUP_BASE $DATE
        rm -rf $backup_dir
        log "Compression completed"
    fi
}

# 清理旧备份
cleanup_old_backups() {
    log "Cleaning up backups older than $RETENTION_DAYS days"
    find $BACKUP_BASE -name "mongodb_backup_*.tar.gz" -mtime +$RETENTION_DAYS -delete
    log "Cleanup completed"
}

# 主执行流程
main() {
    log "=== MongoDB Backup Started ==="
    
    check_mongodb
    check_disk_space
    perform_backup
    cleanup_old_backups
    
    log "=== MongoDB Backup Completed Successfully ==="
    
    # 发送成功通知
    echo "MongoDB backup completed successfully on $(hostname) at $(date)" | mail -s "MongoDB Backup Success" $NOTIFY_EMAIL
}

# 运行主函数
main "$@"

3.1.2 使用systemd定时任务

# /etc/systemd/system/mongodb-backup.service
[Unit]
Description=MongoDB Backup Service
After=mongod.service

[Service]
Type=oneshot
User=backupuser
Group=backupuser
ExecStart=/usr/local/bin/mongodb_backup.sh

# /etc/systemd/system/mongodb-backup.timer
[Unit]
Description=MongoDB Backup Timer
Requires=mongodb-backup.service

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target

# 启用定时任务
systemctl enable mongodb-backup.timer
systemctl start mongodb-backup.timer

3.2 备份监控与告警

3.2.1 监控脚本示例

#!/bin/bash
# MongoDB备份监控脚本

BACKUP_DIR="/backup/mongodb"
LOG_FILE="/var/log/mongodb_backup.log"
LAST_BACKUP_FILE="$BACKUP_DIR/last_backup.txt"
MAX_AGE_HOURS=26  # 允许最多26小时(考虑到备份可能在不同时间运行)

# 检查最后备份时间
if [ ! -f "$LAST_BACKUP_FILE" ]; then
    echo "CRITICAL: No backup found"
    exit 2
fi

LAST_BACKUP=$(cat $LAST_BACKUP_FILE)
CURRENT_TIME=$(date +%s)
LAST_BACKUP_TIME=$(date -d "$LAST_BACKUP" +%s 2>/dev/null || echo 0)

if [ $LAST_BACKUP_TIME -eq 0 ]; then
    echo "CRITICAL: Invalid backup timestamp"
    exit 2
fi

AGE_HOURS=$(( (CURRENT_TIME - LAST_BACKUP_TIME) / 3600 ))

if [ $AGE_HOURS -gt $MAX_AGE_HOURS ]; then
    echo "CRITICAL: Backup is $AGE_HOURS hours old (max: $MAX_AGE_HOURS)"
    exit 2
fi

# 检查备份文件完整性
LATEST_BACKUP=$(ls -t $BACKUP_DIR/mongodb_backup_*.tar.gz 2>/dev/null | head -1)
if [ -z "$LATEST_BACKUP" ]; then
    echo "CRITICAL: No backup files found"
    exit 2
fi

# 验证备份文件
if ! tar -tzf $LATEST_BACKUP > /dev/null 2>&1; then
    echo "CRITICAL: Backup file is corrupt"
    exit 2
fi

echo "OK: Backup is $AGE_HOURS hours old"
exit 0

3.2.2 集成到监控系统(如Prometheus)

# Python脚本导出备份指标到Prometheus
#!/usr/bin/env python3
import time
import subprocess
import os
from prometheus_client import start_http_server, Gauge, Counter

# 定义指标
backup_age = Gauge('mongodb_backup_age_seconds', 'Age of the last backup')
backup_size = Gauge('mongodb_backup_size_bytes', 'Size of the last backup')
backup_last_success = Gauge('mongodb_backup_last_success_timestamp', 'Timestamp of last successful backup')
backup_failures = Counter('mongodb_backup_failures_total', 'Total number of backup failures')

def collect_metrics():
    backup_dir = "/backup/mongodb"
    last_backup_file = os.path.join(backup_dir, "last_backup.txt")
    
    # 读取最后备份时间
    if os.path.exists(last_backup_file):
        with open(last_backup_file, 'r') as f:
            last_backup_time = float(f.read().strip())
            current_time = time.time()
            backup_age.set(current_time - last_backup_time)
            backup_last_success.set(last_backup_time)
    
    # 获取最新备份文件大小
    latest_backup = subprocess.run(
        ["ls", "-t", f"{backup_dir}/mongodb_backup_*.tar.gz"],
        capture_output=True, text=True
    ).stdout.strip().split('\n')[0]
    
    if latest_backup:
        size = os.path.getsize(latest_backup)
        backup_size.set(size)

if __name__ == '__main__':
    start_http_server(9101)
    while True:
        try:
            collect_metrics()
        except Exception as e:
            backup_failures.inc()
            print(f"Error collecting metrics: {e}")
        time.sleep(60)

第四部分:备份验证与恢复测试

4.1 备份验证策略

4.1.1 自动化备份验证脚本

#!/bin/bash
# MongoDB备份验证脚本

BACKUP_DIR="/backup/mongodb"
TEST_DB="backup_test_$(date +%F)"
TEST_HOST="localhost"
TEST_PORT="27018"  # 使用不同端口避免冲突

# 1. 选择最新备份
LATEST_BACKUP=$(ls -t $BACKUP_DIR/mongodb_backup_*.tar.gz 2>/dev/null | head -1)
if [ -z "$LATEST_BACKUP" ]; then
    echo "No backup found"
    exit 1
fi

echo "Verifying backup: $LATEST_BACKUP"

# 2. 创建临时MongoDB实例用于测试
TEMP_DBPATH="/tmp/mongodb_test_$$"
mkdir -p $TEMP_DBPATH

# 3. 启动临时MongoDB实例
mongod --dbpath $TEMP_DBPATH --port $TEST_PORT --fork --logpath $TEMP_DBPATH/mongod.log

# 4. 恢复备份到临时实例
tar -xzf $LATEST_BACKUP -C /tmp/
BACKUP_CONTENT=$(tar -tzf $LATEST_BACKUP | head -1 | cut -d/ -f1)
mongorestore --host localhost --port $TEST_PORT /tmp/$BACKUP_CONTENT/

# 5. 验证数据完整性
mongo --host localhost --port $TEST_PORT --eval "
db.adminCommand({listDatabases:1}).databases.forEach(function(db) {
    if (db.name != 'admin' && db.name != 'local' && db.name != 'config') {
        var stats = db.getSiblingDB(db.name).stats();
        print('Database: ' + db.name + ', Collections: ' + stats.collections + ', Objects: ' + stats.objects);
    }
})
"

# 6. 验证关键集合
KEY_COLLECTIONS=("users" "orders" "products")
for coll in "${KEY_COLLECTIONS[@]}"; do
    COUNT=$(mongo --quiet --host localhost --port $TEST_PORT --eval "db.getSiblingDB('$TEST_DB').$coll.count()")
    if [ "$COUNT" -gt 0 ]; then
        echo "✓ Collection $coll has $COUNT documents"
    else
        echo "✗ Collection $coll is empty or missing"
    fi
done

# 7. 清理
mongod --dbpath $TEMP_DBPATH --port $TEST_PORT --shutdown
rm -rf $TEMP_DBPATH /tmp/$BACKUP_CONTENT

echo "Backup verification completed"

4.1.2 备份完整性检查

#!/bin/bash
# 检查备份文件完整性

BACKUP_DIR="/backup/mongodb"
LOG_FILE="/var/log/mongodb_backup_check.log"

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a $LOG_FILE
}

check_backup() {
    local backup_file=$1
    log "Checking $backup_file"
    
    # 检查文件是否存在
    if [ ! -f "$backup_file" ]; then
        log "ERROR: File not found: $backup_file"
        return 1
    fi
    
    # 检查文件大小(最小1MB)
    local size=$(stat -c%s "$backup_file")
    if [ $size -lt 1048576 ]; then
        log "ERROR: File too small: $backup_file ($size bytes)"
        return 1
    fi
    
    # 检查压缩文件
    if [[ $backup_file == *.tar.gz ]]; then
        if ! tar -tzf "$backup_file" > /dev/null 2>&1; then
            log "ERROR: Corrupted archive: $backup_file"
            return 1
        fi
        
        # 检查是否包含BSON文件
        if ! tar -tzf "$backup_file" | grep -q "\.bson$"; then
            log "ERROR: No BSON files found in: $backup_file"
            return 1
        fi
    fi
    
    log "OK: $backup_file is valid"
    return 0
}

# 检查所有备份
for backup in $(ls -t $BACKUP_DIR/mongodb_backup_*.tar.gz 2>/dev/null); do
    check_backup "$backup"
done

4.2 恢复测试流程

4.2.1 定期恢复测试脚本

#!/bin/bash
# 定期恢复测试脚本

# 配置
BACKUP_DIR="/backup/mongodb"
TEST_ENV_DIR="/tmp/mongodb_restore_test"
TEST_PORT="27019"
TEST_DB="restore_test"
RESTORE_DATE="2023-10-01"  # 可以改为自动选择最近备份

# 创建测试环境
mkdir -p $TEST_ENV_DIR
cd $TEST_ENV_DIR

# 选择备份
BACKUP_FILE="$BACKUP_DIR/mongodb_backup_${RESTORE_DATE}_*.tar.gz"
LATEST_BACKUP=$(ls -t $BACKUP_FILE 2>/dev/null | head -1)

if [ -z "$LATEST_BACKUP" ]; then
    echo "No backup found for $RESTORE_DATE"
    exit 1
fi

echo "Testing restore from: $LATEST_BACKUP"

# 解压备份
tar -xzf $LATEST_BACKUP
BACKUP_CONTENT=$(tar -tzf $LATEST_BACKUP | head -1 | cut -d/ -f1)

# 启动临时MongoDB
mongod --dbpath $TEST_ENV_DIR/data --port $TEST_PORT --fork --logpath $TEST_ENV_DIR/mongod.log

# 等待启动
sleep 5

# 恢复数据
mongorestore --host localhost --port $TEST_PORT $TEST_ENV_DIR/$BACKUP_CONTENT/

# 执行验证查询
echo "Running validation queries..."
mongo --host localhost --port $TEST_PORT --eval "
// 检查数据库列表
var dbs = db.adminCommand({listDatabases:1}).databases.map(d => d.name);
print('Databases: ' + dbs.join(', '));

// 检查每个数据库的集合
dbs.forEach(function(dbName) {
    if (dbName != 'admin' && dbName != 'local' && dbName != 'config') {
        var collections = db.getSiblingDB(dbName).getCollectionNames();
        print(dbName + ' collections: ' + collections.length);
        
        // 检查文档数量
        collections.forEach(function(coll) {
            if (!coll.startsWith('system.')) {
                var count = db.getSiblingDB(dbName)[coll].count();
                print('  ' + dbName + '.' + coll + ': ' + count + ' docs');
            }
        });
    }
});
"

# 性能测试
echo "Performance test..."
time mongo --host localhost --port $TEST_PORT --eval "db.getSiblingDB('$TEST_DB').users.find().limit(1000).itcount()"

# 清理
mongod --dbpath $TEST_ENV_DIR/data --port $TEST_PORT --shutdown
rm -rf $TEST_ENV_DIR

echo "Restore test completed successfully"

4.2.2 灾难恢复演练文档模板

# MongoDB灾难恢复演练报告

## 演练日期
2023-10-01

## 演练场景
- 场景1:单个集合误删除
- 场景2:整个数据库误删除
- 场景3:服务器硬件故障

## 演练步骤

### 场景1:单个集合误删除
1. 模拟误删除:`db.users.drop()`
2. 从备份恢复:`mongorestore --db mydb --collection users /backup/.../users.bson`
3. 验证数据:`db.users.count()` 应等于备份时的数量
4. 结果:✓ 成功,耗时2分钟

### 场景2:整个数据库误删除
1. 模拟误删除:`db.dropDatabase()`
2. 从备份恢复:`mongorestore --db mydb /backup/.../mydb/`
3. 验证数据:检查所有集合和索引
4. 结果:✓ 成功,耗时15分钟

### 场景3:服务器硬件故障
1. 模拟故障:停止MongoDB服务,删除数据目录
2. 从备份恢复:复制备份文件到新服务器
3. 启动MongoDB并恢复数据
4. 验证副本集状态
5. 结果:✓ 成功,耗时45分钟

## 演练结果总结
- 平均恢复时间:20分钟
- 数据完整性:100%
- 需要改进的地方:
  1. 备份验证频率应提高
  2. 需要更详细的恢复文档
  3. 自动化程度需要提升

## 改进计划
1. 每周执行一次备份验证
2. 每月执行一次完整恢复演练
3. 开发一键恢复脚本

第五部分:备份安全与最佳实践

5.1 备份安全策略

5.1.1 备份加密

#!/bin/bash
# 加密备份脚本

BACKUP_DIR="/backup/mongodb"
ENCRYPTED_DIR="/backup/mongodb/encrypted"
GPG_KEY="backup-key@example.com"

# 创建加密备份
encrypt_backup() {
    local backup_file=$1
    local encrypted_file="$ENCRYPTED_DIR/$(basename $backup_file).gpg"
    
    # 使用GPG加密
    gpg --encrypt --recipient $GPG_KEY --output $encrypted_file $backup_file
    
    # 验证加密文件
    if gpg --verify $encrypted_file 2>/dev/null; then
        echo "Encryption successful: $encrypted_file"
        # 删除原始文件(可选)
        # rm $backup_file
    else
        echo "Encryption failed"
        exit 1
    fi
}

# 解密备份
decrypt_backup() {
    local encrypted_file=$1
    local output_dir=$2
    
    gpg --decrypt --output - $encrypted_file | tar -xzf - -C $output_dir
}

# 主流程
for backup in $(ls -t $BACKUP_DIR/mongodb_backup_*.tar.gz); do
    encrypt_backup "$backup"
done

5.1.2 备份访问控制

# 创建专用备份用户
use admin
db.createUser({
    user: "backupuser",
    pwd: "strongpassword",
    roles: [
        { role: "backup", db: "admin" },
        { role: "clusterMonitor", db: "admin" },
        { role: "readAnyDatabase", db: "admin" }
    ]
})

# 限制备份用户只能从特定IP访问
db.updateUser("backupuser", {
    roles: [
        { role: "backup", db: "admin" },
        { role: "clusterMonitor", db: "admin" },
        { role: "readAnyDatabase", db: "admin" }
    ],
    authenticationRestrictions: [{
        clientSource: ["10.0.0.0/8", "192.168.0.0/16"],
        serverAddress: ["0.0.0.0/0"]
    }]
})

5.2 备份最佳实践

5.2.1 备份策略矩阵

数据规模 备份频率 保留周期 备份方法 恢复时间目标
< 10GB 每日 7天 mongodump + 压缩 < 30分钟
10-100GB 每日 7天 副本集secondary备份 < 1小时
100GB-1TB 每日 + 增量 14天 文件系统快照 + oplog < 2小时
> 1TB 每日 + 连续增量 30天 云快照 + 复制集备份 < 4小时

5.2.2 备份检查清单

# MongoDB备份检查清单

## 日常检查
- [ ] 备份任务是否成功完成
- [ ] 备份文件大小是否正常(无异常增长或缩小)
- [ ] 备份目录磁盘空间是否充足
- [ ] 备份日志是否有错误信息

## 每周检查
- [ ] 验证至少一个备份文件的完整性
- [ ] 检查备份保留策略执行情况
- [ ] 审查备份访问日志
- [ ] 测试备份文件的可恢复性

## 每月检查
- [ ] 执行完整恢复测试
- [ ] 审查备份策略的有效性
- [ ] 更新备份文档
- [ ] 检查备份加密密钥状态

## 每季度检查
- [ ] 灾难恢复演练
- [ ] 备份基础设施健康检查
- [ ] 备份成本分析
- [ ] 备份策略优化

## 年度检查
- [ ] 全面备份审计
- [ ] 备份合规性检查
- [ ] 备份架构评估
- [ ] 备份预算规划

第六部分:常见问题与解决方案

6.1 备份失败常见原因

6.1.1 网络问题导致备份失败

问题描述:备份过程中连接中断或超时

解决方案

# 增加超时时间
mongodump --host $HOST --port $PORT --username $USER --password $PASS \
  --authenticationDatabase admin --db $DB \
  --out $BACKUP_DIR \
  --timeout=600000  # 10分钟超时

# 使用nohup防止终端断开
nohup mongodump --host $HOST --port $PORT --username $USER --password $PASS \
  --authenticationDatabase admin --db $DB \
  --out $BACKUP_DIR > /var/log/mongodb_backup.log 2>&1 &

6.1.2 磁盘空间不足

问题描述:备份过程中磁盘空间耗尽

解决方案

#!/bin/bash
# 带空间检查的备份脚本

BACKUP_DIR="/backup/mongodb"
MIN_SPACE_GB=50  # 最小需要50GB空间

# 检查可用空间
AVAILABLE_GB=$(df -BG $BACKUP_DIR | awk 'NR==2 {print $4}' | sed 's/G//')
if [ $AVAILABLE_GB -lt $MIN_SPACE_GB ]; then
    echo "ERROR: Insufficient disk space. Available: ${AVAILABLE_GB}GB, Required: ${MIN_SPACE_GB}GB"
    
    # 尝试清理旧备份
    find $BACKUP_DIR -name "*.tar.gz" -mtime +7 -delete
    
    # 再次检查
    AVAILABLE_GB=$(df -BG $BACKUP_DIR | awk 'NR==2 {print $4}' | sed 's/G//')
    if [ $AVAILABLE_GB -lt $MIN_SPACE_GB ]; then
        echo "CRITICAL: Still insufficient space after cleanup"
        exit 1
    fi
fi

# 继续备份...

6.1.3 权限问题

问题描述:备份用户权限不足

解决方案

// 检查并修复备份用户权限
use admin

// 查看当前用户权限
db.getUser("backupuser")

// 重新授权(MongoDB 3.4+)
db.grantRolesToUser("backupuser", [
    { role: "backup", db: "admin" },
    { role: "clusterMonitor", db: "admin" },
    { role: "readAnyDatabase", db: "admin" }
])

// 对于分片集群,还需要额外权限
db.grantRolesToUser("backupuser", [
    { role: "read", db: "config" }
])

6.2 恢复失败常见原因

6.2.1 版本不兼容

问题描述:使用旧版本的mongorestore恢复新版本MongoDB的备份

解决方案

# 检查版本兼容性
mongodump --version
mongorestore --version

# 如果版本不匹配,建议升级mongorestore或使用相同版本
# 或者使用--nsInclude只恢复特定命名空间
mongorestore --nsInclude 'mydb.users' --nsInclude 'mydb.orders' /backup/.../mydb/

6.2.2 索引重建失败

问题描述:恢复时索引创建失败,导致查询性能下降

解决方案

# 方案1:先恢复数据,后手动创建索引
mongorestore --noIndexRestore /backup/.../mydb/

# 然后在MongoDB shell中手动创建索引
mongo --eval "
db.users.createIndex({email:1}, {unique:true})
db.orders.createIndex({created_at:1})
"

# 方案2:检查索引定义文件
# 如果索引定义文件损坏,可以从应用代码重新生成

6.2.3 数据冲突(副本集)

问题描述:在副本集中恢复数据时出现冲突

解决方案

# 1. 停止副本集同步
rs.stop()

# 2. 从备份恢复
mongorestore --host secondary1 --port 27017 /backup/.../mydb/

# 3. 重置oplog
rs.reconfig({force: true})

# 4. 重新启动副本集
rs.start()

第七部分:高级主题 - 备份即代码

7.1 使用Terraform管理备份基础设施

# main.tf
provider "aws" {
  region = "us-east-1"
}

# EBS卷用于MongoDB数据
resource "aws_ebs_volume" "mongodb_data" {
  availability_zone = "us-east-1a"
  size              = 100
  type              = "gp3"
  iops              = 3000
  throughput        = 125

  tags = {
    Name = "mongodb-data-volume"
    Environment = "production"
  }
}

# 自动快照策略
resource "aws_backup_plan" "mongodb_backup" {
  name = "mongodb-backup-plan"

  rule {
    rule_name         = "daily-backup"
    target_vault_name = aws_backup_vault.mongodb_vault.name
    schedule          = "cron(0 2 * * ? *)"
    start_window      = 60
    completion_window = 180

    lifecycle {
      cold_storage_after = 30
      delete_after       = 365
    }

    recovery_point_tags = {
      Environment = "production"
      Database    = "mongodb"
    }
  }

  advanced_backup_setting {
    backup_options = {
      WindowsVSS = "enabled"
    }
    resource_type = "EC2"
  }
}

resource "aws_backup_vault" "mongodb_vault" {
  name        = "mongodb-backup-vault"
  kms_key_arn = aws_kms_key.mongodb_backup.arn

  tags = {
    Environment = "production"
  }
}

resource "aws_kms_key" "mongodb_backup" {
  description             = "KMS key for MongoDB backup encryption"
  deletion_window_in_days = 7

  tags = {
    Environment = "production"
  }
}

# IAM角色用于备份
resource "aws_iam_role" "mongodb_backup_role" {
  name = "mongodb-backup-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "backup.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "mongodb_backup_policy" {
  role       = aws_iam_role.mongodb_backup_role.name
  policy_arn = "arn:aws:iam::aws:policy/AWSBackupServiceRolePolicyForBackup"
}

7.2 使用Kubernetes进行备份

# MongoDB备份CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mongodb-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: mongo:6.0
            command:
            - /bin/bash
            - -c
            - |
              #!/bin/bash
              set -e
              
              BACKUP_DIR="/backup/mongodb/$(date +%F)"
              mkdir -p $BACKUP_DIR
              
              # 执行备份
              mongodump \
                --host $(MONGO_HOST) \
                --port $(MONGO_PORT) \
                --username $(MONGO_USER) \
                --password $(MONGO_PASSWORD) \
                --authenticationDatabase admin \
                --db $(MONGO_DB) \
                --out $BACKUP_DIR
              
              # 压缩
              tar -czf /backup/mongodb_backup_$(date +%F).tar.gz -C /backup mongodb/$(date +%F)
              
              # 上传到S3
              aws s3 cp /backup/mongodb_backup_$(date +%F).tar.gz s3://mongodb-backups/
              
              # 清理本地
              rm -rf /backup/mongodb
            env:
            - name: MONGO_HOST
              value: "mongodb-service"
            - name: MONGO_PORT
              value: "27017"
            - name: MONGO_USER
              valueFrom:
                secretKeyRef:
                  name: mongodb-secret
                  key: username
            - name: MONGO_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mongodb-secret
                  key: password
            - name: MONGO_DB
              value: "mydb"
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: aws-secret
                  key: access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: aws-secret
                  key: secret-key
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          restartPolicy: OnFailure
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc

7.3 使用Ansible自动化备份部署

# ansible/playbook.yml
---
- name: Deploy MongoDB Backup Infrastructure
  hosts: mongodb_servers
  become: yes
  vars:
    backup_dir: /backup/mongodb
    backup_user: backupuser
    backup_script_path: /usr/local/bin/mongodb_backup.sh
    
  tasks:
    - name: Create backup directory
      file:
        path: "{{ backup_dir }}"
        state: directory
        owner: "{{ backup_user }}"
        group: "{{ backup_user }}"
        mode: '0755'
        
    - name: Install required packages
      package:
        name:
          - mongodb-database-tools
          - tar
          - gzip
        state: present
        
    - name: Copy backup script
      copy:
        src: templates/mongodb_backup.sh.j2
        dest: "{{ backup_script_path }}"
        owner: root
        group: root
        mode: '0755'
        
    - name: Create systemd service
      copy:
        content: |
          [Unit]
          Description=MongoDB Backup Service
          After=mongod.service
          
          [Service]
          Type=oneshot
          User={{ backup_user }}
          Group={{ backup_user }}
          ExecStart={{ backup_script_path }}
        dest: /etc/systemd/system/mongodb-backup.service
        owner: root
        group: root
        mode: '0644'
        
    - name: Create systemd timer
      copy:
        content: |
          [Unit]
          Description=MongoDB Backup Timer
          Requires=mongodb-backup.service
          
          [Timer]
          OnCalendar=daily
          Persistent=true
          
          [Install]
          WantedBy=timers.target
        dest: /etc/systemd/system/mongodb-backup.timer
        owner: root
        group: root
        mode: '0644'
        
    - name: Enable and start timer
      systemd:
        name: mongodb-backup.timer
        enabled: yes
        state: started
        
    - name: Create log directory
      file:
        path: /var/log/mongodb
        state: directory
        owner: "{{ backup_user }}"
        group: "{{ backup_user }}"
        mode: '0755'
        
    - name: Configure logrotate
      copy:
        content: |
          /var/log/mongodb/backup.log {
              daily
              rotate 7
              compress
              delaycompress
              missingok
              notifempty
              create 0640 {{ backup_user }} {{ backup_user }}
          }
        dest: /etc/logrotate.d/mongodb-backup
        owner: root
        group: root
        mode: '0644'

第八部分:备份成本优化

8.1 存储成本优化

8.1.1 分层存储策略

#!/bin/bash
# 分层存储管理脚本

BACKUP_DIR="/backup/mongodb"
S3_BUCKET="s3://mongodb-backups"

# 上传到S3并设置存储类
upload_to_s3() {
    local backup_file=$1
    local s3_path="$S3_BUCKET/$(basename $backup_file)"
    
    # 上传到S3 Standard(最近7天)
    aws s3 cp $backup_file $s3_path --storage-class STANDARD
    
    # 7天后转移到Infrequent Access
    aws s3api put-object-legal-hold --bucket mongodb-backups --key $(basename $backup_file) --legal-hold Status=ON
    
    # 30天后转移到Glacier
    # 使用S3生命周期策略自动处理
}

# S3生命周期策略
aws s3api put-bucket-lifecycle-configuration \
    --bucket mongodb-backups \
    --lifecycle-configuration file://lifecycle.json

# lifecycle.json内容:
{
    "Rules": [
        {
            "ID": "Transition to IA",
            "Status": "Enabled",
            "Filter": {
                "Prefix": ""
            },
            "Transitions": [
                {
                    "Days": 7,
                    "StorageClass": "STANDARD_IA"
                },
                {
                    "Days": 30,
                    "StorageClass": "GLACIER"
                }
            ],
            "Expiration": {
                "Days": 365
            }
        }
    ]
}

8.1.2 去重和压缩优化

#!/bin/bash
# 备份去重和压缩优化

# 使用zstd压缩(比gzip更好)
compress_backup() {
    local backup_dir=$1
    local output_file="/backup/mongodb_backup_$(date +%F).tar.zst"
    
    # 使用zstd压缩(需要安装zstd)
    tar -cf - -C /backup $backup_dir | zstd -T0 -19 -o $output_file
    
    # 验证压缩文件
    zstd -t $output_file
    
    echo "Compressed backup: $output_file"
}

# 使用硬链接实现增量备份
create_incremental() {
    local full_backup=$1
    local inc_backup=$2
    
    # 创建硬链接,节省空间
    cp -al $full_backup $inc_backup
    
    # 只备份变化的文件
    rsync -av --delete --link-dest=$full_backup /var/lib/mongodb/ $inc_backup/
}

8.2 备份性能优化

8.2.1 并行备份

#!/bin/bash
# 并行备份多个集合

DB_NAME="mydb"
BACKUP_DIR="/backup/mongodb/parallel_$(date +%F)"
MONGO_HOST="localhost"
MONGO_PORT="27017"

# 获取所有集合
COLLECTIONS=$(mongo --quiet --host $MONGO_HOST --port $MONGO_PORT --eval "
db.getSiblingDB('$DB_NAME').getCollectionNames().filter(c => !c.startsWith('system.')).join(' ')
")

# 并行备份每个集合
for coll in $COLLECTIONS; do
    (
        mongodump --host $MONGO_HOST --port $MONGO_PORT \
            --db $DB_NAME --collection $coll \
            --out $BACKUP_DIR/$coll.bson &
    )
done

# 等待所有后台任务完成
wait

# 合并备份目录
mkdir -p $BACKUP_DIR/$DB_NAME
for coll in $COLLECTIONS; do
    mv $BACKUP_DIR/$coll.bson $BACKUP_DIR/$DB_NAME/
    # 如果有索引定义,也需要移动
    if [ -f "$BACKUP_DIR/${coll}.metadata.json" ]; then
        mv $BACKUP_DIR/${coll}.metadata.json $BACKUP_DIR/$DB_NAME/
    fi
done

8.2.2 资源限制

#!/bin/bash
# 限制备份进程资源使用

# 使用cpulimit限制CPU使用率(需要安装cpulimit)
cpulimit -l 50 -f -- mongodump --host $HOST --port $PORT --db $DB --out $BACKUP_DIR

# 使用nice降低优先级
nice -n 19 mongodump --host $HOST --port $PORT --db $DB --out $BACKUP_DIR

# 使用ionice限制I/O优先级(需要root)
ionice -c 2 -n 7 mongodump --host $HOST --port $PORT --db $DB --out $BACKUP_DIR

# 组合使用
nice -n 19 ionice -c 2 -n 7 cpulimit -l 50 -f -- mongodump --host $HOST --port $PORT --db $DB --out $BACKUP_DIR

第九部分:备份合规性与审计

9.1 备份审计日志

#!/bin/bash
# 备份审计日志脚本

AUDIT_LOG="/var/log/mongodb_backup_audit.log"

log_audit() {
    local action=$1
    local status=$2
    local details=$3
    
    echo "$(date -Iseconds) | $action | $status | $details | $USER | $(whoami) | $(hostname)" >> $AUDIT_LOG
}

# 示例:记录备份操作
log_audit "BACKUP_START" "INFO" "Starting backup of mydb"
# ... 执行备份 ...
if [ $? -eq 0 ]; then
    log_audit "BACKUP_COMPLETE" "SUCCESS" "Backup completed successfully"
else
    log_audit "BACKUP_COMPLETE" "FAILURE" "Backup failed"
fi

# 记录恢复操作
log_audit "RESTORE_START" "INFO" "Restoring from backup file"
# ... 执行恢复 ...
if [ $? -eq 0 ]; then
    log_audit "RESTORE_COMPLETE" "SUCCESS" "Restore completed successfully"
else
    log_audit "RESTORE_COMPLETE" "FAILURE" "Restore failed"
fi

# 记录删除操作
log_audit "BACKUP_DELETE" "INFO" "Deleted old backup: $BACKUP_FILE"

9.2 合规性报告生成

#!/usr/bin/env python3
# 生成备份合规性报告

import json
import subprocess
from datetime import datetime, timedelta
import os

class BackupCompliance:
    def __init__(self, backup_dir):
        self.backup_dir = backup_dir
        self.report = {}
        
    def check_backup_age(self):
        """检查备份时效性"""
        files = [f for f in os.listdir(self.backup_dir) if f.startswith('mongodb_backup_')]
        if not files:
            return {"status": "FAIL", "message": "No backups found"}
        
        latest = max(files)
        latest_date = datetime.strptime(latest.split('_')[1].split('.')[0], '%Y-%m-%d')
        age_days = (datetime.now() - latest_date).days
        
        if age_days > 1:
            return {"status": "FAIL", "message": f"Backup is {age_days} days old"}
        return {"status": "PASS", "message": f"Backup is {age_days} days old"}
    
    def check_backup_integrity(self):
        """检查备份完整性"""
        latest_backup = os.path.join(self.backup_dir, max([f for f in os.listdir(self.backup_dir) if f.startswith('mongodb_backup_')]))
        
        try:
            result = subprocess.run(['tar', '-tzf', latest_backup], capture_output=True, text=True, timeout=60)
            if result.returncode == 0:
                # 检查是否包含BSON文件
                if '.bson' in result.stdout:
                    return {"status": "PASS", "message": "Backup is valid and contains BSON files"}
                else:
                    return {"status": "FAIL", "message": "Backup does not contain BSON files"}
            else:
                return {"status": "FAIL", "message": "Backup archive is corrupted"}
        except Exception as e:
            return {"status": "FAIL", "message": f"Error checking backup: {str(e)}"}
    
    def check_retention_policy(self):
        """检查保留策略"""
        retention_days = 7
        cutoff_date = datetime.now() - timedelta(days=retention_days)
        
        old_files = []
        for f in os.listdir(self.backup_dir):
            if f.startswith('mongodb_backup_'):
                file_date = datetime.strptime(f.split('_')[1].split('.')[0], '%Y-%m-%d')
                if file_date < cutoff_date:
                    old_files.append(f)
        
        if old_files:
            return {"status": "FAIL", "message": f"Found {len(old_files)} old backups beyond retention policy"}
        return {"status": "PASS", "message": "All backups within retention policy"}
    
    def generate_report(self):
        """生成完整报告"""
        self.report['timestamp'] = datetime.now().isoformat()
        self.report['backup_dir'] = self.backup_dir
        
        self.report['checks'] = {
            'backup_age': self.check_backup_age(),
            'backup_integrity': self.check_backup_integrity(),
            'retention_policy': self.check_retention_policy()
        }
        
        # 总体状态
        all_pass = all(check['status'] == 'PASS' for check in self.report['checks'].values())
        self.report['overall_status'] = 'COMPLIANT' if all_pass else 'NON_COMPLIANT'
        
        return self.report

# 使用示例
if __name__ == '__main__':
    compliance = BackupCompliance('/backup/mongodb')
    report = compliance.generate_report()
    
    print(json.dumps(report, indent=2))
    
    # 保存报告
    with open(f"/var/log/mongodb_compliance_{datetime.now().strftime('%Y%m%d')}.json", 'w') as f:
        json.dump(report, f, indent=2)

第十部分:总结与建议

10.1 备份策略选择指南

根据您的业务需求,选择合适的备份策略:

  1. 小型应用(<10GB)

    • 每日使用mongodump
    • 保留7天
    • 手动验证
  2. 中型应用(10GB-100GB)

    • 副本集secondary节点备份
    • 每日全量 + 每周验证
    • 自动化脚本 + 监控
  3. 大型应用(100GB-1TB)

    • 文件系统快照 + oplog增量
    • 每日全量 + 每小时增量
    • 自动化 + 定期恢复测试
  4. 超大型应用(>1TB)

    • 云原生快照 + 分片集群备份
    • 连续增量备份
    • 专业备份解决方案

10.2 关键成功因素

  1. 自动化:减少人为错误
  2. 监控:及时发现问题
  3. 验证:确保备份可用
  4. 测试:定期演练恢复流程
  5. 文档:详细记录操作步骤
  6. 安全:加密和访问控制

10.3 持续改进建议

  1. 建立备份SLA:明确RTO和RPO目标
  2. 定期审查:每季度评估备份策略有效性
  3. 成本优化:监控存储成本,优化存储层级
  4. 团队培训:确保团队成员都熟悉恢复流程
  5. 应急预案:制定详细的灾难恢复预案

通过实施本文介绍的备份策略和最佳实践,您可以显著降低数据丢失风险,确保在发生故障时能够快速恢复业务。记住,备份的价值只有在恢复时才能真正体现,因此请务必定期测试您的备份和恢复流程。