Deep learning, a core technology of artificial intelligence, has profoundly changed how we process complex data and solve real-world problems. From image recognition to natural language processing, from autonomous driving to medical diagnosis, neural networks are everywhere. This article starts from scratch and systematically covers the fundamentals of deep learning, its core concepts, and practical techniques, helping beginners get up to speed and master the essentials of neural networks.

1. Introduction to Deep Learning and Basic Concepts

1.1 What Is Deep Learning?

Deep learning is a subfield of machine learning. Loosely inspired by how neurons in the human brain work, it uses multi-layer neural networks to learn complex patterns and features from data. Unlike traditional machine learning methods, deep learning extracts features automatically from raw data, with no need for hand-engineered feature extractors.

The core strengths of deep learning are:

  • Automatic feature learning: extracts high-level, abstract features directly from raw data
  • Handling complex data: particularly well suited to unstructured data such as images, audio, and text
  • End-to-end learning: learns directly from input to output, simplifying the overall pipeline
  • Strong representational power: multi-layer architectures can represent extremely complex functions

1.2 The Basic Structure of a Neural Network

The basic building block of a neural network is the neuron, also known as a perceptron. A typical neuron receives multiple input signals, computes their weighted sum, and passes the result through an activation function to produce an output.

The Neuron Model

A basic neuron model can be expressed with the following formula:

y = f(Σ(w_i * x_i) + b)

where:

  • x_i are the input signals
  • w_i are the corresponding weights
  • b is the bias term
  • f is the activation function
  • y is the output
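
To make the formula concrete, here is a minimal NumPy sketch of a single neuron with a sigmoid activation (the input, weight, and bias values are made up purely for illustration):

import numpy as np

def neuron(x, w, b):
    # y = f(sum(w_i * x_i) + b), with sigmoid as the activation f
    z = np.dot(w, x) + b
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])   # inputs x_i (made-up values)
w = np.array([0.4, 0.6, -0.1])   # weights w_i (made-up values)
b = 0.05                         # bias b
print(neuron(x, w, b))           # a single scalar output y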

Network Structure

A neural network is built from many neurons organized into layers:

  • Input layer: receives the raw data
  • Hidden layers: perform feature extraction and transformation (there can be many of them)
  • Output layer: produces the final prediction

1.3 Activation Functions: Purpose and Common Types

Activation functions introduce non-linearity into a neural network, enabling it to learn complex patterns. Without an activation function, a network of any depth can only represent a linear mapping, as the sketch below demonstrates.
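
A quick sanity check of that claim: composing two linear layers, W2(W1 x), is itself a single linear map with matrix W2 @ W1, so stacking layers adds nothing without a non-linearity in between. A minimal NumPy sketch (random matrices chosen purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)

two_layers = W2 @ (W1 @ x)      # two stacked linear layers, no activation
one_layer = (W2 @ W1) @ x       # one equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True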

Common Activation Functions

  1. Sigmoid

    • Formula: σ(x) = 1 / (1 + e^(-x))
    • Output range: (0, 1)
    • Pros: smooth and differentiable
    • Cons: prone to vanishing gradients; output is not zero-centered
  2. Tanh

    • Formula: tanh(x) = (e^x - e^-x) / (e^x + e^-x)
    • Output range: (-1, 1)
    • Pros: zero-centered output, faster convergence
    • Cons: still suffers from vanishing gradients
  3. ReLU

    • Formula: ReLU(x) = max(0, x)
    • Output range: [0, +∞)
    • Pros: cheap to compute, mitigates vanishing gradients
    • Cons: can cause "dead" neurons (output stuck at 0)
  4. Leaky ReLU

    • Formula: LeakyReLU(x) = max(αx, x) (α is typically 0.01)
    • Fixes the "dying ReLU" problem

Code Example: Implementing Activation Functions

import numpy as np
import matplotlib.pyplot as plt

# Define the activation functions
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)

# Generate input data
x = np.linspace(-5, 5, 100)

# Plot the curves
plt.figure(figsize=(12, 8))
plt.plot(x, sigmoid(x), label='Sigmoid')
plt.plot(x, tanh(x), label='Tanh')
plt.plot(x, relu(x), label='ReLU')
plt.plot(x, leaky_relu(x), label='Leaky ReLU')
plt.axhline(y=0, color='k', linestyle='--', alpha=0.3)
plt.axvline(x=0, color='k', linestyle='--', alpha=0.3)
plt.legend()
plt.title('Comparison of Common Activation Functions')
plt.grid(True, alpha=0.3)
plt.show()

This code illustrates the mathematical behavior of four common activation functions. In practice, ReLU and its variants are the most widely used because of how well they behave in deep networks.

2. How Neural Networks Are Trained

2.1 Loss Functions

A loss function measures the gap between the model's predictions and the true values. Choosing an appropriate loss function is critical for training. A minimal NumPy sketch of the three losses below follows this list.

Common Loss Functions

  1. Mean Squared Error (MSE) - for regression

    • Formula: L = (1/n) * Σ(y_pred - y_true)^2
  2. Cross-Entropy - for classification

    • Formula: L = -Σ(y_true * log(y_pred))
  3. Binary Cross-Entropy - for binary classification

    • Formula: L = -[y_true * log(y_pred) + (1-y_true) * log(1-y_pred)]
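
These three losses can be written directly in NumPy. The sketch below is a minimal illustration (the example targets and predictions are made up):

import numpy as np

def mse(y_true, y_pred):
    # Mean squared error, for regression
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-8):
    # Multi-class cross-entropy; y_true is one-hot, y_pred holds probabilities
    return -np.sum(y_true * np.log(y_pred + eps)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-8):
    # Binary cross-entropy; y_pred is the predicted probability of class 1
    return -np.mean(y_true * np.log(y_pred + eps)
                    + (1 - y_true) * np.log(1 - y_pred + eps))

print(mse(np.array([1.0, 2.0]), np.array([1.1, 1.8])))               # 0.025
print(binary_cross_entropy(np.array([1, 0]), np.array([0.9, 0.2])))  # ~0.164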

2.2 Gradient Descent and Backpropagation

The Gradient Descent Algorithm

Gradient descent is the core algorithm for optimizing a neural network's parameters. It computes the gradient of the loss function with respect to the parameters and updates the parameters in the opposite direction of the gradient.

Basic Update Rule

θ = θ - α * ∇J(θ)

where:

  • θ are the model parameters
  • α is the learning rate
  • ∇J(θ) is the gradient of the loss function
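
To see the update rule in action, here is a minimal sketch that minimizes J(θ) = θ² in one dimension (the starting point and learning rate are arbitrary choices for illustration):

theta = 5.0   # initial parameter value (arbitrary)
alpha = 0.1   # learning rate (arbitrary)
for step in range(50):
    grad = 2 * theta              # gradient of J(theta) = theta^2
    theta = theta - alpha * grad  # the update rule above
print(theta)  # very close to the minimum at theta = 0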

The Backpropagation Algorithm

Backpropagation is an efficient algorithm for computing gradients. It applies the chain rule to propagate gradients layer by layer, from the output layer back to the input layer.

The Four Basic Steps of Backpropagation

  1. Forward pass to compute the predictions
  2. Compute the loss
  3. Backward pass to compute the gradients
  4. Update the parameters

Code Example: Implementing a Simple Neural Network

import numpy as np

class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        # Initialize the weights
        self.W1 = np.random.randn(input_size, hidden_size) * 0.01
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = np.random.randn(hidden_size, output_size) * 0.01
        self.b2 = np.zeros((1, output_size))
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(self, x):
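        # Note: x here is the sigmoid *output* a = sigmoid(z), so this computes a * (1 - a)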
        return x * (1 - x)
    
    def forward(self, X):
        # Forward pass
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = self.sigmoid(self.z1)
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = self.sigmoid(self.z2)
        return self.a2
    
    def compute_loss(self, y_true, y_pred):
        # Compute the (binary) cross-entropy loss
        m = y_true.shape[0]
        loss = -np.sum(y_true * np.log(y_pred + 1e-8) + (1-y_true) * np.log(1-y_pred + 1e-8)) / m
        return loss
    
    def backward(self, X, y_true, y_pred, learning_rate=0.1):
        # Backward pass
        m = X.shape[0]
        
        # Output-layer gradients
        dz2 = y_pred - y_true
        dW2 = np.dot(self.a1.T, dz2) / m
        db2 = np.sum(dz2, axis=0, keepdims=True) / m
        
        # Hidden-layer gradients
        dz1 = np.dot(dz2, self.W2.T) * self.sigmoid_derivative(self.a1)
        dW1 = np.dot(X.T, dz1) / m
        db1 = np.sum(dz1, axis=0, keepdims=True) / m
        
        # Update the parameters
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
    
    def train(self, X, y, epochs=1000, learning_rate=0.1, verbose=True):
        losses = []
        for epoch in range(epochs):
            # Forward pass
            y_pred = self.forward(X)
            
            # Compute the loss
            loss = self.compute_loss(y, y_pred)
            losses.append(loss)
            
            # Backward pass
            self.backward(X, y, y_pred, learning_rate)
            
            if verbose and epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {loss:.6f}")
        
        return losses

# Create toy data: the XOR problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Create and train the network
nn = SimpleNeuralNetwork(input_size=2, hidden_size=4, output_size=1)
losses = nn.train(X, y, epochs=2000, learning_rate=0.5, verbose=True)

# Test the predictions
print("\nTest predictions:")
for i in range(len(X)):
    pred = nn.forward(X[i:i+1])
    print(f"Input: {X[i]}, Predicted: {pred[0][0]:.4f}, Actual: {y[i][0]}")

This simple network solves the XOR problem and walks through the full cycle of forward pass, loss computation, backward pass, and parameter update.

2.3 Optimization Algorithms

Common Optimizers

  1. Stochastic Gradient Descent (SGD)

    • Updates the parameters using one sample at a time
    • Converges slowly and tends to oscillate
  2. Batch Gradient Descent (BGD)

    • Computes the gradient over the full dataset
    • Stable convergence, but computationally expensive
  3. Mini-batch Gradient Descent

    • A compromise: uses small batches of samples
    • The most common choice in practice
  4. Momentum

    • Accumulates past gradient directions to accelerate convergence
    • Formula: v = β * v + (1-β) * ∇J(θ)
  5. Adam

    • Combines momentum with RMSprop
    • Adapts the learning rate per parameter
    • A go-to default in practice

Code Example: Comparing Optimizers

import numpy as np
import matplotlib.pyplot as plt

class Optimizers:
    @staticmethod
    def sgd(params, grads, learning_rate):
        for param, grad in zip(params, grads):
            param -= learning_rate * grad
    
    @staticmethod
    def momentum(params, grads, velocities, learning_rate=0.01, beta=0.9):
        for i, (param, grad) in enumerate(zip(params, grads)):
            velocities[i] = beta * velocities[i] + (1 - beta) * grad
            param -= learning_rate * velocities[i]
    
    @staticmethod
    def adam(params, grads, m, v, t, learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
        for i, (param, grad) in enumerate(zip(params, grads)):
            m[i] = beta1 * m[i] + (1 - beta1) * grad
            v[i] = beta2 * v[i] + (1 - beta2) * grad ** 2
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            param -= learning_rate * m_hat / (np.sqrt(v_hat) + epsilon)

# Simulate the optimization process
def optimize_loss(optimizer_name, initial_pos, loss_func, grad_func, iterations=100):
    position = np.array(initial_pos, dtype=float)
    history = [position.copy()]
    
    if optimizer_name == 'sgd':
        for _ in range(iterations):
            grad = grad_func(position)
            # lr just below the stability limit 2/20 = 0.1 for the steep
            # y-direction, so the oscillation decays instead of persisting
            Optimizers.sgd([position], [grad], learning_rate=0.09)
            history.append(position.copy())
    
    elif optimizer_name == 'momentum':
        # Keep the velocity in a list so its state persists across iterations
        velocity = [np.zeros_like(position)]
        for _ in range(iterations):
            grad = grad_func(position)
            Optimizers.momentum([position], [grad], velocity, learning_rate=0.1, beta=0.9)
            history.append(position.copy())
    
    elif optimizer_name == 'adam':
        # Adam's first- and second-moment state, also kept in lists so it persists
        m = [np.zeros_like(position)]
        v = [np.zeros_like(position)]
        t = 0
        for _ in range(iterations):
            t += 1
            grad = grad_func(position)
            Optimizers.adam([position], [grad], m, v, t, learning_rate=0.1)
            history.append(position.copy())
    
    return history

# Define a simple quadratic loss function and its gradient
def simple_loss(x):
    return x[0]**2 + 10*x[1]**2

def simple_grad(x):
    return np.array([2*x[0], 20*x[1]])

# Compare the optimizers
initial_pos = [10, 10]
optimizers = ['sgd', 'momentum', 'adam']
histories = {}

for opt in optimizers:
    histories[opt] = optimize_loss(opt, initial_pos, simple_loss, simple_grad)

# Visualize the results
plt.figure(figsize=(12, 8))
for opt, history in histories.items():
    history = np.array(history)
    plt.plot(history[:, 0], history[:, 1], 'o-', label=opt, alpha=0.7)
    
# Draw the contour plot
x = np.linspace(-12, 12, 100)
y = np.linspace(-12, 12, 100)
X, Y = np.meshgrid(x, y)
Z = X**2 + 10*Y**2
plt.contour(X, Y, Z, levels=np.logspace(-1, 3, 10), alpha=0.3)

plt.plot(initial_pos[0], initial_pos[1], 'ro', markersize=10, label='Start')
plt.plot(0, 0, 'g*', markersize=15, label='Minimum')
plt.legend()
plt.title('Optimizer Comparison')
plt.xlabel('x')
plt.ylabel('y')
plt.grid(True, alpha=0.3)
plt.show()

3. Hands-On with Deep Learning Frameworks

3.1 PyTorch Basics

PyTorch is one of the most popular deep learning frameworks today, known for its dynamic computation graphs and Pythonic interface.

Core PyTorch Concepts

  1. Tensor: PyTorch's basic data structure, similar to a NumPy array but with GPU acceleration
  2. Autograd: automatic gradient computation
  3. nn.Module: the base class for building neural networks
  4. Optimizer: implementations of the various optimization algorithms
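
Before the full MNIST example, here is a minimal sketch of tensors and autograd, the two concepts everything else builds on (the values are arbitrary):

import torch

# A tensor that tracks operations for gradient computation
x = torch.tensor([2.0, 3.0], requires_grad=True)

# Build a small computation: y = sum(x^2)
y = (x ** 2).sum()

# Autograd applies the chain rule for us: dy/dx = 2x
y.backward()
print(x.grad)  # tensor([4., 6.])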

Code Example: MNIST Classification with PyTorch

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

# 1. Define the network
class MNISTNet(nn.Module):
    def __init__(self):
        super(MNISTNet, self).__init__()
        self.flatten = nn.Flatten()
        self.network = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(512, 256),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(256, 10)
        )
    
    def forward(self, x):
        x = self.flatten(x)
        return self.network(x)

# 2. Data preparation
def prepare_data():
    # Preprocessing transforms
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    
    # Download and load the training data
    train_dataset = torchvision.datasets.MNIST(
        root='./data', 
        train=True, 
        download=True, 
        transform=transform
    )
    
    # Download and load the test data
    test_dataset = torchvision.datasets.MNIST(
        root='./data', 
        train=False, 
        download=True, 
        transform=transform
    )
    
    train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
    test_loader = DataLoader(test_dataset, batch_size=1000, shuffle=False)
    
    return train_loader, test_loader

# 3. Training function
def train_model(model, train_loader, test_loader, epochs=10):
    # Loss function and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    # Track the training history
    train_losses = []
    train_accs = []
    test_accs = []
    
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        
        for batch_idx, (data, target) in enumerate(train_loader):
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            _, predicted = output.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()
            
            if batch_idx % 100 == 0:
                print(f'Epoch: {epoch+1}/{epochs} | Batch: {batch_idx}/{len(train_loader)} | Loss: {loss.item():.4f}')
        
        epoch_loss = running_loss / len(train_loader)
        epoch_acc = 100. * correct / total
        train_losses.append(epoch_loss)
        train_accs.append(epoch_acc)
        
        # Evaluate on the test set
        test_acc = test_model(model, test_loader)
        test_accs.append(test_acc)
        
        print(f'Epoch {epoch+1} - Train Loss: {epoch_loss:.4f}, Train Acc: {epoch_acc:.2f}%, Test Acc: {test_acc:.2f}%')
    
    return train_losses, train_accs, test_accs

# 4. Evaluation function
def test_model(model, test_loader):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            _, predicted = output.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()
    
    accuracy = 100. * correct / total
    return accuracy

# 5. Visualization function
def visualize_results(train_losses, train_accs, test_accs):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Loss curve
    ax1.plot(train_losses, 'b-', linewidth=2)
    ax1.set_title('Training Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.grid(True, alpha=0.3)
    
    # Accuracy curves
    ax2.plot(train_accs, 'b-', label='Train Accuracy', linewidth=2)
    ax2.plot(test_accs, 'r-', label='Test Accuracy', linewidth=2)
    ax2.set_title('Accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy (%)')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# Main program
if __name__ == "__main__":
    # Set the random seed for reproducibility
    torch.manual_seed(42)
    
    # Prepare the data
    print("Preparing data...")
    train_loader, test_loader = prepare_data()
    
    # Create the model
    print("Creating model...")
    model = MNISTNet()
    print(f"Number of model parameters: {sum(p.numel() for p in model.parameters()):,}")
    
    # Train the model
    print("Starting training...")
    train_losses, train_accs, test_accs = train_model(model, train_loader, test_loader, epochs=10)
    
    # Visualize the results
    print("Visualizing results...")
    visualize_results(train_losses, train_accs, test_accs)
    
    # Final evaluation
    final_acc = test_model(model, test_loader)
    print(f"\nFinal test accuracy: {final_acc:.2f}%")

3.2 TensorFlow/Keras Basics

TensorFlow is the other major deep learning framework, and Keras, its high-level API, offers a more concise interface.

Code Example: MNIST Classification with Keras

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt

# 1. Data preparation
def prepare_data_tf():
    # Load the MNIST dataset
    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    
    # Preprocess: scale pixel values to [0, 1]
    x_train = x_train.astype('float32') / 255.0
    x_test = x_test.astype('float32') / 255.0
    
    # Add a channel dimension: (H, W) -> (H, W, 1)
    x_train = np.expand_dims(x_train, -1)
    x_test = np.expand_dims(x_test, -1)
    
    print(f"训练数据形状: {x_train.shape}")
    print(f"测试数据形状: {x_test.shape}")
    
    return (x_train, y_train), (x_test, y_test)

# 2. Build the model
def build_model():
    model = keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Flatten(),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(256, activation='relu'),
        layers.Dropout(0.2),
        layers.Dense(10, activation='softmax')
    ])
    
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy']
    )
    
    return model

# 3. Train the model
def train_model_tf(model, x_train, y_train, x_test, y_test, epochs=10):
    # Define the callbacks
    callbacks = [
        keras.callbacks.EarlyStopping(patience=3, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=2)
    ]
    
    # Train
    history = model.fit(
        x_train, y_train,
        batch_size=64,
        epochs=epochs,
        validation_split=0.1,
        callbacks=callbacks,
        verbose=1
    )
    
    # Evaluate
    test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"\nTest accuracy: {test_acc:.4f}")
    
    return history, test_acc

# 4. Visualize the training history
def visualize_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))
    
    # Loss
    ax1.plot(history.history['loss'], 'b-', label='Training Loss')
    ax1.plot(history.history['val_loss'], 'r-', label='Validation Loss')
    ax1.set_title('Model Loss')
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.legend()
    ax1.grid(True, alpha=0.3)
    
    # Accuracy
    ax2.plot(history.history['accuracy'], 'b-', label='Training Accuracy')
    ax2.plot(history.history['val_accuracy'], 'r-', label='Validation Accuracy')
    ax2.set_title('Model Accuracy')
    ax2.set_xlabel('Epoch')
    ax2.set_ylabel('Accuracy')
    ax2.legend()
    ax2.grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

# Main program
if __name__ == "__main__":
    # Prepare the data
    print("Preparing data...")
    (x_train, y_train), (x_test, y_test) = prepare_data_tf()
    
    # Build the model
    print("Building model...")
    model = build_model()
    model.summary()
    
    # Train the model
    print("Starting training...")
    history, test_acc = train_model_tf(model, x_train, y_train, x_test, y_test, epochs=10)
    
    # Visualize
    visualize_history(history)
    
    # Sample predictions
    print("\nSample predictions:")
    sample_idx = np.random.randint(0, len(x_test), 5)
    predictions = model.predict(x_test[sample_idx])
    predicted_labels = np.argmax(predictions, axis=1)
    
    for i, idx in enumerate(sample_idx):
        print(f"Sample {idx}: predicted={predicted_labels[i]}, actual={y_test[idx]}, confidence={predictions[i][predicted_labels[i]]:.4f}")

4. Convolutional Neural Networks (CNNs)

4.1 Core CNN Concepts

Convolutional neural networks are designed for grid-structured data such as images; convolution operations extract local features automatically.

Key CNN Components

  1. Convolutional layer

    • Slides a filter (kernel) over the input to extract local features
    • Parameter sharing: the same kernel is reused across the whole input
    • Local connectivity: each neuron connects only to a local region of the input
  2. Pooling layer

    • Reduces the spatial dimensions of the feature maps
    • Common types: max pooling and average pooling
    • Provides a degree of translation invariance
  3. Fully connected layer

    • Used for classification or regression
    • Usually the last few layers of a CNN
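
The spatial size of a convolution or pooling output follows out = (in - kernel + 2*padding) / stride + 1. The sketch below checks this arithmetic for the shapes used in the example that follows (the values are chosen to match it):

def conv_out_size(in_size, kernel, padding=0, stride=1):
    # Spatial output size of a conv/pool layer
    return (in_size - kernel + 2 * padding) // stride + 1

print(conv_out_size(32, kernel=3, padding=1))  # 32: a 3x3 conv with padding 1 keeps the size
print(conv_out_size(32, kernel=2, stride=2))   # 16: a 2x2 max pool halves it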

Code Example: A Custom CNN

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        # Conv layer: 3 input channels (RGB), 32 output channels, 3x3 kernel
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
        # Conv layer: 32 input channels, 64 output channels, 3x3 kernel
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        # Pooling layer: 2x2 max pooling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 8 * 8, 512)  # assumes 32x32 input images
        self.fc2 = nn.Linear(512, num_classes)
        # Dropout to reduce overfitting
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        # Input shape: (batch_size, 3, 32, 32)
        
        # First conv + ReLU + pooling
        x = self.pool(F.relu(self.conv1(x)))  # -> (batch_size, 32, 16, 16)
        
        # Second conv + ReLU + pooling
        x = self.pool(F.relu(self.conv2(x)))  # -> (batch_size, 64, 8, 8)
        
        # Flatten
        x = x.view(x.size(0), -1)  # -> (batch_size, 64*8*8)
        
        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        
        return x

# Test the model
def test_cnn():
    # Create dummy input: batch_size=4, channels=3, height=32, width=32
    dummy_input = torch.randn(4, 3, 32, 32)
    
    # Create the model
    model = SimpleCNN(num_classes=10)
    
    # Forward pass
    output = model(dummy_input)
    
    print(f"Input shape: {dummy_input.shape}")
    print(f"Output shape: {output.shape}")
    print(f"Number of model parameters: {sum(p.numel() for p in model.parameters()):,}")
    
    return model

# Run the test
model = test_cnn()

4.2 Classic CNN Architectures

LeNet-5

One of the earliest CNN architectures, used for handwritten digit recognition.

AlexNet

Winner of ImageNet 2012; popularized ReLU and Dropout.

VGGNet

Stacks small 3x3 kernels; simple, uniform structure.

ResNet

Introduced residual connections, solving the degradation problem in very deep networks.

Code Example: Implementing a Basic ResNet Block

class BasicBlock(nn.Module):
    """ResNet的基本残差块"""
    def __init__(self, in_channels, out_channels, stride=1):
        super(BasicBlock, self).__init__()
        # First convolution
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, 
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        
        # Second convolution
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        # Shortcut connection (with downsampling if needed)
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                         stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
    
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # residual connection
        out = F.relu(out)
        return out

class ResNet(nn.Module):
    """简化的ResNet"""
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_channels = 64
        
        # Initial convolution
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        
        # Residual stages
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        
        # Global average pooling and classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)
    
    def _make_layer(self, block, out_channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels
        return nn.Sequential(*layers)
    
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.avgpool(out)
        out = out.view(out.size(0), -1)
        out = self.fc(out)
        return out

def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])

# Test ResNet18
def test_resnet():
    model = ResNet18()
    dummy_input = torch.randn(4, 3, 32, 32)
    output = model(dummy_input)
    print(f"ResNet18 input: {dummy_input.shape}")
    print(f"ResNet18 output: {output.shape}")
    print(f"Number of parameters: {sum(p.numel() for p in model.parameters()):,}")
    return model

# Run the test
model = test_resnet()

5. Recurrent Neural Networks (RNNs) and Sequence Modeling

5.1 RNN Basics

Recurrent neural networks are designed for sequential data; a hidden state carries information from previous time steps.

Core Problems with Plain RNNs

  1. Vanishing/exploding gradients: long sequences are hard to train
  2. Long-term dependencies: distant relationships are difficult to learn

Solutions

  • LSTM (Long Short-Term Memory): introduces gating mechanisms
  • GRU (Gated Recurrent Unit): a simplified variant of the LSTM

Code Example: An LSTM Sequence Model in PyTorch

import torch
import torch.nn as nn

class SimpleLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers=1, output_size=1):
        super(SimpleLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        
        # LSTM layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        
        # Fully connected output layer
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x, hidden=None):
        # x shape: (batch_size, seq_len, input_size)
        
        # LSTM forward pass
        lstm_out, hidden = self.lstm(x, hidden)
        
        # Take the output of the last time step
        out = lstm_out[:, -1, :]
        
        # Fully connected layer
        out = self.fc(out)
        
        return out, hidden

# Test the LSTM
def test_lstm():
    batch_size = 4
    seq_len = 10
    input_size = 8
    hidden_size = 32
    
    # Dummy input
    x = torch.randn(batch_size, seq_len, input_size)
    
    # Create the model
    model = SimpleLSTM(input_size, hidden_size, num_layers=2, output_size=1)
    
    # Forward pass
    output, hidden = model(x)
    
    print(f"Input shape: {x.shape}")
    print(f"Output shape: {output.shape}")
    print(f"Hidden state shapes: {hidden[0].shape}, {hidden[1].shape}")
    
    return model

# Run the test
model = test_lstm()

5.2 A Practical Application: Text Classification

Code Example: Sentiment Analysis with an RNN

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import numpy as np

# A simple text dataset
class TextDataset(Dataset):
    def __init__(self, texts, labels, vocab, max_len=20):
        self.texts = texts
        self.labels = labels
        self.vocab = vocab
        self.max_len = max_len
    
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = self.texts[idx]
        label = self.labels[idx]
        
        # Convert text to token indices
        indices = [self.vocab.get(word, 1) for word in text.split()]  # 1 is the unknown-word index
        # Pad or truncate to max_len
        if len(indices) < self.max_len:
            indices += [0] * (self.max_len - len(indices))  # 0 is the padding index
        else:
            indices = indices[:self.max_len]
        
        return torch.tensor(indices), torch.tensor(label)

class TextRNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_size, output_size):
        super(TextRNN, self).__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx=0)
        self.rnn = nn.GRU(embedding_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)
    
    def forward(self, x):
        # x shape: (batch_size, seq_len)
        embedded = self.embedding(x)  # -> (batch_size, seq_len, embedding_dim)
        
        # RNN forward pass
        rnn_out, hidden = self.rnn(embedded)  # hidden shape: (1, batch_size, hidden_size)
        
        # Take the hidden state of the last time step
        out = hidden.squeeze(0)  # -> (batch_size, hidden_size)
        
        # Fully connected layer
        out = self.fc(out)
        
        return out

# Training function
def train_text_classifier():
    # Prepare the data
    texts = [
        "I love this movie", "This is amazing", "Great film", "Wonderful",
        "I hate this movie", "This is terrible", "Bad film", "Awful"
    ]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1: positive, 0: negative
    
    # Build the vocabulary
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for text in texts:
        for word in text.split():
            if word not in vocab:
                vocab[word] = len(vocab)
    
    # Create the dataset
    dataset = TextDataset(texts, labels, vocab, max_len=5)
    dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
    
    # Model hyperparameters
    vocab_size = len(vocab)
    embedding_dim = 16
    hidden_size = 32
    output_size = 2  # binary classification
    
    # Create the model
    model = TextRNN(vocab_size, embedding_dim, hidden_size, output_size)
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.01)
    
    # Train
    print("Training the text classifier...")
    for epoch in range(50):
        total_loss = 0
        for batch_texts, batch_labels in dataloader:
            optimizer.zero_grad()
            outputs = model(batch_texts)
            loss = criterion(outputs, batch_labels)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        
        if (epoch + 1) % 10 == 0:
            print(f"Epoch {epoch+1}, Loss: {total_loss/len(dataloader):.4f}")
    
    # Test
    print("\nTest predictions:")
    model.eval()
    with torch.no_grad():
        for text, label in zip(texts, labels):
            indices = [vocab.get(word, 1) for word in text.split()]
            indices = indices[:5] + [0] * (5 - len(indices))
            input_tensor = torch.tensor([indices])
            output = model(input_tensor)
            pred = torch.argmax(output, dim=1).item()
            print(f"文本: '{text}' | 预测: {'正面' if pred==1 else '负面'} | 真实: {'正面' if label==1 else '负面'}")

# Run the training
train_text_classifier()

6. Hands-On Project: An Image Classifier

6.1 The Full Project Workflow

We will build a complete image classification project on the CIFAR-10 dataset.

Project Steps

  1. Data loading and preprocessing
  2. Model construction
  3. Training and validation
  4. Model evaluation
  5. Prediction and visualization

Code Example: A Complete CIFAR-10 Classifier

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

class AdvancedCNN(nn.Module):
    """更复杂的CNN用于CIFAR-10"""
    def __init__(self, num_classes=10):
        super(AdvancedCNN, self).__init__()
        
        # Conv block 1
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.drop1 = nn.Dropout(0.25)
        
        # Conv block 2
        self.conv3 = nn.Conv2d(64, 128, 3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.conv4 = nn.Conv2d(128, 128, 3, padding=1)
        self.bn4 = nn.BatchNorm2d(128)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.drop2 = nn.Dropout(0.25)
        
        # Conv block 3
        self.conv5 = nn.Conv2d(128, 256, 3, padding=1)
        self.bn5 = nn.BatchNorm2d(256)
        self.pool3 = nn.MaxPool2d(2, 2)
        self.drop3 = nn.Dropout(0.25)
        
        # Fully connected layers
        self.fc1 = nn.Linear(256 * 4 * 4, 512)
        self.bn6 = nn.BatchNorm1d(512)
        self.drop4 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, num_classes)
    
    def forward(self, x):
        # Block 1
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = self.pool1(x)
        x = self.drop1(x)
        
        # Block 2
        x = F.relu(self.bn3(self.conv3(x)))
        x = F.relu(self.bn4(self.conv4(x)))
        x = self.pool2(x)
        x = self.drop2(x)
        
        # Block 3
        x = F.relu(self.bn5(self.conv5(x)))
        x = self.pool3(x)
        x = self.drop3(x)
        
        # Flatten and FC
        x = x.view(x.size(0), -1)
        x = F.relu(self.bn6(self.fc1(x)))
        x = self.drop4(x)
        x = self.fc2(x)
        
        return x

def prepare_cifar10_data():
    """Prepare the CIFAR-10 data"""
    # Data augmentation for training
    train_transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    
    test_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
    ])
    
    # Load the data
    train_dataset = torchvision.datasets.CIFAR10(
        root='./data', train=True, download=True, transform=train_transform
    )
    test_dataset = torchvision.datasets.CIFAR10(
        root='./data', train=False, download=True, transform=test_transform
    )
    
    train_loader = DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=2)
    test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False, num_workers=2)
    
    # Class names
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 
               'dog', 'frog', 'horse', 'ship', 'truck')
    
    return train_loader, test_loader, classes

def train_cifar10_model(model, train_loader, test_loader, epochs=50):
    """Train the CIFAR-10 model"""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = model.to(device)
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=5, verbose=True)
    
    train_losses = []
    train_accs = []
    test_accs = []
    best_acc = 0
    
    for epoch in range(epochs):
        # Training phase
        model.train()
        running_loss = 0.0
        correct = 0
        total = 0
        
        for batch_idx, (inputs, targets) in enumerate(train_loader):
            inputs, targets = inputs.to(device), targets.to(device)
            
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            
            running_loss += loss.item()
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            
            if batch_idx % 100 == 0:
                print(f'Epoch: {epoch+1}/{epochs} | Batch: {batch_idx}/{len(train_loader)} | Loss: {loss.item():.4f}')
        
        train_loss = running_loss / len(train_loader)
        train_acc = 100. * correct / total
        
        # Evaluation phase
        test_acc = test_cifar10_model(model, test_loader, device)
        
        # Record history
        train_losses.append(train_loss)
        train_accs.append(train_acc)
        test_accs.append(test_acc)
        
        # Learning-rate scheduling
        scheduler.step(test_acc)
        
        # Save the best model
        if test_acc > best_acc:
            best_acc = test_acc
            torch.save(model.state_dict(), 'best_cifar10_model.pth')
        
        print(f'Epoch {epoch+1}: Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.2f}%, Test Acc: {test_acc:.2f}%, Best: {best_acc:.2f}%')
    
    return train_losses, train_accs, test_accs, best_acc

def test_cifar10_model(model, test_loader, device):
    """Evaluate the CIFAR-10 model"""
    model.eval()
    correct = 0
    total = 0
    all_preds = []
    all_targets = []
    
    with torch.no_grad():
        for inputs, targets in test_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            outputs = model(inputs)
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
            
            all_preds.extend(predicted.cpu().numpy())
            all_targets.extend(targets.cpu().numpy())
    
    accuracy = 100. * correct / total
    return accuracy

def visualize_cifar10_results(train_losses, train_accs, test_accs, classes):
    """Visualize the CIFAR-10 training results"""
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    
    # Loss curve
    axes[0, 0].plot(train_losses, 'b-', linewidth=2)
    axes[0, 0].set_title('Training Loss')
    axes[0, 0].set_xlabel('Epoch')
    axes[0, 0].set_ylabel('Loss')
    axes[0, 0].grid(True, alpha=0.3)
    
    # Accuracy curves
    axes[0, 1].plot(train_accs, 'b-', label='Train Accuracy', linewidth=2)
    axes[0, 1].plot(test_accs, 'r-', label='Test Accuracy', linewidth=2)
    axes[0, 1].set_title('Accuracy')
    axes[0, 1].set_xlabel('Epoch')
    axes[0, 1].set_ylabel('Accuracy (%)')
    axes[0, 1].legend()
    axes[0, 1].grid(True, alpha=0.3)
    
    # Final accuracy comparison
    axes[1, 0].bar(['Train', 'Test'], [train_accs[-1], test_accs[-1]], color=['blue', 'red'])
    axes[1, 0].set_title('Final Accuracy')
    axes[1, 0].set_ylabel('Accuracy (%)')
    axes[1, 0].set_ylim([0, 100])
    for i, v in enumerate([train_accs[-1], test_accs[-1]]):
        axes[1, 0].text(i, v + 1, f'{v:.2f}%', ha='center', va='bottom')
    
    # Learning curve (loss vs. accuracy)
    axes[1, 1].scatter(train_losses, train_accs, c='blue', alpha=0.6, label='Train')
    axes[1, 1].scatter(train_losses, test_accs, c='red', alpha=0.6, label='Test')
    axes[1, 1].set_title('Loss vs Accuracy')
    axes[1, 1].set_xlabel('Loss')
    axes[1, 1].set_ylabel('Accuracy (%)')
    axes[1, 1].legend()
    axes[1, 1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

def predict_and_visualize(model, test_loader, classes, num_samples=10):
    """Run predictions and visualize the results"""
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.eval()
    
    # Grab a batch of test data
    data_iter = iter(test_loader)
    images, labels = next(data_iter)
    
    # Predict the first num_samples samples
    with torch.no_grad():
        images = images[:num_samples].to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
    
    # Visualize
    fig, axes = plt.subplots(2, 5, figsize=(15, 6))
    axes = axes.flatten()
    
    for i in range(num_samples):
        # Undo the normalization for display
        img = images[i].cpu().permute(1, 2, 0).numpy()
        img = img * 0.5 + 0.5
        
        axes[i].imshow(img)
        true_label = classes[labels[i]]
        pred_label = classes[predicted[i]]
        
        color = 'green' if true_label == pred_label else 'red'
        axes[i].set_title(f'True: {true_label}\nPred: {pred_label}', color=color)
        axes[i].axis('off')
    
    plt.tight_layout()
    plt.show()

# Main program
def main_cifar10():
    print("="*60)
    print("CIFAR-10 Image Classifier: Complete Project")
    print("="*60)
    
    # 1. Prepare the data
    print("\n1. Preparing the CIFAR-10 dataset...")
    train_loader, test_loader, classes = prepare_cifar10_data()
    
    # 2. Create the model
    print("\n2. Creating the model...")
    model = AdvancedCNN(num_classes=10)
    print(f"Number of model parameters: {sum(p.numel() for p in model.parameters()):,}")
    
    # 3. Train the model
    print("\n3. Starting training...")
    train_losses, train_accs, test_accs, best_acc = train_cifar10_model(
        model, train_loader, test_loader, epochs=50
    )
    
    # 4. Visualize the results
    print("\n4. Visualizing results...")
    visualize_cifar10_results(train_losses, train_accs, test_accs, classes)
    
    # 5. Sample predictions
    print("\n5. Sample predictions...")
    predict_and_visualize(model, test_loader, classes, num_samples=10)
    
    print(f"\nBest test accuracy: {best_acc:.2f}%")
    print("Project complete!")

# Run the main program (uncomment to run)
# main_cifar10()

7. Model Optimization and Debugging Tips

7.1 Overfitting and Underfitting

How to Recognize Them

  • Overfitting: high training accuracy, low validation accuracy
  • Underfitting: both training and validation accuracy are low

Remedies

  1. Regularization techniques

    • L1/L2 regularization
    • Dropout
    • Data augmentation
  2. Early stopping

    • Monitor the validation loss and stop training when it stops improving
  3. Batch normalization

    • Speeds up training and can reduce overfitting

Code Example: Comparing Regularization Techniques

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import matplotlib.pyplot as plt
import numpy as np

def create_overfit_data():
    """Create a small dataset that is easy to overfit"""
    # Generate training data
    x_train = torch.randn(100, 10)  # 100 samples, 10 features
    y_train = torch.sum(x_train * torch.tensor([1.0, 2.0, -1.5, 0.5, -0.8, 1.2, -2.0, 0.3, 1.5, -1.0]), dim=1)
    y_train += torch.randn(100) * 0.1  # add noise
    
    # Generate validation data
    x_val = torch.randn(50, 10)
    y_val = torch.sum(x_val * torch.tensor([1.0, 2.0, -1.5, 0.5, -0.8, 1.2, -2.0, 0.3, 1.5, -1.0]), dim=1)
    y_val += torch.randn(50) * 0.1
    
    train_dataset = TensorDataset(x_train, y_train)
    val_dataset = TensorDataset(x_val, y_val)
    
    train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False)
    
    return train_loader, val_loader

class OverfitNet(nn.Module):
    """A network that overfits easily"""
    def __init__(self, use_regularization=False):
        super(OverfitNet, self).__init__()
        self.fc1 = nn.Linear(10, 256)
        self.fc2 = nn.Linear(256, 256)
        self.fc3 = nn.Linear(256, 1)
        
        self.use_regularization = use_regularization
        self.dropout = nn.Dropout(0.5)
        self.bn1 = nn.BatchNorm1d(256)
        self.bn2 = nn.BatchNorm1d(256)
    
    def forward(self, x):
        x = torch.relu(self.fc1(x))
        if self.use_regularization:
            x = self.bn1(x)
            x = self.dropout(x)
        
        x = torch.relu(self.fc2(x))
        if self.use_regularization:
            x = self.bn2(x)
            x = self.dropout(x)
        
        x = self.fc3(x)
        return x

def train_with_regularization(use_reg=False):
    """Train the model, optionally with regularization"""
    train_loader, val_loader = create_overfit_data()
    model = OverfitNet(use_regularization=use_reg)
    
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    
    train_losses = []
    val_losses = []
    
    for epoch in range(500):
        # Train
        model.train()
        train_loss = 0
        for x, y in train_loader:
            optimizer.zero_grad()
            output = model(x)
            loss = criterion(output, y.unsqueeze(1))
            
            # Add an L2 penalty on the parameters
            if use_reg:
                l2_reg = 0
                for param in model.parameters():
                    l2_reg += torch.norm(param)
                loss += 0.01 * l2_reg
            
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
        
        # Validate
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for x, y in val_loader:
                output = model(x)
                loss = criterion(output, y.unsqueeze(1))
                val_loss += loss.item()
        
        train_losses.append(train_loss / len(train_loader))
        val_losses.append(val_loss / len(val_loader))
        
        if epoch % 50 == 0:
            print(f"Epoch {epoch}: Train Loss: {train_losses[-1]:.4f}, Val Loss: {val_losses[-1]:.4f}")
    
    return train_losses, val_losses

def compare_regularization():
    """Compare training with and without regularization"""
    print("Training the model without regularization...")
    train_loss_no_reg, val_loss_no_reg = train_with_regularization(use_reg=False)
    
    print("\nTraining the model with regularization...")
    train_loss_reg, val_loss_reg = train_with_regularization(use_reg=True)
    
    # Visualize
    fig, axes = plt.subplots(1, 2, figsize=(15, 5))
    
    # Without regularization
    axes[0].plot(train_loss_no_reg, 'b-', label='Train Loss')
    axes[0].plot(val_loss_no_reg, 'r-', label='Val Loss')
    axes[0].set_title('Without Regularization')
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Loss')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # With regularization
    axes[1].plot(train_loss_reg, 'b-', label='Train Loss')
    axes[1].plot(val_loss_reg, 'r-', label='Val Loss')
    axes[1].set_title('With Regularization')
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Loss')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()
    
    # Print the final results
    print(f"\nFinal validation loss comparison:")
    print(f"Without regularization: {val_loss_no_reg[-1]:.4f}")
    print(f"With regularization: {val_loss_reg[-1]:.4f}")

# Run the comparison (uncomment to run)
# compare_regularization()

7.2 Learning-Rate Scheduling

Code Example: Learning-Rate Schedulers

import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt

def compare_lr_schedulers():
    """Compare different learning-rate schedulers"""
    # Each scheduler gets its own model and optimizer; if they all shared one
    # optimizer, every scheduler would modify the same learning rate.
    def make_optimizer():
        return optim.SGD(nn.Linear(10, 1).parameters(), lr=0.1)
    
    optimizers = {
        'StepLR': make_optimizer(),
        'ExponentialLR': make_optimizer(),
        'CosineAnnealingLR': make_optimizer(),
        'ReduceLROnPlateau': make_optimizer(),
    }
    
    # Define the schedulers
    schedulers = {
        'StepLR': optim.lr_scheduler.StepLR(optimizers['StepLR'], step_size=30, gamma=0.1),
        'ExponentialLR': optim.lr_scheduler.ExponentialLR(optimizers['ExponentialLR'], gamma=0.95),
        'CosineAnnealingLR': optim.lr_scheduler.CosineAnnealingLR(optimizers['CosineAnnealingLR'], T_max=100),
        'ReduceLROnPlateau': optim.lr_scheduler.ReduceLROnPlateau(optimizers['ReduceLROnPlateau'], mode='min', factor=0.5, patience=10)
    }
    
    # Record the learning rates
    lr_history = {name: [] for name in schedulers}
    
    # Simulate a training loop
    for epoch in range(100):
        for name, scheduler in schedulers.items():
            if name == 'ReduceLROnPlateau':
                # Simulated loss value
                loss = 1.0 / (epoch + 1) + 0.1 * (0.9 ** epoch)
                scheduler.step(loss)
            else:
                scheduler.step()
            
            lr_history[name].append(optimizers[name].param_groups[0]['lr'])
    
    # Visualize
    plt.figure(figsize=(12, 8))
    for name, lrs in lr_history.items():
        plt.plot(lrs, label=name, linewidth=2)
    
    plt.xlabel('Epoch')
    plt.ylabel('Learning Rate')
    plt.title('Learning Rate Schedulers Comparison')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.yscale('log')
    plt.show()

# Run the comparison (uncomment to run)
# compare_lr_schedulers()

8. Model Evaluation and Deployment

8.1 Evaluation Metrics

Code Example: A Complete Set of Evaluation Metrics

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def comprehensive_evaluation(y_true, y_pred, y_pred_proba=None, class_names=None):
    """Comprehensive evaluation of a classifier"""
    if class_names is None:
        class_names = [f"Class_{i}" for i in range(len(np.unique(y_true)))]
    
    # Basic metrics
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred, average='weighted', zero_division=0)
    recall = recall_score(y_true, y_pred, average='weighted', zero_division=0)
    f1 = f1_score(y_true, y_pred, average='weighted', zero_division=0)
    
    print("="*60)
    print("模型综合评估报告")
    print("="*60)
    print(f"准确率 (Accuracy): {accuracy:.4f}")
    print(f"精确率 (Precision): {precision:.4f}")
    print(f"召回率 (Recall): {recall:.4f}")
    print(f"F1分数 (F1 Score): {f1:.4f}")
    print("\n详细分类报告:")
    print(classification_report(y_true, y_pred, target_names=class_names, zero_division=0))
    
    # 混淆矩阵
    cm = confusion_matrix(y_true, y_pred)
    print("\n混淆矩阵:")
    print(pd.DataFrame(cm, index=class_names, columns=class_names))
    
    # Visualize the confusion matrix
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                xticklabels=class_names, yticklabels=class_names)
    plt.title('Confusion Matrix')
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    plt.tight_layout()
    plt.show()
    
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1,
        'confusion_matrix': cm
    }

# Example usage
def demo_evaluation():
    """Demonstrate the evaluation function"""
    # Simulate prediction results
    np.random.seed(42)
    y_true = np.random.randint(0, 3, 100)  # a 3-class problem
    y_pred = y_true.copy()
    # Introduce some errors
    error_indices = np.random.choice(100, 20, replace=False)
    y_pred[error_indices] = np.random.randint(0, 3, 20)
    
    class_names = ['Cat', 'Dog', 'Bird']
    metrics = comprehensive_evaluation(y_true, y_pred, class_names=class_names)
    
    return metrics

# Run the demo
# demo_evaluation()

8.2 Saving and Loading Models

Code Example: Model Persistence

import torch
import torch.nn as nn
import torch.optim as optim
import os

def save_load_model_demo():
    """Demonstrate saving and loading a model"""
    # Create a simple model
    model = nn.Sequential(
        nn.Linear(10, 20),
        nn.ReLU(),
        nn.Linear(20, 5)
    )
    
    # Simulate one training step
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    x = torch.randn(32, 10)
    y = torch.randint(0, 5, (32,))
    
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    
    print("原始模型输出:", model(x[:2]))
    
    # 保存模型
    save_path = 'demo_model.pth'
    torch.save({
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'loss': loss.item(),
        'epoch': 1
    }, save_path)
    print(f"模型已保存到: {save_path}")
    
    # 加载模型
    checkpoint = torch.load(save_path)
    new_model = nn.Sequential(
        nn.Linear(10, 20),
        nn.ReLU(),
        nn.Linear(20, 5)
    )
    new_model.load_state_dict(checkpoint['model_state_dict'])
    
    print("加载模型输出:", new_model(x[:2]))
    print("模型加载成功!")
    
    # 清理
    if os.path.exists(save_path):
        os.remove(save_path)

# Run the demo
# save_load_model_demo()

9. Deep Learning Best Practices

9.1 Training Tips Summary

  1. Data preprocessing

    • Normalization/standardization (see the sketch after this list)
    • Data augmentation
    • Handling imbalanced data
  2. Model design

    • Start simple
    • Increase complexity gradually
    • Use appropriate activation functions
  3. Training strategy

    • A suitable learning rate
    • Learning-rate scheduling
    • Early stopping
    • Regularization
  4. Debugging tips

    • Monitor the training/validation loss
    • Visualize predictions
    • Analyze misclassified samples
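
For the preprocessing item above, a common step is standardizing each feature to zero mean and unit variance using statistics computed on the training set. A minimal NumPy sketch (random data as a stand-in for real features):

import numpy as np

X = np.random.randn(100, 5) * 3.0 + 2.0  # stand-in feature matrix

# Standardize each feature with training-set statistics
mean, std = X.mean(axis=0), X.std(axis=0)
X_standardized = (X - mean) / (std + 1e-8)

print(X_standardized.mean(axis=0).round(6))  # ~0 for every feature
print(X_standardized.std(axis=0).round(6))   # ~1 for every feature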

9.2 Common Problems and Solutions

Problem 1: Unstable Training

Solutions

  • Check the data preprocessing
  • Lower the learning rate
  • Use gradient clipping (see the sketch below)
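
Gradient clipping rescales the gradients before the optimizer step so their norm cannot explode. A minimal PyTorch sketch (the model, data, and max_norm value are placeholders for illustration):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

x, y = torch.randn(32, 10), torch.randn(32, 1)  # placeholder batch

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
# Rescale gradients so their global L2 norm is at most 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()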

Problem 2: The Model Does Not Converge

Solutions

  • Check the loss function
  • Train for more epochs
  • Adjust the network architecture

Problem 3: Overfitting

Solutions

  • Add more regularization
  • Use more data
  • Reduce model complexity

10. Summary and Where to Go Next

10.1 Key Takeaways from This Guide

  1. Fundamentals: network structure, activation functions, loss functions
  2. Training: gradient descent, backpropagation, optimization algorithms
  3. Frameworks: PyTorch and TensorFlow/Keras basics
  4. Architectures: CNN, RNN, LSTM, ResNet
  5. Projects: complete MNIST and CIFAR-10 classifiers
  6. Optimization: regularization, learning-rate scheduling, model evaluation
  7. Deployment: saving and loading models, evaluation metrics

10.2 Paths for Further Study

Theory

  • Mathematical foundations of deep learning (linear algebra, probability theory)
  • Optimization theory
  • Basics of information theory

Architectures

  • The Transformer architecture
  • GANs (Generative Adversarial Networks)
  • Reinforcement learning
  • Graph neural networks

Engineering

  • Distributed training
  • Model compression and quantization
  • Model deployment (ONNX, TensorRT)
  • MLOps practices

Application Areas

  • Computer vision (object detection, image segmentation)
  • Natural language processing (text generation, machine translation)
  • Speech recognition and synthesis
  • Recommender systems

10.3 Recommended Learning Resources

  1. Online courses

    • Andrew Ng's Deep Learning Specialization
    • The fast.ai practical courses
    • MIT's deep learning courses
  2. Books

    • Deep Learning (the "flower book" by Goodfellow et al.)
    • Dive into Deep Learning
    • Neural Networks and Deep Learning
  3. Papers

    • Classics: AlexNet, VGG, ResNet, Transformer
    • Recent work: follow arXiv and the top conferences (NeurIPS, ICML, CVPR)
  4. Open-source projects

    • The official PyTorch tutorials
    • The official TensorFlow examples
    • Hugging Face Transformers

10.4 Closing Remarks

Deep learning is a fast-moving field, and mastering the fundamentals and practical skills is the first step. This guide started from scratch and systematically covered the core concepts and hands-on techniques of neural networks. Remember: combining theory with hands-on practice is the most effective way to learn. Readers are encouraged to:

  1. Practice hands-on: run all the code examples and experiment with the parameters
  2. Be project-driven: pick a real problem you care about and solve it with deep learning
  3. Keep learning: follow new research and keep your knowledge up to date
  4. Join the community: share and learn on platforms like GitHub and Kaggle

Good luck on your deep learning journey!