Introduction
Rotated object detection is an important branch of computer vision that focuses on detecting targets at arbitrary orientations in an image, not just horizontal or vertical ones. It has broad application value in remote-sensing image analysis, autonomous driving, industrial inspection, and drone surveillance. Compared with conventional horizontal-box detection, rotated object detection must handle uncertainty in target orientation, which raises additional challenges in both algorithm design and engineering implementation.
This article is a practical optimization guide to rotated object detection, covering algorithmic principles, mainstream methods, optimization strategies, and engineering practice, so that readers can master the technique from theory to deployment.
1. Fundamentals of Rotated Object Detection
1.1 What Is Rotated Object Detection?
Rotated object detection locates and recognizes targets with arbitrary rotation angles in an image. Unlike a horizontal bounding box (HBB), a rotated detector typically represents each target with an oriented bounding box (OBB), usually defined as (x, y, w, h, θ), where:
- (x, y): center coordinates of the box
- w: width of the box
- h: height of the box
- θ: rotation angle of the box (in radians or degrees; the range is commonly [-π/2, π/2) or [0, π))
1.2 Application Scenarios
- Remote-sensing image analysis: detecting aircraft, ships, vehicles, and other targets that can appear at any orientation in overhead imagery.
- Autonomous driving: detecting vehicles, pedestrians, and traffic signs whose apparent orientation changes with the viewpoint.
- Industrial inspection: detecting parts on a production line that are placed at varying angles and therefore require rotated-box annotations.
- Drone surveillance: detecting moving ground targets such as pedestrians and vehicles, whose orientations vary widely from a drone's viewpoint.
1.3 Challenges
- Orientation sensitivity: changes in target orientation affect feature extraction and localization accuracy.
- Boundary ambiguity: a rotated box may overlap image borders or neighboring targets, complicating both annotation and detection.
- Computational complexity: representing and computing with rotated boxes is more expensive than with horizontal boxes, increasing the model's computational load.
- Data scarcity: datasets with rotated-box annotations are relatively rare, and annotation is costly.
2. Algorithmic Principles of Rotated Object Detection
2.1 Rotated-Box Representations
Several representations are in common use:
- Five-parameter representation: (x, y, w, h, θ). This is the most common form, but it suffers from angle periodicity (e.g., θ and θ + π describe the same box).
- Eight-parameter representation: the four corner points (x1, y1, x2, y2, x3, y3, x4, y4). It avoids angle periodicity at the cost of more parameters.
- Center-plus-corner representation: combines the center point with one corner to reduce the parameter count.
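To make the two main representations concrete, the following sketch converts a five-parameter box to its four corner points and back. The helper names are illustrative; the math only assumes the (x, y, w, h, θ) convention defined above, with θ measured from the x-axis.
import numpy as np

def obb_to_corners(box):
    """Five-parameter (x, y, w, h, theta) -> four corner points, shape (4, 2)."""
    x, y, w, h, theta = box
    c, s = np.cos(theta), np.sin(theta)
    # Half-extent vectors along the box's width and height directions
    wx, wy = w / 2 * c, w / 2 * s
    hx, hy = -h / 2 * s, h / 2 * c
    return np.array([
        [x - wx - hx, y - wy - hy],
        [x + wx - hx, y + wy - hy],
        [x + wx + hx, y + wy + hy],
        [x - wx + hx, y - wy + hy],
    ])

def corners_to_obb(corners):
    """Four corner points (4, 2) -> five-parameter box, angle normalized to [0, pi)."""
    center = corners.mean(axis=0)
    w = np.linalg.norm(corners[1] - corners[0])
    h = np.linalg.norm(corners[3] - corners[0])
    edge = corners[1] - corners[0]
    theta = np.arctan2(edge[1], edge[0]) % np.pi  # resolves the theta / theta + pi ambiguity
    return (center[0], center[1], w, h, theta)

box = (100.0, 100.0, 60.0, 30.0, np.pi / 6)
corners = obb_to_corners(box)
print(corners_to_obb(corners))  # recovers the original box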
2.2 Mainstream Approaches
2.2.1 Methods That Extend Horizontal-Box Detectors
These methods build on a horizontal-box detector and add an angle-prediction branch to enable rotated detection.
Example: RRPN (Rotation Region Proposal Network)
RRPN extends Faster R-CNN by introducing multi-scale, multi-angle anchors into the RPN to generate rotated region proposals.
# Pseudocode sketch: anchor generation in the spirit of RRPN
import numpy as np

def generate_rotated_anchors(image_size, scales, ratios, angles):
    """
    Generate rotated anchors.
    :param image_size: image size (H, W)
    :param scales: list of anchor scales
    :param ratios: list of anchor aspect ratios
    :param angles: list of anchor angles (radians)
    :return: list of anchors, each as (x, y, w, h, theta)
    """
    H, W = image_size
    anchors = []
    for scale in scales:
        for ratio in ratios:
            w = scale * np.sqrt(ratio)
            h = scale / np.sqrt(ratio)
            for angle in angles:
                # For simplicity, place all anchors at the image center.
                # In practice anchors are tiled over feature-map locations.
                x_center = W / 2
                y_center = H / 2
                anchors.append((x_center, y_center, w, h, angle))
    return anchors

# Example parameters
scales = [32, 64, 128]
ratios = [0.5, 1.0, 2.0]
angles = np.linspace(-np.pi/2, np.pi/2, 9)  # 9 angles
image_size = (512, 512)
anchors = generate_rotated_anchors(image_size, scales, ratios, angles)
print(f"Generated {len(anchors)} rotated anchors")
2.2.2 Methods That Regress the Rotated Box Directly
These methods regress the rotated-box parameters directly, typically via angle regression or corner regression.
Example: R2CNN (Rotated Region-based CNN)
R2CNN adds an angle-regression branch on top of Faster R-CNN. Its loss combines a classification loss, a horizontal-box regression loss, and an angle regression loss.
import numpy as np
import torch
import torch.nn as nn

class R2CNNLoss(nn.Module):
    def __init__(self, alpha=1.0, beta=1.0, gamma=1.0):
        super(R2CNNLoss, self).__init__()
        self.alpha = alpha  # weight of the classification loss
        self.beta = beta    # weight of the horizontal-box regression loss
        self.gamma = gamma  # weight of the angle regression loss
        self.cls_loss = nn.CrossEntropyLoss()
        self.reg_loss = nn.SmoothL1Loss()

    def forward(self, cls_pred, cls_target, bbox_pred, bbox_target, angle_pred, angle_target):
        # Classification loss
        cls_loss = self.cls_loss(cls_pred, cls_target)
        # Horizontal-box regression loss (bbox_pred and bbox_target are (x, y, w, h))
        bbox_loss = self.reg_loss(bbox_pred, bbox_target)
        # Angle regression loss with periodicity handling: wrap the difference
        # into one period, take the smaller arc, and apply Smooth L1 to the
        # wrapped difference against zero
        angle_diff = torch.abs(angle_pred - angle_target) % np.pi
        angle_diff = torch.min(angle_diff, np.pi - angle_diff)
        angle_loss = self.reg_loss(angle_diff, torch.zeros_like(angle_diff))
        total_loss = self.alpha * cls_loss + self.beta * bbox_loss + self.gamma * angle_loss
        return total_loss, cls_loss, bbox_loss, angle_loss

# Example usage
loss_fn = R2CNNLoss(alpha=1.0, beta=1.0, gamma=1.0)
cls_pred = torch.randn(32, 10)  # 32 samples, 10 classes
cls_target = torch.randint(0, 10, (32,))
bbox_pred = torch.randn(32, 4)
bbox_target = torch.randn(32, 4)
angle_pred = torch.randn(32)
angle_target = torch.randn(32)
total_loss, cls_loss, bbox_loss, angle_loss = loss_fn(cls_pred, cls_target, bbox_pred, bbox_target, angle_pred, angle_target)
print(f"Total Loss: {total_loss.item():.4f}, Cls Loss: {cls_loss.item():.4f}, BBox Loss: {bbox_loss.item():.4f}, Angle Loss: {angle_loss.item():.4f}")
2.2.3 Angle-Classification Methods
These methods discretize the angle into bins and treat angle prediction as a classification problem, which sidesteps the periodicity issue of direct regression.
Example: CSL (Circular Smooth Label)
CSL discretizes the angle into a number of bins and uses circular smooth labels to handle angle periodicity.
import numpy as np
import torch
import torch.nn as nn

class CSLLoss(nn.Module):
    def __init__(self, num_bins=180, sigma=1.0, angle_range=np.pi):
        """
        CSL loss.
        :param num_bins: number of discrete angle bins
        :param sigma: standard deviation of the Gaussian smoothing window
        :param angle_range: period of the angle (pi for the [0, pi) convention)
        """
        super(CSLLoss, self).__init__()
        self.num_bins = num_bins
        self.sigma = sigma
        self.angle_range = angle_range
        self.bce_loss = nn.BCELoss()

    def forward(self, angle_pred, angle_target):
        """
        angle_pred: predicted angle distribution, shape (N, num_bins), values in (0, 1)
        angle_target: ground-truth angles in radians, shape (N,)
        """
        # Map each ground-truth angle to its bin index within one period
        target_idx = ((angle_target % self.angle_range) / self.angle_range * self.num_bins).long()
        target_idx = target_idx % self.num_bins
        # Build circular smooth labels: a Gaussian window centered on the
        # target bin, with distance measured around the circle
        bins = torch.arange(self.num_bins, device=angle_pred.device).unsqueeze(0)  # (1, num_bins)
        dist = torch.abs(bins - target_idx.unsqueeze(1))
        dist = torch.min(dist, self.num_bins - dist)  # circular distance
        circular_labels = torch.exp(-dist.float() ** 2 / (2 * self.sigma ** 2))
        # Binary cross-entropy between the predicted distribution and the smooth labels
        return self.bce_loss(angle_pred, circular_labels)

# Example usage
loss_fn = CSLLoss(num_bins=180, sigma=1.0)
angle_pred = torch.randn(32, 180).sigmoid()  # simulated predicted distribution
angle_target = torch.rand(32) * np.pi        # ground-truth angles in [0, pi)
loss = loss_fn(angle_pred, angle_target)
print(f"CSL Loss: {loss.item():.4f}")
2.2.4 Center-Point-Based Methods
These methods cast rotated detection as a center-point detection problem, e.g., a rotated variant of CenterNet.
Example: CenterNet for Rotated Objects
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotatedCenterNet(nn.Module):
    def __init__(self, num_classes, max_objs=128):
        super(RotatedCenterNet, self).__init__()
        self.num_classes = num_classes
        self.max_objs = max_objs
        # Stand-in backbone (a real model would use e.g. a ResNet); total stride is 4
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1)
        )
        # Center-point heatmap head
        self.center_head = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1)
        )
        # Size head (w, h)
        self.size_head = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, 1)
        )
        # Angle head
        self.angle_head = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1)
        )

    def forward(self, x):
        features = self.backbone(x)
        center_heatmap = self.center_head(features)
        size_pred = self.size_head(features)
        angle_pred = self.angle_head(features)
        return center_heatmap, size_pred, angle_pred

class FocalLoss(nn.Module):
    def __init__(self, alpha=0.25, gamma=2.0):
        super(FocalLoss, self).__init__()
        self.alpha = alpha
        self.gamma = gamma

    def forward(self, pred, target):
        # pred holds logits; target holds per-pixel probabilities in [0, 1]
        bce_loss = F.binary_cross_entropy_with_logits(pred, target, reduction='none')
        pt = torch.exp(-bce_loss)
        focal_loss = self.alpha * (1 - pt) ** self.gamma * bce_loss
        return focal_loss.mean()

class RotatedCenterNetLoss(nn.Module):
    def __init__(self, alpha=1.0, beta=1.0, gamma=1.0):
        super(RotatedCenterNetLoss, self).__init__()
        self.alpha = alpha
        self.beta = beta
        self.gamma = gamma
        self.mse_loss = nn.MSELoss()
        self.focal_loss = FocalLoss()

    def forward(self, center_pred, center_target, size_pred, size_target, angle_pred, angle_target):
        # Center heatmap loss (focal loss)
        center_loss = self.focal_loss(center_pred, center_target)
        # Size regression loss
        size_loss = self.mse_loss(size_pred, size_target)
        # Angle regression loss
        angle_loss = self.mse_loss(angle_pred, angle_target)
        total_loss = self.alpha * center_loss + self.beta * size_loss + self.gamma * angle_loss
        return total_loss, center_loss, size_loss, angle_loss

# Example usage
model = RotatedCenterNet(num_classes=10)
loss_fn = RotatedCenterNetLoss(alpha=1.0, beta=1.0, gamma=1.0)
# Simulated input and outputs
x = torch.randn(1, 3, 512, 512)
center_pred, size_pred, angle_pred = model(x)
# Simulated targets (feature map is 128x128 with stride 4)
center_target = torch.rand(1, 10, 128, 128)  # heatmap targets must lie in [0, 1]
size_target = torch.randn(1, 2, 128, 128)
angle_target = torch.randn(1, 1, 128, 128)
total_loss, center_loss, size_loss, angle_loss = loss_fn(center_pred, center_target, size_pred, size_target, angle_pred, angle_target)
print(f"Total Loss: {total_loss.item():.4f}, Center Loss: {center_loss.item():.4f}, Size Loss: {size_loss.item():.4f}, Angle Loss: {angle_loss.item():.4f}")
3. Optimization Strategies for Rotated Object Detection
3.1 Data Augmentation
Data augmentation is a key lever for improving rotated-detection performance, and rotated targets call for purpose-built augmentation strategies.
3.1.1 Rotation Augmentation
Rotate the image and its rotated boxes together to make the model robust to different orientations.
import cv2
import numpy as np

def rotate_image_and_bbox(image, bbox, angle):
    """
    Rotate an image together with a rotated box.
    :param image: input image (H, W, C)
    :param bbox: rotated box (x, y, w, h, theta)
    :param angle: rotation angle (radians)
    :return: rotated image and rotated box
    """
    h, w = image.shape[:2]
    center = (w // 2, h // 2)
    # Rotate the image (getRotationMatrix2D expects degrees)
    M = cv2.getRotationMatrix2D(center, angle * 180 / np.pi, 1.0)
    rotated_image = cv2.warpAffine(image, M, (w, h))
    # Convert the box to its four corner points
    x, y, w_box, h_box, theta = bbox
    cos_theta = np.cos(theta)
    sin_theta = np.sin(theta)
    corners = np.array([
        [x - w_box/2 * cos_theta + h_box/2 * sin_theta, y - w_box/2 * sin_theta - h_box/2 * cos_theta],
        [x + w_box/2 * cos_theta + h_box/2 * sin_theta, y + w_box/2 * sin_theta - h_box/2 * cos_theta],
        [x + w_box/2 * cos_theta - h_box/2 * sin_theta, y + w_box/2 * sin_theta + h_box/2 * cos_theta],
        [x - w_box/2 * cos_theta - h_box/2 * sin_theta, y - w_box/2 * sin_theta + h_box/2 * cos_theta]
    ])
    # Apply the same affine transform to the corners
    rotated_corners = np.dot(M, np.hstack([corners, np.ones((4, 1))]).T).T
    # Rebuild the box from the transformed corners; recomputing the angle from
    # an edge vector avoids sign-convention mistakes between image and box space
    new_center = rotated_corners.mean(axis=0)
    new_w = np.linalg.norm(rotated_corners[1] - rotated_corners[0])
    new_h = np.linalg.norm(rotated_corners[3] - rotated_corners[0])
    edge = rotated_corners[1] - rotated_corners[0]
    new_theta = np.arctan2(edge[1], edge[0]) % np.pi  # normalize to [0, pi)
    new_bbox = (new_center[0], new_center[1], new_w, new_h, new_theta)
    return rotated_image, new_bbox

# Example usage
image = np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)
bbox = (256, 256, 100, 50, np.pi/4)  # center (256, 256), width 100, height 50, angle 45 degrees
angle = np.pi/6  # rotate by 30 degrees
rotated_image, rotated_bbox = rotate_image_and_bbox(image, bbox, angle)
print(f"Original box: {bbox}")
print(f"Rotated box: {rotated_bbox}")
3.1.2 Random Cropping and Scaling
Random cropping and scaling improve the model's robustness to changes in target scale.
def random_crop_and_scale(image, bbox, crop_size, scale_range=(0.8, 1.2)):
    """
    Randomly scale and crop an image and its rotated box.
    :param image: input image
    :param bbox: rotated box
    :param crop_size: crop size
    :param scale_range: scaling range
    :return: cropped/scaled image and rotated box
    """
    h, w = image.shape[:2]
    scale = np.random.uniform(scale_range[0], scale_range[1])
    # Scale the image
    scaled_image = cv2.resize(image, (int(w * scale), int(h * scale)))
    # Scale the box
    x, y, w_box, h_box, theta = bbox
    scaled_bbox = (x * scale, y * scale, w_box * scale, h_box * scale, theta)
    # Random crop (np.random.randint needs a strictly positive upper bound)
    crop_x = np.random.randint(0, max(1, scaled_image.shape[1] - crop_size + 1))
    crop_y = np.random.randint(0, max(1, scaled_image.shape[0] - crop_size + 1))
    cropped_image = scaled_image[crop_y:crop_y + crop_size, crop_x:crop_x + crop_size]
    # Shift the box into crop coordinates; in a real pipeline, boxes whose
    # centers fall outside the crop should be filtered out
    new_x = scaled_bbox[0] - crop_x
    new_y = scaled_bbox[1] - crop_y
    new_bbox = (new_x, new_y, scaled_bbox[2], scaled_bbox[3], scaled_bbox[4])
    return cropped_image, new_bbox

# Example usage
image = np.random.randint(0, 255, (512, 512, 3), dtype=np.uint8)
bbox = (256, 256, 100, 50, np.pi/4)
crop_size = 416
cropped_image, cropped_bbox = random_crop_and_scale(image, bbox, crop_size)
print(f"Cropped image size: {cropped_image.shape}")
print(f"Cropped box: {cropped_bbox}")
3.1.3 Mixed Augmentation
Combining several augmentation methods, such as MixUp and Mosaic, can further improve performance. A Mosaic implementation follows; a MixUp sketch comes after it.
def mosaic_augmentation(images, bboxes_list, mosaic_size=640):
    """
    Mosaic augmentation: stitch 4 images into one large image.
    :param images: list of 4 images
    :param bboxes_list: list of rotated-box lists, one per image
    :param mosaic_size: size of the stitched image
    :return: stitched image and the corresponding rotated boxes
    """
    assert len(images) == 4, "Mosaic augmentation requires 4 images"
    # Blank canvas
    mosaic_image = np.zeros((mosaic_size, mosaic_size, 3), dtype=np.uint8)
    # Random stitching center
    center_x = np.random.randint(mosaic_size // 4, 3 * mosaic_size // 4)
    center_y = np.random.randint(mosaic_size // 4, 3 * mosaic_size // 4)
    new_bboxes = []
    for i, (img, bboxes) in enumerate(zip(images, bboxes_list)):
        h, w = img.shape[:2]
        # Origin of this image on the canvas (may fall partly off-canvas)
        if i == 0:    # top-left
            x_origin, y_origin = center_x - w, center_y - h
        elif i == 1:  # top-right
            x_origin, y_origin = center_x, center_y - h
        elif i == 2:  # bottom-left
            x_origin, y_origin = center_x - w, center_y
        else:         # bottom-right
            x_origin, y_origin = center_x, center_y
        # Visible region on the canvas
        x0, y0 = max(0, x_origin), max(0, y_origin)
        x1 = min(mosaic_size, x_origin + w)
        y1 = min(mosaic_size, y_origin + h)
        # Copy the matching source region (crop away whatever falls off the canvas)
        mosaic_image[y0:y1, x0:x1] = img[y0 - y_origin:y1 - y_origin, x0 - x_origin:x1 - x_origin]
        # Shift boxes into canvas coordinates; drop boxes whose centers left the canvas
        for bbox in bboxes:
            x, y, w_box, h_box, theta = bbox
            new_x, new_y = x + x_origin, y + y_origin
            if 0 <= new_x < mosaic_size and 0 <= new_y < mosaic_size:
                new_bboxes.append((new_x, new_y, w_box, h_box, theta))
    return mosaic_image, new_bboxes

# Example usage
images = [np.random.randint(0, 255, (320, 320, 3), dtype=np.uint8) for _ in range(4)]
bboxes_list = [
    [(160, 160, 50, 30, np.pi/6)],  # boxes of image 1
    [(160, 160, 40, 20, np.pi/3)],  # boxes of image 2
    [(160, 160, 60, 40, np.pi/4)],  # boxes of image 3
    [(160, 160, 45, 25, np.pi/2)],  # boxes of image 4
]
mosaic_image, mosaic_bboxes = mosaic_augmentation(images, bboxes_list)
print(f"Mosaic image size: {mosaic_image.shape}")
print(f"Number of mosaic boxes: {len(mosaic_bboxes)}")
3.2 Loss-Function Optimization
3.2.1 Improved Angle Losses
Angle periodicity is a core challenge in rotated detection. One improved loss is shown below:
Example: Smooth L1 loss with periodicity handling
import numpy as np
import torch
import torch.nn as nn

class PeriodicSmoothL1Loss(nn.Module):
    def __init__(self, beta=1.0, angle_range=np.pi):
        super(PeriodicSmoothL1Loss, self).__init__()
        self.beta = beta
        self.angle_range = angle_range

    def forward(self, pred, target):
        # Wrap the angle difference into one period, then take the smaller arc
        diff = torch.abs(pred - target) % self.angle_range
        diff = torch.min(diff, self.angle_range - diff)
        # Smooth L1 on the wrapped difference
        abs_diff = torch.abs(diff)
        loss = torch.where(abs_diff < self.beta, 0.5 * abs_diff**2 / self.beta, abs_diff - 0.5 * self.beta)
        return loss.mean()

# Example usage
loss_fn = PeriodicSmoothL1Loss(beta=1.0, angle_range=np.pi)
pred = torch.tensor([0.1, 1.5, 3.0])
target = torch.tensor([0.2, 1.4, 3.1])
loss = loss_fn(pred, target)
print(f"Periodic Smooth L1 Loss: {loss.item():.4f}")
3.2.2 Balancing Multi-Task Losses
Rotated detection usually combines classification, box regression, and angle regression, so the loss weights of these tasks need balancing.
Example: dynamic loss weighting
class DynamicLossWeighting(nn.Module):
    def __init__(self, num_tasks=3):
        super(DynamicLossWeighting, self).__init__()
        self.num_tasks = num_tasks
        # Learn one log-variance per task (homoscedastic-uncertainty weighting,
        # Kendall et al. 2018). Naively learning raw weights would let the
        # optimizer drive them toward zero, so the log-variance form is used.
        self.log_vars = nn.Parameter(torch.zeros(num_tasks, dtype=torch.float32))

    def forward(self, losses):
        """
        losses: list of per-task losses, of length num_tasks
        """
        total_loss = 0
        for i, loss in enumerate(losses):
            # exp(-s_i) scales the task loss; the + s_i term regularizes the weights
            total_loss = total_loss + torch.exp(-self.log_vars[i]) * loss + self.log_vars[i]
        return total_loss

# Example usage
loss_fn = DynamicLossWeighting(num_tasks=3)
cls_loss = torch.tensor(0.5)
bbox_loss = torch.tensor(0.3)
angle_loss = torch.tensor(0.2)
total_loss = loss_fn([cls_loss, bbox_loss, angle_loss])
print(f"Total Loss with Dynamic Weighting: {total_loss.item():.4f}")
print(f"Learned log-variances: {loss_fn.log_vars.data}")
3.3 Model-Architecture Optimization
3.3.1 Improving the Feature Pyramid Network (FPN)
FPN is the standard multi-scale feature-fusion module in object detection. For rotated detection, FPN can be adapted to better handle targets of different scales.
Example: a rotation-aware FPN
import torch
import torch.nn as nn
import torch.nn.functional as F

class RotatedFPN(nn.Module):
    def __init__(self, in_channels_list, out_channels=256):
        super(RotatedFPN, self).__init__()
        self.in_channels_list = in_channels_list
        self.out_channels = out_channels
        # Lateral connections (1x1 convs to unify channel counts)
        self.lateral_convs = nn.ModuleList()
        for in_channels in in_channels_list:
            self.lateral_convs.append(nn.Conv2d(in_channels, out_channels, 1))
        # 3x3 convs to smooth the fused features
        self.smooth_convs = nn.ModuleList()
        for _ in range(len(in_channels_list)):
            self.smooth_convs.append(nn.Conv2d(out_channels, out_channels, 3, padding=1))
        # Rotation-aware module (e.g., direction attention)
        self.direction_attention = DirectionAttention(out_channels)

    def forward(self, features):
        """
        features: list of multi-scale backbone features, high to low resolution
        """
        # Lateral connections
        lateral_features = []
        for i, feature in enumerate(features):
            lateral_features.append(self.lateral_convs[i](feature))
        # Top-down fusion
        top_down_features = [lateral_features[-1]]
        for i in range(len(lateral_features) - 2, -1, -1):
            # Upsample the coarser level
            prev_feature = F.interpolate(top_down_features[-1], scale_factor=2, mode='nearest')
            # Fuse with the lateral feature
            fused_feature = lateral_features[i] + prev_feature
            top_down_features.append(fused_feature)
        # Reverse so features run from high to low resolution
        top_down_features = top_down_features[::-1]
        # Smoothing
        smooth_features = []
        for i, feature in enumerate(top_down_features):
            smooth_features.append(self.smooth_convs[i](feature))
        # Apply rotation-aware attention
        attention_features = []
        for feature in smooth_features:
            attention_features.append(self.direction_attention(feature))
        return attention_features

class DirectionAttention(nn.Module):
    def __init__(self, channels):
        super(DirectionAttention, self).__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Compute direction-attention weights
        attention = self.sigmoid(self.conv(x))
        # Apply the attention
        return x * attention

# Example usage: feature resolutions must halve from level to level
in_channels_list = [256, 512, 1024, 2048]
fpn = RotatedFPN(in_channels_list, out_channels=256)
sizes = [64, 32, 16, 8]
features = [torch.randn(1, c, s, s) for c, s in zip(in_channels_list, sizes)]
output_features = fpn(features)
print(f"Number of FPN output features: {len(output_features)}")
for i, feat in enumerate(output_features):
    print(f"Feature {i} shape: {feat.shape}")
3.3.2 Adding Attention Mechanisms
Attention helps the model focus on orientation-relevant regions.
Example: a rotation-aware attention module
class RotatedAttention(nn.Module):
    def __init__(self, in_channels, reduction=16):
        super(RotatedAttention, self).__init__()
        self.in_channels = in_channels
        self.reduction = reduction
        # Channel attention
        self.channel_attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, in_channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // reduction, in_channels, 1),
            nn.Sigmoid()
        )
        # Spatial attention (orientation-aware)
        self.spatial_attention = nn.Sequential(
            nn.Conv2d(2, 1, 3, padding=1),
            nn.Sigmoid()
        )

    def forward(self, x):
        # Channel attention
        channel_weights = self.channel_attention(x)
        channel_attended = x * channel_weights
        # Spatial attention driven by gradient magnitude and direction
        sobel_x = torch.tensor([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=torch.float32).view(1, 1, 3, 3).to(x.device)
        sobel_y = torch.tensor([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=torch.float32).view(1, 1, 3, 3).to(x.device)
        grad_x = F.conv2d(x.mean(dim=1, keepdim=True), sobel_x, padding=1)
        grad_y = F.conv2d(x.mean(dim=1, keepdim=True), sobel_y, padding=1)
        # Gradient magnitude and direction
        grad_mag = torch.sqrt(grad_x**2 + grad_y**2 + 1e-6)
        grad_dir = torch.atan2(grad_y, grad_x)
        # Build the spatial-attention input
        spatial_input = torch.cat([grad_mag, grad_dir], dim=1)
        spatial_weights = self.spatial_attention(spatial_input)
        spatial_attended = channel_attended * spatial_weights
        return spatial_attended

# Example usage
attention = RotatedAttention(256)
x = torch.randn(1, 256, 64, 64)
output = attention(x)
print(f"Attention output shape: {output.shape}")
4. Engineering Practice
4.1 Dataset Preparation and Annotation
4.1.1 Common Rotated-Detection Datasets
- DOTA (Dataset for Object deTection in Aerial images): a remote-sensing dataset with 15 categories and 2,806 images in v1.0, annotated with rotated boxes.
- HRSC2016 (High Resolution Ship Collection): a ship-detection dataset of 1,061 images with rotated-box annotations.
- ICDAR2015: a text-detection dataset with oriented (quadrilateral) annotations of text lines.
- UCAS-AOD: an aerial-image detection dataset covering aircraft and vehicles.
4.1.2 Annotation Tools
- roLabelImg: an open-source LabelImg fork that supports rotated-box annotation.
- CVAT: an online annotation platform that supports rotated-box annotation.
- Custom tools: develop an annotation tool tailored to your needs.
4.1.3 Data-Format Conversion
Different datasets and models may use different data formats, so conversion is often needed.
Example: converting DOTA format to COCO format
import json
import os
import cv2
import numpy as np

def dota_to_coco(dota_dir, output_json):
    """
    Convert a DOTA-style dataset to COCO format.
    :param dota_dir: DOTA dataset directory containing images/ and labels/ subdirectories
    :param output_json: path of the output COCO JSON file
    """
    coco_format = {
        "images": [],
        "annotations": [],
        "categories": []
    }
    # Category list (DOTA-v1.0 classes)
    categories = ["plane", "ship", "storage tank", "baseball diamond", "tennis court",
                  "basketball court", "ground track field", "harbor", "bridge", "large vehicle",
                  "small vehicle", "helicopter", "roundabout", "soccer ball field", "swimming pool"]
    for idx, category in enumerate(categories):
        coco_format["categories"].append({
            "id": idx + 1,
            "name": category,
            "supercategory": "object"
        })
    image_id = 1
    annotation_id = 1
    # Iterate over images and labels
    images_dir = os.path.join(dota_dir, "images")
    labels_dir = os.path.join(dota_dir, "labels")
    for image_file in os.listdir(images_dir):
        if not image_file.endswith(('.png', '.jpg', '.jpeg')):
            continue
        image_path = os.path.join(images_dir, image_file)
        label_path = os.path.join(labels_dir, os.path.splitext(image_file)[0] + '.txt')
        # Read the image size
        img = cv2.imread(image_path)
        h, w = img.shape[:2]
        # Add the image entry
        coco_format["images"].append({
            "id": image_id,
            "file_name": image_file,
            "width": w,
            "height": h
        })
        # Read the labels
        if os.path.exists(label_path):
            with open(label_path, 'r') as f:
                lines = f.readlines()
            for line in lines:
                parts = line.strip().split()
                if len(parts) < 10:
                    continue
                # Parse one box: x1, y1, x2, y2, x3, y3, x4, y4, category, difficult
                points = list(map(float, parts[:8]))
                category_name = parts[8]
                difficult = int(parts[9])  # parsed but not used here
                # Map the category name to its ID
                category_id = categories.index(category_name) + 1
                # Compute the box center, size, and angle from the corners
                points = np.array(points).reshape(4, 2)
                center = points.mean(axis=0)
                w_box = np.linalg.norm(points[1] - points[0])
                h_box = np.linalg.norm(points[3] - points[0])
                # Angle from the first edge vector
                vec = points[1] - points[0]
                angle = np.arctan2(vec[1], vec[0])
                # Store the five-parameter rotated box in the bbox field
                bbox = [center[0], center[1], w_box, h_box, angle]
                # Add the annotation entry
                coco_format["annotations"].append({
                    "id": annotation_id,
                    "image_id": image_id,
                    "category_id": category_id,
                    "bbox": bbox,
                    "area": w_box * h_box,
                    "iscrowd": 0,
                    "segmentation": []  # rotated boxes carry no segmentation
                })
                annotation_id += 1
        image_id += 1
    # Save the JSON file
    with open(output_json, 'w') as f:
        json.dump(coco_format, f, indent=2)
    print(f"Conversion done: {len(coco_format['images'])} images, {len(coco_format['annotations'])} annotations")

# Example usage (assuming the directory structure above)
# dota_dir = "path/to/DOTA"
# output_json = "dota_coco.json"
# dota_to_coco(dota_dir, output_json)
4.2 Model Training and Evaluation
4.2.1 Training Pipeline
- Data loading: use PyTorch or TensorFlow data loaders with support for custom augmentation.
- Model initialization: pick a suitable backbone (e.g., ResNet, EfficientNet) and detection head.
- Optimizer: Adam or SGD are typical choices.
- Learning-rate schedule: cosine annealing, StepLR, and similar schedulers.
- Training loop: iterate over the data, compute the loss, backpropagate, and update the parameters.
Example: a PyTorch training loop
import torch
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import transforms

# Assumes a custom dataset class RotatedDataset exists
from your_dataset import RotatedDataset

# Data augmentation
train_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    # Custom rotation augmentation is implemented inside the dataset class
])

# Dataset and data loader
train_dataset = RotatedDataset(data_dir='train_data', transform=train_transform)
train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)

# Model, loss, and optimizer
model = RotatedCenterNet(num_classes=10)  # model class from Section 2.2.4
loss_fn = RotatedCenterNetLoss(alpha=1.0, beta=1.0, gamma=1.0)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)

# Training loop
num_epochs = 100
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for batch_idx, (images, targets) in enumerate(train_loader):
        images = images.to(device)
        # Forward pass
        center_pred, size_pred, angle_pred = model(images)
        # Prepare the targets
        center_target = targets['center'].to(device)
        size_target = targets['size'].to(device)
        angle_target = targets['angle'].to(device)
        # Compute the loss
        total_loss_batch, center_loss, size_loss, angle_loss = loss_fn(
            center_pred, center_target, size_pred, size_target, angle_pred, angle_target
        )
        # Backward pass
        optimizer.zero_grad()
        total_loss_batch.backward()
        optimizer.step()
        total_loss += total_loss_batch.item()
        if batch_idx % 10 == 0:
            print(f"Epoch [{epoch+1}/{num_epochs}], Batch [{batch_idx}/{len(train_loader)}], "
                  f"Loss: {total_loss_batch.item():.4f}, "
                  f"Center Loss: {center_loss.item():.4f}, "
                  f"Size Loss: {size_loss.item():.4f}, "
                  f"Angle Loss: {angle_loss.item():.4f}")
    # Step the learning-rate scheduler
    scheduler.step()
    # Report the average loss
    avg_loss = total_loss / len(train_loader)
    print(f"Epoch [{epoch+1}/{num_epochs}] Average Loss: {avg_loss:.4f}")
    # Save a checkpoint every 10 epochs
    if (epoch + 1) % 10 == 0:
        torch.save(model.state_dict(), f"model_epoch_{epoch+1}.pth")
4.2.2 Evaluation Metrics
Rotated detection is usually evaluated with the IoU (Intersection over Union) of rotated boxes.
Example: computing rotated-box IoU
import numpy as np
from shapely.geometry import Polygon

def rotated_iou(box1, box2):
    """
    Compute the IoU of two rotated boxes.
    :param box1: (x1, y1, w1, h1, theta1)
    :param box2: (x2, y2, w2, h2, theta2)
    :return: IoU value
    """
    # Convert a rotated box to its four corner points
    def bbox_to_corners(bbox):
        x, y, w, h, theta = bbox
        cos_theta = np.cos(theta)
        sin_theta = np.sin(theta)
        corners = np.array([
            [x - w/2 * cos_theta + h/2 * sin_theta, y - w/2 * sin_theta - h/2 * cos_theta],
            [x + w/2 * cos_theta + h/2 * sin_theta, y + w/2 * sin_theta - h/2 * cos_theta],
            [x + w/2 * cos_theta - h/2 * sin_theta, y + w/2 * sin_theta + h/2 * cos_theta],
            [x - w/2 * cos_theta - h/2 * sin_theta, y - w/2 * sin_theta + h/2 * cos_theta]
        ])
        return corners

    corners1 = bbox_to_corners(box1)
    corners2 = bbox_to_corners(box2)
    # Intersection via polygon clipping (shapely)
    poly1 = Polygon(corners1)
    poly2 = Polygon(corners2)
    if not poly1.is_valid or not poly2.is_valid:
        return 0.0
    intersection = poly1.intersection(poly2).area
    union = poly1.area + poly2.area - intersection
    if union == 0:
        return 0.0
    return intersection / union

# Example usage
box1 = (100, 100, 50, 30, np.pi/6)
box2 = (110, 105, 55, 35, np.pi/4)
iou = rotated_iou(box1, box2)
print(f"Rotated-box IoU: {iou:.4f}")
Example: computing mAP (mean Average Precision)
def compute_map(predictions, ground_truths, iou_threshold=0.5):
    """
    Compute mAP (mean Average Precision).
    :param predictions: list of predictions, each as (image_id, class_id, confidence, bbox)
    :param ground_truths: list of ground truths, each as (image_id, class_id, bbox)
    :param iou_threshold: IoU threshold
    :return: mAP value
    """
    # Group by class
    classes = set([p[1] for p in predictions] + [gt[1] for gt in ground_truths])
    aps = []
    for class_id in classes:
        # Predictions and ground truths of the current class
        class_preds = [p for p in predictions if p[1] == class_id]
        class_gts = [gt for gt in ground_truths if gt[1] == class_id]
        if len(class_preds) == 0 or len(class_gts) == 0:
            aps.append(0.0)
            continue
        # Sort all predictions of this class by confidence (globally, not per image,
        # so that the precision-recall curve is built in the correct order)
        class_preds = sorted(class_preds, key=lambda x: x[2], reverse=True)
        # Ground truths grouped by image, with per-image matched flags
        gts_by_image = {}
        for gt in class_gts:
            gts_by_image.setdefault(gt[0], []).append(gt)
        matched = {image_id: set() for image_id in gts_by_image}
        # Walk predictions in confidence order, marking TP / FP
        tp_list = []
        fp_list = []
        for image_id, _, _, pred_bbox in class_preds:
            best_iou = 0
            best_gt_idx = -1
            for i, gt in enumerate(gts_by_image.get(image_id, [])):
                if i in matched.get(image_id, set()):
                    continue
                iou = rotated_iou(pred_bbox, gt[2])
                if iou > best_iou:
                    best_iou = iou
                    best_gt_idx = i
            if best_iou >= iou_threshold and best_gt_idx != -1:
                tp_list.append(1)
                fp_list.append(0)
                matched[image_id].add(best_gt_idx)
            else:
                tp_list.append(0)
                fp_list.append(1)
        # Precision and recall
        tp_cumsum = np.cumsum(tp_list)
        fp_cumsum = np.cumsum(fp_list)
        recalls = tp_cumsum / len(class_gts)
        precisions = tp_cumsum / (tp_cumsum + fp_cumsum + 1e-6)
        # AP via 11-point interpolation
        ap = 0
        for t in np.arange(0, 1.1, 0.1):
            if np.any(recalls >= t):
                ap += np.max(precisions[recalls >= t])
        ap /= 11
        aps.append(ap)
    # mAP is the mean over classes
    return np.mean(aps)

# Example usage
predictions = [
    (1, 1, 0.9, (100, 100, 50, 30, np.pi/6)),
    (1, 1, 0.8, (110, 105, 55, 35, np.pi/4)),
    (2, 1, 0.7, (200, 200, 60, 40, np.pi/3)),
]
ground_truths = [
    (1, 1, (105, 105, 52, 32, np.pi/6)),
    (2, 1, (205, 205, 58, 38, np.pi/3)),
]
mAP = compute_map(predictions, ground_truths, iou_threshold=0.5)
print(f"mAP@0.5: {mAP:.4f}")
4.3 Model Deployment and Optimization
4.3.1 Model Conversion and Quantization
Convert the trained model to a deployment-friendly format (e.g., ONNX, TensorRT) and quantize it to shrink the model and speed up inference. The ONNX export comes first; a quantization sketch follows it.
Example: exporting a PyTorch model to ONNX
import torch
import onnx
import onnxruntime as ort

# Load the trained model
model = RotatedCenterNet(num_classes=10)
model.load_state_dict(torch.load("model_epoch_100.pth"))
model.eval()

# Build a dummy input
dummy_input = torch.randn(1, 3, 512, 512)

# Export to ONNX
torch.onnx.export(
    model,
    dummy_input,
    "rotated_detection.onnx",
    export_params=True,
    opset_version=11,
    do_constant_folding=True,
    input_names=['input'],
    output_names=['center_heatmap', 'size_pred', 'angle_pred'],
    dynamic_axes={'input': {0: 'batch_size'}, 'center_heatmap': {0: 'batch_size'}}
)

# Validate the ONNX model
onnx_model = onnx.load("rotated_detection.onnx")
onnx.checker.check_model(onnx_model)

# Run inference with ONNX Runtime
ort_session = ort.InferenceSession("rotated_detection.onnx")
input_name = ort_session.get_inputs()[0].name
output_names = [output.name for output in ort_session.get_outputs()]
# Simulated input
input_data = dummy_input.numpy()
outputs = ort_session.run(output_names, {input_name: input_data})
print(f"ONNX inference done, output shapes: {[out.shape for out in outputs]}")
4.3.2 TensorRT Optimization
TensorRT is NVIDIA's high-performance deep-learning inference optimizer and can deliver a significant speedup.
Example: building a TensorRT engine from an ONNX model
import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit

def build_tensorrt_engine(onnx_file_path, engine_file_path, max_workspace_size=1 << 30):
    """
    Build a TensorRT engine from an ONNX model.
    Note: this uses the TensorRT 8.x API; newer releases replace
    max_workspace_size and build_engine with memory-pool limits and
    build_serialized_network.
    :param onnx_file_path: path to the ONNX file
    :param engine_file_path: path of the output engine file
    :param max_workspace_size: maximum workspace size in bytes
    """
    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
    # Create the builder
    builder = trt.Builder(TRT_LOGGER)
    # Create the network definition (explicit batch)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    # Create the ONNX parser
    parser = trt.OnnxParser(network, TRT_LOGGER)
    # Parse the ONNX model
    with open(onnx_file_path, 'rb') as model:
        if not parser.parse(model.read()):
            for error in range(parser.num_errors):
                print(parser.get_error(error))
            return None
    # Configure the builder
    config = builder.create_builder_config()
    config.max_workspace_size = max_workspace_size
    # Enable FP16 precision (if the GPU supports it)
    config.set_flag(trt.BuilderFlag.FP16)
    # Build the engine
    engine = builder.build_engine(network, config)
    # Save the engine
    with open(engine_file_path, 'wb') as f:
        f.write(engine.serialize())
    print(f"TensorRT engine saved to {engine_file_path}")
    return engine

# Example usage
onnx_file_path = "rotated_detection.onnx"
engine_file_path = "rotated_detection.trt"
build_tensorrt_engine(onnx_file_path, engine_file_path)
4.3.3 Inference Optimization Tips
- Batching: process inputs in batches to improve GPU utilization.
- Mixed-precision inference: use FP16 or INT8 to cut compute and memory cost.
- Model pruning: remove unimportant weights or channels to shrink the model.
- Knowledge distillation: train a small student model under the guidance of a large teacher model to boost the student's accuracy (a distillation sketch follows the mixed-precision example below).
Example: mixed-precision inference (PyTorch)
import torch
from torch.cuda.amp import autocast

# Load the model
model = RotatedCenterNet(num_classes=10)
model.load_state_dict(torch.load("model_epoch_100.pth"))
model.eval()
model = model.cuda()

# Build a sample input
input_data = torch.randn(1, 3, 512, 512).cuda()

# Mixed-precision inference
with autocast():
    center_pred, size_pred, angle_pred = model(input_data)
print(f"Mixed-precision inference done, output shapes: {center_pred.shape}, {size_pred.shape}, {angle_pred.shape}")
4.4 Performance Tuning and Debugging
4.4.1 Profiling Tools
- PyTorch Profiler: break down compute time and memory use across the model.
- TensorBoard: visualize the training process and model structure.
- NVIDIA Nsight Systems: system-level performance analysis.
Example: profiling the model with PyTorch Profiler
import torch
from torch.profiler import profile, record_function, ProfilerActivity

# Load the model and input
model = RotatedCenterNet(num_classes=10)
model.eval()
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
input_data = torch.randn(1, 3, 512, 512).to(device)

# Run the profiler (only request CUDA activity when a GPU is present)
activities = [ProfilerActivity.CPU]
if device.type == 'cuda':
    activities.append(ProfilerActivity.CUDA)
with profile(activities=activities, record_shapes=True) as prof:
    with record_function("model_inference"):
        center_pred, size_pred, angle_pred = model(input_data)

# Print the results, sorted by the relevant time column
sort_key = "cuda_time_total" if device.type == 'cuda' else "cpu_time_total"
print(prof.key_averages().table(sort_by=sort_key, row_limit=10))
4.4.2 Common Problems and Remedies
Unstable angle predictions:
- Cause: angle periodicity is not handled properly.
- Remedy: use CSL, a periodic loss function, or corner regression.
Poor performance on small targets:
- Cause: insufficient resolution in the lower pyramid levels.
- Remedy: weight low-level features more heavily or use a finer-grained feature pyramid.
Slow convergence:
- Cause: unbalanced loss weights or a poorly chosen learning rate.
- Remedy: retune the loss weights and use a dynamic learning-rate schedule.
Overfitting:
- Cause: too little data or too little augmentation.
- Remedy: add more augmentation and use regularization (e.g., Dropout, weight decay).
5. Summary and Outlook
Rotated object detection is a challenging but widely applicable field. This article has walked through it from algorithmic principles to optimization strategies to engineering practice. Understanding rotated-box representations, the mainstream algorithms, data augmentation, loss-function design, and architectural improvements covers the core of the technique.
On the engineering side, dataset preparation, model training, evaluation metrics, deployment, and performance tuning are all indispensable, and the code examples throughout should help readers apply these techniques in practice.
Looking ahead, promising directions include:
- More efficient representations: fewer parameters and no angle-periodicity issues.
- Self-supervised and semi-supervised learning: exploiting unlabeled data to improve performance.
- Multi-modal fusion: combining images, point clouds, radar, and other modalities for detection.
- Real-time optimization: high-accuracy, low-latency rotated detection on embedded devices.
We hope this article provides a useful reference for research and applications in rotated object detection.
