

news 2025/10/14 7:01:06

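✅ YOLOv3 Training and Inference Walkthrough

1. Introduction

YOLOv3 is one of the most influential models in object detection. Its core ideas are:

- multi-scale prediction, which improves small-object detection;
- anchor boxes, which give bounding-box predictions sensible shape priors;
- a single-stage architecture, which enables efficient real-time detection.

This article walks through YOLOv3 training and inference step by step using a concrete data sample.

2. A hypothetical dataset sample

We construct a small, realistic dataset sample to illustrate the training and inference flow.

Dataset description:

- Image size: 416 × 416
- Number of classes: 2 (person, car)
- Anchor boxes: 9 (3 per layer)
- Annotation format: PASCAL VOC XML (pixel coordinates, normalized during preprocessing)

Example image annotation (ground truth):

```xml
<object>
  <name>person</name>
  <bndbox>
    <xmin>100</xmin>
    <ymin>150</ymin>
    <xmax>200</xmax>
    <ymax>300</ymax>
  </bndbox>
</object>
<object>
  <name>car</name>
  <bndbox>
    <xmin>250</xmin>
    <ymin>100</ymin>
    <xmax>350</xmax>
    <ymax>200</ymax>
  </bndbox>
</object>
```

3. YOLOv3 training flow

✅ Sources

- YOLOv3: An Incremental Improvement (arXiv technical report, 2018)
- AlexeyAB/darknet open-source implementation

⚙️ Step 1: Data preprocessing

Input image:

- resize to a fixed size of 416 × 416;
- normalize pixel values to [0, 1].

Bounding boxes:

- convert (xmin, ymin, xmax, ymax) to (x_center, y_center, width, height) and normalize to [0, 1].

Example conversion result:

```python
image_size = 416

# (x_center, y_center, w, h), each divided by the image size
person_bbox = [150 / 416, 225 / 416, 100 / 416, 150 / 416]
car_bbox = [300 / 416, 150 / 416, 100 / 416, 100 / 416]
```

⚙️ Step 2: Anchor box assignment (positive samples)

Background: YOLOv3 uses 9 anchors obtained by running K-Means on the ground-truth boxes of the COCO dataset. They are assigned to the output layers as follows:

| Layer | Anchors |
| --- | --- |
| Large objects (13×13) | [116×90, 156×198, 373×326] |
| Medium objects (26×26) | [30×61, 62×45, 59×119] |
| Small objects (52×52) | [10×13, 16×30, 33×23] |

Example anchor matching logic: for each ground-truth box, compute its IoU with all 9 anchors and select the anchor with the largest IoU as the positive sample. The comparison uses width and height only, as if box and anchor shared the same center.

```python
# `yolov3.utils` is a placeholder module; these helpers are illustrative
from yolov3.utils import compute_iou, match_anchor_to_gt

anchors = [(10, 13), (16, 30), (33, 23),
           (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]

gt_boxes = [[0.36, 0.54, 0.24, 0.36],   # person
            [0.72, 0.36, 0.24, 0.24]]   # car

positive_anchors = match_anchor_to_gt(gt_boxes, anchors)
```

Example output (simplified; `layer` 0 is the 13×13 layer, `grid_cell` is (row, col)):

```python
[
    {"anchor_idx": 6, "layer": 0, "grid_cell": (7, 4)},  # person → large-object layer, anchor 6 (116×90)
    {"anchor_idx": 6, "layer": 0, "grid_cell": (4, 9)},  # car → large-object layer, anchor 6 (116×90)
]
```

Both sample boxes are about 100 pixels on a side (100×150 and 100×100), so the 116×90 anchor has the highest shape IoU for each (about 0.55 and 0.79), and both land on the 13×13 layer. The grid cell is the cell containing the box center: for the person, (150 / 416 × 13, 225 / 416 × 13) ≈ (4.7, 7.0), that is column 4, row 7.

⚙️ Step 3: Building training labels (label assignment)

For each image and each scale, YOLOv3 outputs a tensor of shape [H, W, B × (5 + C)]; with the batch dimension this becomes [batch_size, H, W, B × (5 + C)], where:

- H × W: feature map size (for example 13×13);
- B = 3: number of bounding boxes predicted at each position;
- 5 + C: parameters per bounding box (tx, ty, tw, th, confidence, class_probs).

Example label construction: with one person and one car, generate labels for all three layers.

```python
import numpy as np

label_13x13 = np.zeros((13, 13, 3, 5 + 2))
label_26x26 = np.zeros((26, 26, 3, 5 + 2))
label_52x52 = np.zeros((52, 52, 3, 5 + 2))

# Indexing is [row, col, anchor within layer, field].
# Fill in the person at its grid cell and anchor (anchor 6 is index 0 on the 13×13 layer)
label_13x13[7, 4, 0, :4] = [0.36, 0.54, 0.24, 0.36]  # normalized x, y, w, h (turned into tx, ty, tw, th in the loss)
label_13x13[7, 4, 0, 4] = 1.0   # confidence
label_13x13[7, 4, 0, 5] = 1.0   # person class

# Fill in the car at its grid cell and anchor
label_13x13[4, 9, 0, :4] = [0.72, 0.36, 0.24, 0.24]
label_13x13[4, 9, 0, 4] = 1.0   # confidence
label_13x13[4, 9, 0, 6] = 1.0   # car class

# label_26x26 and label_52x52 stay all zeros: no object was assigned to them
```

⚙️ Step 4: Loss computation

The YOLOv3 loss has three parts.

1. Localization loss, computed for positive samples only (hatted symbols are predictions, unhatted symbols are targets):

$$
\mathcal{L}_{loc} = \lambda_{coord} \sum_{pos} \left[ (t_x - \hat{t}_x)^2 + (t_y - \hat{t}_y)^2 + (t_w - \hat{t}_w)^2 + (t_h - \hat{t}_h)^2 \right]
$$

2. Confidence loss, computed separately for positive and negative samples:

$$
\mathcal{L}_{conf} = \sum_{pos} (\text{confidence} - 1)^2 + \lambda_{noobj} \sum_{neg} (\text{confidence})^2
$$

3. Class probability loss, computed for positive samples only:

$$
\mathcal{L}_{cls} = \sum_{c=1}^{C} (p_c - \hat{p}_c)^2
$$

The class term is shown here in its simplified squared-error form. The YOLOv3 paper uses independent logistic classifiers trained with binary cross-entropy (BCELoss), which is what practical implementations apply.

4. YOLOv3 inference flow

⚙️ Step 1: Image input and preprocessing

```python
import cv2
import numpy as np

image = cv2.imread("test.jpg")                          # OpenCV loads images as BGR
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)          # convert to RGB
resized_image = cv2.resize(image, (416, 416)) / 255.0   # normalize to [0, 1]
input_tensor = np.expand_dims(resized_image, axis=0)    # add the batch dimension
```

⚙️ Step 2: Model output (from a Darknet or PyTorch model)

The model outputs predictions at three scales. With 2 classes, each scale has 3 × (5 + 2) = 21 channels (the familiar 255 channels correspond to the 80 COCO classes):

```python
# `model` is assumed to return a list of three feature maps, one per scale
outputs = model.predict(input_tensor)   # run the forward pass once
output_13x13 = outputs[0][0]  # drop batch dimension, shape: (13, 13, 21)
output_26x26 = outputs[1][0]  # shape: (26, 26, 21)
output_52x52 = outputs[2][0]  # shape: (52, 52, 21)
```

Each bounding box is output as (tx, ty, tw, th, confidence, class_0, class_1).

⚙️ Step 3: Decoding bounding boxes

The network output is decoded with the following formulas:

$$
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \cdot e^{t_w} \\
b_h &= p_h \cdot e^{t_h}
\end{aligned}
$$

where:

- $(c_x, c_y)$ is the top-left corner of the current grid cell, in grid units (column and row index);
- $(p_w, p_h)$ is the width and height of the matching anchor.

Dividing $b_x, b_y$ by the grid size and $b_w, b_h$ by the input size (when anchors are given in pixels) yields normalized coordinates, which are then scaled to image space.

Example decoding code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(output_tensor, anchors, image_size=416, num_classes=2):
    """Decode one scale's raw output of shape (H, W, B * (5 + C)).

    `anchors` holds this layer's (width, height) pairs in pixels.
    """
    grid_h, grid_w = output_tensor.shape[:2]
    step = 5 + num_classes          # values per predicted box
    bboxes = []
    for i in range(grid_h):         # row index (y direction)
        for j in range(grid_w):     # column index (x direction)
            for k in range(len(anchors)):
                pred = output_tensor[i, j, k * step:(k + 1) * step]
                tx, ty, tw, th = pred[:4]
                conf = sigmoid(pred[4])
                class_probs = sigmoid(pred[5:])
                # Decode to a normalized center and size
                bx = (sigmoid(tx) + j) / grid_w
                by = (sigmoid(ty) + i) / grid_h
                bw = anchors[k][0] * np.exp(tw) / image_size
                bh = anchors[k][1] * np.exp(th) / image_size
                # Convert normalized center/size to corner coordinates in pixels
                x1 = (bx - bw / 2) * image_size
                y1 = (by - bh / 2) * image_size
                x2 = (bx + bw / 2) * image_size
                y2 = (by + bh / 2) * image_size
                bboxes.append([x1, y1, x2, y2, conf, class_probs])
    return bboxes
```

⚙️ Step 4: Running NMS (Non-Maximum Suppression)

Compute the combined score:

$$
\text{score} = \text{confidence} \times \max(\text{class\_probs})
$$

Example NMS (PyTorch):

```python
import torch
from torchvision.ops import nms

# Assume boxes is a [N, 4] tensor of (x1, y1, x2, y2), scores is [N],
# and labels is [N] with the argmax class of each box
keep_indices = nms(boxes, scores, iou_threshold=0.45)

final_boxes = boxes[keep_indices]
final_scores = scores[keep_indices]
final_labels = labels[keep_indices]
```

5. Summary of the full training and inference flow

| Stage | Content |
| --- | --- |
| ✅ Input image | 416 × 416 × 3 RGB image |
| ✅ Data augmentation | Random scaling, flipping, HSV jitter, etc. |
| ✅ Positive sample assignment | The anchor with the highest IoU against the GT box is the positive sample |
| ✅ Output structure | Three output layers: 13×13, 26×26, 52×52 |
| ✅ Loss function | BCE loss + IoU loss (optionally GIoU/DIoU) |
| ✅ NMS | greedynms by default, threshold 0.45 |
| ✅ Inference output | Each bounding box contains (x1, y1, x2, y2, score, label) |

6. Key configuration snippet (from a .cfg file)

```
[yolo]
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=2
num=9
jitter=.3
ignore_thresh = .5
truth_thresh = 1
scale_x_y = 1.05
iou_thresh=0.213
iou_normalizer=0.07
```

✅ These options exist in AlexeyAB/darknet and affect anchor matching, loss computation, NMS, and related steps.

7. YOLOv3 performance (source cited by the author: official test data)

| Model | mAP (COCO) | FPS (V100) | Improved IoU loss support |
| --- | --- | --- | --- |
| YOLOv3 | ~33.0 | ~45 | Supported (manual configuration required) |
| YOLOv3-tiny | ~25.4 | ~150 | Not recommended for complex tasks |
| YOLOv3 + DIoU | ~33.6 | ~45 | ✅ Supported |
| YOLOv3 + CIoU | ~33.9 | ~45 | ✅ Supported |

8. Limitations of YOLOv3 (from community feedback)

| Limitation | Notes |
| --- | --- |
| ❌ No built-in Soft-NMS | Requires custom modification |
| ❌ No Efficient NMS | For example the optimized version in ONNXRuntime |
| ❌ Fixed anchor settings | New tasks need anchors re-clustered for the data |
| ❌ Fixed output structure | Not well suited to direct ONNX deployment |

9. Conclusion

The YOLOv3 training and inference flow is clear and stable, and it has been used in industry for years. The design is simple but practical:

- anchor boxes improve recall;
- multi-scale prediction strengthens small-object detection;
- the IoU loss supports several variants (GIoU/DIoU/CIoU);
- NMS can use several strategies (greedy by default, Soft-NMS with custom modification).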
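As a companion to the anchor assignment step of the training flow (Step 2), the following is a minimal sketch of how a ground-truth box could be matched to an anchor and a grid cell. It assumes the darknet-style convention that box and anchor are compared by width and height only, as if they shared a center. The helper names `voc_to_center`, `shape_iou`, and `match_anchor` are hypothetical and do not come from any library.

```python
# The 9 COCO anchors (width, height) in pixels, ordered small to large.
ANCHORS = [(10, 13), (16, 30), (33, 23),
           (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]
# Grid size used by each group of three anchors (small, medium, large objects).
GRID_SIZES = [52, 26, 13]
IMAGE_SIZE = 416


def voc_to_center(xmin, ymin, xmax, ymax):
    """Convert corner coordinates to (x_center, y_center, width, height) in pixels."""
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)


def shape_iou(w1, h1, w2, h2):
    """IoU of two boxes that share the same center, so only their shapes matter."""
    inter = min(w1, w2) * min(h1, h2)
    return inter / (w1 * h1 + w2 * h2 - inter)


def match_anchor(box):
    """Pick the anchor with the highest shape IoU and locate the responsible grid cell."""
    xc, yc, w, h = box
    ious = [shape_iou(w, h, aw, ah) for aw, ah in ANCHORS]
    best = max(range(len(ANCHORS)), key=lambda i: ious[i])
    grid = GRID_SIZES[best // 3]            # three anchors per output layer
    col = int(xc / IMAGE_SIZE * grid)       # cell containing the box center
    row = int(yc / IMAGE_SIZE * grid)
    return {"anchor_idx": best, "grid": grid, "row": row, "col": col,
            "iou": round(ious[best], 3)}


person = match_anchor(voc_to_center(100, 150, 200, 300))
car = match_anchor(voc_to_center(250, 100, 350, 200))
print(person)  # anchor 6 (116x90) on the 13x13 grid, row 7, col 4
print(car)     # anchor 6 (116x90) on the 13x13 grid, row 4, col 9
```

For the two sample boxes in this article (100×150 and 100×100 pixels), this rule selects the 116×90 anchor in both cases, so both objects are assigned to the 13×13 layer. Real implementations may add further conditions, such as an IoU threshold that lets more than one anchor become positive.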


http://www.yingshimen.cn/news/5682/