Journal of Chongqing University of Technology (Natural Science) ›› 2023, Vol. 37 ›› Issue (10): 146-155.

• Special column: “Extended Reality (XR) Theory, Technology and Applications” •

An improved-Yolov5s target recognition algorithm for mobile AR

CAO Xianshuo (曹献烁), CHEN Chunyi (陈纯毅), HU Xiaojuan (胡小娟)

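  1. (School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022)
  • Publication date: 2023-11-20 Release date: 2023-11-20
  • About the authors: CAO Xianshuo, male, master's student, mainly engaged in augmented reality research, Email: 1141884370@qq.com; corresponding author CHEN Chunyi, male, Ph.D., professor, mainly engaged in research on realistic 3D graphics rendering, Email: chenchunyi@hotmail.com.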

Mobile AR target recognition algorithm based on improved Yolov5s

  • Online: 2023-11-20 Published: 2023-11-20

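Abstract: To address the large parameter counts and slow recognition speed of target recognition models, an improved lightweight target detection algorithm, Yolov5s-MCB, is proposed. The MobileNetV3 network is used as the Yolov5s backbone feature extraction network to reduce the number of model parameters. To better fit nonlinear data and improve model convergence, the ReLU activation function in MobileNetV3 is replaced with the Mish activation function to avoid vanishing and exploding gradients. A BiFPN feature pyramid structure is added, using an iterative feature fusion method to improve detection accuracy. In addition, a coordinate attention mechanism is introduced so that the model attends to location information over a wide range, improving detection performance. To speed up training convergence, Focal-Loss EIoU is adopted as the bounding-box regression loss function to solve the problem of low-quality samples causing severe oscillations in the loss value. Experimental results show that the algorithm achieves a mean recognition accuracy of 90.5% on the VOC dataset, with a model size of 7.63 MB and a detection speed of 99 FPS. Compared with the original Yolov5s, inference speed increases by 17.85% and model size decreases by 45.9% while recognition accuracy is maintained, meeting the real-time and accuracy requirements of detection tasks. The Yolov5s-MCB model is also converted to an ONNX model and ported to a mobile phone, where it is combined with the ARCore SDK to develop an AR application with target detection.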
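The abstract describes replacing the ReLU activation in MobileNetV3 with Mish. As a minimal sketch of what that swap changes, the block below compares the two activations in plain Python, using the standard definition mish(x) = x · tanh(softplus(x)). The function names are illustrative and are not taken from the paper's code; a real model would use the built-in activation of its deep learning framework instead.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x: float) -> float:
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


if __name__ == "__main__":
    # Compare the two activations on a few sample inputs.
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  mish={mish(x):+.4f}")
```

Unlike ReLU, which outputs exactly zero for every negative input, Mish returns a small non-zero value there and is smooth around the origin, so units with negative pre-activations still pass some gradient.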

Keywords: Yolov5s, lightweight, attention mechanism, mobile augmented reality

Abstract: To address the large parameter counts and slow recognition speed of existing target recognition models, an improved lightweight target detection algorithm, Yolov5s-MCB, is proposed. Firstly, the MobileNetV3 network is used as the Yolov5s backbone feature extraction network to reduce the number of model parameters. To fit nonlinear data better and improve model convergence, the ReLU activation function in MobileNetV3 is replaced by the Mish activation function to avoid vanishing and exploding gradients. Secondly, the BiFPN feature pyramid structure is added to improve detection accuracy with an iterative feature fusion method. In addition, a coordinate attention mechanism is introduced so that the model focuses on location information over a wide range, improving detection performance. To speed up training convergence, Focal-Loss EIoU is used as the bounding-box regression loss function to solve the problem of low-quality samples causing drastic oscillations in loss values. The experimental results show that the algorithm achieves an average recognition accuracy of 90.5% on the VOC dataset, with a model size of 7.63 MB and a detection speed of 99 FPS. Compared with the original Yolov5s, the proposed algorithm improves inference speed by 17.85% and reduces model size by 45.9% while keeping recognition accuracy unchanged, meeting the real-time and accuracy requirements of detection tasks. Finally, the Yolov5s-MCB model is converted to an ONNX model and ported to a mobile phone, where it is combined with the ARCore SDK to develop an AR application with target detection.
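The abstract names Focal-Loss EIoU as the bounding-box regression loss but does not give its formula. The following is a rough sketch that assumes the commonly published Focal-EIoU formulation: the EIoU loss adds a center-distance penalty and separate width and height penalties to 1 − IoU, and the focal variant weights that loss by IoU^γ so that low-overlap (low-quality) boxes contribute less. The function names, the (x1, y1, x2, y2) box format, and γ = 0.5 are assumptions made for illustration, not details confirmed by the paper.

```python
def _iou_and_eiou(pred, target, eps=1e-9):
    """Return (IoU, EIoU loss) for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Squared center distance, normalized by the enclosing-box diagonal.
    dx = (px1 + px2 - tx1 - tx2) / 2.0
    dy = (py1 + py2 - ty1 - ty2) / 2.0
    dist_term = (dx * dx + dy * dy) / (cw * cw + ch * ch + eps)

    # Width and height differences, normalized by the enclosing-box sides.
    w_term = (pw - tw) ** 2 / (cw * cw + eps)
    h_term = (ph - th) ** 2 / (ch * ch + eps)

    return iou, 1.0 - iou + dist_term + w_term + h_term


def eiou_loss(pred, target):
    """EIoU loss: 1 - IoU plus distance, width, and height penalties."""
    return _iou_and_eiou(pred, target)[1]


def focal_eiou_loss(pred, target, gamma=0.5):
    """Focal-EIoU loss: IoU**gamma * EIoU (gamma=0.5 is an assumed value)."""
    iou, eiou = _iou_and_eiou(pred, target)
    return (iou ** gamma) * eiou


if __name__ == "__main__":
    box_a = (0.0, 0.0, 2.0, 2.0)
    box_b = (1.0, 1.0, 3.0, 3.0)
    print("EIoU:", eiou_loss(box_a, box_b))
    print("Focal-EIoU:", focal_eiou_loss(box_a, box_b))
```

In this formulation the IoU^γ factor shrinks the loss of poorly overlapping predictions, which is one way to read the abstract's claim that the loss dampens oscillations caused by low-quality samples.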

CLC Number:

  • TP391