Journal of Chongqing University of Technology (Natural Science), 2023, Vol. 37, Issue (12): 201-209.

• Intelligent Technology •

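Hand gesture recognition in complex background based on structure reparameterization and attention mechanism

YANG Lixia (杨黎霞), XIA Tian (夏天), CHEN Renxiang (陈仁祥), ZHANG Xiao (张晓), QIU Tianran (邱天然)

  1. School of Business Administration, Chongqing University of Science and Technology; Chongqing Engineering Laboratory for Transportation Engineering Application Robot, Chongqing Jiaotong University
  • Published: 2024-02-04 Online: 2024-02-04
  • About the authors: YANG Lixia, female, PhD, lecturer, whose research focuses on intelligent measurement and control and artificial intelligence, E-mail: lixiayang1207@126.com; XIA Tian, male, master's student, whose research focuses on hand gesture recognition, E-mail: 1946153665@qq.com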

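Summary: To address the low accuracy and slow speed of gesture recognition caused by heavy interference in gesture images with complex backgrounds, a gesture recognition algorithm for complex backgrounds based on structural reparameterization and an attention mechanism is proposed: RepSEHGR (re-parameter squeeze-expand hand gesture recognition). Structural reparameterization is applied to the residual structure so that redundant branches are removed at the deployment stage, which raises recognition speed. A channel attention module is also embedded; because it weights the features of different channels, the algorithm focuses on gesture features and complex-background interference is reduced. Two data augmentation methods, cutout and affine transformation, are used in training to suppress complex-background noise in the input and augment the data, reducing overfitting while improving robustness. Comparison experiments on a complex-background gesture dataset show a recognition accuracy of 99.9% and a recognition speed of 200 fps, demonstrating the effectiveness of the proposed algorithm.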

Keywords: hand gesture recognition, attention mechanism, complex background, structural reparameterization, data augmentation
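The cutout augmentation named above can be illustrated with a minimal sketch. It assumes the common formulation of cutout (one square patch, centered at a uniformly sampled pixel, filled with a constant); the patch size, fill value, and the function name `cutout` are illustrative choices, not details taken from the paper. The affine transformation is not shown.

```python
import random


def cutout(image, size, rng=None, fill=0):
    """Return a copy of `image` (a list of rows) with one square patch set to `fill`.

    The patch center is sampled uniformly over the image, so the patch may be
    clipped at the border. `size` is the side length of the unclipped patch.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    cy, cx = rng.randrange(height), rng.randrange(width)

    # Unclipped top-left corner, then clip the patch to the image bounds.
    top0, left0 = cy - size // 2, cx - size // 2
    top, bottom = max(0, top0), min(height, top0 + size)
    left, right = max(0, left0), min(width, left0 + size)

    out = [row[:] for row in image]  # copy so the input is left untouched
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = fill
    return out
```

Masking a random region in each training image hides part of the background (and sometimes part of the hand), which is one plausible reading of how the augmentation discourages the model from relying on any single image region.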

Abstract:

Gestures are a highly adaptable mode of human-computer interaction: they simplify interaction by removing the need for physical contact between users and devices. Gesture interaction is more intuitive and offers richer interaction effects, so it better meets users' needs and expectations. Gesture recognition has therefore been widely studied in human-computer interaction, particularly vision-based gesture recognition, which is low-cost, natural, and contact-free. However, existing gesture recognition methods are developed mainly against simple backgrounds in experimental settings, whereas in real human-computer interaction gestures must usually be recognized in a variety of complex environments.

In practice, changes in brightness, complex backgrounds, and interference from similar colors are the key factors that limit gesture recognition accuracy. Interference from a complex background severely hampers the extraction of gesture features, making fast and accurate recognition difficult. Some researchers use a two-stage model that first extracts the gesture region and then classifies it, while others apply deep convolutional neural networks directly to gesture images with complex backgrounds. However, two-stage methods are rarely fast enough for practical applications, and the accuracy of single-stage methods on complex-background gesture images still needs improvement. Because existing methods struggle to balance recognition speed and accuracy, they cannot solve gesture recognition against real complex backgrounds. The key is either to eliminate or weaken the interference of the complex background while improving recognition speed, or to strengthen gesture feature extraction, so that the algorithm represents the gesture information correctly. The attention mechanism imitates how the human visual system attends to objects: by devoting more attention to the target region, it captures that region's detailed information. Embedding an attention mechanism in a deep-learning-based gesture recognition algorithm lets the algorithm focus on the features of the target gesture region and suppress interference from the complex background. Meanwhile, structural reparameterization removes the redundant branch structure at the deployment stage and thereby improves recognition speed.
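The idea of weighting channels so that the network attends to gesture features can be sketched as follows. This is a minimal illustration in the squeeze-and-excitation style, which is an assumption about the kind of channel attention involved; the function name `channel_attention`, the weight matrices `w_reduce` and `w_expand`, and the nested-list layout are hypothetical and are not taken from the paper.

```python
import math


def channel_attention(feature_map, w_reduce, w_expand):
    """Reweight the channels of `feature_map` (C x H x W nested lists).

    w_reduce: hidden x C weights of the bottleneck layer (assumed shape).
    w_expand: C x hidden weights of the layer that restores C outputs.
    Returns the rescaled feature map and the per-channel weights.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_map]

    # Excitation: a small bottleneck (ReLU, then sigmoid) yields one weight
    # in (0, 1) per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed)))
              for row in w_reduce]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w_expand]

    # Scale: every value in a channel is multiplied by that channel's weight.
    scaled = [[[v * wt for v in row] for row in ch]
              for ch, wt in zip(feature_map, weights)]
    return scaled, weights
```

Channels whose learned weight is close to 0 are suppressed and those close to 1 pass through nearly unchanged, which is how such a module could down-weight channels dominated by background responses.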

To address the low recognition accuracy and slow recognition speed caused by heavy interference in gesture images with complex backgrounds, this paper proposes RepSEHGR, a gesture recognition algorithm based on structural reparameterization and an attention mechanism. Structural reparameterization is applied to the residual structure so that the redundant branches can be removed at the deployment stage, improving recognition speed. A channel attention module is embedded so that, by weighting the features of different channels, the algorithm focuses on gesture features and is less affected by complex backgrounds. Finally, two data augmentation methods, cutout and affine transformation, are used during training to suppress complex-background noise in the input and augment the data, which reduces overfitting and improves robustness. Comparison experiments on a complex-background gesture dataset show a recognition accuracy of 99.9% and a recognition speed of 200 fps, demonstrating the effectiveness of the proposed algorithm.
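Why branches can be removed at deployment without changing the output can be shown with a small sketch. It assumes a RepVGG-style block with a 3x3 branch, a 1x1 branch, and an identity shortcut on a single channel, with no batch normalization; the abstract does not specify the branch layout, so this layout and all function names here are illustrative assumptions.

```python
def conv3x3(image, kernel, bias=0.0):
    """Single-channel 3x3 cross-correlation with zero padding (same output size)."""
    h, w = len(image), len(image[0])
    out = [[bias] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[y][x] += kernel[dy + 1][dx + 1] * image[yy][xx]
    return out


def multi_branch(image, k3, b3, k1, b1):
    """Training-time form: 3x3 branch + 1x1 branch + identity shortcut."""
    y3 = conv3x3(image, k3, b3)
    return [[y3[y][x] + (k1 * image[y][x] + b1) + image[y][x]
             for x in range(len(image[0]))]
            for y in range(len(image))]


def reparameterize(k3, b3, k1, b1):
    """Fold the 1x1 and identity branches into one 3x3 kernel and bias."""
    merged = [row[:] for row in k3]
    # Both the 1x1 weight and the identity act only on the center pixel,
    # so they add to the center tap of the 3x3 kernel.
    merged[1][1] += k1 + 1.0
    return merged, b3 + b1
```

Because convolution is linear, `conv3x3(image, *reparameterize(k3, b3, k1, b1))` matches `multi_branch(image, k3, b3, k1, b1)` up to floating-point rounding, while running one convolution instead of three branches. In RepVGG-style networks the batch normalization of each branch is also folded into the kernel and bias before merging; that step is omitted here.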

CLC number:

  • TP391.4