Journal of Chongqing University of Technology (Natural Science), 2023, Vol. 37, Issue (12): 222-231.

• Intelligent Technology •

FCG-NNER: A Chinese nested named entity recognition method fused with glyph information

CHEN Peng, MA Hongbin, ZHOU Jialun, LI Linyu, YU Xiaosheng

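  1. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University; College of Computer and Information Technology, China Three Gorges University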
  • Published: 2024-02-04; Online: 2024-02-04
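  • About the authors: CHEN Peng, male, Ph.D., professor; research interests: big data analysis technology. E-mail: chenpeng@ctgu.edu.cn. Corresponding author: YU Xiaosheng, male, Ph.D., associate professor; research interests: healthcare big data analysis. E-mail: yuxiaosheng@ctgu.edu.cn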

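Abstract: Span-based models are the main approach to nested named entity recognition (NER); their core idea is to recast entity recognition as span classification. In Chinese datasets, however, words are not separated by explicit delimiters, so semantic and boundary information is ambiguous, which degrades Chinese nested NER performance. To address this problem, this paper proposes FCG-NNER, a span-based Chinese nested NER algorithm that fuses glyph information. First, a convolutional neural network (CNN) extracts the glyph information of Chinese characters. Next, a cross-biaffine decoding layer fuses the original text information with the glyph information. Then, a diagonal fusion CNN layer captures local interactions between different spans. Finally, the outputs of the cross-biaffine decoding layer and the diagonal fusion CNN layer are summed and fed into a fully connected layer to produce the final predictions. Two representative Chinese nested NER datasets, CMeEE and CLUENER2020, are used for experimental validation. On CMeEE, FCG-NNER achieves a precision of 65.02%, a recall of 67.93%, and an F1 score of 0.6644; on CLUENER2020, it achieves a precision of 79.45%, a recall of 82.33%, and an F1 score of 0.8086, showing that FCG-NNER clearly outperforms the baselines on both datasets.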
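The glyph-extraction step can be illustrated with a minimal sketch. The abstract does not specify the glyph CNN's input size, kernels, or pooling, so everything below is an assumption: each character is taken to be a small binary bitmap, two hand-set kernels stand in for learned stroke detectors, and each feature map is reduced by ReLU followed by global max pooling. The names `conv2d_valid`, `glyph_features`, `HORIZONTAL`, `VERTICAL`, `YI`, and `SHI` are hypothetical.

```python
from typing import List

Bitmap = List[List[int]]      # rows of 0/1 pixels (assumed glyph input format)
Kernel = List[List[float]]


def conv2d_valid(image: Bitmap, kernel: Kernel) -> List[List[float]]:
    """2D cross-correlation without padding ("valid" mode), as CNN layers compute it."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for r in range(h - kh + 1):
        row = []
        for c in range(w - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[r + i][c + j]
            row.append(acc)
        out.append(row)
    return out


def glyph_features(bitmap: Bitmap, kernels: List[Kernel]) -> List[float]:
    """One feature per kernel: convolution -> ReLU -> global max pooling."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(bitmap, kernel)
        feats.append(max(max(v, 0.0) for row in fmap for v in row))
    return feats


# Hypothetical hand-set kernels standing in for learned filters.
HORIZONTAL: Kernel = [[1.0, 1.0, 1.0]]
VERTICAL: Kernel = [[1.0], [1.0], [1.0]]

# Hypothetical 5x5 bitmaps of the characters 一 ("one") and 十 ("ten").
YI: Bitmap = [[0] * 5, [0] * 5, [1] * 5, [0] * 5, [0] * 5]
SHI: Bitmap = [[0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [1] * 5,
               [0, 0, 1, 0, 0], [0, 0, 1, 0, 0]]

if __name__ == "__main__":
    print(glyph_features(YI, [HORIZONTAL, VERTICAL]))   # [3.0, 1.0]
    print(glyph_features(SHI, [HORIZONTAL, VERTICAL]))  # [3.0, 3.0]
```

The horizontal detector responds equally to both characters, while the vertical detector separates them; a trained glyph CNN would learn many such filters rather than use hand-set ones.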
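The decoding pipeline described in the abstract (span scoring, local interaction among neighbouring spans, summation, fully connected output) can be sketched in plain Python for a single entity type. This is not the authors' implementation, and several pieces are assumptions: "cross" is read here as pairing start representations from the text encoder with end representations from the glyph encoder; the diagonal fusion CNN is approximated by a plain 3×3 convolution over the upper-triangular span table; and with one label the fully connected layer reduces to a scalar weight `fc_w` and bias `fc_b`. All function and parameter names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def biaffine(h: Vector, g: Vector, U: Matrix, w: Vector, b: float) -> float:
    """s = h^T U g + w . [h; g] + b  (bilinear + linear + bias terms)."""
    Ug = [dot(row, g) for row in U]
    return dot(h, Ug) + dot(w, h + g) + b  # h + g concatenates the two vectors


def span_table(text: List[Vector], glyph: List[Vector],
               U: Matrix, w: Vector, b: float) -> Matrix:
    """Score every span (i, j) with i <= j; assumed "cross": start=text, end=glyph."""
    n = len(text)
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            table[i][j] = biaffine(text[i], glyph[j], U, w, b)
    return table


def local_conv(table: Matrix, kernel: Matrix) -> Matrix:
    """3x3 convolution over the span table; cells with start > end count as zero."""
    n = len(table)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r <= c < n:  # stay inside the upper triangle
                        acc += kernel[di + 1][dj + 1] * table[r][c]
            out[i][j] = acc
    return out


def predict_spans(text: List[Vector], glyph: List[Vector], U: Matrix, w: Vector,
                  b: float, kernel: Matrix, fc_w: float = 1.0, fc_b: float = 0.0,
                  threshold: float = 0.0) -> List[Tuple[int, int]]:
    """Sum biaffine and conv outputs, apply a scalar 'FC' layer, then threshold."""
    base = span_table(text, glyph, U, w, b)
    local = local_conv(base, kernel)
    n = len(text)
    return [(i, j) for i in range(n) for j in range(i, n)
            if fc_w * (base[i][j] + local[i][j]) + fc_b > threshold]


if __name__ == "__main__":
    # Toy 3-character sentence with 2-dim states (all values hypothetical).
    text = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
    glyph = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    U = [[1.0, 0.0], [0.0, 1.0]]
    w = [0.0, 0.0, 0.0, 0.0]
    zero_kernel = [[0.0] * 3 for _ in range(3)]
    print(predict_spans(text, glyph, U, w, 0.0, zero_kernel, threshold=0.5))
    # expected: [(0, 2), (1, 1)]
```

In the toy input, span (1, 1) lies inside span (0, 2) and both are predicted, which is the property that makes span tables suitable for nested entities; a real model would score several labels per span and learn all weights.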
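Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the reported F1 values can be cross-checked against the reported precision and recall. The check below uses only the figures stated in the abstract; `f1_score` is a hypothetical helper name.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract.
REPORTED = {
    "CMeEE": (0.6502, 0.6793, 0.6644),
    "CLUENER2020": (0.7945, 0.8233, 0.8086),
}

if __name__ == "__main__":
    for name, (p, r, f1) in REPORTED.items():
        print(name, round(f1_score(p, r), 4), f1)
    # CMeEE 0.6644 0.6644
    # CLUENER2020 0.8086 0.8086
```

Both recomputed values match the reported F1 scores to four decimal places.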

Keywords: Chinese nested named entity recognition; glyph features; span classification; feature fusion

CLC number: TP391.1