Journal of Chongqing University of Technology (Natural Science) ›› 2024, Vol. 38 ›› Issue (2): 189-197.

• Information and Computer Science •

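AMFRel: A method for joint extraction of entity relations in Chinese electronic medical records

Yu Xiaosheng (余肖生), Li Linyu (李琳宇), Zhou Jialun (周佳伦), Ma Hongbin (马洪彬), Chen Peng (陈鹏)

  1. Hubei Key Laboratory of Intelligent Vision Based Monitoring for Hydroelectric Engineering, China Three Gorges University; College of Computer and Information Technology, China Three Gorges University
  • Online: 2024-03-22 Published: 2024-03-22
  • About the authors: Yu Xiaosheng, male, Ph.D., associate professor, works mainly on health and medical big data analysis, E-mail: yuxiaosheng@ctgu.edu.cn; corresponding author Chen Peng, male, Ph.D., professor, works mainly on big data analysis techniques, E-mail: chenpeng@ctgu.edu.cn.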

Abstract: Entity relation extraction from Chinese electronic medical records is an important foundation for building medical knowledge graphs and serving downstream tasks. Because medical texts contain complex relations and a high density of entities, medical terms are still often identified inaccurately. To address this problem, this paper proposes AMFRel (adversarial learning and multi-feature fusion for relation triple extraction), a joint entity-relation extraction model for Chinese electronic medical records based on adversarial learning and multi-feature fusion. The model first extracts text and part-of-speech features from the medical record to obtain encoding vectors that incorporate part-of-speech information. It then combines the encoding vectors with perturbations produced by adversarial training to generate adversarial samples, and extracts the subjects of the sentence. Finally, an information fusion module enriches the structural features of the text, and the corresponding objects are extracted according to the specific relation information, yielding the triples of the medical text. Experiments on the CHIP2020 relation extraction dataset and the diabetes dataset show that AMFRel achieves a precision of 63.922%, a recall of 57.279%, and an F1 score of 60.418% on the CHIP2020 relation extraction dataset, and a precision of 83.914%, a recall of 67.021%, and an F1 score of 74.522% on the diabetes dataset, demonstrating that the triple extraction performance of the model is superior to that of the other baseline models.

Keywords: relation extraction, joint extraction, adversarial learning, multi-feature fusion, relation overlap
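
The reported scores can be checked for internal consistency. The sketch below assumes that the F1 value is the usual harmonic mean of the reported precision and recall (the abstract does not state the averaging scheme). The function name `f1_score` and the `reported` dictionary are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, F1) in percent, copied from the abstract.
reported = {
    "CHIP2020": (63.922, 57.279, 60.418),
    "diabetes": (83.914, 67.021, 74.522),
}

for name, (p, r, f1) in reported.items():
    # Recompute F1 from P and R and show it next to the reported value.
    print(f"{name}: computed F1 = {f1_score(p, r):.3f}, reported F1 = {f1:.3f}")
```

Under that assumption, the recomputed values agree with the reported F1 scores to three decimal places on both datasets.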

CLC number:

  • TP391.1