Journal of Chongqing University of Technology (Natural Science) ›› 2023, Vol. 37 ›› Issue (4): 174-181.

• Intelligent Technology •

Construction and application of an emotion recognition corpus for Chinese MOOC reviews

WEI Xiaocong, YU Lan

  1. School of Software, Dalian University of Foreign Languages, Dalian 116044, Liaoning, China
  • Online: 2023-05-06 Published: 2023-05-06
  • About the author: WEI Xiaocong, female, PhD, lecturer, works mainly on natural language processing. Email: weixiaocong@dlufl.edu.cn.

Abstract: Emotion recognition in Chinese online education reviews is largely limited by a lack of annotated data. To address this problem, this paper constructs a Chinese MOOC emotion recognition corpus from 806 courses on the China University MOOC platform, combining automatic and manual methods. The corpus contains 10,340 reviews in total, of which 5,411 are positive and 4,929 are negative, which ensures both class balance and broad subject coverage. First, the paper formulates strategies for corpus collection and pre-processing, an annotation specification, an annotation scheme, and a consistency checking method. Then, it proposes emotion recognition methods based on neural network models and on large-scale pre-trained language models. Finally, the emotion recognition results are applied for two user roles: teaching management departments and instructors. The corpus lays a data foundation for sentiment analysis research on online education reviews, and is of value for empowering teaching evaluation and supporting intelligent teaching systems.

Keywords: Chinese, MOOC reviews, emotion recognition, corpus, large-scale pre-trained language model
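
The balance claim can be checked directly from the two counts reported in the abstract. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and do not come from the paper, and the paper's own balance criterion is not stated here.

```python
# Review counts as reported in the abstract; names are illustrative only.
counts = {"positive": 5411, "negative": 4929}


def class_shares(label_counts):
    """Return each label's share of the total as a fraction in [0, 1]."""
    total = sum(label_counts.values())
    return {label: n / total for label, n in label_counts.items()}


total = sum(counts.values())
shares = class_shares(counts)
# Ratio of the larger class to the smaller one; 1.0 would be perfectly balanced.
imbalance_ratio = max(counts.values()) / min(counts.values())

print(f"total reviews: {total}")
for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"majority/minority ratio: {imbalance_ratio:.2f}")
```

The two counts sum to the stated total of 10,340, and the split is roughly 52% positive to 48% negative, so neither class dominates.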

CLC Number: 

  • TP39