Journal of Chongqing University of Technology(Natural Science) ›› 2023, Vol. 37 ›› Issue (7): 245-255.
• Information and computer science •
Abstract:
Text vector representations based on Word2vec do not fully account for the content features and spread features of micro-blog texts, so they represent such texts poorly. In addition, a single machine learning algorithm applied to the emotion classification of micro-blog text cannot achieve high accuracy. To further improve emotion classification for micro-blog text, this paper proposes a new text vector representation method and combines it with an improved Stacking ensemble learning algorithm.
First, text feature vectors rich in semantic and emotional information are constructed by integrating text content features (emoticons, word semantics, and part of speech and emotion) with spread features (comments, retweets, and likes). Specifically, a feature vector is built for each content feature, and these vectors are spliced into an initial text feature vector based on content characteristics. Second, the influence of the text is computed from its spread features, namely the numbers of comments, retweets, and likes. Finally, the influence of the micro-blog text is combined with the initial text feature vector to further enrich the semantic and emotional information in the vector representation of the micro-blog text.
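The construction described above can be illustrated with a minimal sketch. The abstract does not give the influence formula or say how influence is combined with the content vector, so everything specific here is an assumption: the function names `influence_score` and `build_text_vector` are hypothetical, influence is taken to be a weighted sum of log-scaled counts, and it is appended to the spliced content vector as one extra dimension.

```python
import numpy as np


def influence_score(comments, retweets, likes, weights=(1.0, 1.0, 1.0)):
    """Hypothetical influence measure: weighted sum of log-scaled counts.

    The paper's actual formula is not stated in the abstract; this is an
    assumption chosen only to make the sketch concrete.
    """
    counts = np.array([comments, retweets, likes], dtype=float)
    return float(np.dot(np.asarray(weights, dtype=float), np.log1p(counts)))


def build_text_vector(emoticon_vec, semantic_vec, pos_emotion_vec,
                      comments, retweets, likes):
    """Splice content feature vectors, then append the influence value."""
    # Initial text feature vector based on content characteristics.
    content = np.concatenate([emoticon_vec, semantic_vec, pos_emotion_vec])
    # Assumption: influence is combined by appending it as one dimension.
    influence = influence_score(comments, retweets, likes)
    return np.concatenate([content, [influence]])


# Toy example with made-up dimensions (2 + 3 + 1 content features).
vec = build_text_vector(np.zeros(2), np.ones(3), np.zeros(1),
                        comments=12, retweets=5, likes=40)
```

Appending is only one way to combine the two; scaling the content vector by the influence value would be an equally plausible reading of the abstract.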
Moreover, in the improved Stacking ensemble learning algorithm, four classification algorithms (AdaBoost, random forest, GBDT, and XGBoost) are selected and trained on the initial training data set with 5-fold cross-validation to generate high-performance base classifiers. More importantly, each base classifier outputs a class probability vector instead of a class label. A weight is assigned to each base classifier according to its performance on the training data set and multiplied by its class probability vector to obtain a weighted class probability vector. For each text and each category, the maximum, minimum, and average weighted probability values across all base classifiers are retained. The simple and stable logistic regression algorithm is selected as the meta-classifier. Finally, the original Stacking algorithm is improved by feeding these weighted probability values, together with the original text feature vector, to the meta-classifier to accomplish emotion classification of micro-blog text.
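The weighted Stacking procedure described above might look roughly like the following scikit-learn sketch. It is not the authors' implementation, and several details are assumptions: the helper name `weighted_meta_features` is hypothetical, each weight is taken to be the base classifier's out-of-fold accuracy normalized to sum to one, synthetic data stands in for the micro-blog vectors, and XGBoost is left out so the sketch depends on scikit-learn only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict


def weighted_meta_features(X, y, base_models, cv=5):
    """Build meta-classifier inputs from weighted class probabilities."""
    # Out-of-fold class probability vectors from 5-fold cross-validation.
    probas = [cross_val_predict(m, X, y, cv=cv, method="predict_proba")
              for m in base_models]
    classes = np.unique(y)
    # Assumption: weight = out-of-fold accuracy, normalized to sum to 1.
    accs = np.array([np.mean(classes[p.argmax(axis=1)] == y) for p in probas])
    weights = accs / accs.sum()
    # Shape: (n_base_models, n_samples, n_classes).
    weighted = np.stack([w * p for w, p in zip(weights, probas)])
    # Per sample and class: max, min and mean over the base classifiers.
    stats = np.hstack([weighted.max(axis=0),
                       weighted.min(axis=0),
                       weighted.mean(axis=0)])
    # Concatenate with the original feature vectors.
    return np.hstack([stats, X]), weights


# Synthetic stand-in for the micro-blog text vectors (binary labels).
X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=5, random_state=0)
base_models = [AdaBoostClassifier(random_state=0),
               RandomForestClassifier(n_estimators=50, random_state=0),
               GradientBoostingClassifier(random_state=0)]

Z, weights = weighted_meta_features(X, y, base_models)
meta = LogisticRegression(max_iter=1000).fit(Z, y)
```

To classify unseen texts, the base classifiers would also need to be refit on the full training set and their weighted probabilities computed in the same way; that step is omitted here for brevity.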
Experimental results on a micro-blog text data set show that the proposed method represents text vectors better, and that the improved Stacking ensemble classifier with weighting outperforms single emotion classifiers. Compared with other emotion classification methods, the proposed method improves accuracy by 1.75% to 4.90%, effectively improving emotion classification.
URL: http://clgzk.qks.cqut.edu.cn/EN/
http://clgzk.qks.cqut.edu.cn/EN/Y2023/V37/I7/245