Chinese Short Text Classification Algorithm Based on Stacking-BERT Ensemble Learning
Cite this article: Zheng Chengyu, Wang Xin, Wang Ting, Yin Tiantian, Deng Yaping. Chinese short text classification algorithm based on Stacking-BERT ensemble learning[J]. Science Technology and Engineering, 2022, 22(10): 4033-4038.
Authors: Zheng Chengyu  Wang Xin  Wang Ting  Yin Tiantian  Deng Yaping
Affiliation: School of Mathematics and Computer Science, Yunnan Minzu University, Kunming 650500, China
Funding: National Natural Science Foundation of China (61363022); Scientific Research Fund of the Yunnan Provincial Department of Education (2021Y670)
Abstract: Static word-vector representations such as word2vec and GloVe cannot fully capture text semantics, and the predictive performance of current mainstream neural network models on text classification often depends on the specific problem, leaving them with poor adaptability across scenarios and weak generalization. To address these problems, a Chinese short text classification method based on a multi-base-model framework (Stacking-BERT) is proposed. The model uses the BERT pre-trained language model to build character-level vector representations of the text and output deep feature vectors. Neural network models such as TextCNN, DPCNN, TextRNN, and TextRCNN then serve as heterogeneous base classifiers, whose different feature representations of the text vectors are combined through Stacking ensemble learning to improve the model's generalization ability; finally, an SVM is trained as the meta-classifier for prediction. Comparative experiments with text classification algorithms such as word2vec-CNN, word2vec-BiLSTM, BERT-TextCNN, BERT-DPCNN, BERT-RNN, and BERT-RCNN on three publicly available Chinese datasets show that the Stacking-BERT ensemble model achieves the highest accuracy, precision, recall, and F1 score, and can effectively improve the classification performance on Chinese short texts.

Keywords: multi-base model framework; BERT pre-trained language model; Stacking ensemble learning; short text classification
Received: 2021-06-28
Revised: 2022-03-23
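The abstract describes a two-stage pipeline: BERT produces character-level sentence vectors, several heterogeneous classifiers are trained on them, and their cross-validated predictions feed an SVM meta-classifier. The sketch below illustrates that stacking scheme under stated assumptions only: the public bert-base-chinese checkpoint stands in for the paper's unspecified BERT model, small scikit-learn classifiers stand in for the TextCNN/DPCNN/TextRNN/TextRCNN base models, and the corpus is a toy example. It is a minimal illustration, not the authors' implementation.

# Minimal sketch of the Stacking-BERT idea, NOT the authors' implementation.
# Assumptions: bert-base-chinese stands in for the paper's unspecified BERT
# checkpoint; small scikit-learn models stand in for the neural base
# classifiers (TextCNN, DPCNN, TextRNN, TextRCNN) named in the abstract.
import numpy as np
import torch
from transformers import BertModel, BertTokenizer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Toy labeled corpus: 3 classes (finance / tech / sports), 4 samples each.
texts = [
    "股市今日大幅上涨", "央行宣布下调利率", "基金收益持续走高", "银行发布年度财报",
    "新款手机正式发布", "芯片技术取得突破", "操作系统迎来更新", "人工智能发展迅速",
    "球队晋级总决赛", "运动员打破世界纪录", "主场球迷欢呼庆祝", "教练公布首发阵容",
]
y = np.repeat([0, 1, 2], 4)

# Stage 1: character-level sentence vectors from BERT ([CLS] token output).
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese").eval()
with torch.no_grad():
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=32, return_tensors="pt")
    X = bert(**enc).last_hidden_state[:, 0, :].numpy()  # shape (12, 768)

# Stage 2: heterogeneous base classifiers; their cross-validated (cv=2)
# out-of-fold predictions become the features for the SVM meta-classifier.
base_models = [
    ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
    ("logreg", LogisticRegression(max_iter=1000)),
]
stacker = StackingClassifier(estimators=base_models,
                             final_estimator=SVC(), cv=2)
stacker.fit(X, y)
print(stacker.predict(X))  # predicted labels for the training sentences

The design choice the abstract highlights, several heterogeneous base learners feeding a single SVM meta-learner, maps directly onto the estimators/final_estimator split above; substituting the paper's neural base models would change only the estimators list.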

Indexed by: Wanfang Data and other databases.