
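Multi-label text classification based on word-label probability
Cite this article: ZHAO Hong, ZHENG Hou-ze, GUO Lan. Multi-label text classification based on word-label probability[J]. Journal of Lanzhou University of Technology, 2023, 49(1): 103.
Authors: ZHAO Hong  ZHENG Hou-ze  GUO Lan
Institution: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, Gansu, China
Funding: National Natural Science Foundation of China (62166025); Key Research and Development Program of Gansu Province (21YF5GA073)
Abstract: To address the problems of how to effectively extract text features and capture the latent correlations between labels in multi-label text classification, a model combining CNN (convolutional neural networks) with Bi-LSTM (bi-directional long short-term memory) is proposed. First, text features are extracted with a CNN and max pooling. Then, a trained Labeled-LDA (labeled latent Dirichlet allocation) model is used to obtain word-label probability information between all words and labels. Next, a Bi-LSTM network and a CNN extract word-label information features for each word in the text being predicted. Finally, these are combined with the extracted text features to predict the label set associated with the current text. Experimental results show that using word-label probabilities to capture the correlation between the words in a text and the labels effectively improves the model's F1 score.

Keywords: multi-label text classification  convolutional neural networks  bi-directional long short-term memory  labeled latent Dirichlet allocation
Received: 2021-09-10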
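
The abstract reports results as an F1 value but does not say which averaging is used. As a minimal sketch, assuming micro-averaging over label sets, the metric could be computed as follows; the function name `micro_f1` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import List, Set


def micro_f1(true_sets: List[Set[str]], pred_sets: List[Set[str]]) -> float:
    """Micro-averaged F1 over documents, each with a gold and a predicted label set."""
    tp = fp = fn = 0
    for gold, pred in zip(true_sets, pred_sets):
        tp += len(gold & pred)  # labels predicted and correct
        fp += len(pred - gold)  # labels predicted but not in the gold set
        fn += len(gold - pred)  # gold labels the model missed
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correct
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical labels, for illustration only).
gold = [{"sports", "health"}, {"politics"}, {"tech", "business"}]
pred = [{"sports"}, {"politics", "tech"}, {"tech", "business"}]
score = micro_f1(gold, pred)
print(round(score, 3))
```

Micro-averaging pools true positives, false positives, and false negatives across all documents before computing precision and recall, so frequent labels weigh more than rare ones; a macro average would instead compute F1 per label and take the mean.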

Multi-label text classification based on word-label probability
ZHAO Hong, ZHENG Hou-ze, GUO Lan. Multi-label text classification based on word-label probability[J]. Journal of Lanzhou University of Technology, 2023, 49(1): 103.
Authors: ZHAO Hong  ZHENG Hou-ze  GUO Lan
Institution: School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
Abstract: Multi-label text classification is an important task in natural language processing; its goal is to find, from a given label set, the subset of labels associated with a text. To address the problems of how to effectively extract text features and capture the potential correlations between labels, a model that combines convolutional neural networks (CNN) with bi-directional long short-term memory (Bi-LSTM) is proposed. First, text features are extracted with a CNN and max pooling. Then, a trained labeled latent Dirichlet allocation (Labeled-LDA) model is used to obtain word-label probability information for all words and labels. Next, a Bi-LSTM network and a CNN are used to extract word-label information features for each word in the text being predicted. Finally, these features are combined with the extracted text features to predict the label set associated with the text. Experimental results show that using word-label probabilities to capture the correlation between the words in a text and the labels effectively improves the model's F1 score.
Keywords: multi-label text classification  convolutional neural networks  bi-directional long short-term memory  labeled latent Dirichlet allocation
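
To make the idea of word-label probability information more concrete, here is a minimal sketch of how a per-word, per-label probability table might be built and looked up for a text. It is not the authors' method: the paper obtains these probabilities from a trained Labeled-LDA model, whereas this sketch swaps in a simple co-occurrence count estimate as a stand-in. The names `word_label_probs` and `doc_matrix` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple


def word_label_probs(
    docs: List[Tuple[List[str], Set[str]]], labels: List[str]
) -> Dict[str, List[float]]:
    """Estimate P(label | word) from co-occurrence counts.

    Simplified stand-in for the Labeled-LDA step (an assumption, not the
    paper's procedure): each word occurrence is credited equally to every
    label attached to its document.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, doc_labels in docs:
        for tok in tokens:
            for lab in doc_labels:
                counts[tok][lab] += 1
    table: Dict[str, List[float]] = {}
    for tok, c in counts.items():
        total = sum(c.values())
        table[tok] = [c[lab] / total for lab in labels]
    return table


def doc_matrix(
    tokens: List[str], table: Dict[str, List[float]], labels: List[str]
) -> List[List[float]]:
    """One row per token, one column per label; unseen words get a uniform row."""
    uniform = [1.0 / len(labels)] * len(labels)
    return [table.get(tok, uniform) for tok in tokens]


# Toy corpus (hypothetical, for illustration only).
labels = ["sports", "finance"]
train = [
    (["match", "goal", "team"], {"sports"}),
    (["stock", "market", "team"], {"finance"}),
    (["match", "sponsor", "stock"], {"sports", "finance"}),
]
table = word_label_probs(train, labels)
matrix = doc_matrix(["goal", "stock", "unknown"], table, labels)
for row in matrix:
    print([round(p, 3) for p in row])
```

The resulting matrix has shape (number of words, number of labels). In the pipeline the abstract describes, a sequence of this kind would be passed through the Bi-LSTM and CNN to produce word-label features, which are then combined with the CNN text features before the label set is predicted.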