
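Hierarchical Classification Method for Power Text Based on Category Hybrid Embedding
Citation: CHEN Xiaona, GAO Pengfei, LIANG Yue, MA Yinglong. Hierarchical Classification Method for Power Text Based on Category Hybrid Embedding[J]. Journal of Peking University (Natural Science Edition), 2022, 58(1): 77-82.
Authors: CHEN Xiaona, GAO Pengfei, LIANG Yue, MA Yinglong
Affiliation: School of Control and Computer Engineering, North China Electric Power University, Beijing 102206
Funding: Supported by a project of the National Key Research and Development Program of China (2018YFC0831404)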
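Abstract: Current power text classification methods ignore the latent semantic associations between category labels, which leads to poor classification performance. To address this problem, a power text classification method based on a hierarchical classification model is proposed. First, from collected unstructured documents describing power-industry research results, a multi-label training set for power text classification is built with automated information extraction and annotation techniques, and hierarchical relationships between category labels are constructed through domain knowledge analysis. Then, a text classification model, HONLSTM-BERT, based on hybrid embeddings of category structure and label semantics is proposed; it exploits the hierarchical structure among category labels to perform top-down hierarchical text classification. Finally, experimental comparison with currently popular text classification models shows that HONLSTM-BERT achieves better classification accuracy and can effectively improve the performance of automatic power text classification.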

Keywords: power information technology; power text classification; hierarchical text classification; category embedding
Received: 2021-05-31
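The abstract states that the category labels are organized into a hierarchy and used to build a multi-label training set, but it does not list the categories or say how annotations are kept consistent with the hierarchy. The sketch below assumes a hypothetical two-level tree (the `PARENT` map and all label names are illustrative, not the paper's categories) and assumes that each annotation is closed upward, so a document tagged with a child label also carries its parent. It then groups the labels by level, which is one plausible way to prepare per-level targets for a top-down classifier.

```python
from typing import Dict, List, Optional, Set

# Hypothetical two-level label hierarchy: each label maps to its parent
# (None marks a top-level category). Names are illustrative only.
PARENT: Dict[str, Optional[str]] = {
    "generation": None,
    "transmission": None,
    "thermal_power": "generation",
    "wind_power": "generation",
    "uhv_lines": "transmission",
    "substations": "transmission",
}


def ancestors(label: str) -> List[str]:
    """Return the ancestors of `label`, nearest parent first."""
    chain: List[str] = []
    parent = PARENT[label]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def close_upward(labels: Set[str]) -> Set[str]:
    """Make a multi-label annotation hierarchy-consistent by adding every ancestor."""
    closed = set(labels)
    for label in labels:
        closed.update(ancestors(label))
    return closed


def depth(label: str) -> int:
    """Level of a label in the tree (0 = top level)."""
    return len(ancestors(label))


def split_by_level(labels: Set[str]) -> Dict[int, Set[str]]:
    """Group a closed label set by hierarchy level."""
    levels: Dict[int, Set[str]] = {}
    for label in labels:
        levels.setdefault(depth(label), set()).add(label)
    return levels


if __name__ == "__main__":
    doc_labels = close_upward({"wind_power", "uhv_lines"})
    print(sorted(doc_labels))
    # ['generation', 'transmission', 'uhv_lines', 'wind_power']
    print({k: sorted(v) for k, v in sorted(split_by_level(doc_labels).items())})
    # {0: ['generation', 'transmission'], 1: ['uhv_lines', 'wind_power']}
```

Closing annotations upward guarantees that no document carries a child label without its parent, which is the consistency a top-down classifier relies on when it only descends below accepted parents.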

A Category Hybrid Embedding Based Approach for Power Text Hierarchical Classification
CHEN Xiaona, GAO Pengfei, LIANG Yue, MA Yinglong. A Category Hybrid Embedding Based Approach for Power Text Hierarchical Classification[J]. Acta Scientiarum Naturalium Universitatis Pekinensis, 2022, 58(1): 77-82.
Authors: CHEN Xiaona, GAO Pengfei, LIANG Yue, MA Yinglong
Institution:School of Control and Computer Engineering, North China Electric Power University, Beijing 102206
Abstract: Current power text classification methods ignore the latent semantic associations between category labels, which limits their classification performance. To address this problem, a hierarchical multi-label classification method for power texts is proposed. First, a multi-label power text dataset is built from unstructured power-domain documents using automatic information extraction, and the hierarchical structural relationships between categories are constructed using relevant domain knowledge. Second, a text classification model, HONLSTM-BERT, is proposed; it is based on hybrid embeddings of category structure and label semantics and classifies power texts hierarchically in a top-down manner. Finally, experiments comparing it with popular text classification methods show that HONLSTM-BERT achieves higher classification accuracy and can effectively improve the performance of automatic power text classification.
Keywords: power information technology; power text classification; hierarchical text classification; category embedding
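The top-down classification step can be illustrated as a walk down the label tree that only scores the children of labels already accepted at the level above. This is a sketch under stated assumptions: the label tree (`CHILDREN`), the toy `keyword_scorer`, and the independent per-child threshold of 0.5 are all hypothetical. The abstract does not describe how HONLSTM-BERT scores candidates or selects labels, so the scorer is a pluggable stand-in for the real model, not a reproduction of it.

```python
from typing import Callable, Dict, List

# Hypothetical label tree: parent -> children; "ROOT" is a virtual root node.
CHILDREN: Dict[str, List[str]] = {
    "ROOT": ["generation", "transmission"],
    "generation": ["thermal_power", "wind_power"],
    "transmission": ["uhv_lines", "substations"],
}

# Assumed interface: (text, candidate labels) -> one score per candidate.
Scorer = Callable[[str, List[str]], List[float]]


def top_down_predict(text: str, score: Scorer, threshold: float = 0.5) -> List[str]:
    """Descend level by level into every child whose score passes the threshold."""
    predicted: List[str] = []
    frontier = ["ROOT"]
    while frontier:
        next_frontier: List[str] = []
        for node in frontier:
            children = CHILDREN.get(node, [])
            if not children:
                continue  # leaf label: nothing further to classify
            for child, s in zip(children, score(text, children)):
                if s >= threshold:
                    predicted.append(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return predicted


def keyword_scorer(text: str, candidates: List[str]) -> List[float]:
    """Toy stand-in scorer: 1.0 if any keyword for a label occurs in the text."""
    keywords = {
        "generation": ["turbine", "plant"],
        "transmission": ["line", "grid"],
        "thermal_power": ["coal", "boiler"],
        "wind_power": ["wind", "turbine"],
        "uhv_lines": ["uhv"],
        "substations": ["substation", "transformer"],
    }
    lower = text.lower()
    return [1.0 if any(k in lower for k in keywords[c]) else 0.0 for c in candidates]


if __name__ == "__main__":
    print(top_down_predict("Blade icing on a wind turbine plant", keyword_scorer))
    # ['generation', 'wind_power']
```

Because `transmission` is rejected at the top level in this example, its children are never scored. This pruning is what distinguishes top-down decoding from flat multi-label classification, and it also means an error at an upper level propagates to the levels below it.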