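Word Clouds and Topic Model Mining Based on Statistical Analysis
Cite this article: CHENG Yu-sheng, LIANG Hui. Word Clouds and Topic Model Mining Based on Statistical Analysis [J]. Journal of Anqing Teachers College (Natural Science Edition), 2014(1): 32-35, 53.
Authors: CHENG Yu-sheng, LIANG Hui
Affiliations: School of Computer and Information, Anqing Teachers College, Anqing 246133, Anhui, China; Institute of Statistics, Anqing Teachers College, Anqing 246133, Anhui, China
Funding: Supported by the Natural Science Foundation of Anhui Province (10040606Q42)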
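Abstract (translated from the Chinese): The rapid development of the Internet and other information technologies has left networks holding large volumes of semi-structured and unstructured text data. Extracting the needed information from these massive collections of electronic documents and presenting it as efficient, intuitive infographics has become a major task for statistical analysts. The word cloud is a new way of displaying text in infographic form. Using word clouds together with topic-model text mining, the text is first preprocessed by removing digits and stop words; Chinese word segmentation is then performed, a corpus is built, and a document-term matrix is constructed; finally, the mining results are presented as word clouds and topic models. The experiments were carried out mainly in R on several years of rough set conference summaries, and the results were compared with the Tagxedo word cloud generator. The results show that important information in the text, such as its topics, is relatively easy to obtain from the word cloud, and that the mining performs well.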

Keywords (translated from the Chinese): text mining; word cloud; topic model; statistical analysis; rough set

Word Clouds and Topic Model Mining Based on Statistical Analysis
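CHENG Yu-sheng, LIANG Hui. Word Clouds and Topic Model Mining Based on Statistical Analysis [J]. Journal of Anqing Teachers College (Natural Science Edition), 2014(1): 32-35, 53.
Authors: CHENG Yu-sheng, LIANG Hui
Institution: School of Computer and Information, Anqing Teachers College, Anqing 246133, Anhui, China; Institute of Statistics, Anqing Teachers College, Anqing 246133, Anhui, China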
Abstract: With the rapid development of the Internet and other information technologies, networks have accumulated vast amounts of semi-structured and unstructured text data. Extracting the required information from these massive electronic documents and presenting it as an efficient, visual information graphic has become a primary task for statistical analysts. The word cloud is a new way of displaying text as an information graphic. In the present work, we use a text mining method based on word clouds and topic models: we preprocess the text by removing numbers and stop words, then perform Chinese word segmentation, build a corpus, and set up a document-term matrix. Finally, we present the mining results as word clouds and topic models. The experiment statistically analyzes rough set conference summaries using the R language and compares the results with the Tagxedo word cloud generator. The results indicate that the method mines the text effectively and that important information, such as the topics, is easy to obtain from the word clouds.
Keywords:text mining  word clouds  topic model  statistical analysis  rough set
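
The pipeline outlined in the abstract (remove digits, drop stop words, build a document-term matrix, then weight terms for a word cloud) can be illustrated with a minimal sketch. The paper reports using R; the version below uses plain Python with only the standard library, purely for illustration, and is not the authors' implementation. The toy documents, the stop-word list, the function names, and the font-size range are all hypothetical. The documents are assumed to be already word-segmented, so the Chinese segmentation step and the topic model itself are outside this sketch.

```python
import re
from collections import Counter

# Hypothetical toy documents, assumed to be already word-segmented
# (tokens separated by spaces). A real Chinese corpus would first need
# a word segmenter, which this sketch does not include.
DOCS = [
    "rough set theory 2012 and the attribute reduction",
    "attribute reduction of rough set and granular computing 2013",
    "granular computing and the rough set model",
]

# Hypothetical stop-word list for the toy data.
STOP_WORDS = {"and", "the", "of"}


def preprocess(text):
    """Remove digits, lowercase, split on whitespace, drop stop words."""
    text = re.sub(r"\d+", " ", text)
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, matrix); matrix[i][j] counts term vocab[j] in document i."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def cloud_sizes(vocab, matrix, min_size=10, max_size=40):
    """Map corpus-wide term frequency linearly onto a font-size range."""
    totals = [sum(col) for col in zip(*matrix)]
    lo, hi = min(totals), max(totals)
    span = (hi - lo) or 1  # avoid division by zero when all totals are equal
    return {
        term: min_size + (max_size - min_size) * (total - lo) / span
        for term, total in zip(vocab, totals)
    }


vocab, dtm = document_term_matrix(DOCS)
sizes = cloud_sizes(vocab, dtm)
```

Here `sizes` maps each term to a font size that a word cloud renderer could use, so the most frequent terms in the corpus are drawn largest. The linear scaling is only one possible choice; other weightings would fit the same document-term matrix.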
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.