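Acquisition and Public Opinion Analysis of Sina Weibo Posts Containing Keywords
Cite this article: JIANG Lin-sheng, ZHANG Chun-xia. Acquisition and public opinion analysis of Sina Weibo posts containing keywords[J]. Journal of Baoji College of Arts and Science (Natural Science Edition), 2014(1): 51-54.
Authors: JIANG Lin-sheng (江林升), ZHANG Chun-xia (张春霞)
Affiliation: Nanjing Forest Police College, Nanjing 210023, Jiangsu, China
Funding: Research Project of Nanjing Forest Police College (RWZD201352); Jiangsu Province Higher Education Teaching Reform Research Project (2013JSJG199)
Abstract: Objective: To automatically capture Sina Weibo posts that contain specified keywords and, by analyzing the captured posts, identify hot topics in public opinion. Methods: First, a multithreaded crawler automatically collects posts containing the specified keywords and stores them in a database. Next, the posts are segmented into words with the reverse maximum matching method, a string-matching approach, and the TF-IDF weight of each term is computed as the input for text clustering. Finally, the k-means algorithm performs the cluster analysis that yields the hot topics. Results and Conclusion: The method automatically captures Sina Weibo posts containing the specified keywords. After clustering, the posts in each cluster are highly consistent in content and share a common theme, so hot topics can be located quickly, which helps in understanding and guiding public opinion in a timely manner.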

Keywords: Weibo; crawler; clustering; public opinion
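
The segmentation step named in the abstract, reverse maximum matching, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `reverse_max_match`, the toy dictionary `vocab`, and the `max_len` window are all hypothetical choices made for the example.

```python
def reverse_max_match(text, vocab, max_len=4):
    """Segment `text` right to left, taking the longest dictionary word each time.

    `vocab` and `max_len` are illustrative assumptions, not values from the paper.
    """
    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest candidate ending at `end` first, then shrink it.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            # A single character is always accepted as the fallback.
            if size == 1 or piece in vocab:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


# Toy dictionary (hypothetical); a real system would load a full lexicon.
vocab = {"研究", "研究生", "生命", "起源"}
print(reverse_max_match("研究生命的起源", vocab))
# ['研究', '生命', '的', '起源']
```

With the same dictionary, a left-to-right maximum match would first take "研究生" and strand "命" as a single character, which is one common reason for scanning from the end of the sentence.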

The analysis method and obtainment of public sentiment based on Sina Weibo with the specified keyword
JIANG Lin-sheng, ZHANG Chun-xia. The analysis method and obtainment of public sentiment based on Sina Weibo with the specified keyword[J]. Journal of Baoji College of Arts and Science (Natural Science Edition), 2014(1): 51-54.
Authors: JIANG Lin-sheng, ZHANG Chun-xia
Institution: Nanjing Forest Police College, Nanjing 210023, Jiangsu, China
Abstract: Objective: To automatically capture micro-blogs containing specified keywords from Sina Weibo and, by analyzing them, identify hot topics in public opinion. Methods: First, a multithreaded crawler automatically collects micro-blogs containing the specified keywords and saves them in a database. Next, the micro-blogs are segmented into words with the reverse maximum matching method, a string-matching approach, and the TF-IDF weight of each term is calculated as the input data for text clustering. Finally, the k-means algorithm clusters the micro-blogs to reveal the hot topics. Results and Conclusion: The method automatically captures micro-blogs containing the specified keywords from Sina Weibo. After cluster analysis, the micro-blogs in each cluster are highly consistent in content and share a common theme, so hot topics in public opinion can be found quickly. This is of positive significance for understanding and guiding public opinion in a timely manner.
Keywords: Weibo; crawler; clustering; public opinion
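
The last two steps of the abstract, TF-IDF weighting followed by k-means clustering, can be sketched in plain Python as below. This is only an illustration: the tokenized posts are invented sample data, the function names are hypothetical, the TF-IDF variant (relative term frequency times the log of inverse document frequency) and the hand-picked initial centroids are assumptions, and none of it is taken from the paper itself.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into TF-IDF vectors over a shared, sorted vocabulary."""
    vocab = sorted({term for doc in docs for term in doc})
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed weighting: (count / doc length) * ln(N / df).
        vectors.append([(counts[t] / len(doc)) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_centroids, iterations=20):
    """Plain Lloyd iterations; the caller supplies the initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented, already-segmented posts (hypothetical sample data).
docs = [
    ["地铁", "延误", "乘客"],
    ["地铁", "延误", "故障"],
    ["暴雨", "预警", "停课"],
    ["暴雨", "预警", "积水"],
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, init_centroids=[vectors[0], vectors[2]])
print(labels)
# [0, 0, 1, 1]
```

The two subway posts and the two rainstorm posts end up in separate clusters, which mirrors the abstract's claim that posts within a cluster share a common theme. Supplying the initial centroids explicitly keeps the sketch deterministic; a practical system would more likely use randomized or k-means++ initialization.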
This article is indexed in CNKI, VIP (维普), and other databases.