
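Financial topical crawler based on SVM prediction (基于SVM预测的金融主题爬虫)
Cite this article: 陈黎 (Chen Li). Financial topical crawler based on SVM prediction [J]. Journal of Sichuan University (Natural Science Edition) (四川大学学报(自然科学版)), 2010, 47(2).
Author: 陈黎 (Chen Li)
Abstract: With the explosion of information on the Internet, retrieving user-relevant information with general-purpose search engines is becoming increasingly difficult, and topical crawlers have become an important tool for retrieving topic-relevant information on the Web. Most current topical crawlers based on classifier prediction are trained on the content of web pages from different categories, but at prediction time they can rely only on link information found in the parent page, which leads to low prediction accuracy. This paper trains an SVM classifier on category-labeled URLs together with their context and anchor text, uses two feature selection methods, DF (document frequency) and information gain, to filter features, compares experimentally the factors that affect the classifier, and runs online experiments with the classifier. The experiments show that this approach is highly efficient in actual prediction.

Keywords: topical crawler; classifier; support vector machine; feature selection; finance
Revised: 6/2/2009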
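
The abstract says the classifier is trained on each URL together with its context and anchor text, since that is all a crawler sees in the parent page before fetching the target. As a rough illustration only, the sketch below pulls those three pieces out of a parent page using Python's standard `html.parser`. The names `LinkContextParser` and `link_features`, the `window` parameter, and the regex tokenization are assumptions made for this example and are not taken from the paper; Chinese text in particular would need a proper word segmenter in place of the regex.

```python
import re
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Record visible text tokens in document order and where each <a> sits among them."""

    def __init__(self):
        super().__init__()
        self.tokens = []   # all text tokens of the parent page, in order
        self.links = []    # one dict per link: href, start, end (token indices)
        self._open = None  # link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._open = {"href": href, "start": len(self.tokens)}

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            self._open["end"] = len(self.tokens)
            self.links.append(self._open)
            self._open = None

    def handle_data(self, data):
        # Hypothetical tokenization: lowercase runs of word characters.
        self.tokens.extend(re.findall(r"\w+", data.lower()))


def link_features(html, window=5):
    """Return URL tokens, anchor tokens, and surrounding context tokens for each link."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    features = []
    for link in parser.links:
        start, end = link["start"], link["end"]
        before = parser.tokens[max(0, start - window):start]
        after = parser.tokens[end:end + window]
        features.append({
            "url": link["href"],
            "url_tokens": re.findall(r"[a-z0-9]+", link["href"].lower()),
            "anchor": parser.tokens[start:end],
            "context": before + after,
        })
    return features
```

Each returned dictionary could then be turned into a term vector and labeled with the category of the page the link points to, which is one way to make the training data resemble what the crawler has available at prediction time.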

Financial topical crawler based on SVM prediction
Chen Li. Financial topical crawler based on SVM prediction [J]. Journal of Sichuan University (Natural Science Edition), 2010, 47(2).
Authors: Chen Li
Abstract: With the rapid growth of information and the explosion of web pages on the World Wide Web, it is getting harder for general-purpose crawlers to retrieve the information relevant to a user. Topical crawlers are becoming important tools for gathering web pages on a specific topic. The training sets of most topical crawlers based on classifier prediction consist of the content of web pages from different categories, but in practice the classifier can predict only from link information in the parent page. Because the training and prediction data differ in kind, the accuracy of such classifiers is low. In this paper, an SVM classifier is trained on category-labeled URLs together with their contexts and anchor texts, with features selected by two methods, document frequency (DF) and information gain. Experiments compare the factors that affect the classifier, and the classifier is also evaluated online. The results indicate high accuracy in actual prediction.
Keywords: topical crawler; classifier; support vector machine; feature selection; finance
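
The abstract names two feature selection methods, document frequency (DF) and information gain, without giving formulas. The sketch below uses the standard text-classification definitions of both: DF counts the documents that contain a term, and information gain is the reduction in class entropy from knowing whether a term is present. The function names, the toy data in the usage note, and the choice of base-2 logarithms are assumptions for illustration; the paper's exact variants and thresholds are not stated here.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Map each term to the number of documents (token lists) that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels):
    """Map each term to H(class) - H(class | term present or absent)."""
    n = len(docs)
    class_counts = Counter(labels)
    base = entropy(class_counts.values())
    present = {}  # term -> Counter of labels over documents containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            present.setdefault(term, Counter())[label] += 1
    gains = {}
    for term, with_term in present.items():
        n_with = sum(with_term.values())
        without = [class_counts[c] - with_term[c] for c in class_counts]
        conditional = (n_with / n) * entropy(with_term.values()) \
            + ((n - n_with) / n) * entropy(without)
        gains[term] = base - conditional
    return gains


def top_k(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

For example, with `docs = [["stock", "price"], ["stock", "bond"], ["football", "news"]]` and `labels = ["fin", "fin", "other"]`, calling `top_k(information_gain(docs, labels), 1)` ranks `stock` first because it separates the two classes perfectly. The terms that survive selection would then define the vector space on which a classifier such as an SVM is trained.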