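Application of an Improved Apriori Algorithm in Examination Paper Evaluation
Cite this article: CHEN Shi-bao, WU Guo-feng. Application of an Improved Apriori Algorithm in Examination Paper Evaluation [J]. Journal of Jinggangshan University (Natural Sciences Edition), 2012(2): 58-62.
Authors: CHEN Shi-bao, WU Guo-feng
Affiliations: [1] School of Software, Hunan University, Changsha, Hunan 410082; [2] Xiamen University of Technology, Xiamen, Fujian 361021
Abstract: Building on an in-depth study of web information gathering technology, this paper proposes a news gathering model based on Web structure. After loading the entry address, the model determines the news list page through information gathering and filtering algorithms, uses regular expressions to automatically identify the link addresses of news content pages, visits the target content pages, and applies a gathering algorithm to automatically extract the news data. It can also filter out advertisements and other information embedded in the page. Practical results show that the model works well and gathers news information automatically and efficiently.

Keywords: information gathering; Web structure; regular expressions; data mining; news gathering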

DESIGN AND IMPLEMENTATION OF NEWS GATHERING SYSTEM BASED ON WEB STRUCTURE
CHEN Shi-bao, WU Guo-feng. DESIGN AND IMPLEMENTATION OF NEWS GATHERING SYSTEM BASED ON WEB STRUCTURE [J]. Journal of Jinggangshan University (Natural Sciences Edition), 2012(2): 58-62.
Authors: CHEN Shi-bao, WU Guo-feng
Institution: 1. Software School of Hunan University, Changsha, Hunan 410082, China; 2. Xiamen University of Technology, Xiamen, Fujian 361021, China
Abstract: Based on an in-depth study of web information gathering technology, a news gathering model based on Web structure is proposed. The model loads the gathering entry address, finds the news list page using information gathering and filtering algorithms, and automatically identifies and refines the link addresses of news content pages according to the configured gathering rules and regular expressions. It then loads each target news content page and automatically extracts the news information with the gathering algorithm. At the same time, it can filter out information embedded in the page, such as advertising. Practical results show that the proposed model works well and gathers news information efficiently and automatically.
Keywords: information gathering; Web structure; regular expressions; data mining; news gathering
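
The pipeline outlined in the abstract (list page, regular-expression link identification, content extraction, ad filtering) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the URL pattern, the `<div class="ad">` convention, the sample pages, and the function names `extract_article_links` and `extract_news` are all assumptions made for illustration, and the pages are in-memory strings rather than fetched documents.

```python
import re
from urllib.parse import urljoin

# Assumed rule: content pages look like /news/<digits>.html
ARTICLE_LINK = re.compile(r'href="([^"]*/news/\d+\.html)"')
# Assumed rule: ads sit in a non-nested <div class="ad"> ... </div>
AD_BLOCK = re.compile(r'<div class="ad">.*?</div>', re.S)
TITLE = re.compile(r'<h1>(.*?)</h1>', re.S)
PARAGRAPH = re.compile(r'<p>(.*?)</p>', re.S)


def extract_article_links(list_html, base_url):
    """Return absolute, de-duplicated content-page URLs found in a list page."""
    links = []
    for href in ARTICLE_LINK.findall(list_html):
        url = urljoin(base_url, href)  # resolve relative links against the entry URL
        if url not in links:
            links.append(url)
    return links


def extract_news(content_html):
    """Drop ad blocks, then pull the title and paragraph text from a content page."""
    cleaned = AD_BLOCK.sub("", content_html)
    title = TITLE.search(cleaned)
    return {
        "title": title.group(1).strip() if title else "",
        "body": "\n".join(p.strip() for p in PARAGRAPH.findall(cleaned)),
    }


# Hypothetical sample pages standing in for fetched HTML.
LIST_PAGE = '''
<ul>
  <li><a href="/news/101.html">First story</a></li>
  <li><a href="/about.html">About us</a></li>
  <li><a href="/news/102.html">Second story</a></li>
</ul>
'''
CONTENT_PAGE = '''
<h1> First story </h1>
<div class="ad"><p>Buy now!</p></div>
<p>Paragraph one.</p>
<p>Paragraph two.</p>
'''

links = extract_article_links(LIST_PAGE, "http://example.com/index.html")
article = extract_news(CONTENT_PAGE)
```

Regular expressions are adequate here only because the sample markup is rigid; a real site would likely need per-site rules or an HTML parser for the extraction step.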
This article is indexed in CNKI, VIP (维普), and other databases.