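Research on the Application of an Improved Apriori Algorithm to Examination Paper Evaluation

Cite this article:	CHEN Shi-bao, WU Guo-feng. Research on the Application of an Improved Apriori Algorithm to Examination Paper Evaluation[J]. Journal of Jinggangshan University (Natural Sciences Edition), 2012(2):58-62.

Authors:	CHEN Shi-bao (陈世保), WU Guo-feng (吴国凤)

Affiliations:	[1] School of Software, Hunan University, Changsha, Hunan 410082 [2] Xiamen University of Technology, Xiamen, Fujian 361021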

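Abstract:	Based on an in-depth study of web information gathering technology, this paper proposes a news gathering model based on Web structure. After loading the gathering entry address, the model determines the news list page using information gathering and filtering algorithms, automatically identifies the link addresses of news content pages using regular expressions, visits the target news content pages, and automatically extracts the news data with a gathering algorithm. It can also filter out information embedded in the page, such as advertisements. Practical results show that the model works well and gathers news information automatically and efficiently.
Keywords:	information gathering; Web structure; regular expressions; data mining; news gathering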
DESIGN AND IMPLEMENTATION OF NEWS GATHERING SYSTEM BASED ON WEB STRUCTURE

CHEN Shi-bao, WU Guo-feng. DESIGN AND IMPLEMENTATION OF NEWS GATHERING SYSTEM BASED ON WEB STRUCTURE[J]. Journal of Jinggangshan University (Natural Sciences Edition), 2012(2):58-62.

Authors:	CHEN Shi-bao, WU Guo-feng

Institution:	CHEN Jian-guo (1. Software School of Hunan University, Changsha, Hunan 410082, China; 2. Xiamen University of Technology, Xiamen, Fujian 361021, China)

Abstract:	Based on an in-depth study of web information gathering technology, a Web structure-based news gathering model is proposed. The model loads the gathering entry address, finds the news list page using information gathering and filtering algorithms, and automatically identifies and refines the link addresses of news content pages according to the configured gathering rules and regular expressions. It then loads the target news content page and automatically extracts the news information with a gathering algorithm. At the same time, it can filter out information embedded in the page, such as advertisements. Practical results show that the proposed model works well and gathers news information efficiently and automatically.

Keywords:	information gathering; Web structure; regular expressions; data mining; news gathering
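
The abstract describes the pipeline only in prose, so the following is a minimal Python sketch of two of its steps under stated assumptions, not the authors' implementation. It shows how regular expressions might pick content-page links out of a list page and how embedded advertisements might be dropped before the text is extracted. The URL pattern (`/news/<digits>.html`), the ad marker (a `div` whose class contains `ad`), and both function names are hypothetical choices made for illustration.

```python
import re
from urllib.parse import urljoin

# Hypothetical rule: content pages are linked as .../news/<digits>.html
CONTENT_LINK_RE = re.compile(
    r'href=["\']([^"\']*/news/\d+\.html)["\']', re.IGNORECASE
)
# Hypothetical rule: an ad is a non-nested <div> whose class contains the word "ad"
AD_BLOCK_RE = re.compile(
    r'<div[^>]*class=["\'][^"\']*\bad\b[^"\']*["\'][^>]*>.*?</div>',
    re.IGNORECASE | re.DOTALL,
)
TAG_RE = re.compile(r'<[^>]+>')


def extract_content_links(list_page_html, base_url):
    """Return absolute content-page URLs from a list page, in order, without repeats."""
    links = []
    for href in CONTENT_LINK_RE.findall(list_page_html):
        url = urljoin(base_url, href)  # turn relative links into absolute ones
        if url not in links:
            links.append(url)
    return links


def extract_news_text(content_page_html):
    """Drop ad blocks, strip the remaining tags, and normalize whitespace."""
    without_ads = AD_BLOCK_RE.sub('', content_page_html)
    text = TAG_RE.sub(' ', without_ads)
    return ' '.join(text.split())
```

A real collector would fetch the pages over HTTP and would likely use an HTML parser for nested markup; the sketch works on HTML strings so that the matching rules stay visible.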
This article is indexed in CNKI, VIP, and other databases.