
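Implementation of Web Page Hyperlink Extraction and Automatic Classification

Cite this article:	GU Xiao-hua, GUO Jun-cheng. Implementation of Web Page Hyperlink Extraction and Automatic Classification[J]. Journal of Hebei University (Natural Science Edition), 2007, 27(1): 99-102. DOI: 10.3969/j.issn.1000-1565.2007.01.025

Authors:	GU Xiao-hua, GUO Jun-cheng

Affiliation:	College of Management, Hebei University, Baoding, Hebei 071002

Abstract:	To give a personalized information service system in a networked environment the ability to build its database automatically, this paper proposes a technical scheme for a web spider program. The scheme uses the TIdHTTP component provided by the Delphi IDE to fetch web page text, uses regular expressions and the MSHTML component to extract hyperlink information from that text and save it to a specified database, and can automatically classify web page text by statistical means.
Keywords:	hyperlink extraction; regular expression; DELPHI
Article ID:	1000-1565(2007)01-0099-04
Revised:	2006-11-10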
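
The fetch, extract, and store steps named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than Delphi, so `urlopen` stands in for TIdHTTP, a single regular expression stands in for the combination of regular expressions and MSHTML, and SQLite stands in for the unspecified database. The function names, the `links` table, and the pattern itself are assumptions made for illustration and are not taken from the paper.

```python
import re
import sqlite3
from urllib.parse import urljoin
from urllib.request import urlopen

# Matches href="..." or href='...' inside an <a> tag. This is a simplification:
# unquoted attribute values and badly malformed markup are not handled.
HREF_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1',
    re.IGNORECASE | re.DOTALL,
)


def fetch(url, timeout=10):
    """Download a page and decode it to text (stand-in for the TIdHTTP step)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_links(html, base_url):
    """Return absolute link targets in document order, without duplicates."""
    links = []
    for match in HREF_RE.finditer(html):
        url = urljoin(base_url, match.group(2).strip())
        if url not in links:
            links.append(url)
    return links


def save_links(conn, page_url, links):
    """Store (page, target) pairs; the table layout is hypothetical."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links "
        "(page TEXT, target TEXT, UNIQUE(page, target))"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO links (page, target) VALUES (?, ?)",
        [(page_url, link) for link in links],
    )
    conn.commit()
```

A crawler built this way would call `fetch` on a start URL, pass the result to `extract_links`, and hand the list to `save_links`. A regular expression is enough for well-formed anchors, while a real HTML parser copes better with irregular markup, which may be why the scheme pairs regular expressions with MSHTML.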
A Scheme of Extraction Hyperlink from Web Page and Automatic Classification

GU Xiao-hua, GUO Jun-cheng. A Scheme of Extraction Hyperlink from Web Page and Automatic Classification[J]. Journal of Hebei University (Natural Science Edition), 2007, 27(1): 99-102. DOI: 10.3969/j.issn.1000-1565.2007.01.025

Authors:	GU Xiao-hua, GUO Jun-cheng

Abstract:	To enable an individualized information service system in a network environment to build its database automatically, this paper proposes a basic technical scheme for a web crawler. The scheme uses the TIdHTTP component in the Delphi IDE to capture text from web pages, and uses regular expressions and the MSHTML component to extract hyperlink data from that text. Finally, the scheme provides a simple algorithm for classifying the content of web pages automatically.

Keywords:	extraction of hyperlink; regular expression; DELPHI
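
The abstract does not describe how the classification step works beyond calling it simple and automatic. One plausible reading is a keyword-frequency count, sketched below in Python under that assumption. The category names, keyword sets, and the `classify` function are hypothetical and are not the authors' algorithm.

```python
import re
from collections import Counter

# Hypothetical category vocabularies; the paper's actual categories and
# keyword lists are not given in the abstract.
CATEGORY_KEYWORDS = {
    "sports": {"match", "team", "score", "league"},
    "finance": {"stock", "market", "bank", "fund"},
}

TAG_RE = re.compile(r"<[^>]+>")   # crude tag stripper, adequate for a sketch
WORD_RE = re.compile(r"[a-z]+")   # lowercase ASCII words only


def classify(html, categories=CATEGORY_KEYWORDS):
    """Return the category whose keywords occur most often, or None if none occur."""
    text = TAG_RE.sub(" ", html).lower()
    counts = Counter(WORD_RE.findall(text))
    scores = {
        name: sum(counts[word] for word in words)
        for name, words in categories.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

For example, `classify("<p>The team won the match</p>")` selects `"sports"` because two of its keywords appear and no finance keyword does. Chinese page text would need a word segmentation step in place of `WORD_RE`, which this sketch leaves out.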
This article is indexed in CNKI, Wanfang Data, and other databases.