Web Content Extraction & Its Data Management Method

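Cite this article:	ZHANG Cheng-hong,XIAO Jun-jian,ZHANG Cheng.Web Content Extraction & Its Data Management Method[J].Journal of Fudan University(Natural Science),2001,40(2):177-183.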

Authors:	ZHANG Cheng-hong, XIAO Jun-jian, ZHANG Cheng

Affiliation:	School of Management, Fudan University

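Abstract:	With the rapid development of the Internet and related technologies, the WWW has become the largest hub for gathering and distributing information, and for enterprises and individuals alike the Web is gradually becoming the primary information source. However, the sheer number of Web sites and the resulting flood of information make useful information increasingly hard to obtain. A search engine can only narrow the scope of a search; the specific content must still be found by browsing the results in detail. Moreover, Web page information is unstructured or semi-structured and cannot be processed directly by analysis tools. It is therefore necessary to provide a method that automatically extracts Web page content and structures Web page data, in order to simplify information acquisition and facilitate information analysis.
Keywords:	data extraction; Web wrapper; regular expression; pattern matching; Internet; WWW; Web data integration system; data management; structuring of Web page data
Article ID:	0427-7104(2001)02-0177-07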
Web Content Extraction & Its Data Management Method

ZHANG Cheng-hong,XIAO Jun-jian,ZHANG Cheng.Web Content Extraction & Its Data Management Method[J].Journal of Fudan University(Natural Science),2001,40(2):177-183.

Authors:	ZHANG Cheng-hong, XIAO Jun-jian, ZHANG Cheng

Abstract:	With the development of the Internet and related technologies, the WWW has become the largest repository of information. For enterprises and individuals alike, the Web is gradually becoming the main information source. However, the sheer number of Web sites and the resulting information overload make it increasingly difficult to obtain useful information. Search engines only narrow the scope of a search; the specific information must still be looked up carefully by the user. Because Web information is unstructured or semi-structured, analysis tools cannot be applied to it directly. It is therefore necessary to propose a method that automatically extracts Web content and structures Web data, simplifying the process of obtaining information and facilitating its analysis. This paper describes such a method in detail.

Keywords:	data extraction; Web wrapper; regular expression; semi-structured; pattern matching
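The abstract does not spell out the extraction algorithm, but its keywords (Web wrapper, regular expression, pattern matching, semi-structured) point to the general idea of turning a page into structured records. The Python sketch below is an illustration under stated assumptions, not the authors' method: the HTML fragment, the `Record` schema with `name` and `price` fields, and the helpers `extract_records` and `store_records` are all hypothetical. It applies a hand-written regular-expression wrapper to the page and then stores the resulting rows in an in-memory SQLite table so they can be queried like any other structured data.

```python
import re
import sqlite3
from dataclasses import dataclass

# Hypothetical sample page: a semi-structured HTML list of items.
SAMPLE_HTML = """
<ul class="items">
  <li><span class="name">Widget A</span> <span class="price">12.50</span></li>
  <li><span class="name">Widget B</span> <span class="price">8.00</span></li>
</ul>
"""


@dataclass
class Record:
    """One structured row recovered from the page (hypothetical schema)."""
    name: str
    price: float


# The "wrapper": a pattern whose named groups map page fragments to fields.
ITEM_PATTERN = re.compile(
    r'<span class="name">(?P<name>[^<]+)</span>\s*'
    r'<span class="price">(?P<price>[\d.]+)</span>'
)


def extract_records(html: str) -> list[Record]:
    """Apply the regex wrapper to a page and return one Record per match."""
    return [
        Record(name=m.group("name").strip(), price=float(m.group("price")))
        for m in ITEM_PATTERN.finditer(html)
    ]


def store_records(conn: sqlite3.Connection, records: list[Record]) -> None:
    """Persist extracted records in a relational table for later analysis."""
    conn.execute("CREATE TABLE IF NOT EXISTS item (name TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO item (name, price) VALUES (?, ?)",
        [(r.name, r.price) for r in records],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_records(conn, extract_records(SAMPLE_HTML))
    print(conn.execute("SELECT name, price FROM item ORDER BY price").fetchall())
    # prints [('Widget B', 8.0), ('Widget A', 12.5)]
```

One trade-off of a hand-written pattern like this is brittleness: a change to the page's markup silently stops it from matching, so a practical wrapper-based system needs some way to define or maintain extraction rules for each site.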
Indexed in: CNKI, VIP, Wanfang Data, and other databases.
