Web Information Extraction Based on Samples

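Cite this article:	Zhang Shaohua, Xu Linhao, Yang Wenzhu, Xue Wenling, Li Tianzhu. Web Information Extraction Based on Samples[J]. Journal of Hebei University (Natural Science Edition), 2001, 21(4): 431-437.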

Authors:	Zhang Shaohua, Xu Linhao, Yang Wenzhu, Xue Wenling, Li Tianzhu

Affiliation:	College of Mathematics and Computer Science, Hebei University, Baoding, Hebei 071002

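Abstract:	This paper studies information extraction from HTML documents and proposes a method for Web information extraction based on sample instances. The user first selects sample pages and a predefined schema (based on the O-R model). The sample pages and the sample records within them are then marked and learned from, producing information extraction rules that are stored in a knowledge base. The knowledge base is then used to automatically extract the required information from other pages of the same kind and store it in a database. The method can be used for Web queries and as a wrapper for information integration.
Keywords:	HTML; schema; wrapper; information extraction; Web query
Article ID:	1000-1565(2001)04-0431-07
Revised:	June 25, 2001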
Web Information Extraction Based on Samples

ZHANG Shao-hua, XU Lin-hao, YANG Wen-zhu, XUE Wen-ling, LI Tian-zhu. Web Information Extraction Based on Samples[J]. Journal of Hebei University (Natural Science Edition), 2001, 21(4): 431-437.

Authors:	ZHANG Shao-hua, XU Lin-hao, YANG Wen-zhu, XUE Wen-ling, LI Tian-zhu

Abstract:	This paper discusses information extraction from HTML documents and presents a sample-based method for fast information extraction. The user first chooses sample pages and a predefined schema (based on the O-R model) and marks sample records; the system then automatically forms extraction rules from the user's marking actions on those pages. All rules are stored in a knowledge base. Using this knowledge base, the system can automatically extract information from other similar pages and store the extracted information in a database. The method can be applied to Web queries and to wrappers for information integration.

Keywords:	HTML; schema; information extraction; Web query; wrapper
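
The workflow summarized in the abstract (mark a sample record, learn extraction rules, keep them in a knowledge base, apply them to similar pages, store the results in a database) can be illustrated with a minimal Python sketch. The abstract does not give the paper's rule format or learning algorithm, so this sketch substitutes a simple delimiter-based rule: the HTML tags directly enclosing each marked value. All function names, the `book_list` page class, the two-field schema, and the sample pages are hypothetical.

```python
import re
import sqlite3


def learn_rule(sample_page, marked_value):
    """Learn one rule: the tags directly enclosing a user-marked value.

    Assumption: the marked value sits directly between two HTML tags.
    """
    start = sample_page.index(marked_value)  # first occurrence of the marked text
    end = start + len(marked_value)
    left = sample_page[sample_page.rfind("<", 0, start):start]
    right = sample_page[end:sample_page.index(">", end) + 1]
    return left, right


def learn_rules(sample_page, marked_record):
    """Learn one (left, right) delimiter pair per schema field."""
    return {field: learn_rule(sample_page, value)
            for field, value in marked_record.items()}


def extract(page, rules):
    """Apply learned rules to a similar page and return a list of records."""
    fields = list(rules)
    columns = []
    for field in fields:
        left, right = rules[field]
        pattern = re.escape(left) + r"(.*?)" + re.escape(right)
        columns.append(re.findall(pattern, page, flags=re.S))
    # Pair up the i-th match of every field into one record.
    return [dict(zip(fields, row)) for row in zip(*columns)]


def store(records, conn):
    """Store extracted records in a table matching the hypothetical schema."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (title TEXT, author TEXT)")
    conn.executemany("INSERT INTO records VALUES (:title, :author)", records)
    conn.commit()


# Hypothetical sample page and the one record the user marked on it.
sample_page = ('<ul><li><b class="t">Book A</b> <i class="a">Alice</i></li>'
               '<li><b class="t">Book B</b> <i class="a">Bob</i></li></ul>')
marked_record = {"title": "Book A", "author": "Alice"}

# The "knowledge base" here is just a dict keyed by page class.
knowledge_base = {"book_list": learn_rules(sample_page, marked_record)}
rules = knowledge_base["book_list"]

# Another page of the same kind, processed with the stored rules.
other_page = ('<ul><li><b class="t">Book C</b> <i class="a">Carol</i></li>'
              '<li><b class="t">Book D</b> <i class="a">Dave</i></li></ul>')
records = extract(other_page, rules)

conn = sqlite3.connect(":memory:")
store(records, conn)
rows = conn.execute("SELECT title, author FROM records").fetchall()
```

A rule learned from a single marked record only generalizes when similar pages use the same enclosing tags; a real system along the lines the abstract describes would presumably need richer rules and more than one sample.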
This article is indexed in CNKI, Wanfang Data, and other databases.
