Deep annotation framework of template-based structured HTML documents


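Cite this article:	LIAO Shumei, XU Shenghua, TAO Wan. Deep annotation framework of template-based structured HTML documents[J]. Journal of Tsinghua University (Science and Technology), 2006, 46(Z1): 936-941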

Authors:	LIAO Shumei, XU Shenghua, TAO Wan

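Affiliations:	1. School of Information Management, Jiangxi University of Finance and Economics, Nanchang 330013; 2. Department of Computer Engineering, Anhui University of Technology and Science, Wuhu 241000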

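Abstract:	Annotating Web pages is one effective way to upgrade the existing Web to the Semantic Web. On the current Web, dynamically generated pages are as much as 500 times as numerous as static pages, and annotating pages that are generated dynamically from databases is one form of deep annotation. Database-generated Web pages are template-based and structured. Exploiting these features, the paper first gives formal representations of template-based structured HTML documents and of ontologies, and then proposes a two-phase deep annotation framework. The first phase parses the HTML document and extracts structured information. The second phase specifies the mapping between instances and vocabulary terms, after which the annotations are generated automatically. Compared with other annotation methods, this method can markedly reduce the workload of the annotation process.
Keywords:	semantic Web; deep annotation; information extraction; mapping rules
Article ID:	1000-0054(2006)S1-0936-06
Revised:	2006-02-28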
Deep annotation framework of template-based structured HTML documents

LIAO Shumei, XU Shenghua, TAO Wan. Deep annotation framework of template-based structured HTML documents[J]. Journal of Tsinghua University (Science and Technology), 2006, 46(Z1): 936-941

Authors:	LIAO Shumei, XU Shenghua, TAO Wan

Abstract:	One effective way to upgrade the current Web to the Semantic Web is to mark up its many Web pages. On the Web, dynamically generated pages are 500 times as numerous as static ones. Marking up content that is generated dynamically, mostly from databases, is known as deep annotation. Because most database-generated Web pages are template-based and structured, this paper builds on formal representations of template-based structured HTML documents and of ontologies to propose a two-phase deep annotation framework. The first phase parses the HTML document and extracts structured information; the second identifies the mapping between instances and concepts and then generates the Web markup automatically. Compared with other approaches to deep annotation, this one may significantly reduce the human effort involved.

Keywords:	semantic Web; deep annotation; information extraction; mapping rules
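
The two phases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the sample page, the `TableExtractor` class, the `extract_records` and `annotate` helpers, the `http://example.org/` namespace, and the column-to-property mapping are all hypothetical, and the "template" is assumed to be a plain HTML table whose first row holds the column names. The output is written as Turtle-style triples, with no escaping of literal values.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Phase 1 helper: collect the cell text of every <tr> in a table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (template whitespace) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_records(html):
    """Phase 1: parse the page and return one dict per data row."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def annotate(records, concept, key_column, mapping, base="http://example.org/"):
    """Phase 2: apply a hand-specified column-to-property mapping.

    The mapping is given once for the template; the triples for every
    instance are then generated automatically.
    """
    triples = []
    for rec in records:
        subject = f"<{base}{rec[key_column]}>"
        triples.append(f"{subject} a <{base}{concept}> .")
        for column, prop in mapping.items():
            triples.append(f'{subject} <{base}{prop}> "{rec[column]}" .')
    return triples


# Hypothetical database-generated page that follows a fixed table template.
PAGE = """
<table>
  <tr><th>ISBN</th><th>Title</th><th>Price</th></tr>
  <tr><td>111</td><td>Semantic Web Primer</td><td>30</td></tr>
  <tr><td>222</td><td>Databases</td><td>25</td></tr>
</table>
"""

records = extract_records(PAGE)
triples = annotate(records, "Book", "ISBN", {"Title": "title", "Price": "price"})
for triple in triples:
    print(triple)
```

In this sketch the only manual step is writing the mapping dictionary once per template, which is where a reduction in annotation effort would come from; a real system would also need to escape literals and handle pages whose layout is more complex than a single table.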
This article is indexed in CNKI, Wanfang Data, and other databases.
