Web-based extraction of periodical metadata information

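Cite this article:	Li Shengli, Li Changqing, Yuan Pingpeng, Liu Yingshu. Web-based extraction of periodical metadata information[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2007, 35(12): 13-15.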

Authors:	Li Shengli, Li Changqing, Yuan Pingpeng, Liu Yingshu

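Affiliations:	1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074; 2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074; School of Electronic Information Engineering, Henan University of Science and Technology, Luoyang, Henan 471003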

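Funding:	China Next Generation Internet project; Hubei Provincial Special Fund for Science and Technology Infrastructure Platforms; Science and Technology Research Project of Wuhan, Hubei Province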

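Abstract:	After analyzing various Web information extraction approaches, this paper applies a new extraction method to electronic periodicals. The method works in two steps. First, it locates the target information blocks precisely by comparing similarity, using relative paths in the document structure combined with features of the node content. Second, it uses regular expressions to build a feature pattern for the text of each information item to be extracted, and on this basis it performs accurate extraction. Test results show that the method achieves higher recall and precision than general information extraction methods, with fairly satisfactory results.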
Keywords:	information extraction; wrapper; pattern matching; electronic periodical
Article ID:	1671-4512(2007)12-0013-03
Revised:	September 15, 2006
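The block-location step described in the abstract scores candidates by combining relative document paths with node-content features. A minimal sketch of that step follows. It is an illustration under stated assumptions, not the authors' implementation, and it relies on these assumptions:

* Each candidate DOM node has already been reduced to a relative tag path and its text.
* Path similarity uses Python's `difflib.SequenceMatcher` ratio.
* Content similarity is the share of hypothetical template keywords found in the node.
* The weight `w_path` and the cut-off `threshold` are arbitrary values.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Node:
    path: tuple[str, ...]  # relative tag path from a reference ancestor (assumed input)
    text: str              # visible text of the node


def path_similarity(a: tuple[str, ...], b: tuple[str, ...]) -> float:
    # SequenceMatcher accepts any sequences of hashables; ratio() lies in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


def content_similarity(text: str, keywords: list[str]) -> float:
    # Share of the template's feature keywords that occur in the node text.
    if not keywords:
        return 0.0
    return sum(1 for k in keywords if k in text) / len(keywords)


def locate_block(nodes, template_path, keywords, w_path=0.6, threshold=0.5):
    """Return the node most similar to the template, or None if none is close enough.

    w_path and threshold are hypothetical tuning parameters.
    """
    def score(node: Node) -> float:
        return (w_path * path_similarity(node.path, template_path)
                + (1 - w_path) * content_similarity(node.text, keywords))

    if not nodes:
        return None
    best = max(nodes, key=score)
    return best if score(best) >= threshold else None


# Usage with made-up nodes: a navigation bar and a metadata table cell.
nodes = [
    Node(("div", "ul", "li"), "Home | About | Contact"),
    Node(("div", "table", "tr", "td"), "Authors: Li Shengli Keywords: wrapper"),
]
template = ("div", "table", "tr", "td")
block = locate_block(nodes, template, ["Authors", "Keywords"])
```

A linear weighting keeps the two signals separable. A block whose layout drifts slightly can still be matched through its field labels, and the reverse also holds. The abstract does not say how the two signals are actually combined.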
Web-based extraction of periodical metadata information

Li Shengli, Li Changqing, Yuan Pingpeng, Liu Yingshu. Web-based extraction of periodical metadata information[J]. Journal of Huazhong University of Science and Technology (Natural Science Edition), 2007, 35(12): 13-15.

Authors:	Li Shengli, Li Changqing, Yuan Pingpeng, Liu Yingshu

Abstract:	After various approaches to extracting information from the Web were analyzed, a novel method for extracting periodical metadata was proposed. Before the metadata were extracted, the target information blocks were located precisely by judging similarity from relative paths in the document together with the contents of nodes. Regular expressions were then used to build feature patterns for the text of each information item to be extracted. The experimental results showed that the method achieved higher recall and precision than conventional methods.

Keywords:	information extraction; wrapper; pattern matching; periodical metadata
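The second step builds one regular-expression feature pattern for each information item. The sketch below is hypothetical: the field names, labels, and patterns are tuned only to the sample text, which reuses this record's own metadata. Real pages would need patterns derived from their actual layout.

```python
import re

# Hypothetical feature patterns, one per metadata item.
# Labelled fields anchor on their label and capture the rest of that line.
# Year and page range are recognized by the "year, volume(issue): pages" citation shape.
FIELD_PATTERNS = {
    "authors": re.compile(r"^Authors?:\s*(.+)$", re.MULTILINE),
    "keywords": re.compile(r"^Keywords?:\s*(.+)$", re.MULTILINE),
    "year": re.compile(r"\b(\d{4}),\s*\d+\(\d+\):"),
    "pages": re.compile(r"\d{4},\s*\d+\(\d+\):\s*(\d+-\d+)"),
}


def extract_fields(block_text: str) -> dict[str, str]:
    """Apply every pattern to the located block; keep the first match per field."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(block_text)
        if match:
            record[name] = match.group(1).strip()
    return record


sample = (
    "Authors: Li Shengli, Li Changqing, Yuan Pingpeng, Liu Yingshu\n"
    "Keywords: information extraction; wrapper; pattern matching; periodical metadata\n"
    "Journal of Huazhong University of Science and Technology "
    "(Natural Science Edition), 2007, 35(12): 13-15.\n"
)
record = extract_fields(sample)
# Keywords can then be split on the separator: record["keywords"].split("; ")
```

With one compiled pattern per field, an item missing from a page simply drops out of the record. It does not abort the whole extraction.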
This article is indexed in databases including CNKI, VIP, and Wanfang Data.
