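Improvement of Deep Web Data Extraction Based on Index Path
Cite this article:LE Ding-ti. Improvement of Deep Web Data Extraction Based on Index Path[J]. Natural Science Journal of Hainan University, 2012, 30(4): 349-353
Author:LE Ding-ti
Affiliation:Department of Computer Science, Minjiang University, Fuzhou, Fujian 350108
Funding:Key Agricultural Science and Technology Project of the Fujian Provincial Department of Science and Technology
Abstract:This paper describes the shortcomings of the index-path-based data extraction algorithm and, considering both the code perspective and the user-needs perspective, proposes an improvement that effectively raises extraction accuracy and thereby greatly reduces data redundancy. Because definitions of records and effective data are added, the extracted data retain the structural relationships they had in the web page, which greatly simplifies the subsequent semantic annotation work and lays a good foundation for Deep Web data integration.

Keywords:Deep Web  data extraction  index path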

Improvement of Data Extraction Based on Index Path
LE Ding-ti. Improvement of Data Extraction Based on Index Path[J]. Natural Science Journal of Hainan University, 2012, 30(4): 349-353
Authors:LE Ding-ti
Affiliation:LE Ding-ti (Department of Computer Science, Minjiang University, Fuzhou 350108, China)
Abstract:This report discusses the shortcomings of the Data Extraction Based on Index Path (DEIP) algorithm and, based on HTML code and user needs, proposes an improvement that raises the precision ratio of Deep Web data extraction and thereby reduces data redundancy. Because definitions of the record and of effective data are added, the data extracted by the new algorithm keep the structure they had in Internet pages, which makes semantic annotation much easier and lays a good foundation for Deep Web data integration.
Keywords:Deep Web  data extraction  index path
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).