Improvement of Deep Web Data Extraction Based on Index Path

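Cite this article:	LE Ding-ti. Improvement of Deep Web Data Extraction Based on Index Path[J]. Natural Science Journal of Hainan University, 2012, 30(4): 349-353

Author:	LE Ding-ti

Affiliation:	Department of Computer Science, Minjiang University, Fuzhou, Fujian 350108, China

Funding:	Key Agricultural Science and Technology Project of the Fujian Provincial Department of Science and Technology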

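Abstract:	This paper describes the shortcomings of the index-path-based data extraction algorithm and, considering both the HTML code and user needs, proposes an improvement that effectively raises the accuracy of data extraction and thereby greatly reduces data redundancy. Because definitions of record and effective data are added, the extracted data retain the structural relationships they had in the web page, which greatly simplifies the subsequent semantic annotation work and lays a good foundation for Deep Web data integration.
Keywords:	Deep Web; data extraction; index path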
Improvement of Data Extraction Based on Index Path

LE Ding-ti. Improvement of Data Extraction Based on Index Path[J]. Natural Science Journal of Hainan University, 2012, 30(4): 349-353

Authors:	LE Ding-ti

Affiliation:	LE Ding-ti (Department of Computer Science, Minjiang University, Fuzhou 350108, China)

Abstract:	This paper discusses the shortcomings of the Data Extraction Based on Index Path (DEIP) algorithm and, considering both the HTML code and user needs, proposes an improvement that raises the precision of Deep Web data extraction and reduces data redundancy. Because definitions of record and effective data are added, the data extracted by the new algorithm keep the structure they had in the Internet pages, which makes semantic annotation much easier and lays a good foundation for Deep Web data integration.

Keywords:	Deep Web; data extraction; index path
This article is indexed in databases including VIP and Wanfang Data.
