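Web Information Extraction Based on List Pages of Nested Data

Cite this article:	LI Gui, ZHANG Qi, ZHENG Xin-lu, HAN Zi-yang, LI Zheng-yu. Web Information Extraction Based on List Pages of Nested Data[J]. Journal of Zhengzhou University (Natural Science Edition), 2011, 43(2).

Authors:	LI Gui, ZHANG Qi, ZHENG Xin-lu, HAN Zi-yang, LI Zheng-yu

Institution:	Department of Computer Application Technology, Shenyang Jianzhu University, Shenyang 110168, Liaoning, China

Funding:	Natural Science Foundation of Liaoning Province, Grant No. 20071004

Abstract:	A data region mining algorithm is added to an existing nested data mining algorithm. Starting from the tag tree constructed for a list page of nested data, the method finds all data regions and then processes them uniformly: the partial tree alignment algorithm is applied to match all subtrees and generate a global pattern, from which all data records are extracted. Compared with the original algorithm, the improved algorithm preserves accuracy while effectively improving efficiency on pages with multiple data regions.
Keywords:	nested data; list page; tag tree; data region; global pattern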
Web Information Extraction Based on List Pages of Nested Data

LI Gui, ZHANG Qi, ZHENG Xin-lu, HAN Zi-yang, LI Zheng-yu. Web Information Extraction Based on List Pages of Nested Data[J]. Journal of Zhengzhou University: Natural Science Edition, 2011, 43(2).

Authors:	LI Gui, ZHANG Qi, ZHENG Xin-lu, HAN Zi-yang, LI Zheng-yu

Institution:	Department of Computer Application Technology, Shenyang Jianzhu University, Shenyang 110168, China

Abstract:	A data region mining algorithm was added to an existing algorithm for mining nested data. From the tag tree constructed for a nested list page, all data regions were found and handled uniformly. All subtrees were then matched with the partial tree alignment algorithm to produce a global pattern, from which all data records were extracted. Compared with the original algorithm, the new method improved efficiency on pages with multiple data regions while preserving accuracy.

Keywords:	nested data; list page; tag tree; data region; global pattern
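
The pipeline summarized in the abstract (tag tree, data regions, subtree matching, global pattern) can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not give their algorithms in detail. As assumptions, the sketch detects a data region as a run of adjacent sibling subtrees whose simple tree matching similarity reaches a threshold, and it replaces partial tree alignment with a simpler merge that aligns child tags by longest common subsequence. All names (`Node`, `build_tag_tree`, `find_data_regions`, `global_pattern`, the 0.6 threshold) are hypothetical.

```python
import copy
from html.parser import HTMLParser

class Node:
    def __init__(self, tag):
        self.tag, self.children = tag, []

class TagTreeBuilder(HTMLParser):
    """Builds a tag tree; assumes well-formed HTML, ignores text and attributes."""
    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.stack = [self.root]
    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)
    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

def build_tag_tree(html):
    builder = TagTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def simple_tree_match(a, b):
    """Number of node pairs in a maximum order-preserving top-down matching."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_match(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i - 1][j], M[i][j - 1], M[i - 1][j - 1] + w)
    return M[m][n] + 1

def size(node):
    return 1 + sum(size(c) for c in node.children)

def similarity(a, b):
    return 2 * simple_tree_match(a, b) / (size(a) + size(b))

def find_data_regions(node, threshold=0.6, min_records=2):
    """Assumption: a region is a maximal run of similar adjacent siblings."""
    regions, kids, i = [], node.children, 0
    while i < len(kids):
        j = i
        while j + 1 < len(kids) and similarity(kids[j], kids[j + 1]) >= threshold:
            j += 1
        if j - i + 1 >= min_records:
            regions.append(kids[i:j + 1])
        i = j + 1
    for child in kids:
        regions.extend(find_data_regions(child, threshold, min_records))
    return regions

def merge_pattern(seed, other):
    """Merge `other` into `seed` in place, aligning child tags by LCS.
    A simplified stand-in for partial tree alignment, not the real algorithm."""
    a, b = seed.children, other.children
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i].tag == b[j].tag:
                L[i][j] = 1 + L[i + 1][j + 1]
            else:
                L[i][j] = max(L[i + 1][j], L[i][j + 1])
    merged, i, j = [], 0, 0
    while i < m and j < n:
        if a[i].tag == b[j].tag:
            merge_pattern(a[i], b[j])
            merged.append(a[i])
            i, j = i + 1, j + 1
        elif L[i + 1][j] >= L[i][j + 1]:
            merged.append(a[i])
            i += 1
        else:
            merged.append(copy.deepcopy(b[j]))
            j += 1
    merged.extend(a[i:])
    merged.extend(copy.deepcopy(x) for x in b[j:])
    seed.children = merged

def global_pattern(records):
    pattern = copy.deepcopy(records[0])  # leave the page tree untouched
    for record in records[1:]:
        merge_pattern(pattern, record)
    return pattern

def show(node):
    inner = ",".join(show(c) for c in node.children)
    return f"{node.tag}({inner})" if inner else node.tag

page = ("<html><body><ul>"
        "<li><b>A</b><i>1</i></li><li><b>B</b><i>2</i></li><li><b>C</b></li>"
        "</ul><div><p><a>x</a></p><p><a>y</a></p></div></body></html>")
for region in find_data_regions(build_tag_tree(page)):
    print(len(region), "records, pattern:", show(global_pattern(region)))
# prints "3 records, pattern: li(b,i)" then "2 records, pattern: p(a)"
```

On this toy page the sketch finds two data regions in one pass over the tag tree. The third list item lacks the `i` field, yet the merged pattern still contains it, which is the role a global pattern plays when records have optional fields.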
This article is indexed in CNKI, Wanfang Data, and other databases.