
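Web Information Extraction from List Pages with Nested Data Records
Cite this article:LI Gui, ZHANG Qi, ZHENG Xin-lu, HAN Zi-yang, LI Zheng-yu. Web Information Extraction from List Pages with Nested Data Records[J]. Journal of Zhengzhou University (Natural Science Edition), 2011, 43(2).
Authors:LI Gui  ZHANG Qi  ZHENG Xin-lu  HAN Zi-yang  LI Zheng-yu
Affiliation:Department of Computer Application Technology, Shenyang Jianzhu University, Shenyang 110168, Liaoning, China
Funding:Supported by the Natural Science Foundation of Liaoning Province, Grant No. 20071004
Abstract:Building on an existing nested-data mining algorithm, a data region mining algorithm is added. From the tag tree constructed for a nested-data list page, all data regions are located and then processed uniformly: the partial tree alignment algorithm is applied to match all subtrees and generate a global pattern, from which all data records are extracted. Compared with the original algorithm, the improved algorithm preserves accuracy while effectively improving efficiency when a page contains multiple data regions.

Keywords:nested data  list page  tag tree  data region  global pattern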

Web Information Extraction Based on List Pages of Nested Data
LI Gui, ZHANG Qi, ZHENG Xin-lu, HAN Zi-yang, LI Zheng-yu. Web Information Extraction Based on List Pages of Nested Data[J]. Journal of Zhengzhou University: Natural Science Edition, 2011, 43(2).
Authors:LI Gui  ZHANG Qi  ZHENG Xin-lu  HAN Zi-yang  LI Zheng-yu
Institution:Department of Computer Application Technology, Shenyang Jianzhu University, Shenyang 110168, China
Abstract:Building on an existing algorithm for mining nested data, a data region mining algorithm was added. From the tag tree constructed for a nested list page, all data regions were found and handled uniformly. All subtrees were then matched with the partial tree alignment algorithm to produce a global pattern, from which all data records were extracted. Compared with the original algorithm, the improved method raised efficiency on pages with multiple data regions while preserving accuracy.
Keywords:nested data  list page  tag tree  data region  global pattern  
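
The abstract names the pipeline (tag tree, data regions, subtree matching) without giving details, so the following is only a minimal sketch of the data region step under stated assumptions, not the authors' algorithm. It scores adjacent sibling subtrees with Simple Tree Matching, a standard building block of tree alignment methods, and reports each maximal run of similar siblings as a candidate data region. The names `Node`, `simple_tree_match`, `similarity`, and `find_data_regions`, as well as the 0.8 threshold, are hypothetical choices made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tag-tree node. Only tag names are compared; text is ignored."""
    tag: str
    children: list[Node] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def simple_tree_match(a: Node, b: Node) -> int:
    """Size of a maximum order-preserving matching between two trees."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best matching between the first i children of a
    # and the first j children of b.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = simple_tree_match(a.children[i - 1], b.children[j - 1])
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1] + pair)
    return dp[m][n] + 1  # +1 for the matched roots


def similarity(a: Node, b: Node) -> float:
    """Matching size normalized to the range [0, 1]."""
    return 2 * simple_tree_match(a, b) / (size(a) + size(b))


def find_data_regions(root: Node, threshold: float = 0.8, min_records: int = 2):
    """Return (parent, start, end) for each run of similar adjacent siblings.

    `end` is exclusive. Children of every node are visited, so regions
    nested inside a record are reported as well.
    """
    regions = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = node.children
        i = 0
        while i < len(kids):
            j = i + 1
            while j < len(kids) and similarity(kids[j - 1], kids[j]) >= threshold:
                j += 1
            if j - i >= min_records:
                regions.append((node, i, j))
            i = j
        stack.extend(kids)
    return regions
```

Each reported region is a slice `parent.children[start:end]` whose members are candidate data records. A fuller implementation would also compare groups of several adjacent siblings, since one record can span more than one subtree, and would then align the records of each region to derive a shared pattern.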
This article is indexed in CNKI, Wanfang Data, and other databases.