
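A Block-analysis-based Approach to Eliminate Noise in Web Pages
Cite this article: LIU Chen-xi, WU Yang-yang. A Block-analysis-based Approach to Eliminate Noise in Web Pages[J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(2): 149-152.
Authors: LIU Chen-xi  WU Yang-yang
Affiliation: College of Information Science and Engineering, Huaqiao University, Quanzhou, Fujian 362021
Funding: Fujian Province Science and Technology Plan Project (2004I014); Natural Science Foundation of Fujian Province (A0510020)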
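Abstract: A Web page is usually composed of many information blocks. Besides the main content blocks, it often contains blocks such as advertisements, navigation bars, and copyright notices. Combining information such as block size and position with characteristics of the Web page itself, this paper proposes a block-analysis-based noise-removal method that adjusts its threshold automatically. The algorithm markedly reduces the noise in Web pages, and a comparative Web page classification experiment demonstrates its effectiveness.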

Keywords: Web page  noise  information extraction  HTML
Article ID: 1001-6600(2007)02-0149-04
Received: 2006-12-15
Revised: 2006-12-15

A Block-analysis-based Approach to Eliminate Noise in Web Pages
LIU Chen-xi, WU Yang-yang. A Block-analysis-based Approach to Eliminate Noise in Web Pages[J]. Journal of Guangxi Normal University (Natural Science Edition), 2007, 25(2): 149-152.
Authors: LIU Chen-xi  WU Yang-yang
Institution: College of Information Science and Engineering, Huaqiao University, Quanzhou 362021, China
Abstract: A Web page usually consists of many information blocks. In addition to the main content blocks, it often contains noise blocks such as advertisements, navigation panels, and copyright notices. Using block-level information such as the size and position of each block, together with features of the Web page itself, this paper proposes a block-analysis-based noise-elimination approach with an automatically adjusted threshold. The approach markedly reduces the noise content in Web pages, and a comparative Web page classification experiment verifies its validity.
Keywords: Web page  noise  information extraction  HTML
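
The abstract names the ingredients of the method (block size, block position, and a threshold that adjusts itself per page) but not its formulas. The Python sketch below is therefore only an illustration of that general idea and not the authors' algorithm. The `Block` fields, the scoring function, the link-density factor, and the parameters `start`, `step`, `min_kept`, and `max_rounds` are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One page block. Every field is an illustrative assumption."""
    name: str
    text_len: int    # characters of visible text in the block
    link_len: int    # characters of that text that sit inside links
    center_y: float  # vertical centre of the block: 0.0 = top, 1.0 = bottom


def score(block: Block, page_text_len: int) -> float:
    """Hypothetical content score: large, link-poor, central blocks rank high."""
    if block.text_len == 0 or page_text_len == 0:
        return 0.0
    size = block.text_len / page_text_len
    link_density = block.link_len / block.text_len
    centrality = 1.0 - abs(block.center_y - 0.5)  # 1.0 mid-page, 0.5 at the edges
    return size * (1.0 - link_density) * centrality


def remove_noise(blocks, start=0.5, step=0.5, min_kept=0.5, max_rounds=10):
    """Keep the blocks whose score reaches a page-dependent threshold.

    The threshold starts at `start` times the best score on the page and is
    multiplied by `step` until the kept blocks hold at least `min_kept` of
    the page's text, or until `max_rounds` attempts have been made.
    """
    total = sum(b.text_len for b in blocks)
    scores = [score(b, total) for b in blocks]
    best = max(scores, default=0.0)
    if best == 0.0:
        return []
    threshold = start * best
    kept = []
    for _ in range(max_rounds):
        kept = [b for b, s in zip(blocks, scores) if s >= threshold]
        if sum(b.text_len for b in kept) >= min_kept * total:
            break
        threshold *= step  # too little text survived: relax the threshold
    return kept


# Usage with made-up block measurements for a single page.
page = [
    Block("nav", text_len=200, link_len=190, center_y=0.05),
    Block("article", text_len=1500, link_len=30, center_y=0.50),
    Block("ads", text_len=150, link_len=120, center_y=0.50),
    Block("copyright", text_len=150, link_len=0, center_y=0.98),
]
print([b.name for b in remove_noise(page)])  # ['article']
```

Relaxing the threshold only when too little text survives is one simple way to make the cut-off depend on the page rather than on a fixed constant. A real implementation would first have to segment the HTML into blocks, a step this sketch leaves out.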
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.