
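WCSW, a Wrapper for WWW Website Classification Systems
Cite this article: GAO Ke-ning, WANG Bo, ZHANG Bin, YOU Zhen. WCSW, a wrapper for WWW website classification systems [J]. Journal of Northeastern University (Natural Science), 2007, 28(1): 44-48.
Authors: GAO Ke-ning  WANG Bo  ZHANG Bin  YOU Zhen
Affiliation: School of Information Science and Engineering, Northeastern University, Shenyang 110004, Liaoning, China
Funding: National Key Science and Technology Project of the Tenth Five-Year Plan (2004BA721A05)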
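Abstract: Websites organize information according to their own navigation systems, and these navigation systems carry classification semantics. To extract Web information effectively, WCSW (website classification system wrapper), a wrapper for website classification systems built on an HTML page blocking algorithm, is proposed. WCSW takes the entire website as the object to be wrapped. Building on the blocking algorithm and an analysis of block semantic features, it processes the navigation information blocks that carry classification semantics according to extraction rules. Experimental results show that the classification hierarchy extracted from websites is accurate and that the approach is practical.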

Keywords: Web classification  wrapper  Web page blocking  semantic feature analysis  WCSW rules
Article ID: 1005-3026(2007)01-0044-05
Received: 2005-12-31
Revised: 2005-12-31

On the WCSW: Website Classification System Wrapper
GAO Ke-ning, WANG Bo, ZHANG Bin, YOU Zhen. On the WCSW: Website Classification System Wrapper [J]. Journal of Northeastern University (Natural Science), 2007, 28(1): 44-48.
Authors: GAO Ke-ning  WANG Bo  ZHANG Bin  YOU Zhen
Institution: School of Information Science and Engineering, Northeastern University, Shenyang 110004, China
Abstract: A website organizes its information through its own navigation system, which carries the semantic characteristics of a classification. To extract Web information effectively, WCSW (website classification system wrapper), based on an HTML page blocking algorithm, is proposed for the classification systems of websites. WCSW takes the whole website as the object to be wrapped and, building on the blocking algorithm and an analysis of block semantic characteristics, processes the navigation information blocks that carry classification semantics according to extraction rules. Experimental results show that the classification hierarchy extracted from websites is highly accurate and that the approach is practical.
Keywords:web classification  wrapper  block  analysis of semantic characteristics  rule of WCSW
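
The abstract does not give the blocking algorithm or the WCSW extraction rules, so the following is only a minimal sketch of the general idea of splitting a page into blocks and picking out the ones that look like navigation. It assumes a link-density heuristic in place of the paper's semantic feature analysis. The names `BlockCollector`, `navigation_blocks`, `BLOCK_TAGS`, and the thresholds `min_links` and `min_ratio` are hypothetical and do not come from the paper.

```python
from html.parser import HTMLParser

# Assumption: these tags mark block boundaries; the paper's own rule is not stated.
BLOCK_TAGS = {"div", "ul", "ol", "table", "td", "nav"}


class BlockCollector(HTMLParser):
    """Split a page into blocks and record link statistics for each block."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open blocks, innermost last
        self.blocks = []      # closed blocks, in closing order
        self.in_link = False

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"tag": tag, "text": 0, "link_text": 0, "labels": []})
        elif tag == "a" and self.stack:
            self.in_link = True
            self.stack[-1]["labels"].append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False
        elif tag in BLOCK_TAGS and any(b["tag"] == tag for b in self.stack):
            # Close unclosed inner blocks until the matching block is reached.
            while self.stack:
                block = self.stack.pop()
                self.blocks.append(block)
                if block["tag"] == tag:
                    break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        block = self.stack[-1]  # text is credited to the innermost open block only
        block["text"] += len(text)
        if self.in_link and block["labels"]:
            block["link_text"] += len(text)
            block["labels"][-1] += text


def navigation_blocks(html, min_links=3, min_ratio=0.6):
    """Return the link labels of each block that is mostly made of links."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    result = []
    for block in parser.blocks + parser.stack:  # include blocks left unclosed
        labels = [label for label in block["labels"] if label]
        if (len(labels) >= min_links and block["text"] > 0
                and block["link_text"] / block["text"] >= min_ratio):
            result.append(labels)
    return result
```

Calling `navigation_blocks` on a page returns one list of link labels per candidate navigation block. A wrapper along the lines described in the abstract would then have to follow those links across the site and apply its extraction rules to assemble a classification hierarchy, which this sketch does not attempt.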
This article is indexed in CNKI, VIP, and other databases.