
Hierarchical Subtopic Segmentation of Web Document
Authors: ZHANG Yun-tao [1], GONG Ling [2], WANG Yong-cheng [2]
Affiliations: [1] Network and Information Center, Shanghai Jiaotong University, Shanghai 200030, China; [2] School of Electronic and Information Technology, Shanghai Jiaotong University, Shanghai 200030, China
Funding: Supported by the National High Technology Research and Development Program of China (2002AA119050)
Excerpt (0 Introduction): With the continuing growth of the world-wide Web and online text collections, it has become increasingly important to provide improved mechanisms for finding information quickly. A Web document may contain multiple subtopics covered in more or less detail. Each subtopic corresponds to a specific passage containing a natural paragraph or multiple adjacent paragraphs in the document. The main commercial search engines return the entire relevant documents corresponding to the user's query…

Chinese keywords: Web document; subtopic; hierarchical text segmentation; text information processing
Article ID: 1007-1202(2006)01-0047-04
Received: 2005-04-20

Citation: ZHANG Yun-tao, GONG Ling, WANG Yong-cheng. Hierarchical Subtopic Segmentation of Web Document[J]. Wuhan University Journal of Natural Sciences, 2006, 11(1): 47-50.
Abstract: The paper proposes a novel method for subtopic segmentation of Web documents. Subtopic segmentation can make retrieval results more effective. The proposed method segments subtopics hierarchically and identifies the boundary of each subtopic. Based on the term frequency matrix, the method measures the similarity between adjacent blocks, such as paragraphs and passages. In a real-world sample experiment, the macro-averaged precision and recall reach 73.4% and 82.5%, and the micro-averaged precision and recall reach 72.9% and 83.1%. Moreover, the method is equally effective for other Asian languages such as Japanese and Korean, as well as for Western languages.
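The idea of comparing adjacent blocks through their term frequencies can be illustrated with a minimal sketch. The abstract does not state which similarity measure, tokenizer, or boundary rule the authors use, so the cosine measure, the regular-expression tokenizer, the `threshold` value, and all function names below are assumptions made for illustration only. The sketch performs a single flat pass, not the hierarchical segmentation the paper describes.

```python
import math
import re
from collections import Counter


def term_frequencies(block):
    """Count lowercase word tokens in one block (assumed tokenizer, not the paper's)."""
    return Counter(re.findall(r"[a-z0-9]+", block.lower()))


def cosine(a, b):
    """Cosine similarity of two term-frequency vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def adjacent_similarities(blocks):
    """Similarity between each block and the block that follows it."""
    tfs = [term_frequencies(b) for b in blocks]
    return [cosine(tfs[i], tfs[i + 1]) for i in range(len(tfs) - 1)]


def boundaries(blocks, threshold=0.1):
    """Indices i where a boundary is placed between block i and block i + 1.

    The fixed threshold is an assumed rule, chosen only to keep the sketch short.
    """
    return [i for i, s in enumerate(adjacent_similarities(blocks)) if s < threshold]


# Hypothetical sample paragraphs: the first two share vocabulary, the third does not.
paragraphs = [
    "search engines return whole documents for a query",
    "a query sent to search engines returns ranked documents",
    "cosine similarity compares term frequency vectors",
]
print(boundaries(paragraphs))
```

On these sample paragraphs the only low-similarity gap falls between the second and third blocks, so the sketch places a single boundary there.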
Keywords: subtopic segmentation; Web document; passage retrieval; discourse
Indexed in: CNKI, VIP (维普), Wanfang Data (万方数据), SpringerLink, and other databases.