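An XML Document Clustering Method Based on a Structural Information Summary Tree
Cite this article: 梁作鹏, 吴文明, 董逸生. 一种基于结构信息总结树的XML文档聚类方法[J]. 应用科学学报, 2005, 23(1): 71-74
Authors: 梁作鹏 (LIANG Zuo-peng), 吴文明 (WU Wen-ming), 董逸生 (DONG Yi-sheng)
Affiliation: Department of Computer Science and Engineering, Southeast University, Nanjing, Jiangsu 210096
Abstract: This paper proposes an effective method for representing the structural information of XML documents. The structure of each document is encoded with a digitized structural summary tree (SST); on this basis, a structural distance is defined, and a genetic algorithm is used to cluster the documents. Experiments show that the method achieves high classification accuracy, is easy to implement, and requires no prior knowledge of the DTD.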

Keywords: document clustering; SST (structural summary tree); information retrieval; genetic algorithm; XML
Article ID: 0255-8297(2005)01-0071-04
Received: 2003-11-01
Revised: 2004-03-15
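The abstract does not say how the SST is built, how element names are numbered, or how the structural distance is defined. The Python below is a minimal sketch under stated assumptions, not the authors' method. It assumes that the SST merges sibling elements that share a tag (so each distinct tag path appears once) and that tag names receive integer codes from a codebook shared by all documents. The sorted codes at each tree level are scaled by the codebook size, each level is padded or truncated to a fixed width, and the structural distance is taken to be Euclidean. `SSTNode`, `build_sst`, `level_codes`, `to_vector`, `structural_distance`, `depth`, and `width` are hypothetical names.

```python
import math
import xml.etree.ElementTree as ET


class SSTNode:
    """Node of a structural summary tree (hypothetical representation)."""

    def __init__(self, tag):
        self.tag = tag
        self.children = {}  # child tag -> SSTNode; one entry per distinct tag


def build_sst(elem):
    """Summarize an element tree, merging sibling elements that share a tag (assumption)."""
    node = SSTNode(elem.tag)
    _merge_children(node, elem)
    return node


def _merge_children(node, elem):
    for child in elem:
        sub = node.children.get(child.tag)
        if sub is None:
            sub = SSTNode(child.tag)
            node.children[child.tag] = sub
        _merge_children(sub, child)  # children of repeated siblings are merged into one node


def level_codes(root, codebook):
    """Return the sorted integer tag codes at each SST level (breadth-first)."""
    levels, frontier = [], [root]
    while frontier:
        for n in frontier:
            codebook.setdefault(n.tag, len(codebook) + 1)  # codes start at 1; 0 is padding
        levels.append(sorted(codebook[n.tag] for n in frontier))
        frontier = [c for n in frontier for c in n.children.values()]
    return levels


def to_vector(levels, n_codes, depth, width):
    """Scale codes into (0, 1], pad/truncate each level to `width`, concatenate `depth` levels."""
    vec = []
    for d in range(depth):
        row = levels[d][:width] if d < len(levels) else []
        vec.extend([c / n_codes for c in row] + [0.0] * (width - len(row)))
    return vec  # m = depth * width components


def structural_distance(u, v):
    """Euclidean distance between two document vectors (assumed metric)."""
    return math.dist(u, v)


docs = [
    "<a><b><c/></b><b><d/></b><e/></a>",
    "<a><b><c/></b><e/></a>",
    "<x><y/></x>",
]
codebook = {}
# Encode every document first so the codebook is complete before normalization.
levels = [level_codes(build_sst(ET.fromstring(s)), codebook) for s in docs]
vecs = [to_vector(lv, len(codebook), depth=3, width=2) for lv in levels]
print(structural_distance(vecs[0], vecs[1]), structural_distance(vecs[0], vecs[2]))
```

In this toy run the first two documents, which share the `a`/`b`/`e` skeleton, end up closer to each other than to the third. Because codes are assigned in order of first appearance, the numeric gap between two different tags reflects assignment order rather than any relation between them; a real encoding scheme would have to address that.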

Clustering XML Documents Based on a Structural Summary Tree
LIANG Zuo-peng, WU Wen-ming, DONG Yi-sheng. Clustering XML Documents Based on a Structural Summary Tree[J]. Journal of Applied Sciences, 2005, 23(1): 71-74
Authors: LIANG Zuo-peng, WU Wen-ming, DONG Yi-sheng
Affiliation: Department of Computer Science & Engineering, Southeast University, Nanjing 210096, China
Abstract: This paper proposes an approach for calculating the structural similarity between XML documents. The structural information of an XML document is captured with a structural summary tree (SST). By encoding elements as numbers, an SST is transformed into a digit-labeled tree. After normalization, the numbers at the different tree levels are concatenated to form a vector, so each XML document is represented as an m-dimensional vector. A GA-based clustering algorithm is adopted because it can produce good results irrespective of the starting configuration. Experimental results show the effectiveness and scalability of the approach.
Keywords: XML; information retrieval; document clustering; GA; SST (structural summary tree)
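The abstract motivates the genetic algorithm by its insensitivity to the starting configuration but does not describe its operators. The Python below is a generic GA clustering sketch over m-dimensional document vectors, not the authors' algorithm. The chromosome encoding (one cluster label per document), the within-cluster sum of squared distances as the cost, binary tournament selection, uniform crossover, random-reset mutation, elitism, and every parameter value are assumptions. `ga_cluster` and `within_cluster_sse` are hypothetical names.

```python
import random


def within_cluster_sse(vectors, labels, k):
    """Sum of squared Euclidean distances from each vector to its cluster centroid."""
    total = 0.0
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing
        centroid = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sum((x - m) ** 2 for x, m in zip(v, centroid)) for v in members)
    return total


def ga_cluster(vectors, k, pop_size=40, generations=200, mutation_rate=0.1, seed=0):
    """Hypothetical GA clustering; a chromosome holds one cluster label per document."""
    rng = random.Random(seed)
    n = len(vectors)
    cache = {}

    def cost(ind):
        key = tuple(ind)
        if key not in cache:
            cache[key] = within_cluster_sse(vectors, ind, k)
        return cache[key]

    # Random initial population: the starting configuration is arbitrary.
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [min(pop, key=cost)[:]]  # elitism: carry the best chromosome over
        while len(next_pop) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(pop, 2), key=cost)
            p2 = min(rng.sample(pop, 2), key=cost)
            # Uniform crossover, then random-reset mutation gene by gene.
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [rng.randrange(k) if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop
    return min(pop, key=cost)


# Toy 2-D vectors standing in for SST vectors: two well-separated groups.
vectors = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]]
labels = ga_cluster(vectors, k=2)
print(labels)
```

Label-based chromosomes are simple but redundant: swapping cluster indices yields a different chromosome for the same partition, which can make crossover less effective on larger inputs. Encodings built on centroids or on group membership avoid this at the cost of more complex operators.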
This article is indexed in CNKI, CQVIP, Wanfang Data, and other databases.