
A MODIFIED ANT-BASED TEXT CLUSTERING ALGORITHM WITH SEMANTIC SIMILARITY MEASURE
Author: Taketoshi YOSHIDA
Affiliation: School of
Abstract: 1. Introduction As an important application field of data clustering technologies (Jain and Murty et al. 1999), text clustering is the unsupervised partitioning of a collection of textual documents into self-similar groups, so that any item is more similar to another item in the same group than to an item outside the group. Such groups are called clusters. They are formed at run time during the clustering process, instead of being pre-defined as in the case of text categorization, which comm…

Keywords: text clustering algorithm; ant colony algorithm; semantic similarity measure; text information processing

A modified ant-based text clustering algorithm with semantic similarity measure
Taketoshi YOSHIDA. A MODIFIED ANT-BASED TEXT CLUSTERING ALGORITHM WITH SEMANTIC SIMILARITY MEASURE[J]. Journal of Systems Science and Systems Engineering, 2006, 15(4): 474-492.
Authors: Haoxiang Xia, Shuguang Wang, Taketoshi Yoshida
Institution: 1. Institute of Systems Engineering, Dalian University of Technology, Dalian, 116024, China
2. BHR-Frontline Technologies (Dalian) Corporation Ltd, Dalian, 116023, China
3. School of Knowledge Science, Japan Advanced Institute of Science and Technology, Ishikawa 923-1292 Japan
Abstract: Ant-based text clustering is a promising technique that has attracted great research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two dimensions. On one hand, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. On the other hand, the ant behavior model is modified to pursue better algorithmic performance. In particular, the ant movement rule is adjusted so as to direct a laden ant toward a dense area of the same type of items as the item the ant is carrying, and to direct an unladen ant toward an area that contains an item dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested with a sample set of documents excerpted from the Reuters-21578 corpus. The experimental results partly indicate that the proposed algorithm performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.

This work was supported in part by the National Natural Science Foundation of China under Grants No. 70301009 and No. 70431001, and by the Ministry of Education, Culture, Sports, Science and Technology of Japan under the “Kanazawa Region, Ishikawa High-Tech Sensing Cluster of Knowledge-Based Cluster Creation Project”.

Haoxiang Xia is an associate professor at Dalian University of Technology (DUT). He obtained his Ph.D. degree from the Institute of Systems Engineering, DUT, in 1998. Before joining DUT in 2000, he was a postdoctoral fellow at the Institute of Systems Science, the Chinese Academy of Sciences, from 1998 to 2000. He worked at Japan Advanced Institute of Science and Technology as a visiting associate professor from 2004 to 2006. His major research interests include Internet-based information systems, knowledge management systems, and complex adaptive systems.

Shuguang Wang is a software engineer at BHR-Frontline Technologies (Dalian) Co., Ltd. He received his master’s degree from the Institute of Systems Engineering, Dalian University of Technology, in 2006. His research interests are data clustering, text mining and evolutionary algorithms.

Taketoshi Yoshida is a professor at Japan Advanced Institute of Science and Technology. He received his Ph.D. degree from the Department of Systems Engineering, Case Western Reserve University, in 1984. He worked for IBM Japan from 1985 to 1997. His research interests are in systems science and knowledge-handling information systems.
Keywords: Ant-based clustering; text clustering; ant movement rule; semantic similarity measure
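
The abstract describes using a semantic similarity measure in conjunction with the vector-space-model measure, but does not state how the two are combined. The Python sketch below illustrates one plausible reading, a weighted sum, and should not be taken as the authors' formula. The weight `alpha`, the best-match averaging in `semantic_similarity`, and the lookup table `TOY_SYNONYMS` (a stand-in for a WordNet-based word similarity) are all assumptions made for illustration.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Vector-space-model similarity over raw term counts."""
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def semantic_similarity(doc_a, doc_b, word_sim):
    """Assumed scheme: average best-match word similarity, symmetrised."""
    def directed(src, dst):
        if not src or not dst:
            return 0.0
        return sum(max(word_sim(w, v) for v in dst) for w in src) / len(src)
    return 0.5 * (directed(doc_a, doc_b) + directed(doc_b, doc_a))

def combined_similarity(doc_a, doc_b, word_sim, alpha=0.5):
    """Weighted sum of the two measures; alpha is a hypothetical weight."""
    vsm = cosine_similarity(doc_a, doc_b)
    sem = semantic_similarity(doc_a, doc_b, word_sim)
    return alpha * vsm + (1 - alpha) * sem

# Hypothetical word-level similarities standing in for an ontology lookup.
TOY_SYNONYMS = {frozenset(("car", "automobile")): 0.9}

def toy_word_sim(w, v):
    if w == v:
        return 1.0
    return TOY_SYNONYMS.get(frozenset((w, v)), 0.0)

doc_a = ["car", "price"]
doc_b = ["automobile", "price"]
vsm_only = cosine_similarity(doc_a, doc_b)
combined = combined_similarity(doc_a, doc_b, toy_word_sim)
print(vsm_only, combined)
```

In this toy case the two documents share only one literal term, so the vector-space score alone is low, while the combined score is higher because "car" and "automobile" are treated as near-synonyms. That is the kind of effect the semantic measure is meant to capture.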
This article is indexed in CNKI, VIP, Wanfang Data, SpringerLink, and other databases.