
A Difference Model for Latent Semantic Indexing
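Cite this article: MI Xiao-fang, WANG Li-hong, SONG Yi-bin. A difference model for latent semantic indexing[J]. Journal of Yantai University (Natural Science and Engineering Edition), 2008, 21(2): 125-129.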
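Authors: MI Xiao-fang, WANG Li-hong, SONG Yi-bin
Affiliation: School of Computer Science and Technology, Yantai University, Yantai, Shandong 264005
Funding: National Natural Science Foundation of China; Natural Science Foundation of Shandong Province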
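Abstract: Based on an analysis of the global and local models, a new difference model for latent semantic indexing is proposed that reflects class information in the terms. Medical web pages serve as the experimental data: the text of each page is extracted and represented with the global model and with the difference model, respectively; SVD and SLSI are used for dimensionality reduction, the SVM algorithm is used for classification, and the classification accuracy and F1 measure are computed. The experiments show that with the difference-model representation, both classification accuracy and F1 improve markedly over the global model under either dimensionality-reduction technique. However, combining the difference model with the SLSI algorithm does not improve the classification results any further.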
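As a point of reference for the SVD step named in the abstract, here is a minimal sketch of standard global LSI. It assumes a term-by-document weight matrix with terms as rows and documents as columns (raw counts here). It is not the authors' difference model or SLSI, neither of which the abstract specifies; the helper names `fit_lsi` and `project`, the toy matrix and the choice k = 2 are all hypothetical.

```python
import numpy as np


def fit_lsi(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return U_k, the first k left singular vectors of a (terms x docs) matrix.

    Hypothetical helper illustrating standard global LSI, not the paper's model.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, _s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k]


def project(U_k: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Map document columns from term space into the k-dimensional latent space."""
    # For training columns this equals diag(s_k) @ Vt[:k], since U_k^T U = [I_k | 0].
    return U_k.T @ docs


# Hypothetical 5-term x 4-document count matrix.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 1, 1, 2],
    [1, 0, 0, 1],
], dtype=float)

U2 = fit_lsi(A, k=2)
Z = project(U2, A)  # reduced training vectors, one column per document
print(Z.shape)      # (2, 4)
```

A new (test) document column `d` would be mapped with the same `project(U2, d)` call, so training and test vectors share one coordinate system before they are passed to a classifier such as an SVM.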

Keywords: latent semantic indexing; difference model; text classification; SVM algorithm
Article ID: 1004-8820(2008)02-0125-05
Revision date: October 18, 2007

A Difference Latent Semantic Indexing
MI Xiao-fang, WANG Li-hong, SONG Yi-bin. A Difference Latent Semantic Indexing[J]. Journal of Yantai University (Natural Science and Engineering Edition), 2008, 21(2): 125-129.
Authors: MI Xiao-fang, WANG Li-hong, SONG Yi-bin
Institution:(Institute of Computer Science and Technology, Yantai University, Yantai 264005, China)
Abstract: Based on an analysis of the global and local latent semantic indexing (LSI) models, a new difference LSI model is proposed that integrates class information into the term set. Medical web pages are used to test the new model: the text of each page is extracted and represented by the global LSI and by the difference LSI, respectively. SVD and SLSI are used to reduce the dimension of the feature space, the SVM algorithm is used to classify the feature vectors of the test collection, and the categorization accuracy and macro-averaged F1 are calculated. The experiments show that the difference LSI gives higher accuracy and macro-averaged F1 than the global LSI when combined with either SVD or SLSI. However, combining the difference LSI with SLSI does not yield any further improvement in accuracy or macro-averaged F1.
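The abstract reports categorization accuracy and macro-averaged F1. Below is a minimal pure-Python sketch of both metrics, assuming the common definition in which F1 is computed per class and then averaged with equal weight over classes; the abstract does not say which averaging variant the authors used, and the function names and example labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted class equals the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall (0 when both are 0).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical labels for five test pages.
y_true = ["a", "a", "b", "b", "c"]
y_pred = ["a", "b", "b", "b", "c"]
print(round(accuracy(y_true, y_pred), 4), round(macro_f1(y_true, y_pred), 4))  # 0.8 0.8222
```

Macro-averaging gives every class equal weight, so a small class influences the score as much as a large one; this is why it is often reported alongside plain accuracy.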
Keywords: latent semantic indexing; difference model; text categorization; SVM algorithm