Text-Fragment Fuzzy Classification Algorithm for Network Information Auditing System
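Cite this article: LI Jinku, ZHANG Deyun, GAO Peng, SUN Qindong. Text-fragment fuzzy classification algorithm for network information auditing system[J]. Journal of Xi'an Jiaotong University, 2005, 39(8): 800-803.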
Authors: LI Jinku, ZHANG Deyun, GAO Peng, SUN Qindong
Affiliation: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an 710049
Funding: National High Technology Development Program of China (2003AA148010).
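Abstract: The effect of splitting a document into fragments on text classification is analyzed, and two principles closely tied to text semantics are proposed: the most semantic marking rule (MSMR) and the semantic inspiring rule (SIR) between paragraphs. Building on the fuzzy K-nearest-neighbor (KNN) classification algorithm, these two rules are used to design and implement a context-based fuzzy classification algorithm for text fragments. The algorithm uses SIR to account for the mutual influence among the classifications of fragments, which lowers the fragment classification error rate. Once the class membership of a fragment exceeds a threshold, MSMR implies that all subsequent fragments of the same document belong to the same class, so their class memberships need not be computed. Experiments show that, compared with the fuzzy KNN classification algorithm, the proposed algorithm effectively improves the system's precision, recall, and accuracy, with recall rising by more than 16%. Within a session, subsequent fragments that have already been definitively classified require no membership computation, so the total computation time of the algorithm is markedly lower than that of fuzzy KNN, giving it higher classification efficiency.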

Keywords: text-fragment classification; information auditing; K-nearest-neighbor; fuzzy classification
Article ID: 0253-987X(2005)08-0800-04
Received: 30 September 2004
Revised: 30 September 2004
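For reference, the fuzzy K-nearest-neighbor rule that the algorithm builds on is commonly written in the form given by Keller et al. (1985); the exact variant used in the paper (for example, a similarity-weighted one suited to text vectors) is not stated in this abstract, so treat this as an assumed baseline. Here x is a fragment's feature vector, x_1, ..., x_K are its K nearest training samples, u_ij is the membership of x_j in class i, and m > 1 is the fuzzifier. The second expression restates MSMR as described above, with θ (an assumed symbol) as the threshold and c(·) the assigned class: once fragment t reaches the threshold, every later fragment t' of the same document takes its class.

```latex
u_i(\mathbf{x}) = \frac{\sum_{j=1}^{K} u_{ij}\,\lVert \mathbf{x}-\mathbf{x}_j \rVert^{-2/(m-1)}}
                       {\sum_{j=1}^{K} \lVert \mathbf{x}-\mathbf{x}_j \rVert^{-2/(m-1)}},
\qquad
\max_i u_i(\mathbf{x}_t) \ge \theta \;\Rightarrow\;
c(\mathbf{x}_{t'}) = \arg\max_i u_i(\mathbf{x}_t) \quad \forall\, t' > t
```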

Text-Fragment Fuzzy Classification Algorithm for Network Information Auditing System
LI Jinku, ZHANG Deyun, GAO Peng, SUN Qindong. Text-Fragment Fuzzy Classification Algorithm for Network Information Auditing System[J]. Journal of Xi'an Jiaotong University, 2005, 39(8): 800-803.
Authors: LI Jinku, ZHANG Deyun, GAO Peng, SUN Qindong
Abstract: The impact on text classification of breaking a text document into fragments is analyzed, and two rules closely correlated with text semantics are defined: the most semantic marking rule (MSMR) and the semantic inspiring rule (SIR) between paragraphs. Using these two rules on top of the fuzzy KNN (K-nearest-neighbor) algorithm, a context-sensitive fuzzy classification algorithm for text fragments is designed and implemented. By accounting for the classification interaction between text fragments according to SIR, the algorithm reduces the classification error rate; when the class membership value of a fragment exceeds a given threshold, MSMR lets it conclude that all subsequent fragments of the same document belong to the same class. Experiments show that, compared with the fuzzy KNN algorithm, the new algorithm improves precision, recall, and accuracy, with recall rising by more than 16%. In addition, within a session, subsequent fragments that have already been definitively classified do not need their membership values computed, so the total computing time of the proposed algorithm is much less than that of the fuzzy KNN classification method, giving it higher classification efficiency.
Keywords: text-fragment classification; information auditing; K-nearest-neighbor; fuzzy classification
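The session-level procedure described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: fragments are assumed to arrive already converted to numeric feature vectors, memberships use the Keller-style fuzzy KNN rule with Euclidean distance, and SIR, whose exact form the abstract does not give, is replaced by a hypothetical linear blend with the previous fragment's memberships weighted by `alpha`. The names `fuzzy_knn_membership`, `classify_fragments`, `alpha`, and `theta` are illustrative only.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
# A training sample: (feature vector, class-membership vector); crisp labels are one-hot.
Sample = Tuple[Vector, Sequence[float]]


def fuzzy_knn_membership(x: Vector, train: List[Sample], k: int = 5,
                         m: float = 2.0) -> List[float]:
    """Keller-style fuzzy KNN: class memberships of x from its k nearest samples."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    # k nearest training samples by Euclidean distance (math.dist needs Python 3.8+).
    nearest = sorted(((math.dist(x, v), u) for v, u in train),
                     key=lambda p: p[0])[:k]
    # Inverse-distance weights ||x - x_j||^(-2/(m-1)); the floor avoids division by zero.
    weights = [max(d, 1e-12) ** (-2.0 / (m - 1.0)) for d, _ in nearest]
    total = sum(weights)
    n_classes = len(train[0][1])
    return [sum(w * u[i] for w, (_, u) in zip(weights, nearest)) / total
            for i in range(n_classes)]


def classify_fragments(fragments: List[Vector], train: List[Sample], k: int = 5,
                       m: float = 2.0, alpha: float = 0.3,
                       theta: float = 0.8) -> Tuple[List[int], int]:
    """Classify one document's fragments in arrival order.

    alpha (hypothetical SIR stand-in): share of the previous fragment's memberships
    mixed into the current fragment's memberships.
    theta (MSMR threshold): once the winning membership reaches theta, every later
    fragment of the document takes that class with no membership computation.
    Returns the labels and how many fragments had memberships computed.
    """
    labels: List[int] = []
    prev: Optional[List[float]] = None
    locked: Optional[int] = None
    computed = 0
    for frag in fragments:
        if locked is not None:
            # MSMR: the document's class is already settled, skip the KNN search.
            labels.append(locked)
            continue
        u = fuzzy_knn_membership(frag, train, k, m)
        computed += 1
        if prev is not None:
            # Assumed SIR: blend in the preceding fragment's (already blended) memberships.
            u = [(1.0 - alpha) * a + alpha * b for a, b in zip(u, prev)]
        prev = u
        best = max(range(len(u)), key=lambda i: u[i])
        labels.append(best)
        if u[best] >= theta:
            locked = best
    return labels, computed


if __name__ == "__main__":
    train = [([0.0, 0.0], [1, 0]), ([0.1, 0.0], [1, 0]), ([0.0, 0.1], [1, 0]),
             ([1.0, 1.0], [0, 1]), ([0.9, 1.0], [0, 1]), ([1.0, 0.9], [0, 1])]
    print(classify_fragments([[0.05, 0.05], [0.95, 0.95]], train, k=3))  # -> ([0, 0], 1)
```

With this toy data the first fragment's class-0 membership reaches `theta`, so the second fragment is labeled class 0 without any KNN search, even though it sits next to the class-1 samples. That is the MSMR shortcut in action: it trusts document-level consistency over the later fragment's own evidence, which is where the time savings come from. Raising `theta` above 1 disables the shortcut, and the blended memberships then label the second fragment as class 1.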