
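Study on optimal clustering index based on granular space
Cite this article: TANG Xuqing, LIANG Qihao, LI Yang. Study on optimal clustering index based on granular space[J]. Systems Engineering - Theory & Practice, 2018, 38(3): 755-764.
Authors: TANG Xuqing  LIANG Qihao  LI Yang
Affiliation: School of Science, Jiangnan University, Wuxi 214122
Funding: National Natural Science Foundation of China (11371174); Research Innovation Program for Graduate Students of Jiangsu Province Ordinary Universities (KYLX15-1188)
Abstract: Building on granular space theory, this paper studies an optimal clustering model based on granular space. The work has three parts. First, an optimization clustering index based on intra-class deviation and inter-class deviation is proposed for extracting the hierarchical structure of data; an optimal clustering model is then established, the existence of its solution is proved, and a corresponding algorithm is given. Second, influenza A (H1N1) virus sequences from 1902 to 2015 that contain both HA and NA proteins are used as the experimental database. The proposed optimization model and algorithm are applied to construct the first-level and second-level structures of the influenza virus protein system; an optimization model for selecting signature viruses is established on the nearest-to-center principle, signature virus proteins are selected, and a core phylogenetic tree of the H1N1 influenza virus is constructed. Finally, a classifier is built on the nearest-to-center principle to verify the effectiveness of the proposed method. The experimental results show that applying the method to influenza A (H1N1) virus yields very good classification results, with an accuracy of 93.25%. These results provide a complete new set of processing methods for information processing based on big data.

Keywords: granular space  hierarchical clustering index  optimal clustering model  multi-level structure  algorithm  H1N1 influenza virus protein sequence
Received: 2016-08-22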
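
The abstract describes choosing a level of a hierarchical structure with an index built from intra-class and inter-class deviation, but it does not state the index itself. The following is only a minimal sketch of the general idea, under several assumptions that are not taken from the paper: samples are numeric vectors compared by Euclidean distance, the hierarchy is obtained by linking samples whose distance is at most a threshold `lam`, and the Calinski-Harabasz ratio is used as a stand-in index. All function names are hypothetical.

```python
from itertools import combinations
from math import dist


def partition_at(points, lam):
    """Classes = connected components of the graph with edges d(x, y) <= lam."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= lam:
            parent[find(i)] = find(j)
    classes = {}
    for i in range(len(points)):
        classes.setdefault(find(i), []).append(i)
    return list(classes.values())


def center(points, members):
    """Coordinate-wise mean of the points whose indices are in members."""
    dim = len(points[0])
    return tuple(sum(points[i][k] for i in members) / len(members)
                 for k in range(dim))


def deviations(points, classes):
    """Return (intra, inter) sums of squared deviations."""
    overall = center(points, range(len(points)))
    intra = inter = 0.0
    for members in classes:
        c = center(points, members)
        intra += sum(dist(points[i], c) ** 2 for i in members)
        inter += len(members) * dist(c, overall) ** 2
    return intra, inter


def best_level(points):
    """Scan every threshold where the partition can change and keep the
    level with the highest Calinski-Harabasz ratio (assumed stand-in,
    not the index defined in the paper)."""
    n = len(points)
    thresholds = sorted({dist(p, q) for p, q in combinations(points, 2)})
    best = None
    for lam in thresholds:
        classes = partition_at(points, lam)
        k = len(classes)
        if not 1 < k < n:
            continue  # skip the one-class and all-singleton partitions
        intra, inter = deviations(points, classes)
        if intra == 0:
            continue  # degenerate level: every class is a set of duplicates
        score = (inter / (k - 1)) / (intra / (n - k))
        if best is None or score > best[0]:
            best = (score, lam, classes)
    return best


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
score, lam, classes = best_level(points)
```

On this toy data the scan keeps the two-class level. A different index or distance would change only `deviations` and the `score` line.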

Study on optimal clustering index based on granular space
TANG Xuqing, LIANG Qihao, LI Yang. Study on optimal clustering index based on granular space[J]. Systems Engineering - Theory & Practice, 2018, 38(3): 755-764.
Authors: TANG Xuqing  LIANG Qihao  LI Yang
Institution: School of Science, Jiangnan University, Wuxi 214122, China
Abstract: Building on granular space theory, this paper presents an optimal clustering model based on granular space. It contains three main results. First, an optimization clustering index based on intra-class deviation and inter-class deviation is proposed for extracting the hierarchical structure of data; an optimal clustering model is established from it, the existence of a solution to the model is proved, and the corresponding algorithm is given. Second, H1N1 influenza virus sequences from 1902 to 2015 that contain both HA and NA proteins are used as the experimental database, and the first-level and second-level structures of the influenza virus protein system are constructed by applying the optimization model and algorithm. Based on the nearest-to-center principle, an optimization model is established for selecting signature viruses, and their phylogenetic tree is obtained. Finally, a classifier is designed according to the nearest-to-center principle to verify the effectiveness of the method. The results show that the method is effective, with a classification accuracy of 93.25%. These results provide a new set of processing methods for information processing based on big data.
Keywords: granular space  hierarchical clustering index  optimal clustering model  multi-level structure  algorithm  H1N1 virus protein sequence
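
The nearest-to-center principle mentioned in the abstract is used twice: to pick a signature member of each class and to classify samples. The abstract does not give the distance or the sequence encoding, so the sketch below assumes numeric feature vectors and Euclidean distance; the function names and the toy data are hypothetical and are not the paper's data or code.

```python
from math import dist


def class_center(vectors):
    """Coordinate-wise mean of a class."""
    dim = len(vectors[0])
    return tuple(sum(v[k] for v in vectors) / len(vectors) for k in range(dim))


def signature(vectors):
    """Member closest to the class center (assumed reading of 'signature')."""
    c = class_center(vectors)
    return min(vectors, key=lambda v: dist(v, c))


def classify(sample, classes):
    """Assign the sample to the label whose class center is nearest."""
    centers = {label: class_center(vs) for label, vs in classes.items()}
    return min(centers, key=lambda label: dist(sample, centers[label]))


def accuracy(labelled_samples, classes):
    """Fraction of (sample, label) pairs that classify() labels correctly."""
    hits = sum(classify(s, classes) == label for s, label in labelled_samples)
    return hits / len(labelled_samples)


# Toy classes standing in for clusters found at some level of the hierarchy.
classes = {
    "A": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (0.4, 0.3)],
    "B": [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)],
}
held_out = [((1.0, 1.0), "A"), ((9.0, 9.0), "B")]
```

Here `signature(classes["A"])` picks the member nearest the mean of class A, and `accuracy(held_out, classes)` measures how often the nearest-center rule recovers the known labels, which mirrors the kind of check the abstract reports as a correct rate.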
This article is indexed in CNKI and other databases.