
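A Protein Subcellular Localization Prediction Method Based on Optimal Local Information Fusion
Cite this article: ZHANG Shu-bo, LAI Jian-huang, HE Jian-guo. A Protein Subcellular Localization Prediction Method Based on Optimal Local Information Fusion[J]. Acta Scientiarum Naturalium Universitatis Sunyatseni (中山大学学报(自然科学版)), 2008, 47(6)
Authors: ZHANG Shu-bo  LAI Jian-huang  HE Jian-guo
Affiliations: 1. School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou 510275, Guangdong, China
2. School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510275, Guangdong, China
3. School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, Guangdong, China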
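Abstract: Based on the synthesis and sorting mechanisms of proteins, a new method for predicting protein subcellular localization is proposed. An exhaustive search is first used to find, for each subcellular class, the optimal splice site between the sorting signal and the mature protein. Each protein sequence is divided at that site into two subsequences, and the amino acid compositions of the two subsequences are computed and fused to form the feature vector of the whole sequence. An optimal sub-classifier is then built to recognize each class of proteins, and the sub-classifiers are combined into an ensemble classifier according to the maximization rule. Tested with 5-fold cross-validation on the NNPSL dataset, the method achieves overall prediction accuracies of 94.1% and 87.5% on the prokaryotic and eukaryotic protein sequence subsets, respectively. In addition, the splice sites found in some protein sequences agree with real biological phenomena and can provide reference information for predicting the cleavage sites of protein sequences.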

Keywords: subcellular localization  N-terminal sorting signal  mature protein  support vector machine  splice site
Received: 2008-04-11
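
The feature construction described in the abstract (divide the sequence at a candidate site, compute the amino acid composition of each part, and fuse the two) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the 20-letter alphabet ordering, and the `score` callback (which would stand in for something like the cross-validated accuracy of a sub-classifier) are all hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues

def composition(seq):
    """Return the 20-dim frequency vector of standard residues in seq.

    Non-standard letters are ignored; an empty input yields all zeros.
    """
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def fused_features(seq, site):
    """Concatenate the compositions of seq[:site] and seq[site:] (40 dims)."""
    return composition(seq[:site]) + composition(seq[site:])

def best_site(candidates, score):
    """Exhaustively evaluate every candidate site and keep the best-scoring one.

    `score` is a hypothetical callback mapping a site to a quality value.
    """
    return max(candidates, key=score)
```

For example, `fused_features("MKKLAAAA", 3)` describes the N-terminal part `MKK` and the remainder `LAAAA` separately, so a composition bias in a short signal region is not diluted by the rest of the sequence.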

A Novel Approach for Prediction of Protein Subcellular Localization Using Optimal Local Information
ZHANG Shu-bo,LAI Jian-huang,HE Jian-guo. A Novel Approach for Prediction of Protein Subcellular Localization Using Optimal Local Information[J]. Acta Scientiarum Naturalium Universitatis Sunyatseni, 2008, 47(6)
Authors:ZHANG Shu-bo  LAI Jian-huang  HE Jian-guo
Affiliation:(1. School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou 510275, China;2. School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510275, China;3. School of Life Sciences, Sun Yat-sen University, Guangzhou 510275, China)
Abstract:Prediction of protein subcellular localization can help infer protein function and provide insight into interactions between proteins. A novel approach based on the sorting mechanism of proteins is proposed for predicting subcellular localization. For each kind of protein, an optimal splice site is found through an iterative search to divide the sequence into a sorting-signal subsequence and a mature-protein subsequence. When designing the classifier, a sub-classifier is built to discriminate each kind of protein from the rest; these sub-classifiers are then combined into an ensemble classifier to predict the subcellular localization of unknown proteins. In five-fold cross-validation tests on the NNPSL datasets, overall accuracies of 94.1% and 87.5% are obtained for prokaryotic and eukaryotic proteins, respectively; on the TargetP datasets, the overall accuracies are 90.2% and 93.9% for plant and non-plant proteins, respectively. The optimal splice sites found also coincide with biological facts for most kinds of proteins, which can help predict protein cleavage sites.
Keywords:subcellular localization  N-terminal sorting signal  mature protein  support vector machine  splice site
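
The ensemble step in the abstract (one sub-classifier per kind of protein, combined into a single predictor) could look roughly like the sketch below. It assumes each sub-classifier is a callable that returns a confidence score for its own class and that the ensemble picks the class with the highest score; the function name and the toy scorers are hypothetical stand-ins, whereas the paper's keywords indicate the actual sub-classifiers are support vector machines.

```python
def predict_location(sequence, sub_classifiers):
    """Pick the class whose one-vs-rest sub-classifier scores highest.

    sub_classifiers: dict mapping a location label to a callable that
    takes the raw sequence and returns a numeric score (assumed interface).
    """
    scores = {label: clf(sequence) for label, clf in sub_classifiers.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy scorers for illustration only; they are not trained models.
toy_classifiers = {
    "periplasmic": lambda seq: seq[:10].count("A") / 10,
    "cytoplasmic": lambda seq: seq.count("K") / len(seq),
}

label, scores = predict_location("MKKTAIAIAVALAGFATVAQA", toy_classifiers)
```

Passing the raw sequence to each sub-classifier lets every class apply its own splice site before extracting features, which matches the abstract's statement that a site is found for each kind of protein.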
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).