Similar Articles
1.
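An Entropy-Measure Classification Method for Cancer Gene Expression Data   Total citations: 5; self-citations: 4; citations by others: 1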
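The emergence and development of gene chip technology has had a profound impact on biomedicine, and applying classification methods to the massive data it produces is important for cancer classification and treatment. This paper proposes a feature extraction method for cancer gene expression data that uses an entropy measure as its criterion. The gene expression data are first filtered and the entropy of each gene is computed; the genes with the largest entropy are then extracted as feature genes and classified with a support vector machine. Leave-one-out and split-sample experiments on prostate cancer gene expression data both demonstrate the effectiveness of the method.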
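The entropy-based selection step above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each gene's continuous expression values are discretized into equal-width bins before computing Shannon entropy (the abstract does not say how entropy is computed), and the SVM classification step is omitted. `gene_entropy` and `top_entropy_genes` are hypothetical names.

```python
import math
from collections import Counter


def gene_entropy(values, n_bins=3):
    """Shannon entropy (bits) of one gene's expression after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant gene carries no information
    width = (hi - lo) / n_bins
    # Map each value to a bin index 0..n_bins-1 (the maximum falls into the last bin).
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())


def top_entropy_genes(matrix, k, n_bins=3):
    """Return indices of the k genes (rows of `matrix`) with the largest entropy."""
    scores = [gene_entropy(row, n_bins) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: scores[i], reverse=True)[:k]
```

The selected row indices would then define the feature columns passed to whatever SVM implementation is used.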

2.
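Using a colon cancer gene expression profile dataset, this paper proposes a new method for extracting informative genes. The method combines support vector machines (SVM), the Bhattacharyya distance, recursive feature elimination (RFE), and the fast correlation-based filter (FCBF). First, the Bhattacharyya distance is combined with SVM-RFE to remove irrelevant genes; FCBF is then applied to obtain the informative genes; finally, an SVM is used as the classifier to classify the colon cancer samples. Experimental results show that, compared with existing methods, this method has clear advantages in both the number of genes extracted and classification accuracy.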
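A minimal sketch of the first filtering step, ranking genes by the Bhattacharyya distance between the two classes, is shown below. It assumes each class is normally distributed for a given gene (the closed form used here only holds under that assumption); the SVM-RFE and FCBF stages are not shown, and the function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def bhattacharyya_1d(x1, x2):
    """Bhattacharyya distance between two classes for one gene, assuming each
    class is normally distributed (both population variances must be > 0)."""
    m1, m2 = mean(x1), mean(x2)
    v1, v2 = pvariance(x1), pvariance(x2)
    # Mean-separation term plus variance-mismatch term.
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * math.log((v1 + v2) / (2 * math.sqrt(v1 * v2))))


def rank_genes(class1, class2):
    """class1/class2: lists of samples, each a list of gene values.
    Returns gene indices sorted from most to least separating."""
    n_genes = len(class1[0])
    scores = [bhattacharyya_1d([s[g] for s in class1], [s[g] for s in class2])
              for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
```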

3.
Gene expression microarray data can be used to classify tumor types. We propose a new procedure for classifying human tumor samples from microarray gene expression profiles using a hybrid supervised learning method called MOEA WV (Multi-Objective Evolutionary Algorithm Weighted Voting). The MOEA searches the high-dimensional gene space for a relatively small number of informative gene subsets, and weighted voting (WV) serves as the classifier. The method has been applied to predicting lymphoma subtypes and medulloblastoma outcomes, and its results are relatively accurate and meaningful compared with those of other methods.
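The weighted-voting (WV) classifier can be sketched as below, assuming the classic signal-to-noise weighted-voting scheme in which each selected gene casts a vote weighted by its class separation. Whether the paper uses exactly this weighting is an assumption, and the MOEA gene-subset search is not shown; `train_wv` and `predict_wv` are hypothetical names.

```python
from statistics import mean, pstdev


def train_wv(class1, class2):
    """class1/class2: lists of samples, each a list of selected-gene values.
    Returns one (weight, midpoint) pair per gene."""
    params = []
    for g in range(len(class1[0])):
        x1 = [s[g] for s in class1]
        x2 = [s[g] for s in class2]
        m1, m2 = mean(x1), mean(x2)
        spread = pstdev(x1) + pstdev(x2)
        # Signal-to-noise weight; a gene with zero spread is simply ignored here.
        a = (m1 - m2) / spread if spread > 0 else 0.0
        b = (m1 + m2) / 2  # decision midpoint between the class means
        params.append((a, b))
    return params


def predict_wv(params, sample):
    """Sum the weighted votes; a positive total means class 1, otherwise class 2."""
    total = sum(a * (x - b) for (a, b), x in zip(params, sample))
    return 1 if total > 0 else 2
```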

4.
Gene association study is one of the major challenges of biochip technology, both for gene diagnosis, where only a subset of genes is responsible for a given disease, and for coping with the curse of dimensionality, which is especially acute in DNA microarray datasets containing thousands of genes but only a small number of experiments (samples). This paper presents a gene selection method that trains linear support vector machine (SVM) and nonlinear multilayer perceptron (MLP) classifiers and tests them with cross-validation to find a gene subset that is optimal or suboptimal for diagnosing binary or multiple disease types. For each pair of disease types, genes are selected with a linear SVM classifier and tested by leave-one-out cross-validation. The gene subset is then initialized as the union of these selections, and genes are deleted one by one, each time removing the gene that brings the greatest decrease of the generalization power for samples on the gene subset after removal, where generalization is measured by training MLPs with leave-one-out and leave-four-out cross-validation. The proposed method was tested on the real MIT and NCI DNA microarray datasets, and it outperforms the conventional SNR method in the separability of the data using expression levels on the selected genes. The MIT and NCI datasets contain 7129 and 2308 effective genes, respectively, with only 72 and 64 labeled samples belonging to 2 and 4 disease classes; only 11 and 6 genes, respectively, are selected as diagnostic genes. The selected genes were tested by classifying samples on these genes with an SVM under leave-one-out cross-validation (MIT) and with an MLP under both leave-one-out and leave-four-out cross-validation (NCI). The absence of any misclassification indicates that the selected genes can indeed be considered diagnostic genes for the corresponding diseases.
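The backward-elimination stage can be sketched as follows. To keep the example short and dependency-free, it substitutes a nearest-centroid classifier for the paper's SVM and MLP, and it reads the elimination rule as "drop the gene whose removal keeps leave-one-out accuracy highest, and stop once every removal would lower accuracy"; both choices are assumptions, and all names are hypothetical.

```python
def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in set(y):
            # Class centroid computed without the held-out sample i.
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if not rows:
                continue
            dist = sum((X[i][g] - sum(r[g] for r in rows) / len(rows)) ** 2 for g in genes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def backward_eliminate(X, y, genes):
    """Remove genes one at a time while leave-one-out accuracy does not drop."""
    current = list(genes)
    acc = loo_accuracy(X, y, current)
    while len(current) > 1:
        # Score every single-gene removal and keep the one leaving the best accuracy.
        trials = [(loo_accuracy(X, y, [g for g in current if g != r]), r) for r in current]
        best_acc, drop = max(trials)
        if best_acc < acc:
            break  # every removal would lower accuracy, so stop
        current.remove(drop)
        acc = best_acc
    return current, acc
```

In the paper's setting, `genes` would be the union of the per-pair SVM selections.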

7.
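To address the high dimensionality, small sample size, and continuous values of gene expression profile data, this paper proposes a feature gene selection method that combines neighborhood mutual information with self-organizing maps. First, an improved Relief algorithm ranks the genes to generate a candidate feature set; next, a self-organizing map algorithm based on neighborhood mutual information clusters the candidate feature genes; finally, a proposed attribute importance coefficient is used to select representative genes from each cluster to form the feature gene subset. Experimental results show that the method selects tumor feature genes quickly and effectively and achieves good classification results.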
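The first step, ranking genes with Relief, can be sketched with the original two-class Relief weighting below. The paper uses an improved Relief whose details are not given in the abstract, so this is the basic algorithm only; the neighborhood-mutual-information SOM clustering and the attribute importance coefficient are not shown. `relief_weights` is a hypothetical name.

```python
def relief_weights(X, y):
    """Basic two-class Relief: genes that differ more from the nearest miss than
    from the nearest hit get larger weights. Needs at least 2 samples per class."""
    n, n_genes = len(X), len(X[0])
    # Per-gene range used to scale differences into [0, 1].
    ranges = []
    for g in range(n_genes):
        col = [row[g] for row in X]
        ranges.append((max(col) - min(col)) or 1.0)

    def diff(a, b, g):
        return abs(a[g] - b[g]) / ranges[g]

    def dist(a, b):
        return sum(diff(a, b, g) for g in range(n_genes))

    w = [0.0] * n_genes
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))      # nearest same-class sample
        miss = min(misses, key=lambda j: dist(X[i], X[j]))   # nearest other-class sample
        for g in range(n_genes):
            w[g] += (diff(X[i], X[miss], g) - diff(X[i], X[hit], g)) / n
    return w
```

Sorting genes by descending weight gives the candidate feature ranking.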

8.
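Microarray data have few samples and high dimensionality, which makes them difficult to analyze. Selecting key genes from microarray data (feature selection) is therefore important and meaningful in bioinformatics research and applications. This paper uses the optimal orthogonal centroid feature selection algorithm (OCFS) to select key genes and compares it with key-gene selection based on the signal-to-noise ratio and on a genetic algorithm. The selected key genes are then used to classify the samples with a support vector machine (SVM). On the classic leukemia dataset, 33 of the 34 test samples were classified correctly, demonstrating the applicability of the method.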
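As a sketch, assuming the usual OCFS criterion that scores each feature by the class-size-weighted squared distance between each class centroid and the overall centroid along that feature, key-gene selection reduces to computing one score per gene and keeping the top k. The SVM step is omitted and the function names are hypothetical.

```python
def ocfs_scores(X, y):
    """Score each gene by sum over classes of (n_c / n) * (class mean - overall mean)^2."""
    n, n_genes = len(X), len(X[0])
    overall = [sum(row[g] for row in X) / n for g in range(n_genes)]
    scores = [0.0] * n_genes
    for label in set(y):
        rows = [X[i] for i in range(n) if y[i] == label]
        frac = len(rows) / n  # class prior n_c / n
        for g in range(n_genes):
            centroid = sum(r[g] for r in rows) / len(rows)
            scores[g] += frac * (centroid - overall[g]) ** 2
    return scores


def ocfs_select(X, y, k):
    """Indices of the k highest-scoring genes."""
    s = ocfs_scores(X, y)
    return sorted(range(len(s)), key=lambda g: s[g], reverse=True)[:k]
```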

9.
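To address the large amount of noise and redundant information in microarray data, which arises from small sample sizes and high per-sample dimensionality, the feature gene set is selected and optimized comprehensively in successive stages. Because an individual gene may behave differently under different conditions, genes that differ substantially only under specific conditions are first selected to form a candidate feature set; genes with low correlation are then removed from the candidate set; finally, a genetic algorithm evaluates the overall classification performance of candidate subsets of the resulting feature set and selects a better subset. Experimental results show that the algorithm is feasible and effective for stepwise feature gene selection, and that the selected feature gene set outperforms the original data in both classification fitness (a measure of classification ability) and classification accuracy.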
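The third stage, a genetic-algorithm search over subsets of the candidate set, can be sketched as below. The encoding (one bit per candidate gene), one-point crossover, bit-flip mutation, and elitism are generic choices assumed for illustration, not the paper's settings, and the caller supplies the fitness function (for example, a cross-validated accuracy). `ga_select` is a hypothetical name.

```python
import random


def ga_select(n_genes, fitness, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search bitmasks over candidate genes; fitness(mask) -> float, higher is better.
    pop_size must be at least 4 so that two distinct parents can be sampled."""
    rng = random.Random(seed)
    # Seed the population with the full candidate set plus random subsets.
    pop = [[1] * n_genes] + [[rng.randint(0, 1) for _ in range(n_genes)]
                             for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0][:]]          # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = a[:cut] + b[cut:]      # one-point crossover
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

Because the full candidate set is placed in the initial population and the best individual always survives, the returned subset is never worse, by the supplied fitness, than using all candidate genes.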

10.
Tumor diagnosis by analyzing gene expression profiles has become an active topic in bioinformatics, and the main problem is identifying the genes related to a tumor. This paper proposes a rank sum method, based on the rank sum test from statistics, for identifying the related genes. A tumor diagnosis system is then built from a support vector machine (SVM) trained on the expression profiles of the related genes. Experiments show that the resulting system, combining the rank sum method with an SVM, reaches an accuracy of 96.2% on the colon data and 100% on the leukemia data.
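The per-gene rank sum statistic can be sketched as below, using the normal approximation to the Wilcoxon rank sum test with average ranks for ties (no tie correction to the variance). Ranking genes by |z| is an assumed selection rule, the SVM is not shown, and all names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(group_a, group_b):
    """Normal-approximation z score of the rank sum of group_a."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n1, n2 = len(group_a), len(group_b)
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w


def rank_genes(class_a, class_b):
    """Gene indices sorted by |z|, most differential first."""
    n_genes = len(class_a[0])
    z = [abs(rank_sum_z([s[g] for s in class_a], [s[g] for s in class_b]))
         for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: z[g], reverse=True)
```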

11.
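Sweet potato is the world's seventh largest food crop and offers high yield, rich nutrition, and tolerance to drought and saline-alkali soils. Using a transcriptomic approach, transcript sequences related to drought and salt tolerance were mined from an integrated transcriptome database of three Chinese sweet potato cultivars, yielding 238 transcripts. Nine transcripts with full-length coding sequences were randomly selected for PCR amplification, cloning, and sequencing; the sequenced transcripts were more than 98% similar to the assembled sequences, indicating that the transcript sequences are reliable. Comparing expression levels across cultivars showed that drought-tolerance genes were generally expressed at higher levels in Jingshu 6 and at lower levels in Xushu 18, whereas salt-tolerance-related genes showed the opposite pattern. Digital gene expression profiling was then used to analyze differential and specific expression of the drought- and salt-tolerance genes across six tissues or developmental stages of Xushu 18; these genes were expressed at significantly higher levels in mature leaves and in storage roots at the expansion stage than in the other tissues. Quantitative real-time PCR validation showed that the expression of these genes across tissues, organs, and developmental stages of Xushu 18 was broadly consistent with the digital gene expression profiles.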

12.
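A Mutual-Information-Based Method for Mining Differentially Co-expressed Disease Genes   Total citations: 1; self-citations: 0; citations by others: 1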
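To mine differentially co-expressed disease gene modules from gene expression data, a method combining mutual information with maximal cliques is proposed. Mutual information measures the relationship between gene expression profiles: the mutual information between every pair of gene expression profiles is computed separately in two different sample types, giving two mutual information matrices M1 and M2. Two thresholds T1 and T2 (T1 > T2) are chosen to binarize M1 and M2, and the adjacency matrix of a graph is obtained by a logical AND of the corresponding elements of M1 and M2; the maximal cliques mined from this adjacency matrix are the differentially co-expressed disease gene modules. Applying the method to the Colon dataset with T1 = 2.2 and T2 = 1.0 yielded six mutually overlapping maximal cliques, and the experimental results show that the method can effectively mine differentially co-expressed disease gene modules.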
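The graph-construction and clique-mining steps can be sketched as below, taking the two mutual information matrices as given (estimating them is omitted). The binarization rule, an edge wherever M1 > T1 and M2 < T2, is an assumption about how the thresholds are applied; maximal cliques are enumerated with the basic Bron-Kerbosch algorithm. Function names are hypothetical.

```python
def adjacency(M1, M2, T1, T2):
    """Neighbor sets: edge i-j if MI is high in condition 1 and low in condition 2."""
    n = len(M1)
    return {i: {j for j in range(n) if j != i and M1[i][j] > T1 and M2[i][j] < T2}
            for i in range(n)}


def maximal_cliques(adj, min_size=2):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            if len(r) >= min_size:
                cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques
```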

13.
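To investigate the role of MLO genes in powdery mildew resistance in Rosaceae, virus-induced gene silencing (VIGS) was used to suppress the expression of the RgMLO6 gene in Rosa gigantea and the RlMLO7 gene in Rosa longicuspis, and the plants were then inoculated with powdery mildew to assess resistance. Twenty days after leaves were transformed with the VIGS vector, the relative expression of RgMLO6 and RlMLO7 had fallen significantly, by 80% to 90%, indicating effective silencing. Powdery mildew resistance assays on young leaves after silencing showed that resistance in both R. gigantea and R. longicuspis was higher than in the controls. Microscopic observation of hyphal growth on inoculated leaves of the silenced plants showed that, overall, powdery mildew grew more slowly on the leaf epidermal cells of silenced plants than on those of the controls. The results indicate that RgMLO6 and RlMLO7 negatively regulate powdery mildew resistance in Rosaceae.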

14.
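Multiplex PCR was used to analyze eight positive clones obtained by screening a full-length ovarian cDNA library of the kuruma prawn (Penaeus japonicus) with ovary-specific probes prepared by suppression subtractive hybridization (SSH), in order to examine the differential expression of these newly cloned genes in the testis and ovary. The eight genes fall into two classes: genes expressed specifically in the ovary, and differentially expressed genes whose expression is higher in the ovary than in the testis.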

15.
Modeling the linkage disequilibrium (LD) between genes that is commonly observed in admixed natural populations has been shown to be an effective approach to high-resolution mapping of disease genes in humans. A prerequisite for accurately estimating the recombination fraction between a marker locus and the disease locus with this approach is a reliable estimate of the admixture proportions. This study proposes using gene frequencies to estimate the admixture proportion, based on the observation that gene frequencies are much more stable than haplotype frequencies as the population evolves. We also develop the theory and methods by which the decay rate of the nonlinear term of LD in an admixed population can be used to estimate the recombination fraction between genes. Theoretical analysis and simulation indicate that the larger the difference in gene frequencies between the parental populations, and the closer the admixture proportion is to 0.5, the more important the nonlinear term of LD in the admixed population, and hence the more informative such admixed populations are for high-resolution gene mapping.
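The textbook relations behind this approach can be sketched as below, assuming a single admixture pulse followed by random mating: the admixture proportion m is estimated from allele frequencies, the initial admixture LD is m(1 - m)(p1 - p2)(q1 - q2), and LD then decays by a factor (1 - r) per generation, so r can be recovered from the ratio of LD values. This illustrates why m near 0.5 and large parental frequency differences are most informative; it does not reproduce the paper's nonlinear-term analysis, and all names are hypothetical.

```python
def admixture_proportion(p_admixed, p1, p2):
    """Solve p_admixed = m * p1 + (1 - m) * p2 for m (requires p1 != p2)."""
    return (p_admixed - p2) / (p1 - p2)


def initial_admixture_ld(m, p1, p2, q1, q2):
    """LD created by mixing two parental populations that are each in linkage
    equilibrium; p and q are allele frequencies at the two loci."""
    return m * (1 - m) * (p1 - p2) * (q1 - q2)


def ld_after(d0, r, t):
    """LD after t generations of random mating with recombination fraction r."""
    return d0 * (1 - r) ** t


def estimate_r(d0, dt, t):
    """Invert the geometric decay to recover the recombination fraction."""
    return 1 - (dt / d0) ** (1 / t)
```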

18.
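To address the low classification accuracy caused by the high dimensionality and many redundant genes of tumor gene data, this paper proposes a tumor feature gene selection method based on PCA and information gain. The method first uses PCA to remove redundant genes and obtain a preselected feature gene subset, then applies information gain to refine the preselected subset into the final feature gene subset, and finally runs simulation experiments on that subset with several classification models. Experimental results show that the proposed method improves the classification accuracy of gene expression profiles, indicating that disease-related genes are effectively selected.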
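The information-gain step can be sketched as below. It assumes each gene is binarized at its mean expression before computing information gain against the class labels; the PCA pre-filtering step is not shown, and the names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """H(Y) - H(Y | gene above or below its mean) for one gene."""
    threshold = sum(values) / len(values)  # assumption: split at the mean
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    cond = sum(len(part) / len(labels) * entropy(part) for part in (left, right) if part)
    return entropy(labels) - cond
```

Genes from the PCA-preselected subset would then be ranked by this score and the top ones kept.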

19.
The mechanisms of cotton fiber development and somatic embryogenesis have been explored systematically with microarrays and suppression subtractive hybridization. Real-time RT-PCR allows simultaneous measurement of gene expression in many different samples and can be used to confirm microarray and other data in detail. To obtain accurate and reliable gene expression results, real-time PCR data must be normalized against one or more internal control genes whose expression does not fluctuate across tissues or developmental stages. We assessed the expression of 7 frequently used housekeeping genes (18S rRNA, Histone3, UBQ7, Actin, Cyclophilin, Gbpolyubiquitin-1, and Gbpolyubiquitin-2) in a diverse set of 21 cotton samples. In the fiber developmental series, the expression of all housekeeping genes followed the same downward trend after 17 DPA, whereas the AGP (arabinogalactan protein) gene, which is highly expressed at late stages of fiber development, was up-regulated from 15 to 27 DPA. Relative absolute quantification should therefore be an efficient and convenient method for the fiber developmental series. Expression in the non-fiber tissue series varied less than in the fiber developmental series, and the three best control genes, Histone3, UBQ7, and Gbpolyubiquitin-1, need to be used in combination to achieve better normalization.
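Normalizing against a combination of reference genes, such as Histone3, UBQ7, and Gbpolyubiquitin-1, can be sketched with the comparative Ct (2^-ΔΔCt) method below. It assumes roughly 100% amplification efficiency for all genes, in which case averaging the reference Ct values corresponds to taking the geometric mean of their quantities. This is not necessarily the quantification scheme used in the study, the function names are hypothetical, and the Ct values in the usage example are invented for illustration.

```python
def delta_ct(target_ct, reference_cts):
    """Target Ct minus the mean Ct of several reference genes.
    With 100% efficiency, averaging Cts equals the geometric mean of 2^-Ct quantities."""
    return target_ct - sum(reference_cts) / len(reference_cts)


def relative_expression(target_ct, ref_cts, calib_target_ct, calib_ref_cts):
    """Fold change of a sample relative to a calibrator sample (2^-ddCt)."""
    ddct = delta_ct(target_ct, ref_cts) - delta_ct(calib_target_ct, calib_ref_cts)
    return 2 ** (-ddct)


# Hypothetical Ct values: three reference genes in both sample and calibrator.
print(relative_expression(20, [15, 16, 17], 22, [15, 16, 17]))  # 4.0
```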

20.
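Clustering is an effective way to identify the key gene regulatory modules contained in gene expression data, and the similarity measure between gene expression profiles is a central issue in clustering. However, common similarity measures cannot capture the complex gene regulatory relationships present in time-series gene expression data, such as time delays, inverse correlation, and local correlation. For time-series gene expression profiles, this paper proposes a similarity measure and clustering algorithm based on affinity propagation and dynamic programming. Clustering results on a rat regenerating-liver gene expression dataset are highly consistent with the results of gene functional enrichment analysis, demonstrating the effectiveness of the algorithm for clustering time-series gene expression profiles.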
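The dynamic-programming part of such a similarity measure can be illustrated with dynamic time warping (DTW), which tolerates time delays between profiles; also comparing against the negated profile is one assumed way to capture inverse correlation. This is a stand-in, not the paper's measure, and it does not handle local correlation. The function names are hypothetical.

```python
def dtw(a, b):
    """Dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def similarity(a, b):
    """Negated distance; also aligning a against -b lets inverse profiles score highly."""
    return -min(dtw(a, b), dtw(a, [-v for v in b]))
```

A matrix of these similarities could then be passed as a precomputed affinity to an affinity propagation implementation, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`.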


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP No. 09084417