Similar literature
 Found 20 similar documents; search took 109 ms
1.
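Because single nucleotides in human DNA sequences are polymorphic, mining anomalies in DNA sequences is an important research topic in the post-genomic era. Building on an analysis of existing DNA sequence data mining methods, this paper exploits the fact that different low-dimensional embedding vectors in manifold learning lie at different vector distances, and proposes a manifold-learning-based DNA sequence data mining method (5D locally linear embedding, 5DLLE for short). Experimental results show that, compared with hidden Markov models (HMM) and support vector machines (SVM), the proposed 5DLLE method has certain advantages for DNA sequence data mining: its average recognition rate is higher and its computation time is comparatively lower.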

2.
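Research on feature extraction methods for DNA sequences   Total citations: 3, self-citations: 0, citations by others: 3
Two feature extraction methods are proposed for the DNA sequence classification problem. Using the principle that separable support vector classifiers have a large margin and strong generalization ability, an evaluation criterion for the quality of DNA sequence feature extraction methods is established. With this criterion, the two proposed methods are compared with each other and with earlier DNA sequence feature extraction methods. Experiments show that the features obtained by the two proposed methods fully represent the DNA sequences: the prediction rate on samples of known class is 100%, and the feature extraction methods generalize well.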

3.
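For the problem of assigning DNA sequences to classes, classification with a support vector machine (SVM) is proposed. A feature space is built according to the requirements of the SVM classifier: first, four features are obtained from the content of the four bases in each DNA sequence; then the space is extended with a sequence-length feature; finally, the SVM classifier is trained on DNA samples of known class to obtain a separating hyperplane. This hyperplane is then used to classify the DNA sequences in question, and experimental results show that the method achieves good classification accuracy.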
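The feature space described in this abstract (four base-content features plus sequence length) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `dna_features`, the base order `ACGT`, and the choice of relative frequencies rather than raw counts are all assumptions. The resulting vectors would then be passed to any SVM implementation for training.

```python
from collections import Counter

BASES = "ACGT"  # assumed fixed order of the four base-content features


def dna_features(seq):
    """Return [freq(A), freq(C), freq(G), freq(T), length] for one DNA sequence.

    Hypothetical helper: base content is taken as relative frequency,
    which is an assumption; the abstract does not say counts or fractions.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    # Four base-content features followed by the sequence-length feature.
    return [counts[b] / n for b in BASES] + [float(n)]
```

For example, `dna_features("AACG")` yields the five-component vector for a 4-base sequence that is half adenine.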

4.
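To improve the equalization of multi-modulus signals, a frequency-domain weighted multi-modulus algorithm with DNA sequences optimized by a new-mutation DNA genetic artificial fish swarm algorithm (nm DNAGAFS-DNA-FWMMA) is proposed. The algorithm exploits the fast convergence and strong global search ability of the new-mutation DNA genetic artificial fish swarm algorithm, searches for the optimal DNA sequence through a DNA constraint model and a cost function, and decodes that sequence into the initial optimal weight vector of the frequency-domain weighted multi-modulus algorithm (FWMMA), in order to speed up convergence and reduce the residual mean square error. Simulation results show that nm DNAGAFS-DNA-FWMMA converges quickly and has a small mean square error.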

5.
刘西奎  李艳  许进 《自然科学进展》2004,14(9):1032-1038
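In DNA sequence research, an effective representation of long DNA sequences can provide novel methods for classifying, analyzing, and comparing DNA sequences. Nandy, Leong and Morgenthaler, Randic, and others have given two- or three-dimensional graphical representations of DNA sequences, which expose visual features of the sequences. This paper gives an improved graphical representation: in a 2-dimensional exponential coordinate system, four specific vectors represent the four bases of a DNA sequence, so that the sequence can be represented as a directed path. An example illustrates the effectiveness of the method, and it can be shown that the improved representation has low degeneracy, or none at all.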

6.
马弘 《科技信息》2012,(2):160-160,162
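A primary DNA sequence can be regarded as a string over the alphabet Ω = {a, g, c, t}. Based on all 2-combinations of the multiset {∞·a, ∞·g, ∞·c, ∞·t}, we convert the primary DNA sequence into a sequence over 10 letters and then construct a 10-component vector to characterize the DNA sequence. Each component of this vector is a weighted quasi-entropy, which better reflects the information carried by the elements of the sequence and, in particular, by the order relations among them. The resulting numerical characterization of a DNA sequence is very sensitive to character substitution. As an application, we carry out a similarity analysis of the β-globin genes of 15 species, and the results agree with those in the literature.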
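The 10-letter alphabet mentioned in this abstract is the set of 2-combinations with repetition of {a, g, c, t}. The sketch below enumerates it and shows one plausible way to recode a sequence. The rule used here, mapping each pair of adjacent bases to its unordered pair, is an assumption, since the abstract does not state the conversion rule; the weighted quasi-entropy is not defined in the abstract and is not implemented. The names `PAIRS` and `to_pair_sequence` are hypothetical.

```python
from itertools import combinations_with_replacement

ALPHABET = "agct"
INDEX = {base: i for i, base in enumerate(ALPHABET)}

# The 10 unordered pairs (2-combinations with repetition) of {a, g, c, t}.
PAIRS = ["".join(p) for p in combinations_with_replacement(ALPHABET, 2)]


def to_pair_sequence(seq):
    """Recode a DNA string as a list of unordered adjacent-base pairs.

    Assumed rule: each overlapping pair (x, y) is written with its two
    bases in ALPHABET order, so 'ga' and 'ag' both become 'ag'.
    """
    seq = seq.lower()
    out = []
    for x, y in zip(seq, seq[1:]):
        first, second = sorted((x, y), key=INDEX.__getitem__)
        out.append(first + second)
    return out
```

Under this assumed rule, a sequence of length n becomes a sequence of n - 1 letters drawn from `PAIRS`.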

7.
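This paper proposes a method for comparing DNA sequences. For vectors composed of several numerical features that are extracted mathematically and carry DNA sequence information, a new clustering algorithm is designed that classifies a set of such vectors effectively and thereby compares the DNA sequences. To avoid the shortcomings of traditional algorithms and raise the probability of reaching the optimal classification, a genetic algorithm is introduced, yielding a genetic-algorithm-based clustering algorithm. A classification experiment on the complete mitochondrial genome sequences of eight placental mammals verifies the effectiveness of the method.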

8.
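A new 2-D graphical representation of DNA sequences is proposed and proved to be non-degenerate; based on this representation, 12 normalized ALE indices of a DNA sequence are then given. On this basis, combined with dinucleotide counts and the LZ complexity of the symbol sequence, a DNA sequence is converted into a 29-dimensional numerical vector. Phylogenetic analyses of the β-globin genes of 23 species and the mitochondrial NADH dehydrogenase sequences of 18 species demonstrate the effectiveness of the proposed method.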
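Two of the ingredients named in this abstract, dinucleotide counts and LZ complexity, can be sketched in a few lines. This is an illustration under assumptions: overlapping dinucleotides are counted, LZ complexity is taken to be the Lempel-Ziv (1976) phrase count, and the function names are hypothetical. The 12 ALE indices depend on the paper's graphical representation, which the abstract does not define, so they are omitted; the 16 counts plus one complexity value would presumably account for the remaining 17 of the 29 dimensions.

```python
from itertools import product

# All 16 dinucleotides in a fixed (assumed) order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides; returns 16 integers."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-ACGT symbols
            counts[pair] += 1
    return [counts[d] for d in DINUCLEOTIDES]


def lz_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing of s.

    Each phrase is the shortest substring starting at position i that
    does not occur in the text preceding the phrase's last character.
    """
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases
```

A 17-component partial feature vector is then `dinucleotide_counts(seq) + [lz_complexity(seq)]`.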

9.
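Based on the assumption that the content of the four bases A, G, C, and T in a DNA sequence reflects some structural features of the sequence, the relative frequencies of the four bases are taken as vector components, so that a DNA sequence is abstracted as a vector in R^4. The centers and radii of the class A and class B sequence sets are then defined in terms of a Euclidean-like distance, which turns the problem into examining the position of an arbitrary vector relative to these balls and yields a geometric classification method.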
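The geometric idea in this abstract can be sketched as follows. The abstract does not give the exact definitions, so this sketch assumes the center of a class is the mean frequency vector, the radius is the largest distance from the center to a training vector, and the distance is plain Euclidean distance. All function names are hypothetical.

```python
import math


def base_frequencies(seq):
    """Map a DNA sequence to its (A, G, C, T) relative-frequency vector in R^4."""
    seq = seq.upper()
    n = len(seq)
    return tuple(seq.count(b) / n for b in "AGCT")


def center_and_radius(seqs):
    """Assumed definitions: center = mean vector, radius = max distance to it."""
    vectors = [base_frequencies(s) for s in seqs]
    center = tuple(sum(col) / len(vectors) for col in zip(*vectors))
    radius = max(math.dist(v, center) for v in vectors)
    return center, radius


def classify(seq, class_a, class_b):
    """Label seq by which class ball contains its frequency vector."""
    v = base_frequencies(seq)
    (ca, ra), (cb, rb) = center_and_radius(class_a), center_and_radius(class_b)
    in_a = math.dist(v, ca) <= ra
    in_b = math.dist(v, cb) <= rb
    if in_a and not in_b:
        return "A"
    if in_b and not in_a:
        return "B"
    return "undetermined"  # inside both balls or outside both
```

Vectors that fall in the overlap of the two balls, or outside both, are left undetermined here; how the original method resolves such cases is not stated in the abstract.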

10.
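Exploiting the fact that, in hidden Markov model training, DNA sequences of different structure have different ranges of L values, the traditional multi-class voting model is improved and a fast training algorithm that outperforms the traditional algorithm is proposed; it needs to train the parameters of only one class of hidden Markov model. Applied to recognizing DNA intron and exon sequences, it achieves an average recognition rate of 90.8%. Compared with support vector machines, the hidden Markov model has advantages in multi-class problems: it requires less computation time and attains a higher recognition rate.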

11.
DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment method that uses quality information and a fuzzy inference method, developed from the characteristics of DNA fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the global sequence alignment algorithm proposed by Needleman and Wunsch, which is established by using quality information of each DNA fragment. However, errors may arise when calculating DNA sequence alignment scores if the quality of DNA fragment tips is low, because only overall DNA sequence quality information is used. In our proposed method, an exact DNA sequence alignment can be achieved in spite of low-quality DNA fragment tips by improving conventional algorithms that use quality information. The mapping score parameters used to calculate DNA sequence alignment scores are dynamically adjusted by the fuzzy logic system according to the lengths of DNA fragments and the frequencies of low-quality DNA bases in the fragments. Experiments on real genome data from the National Center for Biotechnology Information show that the proposed method is more efficient than conventional algorithms.
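For orientation, the Needleman-Wunsch global alignment score that this abstract builds on can be computed with a short dynamic program. The sketch below is the textbook algorithm with fixed scores; the default values for `match`, `mismatch`, and `gap` are illustrative assumptions, and neither the quality-weighted scoring nor the fuzzy adjustment of parameters described in the abstract is implemented.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Textbook Needleman-Wunsch with a linear gap penalty, keeping only
    two rows of the dynamic-programming table at a time.
    """
    # Row 0: aligning the empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

In a quality-aware variant, the fixed `match`, `mismatch`, and `gap` values would be replaced by scores that depend on per-base quality, which is the part the proposed method adjusts dynamically.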

12.
DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment method that uses quality information and a fuzzy inference method, developed from the characteristics of DNA fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the global sequence alignment algorithm proposed by Needleman and Wunsch, which is established by using quality information of each DNA fragment. However, errors may arise when calculating DNA sequence alignment scores if the quality of DNA fragment tips is low, because only overall DNA sequence quality information is used. In our proposed method, an exact DNA sequence alignment can be achieved in spite of low-quality DNA fragment tips by improving conventional algorithms that use quality information. The mapping score parameters used to calculate DNA sequence alignment scores are dynamically adjusted by the fuzzy logic system according to the lengths of DNA fragments and the frequencies of low-quality DNA bases in the fragments. Experiments on real genome data from the National Center for Biotechnology Information show that the proposed method is more efficient than conventional algorithms.

13.
The design of DNA sequences plays an important role in improving the reliability of DNA computation. Proper constraint terms that DNA sequences should satisfy are selected, and evaluation formulas for each DNA individual corresponding to the selected constraint terms are then proposed. A heuristic improved genetic algorithm (GA)/simulated annealing (SA) algorithm is presented to solve the multi-objective optimization problem, and a DNA sequence design system is developed. Furthermore, an example is given to show the efficiency of the method.

14.
The design of DNA sequences plays an important role in improving the reliability of DNA computation. Proper constraint terms that DNA sequences should satisfy are selected, and evaluation formulas for each DNA individual corresponding to the selected constraint terms are then proposed. A heuristic improved genetic algorithm (GA)/simulated annealing (SA) algorithm is presented to solve the multi-objective optimization problem, and a DNA sequence design system is developed. Furthermore, an example is given to show the efficiency of the method.

15.
DNA sequence design, which has been proved to be an NP-hard (non-deterministic polynomial-time hard) problem, has a crucial role in successful DNA computation. In this paper, a membrane evolutionary algorithm is proposed for the DNA sequence design problem. The results of computer experiments are reported, in which the new algorithm is validated and outperforms certain known evolutionary algorithms for the DNA sequence design problem.

16.
Fluorescence detection in automated DNA sequence analysis   Total citations: 51, self-citations: 0, citations by others: 51
We have developed a method for the partial automation of DNA sequence analysis. Fluorescence detection of the DNA fragments is accomplished by means of a fluorophore covalently attached to the oligonucleotide primer used in enzymatic DNA sequence analysis. A different coloured fluorophore is used for each of the reactions specific for the bases A, C, G and T. The reaction mixtures are combined and co-electrophoresed down a single polyacrylamide gel tube; the separated fluorescent bands of DNA are detected near the bottom of the tube, and the sequence information is acquired directly by computer.

17.
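The DNA sequence map produced by the Human Genome Project is a great achievement of life science and genetic engineering, but there is still a long way to go in decoding the biological information hidden in the genome, because the structure of DNA sequences is hard to analyze and recognize. The author proposes a feature extraction method for DNA sequence structure analysis. The method uses co-occurrence probabilities of code sequences in DNA to extract high-dimensional features, and then uses a correlation method and/or a Bayesian classifier to classify structural patterns. Results of several simulation experiments indicate that the method is suitable for the structural analysis of DNA sequences.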

18.
Importance of DNA stiffness in protein-DNA binding specificity   Total citations: 1, self-citations: 0, citations by others: 1
M E Hogan  R H Austin 《Nature》1987,329(6136):263-266
From the first high-resolution structure of a repressor bound specifically to its DNA recognition sequence it has been shown that the phage 434 repressor protein binds as a dimer to the helix. Tight, local interactions are made at the ends of the binding site, causing the central four base pairs (bp) to become bent and overtwisted. The centre of the operator is not in contact with protein but repressor binding affinity can be reduced at least 50-fold in response to a sequence change there. This observation might be explained should the structure of the intervening DNA segment vary with its sequence, or if DNA at the centre of the operator resists the torsional and bending deformation necessary for complex formation in a sequence-dependent fashion. We have considered the second hypothesis by demonstrating that DNA stiffness is sequence dependent. A method is formulated for calculating the stiffness of any particular DNA sequence, and we show that this predicted relationship between sequence and stiffness can explain the repressor binding data in a quantitative manner. We propose that the elastic properties of DNA may be of general importance to an understanding of protein-DNA binding specificity.

19.
Tomato yellow leaf curl viruses belong to the Begomoviruses of the geminiviruses. In this work, we first found and demonstrated that small circular DNA molecules were derived from Chinese tomato yellow leaf curl viruses (TYLCV-CHI). These small circular DNA molecules are about 1.3 kb, half the full length of TYLCV-CHI DNA A. Sequence determination and analysis showed that there was an insertion of unknown-origin sequence in the middle of the small molecules. These unknown-origin sequences were homologous to neither DNA A nor DNA B, and were formed by recombination of virus DNA and plant DNA. Although the various defective molecules contained different unknown-origin insertions, all of them contained the intergenic region and part of the AC1 (Rep) gene, but they did not contain a full ORF.

20.
In prokaryotes, the degree of supercoiling of DNA can profoundly influence the use of specific promoters. In eukaryotes, a variety of indirect observations suggest that DNA topology has a similar importance in proper gene expression. Much attention has therefore been focused on the cellular proteins that control DNA supercoiling, among which are the enzymes topoisomerase I and II. A hexadecameric sequence functions as a strong attraction site for topoisomerase I. Here we report that the interaction of topoisomerase I with this sequence motif is highly specific, because a single base-pair substitution prevents strand cleavage and thereby catalytic activity at the sequence. Thus, supercoiled DNA containing the recognition sequence is relaxed preferentially by topoisomerase I compared to a control, but no difference in the relaxation rate is observed for supercoiled DNA carrying the mutated sequence. The preference for the recognition sequence seems to be an intrinsic property of all eukaryotic type I topoisomerases, suggesting that the interaction might be important in a fundamental biological process.
