Similar literature
1.
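Sequence alignment is a very important operation in bioinformatics. It can be used to predict the function, structure, and evolutionary history of biological sequences. This paper first introduces the basic algorithms for pairwise sequence alignment, then analyzes and compares four commonly used models and three classes of algorithms for multiple sequence alignment, as well as parallel alignment algorithms, and finally lists some open research problems.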

2.
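Introns in UTR regions play an important role in regulating gene expression: they interact with the corresponding mRNA through sequence matching and thereby regulate expression. The Smith-Waterman algorithm was used for local alignment to obtain the optimal matched segments between introns in UTR regions and the corresponding mRNA sequences. The sequence features of the optimal matched segments on the mRNA sequences of six species and the distribution of matching frequencies were analyzed, along with the generality of this interaction distribution. The results show that the pairing rate and mean length distributions of the optimal matched segments are consistent with the binding characteristics of siRNA and miRNA; that introns in UTR regions interact strongly with the UTR sequences of the corresponding mRNA; and that low-GC segments tend to interact with the 3′UTR, whereas high-GC segments tend to bind to the 5′UTR. The conclusions indicate that the sequence features of the optimal matched segments conform to the general rules of RNA-RNA interaction, and that intron sequences are likely a class of functional segments that regulate gene expression. Introns in UTR regions and mRNA sequences have co-evolved and carry out their functions through mutual interaction.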
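As an illustration of the local alignment step this abstract relies on, here is a minimal Smith-Waterman sketch. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the function name are assumptions chosen for the example; the abstract does not state the parameters the authors used, and real RNA-RNA matching would also need to handle complementarity.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Linear gap penalty; the scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("AAAGGGTTT", "CCGGGCC"))
```

The returned segment pair corresponds to what the abstract calls an optimal matched segment; its length and the fraction of matched positions could then be tabulated per sequence.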

3.
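A mail filtering algorithm based on biological sequence pattern extraction   Total citations: 3, self-citations: 0, citations by others: 3
To address spam filtering, and taking into account the characteristics of Chinese spam and the efficiency requirements of a filtering system, the principle of the bioinformatics pattern extraction algorithm TEIRESIAS was applied to design BioMatrix, a spam filtering algorithm based on biological sequence pattern extraction, and a Chinese and English mail filtering system based on it was implemented. The filtering system obtains its spam training set from a quantity-control filter and classifies mail by extracting characteristic patterns from that set. It identifies about 94.2% of spam with a false-filtering rate of about 0.04%. Experimental comparison with a Bayes filtering algorithm indicates that applying biological sequence pattern extraction to mail filtering has good research and practical value.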

4.
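Multiple sequence alignment of proteins is an important bioinformatics tool with major applications in evolutionary analysis and protein structure prediction. Various alignment algorithms have been very successful in this field, but each has inherent shortcomings. A permutation distance method is proposed to compare and evaluate several currently popular protein multiple sequence alignment algorithms. Because the permutation distance method considers only the relative order of the evolutionary distances between proteins and ignores small differences among those distances, the resulting evaluation is more robust. In addition, measuring permutation distance by the longest common subsequence reflects the differences between permutations fairly accurately. Based on this method, the performance of the Dialign, Tcoffee, ClustalW, and Muscle multiple sequence alignment algorithms was evaluated.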
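A minimal sketch of a longest-common-subsequence measure between two permutations follows. The abstract does not give the exact formula, so the definition used here (distance = n minus the LCS length) and the names `lcs_length` and `permutation_distance` are assumptions for illustration only.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y (rolling-row DP)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, 1):
            if xi == yj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def permutation_distance(p, q):
    """Assumed definition: number of items minus the LCS length of the two orders."""
    if sorted(p) != sorted(q):
        raise ValueError("p and q must be permutations of the same items")
    return len(p) - lcs_length(p, q)


if __name__ == "__main__":
    # Two rankings of the same four proteins by evolutionary distance.
    print(permutation_distance(["P1", "P2", "P3", "P4"], ["P2", "P1", "P3", "P4"]))
```

Under this definition, identical rankings are at distance 0 and a fully reversed ranking of n items is at distance n - 1, so only the relative order matters, not the size of the underlying distances.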

5.
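Bioinformatics is at the core of biotechnology, and sequence comparison is its most basic and important operation. Sequence comparison can reveal functional, structural, and evolutionary information in biological sequences, and its basic operation is alignment. This paper describes the commonly used pairwise sequence alignment algorithms, explains them in detail with examples, and finally points out current problems with sequence alignment algorithms.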
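One widely used pairwise algorithm is Needleman-Wunsch global alignment. The sketch below computes only the optimal global score with a linear gap penalty; the scoring values and the function name are assumptions for the example, and the abstract does not say which algorithms or parameters the paper covers.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # Row 0: aligning an empty prefix of a against b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # Column 0: a[:i] against an empty prefix of b.
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

Keeping only two rows reduces memory from O(mn) to O(n); recovering the alignment itself would require the full matrix or a divide-and-conquer scheme.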

6.
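Biological sequences have characteristics of their own compared with traditional sequences. Different sequential pattern mining algorithms show different behavior and efficiency when applied to biological sequences. This paper analyzes the execution of five currently popular pattern mining algorithms and their performance when applied to biological sequences, in order to determine which algorithm is better suited to frequent pattern mining for each type of biological sequence.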

7.
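The BLAST sequence alignment algorithm is one of the many important functions integrated into the NCBI comprehensive bioinformatics platform. Porting BLAST to an offline environment allows bioinformatics researchers to build their own private sequence databases and align DNA sequences offline, giving better sequence data security and avoiding serious problems such as data loss and leakage that can occur when connected to a network. By studying the standard data formats and invocation details involved in BLAST sequence alignment, BLAST alignment in an offline environment was implemented, providing a feasible reference for secure sequence alignment.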

8.
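Studying how mature mRNA sequences interact with their corresponding intron sequences is important for understanding the regulation of gene expression. Taking the protein-coding gene sequences on chromosome 1 of Drosophila melanogaster as the study object, this paper uses Smith-Waterman local alignment to analyze matching between mRNA sequences and intron sequences. The spliced intron sequences were found to match the 5′UTR and 3′UTR sequences of the genes with higher relative frequency than the coding (CDS) sequences. The G+C content of the set of optimal matched segments is distributed over a wide range; the center of the distribution is closest to that of the 3′UTR sequences, followed by the 5′UTR sequences, and farthest from the CDS sequences, which explains why the UTR sequences at both ends match intron sequences more strongly. The pairing rate of the optimal matched segments lies mainly between 68% and 75%, the most probable length is about 20 bp, and the sequence features of the optimal matched segments resemble those of miRNA. The results suggest that this matching pattern between introns and mRNA is one possible mechanism of gene regulation.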

9.
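Biological sequences have characteristics of their own compared with traditional sequences. Different sequential pattern mining algorithms show different behavior and efficiency when applied to biological sequences. This paper analyzes the execution of five currently popular pattern mining algorithms and their performance when applied to biological sequences, in order to determine which algorithm is better suited to frequent pattern mining for each type of biological sequence.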

10.
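To solve the optimal synthesis problem for distillation separation sequences effectively, studying the neighborhood (super)structure is a prerequisite for a successful optimization algorithm. Because distillation separation sequences are isomorphic to binary trees, a separation sequence can be abstracted as a binary tree data structure and then studied with graph-theoretic methods. This paper applies combinatorics to study the distillation separation sequence synthesis problem in depth. It briefly analyzes the computational complexity of the ordered partition problem; implements random search over distillation separation sequences through a mechanism that transforms adjacent split points of the binary tree; and derives equivalence rules for adjacent transformations in post-order traversal, from which an efficient evolutionary neighborhood structure is constructed.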
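The binary-tree view can be made concrete with a short sketch. It assumes sharp splits of a mixture whose components are listed in a fixed order (for example by volatility), so that each separation sequence corresponds to one binary tree over that ordered list; under this assumption the number of sequences for n components is the Catalan number C(n-1). The function names are hypothetical and this is not the paper's search procedure.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_sequences(n):
    """Number of binary split trees over n ordered components (Catalan(n - 1))."""
    if n <= 1:
        return 1
    # Choose the first split point k, then count the two sub-problems independently.
    return sum(count_sequences(k) * count_sequences(n - k) for k in range(1, n))


def enumerate_sequences(components):
    """List every split tree as nested (left, right) tuples; leaves are components."""
    if len(components) == 1:
        return [components[0]]
    trees = []
    for cut in range(1, len(components)):
        for left in enumerate_sequences(components[:cut]):
            for right in enumerate_sequences(components[cut:]):
                trees.append((left, right))
    return trees


if __name__ == "__main__":
    print(count_sequences(4))
    for tree in enumerate_sequences("ABC"):
        print(tree)
```

The count grows quickly with n (1, 1, 2, 5, 14, 42, ...), which is why exhaustive enumeration gives way to neighborhood-based search for larger mixtures.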

11.
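By analyzing the characteristics of dynamic programming and the A* algorithm, a heuristic algorithm based on A* is proposed for the multiple sequence alignment problem. The algorithm uses several optimized search mechanisms. Theoretical analysis shows that it effectively reduces the search space and saves search time while still producing good alignments. The algorithm can be applied not only to multiple sequence alignment but also to other shortest-path problems on directed acyclic graphs.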

12.
The multiple sequence alignment problem (MSAP) is one of the most difficult problems in computational molecular biology. In this paper, we describe the optimization model and the neighborhood structure for the MSAP, then propose a scheme to solve the MSAP using a simulated annealing algorithm. Experiments show that the scheme is efficient.

13.
A genetic algorithm on multiple sequences alignment problems in biology   Total citations: 2, self-citations: 0, citations by others: 2
The study and comparison of sequences of characters from a finite alphabet is relevant to various areas of science, notably molecular biology. Measuring sequence similarity involves considering the possible sequence alignments in order to find an optimal one, for which the "distance" between the sequences is minimal. In bioinformatics the problem is both more important and more difficult because the sequences are long (at least 100 characters), which leads to high computational complexity and large memory requirements. By associating each alignment with a path in a lattice, geometric insight can be brought to the problem of finding an optimal alignment, and this gives a natural encoding of each path. The problem can be solved by applying a genetic algorithm, which is more efficient than the dynamic programming and hidden Markov model approaches commonly used now. Foundation item: Supported by the Zi-qiang Foundation of Wuhan University and the Open Foundation of the State Key Laboratory of Software Engineering, Wuhan University. Biography: Shi Feng (1966-), male, associate professor; research direction: bioinformatics.

14.
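A brief review of biological sequence alignment algorithms   Total citations: 4, self-citations: 0, citations by others: 4
Genomics and proteomics research depends heavily on database searching, and the search for faster and more sensitive algorithms for biological sequence similarity comparison has long been a focus of bioinformatics research. This article introduces scoring algorithms for similarity comparison and various database search tools, and discusses and compares the advantages and disadvantages of the algorithms.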

15.
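The hidden Markov model is an important statistical model for sequence analysis that has been applied successfully in many areas of machine learning in recent years, especially in the identification of protein families. This is mainly a result of the rapid growth of biological data, which has brought two fields, computer science and biology, together. Multiple sequence alignment and profile hidden Markov models are examined, and the basic algorithms of hidden Markov models and how to build HMMs are discussed. Protein families are identified and classified according to E-values and training scores.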
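One of the basic HMM algorithms is the forward algorithm, which computes the probability of an observed sequence under a model. The sketch below is for a plain discrete HMM with dictionary-based parameters; it is a simplification, not a profile HMM with match, insert, and delete states, and the function and parameter names are assumptions for illustration.

```python
def forward_probability(obs, states, start_p, trans_p, emit_p):
    """Probability of the observation sequence under a discrete HMM.

    start_p[s], trans_p[r][s], and emit_p[s][symbol] are plain probabilities.
    No scaling or log-space arithmetic, so this suits short sequences only.
    """
    if not obs:
        return 1.0
    # alpha[s] = P(observations so far, current state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    states = ("H", "L")
    start_p = {"H": 0.6, "L": 0.4}
    trans_p = {"H": {"H": 0.7, "L": 0.3}, "L": {"H": 0.4, "L": 0.6}}
    emit_p = {"H": {"A": 0.2, "C": 0.8}, "L": {"A": 0.9, "C": 0.1}}
    print(forward_probability("ACCA", states, start_p, trans_p, emit_p))
```

For long protein sequences the probabilities underflow, so practical implementations work with log probabilities or rescale `alpha` at each step.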

16.
DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment method that uses quality information and a fuzzy inference method, developed from the characteristics of DNA fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the global sequence alignment algorithm proposed by Needleman and Wunsch, using the quality information of each DNA fragment. However, errors may arise in calculating the alignment scores when the quality of the DNA fragment tips is low, because only the overall DNA sequence quality information is used. In our proposed method, an exact DNA sequence alignment can be achieved in spite of the low quality of DNA fragment tips by improving the conventional algorithms that use quality information. The mapping score parameters used to calculate DNA sequence alignment scores are dynamically adjusted by the fuzzy logic system, using the lengths of the DNA fragments and the frequencies of low-quality DNA bases in the fragments. Experiments on real genome data from the National Center for Biotechnology Information show that the proposed method is more efficient than conventional algorithms.

17.
DNA sequence alignment algorithms in computational molecular biology have been improved by diverse methods. In this paper, we propose a DNA sequence alignment method that uses quality information and a fuzzy inference method, developed from the characteristics of DNA fragments and a fuzzy logic system, in order to improve conventional DNA sequence alignment methods that use DNA sequence quality information. In conventional algorithms, DNA sequence alignment scores are calculated by the global sequence alignment algorithm proposed by Needleman and Wunsch, using the quality information of each DNA fragment. However, errors may arise in calculating the alignment scores when the quality of the DNA fragment tips is low, because only the overall DNA sequence quality information is used. In our proposed method, an exact DNA sequence alignment can be achieved in spite of the low quality of DNA fragment tips by improving the conventional algorithms that use quality information. The mapping score parameters used to calculate DNA sequence alignment scores are dynamically adjusted by the fuzzy logic system, using the lengths of the DNA fragments and the frequencies of low-quality DNA bases in the fragments. Experiments on real genome data from the National Center for Biotechnology Information show that the proposed method is more efficient than conventional algorithms.

18.
This research analyzed the amino acid sequence similarity between non-self T cell epitopes recognized by mouse antibodies and mouse proteins. Using sequence alignment, we found that only 8 of 1,108 epitopes are highly similar to mouse protein sequences. The result shows that non-self T cell epitopes have little or no similarity to mouse protein sequences. Furthermore, reviewing the related literature, we found that these eight epitopes, which are ignored by T cells under normal conditions, would trigger immune responses in certain particular environments. The result suggests that peptide vaccines with no or low similarity can reduce the chance of collateral cross-reactions and enhance the antigen-specific immune response to the vaccine.

19.
The pituitary hormones corticotropin (ACTH) and beta-lipotropin (beta-LPH) are formed from a large common precursor. Recently, we have elucidated the whole primary structure of the bovine ACTH-beta-LPH precursor (designated alternatively as preproopiocortin) by determining the nucleotide sequence of cloned DNA complementary to the mRNA coding for the precursor protein. The amino acid sequence assigned has disclosed a characteristic repetitive structure of the ACTH-beta-LPH precursor. The repetitive units of the precursor protein each contain a melanotropin (MSH) sequence (alpha-, beta- or gamma-MSH) as well as other peptide components such as beta-endorphin and corticotropin-like intermediate lobe peptide (CLIP). The repetitive units as well as their peptide components are each bounded by paired basic amino acid residues, which apparently represent the sites of proteolytic processing. Several studies have confirmed the translational initiation site and protein structure assigned (see also ref. 11 and refs therein). In view of the recent knowledge about the organization of eukaryotic genes (see refs 12, 13 for reviews), it would be of particular interest to investigate the relationship between the repetitive structure of the ACTH-beta-LPH precursor containing different functional components and the arrangement of the protein-coding sequence in its gene. We have now isolated and characterized bovine genomic DNA fragments encoding this precursor protein and have demonstrated that the protein sequence is encoded by two non-consecutive DNA segments. An intron (intervening sequence) of approximately 2.2 kilobase pairs separates the smaller exon (mRNA-coding sequence), which contains the gene sequence encoding the signal peptide, from the larger exon, which contains the gene sequence for most of the protein structure, including the known biologically active component peptides.

20.
Nucleotide sequence of cloned cDNA of human c-myc oncogene   Total citations: 4, self-citations: 0, citations by others: 4
R Watt  L W Stanton  K B Marcu  R C Gallo  C M Croce  G Rovera 《Nature》1983,303(5919):725-728
Like other transforming genes of retroviruses, the v-myc gene of the avian virus, MC29, has a homologue in the genome of normal eukaryotic cells. The human cellular homologue, c-myc, located on human chromosome 8, region q24→qter (refs 1, 2), is translocated into the immunoglobulin heavy-chain locus on human chromosome 14 (ref. 3) in Burkitt's lymphoma, suggesting that c-myc has a primary role in transformation of some human haematopoietic cells. In addition, c-myc is amplified in the human promyelocytic leukaemia cell line, HL60 (refs 6, 7), which also contains high levels of c-myc mRNA. Recently, Colby et al. reported the nucleotide sequence of the human c-myc DNA isolated from a genomic recombinant DNA library derived from human fetal liver. This 4,053-base pair (bp) sequence includes two exons and one intron of the myc gene, and the authors have suggested the existence of a human c-myc mRNA of 2,291 nucleotides that has a coding capacity for a protein of molecular weight (Mr) 48,812. We have approached the problem of accurately defining the characteristics of the human c-myc mRNA and c-myc protein by determining the sequence of the c-myc cDNA isolated from a cDNA library prepared from mRNA of a clone of the K562 human leukaemic cell line. K562 cells are known to contain c-myc mRNA which is similar in size to the c-myc mRNA of other human cell types. We report here the sequence of 2,121 nucleotides of a human c-myc mRNA and demonstrate that its 5' noncoding sequence does not correspond to the sequence of the reported genomic human sequence. However, our data confirm that the intact human c-myc mRNA can encode a 48,812-Mr protein with a sequence identical to that reported by Colby et al.
