Similar articles
Found 20 similar articles; search time: 31 ms
1.
Opinions on the hypothesis that ancient genome duplications contributed to the vertebrate genome range from strong skepticism to strong credence. Previous studies concentrated on small numbers of gene families or chromosomal regions that might not have been representative of the whole genome, or used subjective methods to identify paralogous genes and regions. Here we report a systematic and objective analysis of the draft human genome sequence to identify paralogous chromosomal regions (paralogons) formed during chordate evolution and to estimate the ages of duplicate genes. We found that the human genome contains many more paralogons than would be expected by chance. Molecular clock analysis of all protein families in humans that have orthologs in the fly and nematode indicated that a burst of gene duplication activity took place in the period 350–650 Myr ago and that many of the duplicate genes formed at this time are located within paralogons. Our results support the contention that many of the gene families in vertebrates were formed or expanded by large-scale DNA duplications in an early chordate. Considering the incompleteness of the sequence data and the antiquity of the event, the results are compatible with at least one round of polyploidy.

2.
Here we present a draft genome sequence of the nematode Pristionchus pacificus, a species that is associated with beetles and is used as a model system in evolutionary biology. With 169 Mb and 23,500 predicted protein-coding genes, the P. pacificus genome is larger than those of Caenorhabditis elegans and the human parasite Brugia malayi. Compared to C. elegans, the P. pacificus genome has more genes encoding cytochrome P450 enzymes, glucosyltransferases, sulfotransferases and ABC transporters, many of which were experimentally validated. The P. pacificus genome contains genes encoding cellulase and diapausin, and cellulase activity is found in P. pacificus secretions, indicating that cellulases can be found in nematodes beyond plant parasites. The relatively higher number of detoxification and degradation enzymes in P. pacificus is consistent with its necromenic lifestyle and might represent a preadaptation for parasitism. Thus, comparative genomics analysis of three ecologically distinct nematodes offers a unique opportunity to investigate the association between genome structure and lifestyle.

3.
A radiation hybrid map of mouse genes   Total citations: 13, self-citations: 0, citations by others: 13
A comprehensive gene-based map of a genome is a powerful tool for genetic studies and is especially useful for the positional cloning and positional candidate approaches. The availability of gene maps for multiple organisms provides the foundation for detailed conserved-orthology maps showing the correspondence between conserved genomic segments. These maps make it possible to use cross-species information in gene hunts and shed light on the evolutionary forces that shape the genome. Here we report a radiation hybrid map of mouse genes, a combined project of the Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, the Medical Research Council UK Mouse Genome Centre, and the National Center for Biotechnology Information. The map contains 11,109 genes, screened against the T31 RH panel and positioned relative to a reference map containing 2,280 mouse genetic markers. It includes 3,658 genes homologous to the human genome sequence and provides a framework for overlaying the human genome sequence to the mouse and for sequencing the mouse genome.

4.
Radiation hybrid map of the mouse genome.   Total citations: 13, self-citations: 0, citations by others: 13
Radiation hybrid (RH) maps are a useful tool for genome analysis, providing a direct method for localizing genes and anchoring physical maps and genomic sequence along chromosomes. The construction of a comprehensive RH map for the human genome has resulted in gene maps reflecting the location of more than 30,000 human genes. Here we report the first comprehensive RH map of the mouse genome. The map contains 2,486 loci screened against an RH panel of 93 cell lines. Most loci (93%) are simple sequence length polymorphisms (SSLPs) taken from the mouse genetic map, thereby providing direct integration between these two key maps. We performed RH mapping by a new and efficient approach in which we replaced traditional gel- or hybridization-based assays with a homogeneous 5'-nuclease assay involving a single common probe for all genetic markers. The map provides essentially complete connectivity and coverage across the genome, and good resolution for ordering loci, with 1 centiRay (cR) corresponding to an average of approximately 100 kb. The RH map, together with an accompanying World-Wide Web server, makes it possible for any investigator to rapidly localize sequences in the mouse genome. Together with the previously constructed genetic map and a YAC-based physical map reported in a companion paper, the fundamental maps required for mouse genomics are now available.

5.
Large-scale sequencing of cDNAs provides a complementary approach to structural analysis of the human genome by generating expressed sequence tags (ESTs). We have initiated the large-scale sequencing of a 3'-directed cDNA library from the human liver cell line HepG2, which is a non-biased representation of the mRNA population. We sequenced 982 random cDNA clones, yielding more than 270 kilobases. A significant portion of the identified genes encoded secretable proteins and components of protein synthesis. The abundance of cDNA species varied from 2.2% to less than 0.004%. Fifty-two percent of the mRNA were abundant species consisting of 173 genes, and the rest were non-abundant species consisting of about 6,600 genes.

6.
Analysis of the coding genome of diffuse large B-cell lymphoma   Total citations: 1, self-citations: 0, citations by others: 1
Diffuse large B-cell lymphoma (DLBCL) is the most common form of human lymphoma. Although a number of structural alterations have been associated with the pathogenesis of this malignancy, the full spectrum of genetic lesions that are present in the DLBCL genome, and therefore the identity of dysregulated cellular pathways, remains unknown. By combining next-generation sequencing and copy number analysis, we show that the DLBCL coding genome contains, on average, more than 30 clonally represented gene alterations per case. This analysis also revealed mutations in genes not previously implicated in DLBCL pathogenesis, including those regulating chromatin methylation (MLL2; 24% of samples) and immune recognition by T cells. These results provide initial data on the complexity of the DLBCL coding genome and identify novel dysregulated pathways underlying its pathogenesis.

7.
Legionella pneumophila, the causative agent of Legionnaires' disease, replicates as an intracellular parasite of amoebae and persists in the environment as a free-living microbe. Here we have analyzed the complete genome sequences of L. pneumophila Paris (3,503,610 bp, 3,077 genes), an endemic strain that is predominant in France, and Lens (3,345,687 bp, 2,932 genes), an epidemic strain responsible for a major outbreak of disease in France. The L. pneumophila genomes show marked plasticity, with three different plasmids and with about 13% of the sequence differing between the two strains. Only strain Paris contains a type V secretion system, and its Lvh type IV secretion system is encoded by a 36-kb region that is either carried on a multicopy plasmid or integrated into the chromosome. Genetic mobility may enhance the versatility of L. pneumophila. Numerous genes encode eukaryotic-like proteins or motifs that are predicted to modulate host cell functions to the pathogen's advantage. The genome thus reflects the history and lifestyle of L. pneumophila, a human pathogen of macrophages that coevolved with fresh-water amoebae.

8.
The human genome sequence has been finished to very high standards; however, more than 340 gaps remained when the finished genome was published by the International Human Genome Sequencing Consortium in 2004. Using fosmid resources generated from multiple individuals, we targeted gaps in the euchromatic part of the human genome. Here we report 2,488,842 bp of previously unknown euchromatic sequence, 363,114 bp of which close 26 of 250 euchromatic gaps, or 10%, including two remaining euchromatic gaps on chromosome 19. Eight (30.7%) of the closed gaps were found to be polymorphic. These sequences allow complete annotation of several human genes as well as the assignment of mRNAs. The gap sequences are 2.3-fold enriched in segmentally duplicated sequences compared to the whole genome. Our analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.

9.
To test the hypothesis that the human genome project will uncover many genes not previously discovered by sequencing of expressed sequence tags (ESTs), we designed and produced a set of microarrays using probes based on open reading frames (ORFs) in 350 Mb of finished and draft human sequence. Our approach aims to identify all genes directly from genomic sequence by querying gene expression. We analysed genomic sequence with a suite of ORF prediction programs, selected approximately one ORF per gene, amplified the ORFs from genomic DNA and arrayed the amplicons onto treated glass slides. Of the first 10,000 arrayed ORFs, 31% are completely novel and 29% are similar, but not identical, to sequences in public databases. Approximately one-half of these are expressed in the tissues we queried by microarray. Subsequent verification by other techniques confirmed expression of several of the novel genes. Expressed sequence tags (ESTs) have yielded vast amounts of data, but our results indicate that many genes in the human genome will only be found by genomic sequencing.

10.
The approach to annotating a genome critically affects the number and accuracy of genes identified in the genome sequence. Genome annotation based on stringent gene identification is prone to underestimate the complement of genes encoded in a genome. In contrast, over-prediction of putative genes followed by exhaustive computational sequence, motif and structural homology search will find rarely expressed, possibly unique, new genes at the risk of including non-functional genes. We developed a two-stage approach that combines the merits of stringent genome annotation with the benefits of over-prediction. First we identify plausible genes regardless of matches with EST, cDNA or protein sequences from the organism (stage 1). In the second stage, proteins predicted from the plausible genes are compared at the protein level with EST, cDNA and protein sequences, and protein structures from other organisms (stage 2). Remote but biologically meaningful protein sequence or structure homologies provide supporting evidence for genuine genes. The method, applied to the Drosophila melanogaster genome, validated 1,042 novel candidate genes after filtering 19,410 plausible genes, of which 12,124 matched the original 13,601 annotated genes. This annotation strategy is applicable to genomes of all organisms, including human.

11.
The genome of Theobroma cacao   Total citations: 2, self-citations: 0, citations by others: 2
We sequenced and assembled the draft genome of Theobroma cacao, an economically important tropical-fruit tree crop that is the source of chocolate. This assembly corresponds to 76% of the estimated genome size and contains almost all previously described genes, with 82% of these genes anchored on the 10 T. cacao chromosomes. Analysis of this sequence information highlighted specific expansion of some gene families during evolution, for example, flavonoid-related genes. It also provides a major source of candidate genes for T. cacao improvement. Based on the inferred paleohistory of the T. cacao genome, we propose an evolutionary scenario whereby the ten T. cacao chromosomes were shaped from an ancestor through eleven chromosome fusions.

12.
13.
Variation in the human genome sequence is key to understanding susceptibility to disease in modern populations and the history of ancestral populations. Unlocking this information requires knowledge of the patterns and underlying causes of human sequence diversity. By applying a new population-genetic framework to two genome-wide polymorphism surveys, we find that the human genome contains sizeable regions (stretching over tens of thousands of base pairs) that have intrinsically high and low rates of sequence variation. We show that the primary determinant of these patterns is shared genealogical history. Only a fraction of the variation (at most 25%) is due to the local mutation rate. By measuring the average distance over which genealogical histories are typically preserved, these data provide the first genome-wide estimate of the average extent of correlation among variants (linkage disequilibrium). The results are best explained by extreme variability in the recombination rate at a fine scale, and provide the first empirical evidence that such recombination 'hot spots' are a general feature of the human genome and have a principal role in shaping genetic variation in the human population.

14.
Kun A, Santos M, Szathmáry E. Nature Genetics, 2005, 37(9): 1008-1011
The error threshold for replication, the critical copying fidelity below which the fittest genotype deterministically disappears, limits the length of the genome that can be maintained by selection. Primordial replication must have been error-prone, and so early replicators are thought to have been necessarily short. The error threshold also depends on the fitness landscape. In an RNA world, many neutral and compensatory mutations can raise the threshold, below which the functional phenotype, rather than a particular sequence, is still present. Here we show, on the basis of comparative analysis of two extensively mutagenized ribozymes, that with a copying fidelity of 0.999 per digit per replication the phenotypic error threshold rises well above 7,000 nucleotides, which permits the selective maintenance of a functionally rich riboorganism with a genome of more than 100 different genes, the size of a tRNA. This requires an order of magnitude of improvement in the accuracy of in vitro-generated polymerase ribozymes. Incidentally, this genome size coincides with that estimated for a minimal cell achieved by top-down analysis, omitting the genes dealing with translation.

15.
The completed draft version of the human genome, composed of multiple short contigs encompassing 85% or more of euchromatin, was announced in June of 2000 (ref. 1). The detailed findings of the sequencing consortium were reported several months later. The draft sequence has provided insight into global characteristics, such as the total number of genes and a more accurate definition of gene families. Also of importance are genome positional details such as local genome architecture, regional gene density and the location of transcribed units that are critical for disease gene identification. We carried out a series of mapping and computational experiments using a nonredundant collection of 925 expressed sequence tags (ESTs) and sections of the public draft genome sequence that were available at different timepoints between April 2000 and April 2001. We found discrepancies in both the reported coverage of the human genome and the accuracy of mapping of genomic clones, suggesting some limitations of the draft genome sequence in providing accurate positional information and detailed characterization of chromosomal subregions.

16.
17.
18.
Francisella tularensis is one of the most infectious human pathogens known. In the past, both the former Soviet Union and the US had programs to develop weapons containing the bacterium. We report the complete genome sequence of a highly virulent isolate of F. tularensis (1,892,819 bp). The sequence uncovers previously uncharacterized genes encoding type IV pili, a surface polysaccharide and iron-acquisition systems. Several virulence-associated genes were located in a putative pathogenicity island, which was duplicated in the genome. More than 10% of the putative coding sequences contained insertion-deletion or substitution mutations and seemed to be deteriorating. The genome is rich in IS elements, including IS630 Tc-1 mariner family transposons, which are not expected in a prokaryote. We used a computational method for predicting metabolic pathways and found an unexpectedly high proportion of disrupted pathways, explaining the fastidious nutritional requirements of the bacterium. The loss of biosynthetic pathways indicates that F. tularensis is an obligate host-dependent bacterium in its natural life cycle. Our results have implications for our understanding of how highly virulent human pathogens evolve and will expedite strategies to combat them.

19.
Genome evolution studies for the phylum Nematoda have been limited by focusing on comparisons involving Caenorhabditis elegans. We report a draft genome sequence of Trichinella spiralis, a food-borne zoonotic parasite, which is the most common cause of human trichinellosis. This parasitic nematode is an extant member of a clade that diverged early in the evolution of the phylum, enabling identification of archetypical genes and molecular signatures exclusive to nematodes. We sequenced the 64-Mb nuclear genome, which is estimated to contain 15,808 protein-coding genes, at ~35-fold coverage using whole-genome shotgun and hierarchical map-assisted sequencing. Comparative genome analyses support intrachromosomal rearrangements across the phylum, disproportionate numbers of protein family deaths over births in parasitic compared to a non-parasitic nematode and a preponderance of gene-loss and -gain events in nematodes relative to Drosophila melanogaster. This genome sequence and the identified pan-phylum characteristics will contribute to genome evolution studies of Nematoda as well as strategies to combat global parasites of humans, food animals and crops.

20.
Human endogenous retroviruses (HERVs), which are remnants of past retroviral infections of the germline cells of our ancestors, make up as much as 8% of the human genome and may even outnumber genes. Most HERVs seem to have entered the genome between 10 and 50 million years ago, and they comprise over 200 distinct groups and subgroups. Although repeated sequence elements such as HERVs have the potential to lead to chromosomal rearrangement through homologous recombination between distant loci, evidence for the generality of this process is lacking. To gain insight into the expansion of these elements in the genome during the course of primate evolution, we have identified 23 new members of the HERV-K (HML-2) group, which is thought to contain the most recently active members. Here we show, by phylogenetic and sequence analysis, that at least 16% of these elements have undergone apparent rearrangements that may have resulted in large-scale deletions, duplications and chromosome reshuffling during the evolution of the human genome.
