Similar articles
 Found 20 similar articles (search time: 796 ms)
1.
2.
To verify the genome annotation and to create a resource to functionally characterize the proteome, we attempted to Gateway-clone all predicted protein-encoding open reading frames (ORFs), or the 'ORFeome,' of Caenorhabditis elegans. We successfully cloned approximately 12,000 ORFs (ORFeome 1.1), of which roughly 4,000 correspond to genes that are untouched by any cDNA or expressed-sequence tag (EST). More than 50% of predicted genes needed corrections in their intron-exon structures. Notably, approximately 11,000 C. elegans proteins can now be expressed under many conditions and characterized using various high-throughput strategies, including large-scale interactome mapping. We suggest that similar ORFeome projects will be valuable for other organisms, including humans.

3.
The human genome sequence has been finished to very high standards; however, more than 340 gaps remained when the finished genome was published by the International Human Genome Sequencing Consortium in 2004. Using fosmid resources generated from multiple individuals, we targeted gaps in the euchromatic part of the human genome. Here we report 2,488,842 bp of previously unknown euchromatic sequence, 363,114 bp of which close 26 of 250 euchromatic gaps, or 10%, including two remaining euchromatic gaps on chromosome 19. Eight (30.7%) of the closed gaps were found to be polymorphic. These sequences allow complete annotation of several human genes as well as the assignment of mRNAs. The gap sequences are 2.3-fold enriched in segmentally duplicated sequences compared to the whole genome. Our analysis confirms that not all gaps within 'finished' genomes are recalcitrant to subcloning and suggests that the paired-end-sequenced fosmid libraries could prove to be a rich resource for completion of the human euchromatic genome.

4.
Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes. Here we report 48,196 candidate SNPs detected by statistical analysis of human expressed sequence tags (ESTs), associated primarily with coding regions of genes. We used Bayesian inference to weigh evidence for true polymorphism versus sequencing error, misalignment or ambiguity, misclustering or chimaeric EST sequences, assessing data such as raw chromatogram height, sharpness, overlap and spacing, sequencing error rates, context-sensitivity and cDNA library origin. Three separate validations (comparison with 54 genes screened for SNPs independently, verification of HLA-A polymorphisms, and restriction fragment length polymorphism [RFLP] testing) verified 70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding region polymorphism provides a public resource for mapping of disease genes (available at http://www.bioinformatics.ucla.edu/snp).
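
The Bayesian weighing of "true polymorphism versus sequencing error" described in this abstract can be illustrated with a deliberately simplified sketch. It is not the authors' method: it considers only two hypotheses and uses read counts alone, whereas the abstract also mentions chromatogram quality, misalignment, and library origin. The function name `snp_posterior` and the default values for `prior_snp`, `error_rate`, and `allele_freq` are illustrative assumptions.

```python
from math import comb


def snp_posterior(n_reads, n_alt, prior_snp=0.001, error_rate=0.01, allele_freq=0.5):
    """Posterior probability that a site is a true SNP.

    n_reads     -- number of EST reads covering the site
    n_alt       -- number of those reads showing the alternate base
    prior_snp   -- assumed prior probability that any site is polymorphic
    error_rate  -- assumed per-base sequencing error rate
    allele_freq -- assumed frequency of the alternate allele if the SNP is real
    """
    # Binomial likelihood of the counts if the site is truly polymorphic.
    like_snp = comb(n_reads, n_alt) * allele_freq**n_alt * (1 - allele_freq)**(n_reads - n_alt)
    # Binomial likelihood of the counts if every alternate base is an error.
    like_err = comb(n_reads, n_alt) * error_rate**n_alt * (1 - error_rate)**(n_reads - n_alt)
    # Bayes' rule over the two competing hypotheses.
    evidence = prior_snp * like_snp + (1 - prior_snp) * like_err
    return prior_snp * like_snp / evidence


if __name__ == "__main__":
    for n_alt in (0, 1, 2, 5):
        print(n_alt, snp_posterior(10, n_alt))
```

With these assumed parameters, a single discordant read among ten leaves the posterior low, while several concordant alternate reads push it close to 1, which is the qualitative behaviour a count-based SNP caller relies on.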

5.
The genome sequences of Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana have been predicted to contain 19,000, 13,600 and 25,500 genes, respectively. Before this information can be fully used for evolutionary and functional studies, several issues need to be addressed. First, the gene number estimates obtained in silico and not yet supported by any experimental data need to be verified. For example, it seems biologically paradoxical that C. elegans would have 50% more genes than Drosophila. Second, intron/exon predictions need to be tested experimentally. Third, complete sets of open reading frames (ORFs), or "ORFeomes," need to be cloned into various expression vectors. To address these issues simultaneously, we have designed and applied to C. elegans the following strategy. Predicted ORFs are amplified by PCR from a highly representative cDNA library using ORF-specific primers, cloned by Gateway recombination cloning and then sequenced to generate ORF sequence tags (OSTs) as a way to verify identity and splicing. In a sample (n=1,222) of the nearly 10,000 genes predicted ab initio (that is, for which no expressed sequence tag (EST) is available so far), at least 70% were verified by OSTs. We also observed that 27% of these experimentally confirmed genes have a structure different from that predicted by GeneFinder. We now have experimental evidence that supports the existence of at least 17,300 genes in C. elegans. Hence we suggest that gene counts based primarily on ESTs may underestimate the number of genes in human and in other organisms.

6.
A database containing mapped partial cDNA sequences from Caenorhabditis elegans will provide a ready starting point for identifying nematode homologues of important human genes and determining their functions in C. elegans. A total of 720 expressed sequence tags (ESTs) have been generated from 585 clones randomly selected from a mixed-stage C. elegans cDNA library. Comparison of these ESTs with sequence databases identified 422 new C. elegans genes, of which 317 are not similar to any sequences in the database. Twenty-six new genes have been mapped by YAC clone hybridization. Members of several gene families, including cuticle collagens, GTP-binding proteins, and RNA helicases were discovered. Many of the new genes are similar to known or potential human disease genes, including CFTR and the LDL receptor.

7.
Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, to filter candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.

8.
9.
The genome of the mesopolyploid crop species Brassica rapa   Total citations: 21, self-citations: 0, citations by others: 21
We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution. The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species. The B. rapa genome sequence provides an important resource for studying the evolution of polyploid genomes and underpins the genetic improvement of Brassica oil and vegetable crops.

10.
A survey of expressed genes in Caenorhabditis elegans.   Total citations: 29, self-citations: 0, citations by others: 29
As an adjunct to the genomic sequencing of Caenorhabditis elegans, we have investigated a representative cDNA library of 1,517 clones. A single sequence read has been obtained from the 5' end of each clone, allowing its characterization with respect to the public databases, and the clones are being localized on the genome map. The result is the identification of about 1,200 of the estimated 15,000 genes of C. elegans. More than 30% of the inferred protein sequences have significant similarity to existing sequences in the databases, providing a route towards in vivo analysis of known genes in the nematode. These clones also provide material for assessing the accuracy of predicted exons and splicing patterns and will lead to a more accurate estimate of the total number of genes in the organism than has hitherto been available.

11.
Large-scale sequencing of cDNAs provides a complementary approach to structural analysis of the human genome by generating expressed sequence tags (ESTs). We have initiated the large-scale sequencing of a 3'-directed cDNA library from the human liver cell line HepG2, which is a non-biased representation of the mRNA population. A total of 982 random cDNA clones were sequenced, yielding more than 270 kilobases. A significant portion of the identified genes encoded secretable proteins and components for protein synthesis. The abundance of cDNA species varied from 2.2% to less than 0.004%. Fifty-two percent of the mRNAs were abundant species consisting of 173 genes, and the rest were non-abundant, consisting of about 6,600 genes.

12.
Francisella tularensis is one of the most infectious human pathogens known. In the past, both the former Soviet Union and the US had programs to develop weapons containing the bacterium. We report the complete genome sequence of a highly virulent isolate of F. tularensis (1,892,819 bp). The sequence uncovers previously uncharacterized genes encoding type IV pili, a surface polysaccharide and iron-acquisition systems. Several virulence-associated genes were located in a putative pathogenicity island, which was duplicated in the genome. More than 10% of the putative coding sequences contained insertion-deletion or substitution mutations and seemed to be deteriorating. The genome is rich in IS elements, including IS630 Tc-1 mariner family transposons, which are not expected in a prokaryote. We used a computational method for predicting metabolic pathways and found an unexpectedly high proportion of disrupted pathways, explaining the fastidious nutritional requirements of the bacterium. The loss of biosynthetic pathways indicates that F. tularensis is an obligate host-dependent bacterium in its natural life cycle. Our results have implications for our understanding of how highly virulent human pathogens evolve and will expedite strategies to combat them.

13.
Genome-wide transcription analyses in rice using tiling microarrays   Total citations: 18, self-citations: 0, citations by others: 18
Li L, Wang X, Stolc V, Li X, Zhang D, Su N, Tongprasit W, Li S, Cheng Z, Wang J, Deng XW. Nature Genetics, 2006, 38(1): 124-129

14.
The number of genes in the human genome is unknown, with estimates ranging from 50,000 to 90,000 (refs 1, 2), and to more than 140,000 according to unpublished sources. We have developed 'Exofish', a procedure based on homology searches, to identify human genes quickly and reliably. This method relies on the sequence of another vertebrate, the pufferfish Tetraodon nigroviridis, to detect conserved sequences with a very low background. Similar to Fugu rubripes, a marine pufferfish proposed by Brenner et al. as a model for genomic studies, T. nigroviridis is a more practical alternative with a genome also eight times more compact than that of human. Many comparisons have been made between F. rubripes and human DNA that demonstrate the potential of comparative genomics using the pufferfish genome. Application of Exofish to the December version of the working draft sequence of the human genome and to Unigene showed that the human genome contains 28,000-34,000 genes, and that Unigene contains less than 40% of the protein-coding fraction of the human genome.

15.
Human-mouse genome comparisons to locate regulatory sites   Total citations: 21, self-citations: 0, citations by others: 21

16.
17.
The Escherichia coli gene recQ was identified as a RecF recombination pathway gene. The gene SGS1, encoding the only RecQ-like DNA helicase in Saccharomyces cerevisiae, was identified by mutations that suppress the top3 slow-growth phenotype. Relatively little is known about the function of Sgs1p because single mutations in SGS1 do not generally cause strong phenotypes. Mutations in genes encoding RecQ-like DNA helicases such as the Bloom and Werner syndrome genes, BLM and WRN, have been suggested to cause increased genome instability. But the exact DNA metabolic defect that might underlie such genome instability has remained unclear. To better understand the cellular role of the RecQ-like DNA helicases, sgs1 mutations were analyzed for their effect on genome rearrangements. Mutations in SGS1 increased the rate of accumulating gross chromosomal rearrangements (GCRs), including translocations and deletions containing extended regions of imperfect homology at their breakpoints. sgs1 mutations also increased the rate of recombination between DNA sequences that had 91% sequence homology. Epistasis analysis showed that Sgs1p is redundant with DNA mismatch repair (MMR) for suppressing GCRs and for suppressing recombination between divergent DNA sequences. This suggests that defects in the suppression of rearrangements involving divergent, repeated sequences may underlie the genome instability seen in BLM and WRN patients and in cancer cases associated with defects in these genes.

18.
19.
Analysis of expressed sequence tags indicates 35,000 human genes   Total citations: 18, self-citations: 0, citations by others: 18
Ewing B, Green P. Nature Genetics, 2000, 25(2): 232-234
The number of protein-coding genes in an organism provides a useful first measure of its molecular complexity. Single-celled prokaryotes and eukaryotes typically have a few thousand genes; for example, Escherichia coli has 4,300 and Saccharomyces cerevisiae has 6,000. Evolution of multicellularity appears to have been accompanied by a several-fold increase in gene number, the invertebrates Caenorhabditis elegans and Drosophila melanogaster having 19,000 and 13,600 genes, respectively. Here we estimate the number of human genes by comparing a set of human expressed sequence tag (EST) contigs with human chromosome 22 and with a non-redundant set of mRNA sequences. The two comparisons give mutually consistent estimates of approximately 35,000 genes, substantially lower than most previous estimates. Evolution of the increased physiological complexity of vertebrates may therefore have depended more on the combinatorial diversification of regulatory networks or alternative splicing than on a substantial increase in gene number.

20.
Isolation of a candidate gene for Norrie disease by positional cloning.   Total citations: 17, self-citations: 0, citations by others: 17
The gene for Norrie disease, an X-linked disorder characterized by progressive atrophy of the eyes, mental disturbances and deafness, has been mapped to chromosome Xp11.4 close to DXS7 and the monoamine oxidase (MAO) genes. By subcloning a YAC with a 640-kilobase (kb) insert which spans the DXS7-MAOB interval, we have generated a cosmid contig which extends 250 kb beyond the MAOB gene. With one of these cosmids, microdeletions were detected in several patients with Norrie disease. Screening of cDNA libraries has enabled us to isolate and sequence a likely candidate gene for Norrie disease which is expressed in retina, choroid and fetal brain. No homologous sequences were found in DNA and protein databases, indicating that this cDNA is part of a gene encoding a 'pioneer' protein.
