Similar articles
20 similar articles found
1.
The genome sequences of Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana have been predicted to contain 19,000, 13,600 and 25,500 genes, respectively. Before this information can be fully used for evolutionary and functional studies, several issues need to be addressed. First, the gene number estimates obtained in silico and not yet supported by any experimental data need to be verified. For example, it seems biologically paradoxical that C. elegans would have 50% more genes than Drosophila. Second, intron/exon predictions need to be tested experimentally. Third, complete sets of open reading frames (ORFs), or "ORFeomes," need to be cloned into various expression vectors. To address these issues simultaneously, we have designed and applied to C. elegans the following strategy. Predicted ORFs are amplified by PCR from a highly representative cDNA library using ORF-specific primers, cloned by Gateway recombination cloning and then sequenced to generate ORF sequence tags (OSTs) as a way to verify identity and splicing. In a sample (n=1,222) of the nearly 10,000 genes predicted ab initio (that is, for which no expressed sequence tag (EST) is available so far), at least 70% were verified by OSTs. We also observed that 27% of these experimentally confirmed genes have a structure different from that predicted by GeneFinder. We now have experimental evidence that supports the existence of at least 17,300 genes in C. elegans. Hence we suggest that gene counts based primarily on ESTs may underestimate the number of genes in human and in other organisms.

2.
A survey of expressed genes in Caenorhabditis elegans. Total citations: 29 (self: 0; others: 29)
As an adjunct to the genomic sequencing of Caenorhabditis elegans, we have investigated a representative cDNA library of 1,517 clones. A single sequence read has been obtained from the 5' end of each clone, allowing its characterization with respect to the public databases, and the clones are being localized on the genome map. The result is the identification of about 1,200 of the estimated 15,000 genes of C. elegans. More than 30% of the inferred protein sequences have significant similarity to existing sequences in the databases, providing a route towards in vivo analysis of known genes in the nematode. These clones also provide material for assessing the accuracy of predicted exons and splicing patterns and will lead to a more accurate estimate of the total number of genes in the organism than has hitherto been available.

3.
Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.

4.
Large-scale sequencing of cDNAs provides a complementary approach to structural analysis of the human genome by generating expressed sequence tags (ESTs). We have initiated the large-scale sequencing of a 3'-directed cDNA library from the human liver cell line HepG2, which is a non-biased representation of the mRNA population. A total of 982 random cDNA clones were sequenced, yielding more than 270 kilobases. A significant portion of the identified genes encoded secretable proteins and components for protein synthesis. The abundance of cDNA species varied from 2.2% to less than 0.004%. Fifty-two percent of the mRNA consisted of abundant species representing 173 genes; the rest were non-abundant species representing about 6,600 genes.

6.
To test the hypothesis that the human genome project will uncover many genes not previously discovered by sequencing of expressed sequence tags (ESTs), we designed and produced a set of microarrays using probes based on open reading frames (ORFs) in 350 Mb of finished and draft human sequence. Our approach aims to identify all genes directly from genomic sequence by querying gene expression. We analysed genomic sequence with a suite of ORF prediction programs, selected approximately one ORF per gene, amplified the ORFs from genomic DNA and arrayed the amplicons onto treated glass slides. Of the first 10,000 arrayed ORFs, 31% are completely novel and 29% are similar, but not identical, to sequences in public databases. Approximately one-half of these are expressed in the tissues we queried by microarray. Subsequent verification by other techniques confirmed expression of several of the novel genes. ESTs have yielded vast amounts of data, but our results indicate that many genes in the human genome will only be found by genomic sequencing.
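The abstract does not name the ORF prediction programs used. As a rough illustration of what "predicting an ORF" means at its simplest, the sketch below (an assumption, not the authors' pipeline) scans the three forward reading frames for ATG...stop stretches above a hypothetical minimum length `min_codons`. Real gene predictors also scan the reverse strand and model splicing and codon statistics.

```python
# Minimal forward-strand ORF scanner (illustrative sketch only).
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for ATG...stop ORFs on the forward strand.

    `start` is the 0-based index of the ATG; `end` is the index just past the
    stop codon. Length in codons includes the stop codon. The reverse
    complement is NOT scanned in this sketch.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # range stops early enough that seq[i:i + 3] is always a full codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    print(find_orfs(toy, min_codons=3))
```

On the toy sequence, the only ORF found lies in frame 2, starting at the ATG at index 2 and ending after the TAA stop.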

7.
Single-nucleotide polymorphisms (SNPs) have been explored as a high-resolution marker set for accelerating the mapping of disease genes. Here we report 48,196 candidate SNPs detected by statistical analysis of human expressed sequence tags (ESTs), associated primarily with coding regions of genes. We used Bayesian inference to weigh evidence for true polymorphism versus sequencing error, misalignment or ambiguity, misclustering or chimaeric EST sequences, assessing data such as raw chromatogram height, sharpness, overlap and spacing, sequencing error rates, context-sensitivity and cDNA library origin. Three separate validations (comparison with 54 genes independently screened for SNPs, verification of HLA-A polymorphisms, and restriction fragment length polymorphism (RFLP) testing) verified 70%, 89% and 71% of our predicted SNPs, respectively. Our method detects tenfold more true HLA-A SNPs than previous analyses of the EST data. We found SNPs in a large fraction of known disease genes, including some disease-causing mutations (for example, the HbS sickle-cell mutation). Our comprehensive analysis of human coding region polymorphism provides a public resource for mapping of disease genes (available at http://www.bioinformatics.ucla.edu/snp).
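The Bayesian step can be illustrated with a much simpler model than the one described, which also uses chromatogram features, library origin and clustering evidence. In this sketch (an assumption for illustration only), a mismatch column is explained either by a true SNP, where each read carries the variant with a hypothetical allele frequency, or by sequencing error alone, where the variant appears with a hypothetical per-base error rate. The default values of `error_rate`, `allele_freq` and `prior_snp` are placeholders, not values from the paper.

```python
import math


def snp_posterior(n_reads: int, n_variant: int, error_rate: float = 0.01,
                  allele_freq: float = 0.3, prior_snp: float = 0.001) -> float:
    """Posterior probability that a mismatch column reflects a true SNP.

    Two binomial hypotheses for the number of reads showing the variant base:
      H_snp: each read carries the variant with probability `allele_freq`
      H_err: the variant arises only from sequencing error (`error_rate`)
    The binomial coefficient is the same under both hypotheses and cancels.
    """
    k, n = n_variant, n_reads
    log_l_snp = k * math.log(allele_freq) + (n - k) * math.log(1 - allele_freq)
    log_l_err = k * math.log(error_rate) + (n - k) * math.log(1 - error_rate)
    log_num = math.log(prior_snp) + log_l_snp
    log_alt = math.log(1 - prior_snp) + log_l_err
    # P = 1 / (1 + exp(d)), evaluated in a form that cannot overflow
    d = log_alt - log_num
    if d > 0:
        t = math.exp(-d)
        return t / (1.0 + t)
    return 1.0 / (1.0 + math.exp(d))


if __name__ == "__main__":
    print(snp_posterior(10, 4))  # 4 of 10 reads show the variant
    print(snp_posterior(10, 1))  # a single discordant read
```

With these placeholder settings, four variant reads out of ten yield a posterior near 0.99, while a single discordant read is attributed almost entirely to sequencing error.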

9.
The approach to annotating a genome critically affects the number and accuracy of genes identified in the genome sequence. Genome annotation based on stringent gene identification is prone to underestimate the complement of genes encoded in a genome. In contrast, over-prediction of putative genes followed by exhaustive computational sequence, motif and structural homology searches will find rarely expressed, possibly unique, new genes at the risk of including non-functional genes. We developed a two-stage approach that combines the merits of stringent genome annotation with the benefits of over-prediction. First, we identify plausible genes regardless of matches with EST, cDNA or protein sequences from the organism (stage 1). In the second stage, proteins predicted from the plausible genes are compared at the protein level with EST, cDNA and protein sequences, and protein structures from other organisms (stage 2). Remote but biologically meaningful protein sequence or structure homologies provide supporting evidence for genuine genes. The method, applied to the Drosophila melanogaster genome, validated 1,042 novel candidate genes after filtering 19,410 plausible genes, of which 12,124 matched the original 13,601 annotated genes. This annotation strategy is applicable to genomes of all organisms, including human.

10.
Isolation of a candidate gene for Norrie disease by positional cloning. Total citations: 17 (self: 0; others: 17)
The gene for Norrie disease, an X-linked disorder characterized by progressive atrophy of the eyes, mental disturbances and deafness, has been mapped to chromosome Xp11.4 close to DXS7 and the monoamine oxidase (MAO) genes. By subcloning a YAC with a 640-kilobase (kb) insert that spans the DXS7-MAOB interval, we have generated a cosmid contig that extends 250 kb beyond the MAOB gene. Using one of these cosmids, we detected microdeletions in several patients with Norrie disease. Screening of cDNA libraries has enabled us to isolate and sequence a likely candidate gene for Norrie disease that is expressed in retina, choroid and fetal brain. No homologous sequences were found in DNA and protein databases, indicating that this cDNA is part of a gene encoding a 'pioneer' protein.

11.
A radiation hybrid map of the zebrafish genome. Total citations: 12 (self: 0; others: 12)
Recent large-scale mutagenesis screens have made the zebrafish the first vertebrate organism to allow a forward genetic approach to the discovery of developmental control genes. Mutations can be cloned positionally, or placed on a simple sequence length polymorphism (SSLP) map to match them with mapped candidate genes and expressed sequence tags (ESTs). To facilitate the mapping of candidate genes and to increase the density of markers available for positional cloning, we have created a radiation hybrid (RH) map of the zebrafish genome. This technique is based on somatic cell hybrid lines produced by fusion of lethally irradiated cells of the species of interest with a rodent cell line. Random fragments of the donor chromosomes are integrated into recipient chromosomes or retained as separate minichromosomes. The radiation-induced breakpoints can be used for mapping in a manner analogous to genetic mapping, but at higher resolution and without a need for polymorphism. Genome-wide maps exist for the human, based on three RH panels of different resolutions, as well as for the dog, rat and mouse. For our map of the zebrafish genome, we used an existing RH panel and 1,451 sequence tagged site (STS) markers, including SSLPs, cloned candidate genes and ESTs. Of these, 1,275 (87.9%) have significant linkage to at least one other marker. The fraction of ESTs with significant linkage, which can be used as an estimate of map coverage, is 81.9%. We found the average marker retention frequency to be 18.4%. One cR3000 is equivalent to 61 kb, resulting in a potential resolution of approximately 350 kb.
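The "analogous to genetic mapping" step can be made concrete with the classical radiation hybrid calculation, shown here as a simplified sketch rather than the authors' procedure, since dedicated mapping software fits these quantities by maximum likelihood. It assumes each fragment is retained independently with retention frequency r. Two markers are then discordant (exactly one retained) with probability 2r(1 - r)θ, where θ is the breakage probability, and θ is converted to an additive distance D = -ln(1 - θ). Only the factor 1 cR3000 = 61 kb comes from the abstract; the function names are hypothetical.

```python
import math

# Conversion factor reported in the abstract for this panel: 1 cR3000 = 61 kb.
KB_PER_CR3000 = 61.0


def breakage_probability(n_discordant: int, n_hybrids: int, retention: float) -> float:
    """Estimate theta from the fraction of hybrids retaining exactly one marker.

    Simplified independent-retention model (assumption):
    P(discordant) = 2 * r * (1 - r) * theta.
    """
    if not 0 < retention < 1:
        raise ValueError("retention must be in (0, 1)")
    return (n_discordant / n_hybrids) / (2 * retention * (1 - retention))


def rh_distance_cR(theta: float) -> float:
    """Additive RH distance D = -ln(1 - theta), expressed in centirays."""
    if not 0 <= theta < 1:
        raise ValueError("theta must be in [0, 1)")
    return -100.0 * math.log(1.0 - theta)


def cR_to_kb(distance_cR: float, kb_per_cR: float = KB_PER_CR3000) -> float:
    """Convert centirays to kilobases using the panel-specific factor."""
    return distance_cR * kb_per_cR


if __name__ == "__main__":
    # Hypothetical counts; r = 0.184 is the average retention reported above.
    theta = breakage_probability(n_discordant=5, n_hybrids=90, retention=0.184)
    d = rh_distance_cR(theta)
    print(round(theta, 3), round(d, 1), round(cR_to_kb(d)))
```

By the same factor, the quoted resolution of about 350 kb corresponds to roughly 350 / 61 ≈ 5.7 cR3000.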

12.
To verify the genome annotation and to create a resource to functionally characterize the proteome, we attempted to Gateway-clone all predicted protein-encoding open reading frames (ORFs), or the 'ORFeome,' of Caenorhabditis elegans. We successfully cloned approximately 12,000 ORFs (ORFeome 1.1), of which roughly 4,000 correspond to genes that are untouched by any cDNA or expressed-sequence tag (EST). More than 50% of predicted genes needed corrections in their intron-exon structures. Notably, approximately 11,000 C. elegans proteins can now be expressed under many conditions and characterized using various high-throughput strategies, including large-scale interactome mapping. We suggest that similar ORFeome projects will be valuable for other organisms, including humans.

13.
Now that some genomes have been completely sequenced, the ability to direct specific mutations into genomes is particularly desirable. Here we present a method to create mutations in the Caenorhabditis elegans genome efficiently through transgene-directed, transposon-mediated gene conversion. Engineered deletions targeted into two genes show that the frequency of obtaining the desired mutation was higher using this approach than using standard transposon insertion-deletion approaches. We also targeted an engineered green fluorescent protein insertion-replacement cassette to one of these genes, thereby confirming that custom alleles of different types can be created in vitro to make the corresponding mutations in vivo. This approach should also be applicable to heterologous transposons in C. elegans and other organisms, including vertebrates.

14.
Single pass sequencing and physical and genetic mapping of human brain cDNAs. Total citations: 16 (self: 0; others: 16)
We have performed single-pass sequencing of 1,024 human brain cDNAs, over 900 of which seem to represent new human genes. Prescreening the library with total brain cDNA significantly reduced repeated sequencing of highly represented cDNAs. A subset of the sequenced cDNAs was physically mapped to chromosomal locations using gene-specific STS primers derived from 3' untranslated regions. We have also determined that human brain cDNAs are a rich source of gene-associated polymorphic markers. Microsatellite-containing cDNAs can be physically mapped and converted to highly informative genetic markers, thus facilitating integration of the human physical, expression and genetic maps.
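The abstract does not describe how microsatellite-containing cDNAs were identified. One simple way to flag candidates, shown here as an illustrative assumption, is to scan each sequence for perfect tandem repeats of a short motif. The thresholds `min_repeats` and `max_unit` are hypothetical, and motifs that are themselves repeats of a shorter unit (such as "AA" or "ATAT") are skipped so that each run is reported once, at its shortest period.

```python
def is_primitive(unit: str) -> bool:
    """True if `unit` is not itself a repeat of a shorter motif ('AA' and 'ATAT' are not)."""
    return unit not in (unit + unit)[1:-1]


def find_microsatellites(seq: str, min_repeats: int = 6, max_unit: int = 4):
    """Return sorted (start, motif, copies) for perfect tandem repeats of 1..max_unit bp."""
    seq = seq.upper()
    hits = []
    for u in range(1, max_unit + 1):
        i = 0
        while i + u <= len(seq):
            unit = seq[i:i + u]
            copies = 1
            # count consecutive exact copies of `unit` starting at position i
            while seq[i + copies * u: i + (copies + 1) * u] == unit:
                copies += 1
            if copies >= min_repeats and is_primitive(unit):
                hits.append((i, unit, copies))
                i += copies * u  # skip past this repeat run
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    print(find_microsatellites("GGCACACACACACAGG"))
```

Only perfect repeats are detected in this sketch; interrupted or degenerate repeats, which can still be informative markers, would need a tolerant scoring scheme.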

15.
Analysis of expressed sequence tags indicates 35,000 human genes. Total citations: 18 (self: 0; others: 18)
Ewing B, Green P. Nature Genetics, 2000, 25(2): 232-234.
The number of protein-coding genes in an organism provides a useful first measure of its molecular complexity. Single-celled prokaryotes and eukaryotes typically have a few thousand genes; for example, Escherichia coli has 4,300 and Saccharomyces cerevisiae has 6,000. Evolution of multicellularity appears to have been accompanied by a several-fold increase in gene number, the invertebrates Caenorhabditis elegans and Drosophila melanogaster having 19,000 and 13,600 genes, respectively. Here we estimate the number of human genes by comparing a set of human expressed sequence tag (EST) contigs with human chromosome 22 and with a non-redundant set of mRNA sequences. The two comparisons give mutually consistent estimates of approximately 35,000 genes, substantially lower than most previous estimates. Evolution of the increased physiological complexity of vertebrates may therefore have depended more on the combinatorial diversification of regulatory networks or alternative splicing than on a substantial increase in gene number.
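The abstract does not spell out how the comparisons are turned into a gene count. A generic coverage correction of the kind such estimates rely on, shown only as an illustrative assumption and not as the authors' calculation, divides the number of EST contigs by the average number of contigs per gene (fragmentation) and then by the fraction of reference genes touched by any EST contig (incomplete coverage). All inputs below are hypothetical.

```python
def estimate_gene_count(n_est_contigs: float, contigs_per_gene: float,
                        fraction_detected: float) -> float:
    """Hypothetical coverage-corrected gene count from EST contigs.

    n_est_contigs     -- number of distinct EST contigs observed
    contigs_per_gene  -- average number of separate EST contigs per gene,
                         estimated against a reference set (e.g. a finished chromosome)
    fraction_detected -- fraction of reference genes hit by at least one EST contig
    """
    if contigs_per_gene <= 0:
        raise ValueError("contigs_per_gene must be positive")
    if not 0 < fraction_detected <= 1:
        raise ValueError("fraction_detected must be in (0, 1]")
    genes_seen = n_est_contigs / contigs_per_gene  # undo fragmentation
    return genes_seen / fraction_detected  # extrapolate to genes without ESTs


if __name__ == "__main__":
    # Placeholder numbers chosen only to show the arithmetic.
    print(round(estimate_gene_count(60_000, 1.5, 0.8)))
```

Running two such estimates against independent reference sets, as the abstract describes for chromosome 22 and the mRNA collection, offers a consistency check on both correction factors.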

18.
The fundamental aim of genetics is to understand how an organism's phenotype is determined by its genotype, and implicit in this is predicting how changes in DNA sequence alter phenotypes. A single network covering all the genes of an organism might guide such predictions down to the level of individual cells and tissues. To validate this approach, we computationally generated a network covering most C. elegans genes and tested its predictive capacity. Connectivity within this network predicts essentiality, identifying this relationship as an evolutionarily conserved biological principle. Critically, the network makes tissue-specific predictions: we accurately identify genes for most systematically assayed loss-of-function phenotypes, which span diverse cellular and developmental processes. Using the network, we identify 16 genes whose inactivation suppresses defects in the retinoblastoma tumor suppressor pathway, and we successfully predict that the dystrophin complex modulates EGF signaling. We conclude that an analogous network for human genes might be similarly predictive and thus facilitate identification of disease genes and rational therapeutic targets.
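"Connectivity predicts essentiality" can be illustrated with a minimal sketch: compute each gene's weighted degree (the sum of the weights of its links) in an undirected network, then rank genes from most to least connected, so that essentiality candidates are read off the top of the list. The edge list, gene names and weights below are hypothetical, and the actual network construction and scoring in the study are not reproduced here.

```python
from collections import defaultdict


def weighted_degree(edges):
    """Sum of edge weights per gene in an undirected weighted network."""
    degree = defaultdict(float)
    for a, b, w in edges:
        degree[a] += w
        degree[b] += w
    return dict(degree)


def rank_by_connectivity(edges):
    """Genes sorted from most to least connected (ties broken by name)."""
    deg = weighted_degree(edges)
    return sorted(deg, key=lambda g: (-deg[g], g))


if __name__ == "__main__":
    # Hypothetical genes and link weights, for illustration only.
    toy_edges = [
        ("geneA", "geneB", 2.0),
        ("geneA", "geneC", 1.5),
        ("geneB", "geneC", 0.5),
        ("geneC", "geneD", 0.2),
    ]
    print(rank_by_connectivity(toy_edges))
```

Using weighted rather than plain degree lets strongly supported links count for more than weak ones, which suits networks built by integrating evidence of varying reliability.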

19.
Caenorhabditis elegans is the first animal whose genomic sequence has been determined. One of the new possibilities in post-sequence genetics is the analysis of complete gene families at once. We studied the family of heterotrimeric G proteins. C. elegans has 20 Galpha, 2 Gbeta and 2 Ggamma genes. There is 1 homologue of each of the 4 mammalian classes of Galpha genes, G(i)/G(o)alpha, G(s)alpha, G(q)alpha and G12alpha, and there are 16 new alpha genes. Although the conserved Galpha subunits are expressed in many neurons and muscle cells, GFP fusions indicate that 14 new Galpha genes are expressed almost exclusively in a small subset of the chemosensory neurons of C. elegans. We generated loss-of-function alleles using target-selected gene inactivation. None of the amphid-expressed genes are essential for viability, and only four show any detectable phenotype (chemotaxis defects), suggesting extensive functional redundancy. On the basis of functional analysis, the 20 genes encoding Galpha proteins can be divided into two groups: those that encode subunits affecting muscle activity (homologues of G(i)/G(o)alpha, G(s)alpha and G(q)alpha), and those (14 new genes) that encode proteins most likely involved in perception.

20.
The number of genes in the human genome is unknown, with estimates ranging from 50,000 to 90,000 (refs 1, 2), and to more than 140,000 according to unpublished sources. We have developed 'Exofish', a procedure based on homology searches, to identify human genes quickly and reliably. This method relies on the sequence of another vertebrate, the pufferfish Tetraodon nigroviridis, to detect conserved sequences with a very low background. Like Fugu rubripes, the marine pufferfish proposed by Brenner et al. as a model for genomic studies, T. nigroviridis has a genome eight times more compact than that of human, and it is a more practical alternative. Many comparisons between F. rubripes and human DNA have demonstrated the potential of comparative genomics using the pufferfish genome. Application of Exofish to the December version of the working draft sequence of the human genome and to Unigene showed that the human genome contains 28,000-34,000 genes, and that Unigene contains less than 40% of the protein-coding fraction of the human genome.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  Beijing ICP registration No. 09084417 (京ICP备09084417号)