Found 20 similar articles (search time: 0 ms)
1.
The identification of promoters and first exons has been one of the most difficult problems in gene-finding. We present a set of discriminant functions that can recognize structural and compositional features such as CpG islands, promoter regions and first splice-donor sites. We explain the implementation of the discriminant functions into a decision tree that constitutes a new program called FirstEF. By using different models to predict CpG-related and non-CpG-related first exons, we showed by cross-validation that the program could predict 86% of the first exons with 17% false positives. We also demonstrated the prediction accuracy of FirstEF at the genome level by applying it to the finished sequences of human chromosomes 21 and 22 as well as by comparing the predictions with the locations of the experimentally verified first exons. Finally, we present the analysis of the predicted first exons for all of the 24 chromosomes of the human genome.
2.
3.
4.
5.
6.
7.
High-resolution haplotype structure in the human genome (Cited by: 41; self-citations: 0; other citations: 41)
Linkage disequilibrium (LD) analysis is traditionally based on individual genetic markers and often yields an erratic, non-monotonic picture, because the power to detect allelic associations depends on specific properties of each marker, such as frequency and population history. Ideally, LD analysis should be based directly on the underlying haplotype structure of the human genome, but this structure has remained poorly understood. Here we report a high-resolution analysis of the haplotype structure across 500 kilobases on chromosome 5q31 using 103 single-nucleotide polymorphisms (SNPs) in a European-derived population. The results show a picture of discrete haplotype blocks (of tens to hundreds of kilobases), each with limited diversity punctuated by apparent sites of recombination. In addition, we develop an analytical model for LD mapping based on such haplotype blocks. If our observed structure is general (and published data suggest that it may be), it offers a coherent framework for creating a haplotype map of the human genome.
8.
9.
10.
Tuzun E Sharp AJ Bailey JA Kaul R Morrison VA Pertz LM Haugen E Hayden H Albertson D Pinkel D Olson MV Eichler EE 《Nature genetics》2005,37(7):727-732
Inversions, deletions and insertions are important mediators of disease and disease susceptibility. We systematically compared the human genome reference sequence with a second genome (represented by fosmid paired-end sequences) to detect intermediate-sized structural variants >8 kb in length. We identified 297 sites of structural variation: 139 insertions, 102 deletions and 56 inversion breakpoints. Using combined literature, sequence and experimental analyses, we validated 112 of the structural variants, including several that are of biomedical relevance. These data provide a fine-scale structural variation map of the human genome and the requisite sequence precision for subsequent genetic studies of human disease.
11.
Wang Z Zang C Rosenfeld JA Schones DE Barski A Cuddapah S Cui K Roh TY Peng W Zhang MQ Zhao K 《Nature genetics》2008,40(7):897-903
12.
The completed draft version of the human genome, comprised of multiple short contigs encompassing 85% or more of euchromatin, was announced in June of 2000 (ref. 1). The detailed findings of the sequencing consortium were reported several months later. The draft sequence has provided insight into global characteristics, such as the total number of genes and a more accurate definition of gene families. Also of importance are genome positional details such as local genome architecture, regional gene density and the location of transcribed units that are critical for disease gene identification. We carried out a series of mapping and computational experiments using a nonredundant collection of 925 expressed sequence tags (ESTs) and sections of the public draft genome sequence that were available at different timepoints between April 2000 and April 2001. We found discrepancies in both the reported coverage of the human genome and the accuracy of mapping of genomic clones, suggesting some limitations of the draft genome sequence in providing accurate positional information and detailed characterization of chromosomal subregions.
13.
A high-resolution survey of deletion polymorphism in the human genome (Cited by: 20; self-citations: 0; other citations: 20)
Recent work has shown that copy number polymorphism is an important class of genetic variation in human genomes. Here we report a new method that uses SNP genotype data from parent-offspring trios to identify polymorphic deletions. We applied this method to data from the International HapMap Project to produce the first high-resolution population surveys of deletion polymorphism. Approximately 100 of these deletions have been experimentally validated using comparative genome hybridization on tiling-resolution oligonucleotide microarrays. Our analysis identifies a total of 586 distinct regions that harbor deletion polymorphisms in one or more of the families. Notably, we estimate that typical individuals are hemizygous for roughly 30-50 deletions larger than 5 kb, totaling around 550-750 kb of euchromatic sequence across their genomes. The detected deletions span a total of 267 known and predicted genes. Overall, however, the deleted regions are relatively gene-poor, consistent with the action of purifying selection against deletions. Deletion polymorphisms may well have an important role in the genetics of complex traits; however, they are not directly observed in most current gene mapping studies. Our new method will permit the identification of deletion polymorphisms in high-density SNP surveys of trio or other family data.
14.
Kong A Gudbjartsson DF Sainz J Jonsdottir GM Gudjonsson SA Richardsson B Sigurdardottir S Barnard J Hallbeck B Masson G Shlien A Palsson ST Frigge ML Thorgeirsson TE Gulcher JR Stefansson K 《Nature genetics》2002,31(3):241-247
Determination of recombination rates across the human genome has been constrained by the limited resolution and accuracy of existing genetic maps and the draft genome sequence. We have genotyped 5,136 microsatellite markers for 146 families, with a total of 1,257 meiotic events, to build a high-resolution genetic map meant to: (i) improve the genetic order of polymorphic markers; (ii) improve the precision of estimates of genetic distances; (iii) correct portions of the sequence assembly and SNP map of the human genome; and (iv) build a map of recombination rates. Recombination rates are significantly correlated with both cytogenetic structures (staining intensity of G bands) and sequence (GC content, CpG motifs and poly(A)/poly(T) stretches). Maternal and paternal chromosomes show many differences in locations of recombination maxima. We detected systematic differences in recombination rates between mothers and between gametes from the same mother, suggesting that there is some underlying component determined by both genetic and environmental factors that affects maternal recombination rates.
15.
A worldwide survey of haplotype variation and linkage disequilibrium in the human genome (Cited by: 1; self-citations: 0; other citations: 1)
Conrad DF Jakobsson M Coop G Wen X Wall JD Rosenberg NA Pritchard JK 《Nature genetics》2006,38(11):1251-1260
Recent genomic surveys have produced high-resolution haplotype information, but only in a small number of human populations. We report haplotype structure across 12 Mb of DNA sequence in 927 individuals representing 52 populations. The geographic distribution of haplotypes reflects human history, with a loss of haplotype diversity as distance increases from Africa. Although the extent of linkage disequilibrium (LD) varies markedly across populations, considerable sharing of haplotype structure exists, and inferred recombination hotspot locations generally match across groups. The four samples in the International HapMap Project contain the majority of common haplotypes found in most populations: averaging across populations, 83% of common 20-kb haplotypes in a population are also common in the most similar HapMap sample. Consequently, although the portability of tag SNPs based on the HapMap is reduced in low-LD Africans, the HapMap will be helpful for the design of genome-wide association mapping studies in nearly all human populations.
16.
Khaja R Zhang J MacDonald JR He Y Joseph-George AM Wei J Rafiq MA Qian C Shago M Pantano L Aburatani H Jones K Redon R Hurles M Armengol L Estivill X Mural RJ Lee C Scherer SW Feuk L 《Nature genetics》2006,38(12):1413-1418
Numerous types of DNA variation exist, ranging from SNPs to larger structural alterations such as copy number variants (CNVs) and inversions. Alignment of DNA sequence from different sources has been used to identify SNPs and intermediate-sized variants (ISVs). However, only a small proportion of total heterogeneity is characterized, and little is known of the characteristics of most smaller-sized (<50 kb) variants. Here we show that genome assembly comparison is a robust approach for identification of all classes of genetic variation. Through comparison of two human assemblies (Celera's R27c compilation and the Build 35 reference sequence), we identified megabases of sequence (in the form of 13,534 putative non-SNP events) that were absent, inverted or polymorphic in one assembly. Database comparison and laboratory experimentation further demonstrated overlap or validation for 240 variable regions and confirmed >1.5 million SNPs. Some differences were simple insertions and deletions, but in regions containing CNVs, segmental duplication and repetitive DNA, they were more complex. Our results uncover substantial undescribed variation in humans, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.
17.
Structural genomics: beyond the human genome project. (Cited by: 17; self-citations: 0; other citations: 17)
S K Burley S C Almo J B Bonanno M Capel M R Chance T Gaasterland D Lin A Sali F W Studier S Swaminathan 《Nature genetics》1999,23(2):151-157
With access to whole genome sequences for various organisms and imminent completion of the Human Genome Project, the entire process of discovery in molecular and cellular biology is poised to change. Massively parallel measurement strategies promise to revolutionize how we study and ultimately understand the complex biochemical circuitry responsible for controlling normal development, physiologic homeostasis and disease processes. This information explosion is also providing the foundation for an important new initiative in structural biology. We are about to embark on a program of high-throughput X-ray crystallography aimed at developing a comprehensive mechanistic understanding of normal and abnormal human and microbial physiology at the molecular level. We present the rationale for creation of a structural genomics initiative, recount the efforts of ongoing structural genomics pilot studies, and detail the lofty goals, technical challenges and pitfalls facing structural biologists.
18.
To test the hypothesis that the human genome project will uncover many genes not previously discovered by sequencing of expressed sequence tags (ESTs), we designed and produced a set of microarrays using probes based on open reading frames (ORFs) in 350 Mb of finished and draft human sequence. Our approach aims to identify all genes directly from genomic sequence by querying gene expression. We analysed genomic sequence with a suite of ORF prediction programs, selected approximately one ORF per gene, amplified the ORFs from genomic DNA and arrayed the amplicons onto treated glass slides. Of the first 10,000 arrayed ORFs, 31% are completely novel and 29% are similar, but not identical, to sequences in public databases. Approximately one-half of these are expressed in the tissues we queried by microarray. Subsequent verification by other techniques confirmed expression of several of the novel genes. Expressed sequence tags (ESTs) have yielded vast amounts of data, but our results indicate that many genes in the human genome will only be found by genomic sequencing.
19.
Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome (Cited by: 3; self-citations: 0; other citations: 3)
Weber M Hellmann I Stadler MB Ramos L Pääbo S Rebhan M Schübeler D 《Nature genetics》2007,39(4):457-466
To gain insight into the function of DNA methylation at cis-regulatory regions and its impact on gene expression, we measured methylation, RNA polymerase occupancy and histone modifications at 16,000 promoters in primary human somatic and germline cells. We find CpG-poor promoters hypermethylated in somatic cells, which does not preclude their activity. This methylation is present in male gametes and results in evolutionary loss of CpG dinucleotides, as measured by divergence between humans and primates. In contrast, strong CpG island promoters are mostly unmethylated, even when inactive. Weak CpG island promoters are distinct, as they are preferential targets for de novo methylation in somatic cells. Notably, most germline-specific genes are methylated in somatic cells, suggesting additional functional selection. These results show that promoter sequence and gene function are major predictors of promoter methylation states. Moreover, we observe that inactive unmethylated CpG island promoters show elevated levels of dimethylation of Lys4 of histone H3, suggesting that this chromatin mark may protect DNA from methylation.