Similar literature
 20 similar documents found (search time: 15 ms)
1.
Sequence variation in human genes is largely confined to single-nucleotide polymorphisms (SNPs) and is valuable in tests of association with common diseases and pharmacogenetic traits. We performed a systematic and comprehensive survey of molecular variation to assess the nature, pattern and frequency of SNPs in 75 candidate human genes for blood-pressure homeostasis and hypertension. We assayed 28 Mb (190 kb in 148 alleles) of genomic sequence, comprising the 5' and 3' untranslated regions (UTRs), introns and coding sequence of these genes, for sequence differences in individuals of African and Northern European descent using high-density variant detection arrays (VDAs). We identified 874 candidate human SNPs, of which 22% were confirmed by DNA sequencing to reveal a discordancy rate of 21% for VDA detection. The SNPs detected have an average minor allele frequency of 11%, and 387 are within the coding sequence (cSNPs). Of all cSNPs, 54% lead to a predicted change in the protein sequence, implying a high level of human protein diversity. These protein-altering SNPs are 38% of the total number of such SNPs expected, are more likely to be population-specific and are rarer in the human population, directly demonstrating the effects of natural selection on human genes. Overall, the degree of nucleotide polymorphism across these human genes, and orthologous great ape sequences, is highly variable and is correlated with the effects of functional conservation on gene sequences.
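The survey size quoted in this abstract, "28 Mb (190 kb in 148 alleles)", can be checked with a line of arithmetic. The sketch below only reproduces that multiplication, together with the share of coding SNPs among all candidates (387 of 874); the function and constant names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
KB_PER_ALLELE = 190      # genomic sequence surveyed per allele, in kb
ALLELES = 148            # number of alleles assayed
CANDIDATE_SNPS = 874     # candidate SNPs identified
CODING_SNPS = 387        # SNPs within coding sequence (cSNPs)


def total_megabases(kb_per_allele, alleles):
    """Total sequence assayed in Mb, taking 1 Mb = 1000 kb."""
    return kb_per_allele * alleles / 1000


total_mb = total_megabases(KB_PER_ALLELE, ALLELES)
coding_fraction = CODING_SNPS / CANDIDATE_SNPS

print(f"Sequence assayed: {total_mb:.2f} Mb")
print(f"Coding share of candidate SNPs: {coding_fraction:.1%}")
```

The product is 28.12 Mb, which matches the rounded 28 Mb in the text.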

2.
A major goal in human genetics is to understand the role of common genetic variants in susceptibility to common diseases. This will require characterizing the nature of gene variation in human populations, assembling an extensive catalogue of single-nucleotide polymorphisms (SNPs) in candidate genes and performing association studies for particular diseases. At present, our knowledge of human gene variation remains rudimentary. Here we describe a systematic survey of SNPs in the coding regions of human genes. We identified SNPs in 106 genes relevant to cardiovascular disease, endocrinology and neuropsychiatry by screening an average of 114 independent alleles using 2 independent screening methods. To ensure high accuracy, all reported SNPs were confirmed by DNA sequencing. We identified 560 SNPs, including 392 coding-region SNPs (cSNPs) divided roughly equally between those causing synonymous and non-synonymous changes. We observed different rates of polymorphism among classes of sites within genes (non-coding, degenerate and non-degenerate) as well as between genes. The cSNPs most likely to influence disease, those that alter the amino acid sequence of the encoded protein, are found at a lower rate and with lower allele frequencies than silent substitutions. This likely reflects selection acting against deleterious alleles during human evolution. The lower allele frequency of missense cSNPs has implications for the compilation of a comprehensive catalogue, as well as for the subsequent application to disease association.

3.
High-resolution genetic analysis of the human genome promises to provide insight into common disease susceptibility. To perform such analysis will require a collection of high-throughput, high-density analysis reagents. We have developed a polymorphism detection system that uses public-domain sequence data. This detection system is called the single nucleotide polymorphism pipeline (SNPpipeline). The analytic core of the SNPpipeline is composed of three components: PHRED, PHRAP and DEMIGLACE. PHRED and PHRAP are components of a sequence analysis suite developed to perform the semi-automated analysis required for large-scale genomes (provided courtesy of P. Green). Using these informatics tools, which examine redundant raw expressed sequence tag (EST) data, we have identified more than 3,000 candidate single-nucleotide polymorphisms (SNPs). Empiric validation studies of a set of 192 candidates indicate that 82% identify variation in a sample of ten Centre d'Etude du Polymorphisme Humain (CEPH) individuals. Our results suggest that existing sequence resources may serve as a valuable source for identifying genetic variation.

4.
One goal in sequencing the genome of Plasmodium falciparum, the agent of the most lethal form of malaria, is to discover vaccine and drug targets. However, identifying those targets in a genome in which approximately 60% of genes have unknown functions is an enormous challenge. Because the majority of known malaria antigens and drug-resistant genes are highly polymorphic and under various selective pressures, genome-wide analysis for signatures of selection may lead to discovery of new vaccine and drug candidates. Here we surveyed 3,539 P. falciparum genes (approximately 65% of the predicted genes) for polymorphisms and identified various highly polymorphic loci and genes, some of which encode new antigens that we confirmed using human immune sera. Our collections of genome-wide SNPs (approximately 65% nonsynonymous) and polymorphic microsatellites and indels provide a high-resolution map (one marker per approximately 4 kb) for mapping parasite traits and studying parasite populations. In addition, we report new antigens, providing urgently needed vaccine candidates for disease control.

5.
A general approach to single-nucleotide polymorphism discovery   (total citations: 29; self-citations: 0; citations by others: 29)
Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits. The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2, 3, 4, 5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence as a template on which to layer often unmapped, fragmentary sequence data and to use base quality values to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel, Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
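The role of base quality values in separating true variation from sequencing errors can be illustrated with the standard Phred convention, in which a quality score Q corresponds to an error probability of 10^(-Q/10). The sketch below is not POLYBAYES and does not reproduce its Bayesian model; it only shows, under that convention, how quality scores translate into the chance that a mismatch is a sequencing error. The function names and the example scores are hypothetical.

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred-scaled base quality q."""
    return 10 ** (-q / 10)


def prob_all_calls_correct(qualities):
    """Probability that every base call in `qualities` is correct,
    assuming (for illustration only) that errors are independent."""
    p = 1.0
    for q in qualities:
        p *= 1.0 - phred_to_error_prob(q)
    return p


# Hypothetical example: three reads show the same mismatch at one site.
mismatch_qualities = [20, 30, 40]
for q in mismatch_qualities:
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
print(f"All three calls correct: {prob_all_calls_correct(mismatch_qualities):.4f}")
```

A mismatch supported only by low-quality bases is far more likely to be an error than one supported by several high-quality bases, which is the intuition behind weighting evidence by base quality.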

6.
Targeted capture combined with massively parallel exome sequencing is a promising approach to identify genetic variants implicated in human traits. We report exome sequencing of 200 individuals from Denmark with targeted capture of 18,654 coding genes and sequence coverage of each individual exome at an average depth of 12-fold. On average, about 95% of the target regions were covered by at least one read. We identified 121,870 SNPs in the sample population, including 53,081 coding SNPs (cSNPs). Using a statistical method for SNP calling and an estimation of allelic frequencies based on our population data, we derived the allele frequency spectrum of cSNPs with a minor allele frequency greater than 0.02. We identified a 1.8-fold excess of deleterious, non-synonymous cSNPs over synonymous cSNPs in the low-frequency range (minor allele frequencies between 2% and 5%). This excess was more pronounced for X-linked SNPs, suggesting that deleterious substitutions are primarily recessive.

7.
Haplotype tagging for the identification of common disease genes   (total citations: 61; self-citations: 0; citations by others: 61)
Genome-wide linkage disequilibrium (LD) mapping of common disease genes could be more powerful than linkage analysis if the appropriate density of polymorphic markers were known and if the genotyping effort and cost of producing such an LD map could be reduced. Although different metrics that measure the extent of LD have been evaluated, even the most recent studies have not placed significant emphasis on the most informative and cost-effective method of LD mapping: that based on haplotypes. We have scanned 135 kb of DNA from nine genes, genotyped 122 single-nucleotide polymorphisms (SNPs; approximately 184,000 genotypes) and determined the common haplotypes in a minimum of 384 European individuals for each gene. Here we show how knowledge of the common haplotypes and the SNPs that tag them can be used to (i) explain the often complex patterns of LD between adjacent markers, (ii) reduce genotyping significantly (in this case from 122 to 34 SNPs), (iii) scan the common variation of a gene sensitively and comprehensively and (iv) provide key fine-mapping data within regions of strong LD. Our results also indicate that, at least for the genes studied here, the current version of dbSNP would have been of limited utility for LD mapping because many common haplotypes could not be defined. A directed re-sequencing effort of the approximately 10% of the genome in or near genes in the major ethnic groups would aid the systematic evaluation of the common variant model of common disease.

8.
Complex SNP-related sequence variation in segmental genome duplications   (total citations: 23; self-citations: 0; citations by others: 23)
There is uncertainty about the true nature of predicted single-nucleotide polymorphisms (SNPs) in segmental duplications (duplicons) and whether these markers genuinely exist at increased density as indicated in public databases. We explored these issues by genotyping 157 predicted SNPs in duplicons and control regions in normal diploid genomes and fully homozygous complete hydatidiform moles. Our data identified many true SNPs in duplicon regions and few paralogous sequence variants. Twenty-eight percent of the polymorphic duplicon sequences we tested involved multisite variation, a new type of polymorphism representing the sum of the signals from many individual duplicon copies that vary in sequence content due to duplication, deletion or gene conversion. Multisite variations can masquerade as normal SNPs when genotyped. Given that duplicons comprise at least 5% of the genome and many are yet to be annotated in the genome draft, effective strategies to identify multisite variation must be established and deployed.

9.
The locations and properties of common deletion variants in the human genome are largely unknown. We describe a systematic method for using dense SNP genotype data to discover deletions and its application to data from the International HapMap Consortium to characterize and catalogue segregating deletion variants across the human genome. We identified 541 deletion variants (94% novel) ranging from 1 kb to 745 kb in size; 278 of these variants were observed in multiple, unrelated individuals, 120 in the homozygous state. The coding exons of ten expressed genes were found to be commonly deleted, including multiple genes with roles in sex steroid metabolism, olfaction and drug response. These common deletion polymorphisms typically represent ancestral mutations that are in linkage disequilibrium with nearby SNPs, meaning that their association to disease can often be evaluated in the course of SNP-based whole-genome association studies.

10.
Substantial efforts are focused on identifying single-nucleotide polymorphisms (SNPs) throughout the human genome, particularly in coding regions (cSNPs), for both linkage disequilibrium and association studies. Less attention, however, has been directed to the clarification of evolutionary processes that are responsible for the variability in nucleotide diversity among different regions of the genome. We report here the population sequence diversity of genomic segments within a 450-kb cluster of olfactory receptor (OR) genes on human chromosome 17. We found a dichotomy in the pattern of nucleotide diversity between OR pseudogenes and introns on the one hand and the closely interspersed intact genes on the other. We suggest that weak positive selection is responsible for the observed patterns of genetic variation. This is inferred from a lower ratio of polymorphism to divergence in genes compared with pseudogenes or introns, high non-synonymous substitution rates in OR genes, and a small but significant overall reduction in variability in the entire OR gene cluster compared with other genomic regions. The dichotomy among functionally different segments within a short genomic distance requires high recombination rates within this OR cluster. Our work demonstrates the impact of weak positive selection on human nucleotide diversity, and has implications for the evolution of the olfactory repertoire.

11.
Single-nucleotide polymorphisms (SNPs) have been the focus of much attention in human genetics because they are extremely abundant and well-suited for automated large-scale genotyping. Human SNPs, however, are less informative than other types of genetic markers (such as simple-sequence length polymorphisms or microsatellites) and thus more loci are required for mapping traits. SNPs offer similar advantages for experimental genetic organisms such as the mouse, but they entail no loss of informativeness because bi-allelic markers are fully informative in analysing crosses between inbred strains. Here we report a large-scale analysis of SNPs in the mouse genome. We characterized the rate of nucleotide polymorphism in eight mouse strains and identified a collection of 2,848 SNPs located in 1,755 sequence-tagged sites (STSs) using high-density oligonucleotide arrays. Three-quarters of these SNPs have been mapped on the mouse genome, providing a first-generation SNP map of the mouse. We have also developed a multiplex genotyping procedure by which a genome scan can be performed with only six genotyping reactions per animal.

12.
13.
Noncoding variants at human chromosome 9p21 near CDKN2A and CDKN2B are associated with type 2 diabetes, myocardial infarction, aneurysm, vertical cup disc ratio and at least five cancers. Here we compare approaches to more comprehensively assess genetic variation in the region. We carried out targeted sequencing at high coverage in 47 individuals and compared the results to pilot data from the 1000 Genomes Project. We imputed variants into type 2 diabetes and myocardial infarction cohorts directly from targeted sequencing, from a genotyped reference panel derived from sequencing and from 1000 Genomes Project low-coverage data. Polymorphisms with frequency >5% were captured well by all strategies. Imputation of intermediate-frequency polymorphisms required a higher density of tag SNPs in disease samples than is available on first-generation genome-wide association study (GWAS) arrays. Our association analyses identified more comprehensive sets of variants showing equivalent statistical association with type 2 diabetes or myocardial infarction, but did not identify stronger associations than the original GWAS signals.

14.
A high-resolution survey of deletion polymorphism in the human genome   (total citations: 20; self-citations: 0; citations by others: 20)
Recent work has shown that copy number polymorphism is an important class of genetic variation in human genomes. Here we report a new method that uses SNP genotype data from parent-offspring trios to identify polymorphic deletions. We applied this method to data from the International HapMap Project to produce the first high-resolution population surveys of deletion polymorphism. Approximately 100 of these deletions have been experimentally validated using comparative genome hybridization on tiling-resolution oligonucleotide microarrays. Our analysis identifies a total of 586 distinct regions that harbor deletion polymorphisms in one or more of the families. Notably, we estimate that typical individuals are hemizygous for roughly 30-50 deletions larger than 5 kb, totaling around 550-750 kb of euchromatic sequence across their genomes. The detected deletions span a total of 267 known and predicted genes. Overall, however, the deleted regions are relatively gene-poor, consistent with the action of purifying selection against deletions. Deletion polymorphisms may well have an important role in the genetics of complex traits; however, they are not directly observed in most current gene mapping studies. Our new method will permit the identification of deletion polymorphisms in high-density SNP surveys of trio or other family data.
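This abstract does not describe the internals of the trio-based method. One signal that trio SNP data can carry, assumed here purely for illustration, is a Mendelian inconsistency in which a child appears homozygous for an allele that only one parent could have transmitted; this pattern is what a hemizygous child (one allele plus an inherited deletion) would look like on a genotyping platform. The sketch below encodes that single check; the function names and the two-letter genotype encoding are hypothetical and are not taken from the paper.

```python
def mendelian_consistent(father, mother, child):
    """True if `child` can be formed from one allele of each parent.
    Genotypes are two-character strings such as "AA", "AB" or "BB"."""
    for paternal in father:
        for maternal in mother:
            if sorted(paternal + maternal) == sorted(child):
                return True
    return False


def deletion_compatible(father, mother, child):
    """True if the trio is Mendelian-inconsistent in a way that a
    transmitted deletion could explain: the child looks homozygous for an
    allele that at least one parent carries (apparent homozygosity that
    could really be hemizygosity)."""
    if mendelian_consistent(father, mother, child):
        return False
    if child[0] != child[1]:
        return False  # a heterozygous call shows two alleles are present
    return child[0] in father or child[0] in mother


# Father AA, mother BB: a diploid child should be AB. An apparent AA call
# fits a child carrying A from the father and a deletion from the mother.
print(deletion_compatible("AA", "BB", "AA"))
print(deletion_compatible("AA", "BB", "AB"))
```

A real analysis would also have to distinguish such patterns from genotyping error, for example by requiring runs of adjacent SNPs showing the same signal, which this sketch does not attempt.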

15.
Cloning procedures aided by homology searches of EST databases have accelerated the pace of discovery of new genes, but EST database searching remains an involved and onerous task. More than 1.6 million human EST sequences have been deposited in public databases, making it difficult to identify ESTs that represent new genes. Compounding the problems of scale are difficulties in detection associated with a high sequencing error rate and low sequence similarity between distant homologues. We have developed a new method, coupling BLAST-based searches with a domain identification protocol, that filters candidate homologues. Application of this method in a large-scale analysis of 100 signalling domain families has led to the identification of ESTs representing more than 1,000 novel human signalling genes. The 4,206 publicly available ESTs representing these genes are a valuable resource for rapid cloning of novel human signalling proteins. For example, we were able to identify ESTs of at least 106 new small GTPases, of which 6 are likely to belong to new subfamilies. In some cases, further analyses of genomic DNA led to the discovery of previously unidentified full-length protein sequences. This is exemplified by the in silico cloning (prediction of a gene product sequence using only genomic and EST sequence data) of a new type of GTPase with two catalytic domains.

16.
17.
Genetic mapping with SNP markers in Drosophila.   (total citations: 10; self-citations: 0; citations by others: 10)
Map-based positional cloning of Drosophila melanogaster genes is hampered by both the time-consuming, error-prone nature of traditional methods for genetic mapping and the difficulties in aligning the genetic and cytological maps with the genome sequence. The identification of sequence polymorphisms in the Drosophila genome will make it possible to map mutations directly to the genome sequence with high accuracy and resolution. Here we report the identification of 7,223 single-nucleotide polymorphisms (SNPs) and 1,392 insertions/deletions (InDels) in common laboratory strains of Drosophila. These sequence polymorphisms define a map of 787 autosomal marker loci with a resolution of 114 kb. We have established PCR product-length polymorphism (PLP) or restriction fragment-length polymorphism (RFLP) assays for 215 of these markers. We demonstrate the use of this map by delimiting two mutations to intervals of 169 kb and 307 kb, respectively. Using a local high-density SNP map, we also mapped a third mutation to a resolution of approximately 2 kb, sufficient to localize the mutation within a single gene. These methods should accelerate the rate of positional cloning in Drosophila.

18.
19.
With several hundred genetic diseases and an advantageous genome structure, dogs are ideal for mapping genes that cause disease. Here we report the development of a genotyping array with approximately 27,000 SNPs and show that genome-wide association mapping of mendelian traits in dog breeds can be achieved with only approximately 20 dogs. Specifically, we map two traits with mendelian inheritance: the major white spotting (S) locus and the hair ridge in Rhodesian ridgebacks. For both traits, we map the loci to discrete regions of <1 Mb. Fine-mapping of the S locus in two breeds refines the localization to a region of approximately 100 kb contained within the pigmentation-related gene MITF. Complete sequencing of the white and solid haplotypes identifies candidate regulatory mutations in the melanocyte-specific promoter of MITF. Our results show that genome-wide association mapping within dog breeds, followed by fine-mapping across multiple breeds, will be highly efficient and generally applicable to trait mapping, providing insights into canine and human health.

20.
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
