Similar articles
20 similar articles found
1.
Recent advances in sequencing technology make it possible to comprehensively catalog genetic variation in population samples, creating a foundation for understanding human disease, ancestry and evolution. The amounts of raw data produced are prodigious, and many computational steps are required to translate this output into high-quality variant calls. We present a unified analytic framework to discover and genotype variation among multiple samples simultaneously that achieves sensitive and specific results across five sequencing technologies and three distinct, canonical experimental designs. Our process includes (i) initial read mapping; (ii) local realignment around indels; (iii) base quality score recalibration; (iv) SNP discovery and genotyping to find all potential variants; and (v) machine learning to separate true segregating variation from machine artifacts common to next-generation sequencing technologies. We here discuss the application of these tools, instantiated in the Genome Analysis Toolkit, to deep whole-genome, whole-exome capture and multi-sample low-pass (~4×) 1000 Genomes Project datasets.
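Step (iii), base quality score recalibration, rests on a simple idea: compare the quality a sequencer reports with the error rate actually observed at sites assumed not to be variants. The sketch below shows that idea for a single covariate (the reported quality) and is a simplified illustration, not the Genome Analysis Toolkit's implementation, which models further covariates such as machine cycle and sequence context. The function names and the add-one smoothing are assumptions made for this example.

```python
import math
from collections import defaultdict


def phred(error_rate: float) -> float:
    """Convert an error probability to a Phred-scaled quality score."""
    return -10.0 * math.log10(error_rate)


def empirical_quality_table(observations):
    """Map each reported base quality to an empirical quality.

    observations: iterable of (reported_quality, is_mismatch) pairs collected
    at positions assumed NOT to be true variants (for example, sites absent
    from a database of known polymorphisms). Hypothetical helper.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1

    table = {}
    for reported_q, (mismatches, total) in counts.items():
        # Add-one (Laplace) smoothing keeps the rate above zero, so a bin
        # with no observed mismatches still receives a finite quality.
        error_rate = (mismatches + 1) / (total + 2)
        table[reported_q] = round(phred(error_rate))
    return table
```

For example, 1,000 bases reported at Q30 of which 9 mismatch the reference behave like Q20 bases, so the table maps 30 to 20.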

2.
Complex SNP-related sequence variation in segmental genome duplications   Total citations: 23 (self-citations: 0; by others: 23)
There is uncertainty about the true nature of predicted single-nucleotide polymorphisms (SNPs) in segmental duplications (duplicons) and whether these markers genuinely exist at increased density as indicated in public databases. We explored these issues by genotyping 157 predicted SNPs in duplicons and control regions in normal diploid genomes and fully homozygous complete hydatidiform moles. Our data identified many true SNPs in duplicon regions and few paralogous sequence variants. Twenty-eight percent of the polymorphic duplicon sequences we tested involved multisite variation, a new type of polymorphism representing the sum of the signals from many individual duplicon copies that vary in sequence content due to duplication, deletion or gene conversion. Multisite variations can masquerade as normal SNPs when genotyped. Given that duplicons comprise at least 5% of the genome and many are yet to be annotated in the genome draft, effective strategies to identify multisite variation must be established and deployed.

3.
The detection of sequence variation, for which DNA sequencing has emerged as the most sensitive and automated approach, forms the basis of all genetic analysis. Here we describe and illustrate an algorithm that accurately detects and genotypes SNPs from fluorescence-based sequence data. Because the algorithm focuses particularly on detecting SNPs through the identification of heterozygous individuals, it is especially well suited to the detection of SNPs in diploid samples obtained after DNA amplification. It is substantially more accurate than existing approaches and, notably, provides a useful quantitative measure of its confidence in each potential SNP detected and in each genotype called. Calls assigned the highest confidence are sufficiently reliable to remove the need for manual review in several contexts. For example, for sequence data from 47-90 individuals sequenced on both the forward and reverse strands, the highest-confidence calls from our algorithm detected 93% of all SNPs and 100% of high-frequency SNPs, with no false positive SNPs identified and 99.9% genotyping accuracy. This algorithm is implemented in a software package, PolyPhred version 5.0, which is freely available for academic use.

4.
Most human sequence variation is in the form of single-nucleotide polymorphisms (SNPs). It has been proposed that coding-region SNPs (cSNPs) be used for direct association studies to determine the genetic basis of complex traits. The success of such studies depends on the frequency of disease-associated alleles, and their distribution in different ethnic populations. If disease-associated alleles are frequent in most populations, then direct genotyping of candidate variants could show robust associations in manageable study samples. This approach is less feasible if the genetic risk from a given candidate gene is due to many infrequent alleles. Previous studies of several genes demonstrated that most variants are relatively infrequent (<0.05). These surveys genotyped small samples (n<75) and thus had limited ability to identify rare alleles. Here we evaluate the prevalence and distribution of such rare alleles by genotyping an ethnically diverse reference sample that is more than six times larger than those used in previous studies (n=450). We screened for variants in the complete coding sequence and intron-exon junctions of two candidate genes for neuropsychiatric phenotypes: SLC6A4, encoding the serotonin transporter; and SLC18A2, encoding the vesicular monoamine transporter. Both genes have unique roles in neuronal transmission, and variants in either gene might be associated with neurobehavioral phenotypes.
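Why sample size limits the discovery of rare alleles can be made concrete: in n diploid individuals, an allele at frequency p is seen at least once with probability 1 - (1 - p)^(2n). The sketch below assumes independently sampled chromosomes (Hardy-Weinberg equilibrium, no population structure), which a real ethnically diverse panel only approximates; the function name is hypothetical.

```python
def detection_probability(allele_freq: float, n_individuals: int) -> float:
    """P(at least one copy of an allele among 2n sampled chromosomes).

    Assumes diploid individuals and independently sampled chromosomes.
    """
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


for n in (75, 450):
    print(n, round(detection_probability(0.01, n), 3))
```

Under these assumptions, an allele at 1% frequency is detected with probability of about 0.78 in 75 individuals but above 0.999 in 450.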

5.
One goal in sequencing the Plasmodium falciparum genome, the agent of the most lethal form of malaria, is to discover vaccine and drug targets. However, identifying those targets in a genome in which approximately 60% of genes have unknown functions is an enormous challenge. Because the majority of known malaria antigens and drug-resistant genes are highly polymorphic and under various selective pressures, genome-wide analysis for signatures of selection may lead to discovery of new vaccine and drug candidates. Here we surveyed 3,539 P. falciparum genes (approximately 65% of the predicted genes) for polymorphisms and identified various highly polymorphic loci and genes, some of which encode new antigens that we confirmed using human immune sera. Our collections of genome-wide SNPs (approximately 65% nonsynonymous) and polymorphic microsatellites and indels provide a high-resolution map (one marker per approximately 4 kb) for mapping parasite traits and studying parasite populations. In addition, we report new antigens, providing urgently needed vaccine candidates for disease control.

6.
7.
Dissecting the genetic basis of disease risk requires measuring all forms of genetic variation, including SNPs and copy number variants (CNVs), and is enabled by accurate maps of their locations, frequencies and population-genetic properties. We designed a hybrid genotyping array (Affymetrix SNP 6.0) to simultaneously measure 906,600 SNPs and copy number at 1.8 million genomic locations. By characterizing 270 HapMap samples, we developed a map of human CNV (at 2-kb breakpoint resolution) informed by integer genotypes for 1,320 copy number polymorphisms (CNPs) that segregate at an allele frequency >1%. More than 80% of the sequence in previously reported CNV regions fell outside our estimated CNV boundaries, indicating that large (>100 kb) CNVs affect much less of the genome than initially reported. Approximately 80% of observed copy number differences between pairs of individuals were due to common CNPs with an allele frequency >5%, and more than 99% derived from inheritance rather than new mutation. Most common, diallelic CNPs were in strong linkage disequilibrium with SNPs, and most low-frequency CNVs segregated on specific SNP haplotypes.

8.
Single-nucleotide polymorphisms (SNPs) have been the focus of much attention in human genetics because they are extremely abundant and well-suited for automated large-scale genotyping. Human SNPs, however, are less informative than other types of genetic markers (such as simple-sequence length polymorphisms or microsatellites) and thus more loci are required for mapping traits. SNPs offer similar advantages for experimental genetic organisms such as the mouse, but they entail no loss of informativeness because bi-allelic markers are fully informative in analysing crosses between inbred strains. Here we report a large-scale analysis of SNPs in the mouse genome. We characterized the rate of nucleotide polymorphism in eight mouse strains and identified a collection of 2,848 SNPs located in 1,755 sequence-tagged sites (STSs) using high-density oligonucleotide arrays. Three-quarters of these SNPs have been mapped on the mouse genome, providing a first-generation SNP map of the mouse. We have also developed a multiplex genotyping procedure by which a genome scan can be performed with only six genotyping reactions per animal.

9.
Genome-wide mapping with biallelic markers in Arabidopsis thaliana.   Total citations: 17 (self-citations: 0; by others: 17)
Single-nucleotide polymorphisms, as well as small insertions and deletions (here referred to collectively as simple nucleotide polymorphisms, or SNPs), comprise the largest set of sequence variants in most organisms. Positional cloning based on SNPs may accelerate the identification of human disease traits and a range of biologically informative mutations. The recent application of high-density oligonucleotide arrays to allele identification has made it feasible to genotype thousands of biallelic SNPs in a single experiment. It has yet to be established, however, whether SNP detection using oligonucleotide arrays can be used to accelerate the mapping of traits in diploid genomes. The cruciferous weed Arabidopsis thaliana is an attractive model system for the construction and use of biallelic SNP maps. Although important biological processes ranging from fertilization and cell fate determination to disease resistance have been modelled in A. thaliana, identifying mutations in this organism has been impeded by the lack of a high-density genetic map consisting of easily genotyped DNA markers. We report here the construction of a biallelic genetic map in A. thaliana with a resolution of 3.5 cM and its use in mapping Eds16, a gene involved in the defence response to the fungal pathogen Erysiphe orontii. Mapping of this trait involved the high-throughput generation of meiotic maps of F2 individuals using high-density oligonucleotide probe array-based genotyping. We developed a software package called InterMap and used it to automatically delimit Eds16 to a 7-cM interval on chromosome 1. These results are the first demonstration of biallelic mapping in diploid genomes and establish means for generalizing SNP-based maps to virtually any genetic organism.
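The interval-delimiting step can be illustrated for a recessive mutation: every mutant F2 plant should be homozygous for the mutant parent's allele at markers tightly linked to the mutation, and the nearest markers on each side where some plant is not homozygous bound the interval. The sketch below is a hypothetical illustration of that logic only; it is not InterMap, and the genotype coding ('A', 'H', 'B') and function name are assumptions.

```python
def delimit_interval(positions, genotypes):
    """Return (left, right) positions flanking the longest run of markers at
    which every mutant F2 individual is homozygous for the mutant-parent allele.

    positions: marker positions (cM), sorted along one chromosome.
    genotypes: one list per mutant F2 individual, one call per marker:
        'A' = homozygous mutant-parent allele, 'H' = heterozygous,
        'B' = homozygous mapping-parent allele.
    Returns None if no marker is fully linked. Hypothetical helper.
    """
    linked = [all(ind[i] == "A" for ind in genotypes) for i in range(len(positions))]

    best = None   # (start_index, end_index) of the longest all-'A' run
    start = None
    for i, ok in enumerate(linked + [False]):  # sentinel closes a trailing run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    if best is None:
        return None

    lo, hi = best
    # The mutation lies between the nearest recombinant markers on each side;
    # at a chromosome end, fall back to the outermost linked marker.
    left = positions[lo - 1] if lo > 0 else positions[lo]
    right = positions[hi + 1] if hi + 1 < len(positions) else positions[hi]
    return left, right
```

With markers at 0, 10, 20, 30 and 40 cM, one plant recombinant at 0 and 40 cM and another at 10 cM, the mutation is placed between 10 and 40 cM.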

10.
Genome-wide patterns of genetic variation among elite maize inbred lines   Total citations: 6 (self-citations: 0; by others: 6)
Lai J  Li R  Xu X  Jin W  Xu M  Zhao H  Xiang Z  Song W  Ying K  Zhang M  Jiao Y  Ni P  Zhang J  Li D  Guo X  Ye K  Jian M  Wang B  Zheng H  Liang H  Zhang X  Wang S  Chen S  Li J  Fu Y  Springer NM  Yang H  Wang J  Dai J  Schnable PS  Wang J  Nature Genetics, 2010, 42(11):1027-1030
We have resequenced a group of six elite maize inbred lines, including the parents of the most productive commercial hybrid in China. This effort uncovered more than 1,000,000 SNPs, 30,000 indel polymorphisms and 101 low-sequence-diversity chromosomal intervals in the maize genome. We also identified several hundred complete genes that show presence/absence variation among these resequenced lines. We discuss the potential roles of complementation of presence/absence variations and other deleterious mutations in contributing to heterosis. High-density SNP and indel polymorphism markers reported here are expected to be a valuable resource for future genetic studies and the molecular breeding of this important crop.

11.
Humans show great variation in phenotypic traits such as height, eye color and susceptibility to disease. Genomic DNA sequence differences among individuals are responsible for the inherited components of these complex traits. Reports suggest that intermediate and large-scale DNA copy number and structural variations are prevalent enough to be an important source of genetic variation between individuals. Because association studies to identify genomic loci associated with particular phenotypic traits have focused primarily on genotyping SNPs, it is important to determine whether common structural polymorphisms are in linkage disequilibrium with common SNPs, and thus can be assessed indirectly in SNP-based studies. Here we examine 100 deletion polymorphisms ranging from 70 bp to 7 kb. We show that common deletions and SNPs ascertained with similar criteria have essentially the same distribution of linkage disequilibrium with surrounding SNPs, indicating that these polymorphisms may share evolutionary history and that most deletion polymorphisms are effectively assayed by proxy in SNP-based association studies.
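Whether a deletion can be "assayed by proxy" depends on its linkage disequilibrium with a nearby SNP, usually summarized as r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), where D is the difference between the observed and expected frequency of the two-allele haplotype. The sketch below computes r² from phased haplotypes with both loci coded 0/1; the coding and function name are assumptions for illustration, and the abstract does not specify how the authors estimated LD.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: list of (allele_at_locus1, allele_at_locus2) pairs coded 0/1,
    e.g. locus 1 = deletion present (1) or absent (0), locus 2 = SNP allele.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom
```

An r² near 1 means the SNP genotype almost perfectly predicts the deletion genotype, which is what makes SNP-based association studies capture the deletion indirectly.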

12.
A general approach to single-nucleotide polymorphism discovery   Total citations: 29 (self-citations: 0; by others: 29)
Single-nucleotide polymorphisms (SNPs) are the most abundant form of human genetic variation and a resource for mapping complex genetic traits. The large volume of data produced by high-throughput sequencing projects is a rich and largely untapped source of SNPs (refs 2, 3, 4, 5). We present here a unified approach to the discovery of variations in genetic sequence data of arbitrary DNA sources. We propose to use the rapidly emerging genomic sequence as a template on which to layer often unmapped, fragmentary sequence data and to use base quality values to discern true allelic variations from sequencing errors. By taking advantage of the genomic sequence we are able to use simpler yet more accurate methods for sequence organization: fragment clustering, paralogue identification and multiple alignment. We analyse these sequences with a novel Bayesian inference engine, POLYBAYES, to calculate the probability that a given site is polymorphic. Rigorous treatment of base quality permits completely automated evaluation of the full length of all sequences, without limitations on alignment depth. We demonstrate this approach by accurate SNP predictions in human ESTs aligned to finished and working-draft quality genomic sequences, a data set representative of the typical challenges of sequence-based SNP discovery.
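How base quality values can separate real alleles from sequencing errors is easiest to see in a toy Bayesian model. The sketch below compares three hypotheses for one alignment column (all sequences carry allele 1, all carry allele 2, or each carries either allele with equal probability) and returns the posterior probability of the polymorphic hypothesis. It is a simplified illustration, not the POLYBAYES model itself; the independence of reads, the 50/50 allele split, the prior of 0.001 and the function names are all assumptions.

```python
import math


def base_likelihood(base: str, quality: int, true_base: str) -> float:
    """P(observed base | true base) given a Phred quality score."""
    error = 10 ** (-quality / 10)
    return 1 - error if base == true_base else error / 3


def prob_polymorphic(column, allele1, allele2, prior_snp=0.001):
    """Posterior probability that an alignment column is polymorphic.

    column: list of (base, phred_quality), one per aligned sequence.
    Simplified illustrative model (hypothetical): reads are independent, and
    under the polymorphic hypothesis each read carries either allele with
    probability 0.5.
    """
    like_mono1 = math.prod(base_likelihood(b, q, allele1) for b, q in column)
    like_mono2 = math.prod(base_likelihood(b, q, allele2) for b, q in column)
    like_poly = math.prod(
        0.5 * base_likelihood(b, q, allele1) + 0.5 * base_likelihood(b, q, allele2)
        for b, q in column
    )
    # The monomorphic prior mass is split evenly between the two alleles.
    numerator = prior_snp * like_poly
    evidence = numerator + (1 - prior_snp) * 0.5 * (like_mono1 + like_mono2)
    return numerator / evidence
```

Five high-quality reads of each allele give a posterior close to 1, whereas a single low-quality mismatch among nine high-quality reads is attributed to sequencing error and gives a posterior well below 0.01.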

13.
In developed countries, age-related macular degeneration is a common cause of blindness in the elderly. A common polymorphism, encoding the sequence variation Y402H in complement factor H (CFH), has been strongly associated with disease susceptibility. Here, we examined 84 polymorphisms in and around CFH in 726 affected individuals (including 544 unrelated individuals) and 268 unrelated controls. In this sample, 20 of these polymorphisms showed stronger association with disease susceptibility than the Y402H variant. Further, no single polymorphism could account for the contribution of the CFH locus to disease susceptibility. Instead, multiple polymorphisms defined a set of four common haplotypes (of which two were associated with disease susceptibility and two seemed to be protective) and multiple rare haplotypes (associated with increased susceptibility in aggregate). Our results suggest that there are multiple disease susceptibility alleles in the region and that noncoding CFH variants play a role in disease susceptibility.

14.
Genome-wide genetic changes during modern breeding of maize   Total citations: 3 (self-citations: 0; by others: 3)
Jiao Y  Zhao H  Ren L  Song W  Zeng B  Guo J  Wang B  Liu Z  Chen J  Li W  Zhang M  Xie S  Lai J  Nature Genetics, 2012, 44(7):812-815
The success of modern maize breeding has been demonstrated by remarkable increases in productivity over the last four decades. However, the underlying genetic changes correlated with these gains remain largely unknown. We report here the sequencing of 278 temperate maize inbred lines from different stages of breeding history, including deep resequencing of 4 lines with known pedigree information. The results show that modern breeding has introduced highly dynamic genetic changes into the maize genome. Artificial selection has affected thousands of targets, including genes and non-genic regions, leading to a reduction in nucleotide diversity and an increase in the proportion of rare alleles. Genetic changes during breeding happen rapidly, with extensive variation (SNPs, indels and copy-number variants (CNVs)) occurring, even within identity-by-descent regions. Our genome-wide assessment of genetic changes during modern maize breeding provides new strategies as well as practical targets for future crop breeding and biotechnology.
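The "reduction in nucleotide diversity" is usually measured as π, the average number of pairwise differences per site among sampled sequences. The sketch below computes π for gap-free aligned sequences; the abstract does not state which estimator the authors used (large resequencing studies typically estimate it from genotype calls with missing data), so treat this as an illustration of the statistic only.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average number of pairwise differences per site (pi).

    sequences: equal-length aligned sequences without gaps or missing data.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)
```

Comparing π computed in windows for early versus modern lines is one way a selection-driven loss of diversity would show up.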

15.
Association studies offer a potentially powerful approach to identify genetic variants that influence susceptibility to common disease, but are plagued by the impression that they are not consistently reproducible. In principle, the inconsistency may be due to false positive studies, false negative studies or true variability in association among different populations. The critical question is whether false positives overwhelmingly explain the inconsistency. We analyzed 301 published studies covering 25 different reported associations. There was a large excess of studies replicating the first positive reports, inconsistent with the hypothesis of no true positive associations (P < 10⁻¹⁴). This excess of replications could not be reasonably explained by publication bias and was concentrated among 11 of the 25 associations. For 8 of these 11 associations, pooled analysis of follow-up studies yielded statistically significant replication of the first report, with modest estimated genetic effects. Thus, a sizable fraction (but under half) of reported associations have strong evidence of replication; for these, false negative, underpowered studies probably contribute to inconsistent replication. We conclude that there are probably many common variants in the human genome with modest but real effects on common disease risk, and that studies using large samples will convincingly identify such variants.
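A pooled analysis of follow-up studies can be sketched with inverse-variance fixed-effect meta-analysis, a standard way to combine per-study log odds ratios. The abstract does not say which pooling method the authors used, so this is an assumed technique shown for illustration, with a hypothetical function name.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect pooling of per-study effect sizes.

    estimates: list of (log_odds_ratio, standard_error), one per study.
    Returns (pooled_log_or, pooled_se, two_sided_p).
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total_w = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1 / total_w)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return pooled, pooled_se, p
```

Pooling shrinks the standard error, which is why several individually underpowered studies with modest, consistent effects can jointly give a significant replication.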

16.
Genome-wide association studies (GWAS) have proven to be a powerful method to identify common genetic variants contributing to susceptibility to common diseases. Here, we show that extremely low-coverage sequencing (0.1-0.5×) captures almost as much of the common (>5%) and low-frequency (1-5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r² of 0.71 using off-target data (0.24× average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome-sequencing data sets, we show that association statistics obtained using extremely low-coverage sequencing data attain similar P values at known associated variants as data from genotyping arrays, without an excess of false positives. Within the context of reductions in sample preparation and sequencing costs, funds invested in extremely low-coverage sequencing can yield several times the effective sample size of GWAS based on SNP array data and a commensurate increase in statistical power.
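Two quantities help interpret these numbers. Under the idealized assumption that reads land as a Poisson process, a mean depth c leaves a fraction 1 - e^(-c) of sites with at least one read, so at 0.24× most individual sites are unobserved and genotypes must be inferred from neighbouring sites. Accuracy is then commonly reported as the squared correlation between true genotypes and inferred allele dosages; we assume that is what the r² above refers to. Both helpers below are hypothetical.

```python
import math


def fraction_covered(mean_depth: float) -> float:
    """Expected fraction of sites with >= 1 read, assuming Poisson coverage."""
    return 1.0 - math.exp(-mean_depth)


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0/1/2) and
    inferred allele dosages (expected allele counts between 0 and 2)."""
    n = len(true_genotypes)
    mx = sum(true_genotypes) / n
    my = sum(imputed_dosages) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    vx = sum((x - mx) ** 2 for x in true_genotypes)
    vy = sum((y - my) ** 2 for y in imputed_dosages)
    return cov * cov / (vx * vy)
```

At 0.24× mean depth, only about 21% of sites are expected to be covered by any read.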

17.
Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variations in a high-coverage human genome. Second, we identify more than 3 Mb of sequence absent from the human reference genome, in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from ten chimpanzees enables accurate variant calls without a reference sequence. Last, we estimate classical human leukocyte antigen (HLA) genotypes at HLA-B, the most variable gene in the human genome.
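The core data structure can be sketched briefly: a colored de Bruijn graph stores each k-mer once, tags it with the set of samples ("colors") in which it occurs, and links k-mers that overlap by k - 1 bases. A variant between samples then appears as a branch where paths diverge, with k-mers carried by only one color. The toy version below ignores reverse complements, sequencing-error cleaning and full bubble calling, and its function names are hypothetical; it is not Cortex's implementation.

```python
from collections import defaultdict


def build_colored_dbg(samples, k):
    """samples: dict color -> list of reads (single strand only).

    Returns (nodes, edges): nodes maps k-mer -> set of colors,
    edges maps k-mer -> set of successor k-mers.
    """
    nodes = defaultdict(set)
    edges = defaultdict(set)
    for color, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                nodes[kmer].add(color)
                if i + k < len(read):  # a following k-mer exists in this read
                    edges[kmer].add(read[i + 1:i + k + 1])
    return nodes, edges


def color_specific(nodes, color):
    """K-mers seen only in the given sample."""
    return {kmer for kmer, cols in nodes.items() if cols == {color}}


def branching_nodes(edges):
    """K-mers with more than one successor, i.e. where paths diverge."""
    return {kmer for kmer, succ in edges.items() if len(succ) > 1}
```

For two reads differing by one base (AAGCTTAC versus AAGCATAC, k = 3), the paths diverge after AGC and the second sample carries three k-mers (GCA, CAT, ATA) absent from the first.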

18.
We used resequencing and genotyping in African Americans with sickle cell anemia (SCA) to characterize associations with fetal hemoglobin (HbF) levels at the BCL11A, HBS1L-MYB and β-globin loci. Fine-mapping of HbF association signals at these loci confirmed seven SNPs with independent effects and increased the explained heritable variation in HbF levels from 38.6% to 49.5%. We also identified rare missense variants that causally implicate MYB in HbF production.

19.
Ankylosing spondylitis is a common form of inflammatory arthritis predominantly affecting the spine and pelvis that occurs in approximately 5 out of 1,000 adults of European descent. Here we report the identification of three variants in the RUNX3, LTBR-TNFRSF1A and IL12B regions convincingly associated with ankylosing spondylitis (P < 5 × 10⁻⁸ in the combined discovery and replication datasets) and a further four loci at PTGER4, TBKBP1, ANTXR2 and CARD9 that show strong association across all our datasets (P < 5 × 10⁻⁶ overall, with support in each of the three datasets studied). We also show that polymorphisms of ERAP1, which encodes an endoplasmic reticulum aminopeptidase involved in peptide trimming before HLA class I presentation, only affect ankylosing spondylitis risk in HLA-B27-positive individuals. These findings provide strong evidence that HLA-B27 operates in ankylosing spondylitis through a mechanism involving aberrant processing of antigenic peptides.

20.
Genetic mapping with SNP markers in Drosophila.   Total citations: 10 (self-citations: 0; by others: 10)
Map-based positional cloning of Drosophila melanogaster genes is hampered by both the time-consuming, error-prone nature of traditional methods for genetic mapping and the difficulties in aligning the genetic and cytological maps with the genome sequence. The identification of sequence polymorphisms in the Drosophila genome will make it possible to map mutations directly to the genome sequence with high accuracy and resolution. Here we report the identification of 7,223 single-nucleotide polymorphisms (SNPs) and 1,392 insertions/deletions (InDels) in common laboratory strains of Drosophila. These sequence polymorphisms define a map of 787 autosomal marker loci with a resolution of 114 kb. We have established PCR product-length polymorphism (PLP) or restriction fragment-length polymorphism (RFLP) assays for 215 of these markers. We demonstrate the use of this map by delimiting two mutations to intervals of 169 kb and 307 kb, respectively. Using a local high-density SNP map, we also mapped a third mutation to a resolution of approximately 2 kb, sufficient to localize the mutation within a single gene. These methods should accelerate the rate of positional cloning in Drosophila.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP filing No. 09084417