首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Whole-genome association studies are predicted to be especially powerful in isolated populations owing to increased linkage disequilibrium (LD) and decreased allelic diversity, but this possibility has not been empirically tested. We compared genome-wide data on 113,240 SNPs typed on 30 trios from the Pacific island of Kosrae to the same markers typed in the 270 samples from the International HapMap Project. The extent of LD is longer and haplotype diversity is lower in Kosrae than in the HapMap populations. More than 98% of Kosraen haplotypes are present in HapMap populations, indicating that HapMap will be useful for genetic studies on Kosrae. The long-range LD around common alleles and limited diversity result in improved efficiency in genetic studies in this population and augments the power to detect association of 'hidden SNPs'.  相似文献   

2.
More than 5 million single-nucleotide polymorphisms (SNPs) with minor-allele frequency greater than 10% are expected to exist in the human genome. Some of these SNPs may be associated with risk of developing common diseases. To assess the power of currently available SNPs to detect such associations, we resequenced 50 genes in two ethnic samples and measured patterns of linkage disequilibrium between the subset of SNPs reported in dbSNP and the complete set of common SNPs. Our results suggest that using all 2.7 million SNPs currently in the database would detect nearly 80% of all common SNPs in European populations but only 50% of those common in the African American population and that efficient selection of a minimal subset of SNPs for use in association studies requires measurement of allele frequency and linkage disequilibrium relationships for all SNPs in dbSNP.  相似文献   

3.
Genome-wide association studies involving hundreds of thousands of SNPs in thousands of cases and controls are now underway. The first of many analytical challenges in these studies involves the choice of SNPs to genotype. It is not practical to construct a different panel of tag SNPs for each study, so the first generation of genome-wide scans will use predefined, commercially available marker panels, which will in part dictate their success or failure. We compare different approaches in use today, and show that although many of them provide substantial coverage of common variation in non-African populations, the precise extent is strongly dependent on the frequencies of alleles of interest and on specific considerations of study design. Overall, despite substantial differences in genotyping technologies, marker selection strategies and number of markers assayed, the first-generation high-throughput platforms all offer similar levels of genome coverage.  相似文献   

4.
Efficiency and power in genetic association studies   总被引:30,自引:0,他引:30  
We investigated selection and analysis of tag SNPs for genome-wide association studies by specifically examining the relationship between investment in genotyping and statistical power. Do pairwise or multimarker methods maximize efficiency and power? To what extent is power compromised when tags are selected from an incomplete resource such as HapMap? We addressed these questions using genotype data from the HapMap ENCODE project, association studies simulated under a realistic disease model, and empirical correction for multiple hypothesis testing. We demonstrate a haplotype-based tagging method that uniformly outperforms single-marker tests and methods for prioritization that markedly increase tagging efficiency. Examining all observed haplotypes for association, rather than just those that are proxies for known SNPs, increases power to detect rare causal alleles, at the cost of reduced power to detect common causal alleles. Power is robust to the completeness of the reference panel from which tags are selected. These findings have implications for prioritizing tag SNPs and interpreting association studies.  相似文献   

5.
Although studies suggest that SNPs derived from HapMap provide promising coverage and power for association studies, the lack of alternative variation datasets limits independent analysis. Using near-complete variation data for 76 genes resequenced in HapMap samples, we find that coverage of common variation by commercial genotyping arrays is substantially lower compared to the HapMap-based estimates. We quantify the power offered by these arrays for a range of disease models.  相似文献   

6.
Genome-wide association studies (GWAS) have proven to be a powerful method to identify common genetic variants contributing to susceptibility to common diseases. Here, we show that extremely low-coverage sequencing (0.1-0.5×) captures almost as much of the common (>5%) and low-frequency (1-5%) variation across the genome as SNP arrays. As an empirical demonstration, we show that genome-wide SNP genotypes can be inferred at a mean r(2) of 0.71 using off-target data (0.24× average coverage) in a whole-exome study of 909 samples. Using both simulated and real exome-sequencing data sets, we show that association statistics obtained using extremely low-coverage sequencing data attain similar P values at known associated variants as data from genotyping arrays, without an excess of false positives. Within the context of reductions in sample preparation and sequencing costs, funds invested in extremely low-coverage sequencing can yield several times the effective sample size of GWAS based on SNP array data and a commensurate increase in statistical power.  相似文献   

7.
Liu P  Wang Y  Vikis H  Maciag A  Wang D  Lu Y  Liu Y  You M 《Nature genetics》2006,38(8):888-895
We performed a whole-genome association analysis of lung tumor susceptibility using dense SNP maps ( approximately 1 SNP per 20 kb) in inbred mice. We reproduced the pulmonary adenoma susceptibility 1 (Pas1) locus identified in previous linkage studies and further narrowed this quantitative trait locus (QTL) to a region of less than 0.5 Mb in which at least two genes, Kras2 (Kirsten rat sarcoma oncogene 2) and Casc1 (cancer susceptibility candidate 1; also known as Las1), are strong candidates. Casc1 knockout mouse tumor bioassays showed that Casc1-deficient mice were susceptible to chemical induction of lung tumors. We also found three more genetic loci for lung adenoma development. Analysis of one of these candidate loci identified a previously uncharacterized gene Lasc1, bearing a nonsynonymous substitution (D102E). We found that the Lasc1 Glu102 allele preferentially promotes lung tumor cell growth. Our findings demonstrate the prospects for using dense SNP maps in laboratory mice to refine previous QTL regions and identify genetic determinants of complex traits.  相似文献   

8.
9.
The fine-scale distribution of meiotic recombination events in the human genome can be inferred from patterns of haplotype diversity in human populations but directly studied only by high-resolution sperm typing. Both approaches indicate that crossovers are heavily clustered into narrow recombination hot spots. But our direct understanding of hot-spot properties and distributions is largely limited to sperm typing in the major histocompatibility complex (MHC). We now describe the analysis of an unremarkable 206-kb region on human chromosome 1, which identified localized regions of linkage disequilibrium breakdown that mark the locations of sperm crossover hot spots. The distribution, intensity and morphology of these hot spots are markedly similar to those in the MHC. But we also accidentally detected additional hot spots in regions of strong association. Coalescent analysis of genotype data detected most of the hot spots but showed significant differences between sperm crossover frequencies and historical recombination rates. This raises the possibility that some hot spots, particularly those in regions of strong association, may have evolved very recently and not left their full imprint on haplotype diversity. These results suggest that hot spots could be very abundant and possibly fluid features of the human genome.  相似文献   

10.
11.
The 1000 Genomes Project and disease-specific sequencing efforts are producing large collections of haplotypes that can be used as reference panels for genotype imputation in genome-wide association studies (GWAS). However, imputing from large reference panels with existing methods imposes a high computational burden. We introduce a strategy called 'pre-phasing' that maintains the accuracy of leading methods while reducing computational costs. We first statistically estimate the haplotypes for each individual within the GWAS sample (pre-phasing) and then impute missing genotypes into these estimated haplotypes. This reduces the computational cost because (i) the GWAS samples must be phased only once, whereas standard methods would implicitly repeat phasing with each reference panel update, and (ii) it is much faster to match a phased GWAS haplotype to one reference haplotype than to match two unphased GWAS genotypes to a pair of reference haplotypes. We implemented our approach in the MaCH and IMPUTE2 frameworks, and we tested it on data sets from the Wellcome Trust Case Control Consortium 2 (WTCCC2), the Genetic Association Information Network (GAIN), the Women's Health Initiative (WHI) and the 1000 Genomes Project. This strategy will be particularly valuable for repeated imputation as reference panels evolve.  相似文献   

12.
Genome-wide association studies of 14 agronomic traits in rice landraces   总被引:20,自引:0,他引:20  
Huang X  Wei X  Sang T  Zhao Q  Feng Q  Zhao Y  Li C  Zhu C  Lu T  Zhang Z  Li M  Fan D  Guo Y  Wang A  Wang L  Deng L  Li W  Lu Y  Weng Q  Liu K  Huang T  Zhou T  Jing Y  Li W  Lin Z  Buckler ES  Qian Q  Zhang QF  Li J  Han B 《Nature genetics》2010,42(11):961-967
Uncovering the genetic basis of agronomic traits in crop landraces that have adapted to various agro-climatic conditions is important to world food security. Here we have identified ~ 3.6 million SNPs by sequencing 517 rice landraces and constructed a high-density haplotype map of the rice genome using a novel data-imputation method. We performed genome-wide association studies (GWAS) for 14 agronomic traits in the population of Oryza sativa indica subspecies. The loci identified through GWAS explained ~ 36% of the phenotypic variance, on average. The peak signals at six loci were tied closely to previously identified genes. This study provides a fundamental resource for rice genetics research and breeding, and demonstrates that an approach integrating second-generation genome sequencing and GWAS can be used as a powerful complementary strategy to classical biparental cross-mapping for dissecting complex traits in rice.  相似文献   

13.
Population stratification--allele frequency differences between cases and controls due to systematic ancestry differences-can cause spurious associations in disease studies. We describe a method that enables explicit detection and correction of population stratification on a genome-wide scale. Our method uses principal components analysis to explicitly model ancestry differences between cases and controls. The resulting correction is specific to a candidate marker's variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. Our simple, efficient approach can easily be applied to disease studies with hundreds of thousands of markers.  相似文献   

14.
A general question for linkage disequilibrium-based association studies is how power to detect an association is compromised when tag SNPs are chosen from data in one population sample and then deployed in another sample. Specifically, it is important to know how well tags picked from the HapMap DNA samples capture the variation in other samples. To address this, we collected dense data uniformly across the four HapMap population samples and eleven other population samples. We picked tag SNPs using genotype data we collected in the HapMap samples and then evaluated the effective coverage of these tags in comparison to the entire set of common variants observed in the other samples. We simulated case-control association studies in the non-HapMap samples under a disease model of modest risk, and we observed little loss in power. These results demonstrate that the HapMap DNA samples can be used to select tags for genome-wide association studies in many samples around the world.  相似文献   

15.
16.
It is increasingly apparent that the identification of true genetic associations in common multifactorial disease will require studies comprising thousands rather than the hundreds of individuals employed to date. Using 2,873 families, we were unable to confirm a recently published association of the interleukin 12B gene in 422 type I diabetic families. These results emphasize the need for large datasets, small P values and independent replication if results are to be reliable.  相似文献   

17.
The genome-wide distribution of linkage disequilibrium (LD) determines the strategy for selecting markers for association studies, but it varies between populations. We assayed LD in large samples (200 individuals) from each of 11 well-described population isolates and an outbred European-derived sample, using SNP markers spaced across chromosome 22. Most isolates show substantially higher levels of LD than the outbred sample and many fewer regions of very low LD (termed 'holes'). Young isolates known to have had relatively few founders show particularly extensive LD with very few holes; these populations offer substantial advantages for genome-wide association mapping.  相似文献   

18.
Population structure causes genome-wide linkage disequilibrium between unlinked loci, leading to statistical confounding in genome-wide association studies. Mixed models have been shown to handle the confounding effects of a diffuse background of large numbers of loci of small effect well, but they do not always account for loci of larger effect. Here we propose a multi-locus mixed model as a general method for mapping complex traits in structured populations. Simulations suggest that our method outperforms existing methods in terms of power as well as false discovery rate. We apply our method to human and Arabidopsis thaliana data, identifying new associations and evidence for allelic heterogeneity. We also show how a priori knowledge from an A. thaliana linkage mapping study can be integrated into our method using a Bayesian approach. Our implementation is computationally efficient, making the analysis of large data sets (n > 10,000) practicable.  相似文献   

19.
The Human Genome Project and its spin-offs are making it increasingly feasible to determine the genetic basis of complex traits using genome-wide association studies. The statistical challenge of analyzing such studies stems from the severe multiple-comparison problem resulting from the analysis of thousands of SNPs. Our methodology for genome-wide family-based association studies, using single SNPs or haplotypes, can identify associations that achieve genome-wide significance. In relation to developing guidelines for our screening tools, we determined lower bounds for the estimated power to detect the gene underlying the disease-susceptibility locus, which hold regardless of the linkage disequilibrium structure present in the data. We also assessed the power of our approach in the presence of multiple disease-susceptibility loci. Our screening tools accommodate genomic control and use the concept of haplotype-tagging SNPs. Our methods use the entire sample and do not require separate screening and validation samples to establish genome-wide significance, as population-based designs do.  相似文献   

20.
In an effort to pinpoint potential genetic risk factors for schizophrenia, research groups worldwide have published over 1,000 genetic association studies with largely inconsistent results. To facilitate the interpretation of these findings, we have created a regularly updated online database of all published genetic association studies for schizophrenia ('SzGene'). For all polymorphisms having genotype data available in at least four independent case-control samples, we systematically carried out random-effects meta-analyses using allelic contrasts. Across 118 meta-analyses, a total of 24 genetic variants in 16 different genes (APOE, COMT, DAO, DRD1, DRD2, DRD4, DTNBP1, GABRB2, GRIN2B, HP, IL1B, MTHFR, PLXNA2, SLC6A4, TP53 and TPH1) showed nominally significant effects with average summary odds ratios of approximately 1.23. Seven of these variants had not been previously meta-analyzed. According to recently proposed criteria for the assessment of cumulative evidence in genetic association studies, four of the significant results can be characterized as showing 'strong' epidemiological credibility. Our project represents the first comprehensive online resource for systematically synthesized and graded evidence of genetic association studies in schizophrenia. As such, it could serve as a model for field synopses of genetic associations in other common and genetically complex disorders.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号