Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Accurate and complete measurement of single nucleotide (SNP) and copy number (CNV) variants, both common and rare, will be required to understand the role of genetic variation in disease. We present Birdsuite, a four-stage analytical framework instantiated in software for deriving integrated and mutually consistent copy number and SNP genotypes. The method sequentially assigns copy number across regions of common copy number polymorphisms (CNPs), calls genotypes of SNPs, identifies rare CNVs via a hidden Markov model (HMM), and generates an integrated sequence and copy number genotype at every locus (for example, including genotypes such as A-null, AAB and BBB in addition to AA, AB and BB calls). Such genotypes more accurately depict the underlying sequence of each individual, reducing the rate of apparent Mendelian inconsistencies. The Birdsuite software is applied here to data from the Affymetrix SNP 6.0 array. Additionally, we describe a method, implemented in PLINK, to utilize these combined SNP and CNV genotypes for association testing with a phenotype.

2.
SNP genotyping has emerged as a technology to incorporate copy number variants (CNVs) into genetic analyses of human traits. However, the extent to which SNP platforms accurately capture CNVs remains unclear. Using independent, sequence-based CNV maps, we find that commonly used SNP platforms have limited or no probe coverage for a large fraction of CNVs. Despite this, in 9 samples we inferred 368 CNVs using Illumina SNP genotyping data and experimentally validated over two-thirds of these. We also developed a method (SNP-Conditional Mixture Modeling, SCIMM) to robustly genotype deletions using as few as two SNP probes. We find that HapMap SNPs are strongly correlated with 82% of common deletions, but the newest SNP platforms effectively tag about 50%. We conclude that currently available genome-wide SNP assays can capture CNVs accurately, but improvements in array designs, particularly in duplicated sequences, are necessary to facilitate more comprehensive analyses of genomic variation.

3.
Single-nucleotide polymorphisms in the public domain: how useful are they?   Cited by: 15 (self-citations: 0; citations by others: 15)
There is a concerted effort by a number of public and private groups to identify a large set of human single-nucleotide polymorphisms (SNPs). As of March 2001, 2.84 million SNPs have been deposited in the public database, dbSNP, at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/SNP/). The 2.84 million SNPs can be grouped into 1.65 million non-redundant SNPs. As part of the International SNP Map Working Group, we recently published a high-density SNP map of the human genome consisting of 1.42 million SNPs (ref. 3). In addition, numerous SNPs are maintained in proprietary databases. Our survey of more than 1,200 SNPs indicates that more than 80% of TSC and Washington University candidate SNPs are polymorphic and that approximately 50% of the candidate SNPs from these two sources are common SNPs (with minor allele frequency of ≥20%) in any given population.

4.
Egan CM, Sridhar S, Wigler M, Hall IM. Nature Genetics 2007, 39(11): 1384-1389
Different species, populations and individuals vary considerably in the copy number of discrete segments of their genomes. The manner and frequency with which these genetic differences arise over generational time is not well understood. Taking advantage of divergence among lineages sharing a recent common ancestry, we have conducted a genome-wide analysis of spontaneous copy number variation (CNV) in the laboratory mouse. We used high-resolution microarrays to identify 38 CNVs among 14 colonies of the C57BL/6 strain spanning approximately 967 generations of inbreeding, and we examined these loci in 12 additional strains. It is clear from our results that many CNVs arise through a highly nonrandom process: 18 of 38 were the product of recurrent mutation, and rates of change varied roughly four orders of magnitude across different loci. Recurrent CNVs are found throughout the genome, affect 43 genes and fluctuate in copy number over mere hundreds of generations, observations that raise questions about their contribution to natural variation.

5.
To understand the genetic heterogeneity underlying developmental delay, we compared copy number variants (CNVs) in 15,767 children with intellectual disability and various congenital defects (cases) to CNVs in 8,329 unaffected adult controls. We estimate that ~14.2% of disease in these children is caused by CNVs >400 kb. We observed a greater enrichment of CNVs in individuals with craniofacial anomalies and cardiovascular defects compared to those with epilepsy or autism. We identified 59 pathogenic CNVs, including 14 new or previously weakly supported candidates, refined the critical interval for several genomic disorders, such as the 17q21.31 microdeletion syndrome, and identified 940 candidate dosage-sensitive genes. We also developed methods to opportunistically discover small, disruptive CNVs within the large and growing diagnostic array datasets. This evolving CNV morbidity map, combined with exome and genome sequencing, will be critical for deciphering the genetic basis of developmental delay, intellectual disability and autism spectrum disorders.

6.
More than 5 million single-nucleotide polymorphisms (SNPs) with minor-allele frequency greater than 10% are expected to exist in the human genome. Some of these SNPs may be associated with risk of developing common diseases. To assess the power of currently available SNPs to detect such associations, we resequenced 50 genes in two ethnic samples and measured patterns of linkage disequilibrium between the subset of SNPs reported in dbSNP and the complete set of common SNPs. Our results suggest that using all 2.7 million SNPs currently in the database would detect nearly 80% of all common SNPs in European populations but only 50% of those common in the African American population and that efficient selection of a minimal subset of SNPs for use in association studies requires measurement of allele frequency and linkage disequilibrium relationships for all SNPs in dbSNP.

7.
Attention deficit hyperactivity disorder (ADHD) is a common, heritable neuropsychiatric disorder of unknown etiology. We performed a whole-genome copy number variation (CNV) study on 1,013 cases with ADHD and 4,105 healthy children of European ancestry using 550,000 SNPs. We evaluated statistically significant findings in multiple independent cohorts, with a total of 2,493 cases with ADHD and 9,222 controls of European ancestry, using matched platforms. CNVs affecting metabotropic glutamate receptor genes were enriched across all cohorts (P = 2.1 × 10⁻⁹). We saw GRM5 (encoding glutamate receptor, metabotropic 5) deletions in ten cases and one control (P = 1.36 × 10⁻⁶). We saw GRM7 deletions in six cases, and we saw GRM8 deletions in eight cases and no controls. GRM1 was duplicated in eight cases. We experimentally validated the observed variants using quantitative RT-PCR. A gene network analysis showed that genes interacting with the genes in the GRM family are enriched for CNVs in ~10% of the cases (P = 4.38 × 10⁻¹⁰) after correction for occurrence in the controls. We identified rare recurrent CNVs affecting glutamatergic neurotransmission genes that were overrepresented in multiple ADHD cohorts.

8.
The abundance and dynamics of copy number variants (CNVs) in mammalian genomes pose new challenges in the identification of their impact on natural and disease phenotypes. We used computational and experimental methods to catalog CNVs in rat and found that they share important functional characteristics with those in human. In addition, 113 one-to-one orthologous genes overlap CNVs in both human and rat, 80 of which are implicated in human disease. CNVs are nonrandomly distributed throughout the genome. Chromosome 18 is a cold spot for CNVs as well as evolutionary rearrangements and segmental duplications, suggesting stringent selective mechanisms underlying CNV genesis or maintenance. By exploiting gene expression data available for rat recombinant inbred lines, we established the functional relationship of CNVs underlying 22 expression quantitative trait loci. These characteristics make the rat an excellent model for studying phenotypic effects of structural variation in relation to human complex traits and disease.

9.
Targeted capture combined with massively parallel exome sequencing is a promising approach to identify genetic variants implicated in human traits. We report exome sequencing of 200 individuals from Denmark with targeted capture of 18,654 coding genes and sequence coverage of each individual exome at an average depth of 12-fold. On average, about 95% of the target regions were covered by at least one read. We identified 121,870 SNPs in the sample population, including 53,081 coding SNPs (cSNPs). Using a statistical method for SNP calling and an estimation of allelic frequencies based on our population data, we derived the allele frequency spectrum of cSNPs with a minor allele frequency greater than 0.02. We identified a 1.8-fold excess of deleterious, non-synonymous cSNPs over synonymous cSNPs in the low-frequency range (minor allele frequencies between 2% and 5%). This excess was more pronounced for X-linked SNPs, suggesting that deleterious substitutions are primarily recessive.

10.
Numerous types of DNA variation exist, ranging from SNPs to larger structural alterations such as copy number variants (CNVs) and inversions. Alignment of DNA sequence from different sources has been used to identify SNPs and intermediate-sized variants (ISVs). However, only a small proportion of total heterogeneity is characterized, and little is known of the characteristics of most smaller-sized (<50 kb) variants. Here we show that genome assembly comparison is a robust approach for identification of all classes of genetic variation. Through comparison of two human assemblies (Celera's R27c compilation and the Build 35 reference sequence), we identified megabases of sequence (in the form of 13,534 putative non-SNP events) that were absent, inverted or polymorphic in one assembly. Database comparison and laboratory experimentation further demonstrated overlap or validation for 240 variable regions and confirmed >1.5 million SNPs. Some differences were simple insertions and deletions, but in regions containing CNVs, segmental duplication and repetitive DNA, they were more complex. Our results uncover substantial undescribed variation in humans, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.

11.
Characterizing genetic diversity within and between populations has broad applications in studies of human disease and evolution. We propose a new approach, spatial ancestry analysis, for the modeling of genotypes in two- or three-dimensional space. In spatial ancestry analysis (SPA), we explicitly model the spatial distribution of each SNP by assigning an allele frequency as a continuous function in geographic space. We show that the explicit modeling of the allele frequency allows individuals to be localized on the map on the basis of their genetic information alone. We apply our SPA method to a European and a worldwide population genetic variation data set and identify SNPs showing large gradients in allele frequency, and we suggest these as candidate regions under selection. These regions include SNPs in the well-characterized LCT region, as well as at loci including FOXP2, OCA2 and LRP1B.

12.
There is increasing evidence showing that the stromal cells surrounding cancer epithelial cells, rather than being passive bystanders, might have a role in modifying tumor outgrowth. The molecular basis of this aspect of carcinoma etiology is controversial. Some studies have reported a high frequency of genetic aberrations in carcinoma-associated fibroblasts (CAFs), whereas other studies have reported very low or zero mutation rates. Resolution of this contentious area is of critical importance in terms of understanding both the basic biology of cancer as well as the potential clinical implications of CAF somatic alterations. We undertook genome-wide copy number and loss of heterozygosity (LOH) analysis of CAFs derived from breast and ovarian carcinomas using a 500K SNP array platform. Our data show conclusively that LOH and copy number alterations are extremely rare in CAFs and cannot be the basis of the carcinoma-promoting phenotypes of breast and ovarian CAFs.

13.
A major goal in human genetics is to understand the role of common genetic variants in susceptibility to common diseases. This will require characterizing the nature of gene variation in human populations, assembling an extensive catalogue of single-nucleotide polymorphisms (SNPs) in candidate genes and performing association studies for particular diseases. At present, our knowledge of human gene variation remains rudimentary. Here we describe a systematic survey of SNPs in the coding regions of human genes. We identified SNPs in 106 genes relevant to cardiovascular disease, endocrinology and neuropsychiatry by screening an average of 114 independent alleles using 2 independent screening methods. To ensure high accuracy, all reported SNPs were confirmed by DNA sequencing. We identified 560 SNPs, including 392 coding-region SNPs (cSNPs) divided roughly equally between those causing synonymous and non-synonymous changes. We observed different rates of polymorphism among classes of sites within genes (non-coding, degenerate and non-degenerate) as well as between genes. The cSNPs most likely to influence disease, those that alter the amino acid sequence of the encoded protein, are found at a lower rate and with lower allele frequencies than silent substitutions. This likely reflects selection acting against deleterious alleles during human evolution. The lower allele frequency of missense cSNPs has implications for the compilation of a comprehensive catalogue, as well as for the subsequent application to disease association.

14.
15.
As end-stage renal disease (ESRD) has a four times higher incidence in African Americans compared to European Americans, we hypothesized that susceptibility alleles for ESRD have a higher frequency in the West African than the European gene pool. We carried out a genome-wide admixture scan in 1,372 ESRD cases and 806 controls and found a highly significant association between excess African ancestry and nondiabetic ESRD (lod score = 5.70) but not diabetic ESRD (lod = 0.47) on chromosome 22q12. Each copy of the European ancestral allele conferred a relative risk of 0.50 (95% CI = 0.39-0.63) compared to African ancestry. Multiple common SNPs (allele frequencies ranging from 0.2 to 0.6) in the gene encoding nonmuscle myosin heavy chain type II isoform A (MYH9) were associated with two to four times greater risk of nondiabetic ESRD and accounted for a large proportion of the excess risk of ESRD observed in African compared to European Americans.

16.
Human earwax consists of wet and dry types. Dry earwax is frequent in East Asians, whereas wet earwax is common in other populations. Here we show that a SNP, 538G→A (rs17822931), in the ABCC11 gene is responsible for determination of earwax type. The AA genotype corresponds to dry earwax, and GA and GG to wet type. A 27-bp deletion in ABCC11 exon 29 was also found in a few individuals of Asian ancestry. A functional assay demonstrated that cells with allele A show a lower excretory activity for cGMP than those with allele G. The allele A frequency shows a north-south and east-west downward geographical gradient; worldwide, it is highest in Chinese and Koreans, and a common dry-type haplotype is retained among various ethnic populations. These findings suggest that the allele A arose in northeast Asia and thereafter spread through the world. The 538G→A SNP is the first example of DNA polymorphism determining a visible genetic trait.

17.
Autism spectrum disorders (ASDs) are common, heritable neurodevelopmental conditions. The genetic architecture of ASDs is complex, requiring large samples to overcome heterogeneity. Here we broaden coverage and sample size relative to other studies of ASDs by using Affymetrix 10K SNP arrays and 1,181 [corrected] families with at least two affected individuals, performing the largest linkage scan to date while also analyzing copy number variation in these families. Linkage and copy number variation analyses implicate chromosome 11p12-p13 and neurexins, respectively, among other candidate loci. Neurexins team with previously implicated neuroligins for glutamatergic synaptogenesis, highlighting glutamate-related genes as promising candidates for contributing to ASDs.

18.
Quality and completeness of SNP databases   Cited by: 19 (self-citations: 0; citations by others: 19)
To address the quality and completeness of single-nucleotide polymorphism (SNP) databases, we resequenced 173 kb (spanning 17 loci) in 150 chromosomes of west African and European ancestry. Over 88% of SNPs in the public (TSC and BAC overlap) and Celera databases were confirmed in independent resequencing. Approximately 45% of all human heterozygosity is attributable to SNPs already available from the two databases, and of SNPs with minor-allele frequencies >10%, more than half are represented.

19.
Copy number variation (CNV) is pervasive in the human genome and can play a causal role in genetic diseases. The functional impact of CNV cannot be fully captured through linkage disequilibrium with SNPs. These observations motivate the development of statistical methods for performing direct CNV association studies. We show through simulation that current tests for CNV association are prone to false-positive associations in the presence of differential errors between cases and controls, especially if quantitative CNV measurements are noisy. We present a statistical framework for performing case-control CNV association studies that applies likelihood ratio testing of quantitative CNV measurements in cases and controls. We show that our methods are robust to differential errors and noisy data and can achieve maximal theoretical power. We illustrate the power of these methods for testing for association with binary and quantitative traits, and have made this software available as the R package CNVtools.

20.
Using an Affymetrix 10K SNP array to screen for gene copy number changes in breast cancer, we detected a single-gene amplification of the ESR1 gene, which encodes estrogen receptor alpha, at 6q25. A subsequent tissue microarray analysis of more than 2,000 clinical breast cancer samples showed ESR1 amplification in 20.6% of breast cancers. Ninety-nine percent of tumors with ESR1 amplification showed estrogen receptor protein overexpression, compared with 66.6% of cancers without ESR1 amplification (P < 0.0001). In 175 women who had received adjuvant tamoxifen monotherapy, survival was significantly longer for women with cancer with ESR1 amplification than for women with estrogen receptor-expressing cancers without ESR1 amplification (P = 0.023). Notably, we also found ESR1 amplification in benign and precancerous breast diseases, suggesting that ESR1 amplification may be a common mechanism in proliferative breast disease and a very early genetic alteration in a large subset of breast cancers.
