Similar literature
 20 similar articles found (search time: 421 ms)
1.
Detecting genetic variants that are highly divergent from a reference sequence remains a major challenge in genome sequencing. We introduce de novo assembly algorithms using colored de Bruijn graphs for detecting and genotyping simple and complex genetic variants in an individual or population. We provide an efficient software implementation, Cortex, the first de novo assembler capable of assembling multiple eukaryotic genomes simultaneously. Four applications of Cortex are presented. First, we detect and validate both simple and complex structural variations in a high-coverage human genome. Second, we identify more than 3 Mb of sequence absent from the human reference genome, in pooled low-coverage population sequence data from the 1000 Genomes Project. Third, we show how population information from ten chimpanzees enables accurate variant calls without a reference sequence. Last, we estimate classical human leukocyte antigen (HLA) genotypes at HLA-B, the most variable gene in the human genome.
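
As an illustration of the colored de Bruijn graph idea mentioned in this abstract, the following is a minimal sketch and not Cortex's implementation. It assumes a toy representation in which each k-mer is a node, edges are implied by (k-1)-base overlaps, and each node carries the set of sample names ("colors") in which it was seen. The function names and the two short sequences are hypothetical.

```python
from collections import defaultdict


def build_colored_graph(samples, k):
    """Map each k-mer to the set of sample names (colors) whose reads contain it."""
    colors = defaultdict(set)
    for name, reads in samples.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                colors[read[i:i + k]].add(name)
    return colors


def sample_specific_kmers(colors, sample):
    """Return k-mers seen only in `sample`: candidate sites of divergence."""
    return sorted(kmer for kmer, names in colors.items() if names == {sample})


# Hypothetical toy data: the two sequences differ at a single base.
samples = {
    "reference": ["ACGTACGGA"],
    "individual": ["ACGTTCGGA"],
}
graph = build_colored_graph(samples, k=4)
private = sample_specific_kmers(graph, "individual")
print(private)
```

In this toy case the k-mers flanking the difference carry both colors, while the k-mers spanning it carry only one, which is the "bubble" pattern that a variant caller on such a graph would look for.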

2.
Familial Mediterranean fever (FMF; MIM 249100) is an autosomal recessive disease characterized by recurrent attacks of fever with synovial, pleural or peritoneal inflammation. The disease is caused by mutations in the gene encoding the pyrin protein. Human population studies have revealed extremely high allele frequencies for several different pyrin mutations, leading to the conclusion that the mutant alleles confer a selective advantage. Here we examine the ret finger protein (rfp) domain (which contains most of the disease-causing mutations) of pyrin during primate evolution. Amino acids that cause human disease are often present as wild type in other species. This is true at positions 653 (a novel mutation), 680, 681, 726, 744 and 761. For several of these human mutations, the mutant represents the reappearance of an ancestral amino acid state. Examination of lineage-specific dN/dS ratios revealed a pattern consistent with the signature of episodic positive selection. Our data, together with previous human population studies, indicate that selective pressures may have caused functional evolution of pyrin in humans and other primates.

3.
Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.

4.
Here we report the application of high-density oligonucleotide array (DNA chip)-based analysis to determine the distant history of single nucleotide polymorphisms (SNPs) in current human populations. We analysed orthologues for 397 human SNP sites (identified in CEPH pedigrees from Amish, Venezuelan and Utah populations) from 23 common chimpanzee, 19 pygmy chimpanzee and 11 gorilla genomic DNA samples. From these data we determined 214 proposed ancestral alleles (the sequence found in the last common ancestor of humans and chimpanzees). In a diverse human population set, we found that SNP alleles with higher frequencies were more likely to be ancestral than less frequently occurring alleles. There were, however, exceptions. We also found three shared human/pygmy chimpanzee polymorphisms, all involving CpG dinucleotides, and two shared human/gorilla polymorphisms, one involving a CpG dinucleotide. We demonstrate that microarray-based assays allow rapid comparative sequence analysis of intra- and interspecies genetic variation.

5.
Whole-genome sequences provide a rich source of information about human evolution. Here we describe an effort to estimate key evolutionary parameters based on the whole-genome sequences of six individuals from diverse human populations. We used a Bayesian, coalescent-based approach to obtain information about ancestral population sizes, divergence times and migration rates from inferred genealogies at many neutrally evolving loci across the genome. We introduce new methods for accommodating gene flow between populations and integrating over possible phasings of diploid genotypes. We also describe a custom pipeline for genotype inference to mitigate biases from heterogeneous sequencing technologies and coverage levels. Our analysis indicates that the San population of southern Africa diverged from other human populations approximately 108-157 thousand years ago, that Eurasians diverged from an ancestral African population 38-64 thousand years ago, and that the effective population size of the ancestors of all modern humans was ~9,000.

6.
High-resolution haplotype structure in the human genome   Total citations: 41 (self-citations: 0, citations by others: 41)
Linkage disequilibrium (LD) analysis is traditionally based on individual genetic markers and often yields an erratic, non-monotonic picture, because the power to detect allelic associations depends on specific properties of each marker, such as frequency and population history. Ideally, LD analysis should be based directly on the underlying haplotype structure of the human genome, but this structure has remained poorly understood. Here we report a high-resolution analysis of the haplotype structure across 500 kilobases on chromosome 5q31 using 103 single-nucleotide polymorphisms (SNPs) in a European-derived population. The results show a picture of discrete haplotype blocks (of tens to hundreds of kilobases), each with limited diversity punctuated by apparent sites of recombination. In addition, we develop an analytical model for LD mapping based on such haplotype blocks. If our observed structure is general (and published data suggest that it may be), it offers a coherent framework for creating a haplotype map of the human genome.

7.
The considerable range of observed phenotypic variation in human populations may reflect, in part, distinctive processes of natural selection and adaptation to variable environmental conditions. Although recent genome-wide studies have identified candidate regions under selection, it is not yet clear how natural selection has shaped population differentiation. Here, we have analyzed the degree of population differentiation at 2.8 million Phase II HapMap single-nucleotide polymorphisms. We find that negative selection has globally reduced population differentiation at amino acid-altering mutations, particularly in disease-related genes. Conversely, positive selection has ensured the regional adaptation of human populations by increasing population differentiation in gene regions, primarily at nonsynonymous and 5'-UTR variants. Our analyses identify a fraction of loci that have contributed, and probably still contribute, to the morphological and disease-related phenotypic diversity of current human populations.

8.
Variation in the human genome sequence is key to understanding susceptibility to disease in modern populations and the history of ancestral populations. Unlocking this information requires knowledge of the patterns and underlying causes of human sequence diversity. By applying a new population-genetic framework to two genome-wide polymorphism surveys, we find that the human genome contains sizeable regions (stretching over tens of thousands of base pairs) that have intrinsically high and low rates of sequence variation. We show that the primary determinant of these patterns is shared genealogical history. Only a fraction of the variation (at most 25%) is due to the local mutation rate. By measuring the average distance over which genealogical histories are typically preserved, these data provide the first genome-wide estimate of the average extent of correlation among variants (linkage disequilibrium). The results are best explained by extreme variability in the recombination rate at a fine scale, and provide the first empirical evidence that such recombination 'hot spots' are a general feature of the human genome and have a principal role in shaping genetic variation in the human population.

9.
The angiotensin converting enzyme (ACE) is a key component of the renin angiotensin system that contributes to the regulation of blood pressure (BP). Recent demonstration of linkage between the ACE locus and elevated BP in a rat model of hypertension has further emphasized ACE as a candidate gene in human hypertension. We report the localization of the ACE gene on the genetic map of chromosome 17, and identify an extremely polymorphic marker at the human growth hormone (hGH) locus which shows no recombination with ACE. We have found no evidence to support linkage between the ACE locus and hypertension, which suggests that mutations at the ACE locus do not commonly contribute to the pathogenesis of hypertension in our test population.

10.
Uric acid is the end product of purine metabolism in humans and great apes, which have lost hepatic uricase activity, leading to uniquely high serum uric acid concentrations (200-500 µM) compared with other mammals (3-120 µM). About 70% of daily urate disposal occurs via the kidneys, and in 5-25% of the human population, impaired renal excretion leads to hyperuricemia. About 10% of people with hyperuricemia develop gout, an inflammatory arthritis that results from deposition of monosodium urate crystals in the joint. We have identified genetic variants within a transporter gene, SLC2A9, that explain 1.7-5.3% of the variance in serum uric acid concentrations, following a genome-wide association scan in a Croatian population sample. SLC2A9 variants were also associated with low fractional excretion of uric acid and/or gout in UK, Croatian and German population samples. SLC2A9 is a known fructose transporter, and we now show that it has strong uric acid transport activity in Xenopus laevis oocytes.

11.
12.
Geographic patterns of genetic variation, including variation at drug metabolizing enzyme (DME) loci and drug targets, indicate that geographic structuring of inter-individual variation in drug response may occur frequently. This raises two questions: how to represent human population genetic structure in the evaluation of drug safety and efficacy, and how to relate this structure to drug response. We address these by (i) inferring the genetic structure present in a heterogeneous sample and (ii) comparing the distribution of DME variants across the inferred genetic clusters of individuals. We find that commonly used ethnic labels are both insufficient and inaccurate representations of the inferred genetic clusters, and that drug-metabolizing profiles, defined by the distribution of DME variants, differ significantly among the clusters. We note, however, that the complexity of human demographic history means that there is no obvious natural clustering scheme, nor an obvious appropriate degree of resolution. Our comparison of drug-metabolizing profiles across the inferred clusters establishes a framework for assessing the appropriate level of resolution in relating genetic structure to drug response.

13.
L Kruglyak. Nature Genetics, 1999, 22(2): 139-144
Recently, attention has focused on the use of whole-genome linkage disequilibrium (LD) studies to map common disease genes. Such studies would employ a dense map of single nucleotide polymorphisms (SNPs) to detect association between a marker and disease. Construction of SNP maps is currently underway. An essential issue yet to be settled is the required marker density of such maps. Here, I use population simulations to estimate the extent of LD surrounding common gene variants in the general human population as well as in isolated populations. Two main conclusions emerge from these investigations. First, a useful level of LD is unlikely to extend beyond an average distance of roughly 3 kb in the general population, which implies that approximately 500,000 SNPs will be required for whole-genome studies. Second, the extent of LD is similar in isolated populations unless the founding bottleneck is very narrow or the frequency of the variant is low (<5%).
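
The step from "roughly 3 kb" to "approximately 500,000 SNPs" can be reproduced with back-of-envelope arithmetic. The sketch below is one possible reading, not a calculation given in the abstract: it assumes a genome of about 3 billion base pairs and assumes that each marker covers variants up to 3 kb away on either side, so that markers are spaced every 6 kb. Both the genome size and the two-sided spacing are assumptions introduced here.

```python
GENOME_LENGTH_BP = 3_000_000_000  # assumed approximate human genome size
LD_EXTENT_BP = 3_000              # "roughly 3 kb", as stated in the abstract


def markers_needed(genome_bp, ld_extent_bp):
    """Estimate marker count if each SNP covers ld_extent_bp on both sides."""
    spacing_bp = 2 * ld_extent_bp  # assumption: two-sided coverage per marker
    return genome_bp // spacing_bp


n_snps = markers_needed(GENOME_LENGTH_BP, LD_EXTENT_BP)
print(n_snps)
```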

14.
Linkage disequilibrium (LD), or the non-random association of alleles, is poorly understood in the human genome. Population genetic theory suggests that LD is determined by the age of the markers, population history, recombination rate, selection and genetic drift. Despite the uncertainties in determining the relative contributions of these factors, some groups have argued that LD is a simple function of distance between markers. Disease-gene mapping studies and a simulation study gave differing predictions on the degree of LD in isolated and general populations. In view of the discrepancies between theory and experimental observations, we constructed a high-density SNP map of the Xq25-Xq28 region and analysed the male genotypes and haplotypes across this region for LD in three populations. The populations included an outbred European sample (CEPH males) and isolated population samples from Finland and Sardinia. We found two extended regions of strong LD bracketed by regions with no evidence for LD in all three samples. Haplotype analysis showed a paucity of haplotypes in regions of strong LD. Our results suggest that, in this region of the X chromosome, LD is not a monotonic function of the distance between markers, but is more a property of the particular location in the human genome.

15.
Polygenic susceptibility to breast cancer and implications for prevention   Total citations: 24 (self-citations: 0, citations by others: 24)
The knowledge of human genetic variation that will come from the human genome sequence makes feasible a polygenic approach to disease prevention, in which it will be possible to identify individuals as susceptible by their genotype profile and to prevent disease by targeting interventions to those at risk. There is doubt, however, regarding the magnitude of these genetic effects and thus the potential to apply them to either individuals or populations. We have therefore examined the potential for prediction of risk based on common genetic variation using data from a population-based series of individuals with breast cancer. The data are compatible with a log-normal distribution of genetic risk in the population that is sufficiently wide to provide useful discrimination of high- and low-risk groups. Assuming all of the susceptibility genes could be identified, the half of the population at highest risk would account for 88% of all affected individuals. By contrast, if currently identified risk factors for breast cancer were used to stratify the population, the half of the population at highest risk would account for only 62% of all cases. These results suggest that the construction and use of genetic-risk profiles may provide significant improvements in the efficacy of population-based programs of intervention for cancers and other diseases.
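
The 88% and 62% figures can be related to the width of a log-normal risk distribution. As a hedged sketch, assume that risk is exactly log-normal with standard deviation σ on the log scale and that the number of cases is proportional to risk. Then the share of cases arising in the half of the population above the median risk is Φ(σ), where Φ is the standard normal distribution function. The function names below are hypothetical, and this is not the authors' fitting procedure.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def share_of_cases_in_top_half(sigma):
    """Share of cases in the higher-risk half, for log-scale SD `sigma`."""
    return STD_NORMAL.cdf(sigma)


def sigma_for_share(share):
    """Log-scale SD implied by a given share of cases in the top half."""
    return STD_NORMAL.inv_cdf(share)


sigma_genetic = sigma_for_share(0.88)  # all susceptibility genes identified
sigma_known = sigma_for_share(0.62)    # currently identified risk factors
print(round(sigma_genetic, 2), round(sigma_known, 2))
```

Under these assumptions the genetic-risk distribution has a log-scale standard deviation of about 1.2, roughly four times the width implied by the currently identified risk factors.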

16.
Substantial efforts are focused on identifying single-nucleotide polymorphisms (SNPs) throughout the human genome, particularly in coding regions (cSNPs), for both linkage disequilibrium and association studies. Less attention, however, has been directed to the clarification of evolutionary processes that are responsible for the variability in nucleotide diversity among different regions of the genome. We report here the population sequence diversity of genomic segments within a 450-kb cluster of olfactory receptor (OR) genes on human chromosome 17. We found a dichotomy in the pattern of nucleotide diversity between OR pseudogenes and introns on the one hand and the closely interspersed intact genes on the other. We suggest that weak positive selection is responsible for the observed patterns of genetic variation. This is inferred from a lower ratio of polymorphism to divergence in genes compared with pseudogenes or introns, high non-synonymous substitution rates in OR genes, and a small but significant overall reduction in variability in the entire OR gene cluster compared with other genomic regions. The dichotomy among functionally different segments within a short genomic distance requires high recombination rates within this OR cluster. Our work demonstrates the impact of weak positive selection on human nucleotide diversity, and has implications for the evolution of the olfactory repertoire.

17.
Single-nucleotide polymorphisms in the public domain: how useful are they?   Total citations: 15 (self-citations: 0, citations by others: 15)
There is a concerted effort by a number of public and private groups to identify a large set of human single-nucleotide polymorphisms (SNPs). As of March 2001, 2.84 million SNPs have been deposited in the public database, dbSNP, at the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/SNP/). The 2.84 million SNPs can be grouped into 1.65 million non-redundant SNPs. As part of the International SNP Map Working Group, we recently published a high-density SNP map of the human genome consisting of 1.42 million SNPs (ref. 3). In addition, numerous SNPs are maintained in proprietary databases. Our survey of more than 1,200 SNPs indicates that more than 80% of TSC and Washington University candidate SNPs are polymorphic and that approximately 50% of the candidate SNPs from these two sources are common SNPs (with minor allele frequency of ≥20%) in any given population.

18.
Linkage disequilibrium mapping in isolated populations provides a powerful tool for fine structure localization of disease genes. Here, Luria and Delbrück's classical methods for analysing bacterial cultures are adapted to the study of human isolated founder populations in order to estimate (i) the recombination fraction between a disease locus and a marker; (ii) the expected degree of allelic homogeneity in a population; and (iii) the mutation rate of marker loci. Using these methods, we report striking linkage disequilibrium for diastrophic dysplasia (DTD) in Finland indicating that the DTD gene should lie within 0.06 centimorgans (or about 60 kilobases) of the CSF1R gene. Predictions about allelic homogeneity in Finland and mutation rates in simple sequence repeats are confirmed by independent observations.

19.
The effects of human population structure on large genetic association studies   Total citations: 21 (self-citations: 0, citations by others: 21)
Large-scale association studies hold substantial promise for unraveling the genetic basis of common human diseases. A well-known problem with such studies is the presence of undetected population structure, which can lead to both false positive results and failures to detect genuine associations. Here we examine approximately 15,000 genome-wide single-nucleotide polymorphisms typed in three population groups to assess the consequences of population structure on the coming generation of association studies. The consequences of population structure on association outcomes increase markedly with sample size. For the size of study needed to detect typical genetic effects in common diseases, even the modest levels of population structure within population groups cannot safely be ignored. We also examine one method for correcting for population structure (Genomic Control). Although it often performs well, it may not correct for structure if too few loci are used and may overcorrect in other settings, leading to substantial loss of power. The results of our analysis can guide the design of large-scale association studies.

20.
Population genomics of human gene expression   Total citations: 1 (self-citations: 0, citations by others: 1)
Genetic variation influences gene expression, and this variation in gene expression can be efficiently mapped to specific genomic regions and variants. Here we have used gene expression profiling of Epstein-Barr virus-transformed lymphoblastoid cell lines of all 270 individuals genotyped in the HapMap Consortium to elucidate the detailed features of genetic variation underlying gene expression variation. We find that gene expression is heritable and that differentiation between populations is in agreement with earlier small-scale studies. A detailed association analysis of over 2.2 million common SNPs per population (5% frequency in HapMap) with gene expression identified at least 1,348 genes with association signals in cis and at least 180 in trans. Replication in at least one independent population was achieved for 37% of cis signals and 15% of trans signals, respectively. Our results strongly support an abundance of cis-regulatory variation in the human genome. Detection of trans effects is limited but suggests that regulatory variation may be the key primary effect contributing to phenotypic variation in humans. We also explore several methodologies that improve the current state of analysis of gene expression variation.
