Similar literature
 18 similar articles found (search time: 156 ms)
1.
《清华大学学报》2020,25(5):678-689
Many human diseases involve multiple genes in complex interactions. Large Genome-Wide Association Studies (GWASs) have been considered to hold promise for unraveling such interactions. However, statistical tests for high-order epistatic interactions (≥2 Single Nucleotide Polymorphisms (SNPs)) raise enormous computational and analytical challenges. It is well known that a block-wise structure exists in the human genome due to Linkage Disequilibrium (LD) between adjacent SNPs. In this paper, we propose a novel Bayesian method, named BAM, for simultaneously partitioning SNPs into LD-blocks and detecting genome-wide multi-locus epistatic interactions that are associated with multiple diseases. Experimental results on simulated datasets demonstrate that BAM is powerful and efficient. We also applied BAM to two GWAS datasets from WTCCC, i.e., Rheumatoid Arthritis and Type 1 Diabetes, and accurately recovered the LD-block structure. Therefore, we believe that BAM is suitable and efficient for the full-scale analysis of multi-disease-related interactions in GWASs.
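As a rough illustration of the LD between adjacent SNPs mentioned in this abstract, the sketch below computes a pairwise r² as the squared Pearson correlation of genotype dosages (0/1/2). This is a common approximation to haplotype-based LD and is not BAM's Bayesian partitioning; the function name `ld_r2` and the toy genotype vectors are assumptions made for illustration.

```python
from typing import Sequence


def ld_r2(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype dosage vectors.

    Assumption: genotypes are coded as minor-allele counts (0, 1, 2) and
    both vectors cover the same individuals in the same order.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic SNP carries no LD information; report 0 by convention.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Toy data (hypothetical): two SNPs typed in four individuals.
snp_a = [0, 1, 2, 1]
snp_b = [0, 1, 2, 1]   # perfectly correlated with snp_a
snp_c = [0, 2, 0, 2]
snp_d = [0, 0, 2, 2]   # uncorrelated with snp_c in this toy sample
```

Adjacent SNPs whose pairwise r² stays high would tend to fall in the same block, which is the intuition behind partitioning by LD; the threshold one would use is a separate design choice.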

2.
Chromosome segment substitution lines have been created in several experimental models, including many plant and animal species, and are useful tools for the genetic analysis and mapping of complex traits. The traditional t-test is usually applied to identify a quantitative trait locus (QTL) that is contained within a chromosome segment and to estimate the QTL’s effect. However, current methods cannot uncover the entire genetic structure of complex traits. For example, current methods cannot distinguish between main effects and epistatic effects. In this paper, a linear epistatic model was constructed to dissect complex traits. First, all the long substituted segments were divided into overlapping small bins, and each small bin was considered a unique independent variable. The genetic model for complex traits was then constructed. When considering all the possible main effects and epistatic effects, the dimensions of the linear model can become extremely high. Therefore, variable selection via stepwise regression (Bin-REG) was proposed for the epistatic QTL analysis in the present study. Furthermore, we tested the feasibility of using the LASSO (least absolute shrinkage and selection operator) algorithm to estimate epistatic effects, examined the fully Bayesian SSVS (stochastic search variable selection) approach, tested the empirical Bayes (E-BAYES) method, and evaluated the penalized likelihood (PENAL) method for mapping epistatic QTLs. Simulation studies suggested that all of the above methods, excluding the LASSO and PENAL approaches, performed satisfactorily. The Bin-REG method appears to outperform all other methods in terms of estimating positions and effects.
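To make the dimensionality remark in this abstract concrete, here is a minimal sketch of how a design matrix with main effects and pairwise epistatic terms might be assembled from bin indicators. The product coding of interactions and the helper name `epistatic_design` are assumptions for illustration; the abstract does not specify the paper's exact coding.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def epistatic_design(
    bins: Sequence[Sequence[int]],
) -> Tuple[List[List[int]], List[Tuple[int, int]]]:
    """Append all pairwise products of bin indicators to each row.

    `bins` holds one row per line, with one indicator per bin
    (hypothetical coding: 1 = substituted segment covers the bin, 0 = not).
    Returns the expanded rows and the list of bin index pairs used.
    """
    p = len(bins[0])
    pairs = list(combinations(range(p), 2))
    rows = [list(r) + [r[i] * r[j] for i, j in pairs] for r in bins]
    return rows, pairs


def n_model_terms(p: int) -> int:
    """Number of main effects plus pairwise epistatic effects for p bins."""
    return p + p * (p - 1) // 2


design, pairs = epistatic_design([[1, 0, 1], [0, 1, 1]])
```

With p bins the model has p + p(p-1)/2 candidate terms, so 100 bins already give 5,050 columns, which is why some form of variable selection is needed before the effects can be estimated.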

3.
The risks of developing complex diseases are likely to be determined by single nucleotide polymorphisms (SNPs), which are the most common form of DNA variation. Rapidly developing genotyping technologies have made it possible to assess the influence of SNPs on a particular disease. The aim of this paper is to identify the risk/protective factors of a disease, which are modeled as a subset of SNPs (with specified alleles) with the maximum odds ratio. On the basis of the risk/protective factor and the relationship between nucleotides and amino acids, two novel risk/protective factors (called k-relaxed risk/protective factors and weighted-relaxed risk/protective factors) are proposed to consider more complex disease-associated SNPs. However, the enormous number of possible SNP interactions presents a mathematical and computational challenge. In this paper, we use the Bayesian Optimization Algorithm (BOA) to search for the risk/protective factors of a particular disease. Determining the Bayesian network (BN) structure is NP-hard; therefore, binary particle swarm optimization was used to determine the BN structure. The proposed algorithm was tested on four datasets. Experimental results showed that the algorithm proposed in this paper is a promising method for discovering SNP interactions that cause/prevent diseases.
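The objective described in this abstract, the odds ratio of a SNP subset with specified alleles, can be sketched as follows. This only scores one candidate factor; it does not implement the BOA search. The function name, the dictionary-based genotype encoding, and the 0.5 continuity correction (Haldane-Anscombe) are assumptions added for illustration.

```python
from typing import Dict, Sequence


def odds_ratio(
    genotypes: Sequence[Dict[str, int]],
    is_case: Sequence[bool],
    factor: Dict[str, int],
) -> float:
    """Odds ratio of disease for carriers of every allele listed in `factor`.

    genotypes: one dict per individual mapping SNP id -> allele code.
    factor:    candidate risk/protective factor, SNP id -> required allele.
    """
    a = b = c = d = 0  # 2x2 table: carriers/non-carriers by cases/controls
    for genotype, case in zip(genotypes, is_case):
        carries = all(genotype.get(snp) == allele for snp, allele in factor.items())
        if carries and case:
            a += 1
        elif carries:
            b += 1
        elif case:
            c += 1
        else:
            d += 1
    # Assumption: add 0.5 to every cell so empty cells do not divide by zero.
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Toy cohort (hypothetical SNP ids and allele codes).
cohort = [
    {"rs1": 1, "rs2": 0},
    {"rs1": 1, "rs2": 0},
    {"rs1": 1, "rs2": 0},
    {"rs1": 0, "rs2": 0},
    {"rs1": 0, "rs2": 1},
    {"rs1": 1, "rs2": 1},
]
status = [True, True, False, True, False, False]
score = odds_ratio(cohort, status, {"rs1": 1, "rs2": 0})
```

A value above 1 would suggest a risk factor and a value below 1 a protective one; a search procedure would call such a scoring function on many candidate subsets.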

4.
Gene expression is a critical process in biological systems that is influenced and modulated by many factors including genetic variation. Expression Quantitative Trait Loci (eQTL) analysis provides a powerful way to understand how genetic variants affect gene expression. For genome-wide eQTL analysis, the number of genetic variants and that of genes are large and thus the search space is tremendous. Therefore, eQTL analysis brings about computational and statistical challenges. In this paper, we provide a comprehensive review of recent advances in methods for eQTL analysis in population-based studies. We first present traditional pairwise association methods, which are widely used in human genetics. To account for expression heterogeneity, we investigate the methods for correcting confounding factors. Next, we discuss newly developed statistical learning methods including Lasso-based models. In the conclusion, we provide an overview of future method development in analyzing eQTL associations. Although we focus on human genetics in this review, the methods are applicable to many other organisms.

5.
Adaptive methods have been rapidly developed and applied in many fields of scientific and engineering computing. Reliable and efficient a posteriori error estimates play key roles for both adaptive finite element and boundary element methods. The aim of this paper is to develop a posteriori error estimates for boundary element methods. The standard a posteriori error estimates for boundary element methods are obtained from the classical boundary integral equations. This paper presents hyper-singular a posteriori error estimates based on the hyper-singular integral equations. Three kinds of residuals are used as the estimates for boundary element errors. The theoretical analysis and numerical examples show that the hyper-singular residuals are good a posteriori error indicators in many adaptive boundary element computations.

6.
A network motif is defined as a frequent and unique subgraph pattern in a network, and the search involves counting all the possible instances or listing all patterns, testing isomorphism, which is known to be NP-hard, and large amounts of repeated processing for statistical evaluation. Although many efficient algorithms have been introduced, exhaustive search methods are still infeasible and feasible approximation methods are not yet plausible. Additionally, the fast and continual growth of biological networks makes the problem more challenging. As a consequence, parallel algorithms have been developed and distributed computing has been tested in the cloud computing environment as well. In this paper, we survey current algorithms for network motif detection and existing software tools. Then, we show that some methods have been utilized for parallel network motif search algorithms with static or dynamic load balancing techniques. With the advent of cloud computing services, network motif search has been implemented with MapReduce in Hadoop Distributed File System (HDFS), and with Storm, but without statistical testing. In this paper, we survey network motif search algorithms in general, including existing parallel methods as well as cloud computing based search, and show the promising potential of cloud computing based motif search methods.

7.
8.
The Quantitative Genetic Analysis Station (QGAStation) is a software package that has been developed to perform statistical analysis for complex traits. It consists of five domains for handling data from diallel crosses, regional trials, core germplasm collections, QTL mapping, and microarray experiments. The first domain contains genetic models for diallel cross analysis, in which genetic variance components and genetic-by-environment interactions can be estimated, and genetic effects can be predicted. The second domain evaluates the performance of varieties in regional trials by implementing a general statistical method that outperforms ANOVA in tackling unbalanced data that arise frequently in trials across multiple locations and over a number of years. The third domain, using predicted genotypic values as a proxy, constructs core germplasm collections covering sufficient genetic diversity with lower redundancy. The fourth domain manages genotypic and phenotypic data for QTL mapping. Linkage maps can be constructed and genetic distances can be estimated; the statistical methods that have been implemented apply to both chiasmatic and achiasmatic organisms. Another part of this domain can filter systematic noise in phenotypic data. The fifth domain focuses on the cDNA expression data that are generated by microarray experiments. A two-step strategy has been implemented to detect differentially expressed genes and to estimate their effects. Except in the fourth domain, the major statistical methods that have been used are mixed linear model approaches that have been implemented in the C language. Computational efficiency is further boosted for computers that are equipped with graphics processing units (GPUs). A user-friendly graphical interface is provided for Microsoft Windows and Apple Mac operating systems. QGAStation is available at http://ibi.zju.edu.cn/software/qga/.

9.
Ligularia Cass. (Compositae) is a highly diversified genus, more than 100 species of which are distributed in the eastern Qinghai-Tibet Plateau and adjacent areas. Ligularia species have been studied with respect to secondary metabolites, and many sesquiterpenes of the furanoeremophilane type have been isolated from them. In order to find correlates among these variations, and ultimately understand the diversity-generating mechanism of Ligularia species in the Hengduan Mountains, we initiated an extensive study that uses furanoeremophilanes as a chemical index and the DNA sequence as a genetic index. Furanoeremophilanes have been detected conventionally by Ehrlich's test, which has been used in a search for novel natural products. As for the DNA sequence, we determined the nucleotide sequence of the atpB-rbcL intergenic region in the present study.

10.
1 Introduction. Ligularia Cass. (Compositae) is a highly diversified genus, more than 100 species of which are distributed in the eastern Qinghai-Tibet Plateau and adjacent areas. Ligularia species have been studied with respect to secondary metabolites, and many sesquiterpenes of the furanoeremophilane type have been isolated from them. In order to find correlates among these variations, and ultimately understand the diversity-generating mechanism of Ligularia species in the Hengduan Mountains, we initiated an extensive study that uses furanoeremophilanes as a chemical index and the DNA sequence as a genetic index. Furanoeremophilanes have been detected conventionally by Ehrlich's test, which has been used in a search for novel natural products. As for the DNA sequence, we determined the nucleotide sequence of the atpB-rbcL intergenic region in the present study.

11.
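Most genome-wide association studies (GWAS) use different genotyping chips, so the number of genetic variant sites and the criteria for selecting them differ between studies. Genotype imputation fills in untyped sites on the basis of existing genotyping data. Here, IMPUTE2 software is applied to gastric cancer GWAS data from the database of Genotypes and Phenotypes (dbGaP) to perform genome-wide imputation, in order to describe its principles and procedure in detail. Taking chromosome 9 as an example and using the 1000 Genomes Project reference panel, the imputation workflow is described, including pre-imputation quality control, pre-phasing, the imputation step, assessment of imputation quality, and post-imputation association analysis. Chromosome 9 had 21,033 sites before imputation and 1,630,406 SNPs after imputation; of these, 817,494 SNP sites passed the INFO 0.3 threshold and 584,755 sites had higher imputation quality (INFO 0.5 threshold). IMPUTE2 can impute untyped genotypes quickly and accurately, so that multiple GWAS datasets can be brought to the same set and density of sites; a subsequent joint analysis can increase statistical power and help discover new genetic susceptibility loci.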
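The chromosome 9 counts reported in this abstract can be put in proportion with a few lines of arithmetic. The sketch below is only a worked calculation on the published numbers; the helper name `imputation_summary` and the dictionary layout are assumptions for illustration and do not correspond to any IMPUTE2 output format.

```python
from typing import Dict


def imputation_summary(
    n_typed: int, n_imputed: int, passing: Dict[float, int]
) -> Dict[str, float]:
    """Summarize an imputation run from site counts.

    n_typed:   sites genotyped before imputation.
    n_imputed: SNPs available after imputation.
    passing:   INFO threshold -> number of SNPs passing that threshold.
    """
    summary = {"expansion_factor": n_imputed / n_typed}
    for threshold, count in passing.items():
        summary[f"fraction_info_{threshold}"] = count / n_imputed
    return summary


# Counts taken from the abstract (chromosome 9).
report = imputation_summary(21_033, 1_630_406, {0.3: 817_494, 0.5: 584_755})
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

On these figures imputation expands the site count by a factor in the high seventies, about half of the imputed SNPs pass the 0.3 threshold, and a little over a third pass the 0.5 threshold.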

12.
Genome-wide association studies (GWAS) have identified many risk loci for complex diseases, but effect sizes are typically small and information on the underlying biological processes is often lacking. Associations with metabolic traits as functional intermediates can overcome these problems and potentially inform individualized therapy. Here we report a comprehensive analysis of genotype-dependent metabolic phenotypes using a GWAS with non-targeted metabolomics. We identified 37 genetic loci associated with blood metabolite concentrations, of which 25 show effect sizes that are unusually high for GWAS and account for 10-60% differences in metabolite levels per allele copy. Our associations provide new functional insights for many disease-related associations that have been reported in previous studies, including those for cardiovascular and kidney disorders, type 2 diabetes, cancer, gout, venous thromboembolism and Crohn's disease. The study advances our knowledge of the genetic basis of metabolic individuality in humans and generates many new hypotheses for biomedical and pharmaceutical research.

13.
A haplotype map of the human genome   Total citations: 2, self-citations: 0, citations by others: 2
Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

14.
Although there has been much success in identifying genetic variants associated with common diseases using genome-wide association studies (GWAS), it has been difficult to demonstrate which variants are causal and what role they have in disease. Moreover, the modest contribution that these variants make to disease risk has raised questions regarding their medical relevance. Here we have investigated a single nucleotide polymorphism (SNP) in the TNFRSF1A gene, which encodes tumour necrosis factor receptor 1 (TNFR1), that was discovered through GWAS to be associated with multiple sclerosis (MS), but not with other autoimmune conditions such as rheumatoid arthritis, psoriasis and Crohn’s disease. By analysing MS GWAS data in conjunction with the 1000 Genomes Project data we provide genetic evidence that strongly implicates this SNP, rs1800693, as the causal variant in the TNFRSF1A region. We further substantiate this through functional studies showing that the MS risk allele directs expression of a novel, soluble form of TNFR1 that can block TNF. Importantly, TNF-blocking drugs can promote onset or exacerbation of MS, but they have proven highly efficacious in the treatment of autoimmune diseases for which there is no association with rs1800693. This indicates that the clinical experience with these drugs parallels the disease association of rs1800693, and that the MS-associated TNFR1 variant mimics the effect of TNF-blocking drugs. Hence, our study demonstrates that clinical practice can be informed by comparing GWAS across common autoimmune diseases and by investigating the functional consequences of the disease-associated genetic variation.

15.
An SNP map of human chromosome 22   Total citations: 35, self-citations: 0, citations by others: 35
The human genome sequence will provide a reference for measuring DNA sequence variation in human populations. Sequence variants are responsible for the genetic component of individuality, including complex characteristics such as disease susceptibility and drug response. Most sequence variants are single nucleotide polymorphisms (SNPs), where two alternate bases occur at one position. Comparison of any two genomes reveals around 1 SNP per kilobase. A sufficiently dense map of SNPs would allow the detection of sequence variants responsible for particular characteristics on the basis that they are associated with a specific SNP allele. Here we have evaluated large-scale sequencing approaches to obtaining SNPs, and have constructed a map of 2,730 SNPs on human chromosome 22. Most of the SNPs are within 25 kilobases of a transcribed exon, and are valuable for association studies. We have scaled up the process, detecting over 65,000 SNPs in the genome as part of The SNP Consortium programme, which is on target to build a map of 1 SNP every 5 kilobases that is integrated with the human genome sequence and that is freely available in the public domain.

16.
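To identify single nucleotide polymorphism (SNP) loci and susceptibility genes associated with lean body mass (LBM), 500,000 SNPs were scanned with the Affymetrix 500K array in 1,000 unrelated white subjects, and a genome-wide association study (GWAS) was carried out. Significant results were validated in a sample of 1,625 Chinese subjects and a sample of 2,283 European white subjects, and the validation results were combined with the discovery results in a meta-analysis. The SNPs rs7905603, rs9416083, rs4409772, and rs2894310 were found to be associated with LBM; rs7905603 lies in the gene ANXA8, and the other three SNPs lie in the gene C10orf11. The combined p-values from the meta-analysis were 2.08×10^-5, 7.44×10^-6, 6.73×10^-6, and 6.76×10^-6, respectively. ANXA8 and C10orf11 are candidate genes influencing variation in LBM, which provides a new theoretical basis for understanding sarcopenia.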

17.
Schizophrenia (SZ) is an inheritable complex mental disease. There have been several genome-wide association studies (GWASs) of SZ to identify novel genetic susceptibility factors. To further interpret SZ GWASs, pathway-based analysis (PBA), which considers the combined effect of variants and identifies pathways associated with traits, provides a feasible solution to discover the biological function and mechanism of SZ. Furthermore, to investigate the common pathways between SZ and bipolar disorder (BD) wil...

18.
Maize is one of the most important cereal crops in the world. The hybrid yield advantage is responsible for about 10 percent of the total global maize production of 550 Mt[1]. It is exigent to study the yield traits so as to improve the hybrids per se in …
