Similar articles
 20 similar articles found (search time: 31 ms)
1.
2.
The approach to annotating a genome critically affects the number and accuracy of genes identified in the genome sequence. Genome annotation based on stringent gene identification is prone to underestimate the complement of genes encoded in a genome. In contrast, over-prediction of putative genes followed by exhaustive computational sequence, motif and structural homology search will find rarely expressed, possibly unique, new genes at the risk of including non-functional genes. We developed a two-stage approach that combines the merits of stringent genome annotation with the benefits of over-prediction. First we identify plausible genes regardless of matches with EST, cDNA or protein sequences from the organism (stage 1). In the second stage, proteins predicted from the plausible genes are compared at the protein level with EST, cDNA and protein sequences, and protein structures from other organisms (stage 2). Remote but biologically meaningful protein sequence or structure homologies provide supporting evidence for genuine genes. The method, applied to the Drosophila melanogaster genome, validated 1,042 novel candidate genes after filtering 19,410 plausible genes, of which 12,124 matched the original 13,601 annotated genes. This annotation strategy is applicable to genomes of all organisms, including human.
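The two-stage idea in this abstract can be illustrated with a minimal sketch. Everything below is hypothetical: the `Candidate` record, its fields, the score scale and the `min_score` cutoff are assumptions made for illustration and are not taken from the paper, which does not describe its data structures or thresholds here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A predicted gene (hypothetical record, not the authors' format)."""
    gene_id: str
    plausible: bool              # stage 1: passes the gene-model plausibility check
    best_homology_score: float   # stage 2: best protein-level hit in other organisms


def two_stage_filter(candidates, min_score):
    """Keep candidates that pass stage 1 and also have stage-2 homology support."""
    # Stage 1: over-predict, keeping every plausible gene model.
    stage1 = [c for c in candidates if c.plausible]
    # Stage 2: require supporting evidence from protein-level comparison.
    return [c for c in stage1 if c.best_homology_score >= min_score]


# Toy input; the identifiers and scores are made up.
candidates = [
    Candidate("g1", True, 87.0),
    Candidate("g2", True, 12.5),
    Candidate("g3", False, 95.0),
]
validated = two_stage_filter(candidates, min_score=50.0)
print([c.gene_id for c in validated])
```

In this toy run only `g1` survives both stages: `g2` is plausible but lacks homology support, and `g3` never passes stage 1, so its high score is not considered.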

3.
We report the analysis of a Japanese male using high-throughput sequencing to 40× coverage. More than 99% of the sequence reads were mapped to the reference human genome. Using a Bayesian decision method, we identified 3,132,608 single nucleotide variations (SNVs). Comparison with six previously reported genomes revealed an excess of singleton nonsense and nonsynonymous SNVs, as well as singleton SNVs in conserved non-coding regions. We also identified 5,319 deletions smaller than 10 kb with high accuracy, in addition to copy number variations and rearrangements. De novo assembly of the unmapped sequence reads generated around 3 Mb of novel sequence, which showed high similarity to non-reference human genomes and the human herpesvirus 4 genome. Our analysis suggests that considerable variation remains undiscovered in the human genome and that whole-genome sequencing is an invaluable tool for obtaining a complete understanding of human genetic variation.

4.
To identify loci for age at menarche, we performed a meta-analysis of 32 genome-wide association studies in 87,802 women of European descent, with replication in up to 14,731 women. In addition to the known loci at LIN28B (P = 5.4 × 10(-??)) and 9q31.2 (P = 2.2 × 10(-33)), we identified 30 new menarche loci (all P < 5 × 10(-?)) and found suggestive evidence for a further 10 loci (P < 1.9 × 10(-?)). The new loci included four previously associated with body mass index (in or near FTO, SEC16B, TRA2B and TMEM18), three in or near other genes implicated in energy homeostasis (BSX, CRTC1 and MCHR2) and three in or near genes implicated in hormonal regulation (INHBA, PCSK2 and RXRG). Ingenuity and gene-set enrichment pathway analyses identified coenzyme A and fatty acid biosynthesis as biological processes related to menarche timing.

5.
We have used large-scale insertional mutagenesis to identify functional landmarks relevant to cancer in the recently completed mouse genome sequence. We infected Cdkn2a(-/-) mice with Moloney murine leukemia virus (MoMuLV) to screen for loci that can participate in tumorigenesis in collaboration with loss of the Cdkn2a-encoded tumor suppressors p16INK4a and p19ARF. Insertional mutagenesis by the latent retrovirus was synergistic with loss of Cdkn2a expression, as indicated by a marked acceleration in the development of both myeloid and lymphoid tumors. We isolated 747 unique sequences flanking retroviral integration sites and mapped them against the mouse genome sequence databases from Celera and Ensembl. In addition to 17 insertions targeting gene loci known to be cancer-related, we identified a total of 37 new common insertion sites (CISs), of which 8 encode components of signaling pathways that are involved in cancer. The effectiveness of large-scale insertional mutagenesis in a sensitized genetic background is demonstrated by the preference for activation of MAP kinase signaling, collaborating with Cdkn2a loss in generating the lymphoid and myeloid tumors. Collectively, our results show that large-scale retroviral insertional mutagenesis in genetically predisposed mice is useful both as a system for identifying genes underlying cancer and as a genetic framework for the assignment of such genes to specific oncogenic pathways.

6.
Aging is associated with reductions in hippocampal volume that are accelerated by Alzheimer's disease and vascular risk factors. Our genome-wide association study (GWAS) of dementia-free persons (n = 9,232) identified 46 SNPs at four loci with P values of <4.0 × 10(-7). In two additional samples (n = 2,318), associations were replicated at 12q14 within MSRB3-WIF1 (discovery and replication; rs17178006; P = 5.3 × 10(-11)) and at 12q24 near HRK-FBXW8 (rs7294919; P = 2.9 × 10(-11)). Remaining associations included one SNP at 2q24 within DPP4 (rs6741949; P = 2.9 × 10(-7)) and nine SNPs at 9p33 within ASTN2 (rs7852872; P = 1.0 × 10(-7)); along with the chromosome 12 associations, these loci were also associated with hippocampal volume (P < 0.05) in a third younger, more heterogeneous sample (n = 7,794). The SNP in ASTN2 also showed suggestive association with decline in cognition in a largely independent sample (n = 1,563). These associations implicate genes related to apoptosis (HRK), development (WIF1), oxidative stress (MSRB3), ubiquitination (FBXW8) and neuronal migration (ASTN2), as well as enzymes targeted by new diabetes medications (DPP4), indicating new genetic influences on hippocampal size and possibly the risk of cognitive decline and dementia.

7.
Hu Z, Wu C, Shi Y, Guo H, Zhao X, Yin Z, Yang L, Dai J, Hu L, Tan W, Li Z, Deng Q, Wang J, Wu W, Jin G, Jiang Y, Yu D, Zhou G, Chen H, Guan P, Chen Y, Shu Y, Xu L, Liu X, Liu L, Xu P, Han B, Bai C, Zhao Y, Zhang H, Yan Y, Ma H, Chen J, Chu M, Lu F, Zhang Z, Chen F, Wang X, Jin L, Lu J, Zhou B, Lu D, Wu T, Lin D, Shen H. Nature Genetics, 2011, 43(8): 792-796
Lung cancer is the leading cause of cancer-related deaths worldwide. To identify genetic factors that modify the risk of lung cancer in individuals of Chinese ancestry, we performed a genome-wide association scan in 5,408 subjects (2,331 individuals with lung cancer (cases) and 3,077 controls) followed by a two-stage validation among 12,722 subjects (6,313 cases and 6,409 controls). The combined analyses identified six well-replicated SNPs with independent effects and significant lung cancer associations (P < 5.0 × 10(-8)) located in TP63 (rs4488809 at 3q28, P = 7.2 × 10(-26)), TERT-CLPTM1L (rs465498 and rs2736100 at 5p15.33, P = 1.2 × 10(-20) and P = 1.0 × 10(-27), respectively), MIPEP-TNFRSF19 (rs753955 at 13q12.12, P = 1.5 × 10(-12)) and MTMR3-HORMAD2-LIF (rs17728461 and rs36600 at 22q12.2, P = 1.1 × 10(-11) and P = 6.2 × 10(-13), respectively). Two of these loci (13q12.12 and 22q12.2) were newly identified in the Chinese population. These results suggest that genetic variants in 3q28, 5p15.33, 13q12.12 and 22q12.2 may contribute to the susceptibility of lung cancer in Han Chinese.

8.
9.
Yu XQ, Li M, Zhang H, Low HQ, Wei X, Wang JQ, Sun LD, Sim KS, Li Y, Foo JN, Wang W, Li ZJ, Yin XY, Tang XQ, Fan L, Chen J, Li RS, Wan JX, Liu ZS, Lou TQ, Zhu L, Huang XJ, Zhang XJ, Liu ZH, Liu JJ. Nature Genetics, 2012, 44(2): 178-182
We performed a two-stage genome-wide association study of IgA nephropathy (IgAN) in Han Chinese, with 1,434 affected individuals (cases) and 4,270 controls in the discovery phase and follow-up of the top 61 SNPs in an additional 2,703 cases and 3,464 controls. We identified associations at 17p13 (rs3803800, P = 9.40 × 10(-11), OR = 1.21; rs4227, P = 4.31 × 10(-10), OR = 1.23) and 8p23 (rs2738048, P = 3.18 × 10(-14), OR = 0.79) that implicated the genes encoding tumor necrosis factor (TNFSF13) and α-defensin (DEFA) as susceptibility genes. In addition, we found multiple associations in the major histocompatibility complex (MHC) region (rs660895, P = 4.13 × 10(-20), OR = 1.34; rs1794275, P = 3.43 × 10(-13), OR = 1.30; rs2523946, P = 1.74 × 10(-11), OR = 1.21) and confirmed a previously reported association at 22q12 (rs12537, P = 1.17 × 10(-11), OR = 0.78). We also found that rs660895 was associated with clinical subtypes of IgAN (P = 0.003), proteinuria (P = 0.025) and IgA levels (P = 0.047). Our findings show that IgAN is associated with variants near genes involved in innate immunity and inflammation.

10.
Substantial efforts are focused on identifying single-nucleotide polymorphisms (SNPs) throughout the human genome, particularly in coding regions (cSNPs), for both linkage disequilibrium and association studies. Less attention, however, has been directed to the clarification of evolutionary processes that are responsible for the variability in nucleotide diversity among different regions of the genome. We report here the population sequence diversity of genomic segments within a 450-kb cluster of olfactory receptor (OR) genes on human chromosome 17. We found a dichotomy in the pattern of nucleotide diversity between OR pseudogenes and introns on the one hand and the closely interspersed intact genes on the other. We suggest that weak positive selection is responsible for the observed patterns of genetic variation. This is inferred from a lower ratio of polymorphism to divergence in genes compared with pseudogenes or introns, high non-synonymous substitution rates in OR genes, and a small but significant overall reduction in variability in the entire OR gene cluster compared with other genomic regions. The dichotomy among functionally different segments within a short genomic distance requires high recombination rates within this OR cluster. Our work demonstrates the impact of weak positive selection on human nucleotide diversity, and has implications for the evolution of the olfactory repertoire.

11.
We extended our previous genome-wide association study for psoriasis with a multistage replication study including 8,312 individuals with psoriasis (cases) and 12,919 controls from China as well as 3,293 cases and 4,188 controls from Germany and the United States and 254 nuclear families from the United States. We identified six new susceptibility loci associated with psoriasis in the Chinese study containing the candidate genes ERAP1, PTTG1, CSMD1, GJB2, SERPINB8 and ZNF816A (combined P < 5 × 10(-?)) and replicated one locus, 5q33.1 (TNIP1-ANXA6), previously reported (combined P = 3.8 × 10(-21)) in the European studies. Two of these loci showed evidence for association in the German study at ZNF816A and GJB2 with P = 3.6 × 10(-3) and P = 7.9 × 10(-3), respectively. ERAP1 and ZNF816A were associated with type 1 (early onset) psoriasis in the Chinese Han population (test for heterogeneity P = 6.5 × 10(-3) and P = 1.5 × 10(-3), respectively). Comparisons with the results of previous GWAS of psoriasis highlight the heterogeneity of disease susceptibility between the Chinese and European populations. Our study identifies new genetic susceptibility factors and suggests new biological pathways in psoriasis.

12.
Rheumatoid arthritis is a common autoimmune disease characterized by chronic inflammation. We report a meta-analysis of genome-wide association studies (GWAS) in a Japanese population including 4,074 individuals with rheumatoid arthritis (cases) and 16,891 controls, followed by a replication in 5,277 rheumatoid arthritis cases and 21,684 controls. Our study identified nine loci newly associated with rheumatoid arthritis at a threshold of P < 5.0 × 10(-8), including B3GNT2, ANXA3, CSF2, CD83, NFKBIE, ARID5B, PDE2A-ARAP1, PLD4 and PTPN2. ANXA3 was also associated with susceptibility to systemic lupus erythematosus (P = 0.0040), and B3GNT2 and ARID5B were associated with Graves' disease (P = 3.5 × 10(-4) and 2.9 × 10(-4), respectively). We conducted a multi-ancestry comparative analysis with a previous meta-analysis in individuals of European descent (5,539 rheumatoid arthritis cases and 20,169 controls). This provided evidence of shared genetic risks of rheumatoid arthritis between the populations.
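The P < 5.0 × 10(-8) threshold quoted above is the conventional genome-wide significance level. It is commonly motivated as a Bonferroni correction of a 0.05 error rate over roughly one million independent tests. The abstract does not say how its threshold was derived, so the sketch below only illustrates that common rationale; the test count and the function name are assumptions.

```python
import math

ALPHA = 0.05                     # family-wise error rate
N_INDEPENDENT_TESTS = 1_000_000  # assumed number of independent common-variant tests

# Bonferroni correction: divide the error rate by the number of tests.
GENOME_WIDE_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS


def is_genome_wide_significant(p_value, threshold=GENOME_WIDE_THRESHOLD):
    """Return True when a P value falls below the Bonferroni-style threshold."""
    return p_value < threshold


# The threshold agrees with 5e-8 up to floating-point rounding.
print(math.isclose(GENOME_WIDE_THRESHOLD, 5e-8))

# P = 3.5e-4 is the Graves' disease association quoted in the abstract;
# 1e-9 is a made-up value included only for contrast.
print(is_genome_wide_significant(3.5e-4))
print(is_genome_wide_significant(1e-9))
```

Under this rationale, the secondary associations reported with P values around 10(-4) to 10(-3) are supporting evidence for known loci in other diseases rather than genome-wide significant discoveries in their own right.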

13.
Accurate and complete analysis of genome variation in large populations will be required to understand the role of genome variation in complex disease. We present an analytical framework for characterizing genome deletion polymorphism in populations using sequence data that are distributed across hundreds or thousands of genomes. Our approach uses population-level concepts to reinterpret the technical features of sequence data that often reflect structural variation. In the 1000 Genomes Project pilot, this approach identified deletion polymorphism across 168 genomes (sequenced at 4× average coverage) with sensitivity and specificity unmatched by other algorithms. We also describe a way to determine the allelic state or genotype of each deletion polymorphism in each genome; the 1000 Genomes Project used this approach to type 13,826 deletion polymorphisms (48-995,664 bp) at high accuracy in populations. These methods offer a way to relate genome structural polymorphism to complex disease in populations.

14.
Waist-hip ratio (WHR) is a measure of body fat distribution and a predictor of metabolic consequences independent of overall adiposity. WHR is heritable, but few genetic variants influencing this trait have been identified. We conducted a meta-analysis of 32 genome-wide association studies for WHR adjusted for body mass index (comprising up to 77,167 participants), following up 16 loci in an additional 29 studies (comprising up to 113,636 subjects). We identified 13 new loci in or near RSPO3, VEGFA, TBX15-WARS2, NFE2L3, GRB14, DNM3-PIGC, ITPR2-SSPN, LY86, HOXC13, ADAMTS9, ZNRF3-KREMEN1, NISCH-STAB1 and CPEB4 (P = 1.9 × 10(-?) to P = 1.8 × 10(-??)) and the known signal at LYPLAL1. Seven of these loci exhibited marked sexual dimorphism, all with a stronger effect on WHR in women than men (P for sex difference = 1.9 × 10(-3) to P = 1.2 × 10(-13)). These findings provide evidence for multiple loci that modulate body fat distribution independent of overall adiposity and reveal strong gene-by-sex interactions.

15.
The number of genes in the human genome is unknown, with estimates ranging from 50,000 to 90,000 (refs 1, 2), and to more than 140,000 according to unpublished sources. We have developed 'Exofish', a procedure based on homology searches, to identify human genes quickly and reliably. This method relies on the sequence of another vertebrate, the pufferfish Tetraodon nigroviridis, to detect conserved sequences with a very low background. Similar to Fugu rubripes, a marine pufferfish proposed by Brenner et al. as a model for genomic studies, T. nigroviridis is a more practical alternative with a genome also eight times more compact than that of human. Many comparisons have been made between F. rubripes and human DNA that demonstrate the potential of comparative genomics using the pufferfish genome. Application of Exofish to the December version of the working draft sequence of the human genome and to Unigene showed that the human genome contains 28,000-34,000 genes, and that Unigene contains less than 40% of the protein-coding fraction of the human genome.

16.
Zhang F, Liu H, Chen S, Low H, Sun L, Cui Y, Chu T, Li Y, Fu X, Yu Y, Yu G, Shi B, Tian H, Liu D, Yu X, Li J, Lu N, Bao F, Yuan C, Liu J, Liu H, Zhang L, Sun Y, Chen M, Yang Q, Yang H, Yang R, Zhang L, Wang Q, Liu H, Zuo F, Zhang H, Khor CC, Hibberd ML, Yang S, Liu J, Zhang X. Nature Genetics, 2011, 43(12): 1247-1251
We performed a genome-wide association study with 706 individuals with leprosy and 5,581 control individuals and replicated the top 24 SNPs in three independent replication samples, including a total of 3,301 individuals with leprosy and 5,299 control individuals from China. Two loci not previously associated with the disease were identified with genome-wide significance: rs2275606 (combined P = 3.94 × 10(-14), OR = 1.30) on 6q24.3 and rs3762318 (combined P = 3.27 × 10(-11), OR = 0.69) on 1p31.3. These associations implicate IL23R and RAB32 as new susceptibility genes for leprosy. Furthermore, we identified evidence of interaction between the NOD2 and RIPK2 loci, which is consistent with the biological association of the proteins encoded by these genes (NOD2-RIPK2 complex) in activating the NF-κB pathway as a part of the host defense response to infection. Our findings have expanded the biological functions of IL23R by uncovering its involvement in infectious disease susceptibility and suggest a potential involvement of autophagocytosis in leprosy pathogenesis. The IL23R association supports previous observations of the marked overlap of susceptibility genes for leprosy and Crohn's disease, implying common pathogenesis mechanisms.

17.
Schizophrenia is a complex disorder caused by both genetic and environmental factors. Using 9,087 affected individuals, 12,171 controls and 915,354 imputed SNPs from the Schizophrenia Psychiatric Genome-Wide Association Study (GWAS) Consortium (PGC-SCZ), we estimate that 23% (s.e. = 1%) of variation in liability to schizophrenia is captured by SNPs. We show that a substantial proportion of this variation must be the result of common causal variants, that the variance explained by each chromosome is linearly related to its length (r = 0.89, P = 2.6 × 10(-8)), that the genetic basis of schizophrenia is the same in males and females, and that a disproportionate proportion of variation is attributable to a set of 2,725 genes expressed in the central nervous system (CNS; P = 7.6 × 10(-8)). These results are consistent with a polygenic genetic architecture and imply more individual SNP associations will be detected for this disease as sample size increases.
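The reported linear relationship between chromosome length and variance explained (r = 0.89) is a Pearson correlation. A minimal sketch of that statistic follows. The chromosome lengths and variance percentages are made-up toy values, not data from the study, and the function name is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy data (hypothetical): chromosome length in Mb and the percentage of
# liability variance attributed to each chromosome.
lengths_mb = [250.0, 200.0, 150.0, 100.0, 50.0]
variance_pct = [2.4, 2.1, 1.4, 1.1, 0.4]

r = pearson_r(lengths_mb, variance_pct)
print(round(r, 2))
```

A value of r close to 1 on such data is what a polygenic architecture predicts: if causal variants are spread roughly evenly across the genome, longer chromosomes carry proportionally more of them.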

18.
Lung adenocarcinoma is the most common histological type of lung cancer, and its incidence is increasing worldwide. To identify genetic factors influencing risk of lung adenocarcinoma, we conducted a genome-wide association study and two validation studies in the Japanese population comprising a total of 6,029 individuals with lung adenocarcinoma (cases) and 13,535 controls. We confirmed two previously reported risk loci, 5p15.33 (rs2853677, P(combined) = 2.8 × 10(-40), odds ratio (OR) = 1.41) and 3q28 (rs10937405, P(combined) = 6.9 × 10(-17), OR = 1.25), and identified two new susceptibility loci, 17q24.3 (rs7216064, P(combined) = 7.4 × 10(-11), OR = 1.20) and 6p21.3 (rs3817963, P(combined) = 2.7 × 10(-10), OR = 1.18). These data provide further evidence supporting a role for genetic susceptibility in the development of lung adenocarcinoma.

19.
Atopic dermatitis is a chronic, relapsing form of inflammatory skin disorder that is affected by genetic and environmental factors. We performed a genome-wide association study of atopic dermatitis in a Chinese Han population using 1,012 affected individuals (cases) and 1,362 controls followed by a replication study in an additional 3,624 cases and 12,197 controls of Chinese Han ethnicity, as well as 1,806 cases and 3,256 controls from Germany. We identified previously undescribed susceptibility loci at 5q22.1 (TMEM232 and SLC25A46, rs7701890, P(combined) = 3.15 × 10(-9), odds ratio (OR) = 1.24) and 20q13.33 (TNFRSF6B and ZGPAT, rs6010620, P(combined) = 3.0 × 10(-8), OR = 1.17) and replicated another previously reported locus at 1q21.3 (FLG, rs3126085, P(combined) = 5.90 × 10(-12), OR = 0.82) in the Chinese sample. The 20q13.33 locus also showed evidence for association in the German sample (rs6010620, P = 2.87 × 10(-5), OR = 1.25). Our study identifies new genetic susceptibility factors and suggests previously unidentified biological pathways in atopic dermatitis.

20.
We report the 207-Mb genome sequence of the North American Arabidopsis lyrata strain MN47 based on 8.3× dideoxy sequence coverage. We predict 32,670 genes in this outcrossing species compared to the 27,025 genes in the selfing species Arabidopsis thaliana. The much smaller 125-Mb genome of A. thaliana, which diverged from A. lyrata 10 million years ago, likely constitutes the derived state for the family. We found evidence for DNA loss from large-scale rearrangements, but most of the difference in genome size can be attributed to hundreds of thousands of small deletions, mostly in noncoding DNA and transposons. Analysis of deletions and insertions still segregating in A. thaliana indicates that the process of DNA loss is ongoing, suggesting pervasive selection for a smaller genome. The high-quality reference genome sequence for A. lyrata will be an important resource for functional, evolutionary and ecological studies in the genus Arabidopsis.
