首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
在大规模文本语料库上预先训练的BERT等神经语言表示模型可以很好地从纯文本中捕获丰富的语义信息.但在进行中文命名实体识别任务时,由于中文命名实体存在结构复杂、形式多样、一词多义等问题,导致中文命名实体识别效果不佳.考虑到知识图谱可以提供丰富的结构化知识事实,从而更好地进行语言理解,提出了一种融合知识图谱信息的中文命名实体识别方法,通过知识图谱中的信息实体增强语言的外部知识表示能力.实验结果表明,与BILSTM-CRF、BERT等方法相比,所提出的方法有效提升了中文命名实体的识别效果,在MSRA与搜狐新闻网标注数据集上,F1值分别达到了95. 4%与93. 4%.  相似文献   

2.
针对化学资源文本中的命名实体,提出一种适合于化学资源文本的命名实体识别方法,旨在将化学物质、属性、参数、量值4种命名实体进行识别.该方法根据化学资源文本的语言规律及特点,建立BLSTM-CRF模型对命名实体进行初步识别,并使用基于词典与规则相结合的方法对识别结果进行校正.实验结果表明,该方法在化学资源文本中能够较好地完成命名实体识别任务,在测试语料上的F1值最高能达到94.26%.  相似文献   

3.
生物医学命名实体识别是生物医学文本挖掘的基本任务.机器学习方法是生物医学命名实体研究的主流方法,选取有效的机器学习算法和采取有效的识别策略是提高生物医学命名实体识别性能的关键,鉴于条件随机域算法在自然语言处理领域的优势,本文采用该算法并结合多种识别策略对生物医学命名实体识别进行研究.实验取得了良好的效果,F测度达到了70.52%,与其它相关系统比较,识别性能有了明显提高.  相似文献   

4.
在大规模文本语料库上预先训练的BERT(bidirectional encoder representation from transformers, BERT)等神经语言表示模型可以较好地从纯文本中捕获丰富的语义信息。但在进行中文命名实体识别任务时,由于命名实体存在结构复杂、形式多样、一词多义等问题,识别效果不佳。基于知识图谱可以提供丰富的结构化知识,从而更好地进行语言理解,提出了一种融合知识图谱信息的中文命名实体识别方法,通过知识图谱中的信息实体增强语言的外部知识表示能力。实验结果表明,与BERT、OpenAI GPT、ALBERT-BiLSTM-CRF等方法相比,所提出的方法有效提升了中文命名实体的识别效果,在MSRA(Microsoft Research Asia, MSRA)与搜狐新闻网标注数据集上,F_1值分别达到了95.4%与93.4%。  相似文献   

5.
相比规范新闻文本中命名实体识别(named entity recognition,NER),中文社交媒体中命名实体识别的性能偏低,这主要受限于文本的规范性和标注语料的规模。近年来中文社交媒体的命名实体识别研究主要针对标注语料规模小这一问题,倾向于使用外部知识或者借助联合训练来提升最终的识别性能,但对社交媒体文本不规范导致的对文本自身蕴含特征的挖掘不够这一问题的研究很少。该文着眼于文本自身,提出了一种结合双向长短时记忆和自注意力机制的命名实体识别方法。该方法通过在多个不同子空间捕获上下文相关信息来更好地理解和表示句子结构,充分挖掘文本自身蕴含的特征,并最终提升不规范文本的实体识别性能。在Weibo NER公开语料上进行了多组对比实验,实验结果验证了方法的有效性。结果表明:在不使用外部资源和联合训练的情况下,命名实体识别的F1值达到了58.76%。  相似文献   

6.
多特征中文命名实体识别   总被引:1,自引:0,他引:1  
命名实体识别任务是对文本中的实体进行定位,并将其分类至预定义的类别中.目前主流的中文命名实体识别的模型是基于字符的命名实体识别模型.该模型在使用句法特征之前,需先进行分词,不能很好的引入句子的句法信息.另外,基于字符的模型没有利用词典中的先验词典信息,以及中文偏旁部首蕴含的象形信息.针对上述问题,论文提出了融合句法和多粒度语义信息的多特征中文命名实体识别模型.实验证明论文模型相对目前主流模型有了较大的提高,同时论文还通过实验分析了各种特征对模型识别效果的影响.  相似文献   

7.
中文微博命名体识别   总被引:1,自引:0,他引:1  
近年来微博的快速发展为命名体识别提供了新的载体,同时微博的特点也为命名体识别研究带来了挑战.针对微博特点,本文提出了基于拼音相似距离以及文本相似距离聚类算法对微博文本进行规范化,消除了微博的语言表达不规范造成的干扰.同时,本文还提出了篇章级、句子级以及词汇级三级粒度的特征提取,使用条件随机场模型进行训练数据,并识别命名体,采用由微博文本相似聚类获得的实体关系类对命名体类型进行修正.由于缺少大量的微博训练数据,本文采用半监督学习框架训练模型.通过对新浪微博数据的实验结果表明,本方法能够有效地提高微博中命名体识别的效果.  相似文献   

8.
为了解决柬埔寨语词法标注语料稀缺、柬埔寨语命名实体缺乏明显标识特征的问题,提出一种引入英柬跨语言特征的柬埔寨语命名实体识别方法.首先,借助英语命名实体的成熟模型及英柬双语平行语料的词对齐关系,将源语言的实体类别映射到目标语言;然后根据柬埔寨语词向量构造最近邻图,采用标签传播算法,获得柬埔寨语单词的实体类别分布,完成跨语言知识转移;最后,将柬埔寨语单词的命名实体类别分布作为约束特征融入到条件随机场模型中.实验结果表明,融入跨语言特征的条件随机场模型能有效地提升柬埔寨语命名实体识别的效果.  相似文献   

9.
命名实体识别是自然语言处理的重要基础,同时也是信息抽取,机器翻译等应用的关键技术.近年来,网络媒体微博的迅速发展,为命名实体识别研究提供了全新的载体.针对中文微博文本短、表达不清、网络化严重等特点,对目前命名实体识别两种应用比较广泛的方法,基于最大熵模型的识别方法和基于条件随机场模型的识别,进行对比研究.在真实的微博数据上进行对比实验.通过实验结果的对比得出这两种方法在中文微博命名实体识别上的优缺点.  相似文献   

10.
中文命名实体识别在中文信息处理中扮演着重要的角色. 在中文信息文本中, 许多命名实体内部包含着嵌套实体. 然而, 已有研究大多聚焦在非嵌套实体识别, 无法充分捕获嵌套实体之间的边界信息. 采用分层标注方式进行嵌套命名实体识别(nested named entity recognition, NNER), 将每层的实体识别解析为一个单独的任务, 并通过Gate过滤机制来促进层级之间的信息交换. 利用公开的1998年《人民日报》NNER语料进行了多组实验, 验证了模型的有效性. 实验结果表明, 在不使用外部资源词典信息的情况下, 该方法在《人民日报》数据集上的F1值达到了91.41%, 有效提高了中文嵌套命名实体识别的效果.  相似文献   

11.
Language markedness is a common phenomenon in languages, and is reflected from hearing, vision and sense, i.e. the variation in the three aspects such as phonology, morphology and semantics. This paper focuses on the interpretation of markedness in language use following the three perspectives, i.e. pragmatic interpretation, psychological interpretation and cognitive interpretation, with an aim to define the function of markedness.  相似文献   

12.
The Williston Basin is a significant petroleum province, containing oil production zones that include the Middle Cambrian to Lower Ordovician, Upper Ordovician, Middle Devonian, Upper Devonian and Mississippian and within the Jurassic and Cretaceous. The oils of the Williston Basin exhibit a wide range of geochemical characteristics defined as "oil families", although the geochemical signature of the Cambrian Deadwood Formation and Lower Ordovician Winnipeg reservoired oils does not match any "oil family". Despite their close stratigraphic proximity, it is evident that the oils of the Lower Palaeozoic within the Williston Basin are distinct. This suggests the presence of a new "oil family" within the Williston Basin. Diagnostic geochemical signatures occur in the gasoline range chromatograms, within saturate fraction gas chromatograms and biomarker fingerprints. However, some of the established criteria and cross-plots that are currently used to segregate oils into distinct genetic families within the basin do not always meet with success, particularly when applied to the Lower Palaeozoic oils of the Deadwood and Winnipeg Formation.  相似文献   

13.
王慧 《科技信息》2008,(10):240-240
Wuthering Heights, Emily Bronte's only novel, was published in December of 1847 under the pseudonym Ellis Bell. The book did not gain immediate success, but it is now thought one of the finest novels in the English language. Catherine is the key character of this masterpiece, because everybody and everything center on her though she had a short life. We can understand this masterpiece better if we know Catherine well.  相似文献   

14.
The discovery of the prolific Ordovician Red River reservoirs in 1995 in southeastern Saskatchewan was the catalyst for extensive exploration activity which resulted in the discovery of more than 15 new Red River pools. The best yields of Red River production to date have been from dolomite reservoirs. Understanding the processes of dolomitization is, therefore, crucial for the prediction of the connectivity, spatial distribution and heterogeneity of dolomite reservoirs.The Red River reservoirs in the Midale area consist of 3~4 thin dolomitized zones, with a total thickness of about 20 m, which occur at the top of the Yeoman Formation. Two types of replacement dolomite were recognized in the Red River reservoir: dolomitized burrow infills and dolomitized host matrix. The spatial distribution of dolomite suggests that burrowing organisms played an important role in facilitating the fluid flow in the backfilled sediments. This resulted in penecontemporaneous dolomitization of burrow infills by normal seawater. The dolomite in the host matrix is interpreted as having occurred at shallow burial by evaporitic seawater during precipitation of Lake Almar anhydrite that immediately overlies the Yeoman Formation. However, the low δ18O values of dolomited burrow infills (-5.9‰~ -7.8‰, PDB) and matrix dolomites (-6.6‰~ -8.1‰, avg. -7.4‰ PDB) compared to the estimated values for the late Ordovician marine dolomite could be attributed to modification and alteration of dolomite at higher temperatures during deeper burial, which could also be responsible for its 87Sr/86Sr ratios (0.7084~0.7088) that are higher than suggested for the late Ordovician seawaters (0.7078~0.7080). The trace amounts of saddle dolomite cement in the Red River carbonates are probably related to "cannibalization" of earlier replacement dolomite during the chemical compaction.  相似文献   

15.
Location based services is promising due to its novel working style and contents.A software platform is proposed to provide application programs of typical location based services and support new applications developing efficiently. The analysis shows that this scheme is easy implemented, low cost and adapt to all kinds of mobile nework system.  相似文献   

16.
以AC-13级配为基础,将橡胶颗粒代替部分集料掺入混合料中,以低温弯曲试验为评价方法对不同橡胶颗粒掺量下沥青混合料的低温抗裂性进行研究,并引入应变能密度值对混合料的低温抗裂性进行综合评价.试验结果表明:橡胶颗粒沥青混合料试件的破坏微应变均超过2 300,满足冬寒区的技术指标;无论是否掺加橡胶颗粒,随着温度的下降,沥青混合料破坏时的最大弯拉强度增大,弯拉应变降低,劲度模量增大;弯曲应变能密度在胶粒掺量为1%左右时具有较大的弯曲应变能密度值,此时橡胶颗粒沥青混合料具有较好的低温抗裂性.  相似文献   

17.
AcomputergeneratorforrandomlylayeredstructuresYUJia shun1,2,HEZhen hua2(1.TheInstituteofGeologicalandNuclearSciences,NewZealand;2.StateKeyLaboratoryofOilandGasReservoirGeologyandExploitation,ChengduUniversityofTechnology,China)Abstract:Analgorithmisintrod…  相似文献   

18.
理论推导与室内实验相结合,建立了低渗透非均质砂岩油藏启动压力梯度确定方法。首先借助油藏流场与电场相似的原理,推导了非均质砂岩油藏启动压力梯度计算公式。其次基于稳定流实验方法,建立了非均质砂岩油藏启动压力梯度测试方法。结果表明:低渗透非均质砂岩油藏的启动压力梯度确定遵循两个等效原则。平面非均质油藏的启动压力梯度等于各级渗透率段的启动压力梯度关于长度的加权平均;纵向非均质油藏的启动压力梯度等于各渗透率层的启动压力梯度关于渗透率与渗流面积乘积的加权平均。研究成果可用于有效指导低渗透非均质砂岩油藏的合理井距确定,促进该类油藏的高效开发。  相似文献   

19.
Quality traits in wheat (Triticum aestirum L.) were studied by quantitative trait locus (QTL) analysis in a recombinant inbred line (RIL) population, a set of 131 lines derived from Chuan 35050 × Shannong 483 cross (ChSh). Grains from RILs were assayed for 21 quality traits related to protein and starch. A total of 35 putative QTLs for 19 traits with a single QTL explaining 7.99-40.52% of phenotypic variations were detected on 10 chromosomes, 1D, 2A, 2D, 3B, 3D, 5A, 6A, 6B, 6D, and 7B. The additive effects of 30 QTLs were positive, contributed by Chuan 35050, the remaining 5 QTLs were negative with the additive effect contributed by Shannong 483. For protein traits, 15 QTLs were obtained and most of them were located on chromosomes 1 D, 3B and 6D, while 20 QTLs for starch traits were detected and most of them were located on chromosomes 3D, 6B and 7B. Only 7 QTLs for protein and starch traits were co-located in three regions on chromosomes 1D, 2A and 2D. These protein and starch trait QTLs showed a distinct distribution pattern in certain regions and chromosomes. Twenty-two QTLs were clustered in 6 regions of 5 chromosomes. Two QTL clusters for protein traits were located on chromosomes 1D and 3B, respectively, three clusters for starch traits on chromosomes 3D, 6B and 7B, and one cluster including protein and starch traits on chromosome 1D.  相似文献   

20.
As an American modern novelist who were famous in the literary world, Hemingway was not a person who always followed the trend but a sharp observer. At the same time, he was a tragedy maestro, he paid great attention on existence, fate and end-result. The dramatis personae's tragedy of his works was an extreme limit by all means tragedy on the meaning of fearless challenge that failed. The beauty of tragedy was not produced on the destruction of life, but now this kind of value was in the impact activity. They performed for the reader about the tragedy on challenging for the limit and the death.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号