
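Extraction of Multiword Expressions in Questions of Question Answering Communities
Cite this article: WU Ruihong, LÜ Xueqiang, LI Zhuo, SHU Yan. Extraction of Multiword Expressions in Questions of Question Answering Communities[J]. Journal of Jilin University (Science Edition), 2014, 52(6): 1230-1238.
Authors: WU Ruihong  LÜ Xueqiang  LI Zhuo  SHU Yan
Affiliations: 1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101; 2. Beijing TRS Information Technology Co. Ltd., Beijing 100101
Funding: National Natural Science Foundation of China; Key Project of the Science and Technology Development Program of the Beijing Municipal Education Commission and Class B Key Project of the Beijing Natural Science Foundation
Abstract: Based on the relationship between multiword expressions in the questions of interactive question answering communities and question understanding, this paper proposes extracting multiword expressions from such questions and, based on the characteristics of multiword expressions in these questions, proposes an extraction method suited to interactive question answering communities. The method first extracts candidate multiword expressions from questions using mutual information and a stop-word list. It then divides the candidates into four classes: correct strings, incomplete strings, redundant strings, and wrong strings. Drawing on the query optimization performed by search engines and on the retrieval results for the candidates on the Internet, it applies a candidate correction method to obtain the final multiword expressions. Experiments on questions from the Sina iask question library show that the precision, recall, and F value of multiword expression extraction reach 84%, 52%, and 0.64, respectively, which verifies the effectiveness of the method.

Keywords: multiword expressions  question understanding  mutual information  search engine
Received: 2013-09-09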
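
The candidate extraction step named in the abstract (mutual information plus a stop-word list) can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so everything here is an assumption made for illustration: the function name `candidate_mwes`, the use of pointwise mutual information over adjacent token pairs, the thresholds `min_pmi` and `min_count`, and the toy English questions standing in for segmented Chinese questions.

```python
import math
from collections import Counter


def candidate_mwes(questions, stop_words, min_pmi=1.0, min_count=2):
    """Return adjacent token pairs with high pointwise mutual information.

    Hypothetical helper: the thresholds and the bigram-only scope are
    illustrative assumptions, not the settings used in the paper.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in questions:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    candidates = {}
    for (left, right), count in bigrams.items():
        # Skip rare pairs: PMI is unreliable for low counts.
        if count < min_count:
            continue
        # Stop-word filter: drop pairs that contain a stop word.
        if left in stop_words or right in stop_words:
            continue
        p_pair = count / total_bigrams
        p_left = unigrams[left] / total_unigrams
        p_right = unigrams[right] / total_unigrams
        pmi = math.log2(p_pair / (p_left * p_right))
        if pmi >= min_pmi:
            candidates[(left, right)] = pmi
    return candidates


# Toy, pre-tokenized questions (assumed data, not from the paper's corpus).
example_questions = [
    ["how", "to", "install", "hard", "disk"],
    ["hard", "disk", "is", "full"],
    ["buy", "a", "hard", "disk"],
]
example_stop_words = {"how", "to", "is", "a"}
example_candidates = candidate_mwes(example_questions, example_stop_words)
```

In this sketch only the pair ("hard", "disk") survives, because it is the only pair that recurs and contains no stop word. The later correction stage described in the abstract, which relies on search engine results, is not modeled here.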

Extraction of Multiword Expressions in Questions of Question Answering Communities
WU Ruihong, LÜ Xueqiang, LI Zhuo, SHU Yan. Extraction of Multiword Expressions in Questions of Question Answering Communities[J]. Journal of Jilin University: Sci Ed, 2014, 52(6): 1230-1238.
Authors: WU Ruihong  LÜ Xueqiang  LI Zhuo  SHU Yan
Institution: 1. Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; 2. Beijing TRS Information Technology Co. Ltd., Beijing 100101, China
Abstract: The multiword expressions (MWEs) in the questions of question answering communities have a direct relationship with question interpretation. We propose extracting MWEs from the questions of question answering communities and, according to the characteristics of MWEs in these questions, present an extraction method suited to them. In this method, we first use mutual information and stop-word filtering to obtain candidate MWEs. We then classify the candidate MWEs into four types: right strings, incomplete strings, redundant strings, and error strings. Finally, with the help of query optimization in search engines and the retrieval results for the candidate MWEs on the Internet, we design a revising method to obtain the MWEs. We took the questions in the Sina iask question library as the experimental corpus. The results show that the precision, recall, and F measure reach 84%, 52%, and 0.64, respectively, which demonstrates the effectiveness of the proposed method.
Keywords: multiword expressions  question interpretation  mutual information  search engine
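
The three reported figures can be related with a short Python check. This is a sketch under the assumption that the "F measure" in the abstract is the balanced F1 score, the harmonic mean of precision and recall. The abstract does not state the weighting, and the function name `f_measure` and its `beta` parameter are introduced here for illustration only.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta is 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Precision and recall as reported in the abstract (84% and 52%).
reported_f = round(f_measure(0.84, 0.52), 2)
```

With precision 0.84 and recall 0.52, the balanced score is 2 × 0.84 × 0.52 / (0.84 + 0.52) ≈ 0.642, which rounds to the reported 0.64. The gap between precision and recall suggests the method returns mostly correct MWEs but misses about half of those present in the corpus.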
This article is indexed in CNKI and other databases.