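Automatic Generation of New Question Features Based on Bag-of-Words Binding
Cite this article: YANG Si-chun, GAO Chao, DAI Xin-yu, CHEN Jia-jun, YANG Si-guo. Automatic Generation of New Question Features Based on Bag-of-Words Binding[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2012, 32(6): 590-595.
Authors: YANG Si-chun  GAO Chao  DAI Xin-yu  CHEN Jia-jun  YANG Si-guo
Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093; School of Computer Science, Anhui University of Technology, Maanshan, Anhui 243002; Department of Computer Science, Chuzhou University, Chuzhou, Anhui 239000; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093; Research Center for Large-Scale Engineering Software of Anhui Province, Hefei, Anhui 233000
Funding: National Natural Science Foundation of China (61003112); Open Project Fund of the State Key Laboratory for Novel Software Technology (Nanjing University) (KFKT2010B02); Key Provincial Natural Science Research Project of Anhui Higher Education Institutions (KJ2011A048)
Abstract: Chinese question classification lacks rich syntactic and semantic features. To address this, a method is proposed for automatically generating new question features based on bag-of-words binding. Starting from basic features such as bag-of-words (BOW), part of speech (POS) and word sense (WS), the method binds POS, WS and the other basic features to the bag-of-words one at a time, which automatically yields a new class of question features, the bag-of-words binding features. Experiments with an SVM classifier on the Harbin Institute of Technology Chinese question set show that the binding features W/POS, W/WS, etc. achieve significantly higher classification accuracy than the corresponding individual basic features POS, WS, etc. Moreover, after these binding features are combined heuristically, the overall classification accuracy over the 77 fine-grained question categories reaches 82.333%, a good classification result. This indicates that constructing new question features from the basic features through the bag-of-words binding operation is simple and effective.

Keywords: question answering system  question classification  feature extraction  bag-of-words binding
Received: 2011/7/28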

Generation of New Type of Question Features Based on Bag-of-Words Binding
YANG Si-chun, GAO Chao, DAI Xin-yu, CHEN Jia-jun and YANG Si-guo. Generation of New Type of Question Features Based on Bag-of-Words Binding[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2012, 32(6): 590-595.
Authors: YANG Si-chun  GAO Chao  DAI Xin-yu  CHEN Jia-jun and YANG Si-guo
Institution: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China; School of Computer Science, Anhui University of Technology, Maanshan, Anhui 243002, China; Department of Computer Science, Chuzhou University, Chuzhou, Anhui 239000, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China; State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210093, China; Research Center for Large-Scale Engineering Software of Anhui Province, Hefei, Anhui 233000, China
Abstract: Chinese question classification lacks rich syntactic and semantic features. To address this, a method is proposed that automatically generates new types of features based on bag-of-words binding. Starting from basic features such as bag-of-words (BOW), part of speech (POS) and word sense (WS), new types of features, named W/POS, W/WS, etc., are generated by binding each of them with the bag-of-words. Experiments were carried out with an SVM classifier on the Chinese question set provided by Harbin Institute of Technology. The results show that, compared with the basic features POS, WS and others, the corresponding bag-of-words binding features W/POS, W/WS and others achieve significantly higher classification accuracy. Furthermore, the classification accuracy of the combined bag-of-words binding features over 77 question categories reaches 82.333%, which indicates the effectiveness of the proposed method for question classification.
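
The abstract does not spell out the binding operation, so the following is only a minimal sketch of one plausible reading: each word is paired with its own tag to form a token such as `word/tag`, and the bound tokens from several layers are pooled into one bag of features. The function names `bind` and `binding_features`, the `W/<layer>:` key prefix, the example question, its POS tags and the placeholder sense codes are all assumptions made for illustration; they are not taken from the paper, and the plain pooling shown here is not the paper's heuristic combination.

```python
from collections import Counter


def bind(words, tags, sep="/"):
    """Pair each word with its aligned tag, giving tokens like 'word/tag'."""
    if len(words) != len(tags):
        raise ValueError("words and tags must be aligned one-to-one")
    return [f"{w}{sep}{t}" for w, t in zip(words, tags)]


def binding_features(words, layers):
    """Pool the bound tokens of several annotation layers into one bag.

    `layers` maps a layer name (e.g. "POS", "WS") to a tag list aligned
    with `words`. Keys are prefixed with 'W/<layer>:' so that tokens from
    different layers cannot collide.
    """
    feats = Counter()
    for name, tags in layers.items():
        for token in bind(words, tags):
            feats[f"W/{name}:{token}"] += 1
    return feats


# Hypothetical, hand-annotated question ("Which city is the capital of China?").
words = ["中国", "的", "首都", "是", "哪个", "城市"]
pos = ["ns", "u", "n", "v", "r", "n"]        # illustrative POS tags
ws = ["S1", "S2", "S3", "S4", "S5", "S6"]    # placeholder sense codes, not real ones

w_pos = bind(words, pos)                      # W/POS tokens for this question
combined = binding_features(words, {"POS": pos, "WS": ws})
```

Under this reading, a POS-only feature such as `n` is shared by `首都` and `城市`, whereas the bound tokens `首都/n` and `城市/n` keep the two words apart while still carrying the tag. The resulting counts could then be passed to any sparse vectorizer before training a classifier.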
Keywords: question answering system  question classification  feature extraction  bag-of-words binding
This article is indexed by CNKI, Wanfang Data and other databases.