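Identification of location names in Chinese texts based on support vector machines
Cite this article: LI Li-shuang, HUANG De-gen, CHEN Chun-rong, et al. Identification of location names from Chinese texts based on support vector machine[J]. Journal of Dalian University of Technology, 2007, 47(3): 433-438.
Authors: LI Li-shuang, HUANG De-gen, CHEN Chun-rong
Affiliation: Department of Computer Science and Engineering, Dalian University of Technology, Dalian, Liaoning 116024
Abstract: A method for automatically identifying location names in Chinese text based on support vector machines (SVM) is proposed and implemented. Drawing on the characteristics of location names, the method extracts the character itself, its character-based part-of-speech tag, whether it appears in a table of location-name feature words, and its context information as vector features, and converts them into a binary representation. A training set is built on this basis, and a machine learning model for SVM-based location name identification is obtained by testing polynomial kernel functions. Experiments show that the resulting SVM model is effective: in the open test, recall and precision reach 86.69% and 93.82% respectively, with an F-value of 90.12%.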

Keywords: support vector machine; Chinese text; location name identification; machine learning
Article ID: 1000-8608(2007)03-0433-06
Manuscript revision dates: 2005-05-20; 2006-03-17
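
The reported F-value can be checked against the recall and precision figures in the abstract. The sketch below assumes the balanced F-measure (beta = 1), which the abstract does not state explicitly; the function name `f_measure` is illustrative and not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F-measure (an assumption here; the
    abstract does not say which weighting was used).
    """
    if precision <= 0 and recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Open-test figures quoted in the abstract, in percent.
precision = 93.82
recall = 86.69
f1 = f_measure(precision, recall)
print(f"F-measure: {f1:.2f}%")
```

With the rounded figures from the abstract this gives about 90.11, which agrees with the reported 90.12% up to rounding of the inputs.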

Identification of location names from Chinese texts based on support vector machine
LI Li-shuang, HUANG De-gen, CHEN Chun-rong, et al. Identification of location names from Chinese texts based on support vector machine[J]. Journal of Dalian University of Technology, 2007, 47(3): 433-438.
Authors: LI Li-shuang, HUANG De-gen, CHEN Chun-rong
Institution: Dept. of Comput. Sci. and Eng., Dalian Univ. of Technol., Dalian 116024, China
Abstract: Based on the characteristics of location names in Chinese texts, a method for automatic identification of Chinese location names using a support vector machine (SVM) is proposed. The character itself, its character-based part-of-speech (POS) tag, whether the character appears in a table of location-name feature words, and context information are extracted as the features of the vectors. Each sample is represented by a long binary vector, and a training set is built from these vectors. The machine learning models for automatic identification of location names are obtained by testing polynomial kernel functions. The results show that the models are effective in identifying location names from Chinese texts. In the open test, recall, precision and F-measure reach 86.69%, 93.82% and 90.12% respectively.
Keywords: support vector machine; Chinese texts; identification of location names; machine learning
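
The feature representation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the example sentence, the POS tags, the feature-word table `cue_chars`, the one-character context window, the padding symbol, and the kernel parameters are all assumptions made for illustration. Each character in the window contributes a one-hot block for the character, a one-hot block for its tag, and one bit for table membership, and the blocks are concatenated into a single binary vector. A polynomial kernel of the common form (x·y + c)^d is then evaluated on such vectors.

```python
from typing import Dict, List, Sequence, Set

PAD = "<PAD>"  # hypothetical symbol for positions outside the sentence


def build_index(values: Sequence[str]) -> Dict[str, int]:
    """Assign each distinct value a slot in a one-hot block."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def one_hot(value: str, index: Dict[str, int]) -> List[int]:
    """Binary block with a single 1 at the slot of `value` (all 0 if unseen)."""
    vec = [0] * len(index)
    if value in index:
        vec[index[value]] = 1
    return vec


def encode(chars: Sequence[str], tags: Sequence[str], i: int,
           char_index: Dict[str, int], tag_index: Dict[str, int],
           cue_chars: Set[str], window: int = 1) -> List[int]:
    """Binary feature vector for character i and its context window."""
    vec: List[int] = []
    for j in range(i - window, i + window + 1):
        inside = 0 <= j < len(chars)
        ch = chars[j] if inside else PAD
        tag = tags[j] if inside else PAD
        vec += one_hot(ch, char_index)           # the character itself
        vec += one_hot(tag, tag_index)           # character-based POS tag
        vec.append(1 if ch in cue_chars else 0)  # in the feature-word table?
    return vec


def poly_kernel(x: Sequence[int], y: Sequence[int],
                degree: int = 2, coef0: float = 1.0) -> float:
    """Polynomial kernel (x . y + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


# Hypothetical example: "我在大连市工作" with made-up character-level tags.
chars = list("我在大连市工作")
tags = ["r", "p", "ns", "ns", "ns", "v", "v"]
cue_chars = {"市", "省", "县"}  # illustrative feature-word table

char_index = build_index(chars + [PAD])
tag_index = build_index(tags + [PAD])

x = encode(chars, tags, 3, char_index, tag_index, cue_chars)  # character "连"
print(len(x), sum(x), poly_kernel(x, x))
```

In a full system, vectors like `x` would be paired with per-character labels and passed to an SVM trainer; the kernel degree is the kind of parameter the abstract describes selecting by testing.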
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.