
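Research on a Chinese ambiguous-field word segmentation method based on an improved BP network
Cite this article: ZHANG Li, ZHANG Li-yong, ZHANG Xiao-miao, et al. Research on a Chinese ambiguous-field word segmentation method based on an improved BP network[J]. Journal of Dalian University of Technology, 2007, 47(1): 131-135
Authors: ZHANG Li  ZHANG Li-yong  ZHANG Xiao-miao
Affiliations: School of Electronic and Information Engineering, Dalian University of Technology, Dalian, Liaoning 116024; State-owned Assets Office, Dalian University of Technology, Dalian, Liaoning 116024; Affiliated Hospital, Dalian University of Technology, Dalian, Liaoning 116024
Abstract: Automatic word segmentation of ambiguous Chinese fields is a hard problem for computer science in text mining. Chinese is written continuously, sentence by sentence, with no gaps between words, which makes ambiguous fields difficult to segment. To address this, the grammatical phenomena underlying typical ambiguities are summarized, and a part-of-speech code library is built for part-of-speech encoding. On this basis, the characters and words in ambiguous fields that follow particular grammatical rules are assigned codes and converted into input vectors that a neural network can accept. The samples are then used for training, and an improved BP neural network learns these grammatical rules through self-learning. The training results show that the algorithm reaches 93.13% training precision and 92.50% test precision on ambiguous-field segmentation.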

Keywords: text mining  ambiguous fields  natural language processing  neural network
Article ID: 1000-8608(2007)01-0131-05
Revision dates: 2005-12-17; 2006-11-09

Research on ambiguous words segmentation algorithm based on improved BP neural network
ZHANG Li,ZHANG Li-yong,ZHANG Xiao-miao,et al. Research on ambiguous words segmentation algorithm based on improved BP neural network[J]. Journal of Dalian University of Technology, 2007, 47(1): 131-135
Authors:ZHANG Li  ZHANG Li-yong  ZHANG Xiao-miao  et al
Affiliation:1. School of Electr. and Inf. Eng., Dalian Univ. of Technol., Dalian 116024, China; 2. Nat. Assets Adm. Office, Dalian Univ. of Technol., Dalian 116024, China; 3. Hosp. of Dalian Univ. of Technol., Dalian 116024, China
Abstract:In text mining, automatic segmentation of ambiguous Chinese fields is a difficult problem for computer science. Chinese is written continuously within a sentence, with no spaces between words, which makes ambiguous fields hard to segment. To address this, the grammatical phenomena underlying typical ambiguities are summarized, and a library of part-of-speech codes is built for encoding. On this basis, the characters and words in ambiguous fields that follow particular grammatical rules are assigned codes and transformed into input vectors that a neural network can accept. The samples are then used for training, and the grammatical rules are learned through the self-learning of an improved BP neural network. After training, the algorithm reaches 93.13% training precision and 92.50% test precision on ambiguous-field segmentation.
Keywords:text mining   ambiguous words   natural language processing   neural network
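
The pipeline outlined in the abstract (part-of-speech codes, conversion to input vectors, training a BP network) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the tag inventory `POS_TAGS`, the one-hot encoding, the toy samples in `TOY_DATA`, the network size, and the use of a momentum term as the "improvement" to plain backpropagation. The abstract does not say which code library, encoding, or modification the authors used, so this should not be read as their method.

```python
import math
import random

# Hypothetical part-of-speech inventory; the paper's code library is not given in the abstract.
POS_TAGS = ["n", "v", "a", "d", "p", "u"]

def encode(tags):
    """Concatenate one-hot codes of the tags into one flat input vector."""
    vec = []
    for tag in tags:
        vec.extend(1.0 if tag == t else 0.0 for t in POS_TAGS)
    return vec

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class MomentumBP:
    """One-hidden-layer network trained by backpropagation with a momentum term."""

    def __init__(self, n_in, n_hidden, lr=0.5, momentum=0.5, seed=0):
        rng = random.Random(seed)
        # The last weight in each row is the bias weight.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
        self.dw1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw2 = [0.0] * (n_hidden + 1)
        self.lr, self.momentum = lr, momentum

    def forward(self, x):
        xb = x + [1.0]  # append constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return xb, h, y

    def train_step(self, x, target):
        xb, h, y = self.forward(x)
        hb = h + [1.0]
        # Error terms for squared error with sigmoid units.
        delta_out = (y - target) * y * (1.0 - y)
        delta_hid = [delta_out * self.w2[j] * h[j] * (1.0 - h[j])
                     for j in range(len(h))]
        # Momentum update: new step = momentum * previous step - lr * gradient.
        for j in range(len(hb)):
            self.dw2[j] = self.momentum * self.dw2[j] - self.lr * delta_out * hb[j]
            self.w2[j] += self.dw2[j]
        for j, row in enumerate(self.w1):
            for i in range(len(xb)):
                self.dw1[j][i] = (self.momentum * self.dw1[j][i]
                                  - self.lr * delta_hid[j] * xb[i])
                row[i] += self.dw1[j][i]

    def predict(self, x):
        return self.forward(x)[2]

def mse(net, data):
    return sum((net.predict(encode(t)) - y) ** 2 for t, y in data) / len(data)

# Invented toy samples, not data from the paper: three tags per ambiguous field,
# label 1 or 0 standing for the two candidate segmentations.
TOY_DATA = [
    (("v", "n", "n"), 1), (("n", "v", "n"), 0), (("a", "n", "v"), 1),
    (("d", "v", "n"), 0), (("p", "n", "v"), 1), (("n", "v", "u"), 0),
]

net = MomentumBP(n_in=3 * len(POS_TAGS), n_hidden=4)
loss_before = mse(net, TOY_DATA)
for _ in range(500):
    for tags, label in TOY_DATA:
        net.train_step(encode(tags), float(label))
loss_after = mse(net, TOY_DATA)
print(loss_before, loss_after)
```

One-hot codes are used here only because they keep the tags unordered; a numeric code per tag, as a code library might suggest, would slot into `encode` in the same way.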
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.