An Improved Chinese Part-of-Speech Tagging System

Cite this article:	QU Gang, LU Ru-zhan. An Improved Chinese Part-of-Speech Tagging System[J]. Journal of Shanghai Jiaotong University, 2003, 37(6): 897-900.

Authors:	QU Gang, LU Ru-zhan

Affiliation:	Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China

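Abstract:	The difficulty of Chinese part-of-speech tagging lies in determining, in context, the part of speech of words that belong to more than one word class (multi-category words). Because multi-category words account for only a small share of the dictionary (about 3%), the paper proposes a hidden Markov model with dual states: in addition to a conventional state transition probability matrix, it logically keeps a dedicated state transition probability matrix for each multi-category word. The probability of moving from one state to another is therefore no longer independent of the observation, which improves the accuracy of the model.
Keywords:	part-of-speech tagging; hidden Markov model; natural language processing
Article ID:	1006-2467(2003)06-0897-04
Revised manuscript received:	June 2, 2002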
An Improved Part-of-Speech (POS) Tagging System

QU Gang, LU Ru-zhan. An Improved Part-of-Speech (POS) Tagging System[J]. Journal of Shanghai Jiaotong University, 2003, 37(6): 897-900.

Authors:	QU Gang, LU Ru-zhan

Abstract:	The key problem in part-of-speech (POS) tagging is to identify, in context, the POS of words that belong to multiple categories. Since multi-category words make up only a small portion of the dictionary, this paper presents a bi-state hidden Markov model, which not only has a regular state transition probability matrix but also maintains a dedicated state transition matrix for each multi-category word. State transitions are therefore no longer independent of the observation, which improves the accuracy of the model.

Keywords:	part-of-speech (POS) tagging; hidden Markov model; natural language processing (NLP)
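
The idea in the abstract can be illustrated with a small Viterbi decoder. The following is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how the dedicated matrices enter the decoding, so the sketch assumes that the matrix of the word currently being tagged replaces the shared matrix for the transition into that word's tag, and that the usual emission term is kept. The function name `viterbi_bistate`, the tag set and all probabilities are hypothetical toy values chosen for illustration.

```python
import math


def _log(p):
    # Guard against log(0), which math.log rejects with ValueError.
    return math.log(p) if p > 0 else float("-inf")


def viterbi_bistate(words, tags, start_p, trans_p, emit_p, word_trans_p):
    """Bigram HMM decoding where some words carry their own transition matrix.

    trans_p[prev][cur]            shared transition probabilities
    word_trans_p[word][prev][cur] dedicated matrix for a multi-category word
                                  (ASSUMPTION: it governs the transition into
                                  the tag of that word)
    emit_p[tag][word]             emission probabilities
    """
    if not words:
        return []
    # best[i][t] = (log score of the best path ending in tag t at i, backpointer)
    best = [{t: (_log(start_p.get(t, 0)) + _log(emit_p[t].get(words[0], 0)), None)
             for t in tags}]
    for i in range(1, len(words)):
        word = words[i]
        # Use the word's own matrix if it has one, else the shared matrix.
        trans = word_trans_p.get(word, trans_p)
        column = {}
        for cur in tags:
            emit = _log(emit_p[cur].get(word, 0))
            column[cur] = max(
                ((best[i - 1][prev][0]
                  + _log(trans.get(prev, {}).get(cur, 0)) + emit, prev)
                 for prev in tags),
                key=lambda item: item[0],
            )
        best.append(column)
    # Follow the backpointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Toy data (hypothetical): r = pronoun, n = noun, v = verb.
tags = ["r", "n", "v"]
words = ["我们", "计划"]  # "we" + "plan", where "plan" may be a noun or a verb
start_p = {"r": 0.6, "n": 0.3, "v": 0.1}
emit_p = {"r": {"我们": 1.0}, "n": {"计划": 0.6}, "v": {"计划": 0.4}}
trans_p = {
    "r": {"r": 0.1, "n": 0.5, "v": 0.4},
    "n": {"r": 0.1, "n": 0.4, "v": 0.5},
    "v": {"r": 0.2, "n": 0.6, "v": 0.2},
}
# Dedicated matrix for the multi-category word only.
word_trans_p = {
    "计划": {
        "r": {"r": 0.0, "n": 0.2, "v": 0.8},
        "n": {"r": 0.0, "n": 0.7, "v": 0.3},
        "v": {"r": 0.0, "n": 0.6, "v": 0.4},
    }
}

shared_only = viterbi_bistate(words, tags, start_p, trans_p, emit_p, {})
with_word_matrix = viterbi_bistate(words, tags, start_p, trans_p, emit_p, word_trans_p)
print(shared_only)       # ['r', 'n']
print(with_word_matrix)  # ['r', 'v']
```

With the shared matrix alone, the toy numbers favor the noun reading of the ambiguous word; once its dedicated matrix is consulted, the verb reading wins. Because only multi-category words need an entry in `word_trans_p`, and the abstract puts them at about 3% of the dictionary, the extra storage stays small relative to giving every word its own matrix.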
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
