Research on Sentence Segmentation and Punctuation in Ancient Chinese (古汉语句子切分与句读标记方法研究)



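Cite this article:	WANG Chuan, ZHANG Xiao-hong, HAN Cai-hua. Research on Sentence Segmentation and Punctuation in Ancient Chinese [J]. Journal of Henan University (Natural Science), 2009, 39(5).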

Authors:	WANG Chuan (王川), ZHANG Xiao-hong (张小红), HAN Cai-hua (韩采华)

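Affiliations:	1. College of Computer and Information Technology, Henan Normal University, Xinxiang, Henan 453007, China; 2. Department of Information Engineering, Henan Junior College of Finance & Taxation, Zhengzhou 450002, China; 3. Provincial Key Lab on Information Network, Zhengzhou University, Zhengzhou 450000, China; Henan Radio & Television University, Zhengzhou 450000, China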

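Abstract:	The main challenge in applying natural language understanding techniques to sentence segmentation and punctuation marking in Ancient Chinese is data sparseness. To address this problem, a six-tag character-position tag set was designed, and a method for segmenting and punctuating ancient texts based on a cascaded conditional random field (CRF) model was proposed. Using the six-tag set, the low-level model determines sentence boundaries from the observation sequence, and the high-level model assigns punctuation marks using both the observation sequence and the sentence-boundary information from the low level. Experiments on a 5 M mixed corpus of ancient texts comprised a closed test and an open test. In the closed test, the F-scores for segmentation and punctuation marking reached 96.48% and 91.35%, respectively; in the open test, they reached 71.42% and 67.67%.
Keywords:	Ancient Chinese; cascaded conditional random fields; data sparseness; sentence segmentation; punctuation marking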
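
The two-layer labelling described in the abstract can be illustrated with a small data-preparation sketch. The abstract does not list the six tags, so the code below assumes the B/B2/B3/M/E/S character-position scheme often used in Chinese segmentation work; the function names `six_tags` and `to_training_rows` and the punctuation inventory `PUNCT` are likewise hypothetical. Each output row holds a character (the observation), a boundary tag (what a low-level model would predict), and a punctuation label (what a high-level model would predict given the character and the boundary tag). No CRF is trained here.

```python
# Assumed punctuation inventory; the paper's actual set is not given in the abstract.
PUNCT = set("，。？！；：、")


def six_tags(n):
    """Position tags for a clause of n characters.

    Assumes a B/B2/B3/M/E/S scheme: first three positions, middle, end, single.
    """
    if n <= 0:
        return []
    if n == 1:
        return ["S"]
    head = ["B", "B2", "B3"][: n - 1]       # leading positions, leaving room for E
    middle = ["M"] * (n - 1 - len(head))    # everything between the head and the end
    return head + middle + ["E"]


def to_training_rows(punctuated):
    """Turn punctuated text into (char, boundary_tag, punct_label) rows.

    boundary_tag: target of a hypothetical low-level model.
    punct_label:  target of a hypothetical high-level model; "O" means no mark
                  follows the character.
    """
    rows, segment = [], []

    def flush(mark):
        tags = six_tags(len(segment))
        last = len(segment) - 1
        for i, (ch, tag) in enumerate(zip(segment, tags)):
            rows.append((ch, tag, mark if i == last else "O"))
        segment.clear()

    for ch in punctuated:
        if ch in PUNCT:
            if segment:
                flush(ch)          # the mark is attached to the clause-final character
        elif not ch.isspace():
            segment.append(ch)
    if segment:
        flush("O")                 # trailing text with no closing mark
    return rows


if __name__ == "__main__":
    for row in to_training_rows("學而時習之，不亦說乎？"):
        print("\t".join(row))
```

In a cascade of this kind, the first two columns would train the boundary model, and all three columns would train the punctuation model, with the boundary column supplied at test time by the low-level model's predictions.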

This article is indexed in CNKI, Wanfang Data, and other databases.
