
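Research on Methods for Sentence Segmentation and Punctuation Marking in Ancient Chinese
Cite this article: WANG Chuan, ZHANG Xiao-hong, HAN Cai-hua. Research on Methods for Sentence Segmentation and Punctuation Marking in Ancient Chinese [J]. Journal of Henan University (Natural Science), 2009, 39(5).
Authors: WANG Chuan  ZHANG Xiao-hong  HAN Cai-hua
Affiliations: 1. College of Computer and Information Technology, Henan Normal University, Xinxiang, Henan 453007
2. Department of Information Engineering, Henan Junior College of Finance & Taxation, Zhengzhou 450002
3. Provincial Key Discipline Open Laboratory of Information Networks, Zhengzhou University, Zhengzhou 450000; Henan Radio & Television University, Zhengzhou 450000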
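Abstract: The main challenge in applying natural language understanding techniques to sentence segmentation and punctuation marking of ancient Chinese is data sparseness. To address this problem, a six-tag character-position tag set is designed, and a method for segmenting and punctuating ancient texts based on a cascaded conditional random field (CRF) model is proposed. Using the six-tag set, the low-level model determines sentence boundaries from the observation sequence, and the high-level model performs punctuation marking using both the observation sequence and the sentence-boundary information from the low level. Closed and open tests were run on a 5 M mixed corpus of ancient texts. In the closed test, the F-measures for sentence segmentation and punctuation marking reached 96.48% and 91.35%, respectively; in the open test, they reached 71.42% and 67.67%.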

Keywords: ancient Chinese  cascaded conditional random fields  data sparseness  sentence segmentation  punctuation marking
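
The two-level cascade described in the abstract can be illustrated with a small data-preparation sketch. The abstract does not list the six tags, the punctuation label scheme, or the feature templates, so everything named here is an assumption for illustration: the tag names `B`, `B2`, `B3`, `M`, `E`, `S` (borrowed from a convention common in Chinese word segmentation), the `"O"` label for characters not followed by a mark, and the helper functions themselves. No CRF is trained; the sketch only shows how the low-level boundary tags could be passed to the high-level model as an extra feature.

```python
from typing import Dict, List, Tuple

# Assumed set of punctuation marks; the paper's actual inventory is not given.
PUNCT = set("，。？！；：、")


def split_sentences(punctuated: str) -> List[Tuple[str, str]]:
    """Split punctuated text into (sentence, closing_mark) pairs."""
    pairs, buf = [], []
    for ch in punctuated:
        if ch in PUNCT:
            if buf:
                pairs.append(("".join(buf), ch))
                buf = []
        else:
            buf.append(ch)
    if buf:  # trailing text with no closing mark
        pairs.append(("".join(buf), ""))
    return pairs


def position_tags(n: int) -> List[str]:
    """Hypothetical six-tag labels for a sentence of n characters.

    B, B2, B3 mark the first three characters, E the last,
    M anything in between, and S a one-character sentence.
    """
    if n <= 0:
        return []
    if n == 1:
        return ["S"]
    head = ["B", "B2", "B3"][: n - 1]
    middle = ["M"] * (n - 1 - len(head))
    return head + middle + ["E"]


def low_level_labels(punctuated: str) -> Tuple[List[str], List[str]]:
    """Characters (punctuation stripped) and their boundary tags."""
    chars, tags = [], []
    for sent, _ in split_sentences(punctuated):
        chars.extend(sent)
        tags.extend(position_tags(len(sent)))
    return chars, tags


def high_level_labels(punctuated: str) -> List[str]:
    """Assumed scheme: the last character of a sentence carries its mark."""
    labels = []
    for sent, mark in split_sentences(punctuated):
        labels.extend(["O"] * (len(sent) - 1) + [mark or "O"])
    return labels


def high_level_features(chars: List[str], boundary_tags: List[str], i: int) -> Dict[str, str]:
    """Observation features plus the low-level boundary tag at position i."""
    return {
        "char": chars[i],
        "prev_char": chars[i - 1] if i > 0 else "<BOS>",
        "next_char": chars[i + 1] if i + 1 < len(chars) else "<EOS>",
        "boundary_tag": boundary_tags[i],  # output of the low-level model
    }
```

For the input `"學而時習之，不亦說乎？"`, `low_level_labels` yields the nine unpunctuated characters with tags `B B2 B3 M E B B2 B3 E`, and `high_level_labels` places `，` on the fifth character and `？` on the ninth. At training time the boundary tags would come from gold punctuation as above; at test time they would be the low-level model's predictions.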

Research on Sentence Segmentation and Punctuation in Ancient Chinese
WANG Chuan, ZHANG Xiao-hong, HAN Cai-hua. Research on Sentence Segmentation and Punctuation in Ancient Chinese [J]. Journal of Henan University (Natural Science), 2009, 39(5).
Authors:WANG Chuan  ZHANG Xiao-hong  HAN Cai-hua
Institution: 1. College of Computer and Information Technology, Henan Normal University, Xinxiang, Henan 453007, China; 2. Computer and Information Engineering, Henan Junior College of Finance & Taxation, Zhengzhou 450002; 3. Provincial Key Lab on Information Network, Zhengzhou University, Zhengzhou 450000; 4. Henan Radio & Television University, China
Abstract: Data sparseness is a primary challenge in sentence segmentation and punctuation in ancient Chinese using natural language processing technology. In order to overcome this difficulty, a 6-tag set was designed and a method based on cascaded Conditional Random Fields was proposed. The main idea is as follows: based on the 6-tag set, a low-level model determines the boundaries of sentences according to the observation sequence, and a high-level model punctuates sentences taking into consideration both observation sequence...
Keywords:ancient Chinese  cascaded conditional random fields  data sparseness  sentence segmentation  punctuation  
This article is indexed in CNKI, Wanfang Data, and other databases.