
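The State of the Art and Difficulties in Automatic Chinese Word Segmentation
Cite this article: ZHANG Chun-xia, HAO Tian-yong. The State of the Art and Difficulties in Automatic Chinese Word Segmentation[J]. Journal of System Simulation, 2005, 17(1): 138-143, 147.
Authors: ZHANG Chun-xia (张春霞), HAO Tian-yong (郝天永)
Affiliations: 1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Graduate School of the Chinese Academy of Sciences, Beijing 100039
2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Funding: Supported by the Natural Science Foundation (#60073017 and #60273019) and the Ministry of Science and Technology Major Basic Research Project Fund (#2001CCA03000 and #2002DEA30036).
Abstract: Automatic Chinese word segmentation is a fundamental research topic in Chinese information processing, underlying information extraction, information retrieval, machine translation, text classification, automatic summarization, speech recognition, text-to-speech conversion, and natural language understanding. Although it has been studied for more than twenty years, word segmentation remains a bottleneck in Chinese information processing. Based on an analysis of the current state of research on automatic Chinese word segmentation, this paper constructs a formal model of automatic segmentation, discusses the many factors that affect segmentation, and analyzes the two greatest difficulties in segmentation and their solutions. Finally, it points out problems in current segmentation research, particularly in segmentation evaluation, as well as directions for future work.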

Keywords: automatic Chinese word segmentation; formal model; unknown words; word segmentation evaluation
Article ID: 1004-731X(2005)01-0138-06

The State of the Art and Difficulties in Automatic Chinese Word Segmentation
ZHANG Chun-xia, HAO Tian-yong. The State of the Art and Difficulties in Automatic Chinese Word Segmentation[J]. Journal of System Simulation, 2005, 17(1): 138-143, 147.
Authors: ZHANG Chun-xia, HAO Tian-yong
Abstract: Automatic Chinese word segmentation is a basic research issue for Chinese information processing tasks such as information extraction, information retrieval, machine translation, text classification, automatic text summarization, speech recognition, text-to-speech conversion, and natural language understanding. Although it has been investigated for more than twenty years, it remains a bottleneck for Chinese information processing. We analyze the state of the art in automatic Chinese word segmentation, build a formal model of word segmentation, discuss the factors that affect segmentation, and examine the two greatest difficulties in word segmentation together with their solutions. Finally, we point out existing problems, especially in word segmentation evaluation, and the research problems that remain to be resolved.
Keywords: automatic Chinese word segmentation; formal model; unknown words; word segmentation evaluation
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.