
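A Study on Integrated Word Segmentation and Part-of-Speech Tagging of Tibetan Text
Citation: Tashi-gyal (扎西加), Gao Ding-guo (高定国). A Study on Integrated Word Segmentation and Part-of-Speech Tagging of Tibetan Text (藏文文本分词赋码一体化研究)[J]. Journal of Tibet University (西藏大学学报), 2012, 0(2): 57-61
Authors: Tashi-gyal (扎西加), Gao Ding-guo (高定国)
Affiliations: [1] School of Engineering, Tibet University; [2] Tibetan Information Technology Research Center, Tibet University; Lhasa, Tibet 850000
Funding: This paper is an interim result of the following projects: the 2011 National Natural Science Foundation of China project "Construction of a Tibetan Dependency Treebank" (No. 61163043); the National Natural Science Foundation of China project "A Formal Study of Basic Tibetan Sentence Patterns Based on Function Words" (No. 61063015); the Ministry of Education Humanities and Social Sciences Youth Fund project "Automatic Proofreading Methods for Modern Tibetan Syllables" (No. 10YJCZH033); the State Language Commission (国家语委) project "Construction of a Large-Scale Basic Tibetan Corpus" (No. MZl15-039); and the 2011 Tibet Autonomous Region general science and technology program project "A Corpus-Based Quantitative Study of Tibetan Vocabulary".
Abstract: In understanding Tibetan text, function words act as syntactic and semantic bridges, and the rules governing them play a special role in Tibetan word segmentation. Function words are numerous and take on many roles. Tibetan word segmentation can therefore, in a sense, be viewed as the process of identifying function words, and whether they are identified correctly directly determines segmentation quality. Drawing on the grammar of Tibetan and the particular behavior of function words, this paper first builds a function-word knowledge base and a base of function-word forms that belong to more than one category (兼类). Together these serve as the basis for recognizing function words in continuous Tibetan text. The paper then develops a segmentation word list annotated with lexical attributes and a training corpus of a certain size. It performs part-of-speech tagging with Conditional Random Fields (CRF) and combines the function-word and POS-tagging resources into a model that carries out Tibetan word segmentation and POS tagging as a single integrated process.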

Keywords: Tibetan; word segmentation; part-of-speech tagging
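The abstract treats segmentation largely as function-word identification, backed by a function-word knowledge base. One place where this matters is Tibetan case particles that fuse onto the preceding syllable, for example the genitive འི in རྒྱལ་པོའི ("the king's"). The sketch below is a hypothetical illustration, not the authors' algorithm. It substitutes plain forward maximum matching over tsheg-delimited syllables and splits off a fused particle only when the remaining stem is a known word. This means a word such as ཆོས, which merely ends in ས, is left whole. The names `LEXICON`, `AFFIXED_PARTICLES`, `split_affixed` and `segment` are invented for this example, and the toy lexicon stands in for the paper's word list and function-word bases.

```python
# Hypothetical sketch, NOT the paper's method: dictionary-driven splitting of
# fused case particles. LEXICON and AFFIXED_PARTICLES are toy stand-ins for
# the paper's annotated word list and function-word knowledge base.

TSHEG = "\u0f0b"  # ་ intersyllabic delimiter
SHAD = "\u0f0d"   # ། clause delimiter

# Hypothetical toy word list.
LEXICON = {"རྒྱལ་པོ", "དེབ", "ཆོས"}

# Hypothetical subset of particles that fuse onto a vowel-final word
# (this sketch does not check the vowel-final condition itself).
AFFIXED_PARTICLES = {"འི": "genitive", "ས": "ergative", "ར": "terminative"}


def split_affixed(candidate: str) -> list[tuple[str, str]]:
    """Return [(unit, label), ...] for one candidate, or [(candidate, 'unknown')]."""
    # A whole-word lexicon hit wins, so ཆོས is not split into ཆོ + ས.
    if candidate in LEXICON:
        return [(candidate, "word")]
    for particle, role in AFFIXED_PARTICLES.items():
        if candidate.endswith(particle):
            stem = candidate[: -len(particle)]
            if stem in LEXICON:
                return [(stem, "word"), (particle, role)]
    return [(candidate, "unknown")]


def segment(text: str, max_syllables: int = 4) -> list[tuple[str, str]]:
    """Forward maximum matching over syllables, splitting fused particles."""
    syllables = [s for s in text.replace(SHAD, TSHEG).split(TSHEG) if s]
    result: list[tuple[str, str]] = []
    i = 0
    while i < len(syllables):
        # Try the longest syllable window first, then shrink it.
        for n in range(min(max_syllables, len(syllables) - i), 0, -1):
            parts = split_affixed(TSHEG.join(syllables[i : i + n]))
            if parts[0][1] != "unknown":
                result.extend(parts)
                i += n
                break
        else:
            # No dictionary match, even for a single syllable.
            result.append((syllables[i], "unknown"))
            i += 1
    return result


print(segment("རྒྱལ་པོའི་དེབ།"))
```

The lexicon-first check is a deliberately crude way of handling forms that are ambiguous between a particle and part of a word. A base of such multi-category forms, as described in the abstract, would let a real system resolve these cases with context rather than by dictionary lookup alone.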

Study on Integration of Tibetan Language Text Participle POS Tagging
Tashi-gyal, Gao Ding-guo. Study on Integration of Tibetan Language Text Participle POS Tagging[J]. Journal of Tibet University, 2012, 0(2): 57-61
Authors: Tashi-gyal, Gao Ding-guo
Affiliation: (1. School of Engineering, Tibet University, Lhasa 850000, Tibet; 2. Tibetan Information Technology Research Center, Tibet University, Lhasa 850000, Tibet)
Abstract: Function words serve as important syntactic and semantic bridges in understanding Tibetan text, and the rules governing them play a special role in Tibetan word segmentation. Because Tibetan function words are numerous and play many roles, Tibetan word segmentation can, in a certain sense, be regarded as a process of function-word identification. The correctness of that identification therefore directly affects the quality of Tibetan text segmentation. Based on the grammatical rules of Tibetan and the particular roles of function words, this paper first constructs a function-word knowledge base and a base of function words that belong to more than one category. Together these serve as the basis for identifying function words in continuous Tibetan text. Second, it produces a segmentation word list annotated with lexical attributes and a training corpus of a certain scale. It then performs POS tagging with Conditional Random Fields (CRF) and combines the function-word and POS-tagging resources into an integrated model for automatic Tibetan word segmentation and POS tagging.
Keywords: Tibetan language; word segmentation; POS tagging
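One common way to run segmentation and tagging as a single CRF sequence-labelling task is to give each syllable a joint label. The label combines a word-boundary marker with the POS tag: B-n for the first syllable of a noun and I-n for later ones. Decoding the label sequence then yields words and their tags together. The abstract does not specify the label scheme or feature templates the authors used, so the following is only a hedged sketch. The tags `n` and `a`, the `FUNCTION_WORDS` set, and all function names are hypothetical. The function-word flag merely suggests how a function-word knowledge base could feed the CRF as a feature. Only data preparation is shown. The per-sentence feature dicts and label lists it builds are in the shape that a CRF toolkit such as sklearn-crfsuite takes for training.

```python
# Hypothetical sketch: syllable-level joint "boundary + POS" labels for a CRF.
# The paper does not state that it uses this encoding; tags "n"/"a" and
# FUNCTION_WORDS are illustrative assumptions.

TSHEG = "\u0f0b"  # ་ intersyllabic delimiter

# Hypothetical stand-in for a function-word knowledge base.
FUNCTION_WORDS = {"གི", "ཀྱི", "གྱི", "ལ", "ནས"}


def encode(words: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """[(word, pos), ...] -> [(syllable, 'B-pos' or 'I-pos'), ...]."""
    pairs = []
    for word, pos in words:
        syllables = [s for s in word.split(TSHEG) if s]
        for k, syl in enumerate(syllables):
            pairs.append((syl, ("B-" if k == 0 else "I-") + pos))
    return pairs


def decode(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Inverse of encode; an I- label that cannot continue a word starts a new one."""
    words: list[tuple[str, str]] = []
    for syl, label in pairs:
        prefix, pos = label.split("-", 1)
        if prefix == "I" and words and words[-1][1] == pos:
            words[-1] = (words[-1][0] + TSHEG + syl, pos)
        else:
            words.append((syl, pos))
    return words


def syllable_features(syllables: list[str], i: int) -> dict:
    """Features for syllable i: itself, a +/-1 window, and a function-word flag."""
    syl = syllables[i]
    return {
        "bias": 1.0,
        "syl": syl,
        "prev": syllables[i - 1] if i > 0 else "<BOS>",
        "next": syllables[i + 1] if i + 1 < len(syllables) else "<EOS>",
        "is_function_word": syl in FUNCTION_WORDS,
    }


def sentence_features(syllables: list[str]) -> list[dict]:
    return [syllable_features(syllables, i) for i in range(len(syllables))]


# "སློབ་གྲྭ་ཆེན་པོ" (university) as a noun followed by an adjective.
gold = [("སློབ་གྲྭ", "n"), ("ཆེན་པོ", "a")]
pairs = encode(gold)
X = sentence_features([syl for syl, _ in pairs])  # one sentence of features
y = [label for _, label in pairs]                 # its joint labels
print(y)
```

Because the labels are assigned per syllable, a particle fused inside a syllable (such as འི in པོའི) cannot receive its own label. Such forms would need to be split off before encoding, or the labelling unit would need to be finer than the syllable.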
This article is indexed in VIP (维普) and other databases.
