
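A method for acquiring a bilingual corpus based on Chinese-Uygur alignment
Cite this article: Mahira Ganizat, HU Xue-gang. A method for acquiring a bilingual corpus based on Chinese-Uygur alignment [J]. Journal of Hefei University of Technology (Natural Science), 2011(11): 1670-1673.
Authors: Mahira Ganizat, HU Xue-gang
Affiliations: School of Computer and Information, Hefei University of Technology; Department of Computer Science, Xinjiang Institute of Light Industry Technology
Funding: Fundamental Research Funds for the Central Universities (2011HJQC1013)
Abstract: Taking Uygur as its object, this paper proposes a method for acquiring a Uygur corpus aligned with Chinese. By contrasting the characteristics of Chinese and Uygur, the method first segments Uygur words into stems. On that basis it performs part-of-speech tagging with the help of a stem table and a word-frequency table. It then aligns the Chinese and Uygur text, which yields the Chinese-Uygur bilingual corpus. This offers a feasible approach to the analysis and study of Uygur and other minority languages.

Keywords: Chinese-Uygur bilingual corpus; part-of-speech tagging; alignment; Viterbi algorithm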

Research on the building of Chinese-Uygur bilingual corpus based on alignment technology
Mahira Ganizat, HU Xue-gang. Research on the building of Chinese-Uygur bilingual corpus based on alignment technology [J]. Journal of Hefei University of Technology (Natural Science), 2011(11): 1670-1673.
Authors:Mahira Ganizat    HU Xue-gang
Institution: Mahira Ganizat (1, 2), HU Xue-gang (1). 1. School of Computer and Information, Hefei University of Technology, Hefei 230009, China; 2. Dept. of Computer, Xinjiang Institute of Light Industry Technology, Urumqi 830021, China
Abstract: Focusing on Uygur, this paper proposes an approach to building a Chinese-Uygur bilingual corpus based on alignment technology. Based on the characteristics of Chinese and Uygur, Uygur words are first segmented into stems, part-of-speech (POS) tagging is then performed using a term-frequency table, and finally the Chinese and Uygur texts are aligned to build the Chinese-Uygur bilingual corpus. The approach offers a workable method for research on Uygur and other minority languages.
Keywords: Chinese-Uygur bilingual corpus; part-of-speech (POS) tagging; alignment; Viterbi algorithm
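
The abstract names frequency-table-based POS tagging and lists the Viterbi algorithm as a keyword, but this record does not say how the two are combined. The following is a minimal sketch, assuming a standard hidden Markov model tagger over already-segmented stems. The function name `viterbi`, the tag set, the toy stems, and every probability below are illustrative assumptions, not values or code from the paper.

```python
import math

def viterbi(stems, tags, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for a list of stems.

    Probabilities missing from the tables are replaced by `floor`
    (an assumed smoothing constant) so that log() is always defined.
    """
    if not stems:
        return []

    def lp(p):
        return math.log(p if p > 0 else floor)

    # best[i][t] = (log-probability of the best path ending in tag t
    #               at position i, tag chosen at position i - 1)
    best = [{
        t: (lp(start_p.get(t, 0)) + lp(emit_p[t].get(stems[0], 0)), None)
        for t in tags
    }]
    for i in range(1, len(stems)):
        column = {}
        for t in tags:
            # Pick the predecessor tag that maximizes the path score.
            score, prev = max(
                (best[i - 1][p][0] + lp(trans_p[p].get(t, 0)), p)
                for p in tags
            )
            column[t] = (score + lp(emit_p[t].get(stems[i], 0)), prev)
        best.append(column)

    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(stems) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return path

# Toy model: all numbers are made up for illustration only.
tags = ["PRON", "NOUN", "VERB"]
start_p = {"PRON": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "PRON": {"PRON": 0.1, "NOUN": 0.6, "VERB": 0.3},
    "NOUN": {"PRON": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"PRON": 0.3, "NOUN": 0.4, "VERB": 0.3},
}
emit_p = {
    "PRON": {"men": 0.9},
    "NOUN": {"kitab": 0.8, "oqu": 0.05},
    "VERB": {"oqu": 0.8, "kitab": 0.05},
}

stems = ["men", "kitab", "oqu"]  # toy transliterated stems
print(viterbi(stems, tags, start_p, trans_p, emit_p))
# ['PRON', 'NOUN', 'VERB']
```

In a setting like the one the abstract describes, the emission table would plausibly be estimated from the stem and word-frequency tables and the transition table from tagged text. Log-probabilities are used here to avoid numeric underflow on long sentences.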
This article is indexed in CNKI and other databases.