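A Discussion of a Word-Based Encoding Scheme
Cite as: Cheng Yuanbin. A discussion of a word-based encoding scheme [J]. Journal of Jianghan University (Natural Science Edition), 2013, 41(2): 47-52.
Author: Cheng Yuanbin (程元斌)
Affiliation: School of Mathematics and Computer Science, Jianghan University, Wuhan, Hubei 430056
Abstract: Language is the principal tool of human thought, and the word is the basic unit of language processing. In computer information processing, codes are currently designed character by character. As information processing technology has developed, the shortcomings of purely character-based encoding have become increasingly apparent. Starting from the basic requirements of information processing and the basic properties of words, this paper proposes a unified encoding scheme that considers characters and words together and takes the word as its basis. The scheme builds on UTF-16, a major current encoding standard: it retains the existing character codes and adds word codes. Word codes are organized logically by a concept space tree that carries some semantic information and semantic relations, and are laid out in the code space on the principle of supporting cluster retrieval and code conversion between languages. Finally, several difficult problems that require further in-depth study are pointed out.

Keywords: word encoding; UTF-16; cluster retrieval; concept space tree; natural language processing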

Encoding Scheme Based on Words
CHENG Yuan-bin. Encoding Scheme Based on Words [J]. Journal of Jianghan University: Natural Sciences, 2013, 41(2): 47-52.
Author: CHENG Yuan-bin
Institution: School of Mathematics and Computer Science, Jianghan University, Wuhan 430056, Hubei, China
Abstract: Language is the main tool of thinking, and words are the basic unit of language. However, computer information processing currently encodes text by character. As computer information processing has developed, the disadvantages of character encoding have become increasingly apparent. Starting from the basic needs of information processing and the basic characteristics of words, a unified, word-oriented encoding scheme that considers both words and characters is proposed. The scheme is based on the existing encoding standard UTF-16; it maintains the existing character encoding and adds word encoding. Word codes are logically organized by a concept space tree that includes some semantic information and semantic relationships, and are spatially organized on the principle of supporting cluster retrieval and code conversion between languages. Finally, several problems that need further study are pointed out.
Keywords: word encoding; UTF-16; cluster retrieval; concept space tree; natural language processing
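
The abstract does not give the concrete code layout, so the following is only an illustrative sketch of the two organizing ideas it names, not the author's scheme. It assumes, hypothetically, that word codes are placed in Supplementary Private Use Area-A (starting at U+F0000) and that concept tree nodes are numbered depth-first, so that every subtree occupies one contiguous code range and cluster retrieval reduces to a range test. The names `Concept`, `assign_codes`, `in_cluster`, `encode_tokens`, and the tiny example tree are all invented for illustration.

```python
from dataclasses import dataclass, field

# Assumption: word codes start in Supplementary Private Use Area-A.
# The abstract does not state which code range the paper actually uses.
WORD_BASE = 0xF0000


@dataclass
class Concept:
    name: str
    children: list = field(default_factory=list)
    code: int = 0  # code point assigned to this concept
    last: int = 0  # last code point used inside this concept's subtree


def assign_codes(node, next_code=WORD_BASE):
    """Number nodes depth-first so each subtree gets a contiguous range.

    Returns the next unused code point.
    """
    node.code = next_code
    next_code += 1
    for child in node.children:
        next_code = assign_codes(child, next_code)
    node.last = next_code - 1
    return next_code


def in_cluster(code, concept):
    """Cluster retrieval as a range test on the concept's subtree."""
    return concept.code <= code <= concept.last


def encode_tokens(tokens, word_codes):
    """Encode known words as one word code; leave other text as characters."""
    pieces = [chr(word_codes[t]) if t in word_codes else t for t in tokens]
    return "".join(pieces).encode("utf-16-be")


# Hypothetical miniature concept space tree.
dog, cat, tree = Concept("dog"), Concept("cat"), Concept("tree")
animal = Concept("animal", [dog, cat])
plant = Concept("plant", [tree])
root = Concept("entity", [animal, plant])
assign_codes(root)

word_codes = {c.name: c.code for c in (root, animal, dog, cat, plant, tree)}
encoded = encode_tokens(["dog", "!"], word_codes)
```

In this sketch the existing character codes are untouched: `"!"` is still the single UTF-16 code unit 0x0021, while the word `"dog"` becomes one supplementary code point stored as a surrogate pair. Testing whether a word belongs under `animal` needs only two integer comparisons, which is the kind of property a layout designed for cluster retrieval would aim for.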
This article is indexed in CNKI, Wanfang Data, and other databases.