
A rating system for sensitive content in Yi-language web pages
Institution: 1. Engineering Research Center for Minority Language Information Processing of Yunnan Provincial Universities, Yunnan Minzu University; 2. Wenshan University
Abstract: With the rapid development of the Internet and of Yi-language informatization, a large amount of sensitive information now circulates on Yi-language websites, which seriously affects public-opinion information security in China's border areas. Yi-language information technology still lags behind that for Chinese and English: because the Yi language has a complex structure and is used across a wide area, techniques for Yi information collection and word segmentation are not yet mature, which poses great challenges for supervising sensitive content in foreign-related Yi-language web pages. To support the safe dissemination of Yi-language online information and the stability of public opinion, we propose a sensitive word filtering algorithm and a content sensitivity rating model for Yi text and, building on a self-developed Yi-language web crawler and word segmentation techniques, construct an algorithm model and a demonstration system for rating sensitive content in Yi-language web pages. Compared with similar public-opinion analysis systems for ethnic minority languages, the system not only identifies and filters sensitive words but also rates sensitive content and traces the source addresses of sensitive content. In manual evaluation and analysis, the system achieved 48% accuracy in rating sensitive content and an 80% recognition rate for sensitive words.
Keywords: Yi-language network  sensitive information  content rating  public opinion analysis
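
The abstract names the components of the system (word segmentation, sensitive word identification and filtering, content rating) but does not give the algorithms. As a rough illustration only, a generic dictionary-based pipeline of this kind might look like the sketch below. It is not the authors' method: the forward-maximum-matching segmenter, the lexicon, the severity weights, the rule "page level = highest severity found", and all function names are assumptions, and the Latin placeholder tokens stand in for real Yi words.

```python
from typing import Dict, List

# Hypothetical lexicon: token -> severity (1 = mild, 3 = severe).
# Placeholder Latin strings stand in for real Yi words.
SENSITIVE_LEXICON: Dict[str, int] = {"xa": 1, "xbc": 2, "xd": 3}

# Hypothetical segmentation vocabulary (sensitive and ordinary words).
VOCABULARY = set(SENSITIVE_LEXICON) | {"ok", "fine"}
MAX_WORD_LEN = max(len(word) for word in VOCABULARY)


def segment(text: str) -> List[str]:
    """Forward maximum matching: take the longest vocabulary word at each
    position; characters not covered by the vocabulary become single tokens."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in VOCABULARY:
                tokens.append(piece)
                i += size
                break
    return tokens


def find_sensitive(tokens: List[str]) -> List[str]:
    """Identification step: return the tokens found in the lexicon."""
    return [tok for tok in tokens if tok in SENSITIVE_LEXICON]


def mask(tokens: List[str]) -> str:
    """Filtering step: replace each sensitive token with asterisks."""
    return "".join(
        "*" * len(tok) if tok in SENSITIVE_LEXICON else tok for tok in tokens
    )


def rate(tokens: List[str]) -> int:
    """Rating step (assumed rule): 0 if clean, else the highest severity."""
    hits = find_sensitive(tokens)
    return max((SENSITIVE_LEXICON[tok] for tok in hits), default=0)


if __name__ == "__main__":
    page_tokens = segment("okxbcq")
    print(page_tokens, find_sensitive(page_tokens), mask(page_tokens), rate(page_tokens))
```

A real system would need a Yi vocabulary and a lexicon curated by speakers of the language, and the rating rule would likely weigh frequency and context rather than a single maximum.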