
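A Method for Filtering Chinese Domain Webpages
Cite this article:LIU Jie, LUO Li-ming, WU Yu-hang, MA Yi-fang, CAI Hong-mei. A Method for Filtering Chinese Domain Webpages[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2014, 34(5): 533-536
Authors:LIU Jie  LUO Li-ming  WU Yu-hang  MA Yi-fang  CAI Hong-mei
Affiliation:College of Information Engineering, Capital Normal University, Beijing 100048 (all five authors)
Funding:Supported by the National Natural Science Foundation of China (61371194)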
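Abstract:In view of the impact of the many harmful webpages on the Internet, a method is proposed for filtering Chinese webpages using the Bayesian classification algorithm and a domain ontology. The method computes the weights of domain feature words from positive and negative example domain webpages, builds a domain feature lexicon, constructs a domain ontology, and derives an ontology element weight library from the positive example webpages; it then uses the Bayesian classification algorithm to obtain candidate webpages; finally, it computes the semantic relevance of each candidate webpage against the domain ontology and filters accordingly. The method can distinguish positive from negative example webpages within the same domain while keeping filtering close to real time. In tests on game-domain webpages, precision and recall both exceeded 98%, and semantic analysis of a game-related webpage took 1 to 2 s on average, which has little effect on the user's browsing speed; the results are satisfactory.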

Keywords:webpage filtering  webpage shielding  semantic filtering
Received:2013-09-18

A Method of Filtering Chinese Webpage
LIU Jie, LUO Li-ming, WU Yu-hang, MA Yi-fang and CAI Hong-mei. A Method of Filtering Chinese Webpage[J]. Journal of Beijing Institute of Technology (Natural Science Edition), 2014, 34(5): 533-536
Authors:LIU Jie  LUO Li-ming  WU Yu-hang  MA Yi-fang  CAI Hong-mei
Affiliation:College of Information Engineering, Capital Normal University, Beijing 100048, China
Abstract:In view of the adverse effects of the many undesirable webpages on the Internet, a method based on the Bayesian classification algorithm and a domain ontology is proposed to filter unwanted Chinese webpages. The method first calculates the weights of domain feature words from positive and negative domain webpages, establishes a domain feature lexicon, constructs the domain ontology, and derives a weight library of ontology elements from the positive domain webpages; it then obtains candidate webpages using the Bayesian classification algorithm; finally, it semantically analyzes and filters the candidates according to the domain ontology. The method can distinguish positive from negative webpages within the same domain while also maintaining good real-time filtering performance. Experiments on a large number of game-related webpages show promising results: precision and recall both exceed 98%, and the average time to semantically analyze one game webpage is 1 to 2 s, which has little effect on the user's browsing speed.
Keywords:webpage filtering  webpage shielding  semantic filtering
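
The two-stage flow described in the abstract (Bayesian classification to select candidate webpages, then an ontology-based semantic relevance check) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract gives no formulas, so the add-one smoothing, the relevance score (the share of total ontology element weight found in the page), the 0.5 threshold, and every function and parameter name here are assumptions. Pages are assumed to be already segmented into word lists.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model on pre-tokenized pages."""
    vocab = {tok for doc in docs for tok in doc}
    class_docs = Counter(labels)
    token_counts = {c: Counter() for c in class_docs}
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    model = {}
    for c in class_docs:
        # Add-one (Laplace) smoothing; an assumption, not from the paper.
        denom = sum(token_counts[c].values()) + len(vocab)
        model[c] = {
            "log_prior": math.log(class_docs[c] / len(docs)),
            "log_like": {
                t: math.log((token_counts[c][t] + 1) / denom) for t in vocab
            },
        }
    return model


def classify(model, doc):
    """Return the most probable class; tokens outside the vocabulary are skipped."""
    def score(c):
        m = model[c]
        return m["log_prior"] + sum(
            m["log_like"][t] for t in doc if t in m["log_like"]
        )
    return max(model, key=score)


def semantic_relevance(doc, element_weights):
    """Hypothetical score: fraction of total ontology element weight present in the page."""
    total = sum(element_weights.values())
    if total == 0:
        return 0.0
    present = set(doc)
    return sum(w for e, w in element_weights.items() if e in present) / total


def passes_both_stages(model, doc, element_weights, target, threshold=0.5):
    """Stage 1 keeps only candidates of the target class; stage 2 checks relevance."""
    if classify(model, doc) != target:
        return False
    return semantic_relevance(doc, element_weights) >= threshold
```

Running the inexpensive classifier first means the ontology comparison is applied only to candidate pages, which is one plausible reading of how the method keeps filtering close to real time.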
This article is indexed in CNKI, Wanfang Data, and other databases.