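Document Copy Detection Technology (文档复制检测技术)
Cite this article: 麻会东, 刘国华, 梁鹏, 苑迎. 文档复制检测技术[J]. 燕山大学学报 (Journal of Yanshan University), 2007, 31(5): 410-417.
Authors: 麻会东 (MA Hui-dong), 刘国华 (LIU Guo-hua), 梁鹏 (LIANG Peng), 苑迎 (YUAN Ying)
Affiliation: College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
Funding: Key Science and Technology Research Project of the Ministry of Education; Natural Science Foundation of the Hebei Provincial Department of Education
Abstract: With the rapid development of digital libraries and the Internet, digital documents are readily available. In recent years, academic plagiarism has been widely reported in the press, and the growing number of duplicate web pages on the Internet lowers retrieval efficiency and inconveniences users. Document copy detection plays an important role in protecting intellectual property and improving search engines, and it has recently been a research hotspot in the field of database security. Document copy detection methods fall into two classes: methods based on word-frequency statistics and methods based on string matching. This paper analyzes in detail the existing copy detection techniques built on these two classes of methods, points out their strengths and weaknesses, and proposes improvements for problems that both classes share. Finally, it summarizes the properties a copy detection technique should satisfy and discusses detection accuracy and the rules for decomposing documents.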

Keywords: copy detection  plagiarism  fingerprint  text chunk  matching
Article ID: 1007-791X(2007)05-0410-08
Revised: 2006-12-20
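The abstract divides copy detection into word-frequency methods and string-matching methods. As a rough illustration of the first class only (not the method of any system the paper surveys), the sketch below compares two documents by the cosine of their term-frequency vectors. The function names `term_frequencies` and `cosine_similarity` are hypothetical, and the regex tokenizer is an assumption: it works for space-separated text, but Chinese text would first need word segmentation, since `\w+` treats a run of Han characters as a single token.

```python
import math
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count lowercase word tokens (assumption: simple regex tokenization)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between two term-frequency vectors, in [0, 1]."""
    tf_a = term_frequencies(doc_a)
    tf_b = term_frequencies(doc_b)
    # Dot product only needs the words the two documents share.
    dot = sum(tf_a[w] * tf_b[w] for w in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    original = "copy detection protects intellectual property"
    suspect = "intellectual property is protected by copy detection"
    unrelated = "search engines index web pages"
    print(round(cosine_similarity(original, suspect), 3))    # prints 0.676
    print(round(cosine_similarity(original, unrelated), 3))  # prints 0.0
```

Because only word counts are compared, word order is ignored entirely: "copy detection" and "detection copy" score 1.0. This makes frequency-based comparison robust to light rewording but also prone to flagging documents that merely share vocabulary.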

Document copy detection technology
MA Hui-dong, LIU Guo-hua, LIANG Peng, YUAN Ying. Document copy detection technology[J]. Journal of Yanshan University, 2007, 31(5): 410-417.
Authors: MA Hui-dong, LIU Guo-hua, LIANG Peng, YUAN Ying
Institution: College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China
Abstract: With the rapid development of digital libraries and the Internet, digital documents are easily acquired. In recent years there have been many news reports of research plagiarism, and the growing number of duplicate pages on the Web lowers search efficiency and inconveniences users. Copy detection plays an important role in intellectual property protection and information retrieval, and it is a hot topic in the field of database security. There are two approaches to copy detection: one is based on the frequencies of the words that appear in a document, the other on string matching. Existing systems based on the two approaches are analyzed, and their merits and shortcomings are pointed out. An improved scheme is proposed for problems shared by both approaches. The properties that a copy detection technique should satisfy are summarized, and the accuracy of detection and the rules of document division are discussed.
Keywords:copy detection  plagiarism  fingerprint  chunk  match
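The abstract names string matching as the second approach, and the keywords mention fingerprints and chunks. The sketch below illustrates that idea under stated assumptions; it is not the scheme proposed in the paper. Each document is split into overlapping chunks of `k` consecutive words (k = 3 is an arbitrary choice), each chunk is hashed into a 64-bit fingerprint, and the score is the fraction of the suspect document's fingerprints that also appear in the original. The names `chunk_fingerprints` and `containment` are hypothetical, and BLAKE2b is used only because it gives a hash that is stable across runs, unlike Python's salted built-in `hash()`.

```python
from __future__ import annotations

import hashlib
import re


def chunk_fingerprints(text: str, k: int = 3) -> set[int]:
    """Hash every run of k consecutive words (overlapping chunks).

    Assumption: regex word tokenization and k=3; real systems differ
    in how they decompose documents into chunks.
    """
    words = re.findall(r"\w+", text.lower())
    fingerprints = set()
    for i in range(len(words) - k + 1):
        chunk = " ".join(words[i:i + k])
        # Stable 64-bit fingerprint of the chunk text.
        digest = hashlib.blake2b(chunk.encode("utf-8"), digest_size=8).digest()
        fingerprints.add(int.from_bytes(digest, "big"))
    return fingerprints


def containment(suspect: str, original: str, k: int = 3) -> float:
    """Fraction of the suspect's chunk fingerprints also found in the original."""
    fp_suspect = chunk_fingerprints(suspect, k)
    if not fp_suspect:
        return 0.0  # fewer than k words: nothing to compare
    fp_original = chunk_fingerprints(original, k)
    return len(fp_suspect & fp_original) / len(fp_suspect)


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    suspect = "a quick brown fox jumps over a sleeping cat"
    # 3 of the suspect's 7 three-word chunks occur in the original.
    print(round(containment(suspect, original), 3))  # prints 0.429
```

Because chunks preserve local word order, reordering words changes the fingerprints: "quick brown fox" and "fox brown quick" share no chunk. The choice of `k` trades sensitivity (small `k` matches short common phrases) against robustness to small edits (large `k` is broken by a single changed word).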
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).