
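A Survey of Label Noise in Classification Tasks
Cite this article:Tong Qiang, Diao Enhu, Li Dan, Chen Tongtong, Liu Xuhong, Liu Xiulei. A Survey of Label Noise in Classification Tasks[J]. Science Technology and Engineering, 2022, 22(31): 13626-13635
Authors:Tong Qiang  Diao Enhu  Li Dan  Chen Tongtong  Liu Xuhong  Liu Xiulei
Affiliation:Beijing Information Science and Technology University
Funding:Promoting Classified Development of Universities: Key Research Cultivation Project (2121YJPY225); Innovation Capacity Building of Scientific Research Institutions: Institute of Data Science and Intelligence Analysis; Promoting Connotative Development of Universities: Innovative Research Platform Construction Project for Edge Computing (2020KYNH105)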
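Abstract:In recent years, with the development of machine learning, the performance of classification systems has improved greatly. Models need large amounts of labeled data for training to reach the required results, yet obtaining high-quality annotated data is time-consuming and laborious. To reduce costs, methods such as crowdsourcing and automated systems have emerged for labeling training data. However, these labeling methods often produce a large number of incorrect labels, known as label noise. In addition, factors such as insufficient information, expert errors, and encoding errors can also contaminate labels. Improper handling of label noise during training may reduce prediction precision and accuracy or increase model complexity. Research on label noise is therefore of great significance for promoting the application of machine learning in various fields and for reducing the deployment cost of machine learning algorithms. By reviewing the causes and effects of label noise and some of the techniques developed in recent years to cope with it, this paper analyzes the current state of research on label noise and its prospects.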

Keywords:machine learning   classification system   data annotation   model complexity   label noise
Received:2021-12-16
Revised:2022-08-17

A Survey of Label Noise in Classification
Tong Qiang,Diao Enhu,Li Dan,Chen Tongtong,Liu Xuhong,Liu Xiulei. A Survey of Label Noise in Classification[J]. Science Technology and Engineering, 2022, 22(31): 13626-13635
Authors:Tong Qiang  Diao Enhu  Li Dan  Chen Tongtong  Liu Xuhong  Liu Xiulei
Affiliation:Beijing Information Science and Technology University
Abstract:In recent years, with the development of machine learning, the performance of classification systems has improved greatly. That performance depends largely on the quality of the training samples, yet high-quality labeled data is time-consuming to obtain. To reduce costs, training data is often labeled by methods such as crowdsourcing and automated systems. However, these methods often mislabel a large amount of data, producing what is known as label noise. In addition, insufficient information, expert errors, and coding errors can also corrupt labels. If label noise is not handled appropriately during training, it may reduce prediction accuracy or increase model complexity. Research on label noise is therefore of great significance for promoting the application of machine learning in various fields and for reducing the deployment cost of machine learning algorithms. This paper provides a comprehensive introduction to the topic: it describes the types and effects of label noise and analyzes methods for handling it.
Keywords:machine learning   classification system   data annotation   model complexity   label noise  
