Research on a controllable active learning algorithm combining sample uncertainty and representativeness

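Cite this article:	HU Zheng-ping, GAO Wen-tao, WAN Chun-yan. Research on a controllable active learning algorithm combining sample uncertainty and representativeness[J]. Journal of Yanshan University, 2009, 33(4): 341-346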

Authors:	HU Zheng-ping, GAO Wen-tao, WAN Chun-yan

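Affiliations:	1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004; 2. Qiqihar No. 3 Middle School, Qiqihar, Heilongjiang 161000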

Funding:	Natural Science Foundation of Hebei Province; China Postdoctoral Science Foundation

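Abstract:	By selecting the most informative samples and submitting them to experts for labeling, active learning algorithms can effectively reduce the effort wasted on labeling uninformative samples. Taking into account both the uncertain samples that lie near the classification boundary and the representative samples derived from the prior distribution, this paper constructs a controllable active learning algorithm that combines uncertainty and representativeness. First, an uncertainty confidence model is built from the kNN distribution of the samples; this approach requires neither the specific type of the sample distribution nor the computation of its parameters. Next, the samples are clustered on the basis of a sample aggregation-degree model, and a representativeness confidence model is built on the clustering result. Finally, the uncertainty confidence model and the representativeness confidence model are combined into a controllable active learning strategy, so that the samples chosen in each round of active learning are more "valuable". Simulation experiments on the UCI machine learning repository show that the approach is reasonable and feasible: on the datasets used, the proposed method needs far fewer samples than random sampling to reach the same target accuracy.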
Keywords:	controllable active learning; uncertain samples; prior sample distribution; representative samples
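The three steps in the abstract (a kNN-based uncertainty model, a representativeness model, and a controllable combination of the two) can be illustrated with a short Python sketch. It is not the authors' implementation, and several details are assumptions: uncertainty is taken as the normalized entropy of the labels among the k nearest labeled samples; the aggregation-degree clustering step is replaced by a plain local-density score (no clustering is performed); and the two scores are mixed linearly with a hypothetical control weight `lam`. The function names `knn_uncertainty`, `local_density` and `select_query` are likewise hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Point = Sequence[float]
Labeled = Tuple[Point, int]  # (feature vector, class label)


def knn_uncertainty(x: Point, labeled: List[Labeled], k: int, n_classes: int) -> float:
    """Assumed uncertainty model: normalized entropy of the labels of the
    k nearest labeled samples. 0.0 means all neighbours agree (x lies inside
    a class region); 1.0 means the labels are spread evenly over all classes
    (x lies near a boundary). No distribution type or parameters are needed."""
    if n_classes < 2:
        return 0.0
    neighbours = sorted(labeled, key=lambda pl: math.dist(x, pl[0]))[:k]
    counts = Counter(label for _, label in neighbours)
    n = len(neighbours)
    entropy = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n_classes)


def local_density(i: int, points: List[Point], k: int) -> float:
    """Stand-in for the aggregation degree (assumption, no clustering):
    1 / (1 + mean distance from points[i] to its k nearest other points)."""
    d = sorted(math.dist(points[i], p) for j, p in enumerate(points) if j != i)[:k]
    if not d:
        return 1.0
    return 1.0 / (1.0 + sum(d) / len(d))


def select_query(pool: List[Point], labeled: List[Labeled], k: int = 3, lam: float = 0.5):
    """Return (index of the pool sample to label next, combined scores).

    lam is a hypothetical control weight in [0, 1]: lam = 1 queries purely by
    uncertainty, lam = 0 purely by representativeness. pool must be non-empty.
    """
    n_classes = len({y for _, y in labeled})
    # Density is measured against all known samples (pool first, then labeled),
    # so pool index i is also index i in `everything`.
    everything = list(pool) + [p for p, _ in labeled]
    dens = [local_density(i, everything, k) for i in range(len(pool))]
    max_d = max(dens)
    scores = [
        lam * knn_uncertainty(x, labeled, k, n_classes) + (1 - lam) * dens[i] / max_d
        for i, x in enumerate(pool)
    ]
    best = max(range(len(pool)), key=scores.__getitem__)
    return best, scores
```

For example, with three labeled points per class on the lines x = 0 and x = 4, k = 2, and a pool holding the midpoint (2, 1) plus a tight cluster near (0.5, 1), setting `lam = 1` selects the midpoint between the classes, while `lam = 0` selects the centre of the cluster; intermediate values trade the two criteria off, which is the kind of control the abstract describes.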
Controlling active learning algorithm based on uncertainty and representative of data selection

HU Zheng-ping,GAO Wen-tao,WAN Chun-yan. Controlling active learning algorithm based on uncertainty and representative of data selection[J]. Journal of Yanshan University, 2009, 33(4): 341-346

Authors:	HU Zheng-ping, GAO Wen-tao, WAN Chun-yan

Affiliations:	1. College of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004, China; 2. Qiqihaer Third Middle School, Qiqihaer, Heilongjiang 161000

Abstract:	Active learning algorithms can effectively reduce the effort spent labeling uninformative instances by selecting the most informative examples for experts to label. Fully considering both the uncertain samples close to the classification boundary and the representative samples near the center of the prior data distribution, a controlling active learning method based on the uncertainty and representativeness of data selection is presented. First, an uncertainty confidence level model is constructed using the kNN distribution of sampl...

Keywords:	controlling active learning; uncertainty samples; prior data distribution; representative samples
This article is indexed in CNKI, VIP, Wanfang Data and other databases.
