Systems Engineering and Electronics, 2021, Vol. 43, Issue (3): 700-708. doi: 10.12305/j.issn.1001-506X.2021.03.13

• Systems Engineering

Subsampling-oriented active learning method for multi-category classification problems

Wei SHI, Honglan HUANG, Yanghe FENG, Zhong LIU

  1. College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
  • Received: 2020-03-13 Online: 2021-03-01 Published: 2021-03-16
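  • About the authors: Wei SHI (1997-), male, master's student; research interests: few-shot active learning and hierarchical reinforcement learning. E-mail: shiwei15@nudt.edu.cn | Honglan HUANG (1995-), female, Ph.D. candidate; research interests: active learning, reinforcement learning and few-shot learning. E-mail: huanghonglan17@nudt.edu.cn | Yanghe FENG (1985-), male, associate professor, Ph.D.; research interests: cognitive computing, deep learning, deep reinforcement learning and active learning. E-mail: fengyanghe@yeah.net | Zhong LIU (1968-), male, professor, Ph.D.; research interests: intelligent command and control, and planning system technology. E-mail: liuzhong@nudt.edu.cn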
  • Funding:
    National Natural Science Foundation of China (71701205)



Abstract:

The computational cost of traditional active learning methods grows exponentially with problem size, which makes them difficult to apply to large-scale multi-category data classification tasks. To address this problem, a subsampling-based active learning (SBAL) algorithm is designed. The algorithm integrates an unsupervised clustering algorithm with a traditional active learning method and inserts a subsampling operation between the two. This operation significantly reduces the algorithm's time complexity and shortens experimental running time while maintaining accuracy, so classification problems on large-scale data sets can be handled more efficiently. Experimental results show that SBAL outperforms the traditional active learning algorithm. This demonstrates that the proposed method overcomes the inability of traditional active learning methods to handle multi-category classification of large-scale data sets.

Key words: subsampling, active learning, unsupervised clustering, multi-category classification problem
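The abstract describes SBAL only at the pipeline level: unsupervised clustering, then subsampling, then an active learning query. The sketch below illustrates that general shape and is not the authors' implementation. Every concrete choice in it is an assumption not stated in the paper:

* k-means as the clustering step.
* A fixed-size random subsample drawn from each cluster.
* A nearest-centroid classifier.
* Margin-based uncertainty sampling.

All function names (`kmeans`, `subsample_by_cluster`, `margin`, `query`) are hypothetical. The point it shows is that the uncertainty scoring runs over at most `k * per_cluster` candidates instead of the whole unlabeled pool.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (assumed clustering step); returns a cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def subsample_by_cluster(assign, per_cluster, seed=0):
    """Hypothetical subsampling step: draw at most `per_cluster` pool indices per cluster."""
    rng = random.Random(seed)
    by_cluster = {}
    for idx, c in enumerate(assign):
        by_cluster.setdefault(c, []).append(idx)
    picked = []
    for c in sorted(by_cluster):
        members = by_cluster[c]
        picked.extend(rng.sample(members, min(per_cluster, len(members))))
    return picked


def class_centroids(X, y):
    """Mean feature vector of each labeled class (assumed nearest-centroid model)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label], counts[label] = [0.0] * len(x), 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}


def margin(x, centroids):
    """Gap between the two nearest class centroids; small gap = uncertain point."""
    d = sorted(sq_dist(x, c) for c in centroids.values())
    return d[1] - d[0] if len(d) > 1 else float("inf")


def query(pool, labeled_X, labeled_y, k, per_cluster, batch, seed=0):
    """Cluster the pool, subsample per cluster, then pick the `batch` most uncertain candidates."""
    assign = kmeans(pool, k, seed=seed)
    candidates = subsample_by_cluster(assign, per_cluster, seed=seed)
    cents = class_centroids(labeled_X, labeled_y)
    # Only the subsampled candidates are scored, not the full pool.
    candidates.sort(key=lambda i: margin(pool[i], cents))
    return candidates[:batch]


if __name__ == "__main__":
    rng = random.Random(42)
    blobs = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    pool = [[cx + rng.gauss(0, 1), cy + rng.gauss(0, 1)] for cx, cy in blobs for _ in range(100)]
    print(query(pool, [list(b) for b in blobs], [0, 1, 2], k=3, per_cluster=10, batch=5))
```

In this sketch the pool indices returned by `query` would be sent to an annotator, and the loop would repeat with the enlarged labeled set. The k-means pass still touches every point. The assumed saving comes from running the (typically more expensive) model-based uncertainty scoring on the subsample only.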
