Unbalanced Data Mining of Self-Encoding Network under Spectral Clustering Undersampling

    Abstract:

    Unbalanced data sets arise in an increasingly wide range of applications, and the demands placed on them keep growing. To improve the classification accuracy over the whole data set, this study develops an unbalanced data mining method based on a self-encoding network (autoencoder), with spectral clustering undersampling as a preprocessing step. The clustering problem is converted into a multiway partition problem on an undirected graph, and spectral clustering is carried out on that graph after normalization. The majority-class data set is then selectively undersampled, and the resulting offset of the classification boundary is obtained. An autoencoder, which learns without supervision, maps the data to higher and lower dimensions to extract hidden features at each dimensionality, so that efficient representations of the data are learned at every level. The autoencoder is adjusted according to how the maximum mean discrepancy (MMD) compares with a preset threshold, and the unbalanced data mining is completed using the resulting classification boundary. Ten groups of data drawn from UCI data sets with different practical application backgrounds are used as test sets. Simulation experiments after spectral clustering undersampling show that the proposed method substantially improves the classification accuracy of the minority class as well as the overall mining performance, indicating good applicability and feasibility.
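
The preprocessing steps summarized in the abstract (graph construction, normalized spectral clustering of the majority class, selective undersampling, and an MMD-versus-threshold check) can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions, not the authors' implementation: the abstract specifies neither the affinity function, the Laplacian variant, the undersampling rule, nor the MMD kernel, so the Gaussian affinity, the normalized-Laplacian (Ng-Jordan-Weiss style) embedding, the proportional nearest-to-centroid rule in the hypothetical `cluster_undersample`, the Gaussian-kernel `mmd2_rbf`, and the `threshold` value are all assumptions. The autoencoder training step is omitted.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 sigma^2)) between rows of A and B."""
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * sigma**2))


def kmeans(Y, k, n_iter=100):
    """Plain Lloyd k-means with deterministic farthest-point initialisation."""
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([Y[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(X, k, sigma=1.0):
    """Cluster X as a weighted undirected graph via the normalised Laplacian (assumed variant)."""
    W = rbf_kernel(X, X, sigma)
    np.fill_diagonal(W, 0.0)                       # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(1))           # degree normalisation D^{-1/2}
    L_sym = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)                # eigenvalues come back in ascending order
    U = vecs[:, :k]                                # eigenvectors of the k smallest eigenvalues
    U /= np.linalg.norm(U, axis=1, keepdims=True)  # row normalisation before k-means
    return kmeans(U, k)


def cluster_undersample(X_maj, n_keep, k, sigma=1.0):
    """Hypothetical selection rule: keep about n_keep majority samples, taking from
    each spectral cluster a share proportional to its size, closest to its centroid."""
    labels = spectral_clustering(X_maj, k, sigma)
    keep = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue
        quota = max(1, round(n_keep * idx.size / len(X_maj)))
        dist = ((X_maj[idx] - X_maj[idx].mean(0)) ** 2).sum(1)
        keep.extend(idx[np.argsort(dist)[:quota]].tolist())
    return np.array(sorted(keep))


def mmd2_rbf(X, Y, sigma=1.0):
    """Biased estimate of the squared maximum mean discrepancy with a Gaussian kernel."""
    return (rbf_kernel(X, X, sigma).mean() + rbf_kernel(Y, Y, sigma).mean()
            - 2 * rbf_kernel(X, Y, sigma).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X_maj = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in blob_centers])
    keep = cluster_undersample(X_maj, n_keep=15, k=3)
    score = mmd2_rbf(X_maj, X_maj[keep])
    threshold = 0.05  # hypothetical preset threshold
    print(len(keep), score, "above threshold" if score > threshold else "within threshold")
```

In the usage block, the MMD between the retained subset and the full majority set is only a stand-in to show the threshold comparison; the abstract does not say which two distributions the authors compare when adjusting the autoencoder. Taking samples near each cluster centroid is one way to make the undersampling "selective" while keeping every sub-structure of the majority class represented; random sampling within each cluster would be an equally plausible reading.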

Cite this article:

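Wang Shufan (王舒梵), Yan Tao (严涛), Jiang Xinying (姜新盈). Unbalanced Data Mining of Self-Encoding Network under Spectral Clustering Undersampling. Computer Systems & Applications (计算机系统应用), 2021, 30(10): 331-335.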

History:
  • Received: 2020-12-24
  • Last revised: 2021-01-25
  • Published online: 2021-10-08
Copyright: Institute of Software, Chinese Academy of Sciences