10 August 2021

A Synthetic Over-sampling method with Minority and Majority classes for imbalance problems

Hadi A. Khorshidi, Uwe Aickelin

From arXiv. This work has been submitted to IEEE Transactions on Cybernetics for possible publication.

Class imbalance is a substantial challenge in classifying many real-world cases. Synthetic over-sampling methods have been effective in improving the performance of classifiers on imbalance problems. However, most synthetic over-sampling methods generate non-diverse synthetic instances within the convex hull formed by the existing minority instances, because they concentrate only on the minority class and ignore the vast information provided by the majority class. They also often perform poorly on extremely imbalanced data: the fewer the minority instances, the less information is available for generating synthetic instances. Moreover, existing methods that generate synthetic instances using the distributional information of the majority class cannot perform effectively when the majority class has a multi-modal distribution. We propose a new method, Synthetic Over-sampling with Minority and Majority classes (SOMM), to generate diverse and adaptable synthetic instances. SOMM generates synthetic instances diversely within the minority data space. It then updates the generated instances adaptively to the neighbourhood, which includes both classes. Thus, SOMM performs well for both binary and multiclass imbalance problems. We examine the performance of SOMM for binary and multiclass problems using benchmark data sets at different imbalance levels. The empirical results show the superiority of SOMM over other existing methods.
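
The convex-hull limitation mentioned in the abstract can be illustrated with a minimal sketch of interpolation-based over-sampling, in the style of SMOTE. This is not SOMM and is not taken from the paper; the function name `interpolate_minority`, its parameters, and the toy data are assumptions made for illustration only. Each synthetic point is placed on the segment between a minority instance and one of its nearest minority neighbours, so it can never leave the convex hull of the existing minority instances, and the majority class is never consulted.

```python
import math
import random


def interpolate_minority(minority, n_new, k=3, seed=0):
    """Hypothetical SMOTE-style sketch (not SOMM).

    Every synthetic point is a convex combination of two minority
    instances, so it stays inside the minority convex hull.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        base = minority[idx]
        # Rank the other minority instances by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != idx]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = interpolate_minority(minority, n_new=20)
```

With the toy data above, every generated point falls inside the unit square, which is the convex hull of the four minority instances. With very few minority instances that region shrinks accordingly, which is the situation the abstract describes for extremely imbalanced data.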

