ECOD: Unsupervised Outlier Detection Using Empirical Cumulative Distribution Functions - Zhuanzhi (专知) papers

Outliers · Estimation/Estimators · Cumulative distribution function · Data points · Extensibility

January 2, 2022


Zheng Li, Yue Zhao, Xiyang Hu, Nicola Botta, Cezar Ionescu, George H. Chen

From arXiv. Code is available in the PyOD library at https://github.com/yzhao062/pyod

Outlier detection refers to the identification of data points that deviate from a general data distribution. Existing unsupervised approaches often suffer from high computational cost, complex hyperparameter tuning, and limited interpretability, especially when working with large, high-dimensional datasets. To address these issues, we present a simple yet effective algorithm called ECOD (Empirical-Cumulative-distribution-based Outlier Detection), which is inspired by the fact that outliers are often the "rare events" that appear in the tails of a distribution. In a nutshell, ECOD first estimates the underlying distribution of the input data in a nonparametric fashion by computing the empirical cumulative distribution per dimension of the data. ECOD then uses these empirical distributions to estimate tail probabilities per dimension for each data point. Finally, ECOD computes an outlier score of each data point by aggregating estimated tail probabilities across dimensions. Our contributions are as follows: (1) we propose a novel outlier detection method called ECOD, which is both parameter-free and easy to interpret; (2) we perform extensive experiments on 30 benchmark datasets, where we find that ECOD outperforms 11 state-of-the-art baselines in terms of accuracy, efficiency, and scalability; and (3) we release an easy-to-use and scalable (with distributed support) Python implementation for accessibility and reproducibility.
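
The three steps described in the abstract (per-dimension empirical CDFs, per-dimension tail probabilities, aggregation across dimensions) can be illustrated with a minimal pure-Python sketch. This is not the PyOD implementation. The abstract does not state how the tail probabilities are aggregated, so the sketch assumes a sum of negative log tail probabilities per tail and takes the larger of the two sums as the score. The function name `ecod_scores` and the toy data are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right


def ecod_scores(rows):
    """Simplified ECOD-style outlier scores; a larger score means more outlying."""
    n = len(rows)
    d = len(rows[0])
    left = [0.0] * n   # per point: sum over dimensions of -log(left-tail probability)
    right = [0.0] * n  # per point: sum over dimensions of -log(right-tail probability)
    for j in range(d):
        # Sorting one column gives its empirical CDF via binary search.
        col = sorted(row[j] for row in rows)
        for i, row in enumerate(rows):
            # Left tail: fraction of samples <= x (never below 1/n, so the log is finite).
            p_left = bisect_right(col, row[j]) / n
            # Right tail: fraction of samples >= x.
            p_right = (n - bisect_left(col, row[j])) / n
            left[i] -= math.log(p_left)
            right[i] -= math.log(p_right)
    # Assumption: score each point by the more extreme of its two tail aggregates.
    return [max(l, r) for l, r in zip(left, right)]


# Toy data: five points near (0, 1) and one point far out in both dimensions.
data = [
    [0.0, 1.1],
    [0.1, 0.9],
    [-0.1, 1.0],
    [0.05, 1.05],
    [-0.05, 0.95],
    [5.0, 9.0],
]
scores = ecod_scores(data)
most_outlying = max(range(len(data)), key=scores.__getitem__)
print(most_outlying, [round(s, 3) for s in scores])
```

Because the scores depend only on ranks within each dimension, no hyperparameters are needed, and each dimension's contribution to a point's score can be read off separately, which is consistent with the parameter-free and interpretable character the abstract claims.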

