Coresets for Classification -- Simplified and Strengthened - 专知 Papers

Subsampling · Logistic loss · Importance sampling · Loss · Loss function (machine learning)

June 8, 2021

Coresets for Classification -- Simplified and Strengthened

Tung Mai, Anup B. Rao, Cameron Musco

We give relative-error coresets for training linear classifiers with a broad class of loss functions, including the logistic loss and hinge loss. Our construction achieves $(1\pm \epsilon)$ relative error with $\tilde O(d \cdot \mu_y(X)^2/\epsilon^2)$ points, where $\mu_y(X)$ is a natural complexity measure of the data matrix $X \in \mathbb{R}^{n \times d}$ and label vector $y \in \{-1,1\}^n$, introduced by Munteanu et al. (2018). Our result is based on subsampling data points with probabilities proportional to their $\ell_1$ Lewis weights. It significantly improves on existing theoretical bounds and performs well in practice, outperforming uniform subsampling as well as other importance sampling methods. Our sampling distribution does not depend on the labels, so it can be used for active learning. It also does not depend on the specific loss function, so a single coreset can be used in multiple training scenarios.
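The sampling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the $\ell_1$ Lewis weights are approximated with the standard fixed-point iteration $w_i \leftarrow \sqrt{x_i^\top (X^\top W^{-1} X)^{-1} x_i}$, that $X$ has full column rank and no zero rows, and that sampled points are reweighted by $1/(m p_i)$ so the weighted loss is an unbiased estimate of the full loss. The function names, the iteration count, and the sample size `m` are hypothetical choices for illustration; the sample size required by the paper's bound is not computed here.

```python
import numpy as np

def l1_lewis_weights(X, n_iter=40):
    """Approximate l1 Lewis weights by fixed-point iteration (assumed scheme)."""
    n, d = X.shape
    w = np.full(n, d / n)  # uniform start; the weights sum to d at the fixed point
    for _ in range(n_iter):
        M = X.T @ (X / w[:, None])            # X^T W^{-1} X, a d x d matrix
        sol = np.linalg.solve(M, X.T)         # M^{-1} X^T, a d x n matrix
        tau = np.einsum("ij,ji->i", X, sol)   # x_i^T M^{-1} x_i for every row i
        w = np.sqrt(np.maximum(tau, 0.0))
    return w

def lewis_coreset(X, m, rng):
    """Draw m row indices with probability proportional to the Lewis weights.

    The labels y are never used, matching the label-independence in the abstract.
    """
    w = l1_lewis_weights(X)
    p = w / w.sum()
    idx = rng.choice(len(p), size=m, replace=True, p=p)
    sample_weights = 1.0 / (m * p[idx])       # importance-sampling reweighting
    return idx, sample_weights

def weighted_logistic_loss(X, y, beta, sample_weights=None):
    """Sum of log(1 + exp(-y_i <x_i, beta>)), optionally weighted per point."""
    losses = np.logaddexp(0.0, -y * (X @ beta))
    if sample_weights is None:
        return losses.sum()
    return (sample_weights * losses).sum()
```

Because the sampling distribution depends only on `X`, the same `idx` and `sample_weights` could be passed to a hinge-loss objective as well. For a given `beta`, one would compare `weighted_logistic_loss(X[idx], y[idx], beta, sample_weights)` against the full-data value `weighted_logistic_loss(X, y, beta)`.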
