Is all of machine learning supervised to some degree?

The field of machine learning has traditionally been divided, for pedagogical purposes, into supervised vs. unsupervised learning, where supervised learning typically refers to learning from labeled data and unsupervised learning typically refers to learning from unlabeled data. In this paper, we assert that all machine learning is in fact supervised to some degree, and that the scope of supervision is necessarily commensurate with the scope of learning potential. In particular, we argue that clustering algorithms such as k-means, and dimensionality reduction algorithms such as principal component analysis, variational autoencoders, and deep belief networks, are each internally supervised by the data itself to learn their respective representations of its features. Furthermore, these algorithms are not capable of external inference until their respective outputs (clusters, principal components, or representation codes) have, in effect, been identified and externally labeled. As such, they do not qualify as examples of unsupervised learning. We propose that the categorization "supervised vs. unsupervised learning" be dispensed with, and that learning algorithms instead be categorized as either internally or externally supervised (or both). We believe this change in perspective will yield new fundamental insights into the structure and character of data and of learning algorithms.
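To make the k-means part of this argument concrete, here is a minimal sketch (not taken from the paper) in plain NumPy. Lloyd's algorithm fits centroids using only the data points themselves as targets, and its output is a set of bare integer cluster indices; those indices support no external inference until a separate step attaches names to them. The farthest-first initialization, the majority-vote helper `label_clusters`, and the toy two-blob data are illustrative assumptions chosen to keep the example deterministic.

```python
import numpy as np

def farthest_first(X, k):
    """Deterministic initialization (an assumption here): start at X[0], then
    repeatedly add the point farthest from all centroids chosen so far."""
    idx = [0]
    for _ in range(1, k):
        d = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return X[idx].copy()

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm. The 'supervision' is internal: each point is the
    target its assigned centroid tries to reconstruct (squared distance)."""
    centroids = farthest_first(X, k)
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if empty).
        new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign

def label_clusters(assign, y_labeled, labeled_idx, k):
    """Hypothetical external step: name each cluster by a majority vote of
    a few externally labeled points that fall into it."""
    mapping = {}
    for j in range(k):
        votes = [y for y, i in zip(y_labeled, labeled_idx) if assign[i] == j]
        mapping[j] = max(set(votes), key=votes.count) if votes else None
    return mapping

def predict(x, centroids, mapping):
    """External inference is only possible once clusters carry labels."""
    j = int(((centroids - x) ** 2).sum(axis=1).argmin())
    return mapping[j]

# Toy data: two well-separated blobs with no labels attached.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
centroids, assign = kmeans(X, k=2)           # outputs: indices 0/1, no meaning yet
labeled_idx = [0, 1, 50, 51]                 # a handful of externally labeled points
mapping = label_clusters(assign, ["A", "A", "B", "B"], labeled_idx, k=2)
print(predict(np.array([0.2, -0.1]), centroids, mapping))  # A
```

Note that removing the call to `label_clusters` leaves `predict` with nothing meaningful to return: the clustering itself only ever produces anonymous indices, which is the sense in which the abstract says such outputs must be "externally labeled in effect".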