We study a multiclass multiple instance learning (MIL) problem in which the labels indicate only whether any instance of each class is present in a training sample. No further information is exploited, such as the number of instances of each class or the relative locations or order of the instances within a sample. This weakly supervised learning problem can be solved exactly by maximizing the likelihood of the model given the observations. It has applications in tasks such as multiple object detection and localization for image understanding. We discuss its relationship to the classic classification problem, traditional MIL, and connectionist temporal classification (CTC). We use image recognition as the example task for developing our method, although the method applies to data of higher or lower dimensionality without much modification. Experimental results show that our method can train all-convolutional neural networks to solve real-world multiple object detection and localization tasks from weak annotations. One example is transcribing house number sequences from the Google Street View imagery dataset.
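The following is a minimal Python sketch of what "maximizing the likelihood given set-valued labels" can look like. It rests on assumptions that the abstract does not state. A model outputs, at each of N locations, a probability vector over the object classes plus a background class (index 0). Locations are labeled independently. Under these assumptions, inclusion-exclusion over subsets S of the label set L gives the probability that the set of classes occurring in a sample equals L:

P(L) = sum over S ⊆ L of (-1)^(|L|-|S|) * prod_n sum over c in S ∪ {0} of p_n(c)

The function name `set_likelihood` and the data layout are hypothetical. This illustrates the idea and is not necessarily the exact formulation used in this work.

```python
import itertools


def set_likelihood(probs, label_set, background=0):
    """Probability that the set of non-background classes occurring across all
    locations equals `label_set`, assuming independent per-location predictions.

    probs: list of per-location probability vectors (each sums to 1);
           index `background` is the "no object" class (an assumption).
    label_set: iterable of class indices present in the sample (no background).
    """
    labels = sorted(set(label_set))
    total = 0.0
    # Inclusion-exclusion: sum over every subset S of the label set.
    for r in range(len(labels) + 1):
        for subset in itertools.combinations(labels, r):
            allowed = set(subset) | {background}
            # Probability that every location takes a class in S ∪ {background}.
            prod = 1.0
            for p in probs:
                prod *= sum(p[c] for c in allowed)
            # Sign is (-1)^(|L| - |S|).
            total += (-1) ** (len(labels) - r) * prod
    return total


if __name__ == "__main__":
    # Two locations; classes: 0 = background, 1 and 2 = objects.
    probs = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
    print(set_likelihood(probs, {1}))  # ≈ 0.51
```

Under this sketch, training would minimize `-log(set_likelihood(...))` for each sample's label set. The sum has 2^|L| terms, which stays cheap when a sample contains only a few distinct classes. For long inputs, a practical version would likely work in log space to avoid underflow of the products.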