High content imaging assays can capture rich phenotypic response data for large sets of compound treatments, aiding in the characterization and discovery of novel drugs. However, extracting representative features from high content images that capture subtle nuances in phenotypes remains challenging. The lack of high-quality labels makes it difficult to achieve satisfactory results with supervised deep learning. Self-supervised learning methods, which learn from automatically generated labels, have shown great success on natural images and offer an attractive alternative for microscopy images as well. However, we find that self-supervised learning techniques underperform on high content imaging assays. One challenge is the presence of undesirable domain shifts in the data, known as batch effects, which may be caused by biological noise or uncontrolled experimental conditions. To address this, we introduce Cross-Domain Consistency Learning (CDCL), a novel approach that is able to learn in the presence of batch effects. CDCL enforces the learning of biological similarities while disregarding undesirable batch-specific signals, which leads to more useful and versatile representations. The resulting features are organised according to morphological changes and are better suited to downstream tasks such as distinguishing treatments and modes of action.