The discovery of novel inhibitor molecules for emerging drug-target proteins is widely acknowledged as a challenging inverse design problem: exhaustive exploration of the vast chemical search space is impractical, especially when the target structure or active molecules are unknown. Here we experimentally validate the broad utility of a deep generative framework that is trained at scale on protein sequences, small molecules, and their mutual interactions, and that is unbiased toward any specific target. As demonstrators, we consider two dissimilar and relevant SARS-CoV-2 targets: the main protease and the spike protein (receptor binding domain, RBD). For target-aware design of novel inhibitor molecules, we sample from the generative foundation model conditioned on the protein sequence. Despite using only the target sequence information, and without any target-specific adaptation of the generative model, micromolar-level inhibition was observed in vitro for two candidates out of only four synthesized for each target. The most potent spike RBD inhibitor also exhibited activity against several variants in live virus neutralization assays. These results therefore establish that a single, broadly deployable generative foundation model is effective and efficient for accelerated hit discovery, even in the most general case where neither target structure nor binder information is available.