PICABench: How Far Are We from Physically Realistic Image Editing?

Image editing has achieved remarkable progress recently. Modern editing models can already follow complex instructions to manipulate the original content. However, beyond completing the editing instruction, the accompanying physical effects are key to the realism of the generated result. For example, removing an object should also remove its shadow, its reflections, and its interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion and overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimensions (spanning optics, mechanics, and state transitions) for most common editing operations (add, remove, attribute change, etc.). We further propose PICAEval, a reliable evaluation protocol that uses VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset, PICA-100K. After evaluating most mainstream models, we observe that physical realism remains a challenging problem with ample room for improvement. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.

