Shot assembly is a crucial step in film production and video editing: shots are sequenced and arranged to construct a narrative, convey information, or evoke emotion. Traditionally, experienced editors perform this process manually. Current intelligent video editing technologies can automate some editing tasks, but they often fail to capture the creator's unique artistic expression in shot assembly. To address this challenge, we propose an energy-based optimization method for video shot assembly. Specifically, we first perform visual-semantic matching between a script generated by a large language model and a video library to obtain subsets of candidate shots aligned with the script semantics. Next, we segment and label the shots from reference videos, extracting attributes such as shot size, camera motion, and semantics. We then employ energy-based models to learn from these attributes and score candidate shot sequences by their alignment with the reference style. Finally, we optimize the shot assembly by combining multiple syntax rules, producing videos that align with the assembly style of the reference videos. Our method not only automates the arrangement and combination of independent shots according to specific logic, narrative requirements, or artistic styles, but also learns the assembly style of reference videos, creating a coherent visual sequence or holistic visual expression. With our system, even users with no prior video editing experience can create visually compelling videos. Project page: https://sobeymil.github.io/esa.com
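To make the scoring step concrete, the following is a minimal sketch of how candidate shot sequences might be ranked against a reference style. It is an illustration under stated assumptions, not the model described above: each shot is reduced to a hypothetical `(shot_size, camera_motion)` label pair, the energy of a sequence is taken to be the summed negative log-probability of its label transitions under add-alpha smoothed counts from one reference video, and the best ordering is found by exhaustive search. The function names `fit_transition_energy`, `sequence_energy`, and `best_assembly` are invented for this example.

```python
import itertools
import math
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical shot representation: (shot_size, camera_motion) labels.
Shot = Tuple[str, str]


def fit_transition_energy(
    reference: Sequence[Shot], alpha: float = 1.0
) -> Callable[[Shot, Shot], float]:
    """Build a pairwise energy from label transitions in a reference sequence.

    Assumption: energy = -log of the add-alpha smoothed transition probability.
    """
    pair_counts = Counter(zip(reference, reference[1:]))
    prev_counts = Counter(reference[:-1])
    vocab_size = len(set(reference))

    def pair_energy(prev: Shot, nxt: Shot) -> float:
        prob = (pair_counts[(prev, nxt)] + alpha) / (
            prev_counts[prev] + alpha * vocab_size
        )
        return -math.log(prob)

    return pair_energy


def sequence_energy(
    seq: Sequence[Shot], pair_energy: Callable[[Shot, Shot], float]
) -> float:
    """Sum the pairwise energies over consecutive shots; lower is a better match."""
    return sum(pair_energy(a, b) for a, b in zip(seq, seq[1:]))


def best_assembly(
    candidates: Sequence[Shot], pair_energy: Callable[[Shot, Shot], float]
) -> Tuple[Shot, ...]:
    """Return the lowest-energy ordering by brute force (small candidate sets only)."""
    return min(
        itertools.permutations(candidates),
        key=lambda seq: sequence_energy(seq, pair_energy),
    )


if __name__ == "__main__":
    wide, medium, close = ("wide", "static"), ("medium", "pan"), ("close", "static")
    reference = [wide, medium, close, wide, medium, close]
    energy = fit_transition_energy(reference)
    print(best_assembly([close, wide, medium], energy))
```

Exhaustive search is factorial in the number of candidates, so this sketch only suits a handful of shots; a larger library would need a heuristic or sampling-based search, and a learned energy would replace the simple transition counts used here.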