This memorandum presents four recommendations aimed at strengthening the principles of AI model reliability and AI model governability, as DoW, ODNI, NIST, and CAISI refine AI assurance frameworks under the AI Action Plan. Our focus is the open scientific problem of misalignment and its implications for AI model behavior. Specifically, misalignment and scheming capabilities can be red flags indicating that an AI model is insufficiently reliable and governable. To address the national security threats arising from misalignment, we recommend that DoW and the IC strategically leverage existing testing and evaluation pipelines and their OT authority to future-proof the principles of AI model reliability and AI model governability through a suite of scheming and control evaluations.