Modern cloud-native systems increasingly rely on multi-cluster deployments to support scalability, resilience, and geographic distribution. However, existing resource management approaches remain largely reactive and cluster-centric, limiting their ability to optimize system-wide behavior under dynamic workloads. These limitations result in inefficient resource utilization, delayed adaptation, and increased operational overhead across distributed environments. This paper presents an AI-driven framework for adaptive resource optimization in multi-cluster cloud systems. The proposed approach integrates predictive learning, policy-aware decision-making, and continuous feedback to enable proactive and coordinated resource management across clusters. By analyzing cross-cluster telemetry and historical execution patterns, the framework dynamically adjusts resource allocation to balance performance, cost, and reliability objectives. A prototype implementation demonstrates improved resource efficiency, faster stabilization during workload fluctuations, and reduced performance variability compared to conventional reactive approaches. The results highlight the effectiveness of intelligent, self-adaptive infrastructure management as a key enabler for scalable and resilient cloud platforms.
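To make the predict, decide, and feedback cycle summarized above more concrete, the following is a minimal sketch under stated assumptions, not the prototype's actual implementation. The abstract does not name a forecasting model or a policy format, so an exponentially weighted moving average stands in for the predictive learner, and a per-cluster replica range stands in for the policy. All names here (`ClusterPolicy`, `ClusterPlanner`, `plan_all`, `capacity_per_replica`, `headroom`) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterPolicy:
    # Hypothetical policy limits; the abstract does not specify any.
    min_replicas: int
    max_replicas: int
    capacity_per_replica: float  # load units one replica is assumed to serve


class ClusterPlanner:
    """Predict -> decide -> feedback loop for a single cluster."""

    def __init__(self, policy, alpha=0.5, headroom=1.25):
        self.policy = policy
        self.alpha = alpha        # EWMA smoothing factor (assumed value)
        self.headroom = headroom  # safety margin over the forecast (assumed value)
        self.forecast = None      # None until the first observation arrives

    def observe(self, load):
        """Feedback step: fold the newly observed load into the forecast."""
        if self.forecast is None:
            self.forecast = load
        else:
            self.forecast = self.alpha * load + (1 - self.alpha) * self.forecast

    def plan(self):
        """Decision step: size for the forecast, then clamp to the policy."""
        if self.forecast is None:
            return self.policy.min_replicas
        wanted = math.ceil(
            self.forecast * self.headroom / self.policy.capacity_per_replica
        )
        return max(self.policy.min_replicas, min(self.policy.max_replicas, wanted))


def plan_all(planners, telemetry):
    """Feed one round of cross-cluster telemetry in and return a replica plan."""
    for name, load in telemetry.items():
        planners[name].observe(load)
    return {name: planner.plan() for name, planner in planners.items()}


planners = {
    "eu": ClusterPlanner(ClusterPolicy(2, 10, 100.0)),
    "us": ClusterPlanner(ClusterPolicy(2, 10, 100.0)),
}
first_plan = plan_all(planners, {"eu": 500.0, "us": 50.0})
second_plan = plan_all(planners, {"eu": 1500.0, "us": 50.0})
```

This sketch plans each cluster independently. The coordinated, cross-cluster decisions and the cost and reliability objectives that the framework targets would need a shared objective on top of it, which is deliberately left out here.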