Benchmarking optimization algorithms is fundamental to the advancement of computational intelligence. However, widely adopted artificial test suites correspond only loosely to the diversity and complexity of real-world engineering optimization tasks. This paper presents a new benchmark suite comprising 231 bounded, continuous, unconstrained optimization problems, the majority derived from engineering design and simulation scenarios, including computational fluid dynamics and finite element analysis models. Alongside the suite, a novel performance metric is introduced that uses random sampling as a statistical reference, providing a nonlinear normalization of objective values and enabling unbiased comparison of algorithmic efficiency across heterogeneous problems. Using this framework, 20 deterministic and stochastic optimization methods were systematically evaluated over hundreds of independent runs per problem to ensure statistical robustness. The results indicate that only a few of the tested methods consistently achieve excellent performance, while several commonly used metaheuristics suffer severe efficiency loss on engineering-type problems, which underscores the limitations of conventional benchmarks. The test results are also used to analyze various features of the optimization methods, yielding practical guidelines for their application. Together, the proposed test suite and metric offer a transparent, reproducible, and practically relevant platform for evaluating and comparing optimization methods, thereby narrowing the gap between available benchmark tests and realistic engineering applications.
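To make the idea of a random-sampling reference concrete, the following minimal sketch shows one plausible way such a normalization could work. It is not the metric defined in this paper, whose exact form the abstract does not give. The sketch assumes a rank-based mapping: an objective value is scored by the fraction of uniformly sampled points that achieve a strictly lower value. The names `sample_reference`, `normalized_score`, and the `sphere` test function are hypothetical and chosen for illustration only.

```python
import bisect
import random


def sample_reference(objective, bounds, n_samples=10_000, seed=0):
    """Evaluate `objective` at uniform random points inside `bounds`.

    Returns the sampled objective values sorted in ascending order.
    Assumption: uniform sampling over the box; the paper may sample differently.
    """
    rng = random.Random(seed)
    values = [
        objective([rng.uniform(lo, hi) for lo, hi in bounds])
        for _ in range(n_samples)
    ]
    values.sort()
    return values


def normalized_score(f_value, reference):
    """Fraction of random samples strictly better (lower) than `f_value`.

    0.0: no random sample beats f_value; 1.0: every random sample beats it.
    The mapping depends on the sample distribution, so it is nonlinear in f_value.
    """
    return bisect.bisect_left(reference, f_value) / len(reference)


def sphere(x):
    """Hypothetical stand-in problem (minimization), not part of the suite."""
    return sum(xi * xi for xi in x)


bounds = [(-5.0, 5.0)] * 3
reference = sample_reference(sphere, bounds)

# Scores for the known optimum and for a mediocre point.
print(normalized_score(sphere([0.0, 0.0, 0.0]), reference))
print(normalized_score(sphere([3.0, 3.0, 3.0]), reference))
```

Because the score is a rank within each problem's own random-sample distribution, values from problems with very different objective scales land on a common [0, 1] range, which is the property that makes cross-problem comparison possible in this kind of scheme.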