In safety-critical decision-making, the environment may evolve over time, and the learner must adjust its risk level accordingly. This work investigates risk-averse online optimization in dynamic environments with varying risk levels, employing Conditional Value-at-Risk (CVaR) as the risk measure. To capture the dynamics of the environment and the risk levels, we employ the function variation metric and introduce a novel risk-level variation metric. Two information settings are considered: a first-order setting, where the learner observes both function values and their gradients, and a zeroth-order setting, where only function evaluations are available. For both cases, we develop risk-averse learning algorithms under a limited sampling budget and analyze their dynamic regret bounds in terms of the function variation, the risk-level variation, and the total number of samples. The regret analysis demonstrates the adaptability of the algorithms in non-stationary and risk-sensitive settings. Finally, numerical experiments illustrate the efficacy of the proposed methods.
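To make the risk measure concrete, the following is a minimal sketch of an empirical CVaR estimator from i.i.d. loss samples; it is illustrative only (the function name and the fixed sample-averaging scheme are our assumptions, not the paper's algorithm, which additionally handles non-stationarity and a sampling budget). Here CVaR at level alpha is taken as the mean of the worst alpha-fraction of losses, so alpha = 1 recovers the risk-neutral mean and small alpha emphasizes the tail.

```python
import numpy as np

def empirical_cvar(losses, alpha):
    """Illustrative empirical CVaR_alpha: mean of the worst
    alpha-fraction of the observed losses (alpha in (0, 1]).

    NOTE: hypothetical helper for exposition; the paper's method
    also tracks time-varying alpha_t and a limited sample budget.
    """
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # worst first
    k = max(1, int(np.ceil(alpha * losses.size)))            # tail size
    return losses[:k].mean()                                 # tail average
```

For example, with losses `[1, 2, 3, 4]` and `alpha = 0.5`, the estimator averages the two worst losses, giving `3.5`, while `alpha = 1.0` returns the plain mean `2.5`.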