Sample-driven optimal stopping: From the secretary problem to the i.i.d. prophet inequality

We take a unifying approach to single-selection optimal stopping problems with random arrival order and independent sampling of items. In the problem we consider, a decision maker (DM) initially gets to sample each of $N$ items independently with probability $p$, and can observe the relative rankings of these sampled items. Then, the DM faces the remaining items in an online fashion, observing the relative rankings of all revealed items. While scanning the sequence, the DM makes irrevocable stop/continue decisions, and her reward for stopping at the item with rank $i$ is $Y_i$. The goal of the DM is to maximize her reward. We start by studying the case in which the values $Y_i$ are known to the DM, and then move to the case in which these values are adversarial. For the former case, we write the natural linear program that captures the performance of an algorithm, and take its continuous limit. We prove a structural result about this continuous limit, which allows us to reduce the problem to a relatively simple real optimization problem. We establish that the optimal algorithm is given by a sequence of thresholds $t_1\le t_2\le\cdots$ such that the DM should stop upon seeing an item with current ranking $i$ after time $t_i$. Additionally, we are able to recover several classic results in the area, such as those for the secretary problem and the minimum ranking problem. For the adversarial case, we obtain a similar linear program with an additional stochastic dominance constraint. Using the same machinery, we are able to pin down the optimal competitive ratios for all values of $p$. Notably, we prove that as $p$ approaches 1, our guarantee converges linearly to 0.745, matching that of the i.i.d.~prophet inequality. Also interesting is the case $p=1/2$, where our bound evaluates to $0.671$, which improves upon the state of the art.
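The threshold rule described in the abstract can be illustrated with a small Monte Carlo simulation. The sketch below is not the paper's method (which goes through a linear program and its continuous limit); it only simulates a given threshold sequence. It assumes a standard continuous-time embedding in which every item draws an independent uniform arrival time in $[0,1]$ and items arriving before time $p$ form the sample set. The function names `run_threshold_policy` and `estimate_reward`, and the parameters `thresholds` (holding $t_1, t_2, \ldots$) and `rewards` (holding $Y_1, Y_2, \ldots$), are hypothetical names chosen for this illustration.

```python
import bisect
import random


def run_threshold_policy(n, p, thresholds, rewards, rng):
    """Simulate one instance and return the reward collected.

    thresholds[i-1] plays the role of t_i and rewards[i-1] the role of Y_i.
    Assumption: arrival times are i.i.d. uniform on [0, 1]; items with
    arrival time below p are samples (observed, but not selectable).
    """
    # True rank 1 is the best item. Sort items by their random arrival time.
    arrivals = sorted((rng.random(), rank) for rank in range(1, n + 1))
    seen = []  # sorted true ranks of everything observed so far
    for time, rank in arrivals:
        # Relative rank of the current item among all items observed so far.
        relative_rank = bisect.bisect_left(seen, rank) + 1
        bisect.insort(seen, rank)
        if time < p:
            continue  # sampled item: only its relative ranking is observed
        if relative_rank <= len(thresholds) and time >= thresholds[relative_rank - 1]:
            # Stop here; the reward depends on the item's true overall rank.
            return rewards[rank - 1] if rank <= len(rewards) else 0.0
    return 0.0  # never stopped


def estimate_reward(n, p, thresholds, rewards, trials=20000, seed=0):
    """Average reward of the threshold policy over independent trials."""
    rng = random.Random(seed)
    total = sum(run_threshold_policy(n, p, thresholds, rewards, rng)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    # Classic secretary problem: no samples, reward 1 only for the best item.
    print(estimate_reward(n=100, p=0.0, thresholds=[0.36788], rewards=[1.0]))
```

With $p=0$, a single threshold $t_1 = 1/e$, and $Y_1 = 1$ as the only nonzero reward, the estimate should land near $1/e \approx 0.368$, the classic secretary guarantee. Other choices of `p`, `thresholds`, and `rewards` can be plugged in to explore the setting, but the simulation does not by itself identify the optimal thresholds.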
