This thesis considers sequential decision problems, where the loss/reward incurred by selecting an action may not be inferred from observed feedback. A major part of this thesis focuses on the unsupervised sequential selection problem, where one can not infer the loss incurred for selecting an action from observed feedback. We also introduce a new setup named Censored Semi Bandits, where the loss incurred for selecting an action can be observed under certain conditions. Finally, we study the channel selection problem in the communication networks, where the reward for an action is only observed when no other player selects that action to play in the round. These problems find applications in many fields like healthcare, crowd-sourcing, security, adaptive resource allocation, among many others. This thesis aims to address the above-described sequential decision problems by exploiting specific structures these problems exhibit. We develop provably optimal algorithms for each of these setups with weak feedback and validate their empirical performance on different problem instances derived from synthetic and real datasets.
翻译:该论文考虑了顺序决定问题,其中选择行动所造成的损失/回报不能从观察到的反馈中推断出来。本论文的主要部分侧重于未经监督的顺序选择问题,其中无法推断从观察到的反馈中选择行动所造成的损失。我们还引入了一个新的设置,名为Cenored Sime Bridits, 在某些条件下可以观察到选择行动所造成的损失。最后,我们研究了通信网络中的频道选择问题,只有在没有其他参与者选择该行动在回合中发挥作用时才观察到该行动的奖赏。这些问题在许多领域,如保健、众包、安全、适应性资源分配等,都存在应用问题。本论文的目的是通过利用这些问题所展示的具体结构,解决上面描述的顺序决定问题。我们为每一个这些组合开发了最优化的算法,对来自合成和真实数据集的不同问题案例的经验性表现进行了验证。