We study coverage processes in which each draw reveals a subset of $[n]$, and the goal is to determine the expected number of draws until all items are seen at least once. A classical example is the Coupon Collector's Problem, where each draw reveals exactly one item. Motivated by shotgun DNA sequencing, we introduce a model where each draw is a contiguous window of fixed length, in both cyclic and non-cyclic variants. We develop a unifying combinatorial tool that shifts the task of finding coverage time from probability, to a counting problem over families of subsets of $[n]$ that together contain all items, enabling exact calculation. Using this result, we obtain exact expressions for the window models. We then leverage past results on a continuous analogue of the cyclic window model to analyze the asymptotic behavior of both models. We further study what we call uniform $\ell$-regular models, where every draw has size $\ell$ and every item appears in the same number of admissible draws. We compare these to the batch sampling model, in which all $\ell$-subsets are drawn uniformly at random and present upper and lower bounds, which were also obtained independently by Berend and Sher. We conjecture, and prove for special cases, that this model maximizes the coverage time among all uniform $\ell$-regular models. Finally, we prove a universal upper bound on the entire class of uniform $\ell$-regular models, which illuminates the fact that many sampling models share the same leading asymptotic order, while potentially differing significantly in lower-order terms.
翻译:我们研究覆盖过程,其中每次抽样揭示集合$[n]$的一个子集,目标是确定所有元素至少出现一次所需的期望抽样次数。经典案例是优惠券收集问题,其中每次抽样恰好揭示一个元素。受鸟枪法DNA测序启发,我们引入一种模型,其中每次抽样为固定长度的连续窗口,包含循环与非循环两种变体。我们开发了一种统一的组合工具,将覆盖时间的计算任务从概率问题转化为对$[n]$子集族的计数问题,这些子集族共同包含所有元素,从而实现精确计算。利用该结果,我们获得了窗口模型的精确表达式。随后,我们借助循环窗口模型的连续类比已有结果,分析两种模型的渐近行为。我们进一步研究称为均匀$\ell$-正则模型,其中每次抽样规模为$\ell$且每个元素在可容许抽样中出现的次数相同。我们将这些模型与批量抽样模型进行比较,后者中所有$\ell$-子集均匀随机抽取,并给出上下界(该结果亦由Berend和Sher独立获得)。我们猜想并针对特殊情形证明,该模型在所有均匀$\ell$-正则模型中具有最大覆盖时间。最后,我们证明了均匀$\ell$-正则模型整个类别的普适上界,这揭示了众多抽样模型具有相同的主导渐近阶,而低阶项可能存在显著差异。