Hidden Markov models are versatile tools for modeling sequential observations, where it is assumed that a hidden state process selects which of finitely many distributions generates any given observation. Specifically for time series of counts, the Poisson family often provides a natural choice for the state-dependent distributions, though more flexible distributions such as the negative binomial or distributions with a bounded range can also be used. However, in practice, choosing an adequate class of (parametric) distributions is often anything but straightforward, and an inadequate choice can have severe negative consequences on the model's predictive performance, on state classification, and generally on inference related to the system considered. To address this issue, we propose an effectively nonparametric approach to fitting hidden Markov models to time series of counts, where the state-dependent distributions are estimated in a completely data-driven way without the need to select a distributional family. To avoid overfitting, we add a roughness penalty based on higher-order differences between adjacent count probabilities to the likelihood, which is demonstrated to produce smooth probability mass functions of the state-dependent distributions. The feasibility of the suggested approach is assessed in a simulation experiment, and illustrated in two real-data applications, where we model the distribution of i) major earthquake counts and ii) acceleration counts of an oceanic whitetip shark (Carcharhinus longimanus) over time.
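The penalized likelihood described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation. It assumes the count support is truncated to {0, ..., K}, that each state has its own probability vector over that support, and that the roughness penalty is a sum of squared differences of a chosen order, weighted by one smoothing parameter per state. The function name `penalized_neg_log_lik` and all argument names are hypothetical.

```python
import numpy as np


def penalized_neg_log_lik(obs, Gamma, delta, pmf, lam, order=2):
    """Negative penalized log-likelihood of an HMM with nonparametric count pmfs.

    obs   : 1-D integer array of counts in {0, ..., K}
    Gamma : (N, N) transition probability matrix
    delta : (N,) initial state distribution
    pmf   : (N, K+1) array; row i holds the state-i probabilities of counts 0..K
    lam   : (N,) nonnegative smoothing parameters (assumed: one per state)
    order : order of the differences in the roughness penalty (assumed default 2)
    """
    obs = np.asarray(obs, dtype=int)
    Gamma = np.asarray(Gamma, dtype=float)
    delta = np.asarray(delta, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    lam = np.asarray(lam, dtype=float)

    # Scaled forward recursion: rescale the forward vector at every step
    # and accumulate the log of the scaling constants to avoid underflow.
    phi = delta * pmf[:, obs[0]]
    log_lik = 0.0
    for x in obs[1:]:
        scale = phi.sum()
        log_lik += np.log(scale)
        phi = ((phi / scale) @ Gamma) * pmf[:, x]
    log_lik += np.log(phi.sum())

    # Roughness penalty: squared order-th differences between adjacent
    # count probabilities, summed within each state and weighted by lam.
    diffs = np.diff(pmf, n=order, axis=1)
    penalty = float(np.sum(lam * np.sum(diffs ** 2, axis=1)))

    return -log_lik + penalty
```

In practice this objective would be minimized numerically over `Gamma`, `delta`, and `pmf`, subject to the rows of `Gamma` and `pmf` being valid probability vectors (for example via a softmax-type reparameterization, which is also an assumption here). Larger values of `lam` pull each estimated probability mass function toward a smoother shape, while `lam = 0` recovers the unpenalized likelihood.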