We present a structured additive regression approach for modeling conditional densities given scalar covariates when only samples from the conditional distributions are observed. This setting links our approach to distributional regression models for scalar data. The model is formulated in a Bayes Hilbert space, whose addition and scalar multiplication preserve nonnegativity and integration to one, with respect to an arbitrary finite measure. This allows us to consider continuous, discrete, and mixed densities, among others. Our theoretical results include asymptotic existence, uniqueness, consistency, and asymptotic normality of the penalized maximum likelihood estimator, as well as confidence regions and inference for the (effect) densities. For estimation, we propose maximizing the penalized log-likelihood of an appropriate multinomial or, equivalently, Poisson regression model, which we show approximates the original penalized maximum likelihood problem. We apply our framework to a motivating data set in gender economics from the German Socio-Economic Panel Study (SOEP), analyzing the distribution of the woman's share in a couple's total labor income given covariate effects for year, place of residence, and age of the youngest child. Because the income share is a continuous variable with discrete point masses at zero and one for single-earner couples, the corresponding densities are of mixed type.
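As an illustration of the Bayes Hilbert space structure mentioned above, the following minimal sketch discretizes a mixed reference measure on [0, 1] (Lebesgue measure plus unit point masses at zero and one) and implements the standard Bayes Hilbert space operations: perturbation (addition), powering (scalar multiplication), and the centered log-ratio (clr) transform. It is not the authors' implementation. The grid size, the function names, the unit weights on the atoms, and the example densities `f` and `g` are all assumptions made for illustration.

```python
import numpy as np

def mixed_grid(m=200):
    """Nodes and weights approximating mu = delta_0 + Lebesgue + delta_1 on [0, 1].

    Assumption: unit mass on each atom and a midpoint rule with m bins
    for the continuous part.
    """
    mid = (np.arange(m) + 0.5) / m
    nodes = np.concatenate(([0.0], mid, [1.0]))
    weights = np.concatenate(([1.0], np.full(m, 1.0 / m), [1.0]))
    return nodes, weights

def normalize(f, w):
    """Rescale a positive function so that it integrates to one w.r.t. the weights."""
    return f / np.sum(w * f)

def perturb(f, g, w):
    """Bayes Hilbert space addition: pointwise product, then renormalize."""
    return normalize(f * g, w)

def power(a, f, w):
    """Bayes Hilbert space scalar multiplication: pointwise power, then renormalize."""
    return normalize(f ** a, w)

def clr(f, w):
    """Centered log-ratio transform: log f minus its mu-weighted mean."""
    logf = np.log(f)
    return logf - np.sum(w * logf) / np.sum(w)

def clr_inv(h, w):
    """Inverse clr transform: exponentiate, then renormalize."""
    return normalize(np.exp(h), w)

nodes, w = mixed_grid()
x = nodes[1:-1]  # interior (continuous) part of the grid

# Two hypothetical mixed densities: values at the atoms 0 and 1, plus a continuous part.
f = normalize(np.concatenate(([0.3], np.exp(-8.0 * (x - 0.4) ** 2), [0.05])), w)
g = normalize(np.concatenate(([0.1], 1.0 + x, [0.2])), w)

# A linear combination f (+) (-0.5) (.) g is again a valid density.
h = perturb(f, power(-0.5, g, w), w)
```

Because the clr transform maps perturbation and powering to ordinary addition and scalar multiplication, an additive model can be specified on the clr scale and mapped back to a density with `clr_inv`. This sketch covers only the vector space operations, not the penalized estimation itself.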