Stabilizing Inference in Dirichlet Regression via a Ridge-Penalized Model
Andrea Nigri
2026-01-01
Abstract
We propose a penalized Dirichlet regression framework for modeling compositional data, using a softmax link to ensure that the mean vector lies on the simplex and to avoid log-ratio transformations or zero replacement. The model is formulated in a GLM-like setting and incorporates an ℓ2 (ridge) penalty on the regression coefficients to improve stability in the presence of multicollinearity, high-dimensional covariate spaces, and weak effects. Classical Dirichlet regression is recovered as the special case with zero penalty, so the proposed estimator nests the standard approach. Estimation is carried out via a gradient-based block coordinate ascent algorithm, for which we derive closed-form expressions for the log-likelihood gradient and for the Jacobian of the softmax transformation. We investigate the performance of the method through a simulation study that includes orthogonal and correlated designs, different noise levels, and a sparse high-dimensional scenario. The results show that the penalized and unpenalized estimators are essentially equivalent in simple well-posed settings, while the ridge-penalized model achieves systematically lower coefficient RMSE and higher log-likelihood in more challenging configurations. Finally, we apply the method to US male cause-of-death data, where cross-validated penalization yields improved fit and smooth, interpretable age–cause profiles supported by bootstrap confidence intervals.
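The objective sketched in the abstract — a Dirichlet log-likelihood with a softmax mean link and a ridge penalty — can be written down compactly. The sketch below is illustrative only: the names (`X`, `B`, `phi`, `lam`) and the choice of a common precision parameter are assumptions, not the paper's notation, and the ridge term is applied to all coefficients without the identifiability constraints a full implementation would need.

```python
# Minimal sketch, assuming: design X (n x p), coefficients B (p x K),
# compositions Y (n x K, rows on the open simplex), a shared precision
# phi > 0, and ridge weight lam. All names are hypothetical.
import numpy as np
from scipy.special import gammaln


def softmax(eta):
    """Row-wise softmax: maps linear predictors onto the simplex."""
    z = np.exp(eta - eta.max(axis=1, keepdims=True))  # shift for stability
    return z / z.sum(axis=1, keepdims=True)


def penalized_loglik(B, X, Y, phi, lam):
    """Dirichlet log-likelihood with mean mu = softmax(X @ B) and
    precision phi, minus a ridge penalty lam * ||B||_F^2."""
    mu = softmax(X @ B)          # n x K mean compositions
    alpha = phi * mu             # Dirichlet parameters, alpha_k = phi * mu_k
    ll = (gammaln(alpha.sum(axis=1))
          - gammaln(alpha).sum(axis=1)
          + ((alpha - 1.0) * np.log(Y)).sum(axis=1)).sum()
    return ll - lam * np.sum(B ** 2)
```

Setting `lam = 0` recovers the unpenalized Dirichlet regression objective, consistent with the nesting property stated above; a block coordinate ascent scheme would alternate gradient updates of the columns of `B` (and of the precision) against this objective.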


