Stabilizing Inference in Dirichlet Regression via Ridge-Penalized Model

Andrea Nigri
2026-01-01

Abstract

We propose a penalized Dirichlet regression framework for modeling compositional data, using a softmax link to ensure that the mean vector lies on the simplex and to avoid log-ratio transformations or zero replacement. The model is formulated in a GLM-like setting and incorporates an L2 (ridge) penalty on the regression coefficients to improve stability in the presence of multicollinearity, high-dimensional covariate spaces, and weak effects. The classical Dirichlet regression is recovered as the special case with zero penalty, so that the proposed estimator nests the standard approach. Estimation is carried out via a gradient-based block coordinate ascent algorithm, for which we derive closed-form expressions for the log-likelihood gradient and for the Jacobian of the softmax transformation. We investigate the performance of the method through a simulation study that includes orthogonal and correlated designs, different noise levels, and a sparse high-dimensional scenario. The results show that the penalized and unpenalized estimators are essentially equivalent in simple well-posed settings, while the ridge-penalized model achieves systematically lower coefficient RMSE and higher log-likelihood in more challenging configurations. Finally, we apply the method to US male cause-of-death data, where cross-validated penalization yields improved fit and smooth, interpretable age–cause profiles supported by bootstrap confidence intervals.
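The objective described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the coefficient layout, the mean parameterization (Dirichlet parameters taken as alpha = phi * mu for a scalar precision phi), and all function names are assumptions made for the example.

```python
import numpy as np
from scipy.special import gammaln


def softmax(eta):
    # Row-wise softmax link: maps linear predictors onto the simplex,
    # so no log-ratio transformation or zero replacement is needed.
    e = np.exp(eta - eta.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def penalized_loglik(B, X, Y, phi, lam):
    """Ridge-penalized Dirichlet log-likelihood (illustrative sketch).

    B   : (p, K) coefficient matrix (hypothetical layout)
    X   : (n, p) covariate matrix
    Y   : (n, K) compositions, each row summing to 1
    phi : scalar precision parameter (assumed parameterization)
    lam : ridge penalty weight; lam = 0 recovers the unpenalized model
    """
    mu = softmax(X @ B)          # mean vector on the simplex
    alpha = phi * mu             # Dirichlet parameters
    ll = (gammaln(alpha.sum(axis=1))
          - gammaln(alpha).sum(axis=1)
          + ((alpha - 1.0) * np.log(Y)).sum(axis=1)).sum()
    return ll - lam * np.sum(B ** 2)   # L2 (ridge) penalty on coefficients
```

With lam = 0 the function returns the classical Dirichlet log-likelihood, so the penalized estimator nests the standard approach exactly as stated in the abstract; for lam > 0 the penalty shrinks the coefficients toward zero, which is what stabilizes estimation under multicollinearity.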

Use this identifier to cite or link to this document: https://hdl.handle.net/11369/479872