Cross-country generalization bias in agricultural production: do machine learning models learn the same way?
Antonio Vairo; Luca Grilli
2026-01-01
Abstract
Despite the increasing adoption of Machine Learning (ML) models for agricultural production forecasting and international comparative studies, most cross-country analyses train and evaluate models within the same national context, implicitly assuming their ability to generalize across different settings. This paper explicitly investigates the existence of a cross-country generalization bias by assessing whether ML models trained on data from one country retain adequate predictive performance when applied to a structurally different context. Using a harmonized dataset from Italy and Lithuania, we implement a cross-country evaluation framework in which supervised models are trained exclusively on one country and tested out-of-distribution on the other. Model performance is assessed using standard predictive metrics and compared against within-country benchmarks to quantify potential generalization gaps. In addition, we analyze the stability of feature importance and error distributions to identify systematic shifts in the patterns learned by the models. The results highlight that model transferability is not guaranteed in the presence of cross-country distribution shift, underscoring the need for explicit cross-country generalization assessments in ML-based comparative research for food security and agricultural planning.
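The cross-country evaluation framework described above can be sketched as follows. This is a minimal illustrative example, not the paper's actual data or models: the two "countries" are simulated with deliberately different coefficient vectors and feature distributions (a stand-in for structural differences between Italy and Lithuania), the model is ordinary least squares rather than the supervised learners used in the study, and all names and parameter values are hypothetical. The point it demonstrates is the protocol: train exclusively on one country, compute the within-country benchmark, then test out-of-distribution on the other and report the generalization gap.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_country(n, coef, x_mean):
    # Hypothetical harmonized features (e.g., inputs, land area) and a linear
    # production outcome; the coefficients encode country-specific structure.
    X = rng.normal(loc=x_mean, scale=1.0, size=(n, len(coef)))
    y = X @ coef + rng.normal(scale=0.5, size=n)
    return X, y

# Structurally different contexts: different coefficients and feature means.
X_a, y_a = simulate_country(500, coef=np.array([2.0, -1.0, 0.5]), x_mean=0.0)
X_b, y_b = simulate_country(500, coef=np.array([1.2, -0.2, 1.5]), x_mean=1.0)

def fit_ols(X, y):
    # Least-squares fit with an intercept column.
    Xb = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return beta

def rmse(beta, X, y):
    Xb = np.column_stack([np.ones(len(X)), X])
    return float(np.sqrt(np.mean((Xb @ beta - y) ** 2)))

# Train exclusively on country A, then evaluate both settings.
beta_a = fit_ols(X_a, y_a)
within = rmse(beta_a, X_a, y_a)   # within-country benchmark
cross = rmse(beta_a, X_b, y_b)    # out-of-distribution test on country B
gap = cross - within              # cross-country generalization gap
print(f"within-country RMSE: {within:.2f}, "
      f"cross-country RMSE: {cross:.2f}, gap: {gap:.2f}")
```

Under distribution shift of this kind, the cross-country RMSE exceeds the within-country benchmark, and the size of the gap is the quantity the paper's framework is designed to surface; running the protocol in both directions (A→B and B→A) would complete the comparison.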


