Exploring a Federated Learning Approach to Enhance Authorship Attribution of Misleading Information from Heterogeneous Sources

Marulli, F.; Balzanella, A.; Campanile, L.; Iacono, M.; Mastroianni, M.

doi:10.1109/IJCNN52387.2021.9534377

Authorship Attribution (AA) is currently applied in several applications, among which fraud detection and anti-plagiarism checks: this task can leverage stylometry and Natural Language Processing techniques. In this work, we explored some strategies to enhance the performance of an AA task for the automatic detection of false and misleading information (e.g., fake news). We set up a text classification model for AA based on stylometry exploiting recurrent deep neural networks and implemented two learning tasks trained on the same collection of fake and real news, comparing their performances: one is based on Federated Learning architecture, the other on a centralized architecture. The goal was to discriminate potential fake information from true ones when the fake news comes from heterogeneous sources, with different styles. Preliminary experiments show that a distributed approach significantly improves recall with respect to the centralized model. As expected, precision was lower in the distributed model. This aspect, coupled with the statistical heterogeneity of data, represents some open issues that will be further investigated in future work.

Exploring a Federated Learning Approach to Enhance Authorship Attribution of Misleading Information from Heterogeneous Sources

Marulli F.;Balzanella A.;Campanile L.;Iacono M.;Mastroianni M.

2021-01-01

Abstract

Authorship Attribution (AA) is currently applied in several applications, among which fraud detection and anti-plagiarism checks: this task can leverage stylometry and Natural Language Processing techniques. In this work, we explored some strategies to enhance the performance of an AA task for the automatic detection of false and misleading information (e.g., fake news). We set up a text classification model for AA based on stylometry exploiting recurrent deep neural networks and implemented two learning tasks trained on the same collection of fake and real news, comparing their performances: one is based on Federated Learning architecture, the other on a centralized architecture. The goal was to discriminate potential fake information from true ones when the fake news comes from heterogeneous sources, with different styles. Preliminary experiments show that a distributed approach significantly improves recall with respect to the centralized model. As expected, precision was lower in the distributed model. This aspect, coupled with the statistical heterogeneity of data, represents some open issues that will be further investigated in future work.