Raman Spectroscopy Assisted by Machine Learning Algorithms for the Prediction of Different Types of Oral Cancer Cells

Lasalvia, M.; Capozzi, V.; Perna, G.

doi:10.3390/app16052380

Oral squamous cell carcinoma (OSCC) cytology involves collecting a sample of single cells or small cell clusters from the patient's head and neck area and staining it in order to identify abnormal morphological characteristics. This method is used to screen for early cancer and for the formation of metastases within the oral cavity. OSCC diagnosis depends partly on the pathologist's skill and on the laboratory's instrumentation. Raman spectroscopy could support diagnoses made with traditional methods by providing information on the cellular biochemical environment. Technical drawbacks related to the low signal-to-noise ratio of Raman spectroscopy, together with the need to obtain diagnostic information within a reasonable time frame, have recently led to the analysis of Raman spectra using machine learning (ML) methods in order to assign unknown cellular spectra reliably to the correct class. We therefore combined Raman micro-spectroscopy with ML methods to build classification models that allow different grades of OSCC to be diagnosed in cell samples. The Raman spectra were analysed in the 980–1800 cm−1 range by focusing the laser beam onto the nucleus and cytoplasm regions of single cells from cell lines modelling healthy (HaCaT) and cancer (Cal-27, SAS and HSC-3) cytological samples. We considered six classification algorithms (k-Nearest Neighbours, Logistic Regression, Naïve Bayes, artificial Neural Network, Random Forest and Support Vector Machine) to classify unknown Raman spectra. We report two classification tasks: a 4-level classification, which encompasses healthy cells, two different types of cancer cells, and one type of metastatic cells, and a 3-level classification, which includes healthy cells, non-metastatic cancer cells, and metastatic cancer cells.
Our findings show that both the Neural Network and Support Vector Machine algorithms, applied to Raman spectra measured in the cytoplasm region, can achieve sensitivity, precision and F1-score values above 90% in the 3-level classification, whereas the Support Vector Machine performs better than the Neural Network in the 4-level classification. These results contribute to increasing confidence in the clinical translation of ML-assisted Raman spectroscopy as a tool to support conventional cytological techniques.
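
The relation between the two classification tasks, and the three figures of merit quoted above, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The label names (`cancer_A`, `cancer_B`, `non_metastatic`), the helper functions and the toy predictions are hypothetical and chosen only for illustration; they are not data from the study. The sketch collapses 4-level labels into the 3-level scheme and computes per-class sensitivity (recall), precision and F1-score from true and predicted labels.

```python
from collections import Counter

# Hypothetical label names: the abstract only states that the 4-level task has
# healthy cells, two types of cancer cells and one type of metastatic cells.
COLLAPSE_4_TO_3 = {
    "healthy": "healthy",
    "cancer_A": "non_metastatic",
    "cancer_B": "non_metastatic",
    "metastatic": "metastatic",
}


def collapse_labels(labels, mapping=COLLAPSE_4_TO_3):
    """Map 4-level labels onto the 3-level scheme."""
    return [mapping[label] for label in labels]


def per_class_metrics(y_true, y_pred):
    """Return sensitivity, precision and F1-score for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1      # correctly assigned spectrum
        else:
            fn[truth] += 1      # missed for the true class
            fp[pred] += 1       # wrongly assigned to the predicted class

    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        actual = tp[cls] + fn[cls]
        predicted = tp[cls] + fp[cls]
        sensitivity = tp[cls] / actual if actual else 0.0
        precision = tp[cls] / predicted if predicted else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        metrics[cls] = {
            "sensitivity": sensitivity,
            "precision": precision,
            "f1": f1,
        }
    return metrics


# Toy labels (made up for illustration, one entry per spectrum).
y_true_4 = ["healthy", "healthy", "cancer_A", "cancer_A",
            "cancer_B", "cancer_B", "metastatic", "metastatic"]
y_pred_4 = ["healthy", "healthy", "cancer_A", "cancer_B",
            "cancer_B", "cancer_B", "metastatic", "cancer_A"]

metrics_4 = per_class_metrics(y_true_4, y_pred_4)
metrics_3 = per_class_metrics(collapse_labels(y_true_4),
                              collapse_labels(y_pred_4))

for name, result in (("4-level", metrics_4), ("3-level", metrics_3)):
    print(name)
    for cls, values in result.items():
        print(f"  {cls}: " + ", ".join(f"{k}={v:.2f}" for k, v in values.items()))
```

In this toy example, confusions between the two cancer types count as errors in the 4-level task but disappear once the labels are collapsed, which is one reason a 3-level task can yield higher scores than a 4-level task on the same predictions.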