Objective: Our aim was to develop and compare machine learning (ML) algorithms for identification of Parkinson’s disease (PD) patients via acoustic analysis of vowel articulation and to leverage explainability techniques for elucidation of new PD biomarkers.
Background: Vocal impairments are common among PD patients and are characterized by hypokinetic dysarthria, hypophonia, monotony, and hoarseness. These acoustic biomarkers may allow for ML-based detection of PD.
Method: Two public datasets containing PD speech recordings, the NeuroVoz and Italian Parkinson’s Voice/Speech datasets, were used, and their vowel articulation tasks were pooled. Each recording was truncated to 2 seconds, resampled to a sampling rate of 16 kHz, and processed with time and pitch augmentation. Three modeling approaches were compared: ResNet18, HuBERT, and Audio Spectrogram Transformer (AST). WAV files were input directly into the HuBERT model, while log-mel spectrogram (LMS) transformed data were used for the other two models. Fivefold cross-validation was conducted. ResNet18 was used to explore explainability: for each patient, a concatenated set of LMS-transformed vowel sounds was input, and Grad-CAM was used to visualize class activation maps.
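The fixed-length clipping and fivefold split described in the Method can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how clips shorter than 2 seconds were handled (zero-padding is assumed here), nor whether folds were split by patient or stratified by diagnosis (both are assumed here). The names `fix_length` and `stratified_patient_folds`, the patient IDs, and the random seed are hypothetical.

```python
import random
from collections import defaultdict

SAMPLE_RATE = 16_000                       # Hz, as stated in the Method
CLIP_SECONDS = 2                           # recordings are truncated to 2 seconds
CLIP_SAMPLES = SAMPLE_RATE * CLIP_SECONDS  # 32,000 samples per clip


def fix_length(samples, n=CLIP_SAMPLES):
    """Truncate a mono waveform to n samples.

    Zero-padding of shorter clips is an assumption; the abstract
    only mentions truncation.
    """
    clipped = list(samples[:n])
    return clipped + [0.0] * (n - len(clipped))


def stratified_patient_folds(labels_by_patient, k=5, seed=0):
    """Assign each patient to one of k folds with balanced class counts.

    Splitting by patient rather than by recording is an assumption; it
    keeps all vowels from one patient inside a single fold.
    """
    by_class = defaultdict(list)
    for patient, label in sorted(labels_by_patient.items()):
        by_class[label].append(patient)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across classes to even out fold sizes
    for label in sorted(by_class):
        patients = by_class[label]
        rng.shuffle(patients)
        for i, patient in enumerate(patients):
            folds[(i + offset) % k].append(patient)
        offset += len(patients)
    return folds


# Hypothetical cohort matching the reported sizes: 74 PD (label 1), 76 controls (label 0).
cohort = {f"p{i:03d}": (1 if i < 74 else 0) for i in range(150)}
folds = stratified_patient_folds(cohort)
```

In each cross-validation round, one fold would serve as the held-out set and the remaining four as training data, with augmentation applied to the training clips only if leakage into the evaluation set is to be avoided.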
Results: A total of 150 patients (44% female) with a mean age of 67.1±9.87 years were included. 74 patients (49.3%) had a diagnosis of PD. The HuBERT model demonstrated an average area under the receiver operating characteristic curve (AUC-ROC) of 0.733±0.0429 and accuracy of 0.621±0.0713. The ResNet18 model demonstrated AUC-ROC of 0.839±0.0501 and accuracy of 0.750±0.0454. The AST demonstrated AUC-ROC of 0.820±0.0548 and accuracy of 0.752±0.0622. Inputting concatenated LMS into ResNet18 yielded AUC-ROC of 0.903±0.0602 and accuracy of 0.828±0.0482. Among the control cohort, the model focused on vowels A, E, I, O, and U in proportions of 0.145, 0.290, 0.210, 0.210, and 0.145, respectively. Among the PD cohort, the model focused on A and E in proportions of 0.5 each.
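One way a per-vowel focus distribution like the one reported above could be derived from Grad-CAM maps is sketched below. The abstract does not describe how the proportions were computed, so everything here is an assumption: each patient's map is reduced to a time profile (activation averaged over frequency), the concatenated spectrogram is taken to hold five equal-length vowel segments in the order A, E, I, O, U, and each patient is assigned the vowel with the highest mean activation. The function names and toy profiles are hypothetical.

```python
from collections import Counter

VOWELS = ["A", "E", "I", "O", "U"]


def dominant_vowel(time_profile, vowels=VOWELS):
    """Return the vowel whose segment has the highest mean activation.

    time_profile holds one Grad-CAM value per time frame of the
    concatenated spectrogram. Equal-length segments in vowel order
    are assumed.
    """
    seg = len(time_profile) // len(vowels)
    means = [
        sum(time_profile[i * seg:(i + 1) * seg]) / seg
        for i in range(len(vowels))
    ]
    return vowels[means.index(max(means))]


def focus_distribution(profiles, vowels=VOWELS):
    """Fraction of patients whose dominant vowel is each of the given vowels."""
    counts = Counter(dominant_vowel(p, vowels) for p in profiles)
    return {v: counts[v] / len(profiles) for v in vowels}


# Toy profiles (two frames per vowel), invented for illustration only.
toy_profiles = [
    [0.9, 0.8, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # peaks in the A segment
    [0.1, 0.1, 0.7, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # peaks in the E segment
]
toy_distribution = focus_distribution(toy_profiles)
```

Computing this separately for the PD and control cohorts would give two distributions that can be compared directly, which is the kind of contrast the Results describe.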
Conclusion: ML algorithms are capable of accurately distinguishing PD patients from controls based on vowel articulation. ResNet18 with LMS-transformed data yielded the highest AUC-ROC of the three models, and performance improved further when each patient's vowels were concatenated. Furthermore, altered production of the vowels A and E may represent a phenotype unique to PD patients. Our framework may not only allow for earlier and improved PD detection but also contribute to the discovery of novel PD biomarkers.
To cite this abstract in AMA style:
K. Tsutsumi, P. Chang, S. Isfahani. Detection of novel acoustic biomarkers among Parkinson’s disease patients via an explainable machine learning model [abstract]. Mov Disord. 2025; 40 (suppl 1). https://www.mdsabstracts.org/abstract/detection-of-novel-acoustic-biomarkers-among-parkinsons-disease-patients-via-an-explainable-machine-learning-model/. Accessed October 5, 2025.