Objective: This study aims to determine whether an inertial measurement unit (IMU) can detect differences in the consistency of Movement Disorder Society-Unified Parkinson’s Disease Rating Scale (MDS-UPDRS) scoring between evaluators with different levels of expertise, and to assess the agreement between clinician evaluations and IMU data.
Background: The MDS-UPDRS Part III is widely used to assess the severity of motor symptoms in Parkinson’s disease (PD). However, the score can be significantly influenced by inter-rater variability. IMU-based motion detection has been proposed as an objective method for clinical assessment. These wearable devices have shown potential to reflect disease severity in patients with PD.
Method: Thirty-seven patients with PD were recruited. Foot tapping (MDS-UPDRS Part 3.7) was selected as the test task, with IMUs placed bilaterally at the ankles. The test was conducted in both the off and on states of medication or deep brain stimulation. A total of 160 recordings were obtained and videotaped for subsequent UPDRS scoring by 2 raters: one beginner and one experienced specialist. K-means clustering, an unsupervised learning algorithm, was used to group the IMU data into 4 categories corresponding to UPDRS scores. Statistical analyses included sensitivity, specificity, F1 score, Cohen’s Kappa, and Spearman correlation.
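The clustering step described in the Method can be illustrated with a minimal sketch. The abstract does not report which IMU features were used or how clusters were matched to UPDRS scores, so everything specific here is an assumption: the two features (a hypothetical tapping amplitude and frequency), the synthetic values, the deterministic quantile-based initialization, and the rule that a larger mean amplitude maps to a lower (less severe) score. It is a plain Lloyd's k-means written for illustration, not the authors' pipeline.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means on a list of equal-length feature tuples."""
    # Deterministic initialization (an assumption of this sketch):
    # take k evenly spaced points from the lexicographically sorted data.
    ordered = sorted(points)
    n = len(ordered)
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


def clusters_to_scores(labels, centroids):
    """Hypothetical mapping from cluster index to an ordinal score.

    Clusters are ranked by the first feature (assumed to be tapping
    amplitude); the largest amplitude is mapped to score 0.
    """
    order = sorted(range(len(centroids)), key=lambda c: -centroids[c][0])
    score_of = {cluster: rank for rank, cluster in enumerate(order)}
    return [score_of[lab] for lab in labels]


# Synthetic (amplitude, frequency) pairs, invented for illustration only.
recordings = [
    (10.0, 3.0), (9.5, 2.8), (7.0, 2.5), (6.5, 2.4),
    (4.0, 2.0), (3.5, 1.8), (1.0, 1.0), (0.8, 1.2),
]
labels, centroids = kmeans(recordings, k=4)
print(clusters_to_scores(labels, centroids))
```

Because k-means cluster indices are arbitrary, some explicit rule is needed to turn them into ordinal scores; the amplitude ranking above is only one possible choice. Real features would usually be standardized before clustering so that no single feature dominates the distance.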
Results: Figure 1 presents a Sankey diagram illustrating the differences between rater 1 (beginner), rater 2 (experienced specialist), and IMU-based clusters. Discrepancies were most prominent in scores 0 and 1, while for scores 2 and 3 the scores assigned by the experienced specialist were more consistent with the IMU-based clusters than those assigned by the beginner. Regarding consistency with the IMU-based clusters, rater 2 demonstrated higher accuracy (0.41 vs. 0.26), F1 score (0.35 vs. 0.25), and Cohen’s Kappa (0.15 vs. 0.05) than rater 1. The scores assigned by both raters correlated with the IMU-based clusters, with Spearman coefficients of 0.40 for rater 2 and 0.37 for rater 1 (p < 0.001).
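For readers who want to see how agreement figures of this kind are typically computed, the sketch below implements accuracy, Cohen's Kappa, F1, and Spearman correlation from two lists of ordinal scores. Several details are assumptions, since the abstract does not specify them: the F1 score is macro-averaged over classes, the IMU-based cluster is treated as the reference, the Kappa is unweighted, and the toy score lists are invented. No p-value is computed.

```python
from collections import Counter


def accuracy(a, b):
    """Fraction of items on which the two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Unweighted Cohen's Kappa: chance-corrected agreement."""
    n = len(a)
    p_obs = accuracy(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement if the two label lists were independent.
    p_exp = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp)  # undefined when p_exp == 1


def macro_f1(reference, predicted):
    """Unweighted mean of per-class F1 scores (macro averaging assumed)."""
    classes = sorted(set(reference) | set(predicted))
    f1_scores = []
    for c in classes:
        tp = sum(r == c and p == c for r, p in zip(reference, predicted))
        fp = sum(r != c and p == c for r, p in zip(reference, predicted))
        fn = sum(r == c and p != c for r, p in zip(reference, predicted))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Toy scores, invented for illustration (not study data).
imu_cluster = [0, 0, 1, 1, 2, 2, 3, 3]
rater = [0, 1, 1, 1, 2, 3, 3, 3]
print(accuracy(imu_cluster, rater))
print(cohen_kappa(imu_cluster, rater))
print(macro_f1(imu_cluster, rater))
print(spearman(imu_cluster, rater))
```

Kappa and accuracy treat every disagreement equally, while Spearman correlation rewards scores that are off by one but still ordered correctly. That may be one reason a rater can show low Kappa against the clusters yet a moderate rank correlation.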
Conclusion: Inter-rater variability affects the accuracy and precision of MDS-UPDRS Part III scoring. Objective IMU measurement may provide supplementary input for PD motor scoring. However, discrepancies between machine and clinician assessments persist. Further feature selection and more advanced modeling may be required.
Figure 1
To cite this abstract in AMA style:
TC. Fang, PS. Lu, YJ. Guo, JA. Huang, LW. Ko. Evaluating Inter-rater Variability in UPDRS Using IMU: A Comparison Between Beginner and Experienced Specialists [abstract]. Mov Disord. 2025; 40 (suppl 1). https://www.mdsabstracts.org/abstract/evaluating-inter-rater-variability-in-updrs-using-imu-a-comparison-between-beginner-and-experienced-specialists/. Accessed October 5, 2025.