Data-driven biomarkers outperform theory-based biomarkers in predicting stroke motor outcomes

Emily R Olafson*, Christoph Sperber, Keith W Jamison, Mark D Bowren, Aaron D Boes Jr, Justin W Andrushko, Michael R Borich, Lara A Boyd, Jessica M Cassidy, Adriana B Conforto, Steven C Cramer, Adrienne N Dula, Fatemeh Geranmayeh, Brenton Hordacre, Neda Jahanshad, Steven A Kautz, Bethany Lo, Bradley J MacIntosh, Fabrizio Piras, Andrew D Robertson, Na Jin Seo, Surjo R Soekadar, Sophia I Thomopoulos, Daniela Vecchio, Timothy B Weng, Lars T Westlye, Carolee J Winstein, George F Wittenberg, Kristin A Wong, Paul M Thompson, Sook-Lei Liew, Amy F Kuceyeski

*Corresponding author for this work

Research output: Working paper, Preprint


Chronic motor impairments are a leading cause of disability after stroke. Previous studies have predicted motor outcomes based on the degree of damage to predefined structures in the motor system, such as the corticospinal tract. However, such theory-based approaches may not take full advantage of the information contained in clinical imaging data. The present study uses data-driven approaches to predict chronic motor outcomes after stroke and compares the accuracy of these predictions to that of previously identified theory-based biomarkers.

Using a cross-validation framework, regression models were trained using lesion masks and motor outcomes data from 789 stroke patients (293 female/496 male) from the ENIGMA Stroke Recovery Working Group (age 64.9±18.0 years; time since stroke 12.2±0.2 months; normalised motor score 0.7±0.5, range [0,1]). The out-of-sample prediction accuracy of two theory-based biomarkers was assessed: lesion load of the corticospinal tract, and lesion load of multiple descending motor tracts. These theory-based prediction accuracies were compared to the prediction accuracy of three data-driven biomarkers: lesion load of lesion-behaviour maps, lesion load of structural networks associated with lesion-behaviour maps, and measures of regional structural disconnection.
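As a rough illustration of what out-of-sample prediction accuracy means in a cross-validation framework, the sketch below fits a one-predictor linear regression on the training folds and scores the pooled held-out predictions with R². It is not the study's pipeline: the synthetic data, the five-fold split, the single lesion-load predictor, and the function names `fit_line` and `cross_validated_r2` are all assumptions made for illustration, and the abstract does not specify how the regression models or R² were computed.

```python
import random


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x


def cross_validated_r2(x, y, k=5, seed=0):
    """Pool held-out predictions across k folds and return their R^2."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    preds = [0.0] * len(x)
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # The model only ever sees the training subjects.
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = slope * x[i] + intercept
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (hypothetical, not from the study):
# higher lesion load is made to go with a lower motor score, plus noise.
rng = random.Random(42)
lesion_load = [rng.random() for _ in range(200)]
motor_score = [1.0 - 0.6 * ll + rng.gauss(0.0, 0.15) for ll in lesion_load]

r2 = cross_validated_r2(lesion_load, motor_score)
print(f"out-of-sample R^2: {r2:.3f}")
```

Because every prediction comes from a model that never saw that subject, the resulting R² estimates how well a biomarker would generalise to new patients rather than how well it fits the training sample.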

In general, data-driven biomarkers had better prediction accuracy, as measured by higher explained variance in chronic motor outcomes, than theory-based biomarkers. Data-driven models of regional structural disconnection performed the best of all models tested (R² = 0.210, p < 0.001), performing significantly better than predictions using the theory-based biomarkers of lesion load of the corticospinal tract (R² = 0.132, p < 0.001) and of multiple descending motor tracts (R² = 0.180, p < 0.001). They also performed slightly, but significantly, better than other data-driven biomarkers, including lesion load of lesion-behaviour maps (R² = 0.200, p < 0.001) and lesion load of structural networks associated with lesion-behaviour maps (R² = 0.167, p < 0.001). Ensemble models that combined biomarkers with basic demographic variables such as age, sex, and time since stroke improved prediction accuracy for both theory-based and data-driven biomarkers. Finally, combining both theory-based and data-driven biomarkers with demographic variables improved predictions, and the best ensemble model achieved R² = 0.241 (p < 0.001).

Overall, these results demonstrate that models that predict chronic motor outcomes using data-driven features, particularly when lesion data is represented in terms of structural disconnection, perform better than models that predict chronic motor outcomes using theory-based features from the motor system. However, combining both theory-based and data-driven models provides the best predictions.

Original language: English
Place of Publication: Cold Spring Harbor, US
Publisher: Cold Spring Harbor Laboratory Press
Number of pages: 42
Publication status: Submitted - 1 Sept 2023
