Introduction
Selecting patients for spinal fusion surgery is challenging. While machine learning (ML) has become a valuable decision-making tool in medical specialties such as oncology, its application in spine surgery remains relatively unexplored. The aim of our paper is to assess how well different ML algorithms predict the likelihood of success after lumbar spinal fusion surgery.
Methods
A retrospective analysis of prospectively collected data was performed on patients who underwent primary posterior lumbar fusion surgery by a single surgeon between January 2018 and February 2023. The dataset comprised 24 preoperative variables and 5 outcome measures sourced from the patients’ demographic data, medical history, surgical characteristics, patient-reported outcome measures (PROMs) and subjective measures of satisfaction. Four distinct ML algorithms were used: support vector machine (SVM), naïve Bayes (NB), decision tree (DT), and artificial neural network (ANN). To improve reliability, 7-fold cross-validation was employed for model building and testing. Accuracy (the proportion of all cases that were correctly classified), recall (the proportion of actual positive cases that were correctly identified), precision (the proportion of predicted positive cases that were truly positive) and F1 score (the harmonic mean of recall and precision) were analysed for each model, with PROMs and satisfaction treated as two separate measures of success. PROMs success was defined as achieving a clinically meaningful improvement in at least two of the three PROMs (Oswestry Disability Index [ODI], back visual analogue scale [VAS], leg VAS). Satisfaction success was defined as being either satisfied or very satisfied and being willing to undergo the surgery again.
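The four performance measures and the 7-fold split described above can be illustrated with a minimal Python sketch. The abstract does not state which software was used, so the function names (`classification_metrics`, `k_fold_indices`) and the toy labels are hypothetical and are not study data. The split shown is a plain contiguous one, with no shuffling or stratification, which the study may or may not have applied.

```python
from typing import Dict, List, Sequence, Tuple


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, recall, precision and F1 for binary labels (1 = success, 0 = no success)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    total = tp + tn + fp + fn

    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}


def k_fold_indices(n_samples: int, k: int = 7) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Returns a list of (train_indices, test_indices) pairs. Each sample appears
    in exactly one test fold. Assumption: no shuffling or stratification.
    """
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more sample
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Toy labels for illustration only (not study data).
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0]
    print(classification_metrics(y_true, y_pred))

    # 397 patients split into 7 folds.
    print([len(test) for _, test in k_fold_indices(397, k=7)])
```

In a cross-validation run, a model would be fitted on each `train` index set and scored on the matching `test` set, with the metrics then summarised across the seven folds.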
Results
A total of 423 patients were identified, of whom 397 were included in the analysis after data cleaning. Overall, SVM outperformed the other ML algorithms in predicting surgical success (Table 1). SVM demonstrated an acceptable level of accuracy and precision, a good F1 score and a very high level of recall for both success models (92% and 96%). High recall indicates few false negatives, meaning that few patients who went on to have a successful outcome were predicted to have an unsuccessful one.
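The link between recall and false negatives follows directly from the definitions. As a short worked relation, writing TP for true positives and FN for false negatives (the underlying counts are not reported here, so only the rates are used):

```latex
\begin{align*}
\text{recall} &= \frac{TP}{TP + FN}, \\
\text{false negative rate} &= \frac{FN}{TP + FN} = 1 - \text{recall}, \\
\text{recall} = 0.92 &\;\Rightarrow\; \text{false negative rate} = 0.08, \\
\text{recall} = 0.96 &\;\Rightarrow\; \text{false negative rate} = 0.04.
\end{align*}
```

Under these definitions, the reported recall values correspond to roughly 8% and 4% of truly successful cases being predicted as unsuccessful.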
Discussion
The predictive power of the SVM algorithm was deemed sufficient to be of high clinical value, even in this small dataset. Interestingly, the ANN did not perform reliably in predicting success, possibly because the small training dataset restricted its ability to learn effectively.
ML excels at analysing large amounts of data to identify complex patterns and hidden relationships, and has a promising role in two areas of medicine: diagnosis and outcome prediction. The results of our study demonstrate the potential benefits of using ML to help predict outcomes in lumbar fusion patients and thereby improve patient selection, particularly by minimising the number of patients incorrectly identified as likely to have a poor outcome. Our algorithm is likely to become more robust as the number of patients increases. In the future, ML has the potential to use additional, less structured data such as clinical notes and MRI scans to further optimise its role in decision-making.