Original Research | Open Access

Early Alzheimer's disease diagnosis using an XG-Boost model applied to MRI images

Khoi Nguyen 1, 2
My Nguyen 2, 3
Khiet Dang 1, 2
Bao Pham 1, 2
Vy Huynh 2, 3
Toi Vo 1, 2
Lua Ngo 1, 2
Huong Ha 1, 2, *
  1. School of Biomedical Engineering, International University, Viet Nam
  2. Vietnam National University Ho Chi Minh City, Ho Chi Minh City, Viet Nam
  3. Faculty of Biology - Biotechnology, University of Science, Viet Nam
Correspondence to: Huong Ha, School of Biomedical Engineering, International University, Viet Nam; Vietnam National University Ho Chi Minh City, Ho Chi Minh City, Viet Nam. Email: [email protected].
Volume & Issue: Vol. 10 No. 9 (2023) | Page No.: 5896-5911 | DOI: 10.15419/bmrat.v10i9.832
Published: 2023-09-30


Copyright The Author(s) 2024. This article is published with open access by BioMedPress. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0) which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited. 

Abstract

Introduction: Early Alzheimer's disease (AD) diagnosis is critical to improving the success of new treatments in clinical trials, especially at the early mild cognitive impairment (EMCI) stage. This study aimed to tackle this problem by developing an accurate classification model for early AD detection at the EMCI stage based on magnetic resonance imaging (MRI).

Methods: This study developed the proposed classification model through a machine-learning pipeline with three main steps. First, features were extracted from MRI images using FreeSurfer. Second, the extracted features were filtered using principal component analysis (PCA), backward elimination (BE), and extreme gradient boosting (XG-Boost) importance (XGBI), and the efficiency of each method was evaluated. Finally, the selected features were combined with cognitive scores (Mini Mental State Examination [MMSE] and Clinical Dementia Rating [CDR]) to create an XG-Boost three-class classifier: AD vs. EMCI vs. cognitively normal (CN).

Results: The MMSE and CDR had the highest importance weights, followed by the thickness of the left superior temporal sulcus and banks of the superior temporal lobe. Without feature selection, the model had the lowest accuracy of 69.0%. After feature selection and the addition of cognitive scores, the accuracy of the PCA, BE, and XGBI approaches improved to 74.0%, 90.9%, and 91.5%, respectively. The BE with tuning parameters model was chosen as the final model since it had the highest accuracy of 92.0%. The areas under the receiver operating characteristic curve for the AD, CN, and EMCI classes were 0.98, 0.94, and 0.88, respectively.

Conclusion: Our proposed model shows promise in early AD diagnosis and can be fine-tuned in the future through testing on multiple datasets.

Introduction

Alzheimer’s disease (AD) is the most common neurodegenerative disorder that greatly reduces patients’ quality of life and makes them utterly dependent on their caregivers1, 2. Prolonged medical treatment and care exert a substantial economic strain on patients and their families, potentially costing >1.1 trillion US dollars worldwide1. Unfortunately, once cognitive symptoms manifest, current medications cannot reverse disease progression due to the continued loss of neurons without replacement by cell division3, 4. Therefore, identifying patients at the early mild cognitive impairment (EMCI) stage is critical to improving the success of new treatments or interventions in clinical trials.

Several breakthrough approaches have attempted to predict AD at its preclinical stage, which could allow the application of medications to halt AD development from its onset3, 5, 6, 7, 8. About 80% of patients diagnosed with mild cognitive impairment (MCI) convert to AD within six years9. Recent studies have focused on this transitional phase to detect the preclinical AD stage, particularly EMCI5. One promising approach to detect EMCI is identifying brain morphological changes through neuroimaging data, such as magnetic resonance imaging (MRI).

Early AD detection using brain MRI data remains clinically challenging since the subtle changes during its transitional period cannot be assessed manually3. Automatic computation and artificial intelligence (AI) approaches such as deep learning (DL) or machine learning (ML) are required to identify brain structural features at the EMCI stage. Of numerous AI-assisted methods, DL has been broadly used because of its high performance, especially the convolutional neural network (CNN)5, 10. Kang et al. combined a 2D CNN with transfer learning to identify EMCI by processing a multi-modal dataset (MRI and diffusion tensor imaging data), achieving the highest accuracy of 94.2% for cognitively normal (CN) vs. EMCI patients5. In addition, Kolahkaj et al. built a DL architecture based on the BrainNet CNN model to detect EMCI, achieving high accuracies for binary classification: 0.96, 0.98, and 0.95 for NC/EMCI, NC/MCI, and EMCI/MCI, respectively11.

Despite its significant results, DL has several limitations that could hinder clinical applications. Firstly, DL models are prone to overfitting because of the large number of parameters they consider12. Secondly, analysts cannot provide a plausible explanation of how the algorithm reaches its decisions, a problem known as the black box. Therefore, shifting to ML to build an understandable prediction model for early AD detection is beneficial for neurologists and doctors.

While most ML studies have focused on binary classification, some have focused on multi-class classification. However, there is a growing need for a multi-class algorithm that can effectively distinguish the prodromal stage (EMCI) from the array of other stages (late MCI [LMCI], AD, and CN), enabling an early AD diagnosis. Moreover, it is important to note that existing multi-class ML models have low accuracies. In 2022, Techa et al. showed that a new model based on three CNN architectures (DenseNet196, VGG16, and ResNet50) achieved 89% accuracy in discriminating normal, very mild dementia, mild dementia, moderate dementia, and AD13. Alorf et al. implemented a Brain Connectivity-Based Convolutional Network in 2022, which provided 84.03% accuracy for six-class classification (AD, LMCI, MCI, EMCI, subjective memory complaints, and CN)14. Another major difficulty when identifying the initial AD stages is the subtle structural change in subjects with EMCI. EMCI is elusive and cannot be recognized by the diagnostic criteria for AD15. Furthermore, EMCI and MCI are highly heterogeneous since they can be easily mistaken for multiple pathological conditions, especially other neurodegenerative diseases16, 17. Therefore, EMCI classification requires further evaluation and approaches to optimize its efficiency.

One potential ML model to address the early AD detection challenge is extreme gradient boosting (XG-Boost). XG-Boost is a scalable tree-based ensemble learning method built on the gradient boosting framework. It passes the errors of the previous weak learner to the next learner, improving its learning accuracy18. Since its results depend on many decision trees, XG-Boost shows high compatibility, competitive execution speed, and accuracy when applied to large datasets, making it suitable for clinical application19. While few studies have used XG-Boost for AD diagnosis, the preliminary results are promising. Ong et al. proposed an XG-Boost model to classify AD and CN subjects using the FreeSurfer library to extract structural features from MRI, achieving an area under the receiver operating characteristic (ROC) curve (AUC) of 91%20. Tuan et al. presented an XG-Boost model to classify AD and normal subjects based on the tissues segmented by a CNN and Gaussian mixture model21. Their highest accuracy was 89% when combined with a support vector machine (SVM) and CNN21. However, both models had several limitations, such as high computation cost and susceptibility to sample size and complexity. They also did not attempt to classify three classes. Therefore, future improvement is required to enhance the models' accuracy and validity.

This study used XG-Boost for three-class classification, primarily focusing on distinguishing CN, EMCI, and AD. It also evaluated and optimized three feature selection methods (backward elimination [BE], XG-Boost importance [XGBI], and principal component analysis [PCA]) to identify the most suitable method for the XG-Boost model. When combined with the Mini Mental State Examination (MMSE) and Clinical Dementia Rating (CDR) scores, our model achieved the highest accuracy of 92% for distinguishing AD, EMCI, and CN. Only three features overlapped between the BE and XGBI feature selection methods: MMSE, CDR, and left hippocampus volume. While these results showed that the model still depends on the cognitive symptoms of AD rather than its brain structural changes, our model has great potential as an assistive tool for AD diagnosis with high performance, especially when considering its multi-class classification.

Methods

Participants

This study obtained its data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu)22. The ADNI was launched in 2003 as a public-private partnership led by Principal Investigator Michael W. Weiner, MD. Its primary goal has been to test whether serial MRI, positron emission tomography, biological markers, and clinical and neuropsychological assessments can be combined to measure MCI and EMCI progression22.

The data comprised 663 subjects who were equally grouped into three classes: CN, EMCI, and AD. Their demographic information is summarized in Table 1.

Table 1

Demographic information of the 663 recruited subjects from ADNI

|                   | CN (n = 221)  | EMCI (n = 221) | AD (n = 221)  | p        |
| Age               | 75.28 ± 5.76  | 71.45 ± 7.23*  | 75.4 ± 7.702  | < 0.0001 |
| Sex (M/F)         | 120/101       | 118/103        | 120/101       | 0.9760   |
| MMSE Score        | 29.06 ± 1.1   | 28.12 ± 1.66*  | 22.8 ± 2.63*  | < 0.0001 |
| CDR Score         | 0.03 ± 0.11   | 0.47 ± 0.16*   | 0.81 ± 0.32*  | < 0.0001 |
| Education (Years) | 16.18 ± 3.88  | 16.09 ± 2.65   | 14.65 ± 4.35* | < 0.0001 |
| ApoE4 (+/-)       | 157/64        | 82/139         | 58/163        | < 0.0001 |

Figure 1

A study framework of AD detection, comprising three main steps. T1-weighted MRI data were collected from the ADNI database (step 1) and preprocessed with the FreeSurfer software to obtain brain structure features. These features were then combined with two cognitive scores and filtered by three selection methods to construct six input-feature approaches (step 2). Finally, the generated inputs passed through the XG-Boost model to create the decision tree for ternary classification of AD status (CN, EMCI, and AD) in step 3. The outcome also shows the accuracy of each respective input.

Abbreviations: CN: Cognitively Normal; EMCI: Early Mild Cognitive Impairment; AD: Alzheimer’s disease; PCA: Principal Component Analysis; XG-Boost: Extreme Gradient Boosting.

Figure 2

The FreeSurfer feature extraction process. MRI preprocessing: image registration, skull stripping, and intensity normalization. Cortical reconstruction and subcortical segmentation: (1) convert a three-dimensional anatomical volume into a two-dimensional surface; (2) segment gray matter and white matter to create the brain mask file for later visualization. Region determination and brain parameter analysis: (1) inflate the surfaces into a sphere and map cortical parcellations back onto individual subjects using two atlases (Killiany and Destrieux atlases); (2) establish the boundary between white matter and cortex and compute gray matter thickness.

Table 2

358 features extracted by FreeSurfer from the 663 ADNI subjects, with representative measurements for each brain region (selected columns shown)

| No. | Subject ID | Brain Segmentation Volume Without Ventricles | Left Entorhinal Cortex (temporal lobe) | White Surface Total Area in the left hemisphere | Banks of Superior Temporal Sulcus in the left hemisphere | ... | Number of Defect Holes in Right Hemisphere Surface Prior to Fixing |
| 1   | 135_S_4598 | 1076438.0 | 285.0 | 84644.5 | 996.0  | ... | 17.0 |
| 2   | 099_S_4480 | 945976.0  | 310.0 | 76032.8 | 744.0  | ... | 33.0 |
| 3   | 099_S_2146 | 1138086.0 | 453.0 | 88770.5 | 1118.0 | ... | 46.0 |
| ... | ...        | ...       | ...   | ...     | ...    | ... | ...  |
| 662 | 082_S_1079 | 1131880.0 | 446.0 | 94008.6 | 1244.0 | ... | 73.0 |
| 663 | 130_S_5059 | 1160101.0 | 601.0 | 85947.9 | 862.0  | ... | 49.0 |

* Area in mm², volume in mm³.

Figure 3

Density plots showing the distribution among three classes (AD, EMCI, CN) of two cognitive scores and several MRI features. (A) Global CDR Scores, (B) MMSE Scores, (C) Left hemisphere bankssts thickness, (D) Right hemisphere fusiform volume, (E) eTIV, (F) Left Hippocampus volume. Blue: AD, orange: EMCI, green: CN.

Abbreviations: CN: Cognitively Normal; EMCI: Early Mild Cognitive Impairment; AD: Alzheimer’s disease; PCA: Principal Component Analysis

Figure 4

Venn diagram showing the number of features overlapping between the two feature selection methods (BE and XGBI).

Table 3

The results of feature selection by Approach 3, Approach 4, and Approach 5

| Method | Backward Elimination (Approach 3) | XGBoost Importance (Approach 4) | PCA (Approach 5) |
| Number of features after selection | 29 | 228 | 71 |
| Type of features | Brain features and cognitive scores | Brain features and cognitive scores | PCA features |

Figure 5

Feature weights after backward elimination, as trained by XG-Boost.

Abbreviations: XG-Boost: Extreme Gradient Boosting.

Figure 6

Accuracy of six approaches with 10-fold cross-validation. Approach 1: Brain structure features, Approach 2: Brain structural features and two cognitive scores, Approach 3: XG-Boost Importance and two cognitive scores, Approach 4: Backward Elimination and two cognitive scores, Approach 5: PCA features, Approach 6: Backward Elimination and two cognitive scores with tuning parameters.

Abbreviations: PCA: Principal Component Analysis; XG-Boost: Extreme Gradient Boosting.

Table 4

The performance results of six approaches for three-class classification

| Approach | Class | Accuracy | Precision | Recall | F1 score |
| 1 | CN   | 68.8 %  | 64 % | 56 % | 60 % |
| 1 | EMCI |         | 64 % | 75 % | 69 % |
| 1 | AD   |         | 79 % | 74 % | 77 % |
| 2 | CN   | 86 %    | 80 % | 97 % | 88 % |
| 2 | EMCI |         | 97 % | 71 % | 82 % |
| 2 | AD   |         | 83 % | 98 % | 90 % |
| 3 | CN   | 91.05 % | 89 % | 98 % | 93 % |
| 3 | EMCI |         | 95 % | 83 % | 89 % |
| 3 | AD   |         | 91 % | 95 % | 93 % |
| 4 | CN   | 90.9 %  | 91 % | 98 % | 95 % |
| 4 | EMCI |         | 92 % | 79 % | 85 % |
| 4 | AD   |         | 90 % | 96 % | 93 % |
| 5 | CN   | 74 %    | 68 % | 59 % | 63 % |
| 5 | EMCI |         | 75 % | 77 % | 76 % |
| 5 | AD   |         | 78 % | 86 % | 82 % |
| 6 | CN   | 92 %    | 88 % | 97 % | 93 % |
| 6 | EMCI |         | 91 % | 85 % | 88 % |
| 6 | AD   |         | 96 % | 94 % | 95 % |

(Accuracy is reported once per approach.)

Figure 7

Receiver operating characteristic (ROC) curves of Approach 1 and Approach 6 for three-class classification. The green line corresponds to AD, the blue line represents EMCI, and the red line shows CN.

Abbreviations: CN: Cognitively Normal; EMCI: Early Mild Cognitive Impairment; AD: Alzheimer’s disease; PCA: Principal Component Analysis

Figure 8

Visualization results for the ground truths and the corresponding predictions in three classes (CN, EMCI, AD). The first and second columns illustrate correctly predicted examples, while the last column shows wrongly predicted ones. Abbreviations: CN: Cognitively Normal; EMCI: Early Mild Cognitive Impairment; AD: Alzheimer’s disease; PCA: Principal Component Analysis

Structural MRI data

The structural MRI scans used in this study were the T1-weighted magnetization-prepared rapid gradient echo scans from ADNI 1 and ADNI GO/2. Various MRI scanner models were used for MRI acquisition; details of the acquisition protocol for the MRI data can be found on the ADNI website (http://adni.loni.usc.edu)22.

Study design

An overview of the study design is shown in Figure 1. Firstly, the MRI images were preprocessed with FreeSurfer to extract 358 features, including volumetric and thickness measurements. Three feature selection methods were used, and their efficiencies were compared. This step determined the optimal features from the 360 elements (FreeSurfer features, MMSE score, and CDR score). The data were divided into two sets with a ratio of 80% training to 20% testing using Python’s Scikit-learn library. Finally, the proposed models were evaluated using the performance metrics of accuracy, precision, recall, F1-score, and ROC curves with AUCs to identify the most efficient classification algorithm.
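The splitting and evaluation steps described above can be sketched as follows. This is a minimal illustration using Scikit-learn; the synthetic X and y placeholders, the stratified split, and the random seed are assumptions rather than details reported in the paper.

    # Minimal sketch of the 80/20 split and the evaluation metrics listed above.
    # The synthetic X and y below are placeholders standing in for the 663-subject
    # feature table and its class labels (0 = CN, 1 = EMCI, 2 = AD).
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    rng = np.random.default_rng(0)
    X = rng.normal(size=(663, 358))        # placeholder for the FreeSurfer features
    y = rng.integers(0, 3, size=663)       # placeholder diagnostic labels

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=42
    )

    # After fitting a classifier `model` on the training split:
    # y_pred = model.predict(X_test)
    # accuracy = accuracy_score(y_test, y_pred)
    # precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average=None)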

Feature extraction

Six hundred sixty-three MRI images were reconstructed and segmented using FreeSurfer (version 5.3; http://surfer.nmr.mgh.harvard.edu). This open-source software measures and visualizes the human brain’s functional, connective, and structural characteristics to extract brain structural features23. This software’s processing operations have two major stages (Figure 2).
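The FreeSurfer reconstruction itself runs outside Python (via its recon-all pipeline); the sketch below only illustrates, under assumptions, how per-subject summary tables exported with FreeSurfer's asegstats2table and aparcstats2table utilities might be merged into a single feature matrix with pandas. The file names are hypothetical.

    # Hypothetical sketch: merge FreeSurfer summary tables into one feature matrix.
    # Assumes the tab-separated files below were exported beforehand with FreeSurfer's
    # asegstats2table / aparcstats2table command-line utilities; file names are made up.
    import pandas as pd

    aseg = pd.read_csv("aseg_volumes.tsv", sep="\t")           # subcortical volumes
    aseg = aseg.rename(columns={aseg.columns[0]: "subject"})   # first column = subject ID

    aparc = pd.read_csv("lh_aparc_thickness.tsv", sep="\t")    # left-hemisphere cortical thickness
    aparc = aparc.rename(columns={aparc.columns[0]: "subject"})

    features = aparc.merge(aseg, on="subject", how="inner")
    print(features.shape)   # in this study, roughly 663 subjects x 358 features overall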

Feature selection

Feature selection plays a significant role in ML and pattern recognition. Pearson's product-moment correlation coefficient (r) was first applied to remove all linearly related features with |r| > 0.9. The reason for using this method is that several features extracted by FreeSurfer are sub-regions or different measurements of the same brain region, so including highly correlated features for a particular brain region is redundant from a neuroscience perspective. Moreover, highly correlated features may lead to overfitting, impacting model performance. Therefore, removing linearly related features before applying further (non-linear) feature selection can improve model performance and reduce training time. The next step was performed with three feature selection methods to compare their efficiency.
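A minimal sketch of this correlation-based pruning step, assuming the features are held in a pandas DataFrame; the 0.9 threshold comes from the text above.

    # Drop one feature from every pair whose absolute Pearson correlation exceeds 0.9.
    import pandas as pd

    def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
        corr = df.corr(method="pearson").abs()
        cols = corr.columns
        to_drop = set()
        for i in range(len(cols)):
            for j in range(i + 1, len(cols)):
                if corr.iloc[i, j] > threshold and cols[j] not in to_drop:
                    to_drop.add(cols[j])       # keep the first feature of the pair
        return df.drop(columns=sorted(to_drop))

    # e.g. reduces the 360-column table (358 FreeSurfer features + MMSE + CDR) to ~324 columns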

PCA is a multivariate exploratory analysis approach that reduces the complexity of multidimensional data while preserving trends and key patterns24, 25. PCA was applied using Python's Scikit-learn library with different numbers of principal components (PCs; 1–321) to determine the optimal set of features for the classification model. In each model, the number of PCs was then increased in increments of 10, and the resulting changes in accuracy were visualized with Python's Matplotlib library.
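A sketch of this PC sweep is shown below; standardizing before PCA and the XG-Boost settings are assumptions, while the 1–321 range in steps of 10 follows the text. It reuses the X and y placeholders from the earlier sketch.

    # Sweep the number of principal components (1, 11, 21, ..., 321) and record
    # the mean 10-fold cross-validated accuracy for each setting.
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import cross_val_score
    from xgboost import XGBClassifier

    accuracies = {}
    for n_pcs in range(1, 322, 10):
        model = make_pipeline(
            StandardScaler(),                  # scaling before PCA is an assumption
            PCA(n_components=n_pcs),
            XGBClassifier(objective="multi:softmax"),
        )
        accuracies[n_pcs] = cross_val_score(model, X, y, cv=10).mean()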

BE is a feature selection strategy that excludes characteristics strongly associated with the exposure without significantly influencing dependent variables or predicted outputs26, 27. BE was applied in five main steps: (i) select a significance level (SL) suitable for the model (SL = 0.05), (ii) fit an ordinary least squares (OLS) model with Python's Statsmodels library and determine the p-values of all features, (iii) compare each calculated p-value with the SL, (iv) remove features and predictors with a p-value greater than the SL, and (v) refit the model with the remaining variables.
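The backward-elimination loop above can be sketched with Statsmodels as follows, under the assumption that the class label is treated as the dependent variable in the OLS fit; the 0.05 significance level comes from step (i).

    # Iteratively drop the feature with the largest p-value until every remaining
    # p-value is at or below the chosen significance level.
    import pandas as pd
    import statsmodels.api as sm

    def backward_elimination(X_df: pd.DataFrame, y, sl: float = 0.05) -> pd.DataFrame:
        selected = X_df.copy()
        while selected.shape[1] > 0:
            ols = sm.OLS(y, sm.add_constant(selected)).fit()
            pvalues = ols.pvalues.drop("const")
            worst = pvalues.idxmax()
            if pvalues[worst] <= sl:
                break                    # all remaining features are significant
            selected = selected.drop(columns=[worst])
        return selected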

XG-Boost has the advantage of providing an importance score for each feature in the predictive problem, enabling the features to be ranked by importance. The next step removes all features with zero importance coefficients according to this ranking. This procedure is repeated until stable accuracy is reached and all remaining features have non-zero importance coefficients.
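A sketch of this importance-based pruning loop using the Scikit-learn interface of the XGBoost library; the stopping rule (keep only non-zero importances) follows the text, and the DataFrame X_df with named columns is an assumption.

    # Repeatedly refit XG-Boost and discard features whose importance score is zero,
    # until every remaining feature has a non-zero importance coefficient.
    from xgboost import XGBClassifier

    features = list(X_df.columns)
    while True:
        clf = XGBClassifier(objective="multi:softmax")
        clf.fit(X_df[features], y)
        keep = [f for f, w in zip(features, clf.feature_importances_) if w > 0]
        if len(keep) == len(features):   # nothing left to prune
            break
        features = keep

    print(len(features), "features retained")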

This study investigated six approaches for feature selection. Feature selection was not applied in the first and second approaches. The first approach used all 358 features extracted by Freesurfer to train the model. The second approach added the two cognitive scores to the 358 Freesurfer features. The third approach used XGBI to filter the Freesurfer features and included the two cognitive scores when training the model. The fourth and sixth approaches used BE for feature selection and included the two cognitive scores; however, the sixth approach also applied parameter tuning. Finally, the fifth approach used PCA for feature selection.

Classification

XG-Boost is a scalable and efficient gradient-boosting framework used to combine a series of weak base learners (small decision trees) into a single powerful learner (a big tree)28, 29. The enhanced performance of XG-Boost has been shown in several major areas. Firstly, XG-Boost introduces a regularization component into the objective function, making the model less prone to overfitting. Secondly, it conducts a second-order rather than first-order Taylor expansion on the objective function, enabling it to specify the loss function more accurately. Thirdly, XG-Boost has a fast training speed due to data compression, multithreading, and GPU acceleration30, 31.

The objective function is defined as:

$$\mathrm{Obj} = \sum_{i} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega\left(f_k\right)$$

where $\hat{y}_i$ represents the prediction for the $i$-th sample at the current round, $f_k$ represents the structure of a decision tree, and $\Omega$ represents the regularization component. $\Omega$ is given by:

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}$$

where $\gamma$ represents the penalty coefficient on the number of leaves $T$, and $\lVert w \rVert^{2}$ represents the L2 norm of the leaf scores $w$ (weighted by $\lambda$). After $t$ iterations, the model's function is updated by adding a new decision tree:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t\left(x_i\right)$$

and the objective function is updated:

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t\left(x_i\right)\right) + \Omega\left(f_t\right)$$

with the second-order Taylor expansion:

$$\mathrm{Obj}^{(t)} \simeq \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t\left(x_i\right) + \frac{1}{2} h_i f_t^{2}\left(x_i\right)\right] + \Omega\left(f_t\right)$$

where $g_i$ represents the first derivative and $h_i$ represents the second derivative of the loss function. $g_i$ and $h_i$ are given by31:

$$g_i = \partial_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right), \qquad h_i = \partial^{2}_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}^{(t-1)}\right)$$

This study applied the model from the open-source XG-Boost library. The algorithm used the softmax objective with the cross-entropy (multi-class log loss) function. After fitting the data, the training process was visualized with the Matplotlib library, and early stopping was applied to prevent overfitting.
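A minimal sketch of this configuration using the XGBoost library's Scikit-learn wrapper is shown below; the softmax objective and multi-class log loss (cross-entropy) follow the text, while the number of boosting rounds, the early-stopping patience, and the use of a held-out validation split (X_val, y_val) are assumptions. Note that in xgboost versions before 1.6, early_stopping_rounds is passed to fit() instead of the constructor.

    # Multi-class XG-Boost with a softmax objective, cross-entropy (mlogloss) monitoring,
    # and early stopping on a held-out validation split (X_val, y_val are assumed here).
    from xgboost import XGBClassifier

    clf = XGBClassifier(
        objective="multi:softmax",
        eval_metric="mlogloss",       # multi-class cross-entropy
        n_estimators=500,             # illustrative upper bound on boosting rounds
        early_stopping_rounds=10,     # illustrative patience
    )
    clf.fit(X_train, y_train, eval_set=[(X_val, y_val)], verbose=False)

    history = clf.evals_result()      # per-round mlogloss, can be plotted with Matplotlib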

Tenfold cross-validation32

Grid Search cross-validation (GridSearchCV) is an object provided by Python's Scikit-learn library that evaluates a set of hyperparameters with tenfold cross-validation to obtain a maximally accurate model (estimator). During the call to fit, GridSearchCV evaluates the grid of specified parameters with the estimator's predicting, scoring, or transforming methods and returns the best-performing combination of hyperparameters, i.e., the combination with the maximum score under the estimator's scoring strategy. Any estimator can be used with this object. Lastly, all transformers and an estimator can be assembled in a pipeline, resulting in a combined estimator that can also apply dimensionality reduction before fitting.
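The pipeline-plus-grid-search setup described above can be sketched as follows; the parameter names and candidate values are illustrative assumptions, not the grid actually used in the paper.

    # 10-fold grid search over a pipeline that ends in an XG-Boost classifier.
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier

    pipe = Pipeline([
        ("scale", StandardScaler()),                        # optional preprocessing step
        ("xgb", XGBClassifier(objective="multi:softmax")),
    ])

    param_grid = {                                          # illustrative grid only
        "xgb__max_depth": [3, 5, 7],
        "xgb__learning_rate": [0.05, 0.1, 0.3],
        "xgb__n_estimators": [100, 300],
    }

    search = GridSearchCV(pipe, param_grid, cv=10, scoring="accuracy", n_jobs=-1)
    search.fit(X_train, y_train)
    print(search.best_params_, search.best_score_)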

Results

Feature extraction

After preprocessing and extraction, 358 features were exported. Table 2 shows a portion of the extraction results. From the extraction results, we assessed the discriminative power of several features and two additional cognitive scores (CDR and MMSE) by comparing their distributions across the three classes: AD, CN, and EMCI (Figure 3). We selected the top four weighted features according to XGBI and BE: left hemisphere banks of superior temporal sulcus thickness, right hemisphere fusiform volume, left hemisphere estimated total intracranial volume (eTIV), and left hippocampus volume. The two scores of the dementia tests (CDR and MMSE) showed a distinctive distribution in the density plots between the three classes (Figure 3A, B). In contrast, a significant overlap existed between classes in the eTIV distribution (Figure 3E). Nevertheless, the AD group separated relatively well from the CN and EMCI groups in the distributions of the other three FreeSurfer features, especially the left hippocampus volume (Figure 3F). Overall, the density plots in Figure 3 showed the great potential of CDR and MMSE to enhance model accuracy when combined with the extracted features. These plots also highlight the challenges in distinguishing the CN and EMCI groups.
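The class-wise density comparison shown in Figure 3 can be reproduced along these lines; the DataFrame df with a 'group' column and an 'MMSE' column is a hypothetical placeholder for the assembled feature table.

    # Kernel-density plot of one feature (here MMSE) per diagnostic group, as in Figure 3.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for group in ["CN", "EMCI", "AD"]:
        df.loc[df["group"] == group, "MMSE"].plot.kde(ax=ax, label=group)
    ax.set_xlabel("MMSE score")
    ax.legend()
    plt.show()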

Feature selection

Several primary factors, such as redundancy (feature-feature) and relevance (feature-class), must be considered during feature selection33. For redundancy minimization, this study used Pearson’s product-moment correlation coefficient to measure the association between features and remove all linearly related features34. This phase reduced the features from 360 to 324. Next, PCA, a popular feature selection method, was used to reduce dimensionality and identify highly effective and minimally redundant features. PCA created 33 feature sets; the first contained one feature, the second 11 features, and so on until the final set contained 321 features. Then, the performance of these feature sets was compared to investigate the efficiency of the PCA method.

Besides PCA, the other two feature selection methods (XGBI and BE) were applied to maximize relevance; their results are summarized in Table 3 and Figure 4. The XG-Boost library identified several features with unimportant values during the training process. Consequently, Approach 4 selected 228 features with non-zero importance coefficients to ensure that every feature benefits the training model. In addition, BE was applied for its speed and simplicity in removing irrelevant features with p-values > 0.05. Interestingly, it only identified 29 features, of which 15 were shared with XGBI, including the two cognitive scores and 13 brain structure features (Figure 4).

After selection, XG-Boost continued to train on the features, resulting in the best performance with Approach 4 (see the Classification results section). Figure 5 shows the weights of the top-ranked features with Approach 4. The two cognitive scores were the most influential in the prediction since their weights (0.263 and 0.257, respectively) were approximately sixfold higher than those of the brain structure features. Moreover, the thickness of the left superior temporal sulcus was the most informative brain structure feature. The temporal lobe was also the most informative brain region because several features extracted from it had high weights, including the superior temporal sulcus, fusiform gyrus, transverse temporal gyrus, middle temporal gyrus, the temporal pole from the right hemisphere, and the hippocampus from the left hemisphere. In conclusion, the temporal lobe shows the most significant changes in patients with AD.

Classification

The accuracies of all approaches and the details of each approach are summarized in Figure 1 and Figure 6. The accuracy of each three-class classification model was assessed as the proportion of correctly predicted observations out of all class observations, using tenfold cross-validation. Approach 1, using 358 brain features, had the lowest accuracy (69.00% ± 3.00%). The accuracy improved with Approach 2, which added the two cognitive scores to the feature set (86.00% ± 2.00%). The accuracy improved again with Approach 3, which used XGBI to select the features (91.05% ± 3.34%). However, the accuracy decreased with Approaches 4 (90.90% ± 3.35%) and 5 (74.00%). In Approach 5, the accuracies ranged from 63% to 74% for 1 to 321 PCA features; the highest accuracy is shown in Figure 6. Approach 6, using BE for feature selection and tuning model parameters with grid search, achieved 92.00% accuracy.

The performance of the six approaches is summarized in Table 4. In Approach 1, the AD class had the highest precision (79%), recall (74%), and F1 score (77%), while the CN class had the lowest precision, recall, and F1 score. In Approach 6, the AD class also achieved the highest precision (96%) and F1 score (95%). However, the CN class had the highest recall (97%) and a higher F1 score (93%) than the EMCI class (88%).

Figure 7 presents ROC curves showing the classification performance of Approaches 1 and 6. The ROC curve for Approach 1 showed that the model had poor performance in classifying CN and EMCI subjects (Figure 7A). The AUC of the EMCI class (0.83) was slightly higher than that of the CN class (0.82). However, Approach 1 performed well in identifying the AD class (AUC = 0.92). The ROC curve for Approach 6 showed that the final model classified the EMCI class less accurately than the CN and AD classes (AUC = 0.88; Figure 7B). Nevertheless, the ROC curves of all three classes were significantly improved with Approach 6 compared to Approach 1. The ROC curves for the CN (AUC = 0.94) and AD (AUC = 0.98) classes demonstrated excellent performance. The ground truths and their corresponding predictions in three classes are illustrated in Figure 8.
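For a three-class model, the per-class ROC curves and AUCs reported above are typically computed one-vs-rest; a minimal sketch is shown below, assuming the fitted classifier exposes predicted class probabilities (e.g., via XGBoost's default multi:softprob objective).

    # One-vs-rest ROC curves and AUCs for the three classes (0 = CN, 1 = EMCI, 2 = AD).
    from sklearn.preprocessing import label_binarize
    from sklearn.metrics import roc_curve, auc

    y_test_bin = label_binarize(y_test, classes=[0, 1, 2])
    y_score = clf.predict_proba(X_test)         # requires probability outputs

    for i, name in enumerate(["CN", "EMCI", "AD"]):
        fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_score[:, i])
        print(name, "AUC =", round(auc(fpr, tpr), 2))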

Discussion

This study’s primary aim was to implement the XG-Boost algorithm in early AD detection at the EMCI stage. The model performance significantly improved from 68.8% to 92.0% after adding two cognitive scores (MMSE and CDR) and selecting features (Figure 6 and Table 4). The final model achieved the highest accuracy of 92% by combining Pearson’s correlations with BE for feature selection, reducing the number of features from 360 to 29 (Figure 4 and Table 3). In addition, BE was explicitly recognized as the most suitable selection method (Figure 6 and Table 4). The ROC curve illustrated excellent performance for Approach 6 (Figure 7B), with the AD class having the highest AUC (0.98), followed by the CN class (0.94) and the EMCI class (0.88).

Feature weights

The BE method in Approach 4 showed that the hippocampus and temporal lobe features were the most important. This result is expected since structural changes in these regions are considered early indicators of MCI and AD35. During the earliest stages of AD, brain atrophy typically follows the hippocampal pathway (entorhinal cortex, hippocampus, and posterior cingulate cortex) and is associated with early memory deficits36. Furthermore, the variations in structural measures, including hippocampus and temporal lobe volumes, sulcus width and thickness, and subcortical nuclei volume, correlate with cognitive performance37, 38, 39, 40.

Our study found that the two cognitive scores (MMSE and CDR) had substantially higher weights than the brain features, which suggests that the ML architecture designed in this study is not yet sufficiently effective on its own. Clinically, these two scores are used as part of the preferred standard diagnostic procedure for AD. Moreover, MMSE and CDR mainly depend on general cognitive and behavioral states rather than the underlying biological changes in the nervous system41, 42. Consequently, while the final model still shows considerable performance, it remains too dependent on symptom testing rather than brain structure changes.

Roles of cognitive scores and feature selection

Performance differed significantly between the first approach, which excluded the cognitive scores, and the other approaches, which included them. Specifically, after adding MMSE and CDR to the feature set, the accuracy increased drastically by nearly 20%, from 69% ± 3% to 86% ± 2%. We suggest that future model development should minimize the influence of the two scores on the prediction so that applying the model in the clinical setting is less dependent on the availability of well-trained neurologists to conduct such cognitive tests. There has been a recent increase in the number of studies completing this task. For example, Liu et al. reported a multi-model DL framework with accuracies of 88.9% for classifying AD and CN and 76.2% for classifying MCI and CN43. Farooq et al. compared GoogLeNet, ResNet-18, and ResNet-152, reporting accuracies of 98% for all three models44. However, most recent studies only used a DL approach, which could hinder technology acceptance by medical doctors45.

Our study also illustrated that feature selection, especially BE and XGBI, plays a crucial role in the classification model. Both methods led to significant increases in model performance, surpassing the results of the other approaches. The reason is that, from a biological perspective, not all brain features contribute to AD pathology46, 47, 48. Several studies suggest that specific brain regions are affected by AD-related atrophy, including the frontal, temporal, and parietal lobes and the cerebellum46, 47, 48. Other feature selection methods have also shown outstanding accuracy. For example, Fang et al. proposed several ML algorithms combined with goal-directed conceptual aggregation to demonstrate the effectiveness of this method compared to other approaches (PCA, least absolute shrinkage and selection operator, and univariate feature selection). They achieved 79.25% accuracy in classifying CN vs. EMCI and 83.33% in classifying CN vs. LMCI49. Khagi et al. combined SVM and K-nearest neighbors with one of four feature selection methods (ReliefF, Laplacian, UDFS, and Mutinffs), reporting accuracies of approximately 99% for AD classification50.

Model selection and comparison

While the models in Approaches 3, 4, and 6 performed relatively similarly, Approach 6 was chosen to be the final model. Firstly, this approach achieved the highest accuracy (92%). Secondly, this model had a shorter training time (45.5 seconds) than Approach 3 (242.6 seconds). Moreover, in the feature selection step, Approach 6 selected features automatically, while Approach 4 required manual feature selection. In addition, by running GridSearch, Approach 6 could obtain optimal parameters compared to Approach 4 (without GridSearch).

Approaches 1 and 6 had greater difficulty classifying EMCI than the other classes. The AUC for the CN class was the lowest in Approach 1 (0.82) but increased significantly in Approach 6 (0.94). This increase indicates that feature selection may eliminate misleading features, which substantially benefited CN classification51. However, the AUC of the EMCI class increased only slightly, from 0.83 to 0.88; therefore, EMCI is the most challenging class for the model to identify. Brain structural changes in patients with EMCI are likely not prominent enough for the model to recognize easily. Moreover, EMCI classification remains challenging, and this class often showed low accuracy in previous studies. For example, Goryawala et al. only achieved an accuracy of 0.616 for distinguishing CN and EMCI and 0.814 for distinguishing EMCI and AD52.

Overall, three-way classification in AD diagnosis models still performs relatively poorly. The proposed model is compared to current models in Table 5. Most current models using three-way classification focus on the MCI class, while the EMCI class is more important for facilitating early AD diagnosis. This oversight underscores the distinctiveness of this study, which addresses three-class classification involving the EMCI, AD, and CN categories. Compared with state-of-the-art models for three-way classification, the method proposed in this study achieves promising performance with 92% accuracy. Ahmed et al. developed a multi-class deep CNN framework for early AD diagnosis, achieving 93.86% accuracy for three-way AD/MCI/CN classification53; however, their focus was on MCI, whereas our study focuses on the more challenging EMCI classification. Consequently, our model offers a more sophisticated approach and, therefore, has a competitive advantage.

Table 5

Model performance of three-way classification models for the early diagnosis of Alzheimer's disease

| Study | Sample size | Method | Model performance |
| 54 | 224 CN, 133 MCI, 85 AD | Modified Tresnet | 63.2 % |
| 55 | 200 CN, 441 MCI, 105 AD | Decision tree with linear discriminant analysis | 66.7 % |
| 56 | 197 CN, 330 MCI, 279 AD | 3D CNN with 8 instance normalization layers | 66.9 % |
| 57 | CN vs. MCI vs. AD | XG-Boost | 66.8 % |
| 58 | 229 CN, 398 MCI, 192 AD | VGG-16 (Visual Geometry Group 16) | 80.66 % |
| 59 | 115 CN, 133 MCI, 58 AD | ResNet-18 with Weighted Loss, Transfer Learning, and Mish Activation | 88.3 % |
| 60 | 229 CN, 382 MCI, 187 AD | Combined graph convolutional networks and CNN | 89.4 % |
| Proposed method | 221 CN, 221 EMCI, 221 AD | XG-Boost and BE | 92 % |

Conclusions

This study developed an ML model for early AD diagnosis based on structural MRI scans using XG-Boost to classify three classes: CN, EMCI, and AD. We also evaluated three feature selection methods (BE, XGBI, and PCA) to identify the optimal method for our model. The final model using BE with tuning parameters achieved the highest accuracy of 92%. The AUCs for the AD, CN, and EMCI classes were 0.98, 0.94, and 0.88, respectively. Compared to previous three-class classification methods, the proposed method appears promising for early AD detection.

While the XG-Boost model attained high accuracy with the aid of BE, several technical issues remain unsolved. Firstly, the AUC was lower for the EMCI class than for the CN and AD classes; therefore, additional parameter tuning is essential to enhance EMCI classification performance. In addition, the model should be modified to reduce its dependence on the MMSE and CDR scores. Finally, the model should be tested on multiple datasets to optimize its performance.

Abbreviations

ADNI: Alzheimer’s Disease Neuroimaging Initiative; AD: Alzheimer's disease; AI: Artificial Intelligence; BE: Backward Elimination; CAD: Computer-Aided Diagnosis; CDR: Clinical Dementia Rating; CN: Cognitively Normal; CNN: Convolutional Neural Network; DL: Deep Learning; eTIV: estimated Total Intracranial Volume; EMCI: Early MCI; GMM: Gaussian Mixture Model; GridSearchCV: Grid Search cross-validation; GDCA: Goal-Directed Conceptual Aggregation; GLCM: Gray Level Co-occurrence Matrix; KNN: K Nearest Neighbor; LMCI: Late MCI; ML: Machine Learning; MCI: Mild Cognitive Impairment; MMSE: Mini-Mental State Examination; MRI: Magnetic Resonance Imaging; OLS: Ordinary Least Squares; RELM: Rough Extreme Learning Machine; ROC-AUC: Area Under the ROC Curve; PET: Positron Emission Tomography; PCA: Principal Component Analysis; PC: Principal Components; sMRI: structural MRI; SVM: Support Vector Machine; SL: Significance Level; XGBI: XG-Boost Importance.

Acknowledgments

Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu) and the Alzheimer's Disease Metabolomics Consortium (ADMC). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wpcontent/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf and https://sites.duke.edu/adnimetab/team

Author’s contributions

All authors contributed to the ideas, designed the study, and performed the experiments. All authors read and approved the final manuscript.

Funding

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number NCM2020-28-01.

Availability of data and materials

The data that support the findings of this study are available in ADNI at http://adni.loni.usc.edu/data-samples/access-data/

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

References

  1. W. Wong. Economic burden of Alzheimer disease and managed care considerations. The American Journal of Managed Care (ISSN: 1936-2692). 2020; 26 (8) : 177-83.
  2. A. Kumar. Alzheimer Disease. [Updated 2021 Aug 11]. In: StatPearls. Treasure Island (FL): StatPearls Publishing; 2022.
  3. X.A. Bi, Q. Xu, X. Luo, Q. Sun, Z. Wang. Analysis of progression toward Alzheimer's disease based on evolutionary weighted random support vector machine cluster. Frontiers in Neuroscience (ISSN: 1662-4548). 2018; 12: 716.
  4. K. Tatiparti, S. Sau, M.A. Rauf, A.K. Iyer. Smart treatment strategies for alleviating tauopathy and neuroinflammation to improve clinical outcome in Alzheimer's disease. Drug Discovery Today (ISSN: 1878-5832). 2020; 25 (12) : 2110-29.
  5. L. Kang, J. Jiang, J. Huang, T. Zhang. Identifying early mild cognitive impairment by multi-modality MRI-based deep learning. Frontiers in Aging Neuroscience (ISSN: 1663-4365). 2020; 12: 206.
  6. F. Zhang, B. Pan, P. Shao, P. Liu, S. Shen, P. Yao, Alzheimer's Disease Neuroimaging Initiative, Australian Imaging Biomarkers Lifestyle flagship study of ageing. A single model deep learning approach for Alzheimer's disease diagnosis. Neuroscience (ISSN: 1873-7544). 2022; 491: 200-14.
  7. X. Xing, G. Liang, Y. Zhang, S. Khanal, A.L. Lin, N. Jacobs. Advit: vision transformer on multi-modality PET images for Alzheimer disease diagnosis. In: 2022 IEEE 19th International Symposium on Biomedical Imaging (ISBI); 2022: 1-4.
  8. V.S. Diogo, H.A. Ferreira, D. Prata, Alzheimer's Disease Neuroimaging Initiative. Early diagnosis of Alzheimer's disease using machine learning: a multi-diagnostic, generalizable approach. Alzheimer's Research & Therapy (ISSN: 1758-9193). 2022; 14 (1): 107.
  9. M. Tábuas-Pereira, I. Baldeiras, D. Duro, B. Santiago, M.H. Ribeiro, M.J. Leitão. Prognosis of early-onset vs. late-onset mild cognitive impairment: comparison of conversion rates and its predictors. Geriatrics (Basel, Switzerland) (ISSN: 2308-3417). 2016; 1 (2) : 11-
  10. G. Mirzaei, H. Adeli. Machine learning techniques for diagnosis of Alzheimer disease, mild cognitive disorder, and other types of dementia. Biomedical Signal Processing and Control (ISSN: 1746-8094). 2022; 72: 103293.
  11. S. Kolahkaj, H. Zare. A connectome-based deep learning approach for Early MCI and MCI detection using structural brain networks. Neuroscience Informatics (Online) (ISSN: 2772-5286). 2023; 3 (1) : 100118-
  12. L. Rice, E. Wong, Z. Kolter. Overfitting in adversarially robust deep learning. International Conference on Machine Learning 2020; 8093-8104.
  13. C. Techa. Alzheimer's disease multi-class classification model based on CNN and StackNet using brain MRI data.. International Conference on Advanced Intelligent Systems and Informatics 2022; 248-259.
  14. A. Alorf, M.U. Khan. Multi-label classification of Alzheimer's disease stages from resting-state fMRI-based correlation connectivity data and deep learning. Computers in Biology and Medicine (ISSN: 1879-0534). 2022; 151: 106240.
  15. H. Alfalahi, S.B. Dias, A.H. Khandoker, K.R. Chaudhuri, L.J. Hadjileontiadis. A scoping review of neurodegenerative manifestations in explainable digital phenotyping. NPJ Parkinson's Disease (ISSN: 2373-8057). 2023; 9 (1): 49.
  16. J. Garre-Olmo. [Epidemiology of Alzheimer's disease and other dementias]. Revista de neurologia (ISSN: 1576-6578). 2018; 66 (11) : 377-86.
  17. H.C. Riek, D.C. Brien, B.C. Coe, J. Huang, J.E. Perkins, R. Yep, ONDRI Investigators. Cognitive correlates of antisaccade behaviour across multiple neurodegenerative diseases. Brain Communications (ISSN: 2632-1297). 2023; 5 (2) :
  18. M. Jayasudha, M. Elangovan, M. Mahdal, J. Priyadarshini. Accurate estimation of tensile strength of 3D printed parts using machine learning algorithms. Processes (Basel, Switzerland) (ISSN: 2227-9717). 2022; 10 (6) : 1158-
  19. X. Sun. Application and comparison of artificial neural networks and XGBoost on Alzheimer's disease. In: Proceedings of the 2021 International Conference on Bioinformatics and Intelligent Computing; 2021: 101-105.
  20. H. Ong. A machine learning framework based on extreme gradient boosting for intelligent Alzheimer's disease diagnosis using structure MRI. In: International Conference on the Development of Biomedical Engineering in Vietnam; 2020: 815-827.
  21. T.A. Tuan. Alzheimer's diagnosis using deep learning in segmenting and classifying 3D brain MR images. The International Journal of Neuroscience (ISSN: 0020-7454). 2020; 132 (7) : 689-98.
  22. T.A. Tuan , T.B. Pham , J.Y. Kim , J.M. Tavares . Alzheimer’s diagnosis using deep learning in segmenting and classifying 3D brain MR images. International Journal of Neuroscience 2021; 132 (7) : 689-98.
  23. FreeSurfer. Available from: https://surfer.nmr.mgh.harvard.edu. Accessed 6 Aug 2021.
  24. P. Geladi, J. Linderholm. Principal Component Analysis. 2020; 2020
  25. J. Lever, M. Krzywinski, N. Altman. Principal component analysis. Nature Methods (ISSN: 1548-7091). 2017; 14 (7) : 641-2.
  26. D. Dunkler, M. Plischke, K. Leffondré, G. Heinze. Augmented backward elimination: a pragmatic and purposeful way to develop statistical models. PLoS One (ISSN: 1932-6203). 2014; 9 (11) : e113677-
  27. P. Royston, W. Sauerbrei. Multivariable model-building: a pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. 2008; 777.
  28. T. Chen. XGBoost: extreme gradient boosting. R package version 0.4-2. 2015; 1 (4): 1-4.
  29. Y. Liu, L. Liu, L. Yang, L. Hao, Y. Bao. Measuring distance using ultra-wideband radio technology enhanced by extreme gradient boosting decision tree (XGBoost). Automation in Construction (ISSN: 0926-5805). 2021; 126: 103678.
  30. R. Mitchell. Xgboost: Scalable GPU accelerated learning. arXiv preprint arXiv:1806.11248, 2018.
  31. J. Guo, L. Yang, R. Bie, J. Yu, Y. Gao, Y. Shen. An XGBoost-based physical fitness evaluation model using advanced feature selection and Bayesian hyper-parameter optimization for wearable running monitoring. Computer Networks (ISSN: 1389-1286). 2019; 151: 166-80.
  32. F. Pedregosa. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research 2011; 12: 2825-2830.
  33. J. Cai, J. Luo, S. Wang, S. Yang. Feature selection in machine learning: A new perspective. Neurocomputing (ISSN: 0925-2312). 2018; 30070-9.
  34. J. Liu, R. Li, R. Wu. Feature selection for varying coefficient models with ultrahigh-dimensional covariates. Journal of the American Statistical Association (ISSN: 0162-1459). 2014; 109 (505) : 266-74.
  35. L.E. Wisse, G.J. Biessels, S.M. Heringa, H.J. Kuijf, D.H. Koek, P.R. Luijten, Utrecht Vascular Cognitive Impairment (VCI) Study Group. Hippocampal subfield volumes at 7T in early Alzheimer's disease and normal aging. Neurobiology of Aging (ISSN: 1558-1497). 2014; 35 (9) : 2039-45.
  36. R.I. Scahill, J.M. Schott, J.M. Stevens, M.N. Rossor, N.C. Fox. Mapping the evolution of regional atrophy in Alzheimer's disease: unbiased analysis of fluid-registered serial MRI. Proceedings of the National Academy of Sciences of the United States of America (ISSN: 0027-8424). 2002; 99 (7) : 4703-7.
  37. B.H. Ridha, V.M. Anderson, J. Barnes, R.G. Boyes, S.L. Price, M.N. Rossor. Volumetric MRI and cognitive measures in Alzheimer disease : comparison of markers of progression. Journal of Neurology (ISSN: 0340-5354). 2008; 255 (4) : 567-74.
  38. X. Hua, S. Lee, I. Yanovsky, A.D. Leow, Y.Y. Chou, A.J. Ho, Alzheimer's Disease Neuroimaging Initiative. Optimizing power to track brain degeneration in Alzheimer's disease and mild cognitive impairment with tensor-based morphometry: an ADNI study of 515 subjects. NeuroImage (ISSN: 1095-9572). 2009; 48 (4) : 668-81.
  39. P.J. Visser, P. Scheltens, F.R. Verhey, B. Schmand, L.J. Launer, J. Jolles. Medial temporal lobe atrophy and memory dysfunction as predictors for dementia in subjects with mild cognitive impairment. Journal of Neurology (ISSN: 0340-5354). 1999; 246 (6) : 477-85.
  40. B.C. Dickerson, A. Bakkour, D.H. Salat, E. Feczko, J. Pacheco, D.N. Greve. The cortical signature of Alzheimer's disease: regionally specific cortical thinning relates to symptom severity in very mild to mild AD dementia and is detectable in asymptomatic amyloid-positive individuals. Cerebral Cortex (New York, N.Y.) (ISSN: 1460-2199). 2009; 19 (3) : 497-510.
  41. B. Brossard. Quantifying dementia: the use of the Mini-Mental State Exam in medical research and practice. In: Measuring Mental Disorders. Elsevier; 2018. p. 127-154.
  42. A. Sinha, A. Sinha. Mild Cognitive Impairment and its Diagnosis to Progression to Dementia with Several Screening Measures. The Open Psychology Journal (ISSN: 1874-3501). 2018; 11 (1) : 142-7.
  43. M. Liu, F. Li, H. Yan, K. Wang, Y. Ma, L. Shen, Alzheimer's Disease Neuroimaging Initiative. A multi-model deep convolutional neural network for automatic hippocampus segmentation and classification in Alzheimer's disease. NeuroImage (ISSN: 1095-9572). 2020; 208: 116459.
  44. A. Farooq. A deep CNN based multi-class classification of Alzheimer's disease using MRI. 2017 IEEE International Conference on Imaging systems and techniques (IST) 2017; 1-6.
  45. A.S. Ahuja. The impact of artificial intelligence in medicine on the future role of the physician. PeerJ (ISSN: 2167-8359). 2019; 7: e7702.
  46. H. Patel, R.J. Dobson, S.J. Newhouse. A meta-analysis of Alzheimer's disease brain transcriptomic data. Journal of Alzheimer's Disease (ISSN: 1875-8908). 2019; 68 (4) : 1635-56.
  47. W. Jagust. Imaging the evolution and pathophysiology of Alzheimer disease. Nature Reviews. Neuroscience (ISSN: 1471-0048). 2018; 19 (11) : 687-700.
  48. P. Gautam, N. Cherbuin, P.S. Sachdev, W. Wen, K.J. Anstey. Relationships between cognitive function and frontal grey matter volumes and thickness in middle aged and early old-aged adults: the PATH Through Life Study. NeuroImage (ISSN: 1095-9572). 2011; 55 (3) : 845-55.
  49. C. Fang, C. Li, P. Forouzannezhad, M. Cabrerizo, R.E. Curiel, D. Loewenstein, Alzheimer's Disease Neuroimaging Initiative. Gaussian discriminative component analysis for early detection of Alzheimer's disease: A supervised dimensionality reduction algorithm. Journal of Neuroscience Methods (ISSN: 1872-678X). 2020; 344: 108856.
  50. B. Khagi, G.R. Kwon, R. Lama. Comparative analysis of Alzheimer's disease classification by CDR level using CNN, feature selection, and machine-learning techniques. International Journal of Imaging Systems and Technology (ISSN: 0899-9457). 2019; 29 (3) : 297-310.
  51. U.M. Khaire, R. Dhanalakshmi. Stability of feature selection algorithm: A review. Journal of King Saud University. Computer and Information Sciences (ISSN: 1319-1578). 2019; 34 (4) : 1060-73.
  52. M. Goryawala. Inclusion of neuropsychological scores in atrophy models improves diagnostic classification of Alzheimer's disease and mild cognitive impairment. Computational Intelligence and Neuroscience 2015; 2015: 865265.
  53. H.M. Ahmed, Z.F. Elsharkawy, A.S. Elkorany. Alzheimer disease diagnosis for magnetic resonance brain images using deep learning neural networks. Multimedia Tools and Applications (ISSN: 1380-7501). 2023; 82 (12) : 17963-77.
  54. M.W. Oktavian, N. Yudistira, A. Ridok. Classification of Alzheimer's Disease Using the Convolutional Neural Network (CNN) with Transfer Learning and Weighted Loss. arXiv preprint arXiv:2207.01584, 2022.
  55. B.Y. Lim, K.W. Lai, K. Haiskin, K.A. Kulathilake, Z.C. Ong, Y.C. Hum. Deep learning model for prediction of progressive mild cognitive impairment to Alzheimer's disease using structural MRI. Frontiers in Aging Neuroscience (ISSN: 1663-4365). 2022; 14: 876202.
  56. J. Stubblefield. Study the combination of brain MRI imaging and other datatypes to improve Alzheimer's disease diagnosis. medRxiv, 2022.
  57. L. Lin, M. Xiong, G. Zhang, W. Kang, S. Sun, S. Wu, Initiative Alzheimer's Disease Neuroimaging. A Convolutional Neural Network and Graph Convolutional Network Based Framework for AD Classification. Sensors (Basel) (ISSN: 1424-8220). 2023; 23 (4) : 1914-
  58. W. Lin, Q. Gao, M. Du, W. Chen, T. Tong. Multiclass diagnosis of stages of Alzheimer's disease using linear discriminant analysis scoring for multimodal data. Computers in Biology and Medicine (ISSN: 1879-0534). 2021; 134: 104478.
  59. Z. Xu, H. Deng, J. Liu, Y. Yang. Diagnosis of Alzheimer's Disease Based on the Modified Tresnet. Electronics (Basel) (ISSN: 2079-9292). 2021; 10 (16) : 1908-
  60. S. Liu, C. Yadav, C. Fernandez-Granda, N. Razavian. On the design of convolutional neural networks for automatic detection of Alzheimer's disease. In: Machine Learning for Health Workshop; 2020: 184-201.
