Optimized machine learning framework for cardiovascular disease diagnosis: a novel ethical perspective

Ghadah Alwakid, Farman Ul Haq, Noshina Tariq, Mamoona Humayun, Momina Shaheen, Marwa Alsadun

    Research output: Contribution to journal, Article, peer-review


    Abstract

    Artificial Intelligence (AI) has emerged as a significant driving force for more precise and timely identification of cardiovascular diseases (CVDs). However, high accuracy and reliability in CVD diagnostics remain difficult to achieve because clinical data are complex and useful features are hard to select and model. This paper therefore studies AI-based feature selection techniques and the application of AI technologies to CVD classification. It uses Chi-square, Information Gain, Forward Selection, and Backward Elimination to distill cardiovascular health indicators into a refined eight-feature subset. The study emphasizes ethical considerations, including transparency, interpretability, and bias mitigation, which it addresses by employing unbiased datasets, fair feature selection techniques, and rigorous validation metrics to support fairness and trustworthiness in the AI-based diagnostic process. In addition, several Machine Learning (ML) models, namely Random Forest (RF), XGBoost, Decision Trees (DT), and Logistic Regression (LR), are evaluated to compare predictive performance. Among these models, XGBoost performs best, achieving 99% accuracy, 100% recall, a 99% F1-measure, and 99% precision. Principal Component Analysis (PCA) is then applied to the eight-feature subset for dimensionality reduction, yielding a compact six-attribute feature subset. XGBoost again performs best, achieving accuracy, recall, F1-measure, and precision of 98%, 100%, 98%, and 97%, respectively, on the feature subset derived from the combination of Chi-square and Forward Selection.
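The Chi-square filter step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the feature names, the toy data, and the helper functions `chi_square_score` and `select_top_k` are all hypothetical, and the sketch assumes categorical (here binary) features scored with the plain Pearson chi-square statistic against the class label.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and class labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # joint counts per (feature value, label)
    feature_totals = Counter(feature)         # marginal counts per feature value
    label_totals = Counter(labels)            # marginal counts per label
    score = 0.0
    for f_val, f_count in feature_totals.items():
        for l_val, l_count in label_totals.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * l_count / n
            diff = observed.get((f_val, l_val), 0) - expected
            score += diff * diff / expected
    return score


def select_top_k(columns, labels, k):
    """Rank features by chi-square score and keep the k highest-scoring names."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 8 patients, label 1 = disease present.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "chest_pain": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly aligned with the label
    "high_bp":    [1, 1, 1, 0, 1, 0, 0, 0],  # partially aligned
    "smoker":     [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

selected, scores = select_top_k(columns, labels, k=2)
print(selected)  # ['chest_pain', 'high_bp']
print(scores)    # chest_pain: 8.0, high_bp: 2.0, smoker: 0.0
```

A higher score means the feature's distribution differs more between the two classes, so a filter of this kind keeps the top-ranked features (eight in the study, two in this toy case) before any classifier is trained.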
    Original language: English
    Journal: BMC Cardiovascular Disorders
    Volume: 25
    Issue number: 1
    Early online date: 20 Feb 2025
    Publication status: E-pub ahead of print, 20 Feb 2025

    Keywords

    • Machine learning
    • Cardiovascular diseases
    • Chi-square
    • Principal component analysis
    • K-nearest neighbours
    • Logistic regression
    • Decision trees
    • Random forest
    • Artificial intelligence
    • Feature selection
    • XGBoost
