Machine Learning Algorithm-Based Prediction of Diabetes Among Female Population Using PIMA Dataset

Afshan Ahmed, Jalaluddin Khan, Mohd Arsalan, Kahksha Ahmed, Abdelaaty A. Shahat, Abdulsalam Alhalmi, Sameena Naaz, Arkalgud Ramaprasad (Editor), Francesco Faita (Editor)

Research output: Contribution to journal › Article › peer-review

Abstract

Background: Diabetes is a metabolic disorder characterized by elevated blood sugar levels. Early detection can help individuals manage the disorder effectively and delay its progression. Machine learning (ML) methods can forecast the progression of various medical conditions and support their diagnosis with improved accuracy. Although they cannot replace physicians in predicting and diagnosing disease, they can be of great help in identifying hidden patterns that link clinical results to disease outcomes.

Methods: We retrieved the PIMA dataset from the Kaggle repository and performed exploratory data analysis (EDA) on it using principal component analysis (PCA), a heatmap, and scatter plots, which visually reveal the relationships between the features in the dataset. Four ML algorithms were implemented in Rattle using Python to predict diabetes among the female population: Random Forest (RF), Decision Tree (DT), Naïve Bayes (NB), and Logistic Regression (LR).

Results: RF outperformed the other models (DT, NB, and LR), achieving an accuracy of 80%, a precision of 82%, an error rate of 20%, and a sensitivity of 88%.

Conclusions: Diabetes is a common problem worldwide, and ML-based prediction models can help predict diabetes early, before the condition worsens.
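The abstract reports accuracy, precision, error rate, and sensitivity for the RF model. As a minimal sketch, assuming the standard binary confusion-matrix definitions with the diabetic class (label 1) treated as positive (the abstract does not state which class was positive or how Rattle computed the metrics), these four figures can be derived as shown below. The names `ConfusionCounts`, `confusion_counts`, and `classification_metrics`, and the counts in the usage example, are hypothetical illustrations, not the study's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive = diabetic is an assumption)."""
    tp: int  # true positives: diabetic, predicted diabetic
    fp: int  # false positives: non-diabetic, predicted diabetic
    tn: int  # true negatives: non-diabetic, predicted non-diabetic
    fn: int  # false negatives: diabetic, predicted non-diabetic


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> ConfusionCounts:
    """Tally TP/FP/TN/FN from paired true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(num: int, den: int) -> float:
    # Return 0.0 for an empty denominator instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Accuracy, precision, sensitivity (recall), and error rate = 1 - accuracy."""
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = _ratio(c.tp + c.tn, total)
    return {
        "accuracy": accuracy,
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # true-positive rate
        "error_rate": (1.0 - accuracy) if total else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT the study's confusion matrix.
    counts = ConfusionCounts(tp=45, fp=15, tn=75, fn=15)
    for name, value in classification_metrics(counts).items():
        print(f"{name}: {value:.2f}")
```

With these illustrative counts the script prints an accuracy of 0.80 and an error rate of 0.20, which shows why the abstract's 80% accuracy and 20% error rate are two views of the same quantity. Precision and sensitivity, by contrast, depend on which class is treated as positive, so they change if the labels are swapped.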
Original language: English
Pages (from-to): 37
Number of pages: 1
Journal: Healthcare (Basel, Switzerland)
Volume: 13
Issue number: 1
Early online date: 29 Dec 2024
Publication status: E-pub ahead of print, 29 Dec 2024

Keywords

  • machine learning
  • logistic regression
  • diabetes
  • Naïve Bayes
  • decision tree
  • random forest