Abstract
Diabetes mellitus is a chronic illness that affects millions of people worldwide. Effective diabetes care and the avoidance of complications depend on early prediction and diagnosis. In this study, we present a thorough analysis of diabetes prediction using three distinct datasets: the PIMA India dataset, the NHANES dataset, and Mendeley’s diabetes dataset. Using Lazy Predict, we efficiently evaluate a wide range of classifiers on each dataset, which provides valuable insight into model performance. The top-performing model on each dataset is selected as the best individual model. Furthermore, ensembles are created by combining the predictions of the top ten models, both without and with resampling techniques. Random forest achieved the highest accuracy on the PIMA dataset (79%), XGB achieved the highest accuracy on Mendeley’s dataset (99%), and the dummy classifier attained the highest accuracy on the NHANES dataset (88%). The ensembles built without resampling consistently outperformed their resampled counterparts: the ensemble without oversampling exhibited the highest accuracy overall, followed by the ensemble with oversampling. This surprising result challenges the common notion that resampling always leads to improved performance.
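The two ensemble variants described above can be illustrated with a minimal, library-free sketch. It is not the study's implementation: the abstract does not state how the top-ten predictions are combined or which resampling method is used, so plurality voting and random oversampling are assumptions here, and the helper names `majority_vote`, `random_oversample`, and `accuracy` are hypothetical.

```python
import random
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    Assumption: plurality voting; the abstract does not name the
    combination rule used in the study.
    """
    combined = []
    # zip(*predictions) yields, for each sample, the votes of all models.
    for sample_votes in zip(*predictions):
        combined.append(Counter(sample_votes).most_common(1)[0][0])
    return combined


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until classes are balanced.

    Assumption: plain random oversampling stands in for whatever
    resampling technique the study applied.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy predictions from three hypothetical models on three samples.
    votes = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
    print(majority_vote(votes))

    # Toy imbalanced training set: three rows of class 0, one of class 1.
    rows, labels = random_oversample([[1], [2], [3], [4]], [0, 0, 0, 1])
    print(Counter(labels))
```

In a comparison like the one reported here, the base models would be trained once on the original training data and once on the output of `random_oversample`, with `majority_vote` and `accuracy` applied to each set of test predictions. Only the training split should be resampled; the test split keeps its original class balance.
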
Original language | English |
---|---|
Title of host publication | Third International Conference on Computing and Communication Networks. ICCCN 2023 |
Publisher | Springer Nature |
Publication status | Published - 21 Jul 2024 |