Comparative Analysis of Resampling Techniques and Machine Learning Classifiers in Multiclass Classification of Diabetes Mellitus

Afshan Hashmi, Md Tabrez Nafis, Sameena Naaz, Imran Hussain

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This study examines how different resampling techniques, paired with
different machine learning classifiers, affect the accuracy of
multi-class diabetes classification on an imbalanced dataset. The
Mendeley diabetes dataset is a multi-class dataset covering patients
with no diabetes, pre-diabetes, and diabetes. It is imbalanced, with
the diabetic class in the majority. The study compares oversampling,
undersampling, and hybrid techniques across machine learning
algorithms for classifying a person as diabetic, pre-diabetic, or
non-diabetic. Eight machine learning algorithms and ten resampling
techniques were applied to the dataset. The results indicate that
XGBoost combined with K-Means SMOTE and with SMOTE-N attains the
highest accuracy, 99.2%. They also suggest that oversampling
techniques perform better than undersampling and hybrid techniques.
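
To make the oversampling idea concrete, the following is a minimal, standard-library sketch of SMOTE-style interpolation on a three-class toy set. It is not the authors' code and not any of the ten techniques as evaluated in the paper: the function name `smote_like_oversample`, the parameter `k`, the two numeric features, and every data value are hypothetical and chosen only for illustration. Only the class labels (non-diabetic, pre-diabetic, diabetic) mirror the abstract.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(X, y, k=3, seed=0):
    """Grow every class to the majority-class size with synthetic rows.

    Each synthetic row lies on the line segment between a randomly chosen
    sample and one of its k nearest neighbours from the same class.
    Simplified illustration only (hypothetical helper, numeric features).
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(row) for row in X]  # originals are kept unchanged
    y_out = list(y)
    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            base = rng.choice(members)
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: sq_dist(base, m))[:k]
            if neighbours:
                mate = rng.choice(neighbours)
                gap = rng.random()  # interpolation weight in [0, 1)
                synthetic = [b + gap * (m - b) for b, m in zip(base, mate)]
            else:
                # A class with a single sample cannot be interpolated;
                # fall back to duplicating that sample.
                synthetic = list(base)
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


# Hypothetical toy data: two made-up numeric features per patient.
# "Y" = diabetic (majority), "N" = non-diabetic, "P" = pre-diabetic.
X = [
    [7.1, 30.0], [8.0, 31.5], [9.2, 29.0], [7.8, 33.0], [8.5, 28.4], [9.9, 32.1],
    [4.8, 22.0], [5.0, 23.5], [4.6, 21.0],
    [5.9, 25.0], [6.1, 26.2],
]
y = ["Y"] * 6 + ["N"] * 3 + ["P"] * 2

X_bal, y_bal = smote_like_oversample(X, y, k=2, seed=1)
print(Counter(y), "->", Counter(y_bal))
```

In a real comparison, resampling of this kind would normally be applied to the training split only, so that the test set keeps the original class distribution.
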
Original language: English
Title of host publication: International Conference on Self Sustainable Artificial Intelligence Systems (ICSSAS 2023)
Publisher: IEEE Computer Society
Publication status: Published - 6 Dec 2023
