Project Title: Healthcare Provision for Diabetes Patients Prediction and Classification 
Student: Jingru WANG
Course: MSc in Computer Science  

Abstract:
Diabetes is currently a major health problem, with a high incidence rate all over the
world. It can lead to serious outcomes, including death and long-term complications
(sequelae), so early detection and prediction are important. In our study, we applied 7
machine learning techniques to predict and classify diabetes, i.e. to predict whether or
not a patient has diabetes.
We used logistic regression (LR), extreme gradient boosting (XGBoost), an artificial
neural network (ANN), quadratic discriminant analysis (QDA), a classification and
regression tree (CART), a support vector machine (SVM) and linear regression for
prediction and classification. We also applied 5-fold cross-validation to each of these 7
models to reduce the risk of overfitting, and selected the best of the 5 fold results. We
then compared the models trained without cross-validation with those trained with 5-fold
cross-validation. We evaluated the performance of the 7 models using four metrics:
accuracy, sensitivity, specificity and precision. Because accuracy is very important in
the medical domain, we define the “best” model as the one with the highest accuracy.
The “best” model in our study is logistic regression with 5-fold cross-validation, which
achieved an accuracy of about 72.55%. We also identified the most stable model, i.e. the
one whose four metrics were closest in value: the support vector machine with a radial
basis function (RBF) kernel and 5-fold cross-validation, with an accuracy of 70.59%.
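The four evaluation metrics named in the abstract all come from the binary confusion matrix. As a minimal sketch (assuming label 1 marks a diabetic patient and 0 a non-diabetic one; the function name `binary_metrics` and the toy labels are hypothetical illustrations, not the study's code or data), they can be computed as follows:

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and precision for 0/1 labels.

    Assumes 1 is the positive (diabetic) class and 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num, den):
        # Return NaN instead of dividing by zero (e.g. no predicted positives).
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
    }


if __name__ == "__main__":
    # Hypothetical toy labels: 3 TP, 2 TN, 2 FP, 1 FN.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
    print(binary_metrics(y_true, y_pred))
```

On these toy labels the metrics differ (accuracy 0.625, sensitivity 0.75, specificity 0.5, precision 0.6), which shows why a model with high accuracy is not necessarily the one whose four metrics are most similar.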
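The 5-fold cross-validation described in the abstract splits the data into five parts, trains on four and tests on the remaining one, rotating until every part has served once as the test set. The sketch below illustrates only that mechanism in plain Python; the helper names `k_fold_indices` and `cross_validate`, the fixed shuffle seed, and the majority-class stand-in classifier are all hypothetical and are not one of the study's seven models. It prints both the mean and the best per-fold accuracy, since the abstract reports the best of the five results.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return one accuracy per fold; each fold is used once as the test set."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Training set = all samples from the other k-1 folds.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        preds = predict(model, [X[j] for j in test_idx])
        correct = sum(p == y[j] for p, j in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical placeholder classifier: always predicts the training majority class.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, X_test):
    return [model] * len(X_test)


if __name__ == "__main__":
    X = [[float(i)] for i in range(20)]
    y = [1 if i % 3 == 0 else 0 for i in range(20)]
    scores = cross_validate(X, y, fit_majority, predict_majority, k=5)
    print("per-fold accuracy:", scores)
    print("mean:", sum(scores) / len(scores), "best:", max(scores))
```

Passing `fit` and `predict` as functions keeps the splitting logic independent of the model, so any of the classifiers listed in the abstract could in principle be plugged in the same way.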