Project Title: Healthcare Provision for Diabetes Patients Prediction and Classification
Student: Jingru WANG
Course: MSc in Computer Science

Abstract:
Diabetes is currently a major health problem with a high incidence rate worldwide. It can lead to serious outcomes such as death and long-term complications (sequelae), so early detection and prediction are important. In our study, we applied seven machine learning techniques to predict and classify diabetes, that is, to predict whether or not a patient has diabetes. The models used were logistic regression (LR), extreme gradient boosting (XGBoost), an artificial neural network (ANN), quadratic discriminant analysis (QDA), a classification and regression tree (CART), a support vector machine (SVM) and linear regression. We also applied five-fold cross-validation to each of these seven models to reduce overfitting, and selected the best of the five fold results. We then compared the models trained without cross-validation against those trained with five-fold cross-validation. To evaluate model performance, we used four metrics: accuracy, sensitivity, specificity and precision. Because accuracy is very important in the medical domain, we consider the “best” model to be the one with the highest accuracy. The “best” model in our study is the logistic regression model with five-fold cross-validation, which achieves an accuracy of about 72.55%. We also identified the most stable model, meaning the one whose four metrics have the most similar values: the support vector machine with a radial basis function kernel and five-fold cross-validation, with an accuracy of 70.59%.
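As an illustration only, the following is a minimal sketch of how the four evaluation metrics and a five-fold split could be computed. It assumes binary labels with 1 = diabetes (the positive class) and 0 = no diabetes, and uses contiguous, unshuffled folds. All function names (`evaluate`, `k_fold_indices`, `metric_spread`) are hypothetical and are not the author's implementation. `metric_spread` is an assumed way of quantifying "similar values of the four metrics" as the gap between the largest and smallest metric. In practice a library such as scikit-learn (for example `sklearn.model_selection.KFold`) would typically be used instead.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN, assuming label 1 = diabetes (positive class)."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), specificity and precision; 0.0 if a denominator is empty."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


def metric_spread(metrics: Dict[str, float]) -> float:
    """Hypothetical stability measure: max minus min of the metric values (smaller = more stable)."""
    return max(metrics.values()) - min(metrics.values())


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) for k contiguous, unshuffled folds covering range(n)."""
    # The first n % k folds receive one extra sample so every index is used exactly once.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


if __name__ == "__main__":
    m = evaluate([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
    print(m, metric_spread(m))
    for train, test in k_fold_indices(7, k=5):
        print("train:", train, "test:", test)
```

In the toy call above there are 2 true positives, 1 false negative, 2 true negatives and 1 false positive, so all four metrics equal 2/3 and the spread is zero. In a real cross-validation loop, each model would be fitted on `train` and scored with `evaluate` on `test` for each of the five folds.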