cross_validate() in R

While learning cross-validation in R as part of my degree in Cognitive Science, I decided to write a cross-validation function that could handle most of the tasks we faced. It seemed a good way to learn more about the subject and to practice my R skills.

So far, the function can be used with Gaussian and binomial models: lm(), lmer(), glm() and glmer().

cross_validate() creates balanced folds (so far, balanced on a single variable).
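As a rough illustration of what "balanced on one variable" can mean, here is a minimal base-R sketch. It assumes that balancing means spreading each level of one grouping variable evenly across the folds. The helper name `balanced_folds()` and the example data are hypothetical; this is not the code in the repository.

```r
# Hypothetical helper (not from the repository): assign each row to one of
# k folds so that every level of a single grouping variable is spread as
# evenly as possible across the folds.
balanced_folds <- function(group, k) {
  group <- as.factor(group)
  folds <- integer(length(group))
  for (lvl in levels(group)) {
    idx <- which(group == lvl)
    # Shuffle this level's rows, then deal out fold numbers 1..k in turn
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  factor(folds, levels = seq_len(k))
}

# Usage with a made-up grouping variable: 20 controls and 10 patients
set.seed(1)
diagnosis <- rep(c("control", "patient"), times = c(20, 10))
folds <- balanced_folds(diagnosis, k = 5)
table(folds, diagnosis)  # every fold gets 4 controls and 2 patients
```

Dealing fold numbers out level by level keeps the count of each level within one row of equal across folds, whatever the level sizes are.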

For every fold it:

  • creates a training set and a test set
  • trains the model on the training set
  • predicts the dependent variable on the test set
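The three per-fold steps can be sketched in base R with lm() and simulated data. This is a minimal illustration under stated assumptions, not the repository's implementation: fold ids are assigned at random rather than balanced, and the data and variable names are made up.

```r
# Minimal k-fold loop for a Gaussian lm() on simulated data (illustrative only)
set.seed(2)
k <- 5
dat <- data.frame(x = rnorm(100))
dat$y <- 2 * dat$x + rnorm(100)
fold_id <- sample(rep_len(seq_len(k), nrow(dat)))  # random, not balanced

rmse_per_fold <- numeric(k)
for (i in seq_len(k)) {
  train <- dat[fold_id != i, ]           # training set: all other folds
  test  <- dat[fold_id == i, ]           # test set: the held-out fold
  fit   <- lm(y ~ x, data = train)       # train on the training set
  pred  <- predict(fit, newdata = test)  # predict y for the test set
  rmse_per_fold[i] <- sqrt(mean((test$y - pred)^2))
}
mean(rmse_per_fold)  # average test RMSE across the folds
```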

With Gaussian models (lm() and lmer()), it returns the average values of RMSE, r2m and r2c (marginal and conditional R²), AIC, and BIC across the folds.
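A minimal sketch of collecting per-fold metrics and averaging them, again with lm() only and a hypothetical helper `gaussian_fold_metrics()`. It covers RMSE, AIC and BIC with base R; r2m and r2c for mixed models need an extra package (MuMIn's r.squaredGLMM() is one common source), so they are left out here. Whether the repository computes AIC and BIC on the training-fold model, as this sketch does, is an assumption.

```r
# Hypothetical helper (not from the repository): metrics for one fold of a
# Gaussian lm(); AIC and BIC come from the model fitted on the training set.
gaussian_fold_metrics <- function(fit, test, outcome) {
  pred <- predict(fit, newdata = test)
  c(RMSE = sqrt(mean((test[[outcome]] - pred)^2)),
    AIC  = AIC(fit),
    BIC  = BIC(fit))
}

set.seed(3)
dat <- data.frame(x = rnorm(60))
dat$y <- 1 + 0.5 * dat$x + rnorm(60, sd = 0.5)
fold_id <- sample(rep_len(1:3, nrow(dat)))

# One row per fold, one column per metric
per_fold <- t(sapply(1:3, function(i) {
  fit <- lm(y ~ x, data = dat[fold_id != i, ])
  gaussian_fold_metrics(fit, dat[fold_id == i, ], "y")
}))
colMeans(per_fold)  # averaged RMSE, AIC and BIC
```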

With binomial models (glm() and glmer()), it uses the predictions to build a confusion matrix and a ROC curve, and returns the associated metrics, such as the area under the curve (AUC), sensitivity, and specificity.
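For a feel of how these metrics relate to the predictions, here is a base-R sketch. The helper `binomial_metrics()`, the 0.5 cutoff, and treating "1" as the positive class are assumptions for illustration. AUC is computed with the rank-sum (Mann-Whitney) formula instead of by tracing the ROC curve; the two give the same value.

```r
# Hypothetical helper (not from the repository): binary classification metrics
# from observed 0/1 outcomes and predicted probabilities of class "1".
binomial_metrics <- function(observed, prob, cutoff = 0.5) {
  predicted <- as.integer(prob > cutoff)
  cm <- table(Predicted = factor(predicted, levels = 0:1),
              Observed  = factor(observed,  levels = 0:1))
  sensitivity <- cm["1", "1"] / sum(cm[, "1"])  # true positive rate
  specificity <- cm["0", "0"] / sum(cm[, "0"])  # true negative rate
  # AUC: probability that a random positive scores higher than a random negative
  r <- rank(prob)
  n_pos <- sum(observed == 1)
  n_neg <- sum(observed == 0)
  auc <- (sum(r[observed == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
  list(confusion = cm, AUC = auc,
       Sensitivity = sensitivity, Specificity = specificity)
}

observed <- c(0, 0, 1, 1, 0, 1)
prob     <- c(0.1, 0.4, 0.35, 0.8, 0.2, 0.9)
m <- binomial_metrics(observed, prob)
m$AUC          # 8/9: 8 of the 9 positive/negative pairs are ranked correctly
m$Sensitivity  # 2/3
m$Specificity  # 1
```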

For both model types, it counts convergence warnings, so you can adjust the model or choose a different one.
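One way to count warnings while still keeping the fitted model is withCallingHandlers(). The sketch below is an assumption about the approach, not the repository's code, and the helper name `fit_counting_warnings()` is hypothetical. It counts every warning; keeping only convergence warnings would require filtering on the message text.

```r
# Hypothetical helper (not from the repository): evaluate a model-fitting
# expression, record every warning it raises, and return the fitted model.
fit_counting_warnings <- function(expr) {
  n_warnings <- 0L
  messages <- character(0)
  fit <- withCallingHandlers(
    expr,  # lazily evaluated here, inside the handler
    warning = function(w) {
      n_warnings <<- n_warnings + 1L
      messages <<- c(messages, conditionMessage(w))
      invokeRestart("muffleWarning")  # record the warning and carry on
    }
  )
  list(fit = fit, n_warnings = n_warnings, messages = messages)
}

# Perfectly separated data make glm() warn about fitted probabilities of 0 or 1
d <- data.frame(x = 1:10, y = rep(0:1, each = 5))
res <- fit_counting_warnings(glm(y ~ x, family = binomial, data = d))
res$n_warnings
res$messages
```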

cross_validate_list() cross-validates multiple models at once. It returns a data frame with the values mentioned above for each model, which makes model comparison easy.
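The idea of comparing several models on the same folds can be sketched as follows. This is a simplified illustration, not the repository's cross_validate_list(). The helper `cv_rmse()`, the simulated data and the model formulas are made up, and only average RMSE is collected.

```r
# Hypothetical helper (not from the repository): average test RMSE of one
# lm() formula over the given folds.
cv_rmse <- function(formula, dat, fold_id) {
  outcome <- all.vars(formula)[1]
  rmse <- sapply(sort(unique(fold_id)), function(i) {
    fit  <- lm(formula, data = dat[fold_id != i, ])
    test <- dat[fold_id == i, ]
    sqrt(mean((test[[outcome]] - predict(fit, newdata = test))^2))
  })
  mean(rmse)
}

set.seed(4)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 3 * dat$x1 + rnorm(100)
fold_id <- sample(rep_len(1:5, nrow(dat)))  # same folds for every model

models <- c("y ~ x2", "y ~ x1", "y ~ x1 + x2")
results <- data.frame(
  model = models,
  RMSE  = sapply(models, function(f) cv_rmse(as.formula(f), dat, fold_id),
                 USE.NAMES = FALSE)
)
results[order(results$RMSE), ]  # best model (lowest RMSE) first
```

Reusing the same fold assignment for every model keeps the comparison fair, since differences in RMSE then come from the models rather than from how the data happened to be split.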

Find the latest versions of the code and the manual here:

https://github.com/LudvigOlsen/R-cross_validate

Date: October 2016
Skills: Programming, R