cvms | Cross-Validation for Model Selection | R

While learning cross-validation in R during my degree in Cognitive Science, I decided to write a cross-validation function that could handle most of the tasks we faced. It seemed a good way to learn more about the subject and practice my R skills. It has since turned into a package that makes it quick and easy to compare Gaussian and binomial regression models fitted with lm(), lmer(), glm() and glmer().

As input, cross_validate takes a data frame in which one column is a grouping factor indicating which fold each row belongs to. Creating that column is a one-liner with groupdata2::fold.
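To make the idea of a fold column concrete, here is a minimal base R sketch that builds one by hand. The toy data, the column names and the round-robin assignment are all my own assumptions for illustration; this is not how groupdata2::fold works internally, and it makes no attempt to balance the folds on a categorical column.

```r
# Hypothetical toy data: 8 participants with 3 rows each, 2 diagnoses
set.seed(1)
data <- data.frame(
  participant = factor(rep(1:8, each = 3)),
  diagnosis   = factor(rep(c("a", "b"), each = 12)),
  score       = rnorm(24, mean = 50, sd = 10)
)

k <- 4

# Assign whole participants to folds (round-robin), so that
# no participant ends up in more than one fold
ids <- as.character(unique(data$participant))
fold_of_id <- setNames(rep_len(seq_len(k), length(ids)), ids)

# The grouping factor: one fold label per row
data$.folds <- factor(fold_of_id[as.character(data$participant)])

# Number of rows in each fold
table(data$.folds)
```

The only thing a cross-validation function needs from this step is the factor column itself, so any method that produces one (by hand or with a package) should do.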

It is possible to cross-validate multiple models at once, yielding an output (tibble) that makes it easy to compare (and report) the models, while giving access to the details of each model.
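As a rough illustration of what comparing several models across folds involves, the following base R sketch loops over model formulas and folds by hand. It is not the cvms implementation: the helper cv_rmse, the toy data and the choice of RMSE as the only metric are assumptions made for this example, and it only handles lm() models.

```r
# Hypothetical toy data with a ready-made fold column
set.seed(1)
data <- data.frame(
  participant = factor(rep(1:8, each = 3)),
  diagnosis   = factor(rep(c("a", "b"), each = 12)),
  age         = rep(c(20, 25, 30, 35, 22, 27, 32, 37), each = 3),
  score       = rnorm(24, mean = 50, sd = 10)
)
# Participants i and i + 4 share a fold, so each fold holds both diagnoses
data$.folds <- factor(rep(rep(1:4, each = 3), times = 2))

# Candidate models, given as formula strings
models <- c("score ~ diagnosis", "score ~ diagnosis + age")

# Hypothetical helper: average out-of-fold RMSE for one lm() model
cv_rmse <- function(model, data, folds_col = ".folds") {
  form    <- as.formula(model)
  outcome <- all.vars(form)[1]          # name of the dependent variable
  folds   <- data[[folds_col]]
  per_fold <- sapply(levels(folds), function(f) {
    train <- data[folds != f, ]         # fit on all other folds
    test  <- data[folds == f, ]         # evaluate on the held-out fold
    fit   <- lm(form, data = train)
    sqrt(mean((test[[outcome]] - predict(fit, newdata = test))^2))
  })
  mean(per_fold)
}

# One row per model, ready for side-by-side comparison
results <- data.frame(
  model = models,
  RMSE  = unname(sapply(models, cv_rmse, data = data))
)
print(results)
```

Collecting one row per model is what makes the comparison easy to read and report; the package output follows the same one-row-per-model idea.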

See examples of code and output on the GitHub page:

https://github.com/LudvigOlsen/cvms

Here’s a quick code example that first folds the data (balanced so that there’s a similar number of each diagnosis in every fold, and so that each participant appears in only one fold), then cross-validates a simple Gaussian regression model.

 

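library(cvms)        # cross_validate()
library(groupdata2)  # fold()
library(dplyr)       # provides the %>% pipe

# `data` is your own data frame with score, diagnosis and participant columns
cv <- fold(data,
           k = 4,
           cat_col = 'diagnosis',
           id_col = 'participant') %>%
  cross_validate("score~diagnosis",
                 folds_col = '.folds',
                 family = 'gaussian',
                 REML = FALSE)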

 

Date: October 2016
Skills: Programming, R