Programming

groupdata2 version 1.1.0 released on CRAN

A few days ago, I released a new version of my R package, groupdata2, on CRAN. groupdata2 contains a set of functions for grouping data, such as creating balanced partitions and folds for cross-validation. Version 1.1.0 adds the balance() function, which uses up- and downsampling to balance the categories (e.g. classes) in your dataset. The main difference between existing up-/downsampling tools and balance() is that balance() has methods for dealing with IDs.
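To make the idea of downsampling concrete, here is a minimal base R sketch. It is not the balance() implementation and does not cover its ID handling; the `downsample()` helper and the toy data are hypothetical and only illustrate shrinking every class to the size of the smallest one.

```r
# Minimal sketch of downsampling in base R (NOT groupdata2::balance()).
# The helper name and the toy data are assumptions made for illustration.
set.seed(1)

# Toy data: class "a" has 6 rows, class "b" has 3 rows
data <- data.frame(id = 1:9, class = rep(c("a", "b"), times = c(6, 3)))

downsample <- function(data, cat_col) {
  # Target size is the size of the smallest class
  target <- min(table(data[[cat_col]]))
  # Row indices grouped by class
  rows_by_class <- split(seq_len(nrow(data)), data[[cat_col]])
  # Randomly keep `target` rows from each class
  keep <- unlist(
    lapply(rows_by_class, function(rows) rows[sample.int(length(rows), target)]),
    use.names = FALSE
  )
  data[sort(keep), , drop = FALSE]
}

balanced <- downsample(data, "class")
table(balanced$class)
```

Upsampling would work the other way around, sampling rows with replacement until every class reaches the size of the largest one.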

cvms 0.1.0 released on CRAN

After a fairly long life on GitHub, my R package, cvms, for cross-validating linear and logistic regression, is finally on CRAN! With a few additions in the past months, this is a good time to catch you all up on the included functionality. For examples, check out the readme on GitHub! The main purpose of cvms is to allow researchers to quickly compare their models with cross-validation, with a tidy output containing the relevant metrics.

Running cross_validate from cvms in parallel

The cvms package is useful for cross-validating a list of linear and logistic regression model formulas in R. To speed up the process, I’ve added the option to cross-validate the models in parallel. In this post, I will walk you through a simple example and introduce the combine_predictors() function, which generates model formulas by combining a list of fixed effects. We will be using the simple participant.scores dataset from cvms.
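As a rough illustration of what generating model formulas from a list of fixed effects can look like, here is a base R sketch. It is not the combine_predictors() implementation: the `build_formulas()` helper is hypothetical, the column names are assumed to match participant.scores, and only additive combinations (no interactions) are produced.

```r
# Minimal sketch of building model formulas from fixed effects in base R.
# NOT cvms::combine_predictors(); helper name and column names are assumptions.
build_formulas <- function(dependent, fixed_effects) {
  formulas <- character(0)
  # Go through every subset size, from single predictors to all of them
  for (size in seq_along(fixed_effects)) {
    combos <- combn(fixed_effects, size, simplify = FALSE)
    for (combo in combos) {
      right_side <- paste(combo, collapse = " + ")
      formulas <- c(formulas, paste(dependent, "~", right_side))
    }
  }
  formulas
}

formulas <- build_formulas("score", c("diagnosis", "age", "session"))
print(formulas)
```

With three fixed effects this gives seven formulas, each of which could then be cross-validated and compared.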

Repeated cross-validation in cvms and groupdata2

I have spent the last couple of days adding functionality for performing repeated cross-validation to cvms and groupdata2. In this quick post I will show an example. (Please note: At the moment, you need to use the GitHub version of groupdata2. I hope to update it on CRAN this month.) In cross-validation, we split our training set into a number (often denoted “k”) of groups called folds. We repeatedly train our machine learning model on k-1 folds and test it on the remaining fold, such that each fold serves as the test set once.
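The procedure described above can be sketched in base R. This is not the cvms or groupdata2 API; the helper names, the built-in mtcars data, and the `mpg ~ wt` model are assumptions chosen to keep the example self-contained. Repetition here simply means redoing the random fold assignment several times and averaging the results.

```r
# Minimal sketch of (repeated) k-fold cross-validation in base R.
# NOT the cvms/groupdata2 API; helper names, data and model are assumptions.
set.seed(1)

k <- 5
data <- mtcars

# Randomly assign each of n rows to one of k roughly equal-sized folds
assign_folds <- function(n, k) {
  sample(rep(seq_len(k), length.out = n))
}

# One round of cross-validation: each fold is the test set exactly once
cross_validate_once <- function(data, k) {
  folds <- assign_folds(nrow(data), k)
  squared_errors <- numeric(0)
  for (fold in seq_len(k)) {
    train <- data[folds != fold, ]
    test <- data[folds == fold, ]
    model <- lm(mpg ~ wt, data = train)
    predictions <- predict(model, newdata = test)
    squared_errors <- c(squared_errors, (test$mpg - predictions)^2)
  }
  # Root mean squared error over all held-out predictions
  sqrt(mean(squared_errors))
}

# Repeated cross-validation: new fold assignment in each repetition
rmse_per_repetition <- replicate(3, cross_validate_once(data, k))
mean(rmse_per_repetition)
```

Averaging over repetitions makes the estimate less dependent on one particular random split.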

groupdata2 v1.0.0 release

After having spent a lot of time developing groupdata2 in the spring, adding a lot of new features, I’ve finally found the time to submit it to CRAN. For those new to groupdata2, it’s a collection of tools for creating groups from your data (hence the name!). From basic greedy groups, to balanced folds for cross-validation, to automatically finding group starts (whenever an element in a vector differs from the previous element), it has a lot to offer.

splitChunk - RStudio addin for splitting code chunks in R Markdown

When working with R Markdown, I usually use the key command cmd+alt+i to insert new code chunks, i.e. “```{r}\n\n```”. Often I do multiple things in one chunk and then want to split the chunk in two and write some text in between. To do this, I have created an addin for RStudio that inserts “```\n\n```{r}”. I have set this up with the key command cmd+alt+shift+i, as it is kind of a “shifted” version of inserting a new chunk.

Untoggle inline output in RMarkdown in RStudio

A lot of my classmates found themselves frustrated when updating to the newer versions of RStudio, because the output started showing inline, below the code, instead of in the console or Plots window as usual. Personally I like being able to see the output and the code at the same time, so I found out how to change back. It was pretty easy: Go to RStudio » Preferences » R Markdown. Now untoggle “Show output inline for all R Markdown documents”. Restart RStudio and everything should work as before.