groupdata2 is my first R package. It aims to make grouping and splitting data easy, while offering a wide range of grouping methods for different contexts. It also includes a function for creating balanced folds for cross-validation.


  • group_factor
  • group
  • splt
  • fold


group_factor() returns a factor with group numbers, e.g. (1,1,1,2,2,2,3,3,3).

The factor can then be used to subset, aggregate, group_by, etc.

Create equally sized groups by setting force_equal = TRUE.

Randomize the grouping factor by setting randomize = TRUE.
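As a rough illustration of the grouping factor described above, here is a minimal sketch. The data frame and its score column are made up for the example, and the n argument (number of groups) is assumed to work as in the released package.

```r
library(groupdata2)

# Hypothetical example data: nine rows with one numeric column
df <- data.frame(score = c(10, 12, 9, 14, 11, 13, 8, 15, 10))

# Ask for three groups over the nine rows
groups <- group_factor(df, n = 3)

# Same request, but with the group numbers shuffled across rows
set.seed(1)
shuffled <- group_factor(df, n = 3, randomize = TRUE)

# Ask for equally sized groups when the rows do not divide evenly
df10 <- data.frame(score = 1:10)
equal <- group_factor(df10, n = 3, force_equal = TRUE)

# Use the factor to aggregate with base R
group_means <- aggregate(df$score, by = list(group = groups), FUN = mean)
```

The factor is kept separate from the data here, which is handy when the same grouping should be applied to several objects.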


group() returns the given data as a data frame with an added grouping factor made with group_factor(). The data frame is grouped by the grouping factor for easy use in dplyr pipelines.
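A small sketch of how the grouped data frame might be used in a dplyr pipeline. The data is invented for the example, and the name of the added column (.groups) is assumed from the package's default.

```r
library(groupdata2)
library(dplyr)

# Hypothetical example data
df <- data.frame(score = c(10, 12, 9, 14, 11, 13, 8, 15, 10))

# Add the grouping factor and get back a grouped data frame
grouped <- group(df, n = 3)

# Because the result is already grouped, summarise() works per group
means <- grouped %>%
  summarise(mean_score = mean(score))
```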


splt() creates the specified groups with group_factor() and splits the given data by the grouping factor with base::split(). It returns the splits in a list.
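For splitting, a sketch along the same lines could look like this. The data is again made up, and n is assumed to mean the number of groups.

```r
library(groupdata2)

# Hypothetical example data
df <- data.frame(score = c(10, 12, 9, 14, 11, 13, 8, 15, 10))

# Split the data into three parts, returned as a list of data frames
parts <- splt(df, n = 3)

# Work on each part separately, e.g. count the rows in each
rows_per_part <- sapply(parts, nrow)
```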


fold() creates (optionally) balanced folds for use in cross-validation. Balance the folds on one categorical variable and/or make sure that all data points sharing an ID are in the same fold.
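The following sketch shows one way fold creation might look. The participant and diagnosis columns are hypothetical, and the argument names (k, cat_col, id_col) and the .folds column are assumed from the released package rather than stated in this text.

```r
library(groupdata2)

# Hypothetical example data: six participants with three rows each,
# half with diagnosis "a" and half with diagnosis "b"
df <- data.frame(
  participant = factor(rep(1:6, each = 3)),
  diagnosis = factor(rep(c("a", "b", "a", "b", "a", "b"), each = 3)),
  score = c(10, 12, 9, 14, 11, 13, 8, 15, 10, 7, 12, 11, 9, 13, 10, 12, 8, 14)
)

set.seed(1)

# Three folds, balanced on diagnosis, keeping each participant in one fold
folded <- fold(df, k = 3, cat_col = "diagnosis", id_col = "participant")

# Check how participants are spread over the folds
fold_table <- table(folded$participant, folded$.folds)
```

Keeping all rows from one ID in the same fold avoids testing a model on a participant it has already seen during training.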


There is a wide range of methods for creating the groups, and more are on their way.


You can find groupdata2 on CRAN and GitHub (dev. version).

Skills: Programming, R