groupdata2

groupdata2 is my first R package. It is meant to make grouping and splitting of data easy, while offering a large range of methods for grouping in different contexts. It also contains a function for creating balanced folds for cross-validation.

Functions

  • group_factor
  • group
  • splt
  • fold

group_factor()

Returns a factor with group numbers, e.g. (1,1,1,2,2,2,3,3,3).

This can be used to subset, aggregate, group_by, etc.

Create equally sized groups by setting force_equal = TRUE.

Randomize the grouping factor by setting randomize = TRUE.
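A minimal sketch of the options above, assuming the current CRAN signature group_factor(data, n, ...) and the default grouping method. The vector x is made up for illustration.

```r
library(groupdata2)

# Hypothetical input: ten values to be divided into three groups
x <- 1:10

# Default settings: a factor with one group number per element
g <- group_factor(x, n = 3)

# force_equal = TRUE discards excess elements so all groups get the same size
g_equal <- group_factor(x, n = 3, force_equal = TRUE)

# randomize = TRUE shuffles which elements receive which group number
set.seed(1)
g_random <- group_factor(x, n = 3, randomize = TRUE)

# The factor can be used directly for aggregation, e.g. a mean per group
group_means <- tapply(x, g, mean)
```

Because force_equal drops data, it is worth checking length(g_equal) against length(x) before using the result.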

group()

Returns the given data as a data frame with an added grouping factor made with group_factor(). The data frame is grouped by the grouping factor for easy use in dplyr pipelines.
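As a rough illustration, group() can feed straight into a dplyr summary. The data frame and its column names below are invented for the example, and the call assumes the group(data, n, ...) signature of the CRAN version.

```r
library(groupdata2)
library(dplyr)

# Hypothetical data: nine observations with a score each
df <- data.frame(
  observation = 1:9,
  score = c(4, 8, 6, 5, 3, 7, 9, 2, 6)
)

# Add a grouping factor with three groups; the result is already grouped
grouped <- group(df, n = 3)

# Since the data frame is grouped, summarise() works per group
group_summary <- grouped %>%
  summarise(mean_score = mean(score))
```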

splt()

Creates the specified groups with group_factor() and splits the given data by the grouping factor with base::split(). Returns the splits in a list.
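A short sketch of splt() on an invented data frame, assuming it takes the same data and n arguments as group_factor():

```r
library(groupdata2)

# Hypothetical data: nine observations with a score each
df <- data.frame(
  observation = 1:9,
  score = c(4, 8, 6, 5, 3, 7, 9, 2, 6)
)

# Split the data frame into a list of three smaller data frames
parts <- splt(df, n = 3)

# Each list element can then be processed separately, e.g. counting rows
rows_per_part <- sapply(parts, nrow)
```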

fold()

Creates (optionally) balanced folds for use in cross-validation. Balance the folds on one categorical variable and/or make sure that all data points sharing an ID are in the same fold.
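The following sketch shows both options at once. The data set is made up, and the argument names (k, cat_col, id_col) and the ".folds" output column are taken from the CRAN version, so they may differ in other releases.

```r
library(groupdata2)

# Hypothetical data: six participants with three observations each,
# three participants per diagnosis
df <- data.frame(
  participant = factor(rep(1:6, each = 3)),
  diagnosis = factor(rep(c("a", "b"), each = 9)),
  score = c(4, 8, 6, 5, 3, 7, 9, 2, 6, 1, 5, 8, 7, 3, 4, 6, 2, 9)
)

set.seed(1)

# Three folds, balanced on diagnosis, keeping each participant in one fold
folded <- fold(df, k = 3, cat_col = "diagnosis", id_col = "participant")

# Count the number of distinct folds each participant appears in
folds_per_id <- tapply(
  as.character(folded$.folds),
  folded$participant,
  function(f) length(unique(f))
)
```

In a cross-validation loop, one fold at a time would be held out as the test set and the remaining folds used for training.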

Methods

There is a wide range of methods for creating the groups, and more are on their way.

Where?

You can find groupdata2 on CRAN and GitHub (dev. version).
