Title: | Tabulate and Summarise Categorical Data |
---|---|
Description: | Functions for tabulating and summarising categorical variables. Most functions are designed to work with dataframes, and use the 'tidyverse' idiom of taking the dataframe as the first argument so they work within pipelines. Equivalent functions that operate directly on vectors are also provided where it makes sense. This package aims to make exploratory data analysis involving categorical variables quicker, simpler and more robust. |
Authors: | Oliver Hawkins [aut, cre] |
Maintainer: | Oliver Hawkins <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.18.0 |
Built: | 2025-03-13 03:30:35 UTC |
Source: | https://github.com/olihawkins/tabbycat |
This function crosstabulates the frequencies of one categorical variable
within the groups of another. The results are sorted on the values of the
variable whose distribution is shown in each column, i.e. the variable
specified with row_cat. If this variable is a character vector it
will be sorted alphabetically. If it is a factor it will be sorted in the
order of its levels.
cat_compare( data, row_cat, col_cat, na.rm.row = FALSE, na.rm.col = FALSE, na.rm = NULL, only = "", clean_names = getOption("tabbycat.clean_names"), na_label = getOption("tabbycat.na_label") )
data |
A dataframe containing the two variables of interest. |
row_cat |
The column name of a categorical variable whose distribution
will be calculated for each group in |
col_cat |
The column name of a categorical variable which will be
split into groups and the distribution of |
na.rm.row |
A boolean indicating whether to exclude NAs from the row results. The default is FALSE. |
na.rm.col |
A boolean indicating whether to exclude NAs from the column results. The default is FALSE. |
na.rm |
A boolean indicating whether to exclude NAs from both row and
column results. This argument is provided as a convenience. It allows you
to set |
only |
A string indicating that only one set of frequency columns
should be returned in the results. If |
clean_names |
A boolean indicating whether the column names of the
results tibble should be cleaned, so that any column names produced from
data are converted to snake_case. The default is TRUE, but this can be
changed with |
na_label |
A string indicating the label to use for the columns that contain data for missing values. The default value is "na", but use this argument to set a different value if the default value collides with data in your dataset. |
A tibble showing the distribution of row_cat within each group in col_cat.
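As an illustration of the call described above, here is a minimal sketch. The `survey` data frame and its column names are hypothetical, and the column names are passed as strings on the assumption that this is what "the column name of a categorical variable" means. The exact names of the output columns are not asserted here.

```r
library(tabbycat)

# Hypothetical data: one row per survey respondent.
survey <- data.frame(
  region = c("north", "north", "south", "south", "south", "east"),
  answer = c("yes", "no", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Distribution of `answer` (one row per value) within each `region` group.
compared <- cat_compare(survey, "answer", "region")
print(compared)

# The same call with NAs dropped from both rows and columns,
# using the na.rm convenience argument from the usage above.
compared_no_na <- cat_compare(survey, "answer", "region", na.rm = TRUE)
print(compared_no_na)
```

Because the dataframe is the first argument, the same call should also slot into a pipeline, for example `survey |> cat_compare("answer", "region")`.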
This function shows the distribution of values within a given categorical
variable for one group within another categorical variable, and compares it
with the distribution among all observations not in that group. Its purpose
is to let you see quickly whether the distribution within that group differs
from the distribution for the rest of the observations. The results are
sorted in descending order of frequency for the named group, i.e. the group
named in col_group.
cat_contrast( data, row_cat, col_cat, col_group, na.rm.row = FALSE, na.rm.col = FALSE, na.rm = NULL, only = "", clean_names = getOption("tabbycat.clean_names"), na_label = getOption("tabbycat.na_label"), other_label = getOption("tabbycat.other_label") )
data |
A dataframe containing the two variables of interest. |
row_cat |
The column name of a categorical variable whose distribution
should be calculated for each exclusive group in |
col_cat |
The column name of a categorical variable that will be split into two exclusive groups, one containing observations with a particular value of that variable, and another containing all other observations. |
col_group |
The name of the group within |
na.rm.row |
A boolean indicating whether to exclude NAs from the row results. The default is FALSE. |
na.rm.col |
A boolean indicating whether to exclude NAs from the column results. The default is FALSE. |
na.rm |
A boolean indicating whether to exclude NAs from both row and
column results. This argument is provided as a convenience. It allows you
to set |
only |
A string indicating that only one set of frequency columns
should be returned in the results. If |
clean_names |
A boolean indicating whether the column names of the
results tibble should be cleaned, so that any column names produced from
data are converted to snake_case. The default is TRUE, but this can be
changed with |
na_label |
A string indicating the label to use for the columns that contain data for missing values. The default value is "na", but use this argument to set a different value if the default value collides with data in your dataset. |
other_label |
A string indicating the label to use for the columns that contain data for observations not in the named group. The default value is "other", but use this argument to set a different value if the default value collides with data in your dataset. |
A tibble showing the distribution of row_cat within each of
the two exclusive groups in col_cat.
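A short sketch of contrasting one group with everything else may help here. The `survey` data and the choice of "south" as the named group are hypothetical; the arguments follow the usage shown above.

```r
library(tabbycat)

# Hypothetical data: one row per survey respondent.
survey <- data.frame(
  region = c("north", "north", "south", "south", "south", "east"),
  answer = c("yes", "no", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Distribution of `answer` for region == "south" versus all other regions.
contrasted <- cat_contrast(survey, "answer", "region", "south")
print(contrasted)

# If a value in the data collides with the default label for the
# remaining observations, other_label lets you pick a different one.
relabelled <- cat_contrast(
  survey, "answer", "region", "south",
  other_label = "rest"
)
print(relabelled)
```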
This function differs from cat_vcount in that it operates on columns
in dataframes rather than directly on vectors, which means it is more useful
in pipelines but handles a narrower range of inputs. The results are sorted
in descending order of frequency.
cat_count( data, cat, na.rm = FALSE, only = "", clean_names = getOption("tabbycat.clean_names") )
data |
A dataframe containing a categorical vector for which frequencies will be calculated. |
cat |
The column name of the categorical variable for which frequencies will be calculated. |
na.rm |
A boolean indicating whether to exclude NAs from the results. The default is FALSE. |
only |
A string indicating that only one of the frequency columns
should be returned in the results. If |
clean_names |
A boolean indicating whether the column names of the
results tibble should be cleaned, so that any column names produced from
data are converted to snake_case. The default is TRUE, but this can be
changed with |
A tibble showing the frequency of each value in cat.
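For illustration, a minimal sketch of counting a column in a dataframe. The `survey` data is hypothetical, and the sketch relies only on what is stated above: the result is a tibble of frequencies sorted in descending order.

```r
library(tabbycat)

# Hypothetical data: "south" occurs three times, "north" twice, "east" once.
survey <- data.frame(
  region = c("north", "north", "south", "south", "south", "east"),
  stringsAsFactors = FALSE
)

# Frequency of each value in the `region` column, most frequent first.
counts <- cat_count(survey, "region")
print(counts)
```

Since the results are sorted in descending order of frequency, the row for "south" should come first in this example.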
The results are sorted on the values of the categorical variable, i.e.
the variable specified with cat. If this variable is a character
vector it will be sorted alphabetically. If it is a factor it will be
sorted in the order of its levels. This function can be called as either
cat_summarise or cat_summarize.
cat_summarise( data, cat, num, na.rm = FALSE, clean_names = getOption("tabbycat.clean_names") )
cat_summarize( data, cat, num, na.rm = FALSE, clean_names = getOption("tabbycat.clean_names") )
data |
A dataframe containing a categorical variable and numerical variable to summarise. |
cat |
The name of a column in |
num |
The name of a column in |
na.rm |
A boolean indicating whether to exclude NAs from the row
results. Note that NAs are **always** ignored in calculating the summary
statistics for |
clean_names |
A boolean indicating whether the column names of the
results tibble should be cleaned, so that any column names produced from
data are converted to snake_case. The default is TRUE, but this can be
changed with |
A tibble showing summary statistics for num for each group in cat.
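The following sketch shows one way the function might be called. The `scores` data frame is hypothetical, and which summary statistics appear in the result is left to the printed output rather than assumed here.

```r
library(tabbycat)

# Hypothetical data: a categorical column and a numeric column to summarise.
scores <- data.frame(
  region = c("north", "north", "south", "south", "south", "east", "east"),
  score = c(4.0, 5.5, 3.0, 6.0, 4.5, 5.0, 2.5),
  stringsAsFactors = FALSE
)

# Summary statistics for `score` within each `region` group.
by_region <- cat_summarise(scores, "region", "score")
print(by_region)

# The American spelling is documented as an equivalent entry point.
by_region_z <- cat_summarize(scores, "region", "score")
```

Because `region` is a character vector, the rows should appear in alphabetical order: east, north, south.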
This function differs from cat_count in that it operates directly on
vectors, rather than on columns in dataframes, which means it is less useful
in pipelines but can handle a wider range of inputs. The results are sorted
in descending order of frequency.
cat_vcount( cat, na.rm = FALSE, only = "", clean_names = getOption("tabbycat.clean_names") )
cat |
A categorical vector for which frequencies will be calculated. |
na.rm |
A boolean indicating whether to exclude NAs from the results. The default is FALSE. |
only |
A string indicating that only one of the frequency columns
should be returned in the results. If |
clean_names |
A boolean indicating whether the column names of the
results tibble should be cleaned, so that any column names produced from
data are converted to snake_case. The default is TRUE, but this can be
changed with |
A tibble showing the frequency of each value in cat.
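A brief sketch of the vector form, for comparison with cat_count. The `letters_seen` vector is hypothetical; it includes a missing value so the effect of na.rm can be seen in the printed output.

```r
library(tabbycat)

# Hypothetical vector: "a" occurs three times, "b" once, plus one NA.
letters_seen <- c("a", "b", "a", NA, "a")

# Frequencies with missing values kept (the default, na.rm = FALSE).
with_na <- cat_vcount(letters_seen)
print(with_na)

# Frequencies with missing values excluded.
without_na <- cat_vcount(letters_seen, na.rm = TRUE)
print(without_na)
```

This form is handy when the values do not live in a dataframe column, for instance a vector computed on the fly.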