Ensure data.frame/list elements conform to a given schema

If any of the data-masked expressions in ... does not evaluate to all TRUE, rlang::abort is called for the first such expression. The .names and .size arguments can be used to check that the data.frame/list has the given names and size. Size is computed with vctrs::vec_size and therefore follows vctrs size rules (for example, the size of a data.frame is its number of rows).

Usage

schema(.data, ...)

# S3 method for class 'data.frame'
schema(
  .data,
  ...,
  .names = NULL,
  .size = NULL,
  .message = NULL,
  .class = NULL,
  .error_call = caller_env()
)

# S3 method for class 'list'
schema(
  .data,
  ...,
  .names = NULL,
  .size = NULL,
  .message = NULL,
  .class = NULL,
  .error_call = caller_env()
)

Arguments

.data: a data.frame or list to check the schema of.
...: any number of R expressions to be evaluated using .data as a data-mask. Each must evaluate to TRUE (or to a logical vector that is all TRUE) for no error to occur.
.names: optional character vector of names which must be present in the data.frame/list.
.size: optional scalar integerish value for the size that the data.frame/list must have.
.message: a single default error message for unnamed expressions.
.class: class to assign to the error (passed to rlang::abort).
.error_call: the call environment to use for the error (passed to rlang::abort).

Details

schema_cast and schema_recycle are versions of schema() that attempt to coerce the data to the desired schema.

Examples

# NB: Some of these examples are expected to produce an error. To
#     prevent them from terminating a run with example() they are
#     piped into a call to try().

li <- list(x = 1, y = "hi", z = \(x) x > 1)
li |>
  schema(x == 1, is.character(y), is.function(z)) # all TRUE

li |>
  schema(x == 1, is.numeric(y)) |>
  try()
#> Error in eval(expr, envir) : Error in `schema()`
#> ℹ Argument `is.numeric(y)` for data mask `li` returned `FALSE`.
# => Error: Argument `is.numeric(y)` for data mask `.data` returned `FALSE`.

li |>
  schema(length(x)) |>
  try()
#> Error in eval(expr, envir) : Error in `schema()`
#> ℹ Expression `length(x)` for object `li` must evaluate to class <logical> not
#>   <integer>.
# => Error: Expression `length(x)` for object `.data` must evaluate to class
# <logical> not <integer>.
# (the result must be logical, even though a bare `if (1) "ok"` works in base R)

# The default error message can be overridden to be more informative:
df <- data.frame(a = 1L:3L, b = c("x", "y", "z"))
df |>
  schema("a must be double" = is.double(a)) |>
  try()
#> Error in eval(expr, envir) : Error in `schema()`
#> a must be double
# => Error: a must be double

# Alternatively, one error message can be used for all expressions:
df |>
  schema(
    is.integer(a),
    !grepl("x", b),
    .message = "a must be integer and b cannot contain 'x'."
  ) |> try()
#> Error in eval(expr, envir) : Error in `schema()`
#> ℹ a must be integer and b cannot contain 'x'.
# => Error: a must be integer and b cannot contain 'x'.

# Injection and glue syntax can be used to supply expressions, names, and messages:
x <- "my error"
schema(df, "{x}" = FALSE) |> try()
#> Error in eval(expr, envir) : Error in `schema()`
#> my error
# => Error: my error
y <- FALSE
schema(df, {{ x }} := !!y) |> try()
#> Error in eval(expr, envir) : Error in `schema()`
#> my error
# => Error: my error
schema(df, !!x := !is.character(b)) |> try()
#> Error in eval(expr, envir) : Error in `schema()`
#> my error
# => Error: my error
x <- list("my error" = FALSE)
schema(df, !!!x) |> try()
#> Error in eval(expr, envir) : Error in `schema()`
#> my error
# => Error: my error
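
# A minimal sketch of the vctrs size rules that `.size` is checked against,
# assuming only that the vctrs package is installed. It calls
# vctrs::vec_size() directly rather than schema(), and redefines `df` and
# `li` from above so that it runs on its own.
df <- data.frame(a = 1L:3L, b = c("x", "y", "z"))
li <- list(x = 1, y = "hi", z = \(x) x > 1)
vctrs::vec_size(df) # 3: the size of a data.frame is its number of rows
length(df)          # 2: base length() counts columns instead
vctrs::vec_size(li) # 3: the size of a list is its number of elements
# Under these rules, `.size = 3` would describe both `df` and `li`.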