Tests whether a set of columns constitutes a primary key of a data.frame, i.e., whether the columns together uniquely identify every row in the table.
Usage
validate_primary_keys(.data, keys)
# S3 method for class 'validate_pk'
print(x, ...)
Value
An S3 object of class validate_pk containing:
- table_name
Name of the input table from the original call
- keys
Character vector of column names tested
- is_primary_key
Logical: TRUE if keys uniquely identify all rows AND no key column contains NA values
- n_rows
Total number of rows in the table
- n_unique_keys
Number of distinct key combinations
- n_duplicate_keys
Number of key combinations that appear more than once
- duplicate_keys
A data.frame of duplicated key values with their counts
- has_numeric_keys
Logical: TRUE if any key column is of type double
- has_na_keys
Logical: TRUE if any key column contains NA values
- na_in_keys
Named logical vector indicating which key columns contain NAs
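To make the fields above concrete, here is a minimal base-R sketch of how such checks could be computed. The helper name check_pk_sketch is hypothetical and this is not the package's implementation; it only approximates the documented fields under the assumption that a key combination is compared as the pasted values of its columns.

```r
# Hypothetical helper: approximates the documented fields with base R only.
# This is an illustrative sketch, not the code behind validate_primary_keys().
check_pk_sketch <- function(.data, keys) {
  stopifnot(is.data.frame(.data), is.character(keys),
            all(keys %in% names(.data)))
  key_df <- .data[, keys, drop = FALSE]

  # Per-column flag: does this key column contain any NA?
  na_in_keys <- vapply(key_df, anyNA, logical(1))

  # Collapse each row's key values into one string, then count occurrences.
  combo <- do.call(paste, c(key_df, sep = "\r"))
  n_per_combo <- table(combo)
  dup_combos <- names(n_per_combo)[n_per_combo > 1]

  # One row per duplicated combination, with its count in column `n`.
  first_rows <- !duplicated(combo) & combo %in% dup_combos
  duplicate_keys <- key_df[first_rows, , drop = FALSE]
  duplicate_keys$n <- as.integer(n_per_combo[combo[first_rows]])
  rownames(duplicate_keys) <- NULL

  list(
    keys = keys,
    is_primary_key = length(dup_combos) == 0L && !any(na_in_keys),
    n_rows = nrow(.data),
    n_unique_keys = length(n_per_combo),
    n_duplicate_keys = length(dup_combos),
    duplicate_keys = duplicate_keys,
    has_numeric_keys = any(vapply(key_df, is.double, logical(1))),
    has_na_keys = any(na_in_keys),
    na_in_keys = na_in_keys
  )
}

df <- data.frame(
  id = c(1L, 2L, 3L, 4L),
  group = c("A", "A", "B", "B"),
  value = c(10, 20, 30, 40)
)
res <- check_pk_sketch(df, "group")
res$is_primary_key  # FALSE: "A" and "B" each appear twice
res$duplicate_keys
```

The sketch returns a plain list rather than an S3 object, so it has no print method; individual fields are read with `$`, as they would be on the documented validate_pk object.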
See also
Other join validation:
compare_tables(),
validate_join(),
validate_var_relationship()
Examples
df <- data.frame(
id = c(1L, 2L, 3L, 4L),
group = c("A", "A", "B", "B"),
value = c(10, 20, 30, 40)
)
validate_primary_keys(df, "id")
#>
#> ── Primary Key Validation ──────────────────────────────────────────────────────
#> Table: df
#> Key column: id
#>
#> Metric Value
#> ─────────────────────── ─────
#> Total rows 4
#> Unique key combinations 4
#> Duplicate key combos 0
#>
#> ✔ YES - Keys uniquely identify all rows.
validate_primary_keys(df, "group")
#>
#> ── Primary Key Validation ──────────────────────────────────────────────────────
#> Table: df
#> Key column: group
#>
#> Metric Value
#> ─────────────────────── ─────
#> Total rows 4
#> Unique key combinations 2
#> Duplicate key combos 2
#>
#> ✖ NO - Keys do NOT uniquely identify all rows.
#>
#> Duplicate keys (showing up to 10):
#> group n
#> 1 A 2
#> 2 B 2