Performs a dplyr join AND records enriched diagnostics in an audit trail.
These functions replace the pattern of wrapping a join with two
audit_tap() calls, capturing information that plain taps cannot:
match rates, relationship type, duplicate keys, and unmatched row counts.
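As a rough illustration of the pattern being replaced, the sketch below records the same left join two ways: once as a plain dplyr join wrapped in two audit_tap() calls, and once with a single join tap. The package name tidyaudit and the trail and label names are assumptions made for this sketch; only the calls shown in the Examples section are used.

```r
library(dplyr)
library(tidyaudit)  # assumption: the package that exports audit_trail() and *_join_tap()

orders <- data.frame(id = 1:4, amount = c(100, 200, 300, 400))
customers <- data.frame(id = c(2, 3, 5), name = c("A", "B", "C"))

# Pattern being replaced: a plain join wrapped in two generic taps.
trail_plain <- audit_trail("two_taps")        # hypothetical trail name
res_plain <- orders |>
  audit_tap(trail_plain, "before_join") |>
  left_join(customers, by = "id") |>
  audit_tap(trail_plain, "after_join")

# Join tap: the join and its snapshot are recorded in one step.
trail_join <- audit_trail("join_tap")         # hypothetical trail name
res_join <- orders |>
  audit_tap(trail_join, "before_join") |>
  left_join_tap(customers, by = "id", .trail = trail_join, .label = "after_join")

# Printing both trails side by side shows what the join tap adds.
print(trail_plain)
print(trail_join)
```

Both pipelines should return the same joined data; the difference is in what each trail records about the join step.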
Usage
left_join_tap(
.data,
y,
...,
.trail = NULL,
.label = NULL,
.stat = NULL,
.numeric_summary = TRUE,
.cols_include = NULL,
.cols_exclude = NULL
)
right_join_tap(
.data,
y,
...,
.trail = NULL,
.label = NULL,
.stat = NULL,
.numeric_summary = TRUE,
.cols_include = NULL,
.cols_exclude = NULL
)
inner_join_tap(
.data,
y,
...,
.trail = NULL,
.label = NULL,
.stat = NULL,
.numeric_summary = TRUE,
.cols_include = NULL,
.cols_exclude = NULL
)
full_join_tap(
.data,
y,
...,
.trail = NULL,
.label = NULL,
.stat = NULL,
.numeric_summary = TRUE,
.cols_include = NULL,
.cols_exclude = NULL
)
anti_join_tap(
.data,
y,
...,
.trail = NULL,
.label = NULL,
.stat = NULL,
.numeric_summary = TRUE,
.cols_include = NULL,
.cols_exclude = NULL
)
semi_join_tap(
.data,
y,
...,
.trail = NULL,
.label = NULL,
.stat = NULL,
.numeric_summary = TRUE,
.cols_include = NULL,
.cols_exclude = NULL
)
Arguments
- .data
A data.frame or tibble (left table in the join).
- y
A data.frame or tibble (right table in the join).
- ...
Arguments passed to the corresponding dplyr::*_join() function, including by, suffix, keep, multiple, unmatched, etc. The by argument should be passed by name for enriched diagnostics.
- .trail
An audit_trail() object, or NULL (the default). When NULL, behavior depends on .stat: if .stat is also NULL, a plain dplyr join is performed; if .stat is provided, validate_join() diagnostics are printed before the join.
- .label
Optional character label for this snapshot. If NULL, auto-generated as "left_join_1" etc.
- .stat
An unquoted column name for stat tracking, e.g., amount. Passed to validate_join().
- .numeric_summary
Logical. If FALSE, skip numeric summary computation in the snapshot (default TRUE).
- .cols_include
Character vector of column names to include in the snapshot schema, or NULL (the default) to include all columns. Mutually exclusive with .cols_exclude.
- .cols_exclude
Character vector of column names to exclude from the snapshot schema, or NULL (the default). Mutually exclusive with .cols_include.
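A minimal sketch of how the snapshot-related arguments might be combined in one call. The package name tidyaudit, the trail name, and the label are assumptions for illustration; .stat is left out here because its exact requirements are defined by validate_join().

```r
library(tidyaudit)  # assumption: the package that exports audit_trail() and *_join_tap()

orders <- data.frame(id = 1:4, amount = c(100, 200, 300, 400))
customers <- data.frame(id = c(2, 3, 5), name = c("A", "B", "C"))

trail <- audit_trail("args_example")  # hypothetical trail name

joined <- left_join_tap(
  orders,
  customers,
  by = "id",                        # passed by name so enriched diagnostics are available
  .trail = trail,
  .label = "orders_customers",      # explicit label instead of the auto-generated one
  .numeric_summary = FALSE,         # skip numeric summaries in the snapshot
  .cols_include = c("id", "name")   # restrict the snapshot schema to these columns
)

print(trail)
```

Because .cols_include and .cols_exclude are mutually exclusive, only one of them is supplied in the call.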
Details
Enriched diagnostics (match rates, relationship type, duplicate keys) require
an equality join: by must be a character vector, a named character vector, or a
simple equality join_by() expression (e.g., join_by(id),
join_by(a == b)). For non-equi join_by() expressions, the tap records
a basic snapshot without match-rate diagnostics.
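To make the diagnostic terms concrete, the following sketch computes a match rate, unmatched row counts, duplicate keys, and a relationship type by hand with plain dplyr. It is an illustration of what these quantities mean for an equality join, not the package's implementation; all variable names are hypothetical.

```r
library(dplyr)

orders <- data.frame(id = 1:4, amount = c(100, 200, 300, 400))
customers <- data.frame(id = c(2, 3, 5), name = c("A", "B", "C"))

# Match rate: share of left rows with at least one partner in the right table.
match_rate_left <- nrow(semi_join(orders, customers, by = "id")) / nrow(orders)

# Unmatched rows on each side.
unmatched_left <- nrow(anti_join(orders, customers, by = "id"))
unmatched_right <- nrow(anti_join(customers, orders, by = "id"))

# Duplicate keys: key values that occur more than once in a table.
dup_left <- orders |> count(id) |> filter(n > 1)
dup_right <- customers |> count(id) |> filter(n > 1)

# Relationship type follows from which side has duplicated keys.
relationship <- if (nrow(dup_left) == 0 && nrow(dup_right) == 0) {
  "one-to-one"
} else if (nrow(dup_left) == 0) {
  "one-to-many"
} else if (nrow(dup_right) == 0) {
  "many-to-one"
} else {
  "many-to-many"
}
```

For these two tables, half of the left rows find a partner and neither side has duplicated keys, which is consistent with the "one-to-one, 50% matched" entry in the Examples output.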
All dplyr join features (join_by, multiple, unmatched, suffix, etc.)
work unchanged via ....
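A short sketch of passing dplyr join arguments through the tap, here an equality join_by() on differently named keys plus a custom suffix. The package name tidyaudit, the region and cust_id columns, and the trail name are assumptions added for illustration.

```r
library(dplyr)
library(tidyaudit)  # assumption: the package that exports audit_trail() and *_join_tap()

orders <- data.frame(
  id = 1:4,
  amount = c(100, 200, 300, 400),
  region = c("N", "S", "N", "S")    # hypothetical column shared by both tables
)
customers <- data.frame(
  cust_id = c(2, 3, 5),             # hypothetical key name that differs from orders$id
  name = c("A", "B", "C"),
  region = c("N", "N", "S")
)

trail <- audit_trail("dots_example")  # hypothetical trail name

joined <- left_join_tap(
  orders,
  customers,
  by = join_by(id == cust_id),          # simple equality join_by(), passed by name
  suffix = c("_order", "_customer"),    # forwarded to dplyr::left_join()
  .trail = trail,
  .label = "joined"
)

names(joined)
```

The clashing region columns are disambiguated by the suffix argument exactly as dplyr::left_join() would do it.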
When .trail is NULL:
- .stat also NULL: plain dplyr join
- .stat provided: prints validate_join() diagnostics, then joins
- .label provided: warns that label is ignored
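A minimal sketch of the first case above, assuming the package is tidyaudit: with neither .trail nor .stat supplied, the tap is described as a plain dplyr join, so its result can be compared against dplyr::left_join() directly.

```r
library(dplyr)
library(tidyaudit)  # assumption: the package that exports *_join_tap()

orders <- data.frame(id = 1:4, amount = c(100, 200, 300, 400))
customers <- data.frame(id = c(2, 3, 5), name = c("A", "B", "C"))

# No .trail and no .stat: documented to behave as a plain dplyr join.
plain <- left_join_tap(orders, customers, by = "id")

# Reference result from dplyr itself.
reference <- left_join(orders, customers, by = "id")

# Compare the two results.
all.equal(plain, reference)
```

This makes it possible to leave the tap calls in a pipeline and switch auditing on or off by changing only the .trail argument.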
See also
Other operation taps:
filter_tap()
Examples
orders <- data.frame(id = 1:4, amount = c(100, 200, 300, 400))
customers <- data.frame(id = c(2, 3, 5), name = c("A", "B", "C"))
# With trail
trail <- audit_trail("join_example")
result <- orders |>
  audit_tap(trail, "raw") |>
  left_join_tap(customers, by = "id", .trail = trail, .label = "joined")
print(trail)
#>
#> ── Audit Trail: "join_example" ─────────────────────────────────────────────────
#> Created: 2026-03-24 12:14:32
#> Snapshots: 2
#>
#> # Label Rows Cols NAs Type
#> ─ ────── ──── ──── ─── ───────────────────────────────────
#> 1 raw 4 2 0 tap
#> 2 joined 4 3 2 left_join (one-to-one, 50% matched)
#>
#> Changes:
#> From To Rows Cols NAs
#> ──── ────── ──── ──── ───
#> raw joined = +1 +2
# Without trail (plain join)
result2 <- left_join_tap(orders, customers, by = "id")