Automated Data Quality Checks for Recurring Dataset Deliveries



Documentation for package ‘dqcheckr’ version 0.2.0

Help Pages

check_allowed_values   QC-09: Check for values outside the allowed set
check_col_count        QC-05: Report column count
check_distinct_counts  QC-08: Report distinct value counts for character columns
check_duplicate_rows   QC-03: Check for fully-duplicate rows
check_empty_column     QC-02: Check for entirely empty columns
check_inferred_types   QC-06: Report inferred column types
check_key_uniqueness   QC-12: Check uniqueness of key column(s)
check_min_row_count    QC-14: Check row count bounds and optional file size
check_missing_rate     QC-01: Check missing rate per column
check_non_numeric      QC-11: Check non-numeric rate in numeric columns
check_numeric_bounds   QC-10: Check for out-of-range numeric values
check_numeric_stats    QC-07: Report numeric summary statistics
check_outliers         QC-15: Detect statistical outliers in numeric columns
check_pattern          QC-13: Check values against a regex pattern
check_row_count        QC-04: Report row count
check_schema_contract  SC-01 / SC-02: Check columns against the expected schema contract
compare_snapshots      Compare two snapshots from the SQLite database
detect_files           Detect current and previous dataset files
dq_result              Construct a data quality result object
infer_col_type         Infer the logical type of a character column
list_snapshots         List snapshots available in the database
load_config            Load and merge dataset configuration
overall_status         Compute the worst status across a list of dq_result objects
read_dataset           Read a dataset file into a data frame
read_recent_snapshots  Read recent snapshot history from the SQLite database
resolve_col_type       Resolve the effective type of a column, respecting config overrides
run_comparison_checks  Run all version comparison checks between two dataset snapshots
run_custom_checks      Run organisation-specific custom checks
run_dq_check           Run a full data quality check pipeline
run_qc_checks          Run all generic quality checks on a dataset