
ukbflow

RAP-Native R Workflow for UK Biobank Analysis



Languages: English | 简体中文


Overview

ukbflow provides a streamlined, RAP-native R workflow for UK Biobank analysis, from phenotype extraction and disease derivation to association analysis and publication-quality figures.

UK Biobank Data Policy (2024+): Individual-level data must remain within the RAP environment. Only summary-level outputs may be downloaded locally. All ukbflow functions are designed with this constraint in mind.

library(ukbflow)

# Simulate UKB-style data locally (on RAP: replace with extract_batch() + job_wait())
data <- ops_toy(n = 5000, seed = 2026) |>
  derive_missing()

# Derive lung cancer outcome (ICD-10 C34) and follow-up time
data <- data |>
  derive_icd10(name = "lung", icd10 = "C34",
               source = c("cancer_registry", "hes")) |>
  derive_followup(name         = "lung",
                  event_col    = "lung_icd10_date",
                  baseline_col = "p53_i0",
                  censor_date  = as.Date("2022-10-31"),
                  death_col    = "p40000_i0")

# Define exposure: ever vs. never smoker
data[, smoking_ever := factor(
  ifelse(p20116_i0 == "Never", "Never", "Ever"),
  levels = c("Never", "Ever")
)]

# Cox regression: smoking → lung cancer (3-model adjustment)
res <- assoc_coxph(data,
  outcome_col  = "lung_icd10",
  time_col     = "lung_followup_years",
  exposure_col = "smoking_ever",
  # UKB fields: age at recruitment (21022), sex (31), Townsend deprivation index (22189)
  covariates   = c("p21022", "p31", "p22189"))

# Forest plot
res_df <- as.data.frame(res)
plot_forest(
  data      = res_df,
  est       = res_df$HR,
  lower     = res_df$CI_lower,
  upper     = res_df$CI_upper,
  ci_column = 2L
)
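
For readers new to time-to-event data, the follow-up step can be pictured with a small base-R sketch. This is not ukbflow's implementation: `followup_sketch()` is a hypothetical helper, and the 365.25-day year and the "earliest of event, death, or administrative censoring" rule are assumptions that reflect a common convention rather than documented behaviour of `derive_followup()`.

```r
# Hypothetical helper (not part of ukbflow): conventional follow-up logic.
followup_sketch <- function(baseline, event, death, censor_date) {
  # Follow-up ends at the earliest of event, death, or administrative censoring.
  end <- pmin(as.numeric(event), as.numeric(death),
              as.numeric(censor_date), na.rm = TRUE)
  data.frame(
    # Assumption: years are computed as days / 365.25.
    years  = (end - as.numeric(baseline)) / 365.25,
    # Status is 1 only when an event date exists and falls within follow-up.
    status = as.integer(!is.na(event) & as.numeric(event) <= end)
  )
}

# Three made-up participants: one event, one death, one censored.
baseline <- as.Date(c("2008-01-01", "2009-06-15", "2010-03-01"))
event    <- as.Date(c("2015-01-01", NA, NA))
death    <- as.Date(c(NA, "2012-06-15", NA))

fu <- followup_sketch(baseline, event, death,
                      censor_date = as.Date("2022-10-31"))
print(fu)
```

The first participant contributes time up to the event, the second up to death, and the third up to the censoring date.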

Installation

# Recommended
pak::pkg_install("evanbio/ukbflow")

# or
remotes::install_github("evanbio/ukbflow")

Requirements: R ≥ 4.1 · dxpy (dx-toolkit, required for RAP interaction)

pip install dxpy
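
After installing dxpy, it can be worth confirming from R that the `dx` command-line client is visible on the PATH. The check below uses base R only and is independent of ukbflow; the variable names are illustrative.

```r
# Base R only: look for the `dx` executable that dxpy installs.
dx_path      <- unname(Sys.which("dx"))
dx_available <- nzchar(dx_path)

if (dx_available) {
  message("dx-toolkit found at: ", dx_path)
} else {
  message("dx-toolkit not found on PATH; try `pip install dxpy`.")
}
```

If `dx` is installed but not found, the Python scripts directory is probably missing from the PATH that the R session inherits.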

Core Features

| Layer | Key Functions | Description |
|---|---|---|
| Connection | auth_login, auth_select_project | Authenticate to RAP via dx-toolkit |
| Data Access | fetch_metadata, extract_batch, job_wait | Retrieve phenotype data from UKB dataset on RAP |
| Data Processing | decode_names, decode_values, derive_icd10, derive_followup, derive_case | Harmonize multi-source records; derive analysis-ready cohort |
| Association Analysis | assoc_coxph, assoc_logistic, assoc_subgroup | Three-model adjustment; subgroup & trend analysis |
| Genomic Scoring | grs_bgen2pgen, grs_score, grs_standardize | Distributed plink2 scoring on RAP worker nodes |
| Visualization | plot_forest, plot_tableone | Publication-ready figures & tables |
| Utilities | ops_setup, ops_toy, ops_na, ops_snapshot, ops_withdraw | Environment check, synthetic data, pipeline diagnostics, and cohort management |
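
To make "three-model adjustment" concrete, here is a sketch of one common convention (unadjusted, age- and sex-adjusted, fully adjusted) written directly against the survival package. It does not show how `assoc_coxph()` works internally: the model definitions, the simulated data, and all column names below are assumptions chosen for illustration, and the survival package is assumed to be installed.

```r
library(survival)

# Simulated cohort; every column here is made up for illustration.
set.seed(1)
n <- 500
toy <- data.frame(
  age     = rnorm(n, mean = 57, sd = 8),
  sex     = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  tdi     = rnorm(n),  # stand-in for a deprivation index
  smoking = factor(sample(c("Never", "Ever"), n, replace = TRUE),
                   levels = c("Never", "Ever"))
)

# Exponential event times with a higher hazard for ever-smokers.
rate       <- 0.02 * exp(0.7 * (toy$smoking == "Ever") + 0.03 * (toy$age - 57))
event_time <- rexp(n, rate = rate)
toy$time   <- pmin(event_time, 15)            # administrative censoring at 15 years
toy$status <- as.integer(event_time <= 15)

# Assumed model ladder: each step adds covariates to the previous one.
models <- list(
  unadjusted = Surv(time, status) ~ smoking,
  age_sex    = Surv(time, status) ~ smoking + age + sex,
  full       = Surv(time, status) ~ smoking + age + sex + tdi
)

# Fit each model and keep the hazard ratio and 95% CI for the exposure term.
hr_table <- do.call(rbind, lapply(names(models), function(nm) {
  fit <- coxph(models[[nm]], data = toy)
  ci  <- exp(confint(fit))["smokingEver", ]
  data.frame(model    = nm,
             HR       = unname(exp(coef(fit))["smokingEver"]),
             CI_lower = unname(ci[1]),
             CI_upper = unname(ci[2]))
}))
print(hr_table)
```

Comparing the exposure's hazard ratio across the three rows shows how much of the crude association is explained by the added covariates.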

Function Reference

Auth & Fetch
Extract & Decode
Job Monitoring
Derive: Phenotypes
Derive: Survival
Association Analysis
Visualisation
Utilities & Diagnostics
GRS Pipeline

Documentation

Full vignettes and function reference:

https://evanbio.github.io/ukbflow/


Contributing

Bug reports, feature requests, and pull requests are welcome. See CONTRIBUTING.md.


License

MIT License © 2026 Yibin Zhou


Made with ❤️ by Yibin Zhou
