gwas2crispr: From GWAS to CRISPR-ready Files

Overview

gwas2crispr prepares genome-wide association study (GWAS) results for downstream clustered regularly interspaced short palindromic repeats (CRISPR) workflows.

The package retrieves significant single-nucleotide polymorphisms (SNPs) for supported GWAS Catalog trait identifiers from the EMBL-EBI GWAS Catalog REST API v2 and returns CRISPR-ready outputs for the GRCh38/hg38 human genome build.

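The main outputs are:

- a full table of significant SNPs (CSV);
- a BED file of SNP positions on GRCh38/hg38;
- an optional FASTA file of sequences flanking each SNP, written only when the optional genome packages listed under Installation are available.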

The public argument name efo_id is retained for backward compatibility, but it is not limited to EFO terms. In gwas2crispr 0.1.5, selected EFO, MONDO, and NCIT identifiers are supported when they are available through the GWAS Catalog API. HP, Orphanet, and ORPHA identifiers are also accepted, although results are available only for selected records.

Example accepted formats include EFO_0001663, EFO:0001663, MONDO_0007254, MONDO:0007254, NCIT_C4872, and NCIT:C4872.
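Because both the underscore and colon forms are accepted, an identifier can be checked and normalised before it is passed to fetch_gwas(). The helper below, normalize_trait_id(), is hypothetical and not part of gwas2crispr. It assumes the prefixes listed above, followed by an optional letter (as in NCIT C4872) and then digits. The package's own validation may be stricter or looser.

```r
# Hypothetical helper, not part of gwas2crispr: checks that each trait ID uses
# one of the prefixes listed above and rewrites "PREFIX:ID" as "PREFIX_ID".
normalize_trait_id <- function(x) {
  # Prefix, then "_" or ":", then an optional letter and one or more digits
  pattern <- "^(EFO|MONDO|NCIT|HP|Orphanet|ORPHA)[_:]([A-Za-z]?[0-9]+)$"
  ok <- grepl(pattern, x)
  if (!all(ok)) {
    stop("Unrecognised trait identifier(s): ", paste(x[!ok], collapse = ", "))
  }
  # Rebuild every ID with the underscore separator
  sub(pattern, "\\1_\\2", x)
}

normalize_trait_id(c("EFO:0001663", "MONDO_0007254", "NCIT:C4872"))
# returns c("EFO_0001663", "MONDO_0007254", "NCIT_C4872")
```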

Installation

Install from CRAN:

install.packages("gwas2crispr")

Optional Bioconductor packages, needed only for FASTA output:

if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install(c(
  "Biostrings",
  "GenomeInfoDb",
  "BSgenome.Hsapiens.UCSC.hg38"
))
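To check whether the FASTA step will be able to run before starting the pipeline, test for the optional packages with base R. This is a sketch. fasta_pkgs and installed are illustrative names, and gwas2crispr makes its own decision internally.

```r
# Packages listed above as optional for FASTA output
fasta_pkgs <- c("Biostrings", "GenomeInfoDb", "BSgenome.Hsapiens.UCSC.hg38")

# requireNamespace() returns TRUE/FALSE without attaching the package;
# quietly = TRUE suppresses the message for a missing package
installed <- vapply(fasta_pkgs, requireNamespace, logical(1), quietly = TRUE)

if (all(installed)) {
  message("All optional FASTA packages are available.")
} else {
  message("Missing for FASTA output: ",
          paste(fasta_pkgs[!installed], collapse = ", "))
}
```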

Install the development version from GitHub:

if (!requireNamespace("devtools", quietly = TRUE))
  install.packages("devtools")

devtools::install_github("leopard0ly/gwas2crispr")

Fetch GWAS associations

library(gwas2crispr)

gwas_data <- fetch_gwas(
  efo_id  = "EFO_0000707",
  p_cut   = 1e-6,
  verbose = FALSE
)

names(gwas_data)
head(gwas_data$associations)
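p_cut sets the p-value threshold for keeping an association, so a smaller value is stricter. The value 5e-8 is the conventional genome-wide significance level. The sketch below illustrates that filtering on a toy table. The column names, the placeholder SNP labels, and the strict less-than comparison are all assumptions; they do not describe the structure of gwas_data$associations.

```r
# Toy association table with placeholder labels (not real GWAS results);
# the column names are hypothetical, not gwas2crispr's schema.
toy <- data.frame(
  snp    = c("snp1", "snp2", "snp3", "snp4"),
  pvalue = c(3e-9, 4e-7, 2e-6, 6e-8),
  stringsAsFactors = FALSE
)

# Assumed semantics of p_cut: keep rows whose p-value is strictly below it
keep_significant <- function(df, p_cut) df[df$pvalue < p_cut, , drop = FALSE]

keep_significant(toy, 1e-6)$snp  # "snp1" "snp2" "snp4"
keep_significant(toy, 5e-8)$snp  # "snp1"
```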

Supported non-EFO identifiers are passed through the same efo_id argument:

fetch_gwas(efo_id = "MONDO_0007254", p_cut = 5e-8, verbose = FALSE)
fetch_gwas(efo_id = "NCIT_C4872", p_cut = 5e-8, verbose = FALSE)

Run without writing files

By default, no files are written: with out_prefix = NULL, results are returned only as R objects.

res <- run_gwas2crispr(
  efo_id     = "EFO_0000707",
  p_cut      = 1e-6,
  flank_bp   = 300,
  out_prefix = NULL,
  verbose    = FALSE
)

res$summary
head(res$snps_full)
head(res$bed)
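res$bed holds SNP positions in BED form, and flank_bp sets the width of the sequence window around each SNP. BED intervals are 0-based and half-open, whereas GWAS positions are usually reported 1-based, so a single-base SNP at position pos becomes [pos - 1, pos). The hypothetical snp_intervals() helper below shows that arithmetic. Whether gwas2crispr builds windows exactly this way (2 * flank_bp + 1 bases centred on the SNP, UCSC-style "chr" names) is an assumption. The sketch also does not clamp windows at chromosome ends, which would require chromosome lengths.

```r
# Hypothetical helper: converts 1-based SNP positions into BED intervals and
# +/- flank_bp windows. BED is 0-based and half-open, so the base at
# 1-based position pos is written as [pos - 1, pos).
snp_intervals <- function(chrom, pos, flank_bp = 300) {
  data.frame(
    chrom       = chrom,
    snp_start   = pos - 1L,                      # BED start (0-based)
    snp_end     = pos,                           # BED end (exclusive)
    flank_start = pmax(pos - flank_bp - 1L, 0L), # clamped at chromosome start
    flank_end   = pos + flank_bp,                # not clamped at chromosome end
    stringsAsFactors = FALSE
  )
}

iv <- snp_intervals(chrom = c("chr1", "chr2"), pos = c(1000L, 150L))
iv$flank_end - iv$flank_start
# c(601, 450): a full 2 * 300 + 1 bp window, then one clamped at the start
```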

Write files safely

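To write output files, supply out_prefix, a path prefix to which each file suffix is appended. In examples, write under tempdir() so that nothing is left in the working directory.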

out_prefix <- file.path(tempdir(), "lung")

res <- run_gwas2crispr(
  efo_id     = "EFO_0000707",
  p_cut      = 1e-6,
  flank_bp   = 300,
  out_prefix = out_prefix,
  verbose    = FALSE
)

res$written

Expected output paths:

paste0(out_prefix, "_snps_full.csv")
paste0(out_prefix, "_snps_hg38.bed")
paste0(out_prefix, "_snps_flank300.fa")

The FASTA file is created only when the optional genome packages (Biostrings, GenomeInfoDb, and BSgenome.Hsapiens.UCSC.hg38) are installed.
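After a run with out_prefix set, the expected files can be checked directly on disk. expected_outputs() below is a hypothetical helper that rebuilds the paths listed above with paste0() and tests them with file.exists(). It assumes that the number in the FASTA file name follows flank_bp. On a machine without the optional genome packages, the FASTA entry should be FALSE.

```r
# Hypothetical helper: rebuilds the expected output paths from a prefix and
# reports which files exist. Assumes the "flank300" suffix tracks flank_bp.
expected_outputs <- function(out_prefix, flank_bp = 300) {
  paths <- paste0(out_prefix, c(
    "_snps_full.csv",
    "_snps_hg38.bed",
    paste0("_snps_flank", flank_bp, ".fa")
  ))
  # Named logical vector: file name -> exists?
  stats::setNames(file.exists(paths), basename(paths))
}

expected_outputs(file.path(tempdir(), "lung"))
```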

Output structure

names(res)

Common outputs:

res$summary
res$snps_full
res$bed
res$fasta
res$written

Session information

sessionInfo()
#> R version 4.4.3 (2025-02-28 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 22621)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=C                  LC_CTYPE=Arabic_Libya.utf8   
#> [3] LC_MONETARY=Arabic_Libya.utf8 LC_NUMERIC=C                 
#> [5] LC_TIME=Arabic_Libya.utf8    
#> 
#> time zone: Africa/Tripoli
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39     R6_2.6.1          fastmap_1.2.0     xfun_0.56        
#>  [5] cachem_1.1.0      knitr_1.51        htmltools_0.5.9   rmarkdown_2.30   
#>  [9] lifecycle_1.0.5   cli_3.6.6         sass_0.4.10       jquerylib_0.1.4  
#> [13] compiler_4.4.3    rstudioapi_0.18.0 tools_4.4.3       evaluate_1.0.5   
#> [17] bslib_0.10.0      yaml_2.3.10       otel_0.2.0        jsonlite_2.0.0   
#> [21] rlang_1.2.0