gwas2crispr prepares genome-wide association study
(GWAS) results for downstream clustered regularly interspaced short
palindromic repeats (CRISPR) workflows.
The package retrieves significant single-nucleotide polymorphisms (SNPs) for supported GWAS Catalog trait identifiers from the EMBL-EBI GWAS Catalog REST API v2 and returns CRISPR-ready outputs for the GRCh38/hg38 human genome build.
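The main outputs are a full SNP table (CSV), a BED file of GRCh38/hg38 coordinates, and an optional FASTA file of flanking sequences.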
The public argument name efo_id is retained for backward
compatibility. In gwas2crispr 0.1.5, selected EFO, MONDO, and NCIT
identifiers are supported when available through the GWAS Catalog API.
HP, Orphanet, and ORPHA identifiers are accepted for compatibility with
selected records.
Example accepted formats include EFO_0001663,
EFO:0001663, MONDO_0007254,
MONDO:0007254, NCIT_C4872, and
NCIT:C4872.
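The accepted formats differ only in the separator between the ontology prefix and the term. As a minimal sketch, assuming you want all identifiers in the underscore form before passing them as efo_id, a small helper could look like the one below. normalize_trait_id() is a hypothetical name used for illustration and is not part of gwas2crispr; the package may apply its own normalization internally.

```r
# Hypothetical helper (not part of gwas2crispr): convert "PREFIX:ID" to "PREFIX_ID".
normalize_trait_id <- function(x) {
  x <- trimws(x)
  # Replace only a colon that directly follows the alphabetic ontology prefix.
  sub("^([A-Za-z]+):", "\\1_", x)
}

ids <- c("EFO:0001663", "MONDO_0007254", "NCIT:C4872")
normalized_ids <- normalize_trait_id(ids)
normalized_ids
#> [1] "EFO_0001663"   "MONDO_0007254" "NCIT_C4872"
```

Identifiers that already use an underscore pass through unchanged, so the helper is safe to apply to mixed input.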
Install from CRAN:

install.packages("gwas2crispr")
Optional packages for FASTA output:
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c(
  "Biostrings",
  "GenomeInfoDb",
  "BSgenome.Hsapiens.UCSC.hg38"
))

Development version:
library(gwas2crispr)
gwas_data <- fetch_gwas(
  efo_id = "EFO_0000707",
  p_cut = 1e-6,
  verbose = FALSE
)
names(gwas_data)
head(gwas_data$associations)

Selected non-EFO identifiers use the same argument name when supported by the GWAS Catalog API.
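As a minimal sketch, the call below reuses fetch_gwas() with one of the MONDO identifiers listed among the accepted formats. Whether this identifier returns any associations depends on the GWAS Catalog at the time of the request, so the call is wrapped in tryCatch(). The requireNamespace() guard and the variable names are illustrative assumptions, added so the snippet does not stop when the package or the network is unavailable.

```r
# Same argument name as in the EFO example; only the identifier changes.
mondo_id <- "MONDO_0007254"

mondo_data <- NULL
if (requireNamespace("gwas2crispr", quietly = TRUE)) {
  # The request needs network access; on any error, keep NULL instead of stopping.
  mondo_data <- tryCatch(
    gwas2crispr::fetch_gwas(
      efo_id = mondo_id,
      p_cut = 1e-6,
      verbose = FALSE
    ),
    error = function(e) NULL
  )
}

# NULL means the package was missing or the request failed.
is.null(mondo_data)
```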
By default, no files are written.
To write output files, provide out_prefix. In examples,
point out_prefix at a path under tempdir().
out_prefix <- file.path(tempdir(), "lung")
res <- run_gwas2crispr(
  efo_id = "EFO_0000707",
  p_cut = 1e-6,
  flank_bp = 300,
  out_prefix = out_prefix,
  verbose = FALSE
)
res$written

Expected output paths:

paste0(out_prefix, "_snps_full.csv")
paste0(out_prefix, "_snps_hg38.bed")
paste0(out_prefix, "_snps_flank300.fa")

The FASTA file is created only when the optional genome packages are available.
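To see in advance whether the FASTA step can run, one option is to check for the optional packages with requireNamespace(). This is a sketch that only inspects the local R library; the variable names are illustrative, and it does not show how gwas2crispr performs its own check.

```r
# Optional packages listed in the installation instructions.
fasta_pkgs <- c("Biostrings", "GenomeInfoDb", "BSgenome.Hsapiens.UCSC.hg38")

# TRUE for each package whose namespace can be loaded from the current library.
pkg_available <- vapply(
  fasta_pkgs,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)

fasta_possible <- all(pkg_available)
if (!fasta_possible) {
  message(
    "Missing optional packages: ",
    paste(fasta_pkgs[!pkg_available], collapse = ", ")
  )
}
```

If fasta_possible is FALSE, expect only the CSV and BED paths to be written.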
sessionInfo()
#> R version 4.4.3 (2025-02-28 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 22621)
#>
#> Matrix products: default
#>
#>
#> locale:
#> [1] LC_COLLATE=C LC_CTYPE=Arabic_Libya.utf8
#> [3] LC_MONETARY=Arabic_Libya.utf8 LC_NUMERIC=C
#> [5] LC_TIME=Arabic_Libya.utf8
#>
#> time zone: Africa/Tripoli
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> loaded via a namespace (and not attached):
#> [1] digest_0.6.39 R6_2.6.1 fastmap_1.2.0 xfun_0.56
#> [5] cachem_1.1.0 knitr_1.51 htmltools_0.5.9 rmarkdown_2.30
#> [9] lifecycle_1.0.5 cli_3.6.6 sass_0.4.10 jquerylib_0.1.4
#> [13] compiler_4.4.3 rstudioapi_0.18.0 tools_4.4.3 evaluate_1.0.5
#> [17] bslib_0.10.0 yaml_2.3.10 otel_0.2.0 jsonlite_2.0.0
#> [21] rlang_1.2.0