Obtaining PIN and Gene Sets Data

2024-05-04

Get PIN File

To retrieve the protein interaction network (PIN) file for an organism of your choice, use the function get_pin_file(). As of this version, the only source for PIN data is “BioGRID”.

By default, the function downloads the PIN data from BioGRID, processes it, saves it to a temporary file, and returns the path to that file:

## the default organism is "Homo_sapiens"
path_to_pin_file <- get_pin_file()

You can retrieve the PIN data for the organism of your choice by setting the org argument:

## retrieving PIN data for "Gallus_gallus"
path_to_pin_file <- get_pin_file(org = "Gallus_gallus")

You may also supply a path via the path2pin argument to save the PIN file for later use (in this case, the path you supply is returned):

## saving the "Homo_sapiens" PIN as "/path/to/PIN/file"
path_to_pin_file <- get_pin_file(path2pin = "/path/to/PIN/file")

You may also retrieve a specific release of BioGRID by setting the release argument:

## retrieving PIN data for "Mus_musculus" from BioGRID release 3.5.179
path_to_pin_file <- get_pin_file(
  org = "Mus_musculus",
  release = "3.5.179"
)
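
Once a PIN file is on disk, it can be inspected with base R. The sketch below assumes the file uses a tab-separated SIF layout without a header (interactor A, the literal "pp", interactor B). To stay self-contained it writes a three-line toy file instead of calling get_pin_file(), so the gene symbols, the file path and the column names are illustrative only.

```r
## toy stand-in for the file returned by get_pin_file() (assumed SIF layout)
toy_pin_path <- tempfile(fileext = ".sif")
writeLines(
  c("TP53\tpp\tMDM2", "BRCA1\tpp\tBARD1", "TP53\tpp\tEP300"),
  toy_pin_path
)

## read the three columns; the column names are chosen here for readability
pin_df <- read.delim(
  toy_pin_path,
  header = FALSE,
  col.names = c("interactor_A", "type", "interactor_B"),
  stringsAsFactors = FALSE
)

## number of interactions and number of distinct genes in the network
n_interactions <- nrow(pin_df)
pin_genes <- unique(c(pin_df$interactor_A, pin_df$interactor_B))
n_genes <- length(pin_genes)
```

With a real file, replacing toy_pin_path with the returned path_to_pin_file should give the same kind of summary, provided the layout assumption holds.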

Get Gene Sets List

To retrieve an organism-specific gene sets list, use the function get_gene_sets_list(). The available sources for gene sets are “KEGG”, “Reactome” and “MSigDB”. The function retrieves the gene sets data from the source and processes it into a list of two objects used by pathfindR for active-subnetwork-oriented enrichment analysis:

1. gene_sets: a list containing the genes involved in each gene set
2. descriptions: a named vector containing the descriptions for each gene set
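
To make the shape of the returned object concrete, here is a minimal hand-built sketch of such a list. The set IDs, gene symbols and descriptions are hypothetical, and the sketch assumes that both components are keyed by the same gene set IDs.

```r
## hypothetical gene sets list with the two components described above
toy_gsets_list <- list(
  gene_sets = list(
    set_A = c("TP53", "MDM2", "CDKN1A"),
    set_B = c("BRCA1", "BARD1")
  ),
  descriptions = c(
    set_A = "Toy pathway A",
    set_B = "Toy pathway B"
  )
)

## genes belonging to one gene set
toy_gsets_list$gene_sets[["set_A"]]

## description of the same gene set, looked up by its ID
toy_gsets_list$descriptions[["set_A"]]

## number of genes per gene set
gset_sizes <- lengths(toy_gsets_list$gene_sets)
```

The same accessors can be applied to a list returned by get_gene_sets_list(), for example to check gene set sizes before running an analysis.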

By default, get_gene_sets_list() obtains “KEGG” gene sets for “hsa”.

KEGG Pathway Gene Sets

To obtain the gene sets list of the KEGG pathways for an organism of your choice, use the KEGG organism code for the selected organism. For a full list of available organisms and their codes, see the KEGG organism catalog.

## obtaining KEGG pathway gene sets for Rattus norvegicus (rno)
gsets_list <- get_gene_sets_list(org_code = "rno")

Reactome Pathway Gene Sets

To obtain Reactome pathway gene sets, set the source argument to “Reactome”. This downloads the most current Reactome pathways in GMT format and processes them into the list object that pathfindR uses:

gsets_list <- get_gene_sets_list(source = "Reactome")

For Reactome, there is only one collection of pathway gene sets.
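
As a rough illustration of turning GMT records into the two-component list, the sketch below parses two made-up GMT lines with base R. It is not pathfindR's actual implementation: the records, the IDs, and the choice to key the sets by the second field are assumptions made for this example.

```r
## two toy GMT records: set name, a second annotation field, then the genes
gmt_lines <- c(
  "Toy pathway A\tTOY-0001\tTP53\tMDM2\tCDKN1A",
  "Toy pathway B\tTOY-0002\tBRCA1\tBARD1"
)

## split each record on tabs
fields <- strsplit(gmt_lines, "\t", fixed = TRUE)

## use the second field as the gene set ID (an assumption of this sketch)
ids <- vapply(fields, function(x) x[2], character(1))

## genes are everything after the first two fields
gene_sets <- lapply(fields, function(x) x[-(1:2)])
names(gene_sets) <- ids

## descriptions: named vector mapping each ID to the set name
descriptions <- vapply(fields, function(x) x[1], character(1))
names(descriptions) <- ids

toy_reactome_list <- list(gene_sets = gene_sets, descriptions = descriptions)
```

Keying both components by the same ID keeps the lookup of a description for a given gene set a single indexing step.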

MSigDB Gene Sets

Using msigdbr, pathfindR can retrieve all MSigDB gene sets. For this, set the source argument to “MSigDB” and the collection argument to the desired MSigDB collection (one of H, C1, C2, C3, C4, C5, C6, C7):

gsets_list <- get_gene_sets_list(
  source = "MSigDB",
  collection = "C2"
)

The default organism for MSigDB is “Homo sapiens”; you may obtain the gene sets data for another organism by setting the species argument:

## obtaining C5 gene sets data for "Drosophila melanogaster"
gsets_list <- get_gene_sets_list(
  source = "MSigDB",
  species = "Drosophila melanogaster",
  collection = "C5"
)
## see msigdbr::msigdbr_species() for all available organisms
## (it replaces the deprecated msigdbr::msigdbr_show_species())
sort(msigdbr::msigdbr_species()$species_name)
#>  [1] "Anolis carolinensis"             "Bos taurus"                     
#>  [3] "Caenorhabditis elegans"          "Canis lupus familiaris"         
#>  [5] "Danio rerio"                     "Drosophila melanogaster"        
#>  [7] "Equus caballus"                  "Felis catus"                    
#>  [9] "Gallus gallus"                   "Homo sapiens"                   
#> [11] "Macaca mulatta"                  "Monodelphis domestica"          
#> [13] "Mus musculus"                    "Ornithorhynchus anatinus"       
#> [15] "Pan troglodytes"                 "Rattus norvegicus"              
#> [17] "Saccharomyces cerevisiae"        "Schizosaccharomyces pombe 972h-"
#> [19] "Sus scrofa"                      "Xenopus tropicalis"

You may also obtain the gene sets for a subcollection by setting the subcollection argument:

## obtaining C3 - MIR: microRNA targets
gsets_list <- get_gene_sets_list(
  source = "MSigDB",
  collection = "C3",
  subcollection = "MIR"
)