4. Obtaining a Presence-Absence Matrix

One way of organizing biodiversity data is a presence-absence matrix (PAM), in which the entry for cell i and species j is 1 if species j is present in cell i and 0 if it is absent. From a PAM, we can estimate a variety of metrics related to biodiversity patterns, including richness, range size, and composition. For a comprehensive list of biodiversity metrics, refer to the PAM_indices function in the biosurvey package.
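
To make the structure concrete, here is a minimal base-R sketch using a made-up PAM. The cell names, species names, and values are hypothetical and are not florabr output. With cells in rows and species in columns, row sums give the richness of each cell and column sums give the range size of each species.

```r
# Toy presence-absence matrix: rows are cells (sites), columns are species.
# All names and values here are invented for illustration.
toy_pam <- matrix(c(1, 0, 1,
                    1, 1, 0,
                    0, 0, 1,
                    1, 1, 1),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(paste0("cell_", 1:4),
                                  c("sp_a", "sp_b", "sp_c")))

# Richness: number of species present in each cell
richness <- rowSums(toy_pam)

# Range size: number of cells occupied by each species
range_size <- colSums(toy_pam)

richness
range_size
```

The same two operations apply to any PAM once its non-species columns (such as site identifiers) are set aside.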

Loading data

Before you begin, load the data with the load_florabr() function. For details on obtaining and loading the data, see "1. Getting started with florabr".

library(florabr)
library(terra)
#Folder where you stored the data with the function get_florabr()
#(my_dir must already hold the path to that folder)
#Load data
bf <- load_florabr(data_dir = my_dir,
                   data_version = "Latest_available",
                   type = "short") #short version
#> Loading version 393.389

Getting a presence-absence matrix

The get_pam() function uses the species distribution information for Brazil to build a PAM. Each site is a Biome, a State, a vegetation type, or any combination of these. In addition to the PAM, the function can also return a richness summary and a SpatVector containing the number of species in each site.

As an example, let’s obtain a PAM consisting of all tree species native and endemic to the Amazon biome:

#Load Brazilian Flora data
data("bf_data")
#Select endemic and native species of trees with occurrence only in Amazon
am_trees <- select_species(data = bf_data,
                          include_subspecies = FALSE,
                          include_variety = FALSE,
                          Kingdom = "Plantae",
                          Group = "All", Subgroup = "All",
                          Family = "All", Genus = "All",
                          LifeForm = "Tree", filter_LifeForm = "only",
                          Habitat = "All", filter_Habitat = "in",
                          Biome = "Amazon",
                          filter_Biome = "only",
                          State = "All", filter_State = "and",
                          VegetationType = "All",
                          filter_Vegetation = "in",
                          Endemism = "Endemic", Origin = "Native",
                          TaxonomicStatus = "Accepted",
                          NomenclaturalStatus = "All")
#Get presence-absence matrix
pam_am <- get_pam(data = am_trees, by_Biome = TRUE, by_State = TRUE,
                 by_vegetationType = FALSE, remove_empty_sites = TRUE,
                 return_richness_summary = TRUE,
                 return_spatial_richness = TRUE,
                 return_plot = TRUE)
#Visualize (as tibble) the PAM for the first 7 sites and 5 columns
#(Biome, States, and the first 3 species)
tibble::tibble(pam_am$PAM[1:7, 1:5])
# # A tibble: 7 × 5
#   Biome  States `Euxylophora paraensis` `Galipea congestiflora` `Hortia longifolia`
#   <fct>  <fct>                    <dbl>                   <dbl>               <dbl>
# 1 Amazon AM                           1                       0                   1
# 2 Amazon MA                           1                       1                   0
# 3 Amazon PA                           1                       1                   1
# 4 Amazon TO                           1                       1                   0
# 5 Amazon MT                           0                       0                   1
# 6 Amazon RR                           0                       0                   1
# 7 Amazon AC                           0                       0                   0

Since return_richness_summary is set to TRUE, the function also returns a data frame containing the number of species per site.

#Visualize (as tibble) the richness summary table
tibble::tibble(pam_am$Richness_summary[1:7,])
# # A tibble: 7 × 3
#   Biome  States Richness
#   <fct>  <fct>     <dbl>
# 1 Amazon AM          599
# 2 Amazon MA           34
# 3 Amazon PA          294
# 4 Amazon TO            7
# 5 Amazon MT           63
# 6 Amazon RR           45
# 7 Amazon AC          109
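
The Richness column should correspond to the row sums of the species columns of the PAM. The sketch below illustrates this, assuming the PAM keeps its site identifiers in the Biome and States columns as in the printed excerpt. The data frame is rebuilt by hand from that excerpt so that the code runs without florabr. The object names pam_excerpt and excerpt_richness are hypothetical. Because the excerpt holds only three species, the sums are much smaller than the values in Richness_summary, which count every selected species.

```r
# Hand-copied from the printed PAM excerpt (7 sites, 3 species)
pam_excerpt <- data.frame(
  Biome = factor(rep("Amazon", 7)),
  States = factor(c("AM", "MA", "PA", "TO", "MT", "RR", "AC")),
  `Euxylophora paraensis` = c(1, 1, 1, 1, 0, 0, 0),
  `Galipea congestiflora` = c(0, 1, 1, 1, 0, 0, 0),
  `Hortia longifolia`     = c(1, 0, 1, 0, 1, 1, 0),
  check.names = FALSE  # keep the spaces in the species names
)

# Set aside the identifier columns, then count presences per site
species_cols <- setdiff(names(pam_excerpt), c("Biome", "States"))
excerpt_richness <- data.frame(
  pam_excerpt[c("Biome", "States")],
  Richness = rowSums(pam_excerpt[species_cols])
)
excerpt_richness
```

Applied to the full pam_am$PAM rather than the excerpt, the same row sums would be expected to match pam_am$Richness_summary.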

If return_spatial_richness is set to TRUE, the function also returns a SpatVector containing the number of species per site. If return_plot is set to TRUE as well, the function also returns a plot.