1 About this

This document represents academic research whose main theme is an analysis of the location and frequency of observation, as well as a species distribution modeling (based on environmental variables) of two american wildcats species, Jaguar (Panthera onca) and Pumas (Felis concolor or Puma concolor). This study is used a database of observations recorded from the iNaturalist species information repository (https://www.inaturalist.org/) and geographical information about biodiversity informatics from the American Museum of Natural History website (https://biodiversityinformatics.amnh.org/).

2 Libraries Used

library(dplyr)
library(tibble)
library(tidyr)
library(sp)
library(DT)
library(leaflet)
library(leaflet.extras)
library(raster)
library(rasterVis)
library(rgdal)
library(ggplot2)
library(plotly)
library(LaCroixColoR)
library(sf)
library(here)
library(proto)
library(highcharter)
library(rgeos)
library(velox)
library(usdm)
library(gtools)
library(corrplot)
library(Hmisc)
library(dismo)

3 Extraction, Transformation and Load Databases

3.1 Datasets

Steps Related with Load Datasets.

df <- read.csv(paste0(dir, "/Datasets/samples/wildcats_df.csv"), stringsAsFactors=F, sep=",", na.strings=c("","NA"), encoding="UTF-8")

colnames(df)
## [1] "X.U.FEFF.common_name" "scientific_name"      "latitude"            
## [4] "longitude"            "created_at"           "updated_at"          
## [7] "place_country_name"

Column Name Changes.

df <- df[,c(1:5,7)]

names(df)[1] <- "Nombre_Comun"
names(df)[2] <- "Nombre_Cientifico"
names(df)[3] <- "Latitud"
names(df)[4] <- "Longitud"
names(df)[5] <- "Fecha_Registro"
names(df)[6] <- "Pais"

colnames(df)
## [1] "Nombre_Comun"      "Nombre_Cientifico" "Latitud"          
## [4] "Longitud"          "Fecha_Registro"    "Pais"

3.1.1 Number of register in the datasets

Here, we show the data summary by species to know the quantity of these.

df%>%
  dplyr::group_by(Nombre_Comun)%>%
  dplyr::summarise(length(Nombre_Comun), .groups = 'drop')

Next, we procced to the correction of data in “Nombre_Comun” column in order to standardize the duplicate registers.

ndf <- df%>%
  mutate(Nombre_Comun=ifelse(Nombre_Comun=="Puma de América del Norte", "Puma",Nombre_Comun))%>%
  mutate(Nombre_Comun=ifelse(Nombre_Comun=="Puma de América del Sur", "Puma",Nombre_Comun))

and after that we created a bar chart with the data by species to know the result.

count_sp <- ndf%>%
            group_by(Nombre_Comun)%>%
            summarise(Count=length(Nombre_Comun), .groups = 'drop')

plot_ly(data=count_sp, x = ~Nombre_Comun, y = ~Count, type = 'bar', text = ~Count, textposition = 'auto',
        marker = list(color = c('#ffa500','#13ED3F'),
                      line = list(color = c('rgba(204,204,204,1)', 'rgba(204,204,204,1)'),
                                  width = 1.5))) %>%
  layout(title = "Count of Register Species in the Inicial Dataframe",
         xaxis = list(title = "Species"),
         yaxis = list(title = "Count"))

3.1.2 Map of Species Distribution to World Scale

cf <- colorFactor(c("#ffa500","#13ED3F"), domain=ndf$Nombre_Comun)

m <- leaflet(df) %>% addTiles(group = "OSM (default)") %>%
  addProviderTiles(providers$Stamen.Toner, group = "Toner") %>%
  addProviderTiles(providers$Stamen.TonerLite, group = "Toner Lite") %>%
  setView(-63.54105072, -8.88123188, zoom = 2) %>% 
  addCircles(~Longitud, ~Latitud, popup=ndf$Nombre_Comun, group = ndf$Nombre_Comun ,weight = 3, radius=4, 
              color=cf(ndf$Nombre_Comun), stroke = TRUE, fillOpacity = 0.5)  %>%
  addLegend("bottomright", colors= c("#ffa500","#13ED3F"), labels=c("Jaguar","Puma"), title="Species") %>% 
  addLayersControl(
    baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
    overlayGroups = ndf$Nombre_Comun,
    options = layersControlOptions(collapsed = TRUE))%>%
  suspendScroll()

m

3.2 Datasets about Environmental Variables in the Study Area

The study area corresponds to the territories of Central and South America as well as the Caribbean.

Some of the coverages used to know the habitat of the species are climatic variables that are derived from data provided by the Intergovernmental Panel on Climate Change and that were produced using interpolation based on readings taken from weather stations around the world from 1961 to 1990.

The coverages are:

*cld6190_ann : cloud cover, annual

*dtr6190_ann : daytime temperature range, annual

*ecoreg : ecoregions

*frs6190_ann : frequency of frosts, annual

*h_dem : digital elevation model

*pre6190_ann : precipitation, annual

*pre6190_I1 : precipitation, january

*pre6190_I4 : precipitation, april

*pre6190_I7 : precipitation, july

*pre6190_I10 : precipitation, october

*tmn6190_ann : mean temperature, annual

*tmp6190_ann : minimum temperature, annual

*tmx6190_ann : maximum temperature, annual

*vap6190_ann : vapor pressure, annual

3.2.1 Environmental Data

r_files <- list.files(paste0(dir, "Datasets/coverages"),
                       full.names = T)

3.2.1.1 Environmental variables visualization

Next, the rasters that represent the environmental variables that will be used in the distribution modeling are graphed.

st <- raster::stack(r_files) 

raster::plot(st)

3.2.1.2 Sample of one of the Environmental Variables: Ecoregions

The exemplary variable, ecoregion, has the following categories:

1: Mediterranean Forests, Woodlands & Scrub 2: Tropical & Subtropical Dry Broadleaf Forests 3: Temperate Broadleaf & Mixed Forests 4: Tropical & Subtropical Coniferous Forests 5: Temperate Grasslands, Savannas & Shrublands 6: Mangroves 7: ( ) 8: Montane Grasslands & Shrublands 9: Tropical & Subtropical Grasslands, Savannas & Shrublands 10: Tropical & Subtropical Moist Broadleaf Forests 11: Flooded Grasslands & Savannas 12: Montane Grasslands & Shrublands 13: Magellanic subpolar forests 14: Rock and Ice 15: ( )

ecoreg = raster(r_files[3])
raster::plot(ecoreg, main="Ecoregions of Study Area")

raster::hist(ecoreg, main="Histogram of Ecoregions in Study Area", col="green")

3.3 Delimitation of the Study Area

Next, a vector-type spatial file corresponding to the countries that make up the study area is loaded.

area_estudio <- sf::st_read(dsn= paste0(dir,"/Datasets/samples/map.geojson"), layer="map", quiet=TRUE)

3.3.0.1 Map of Countries in the Study Area

The following map shows the countries that make up the study area.

map.area <- plot_geo(area_estudio, name=~pais, color = I("#89C5DA"))%>% hide_legend()%>%
  layout(title = "Countries of Study Area")

map.area

3.3.0.2 Mapping the Recorded Locations of Species in the Study Area

In advance, a selection is made from the database of the species with respect to the study area

coords <- data.frame(x=ndf$Longitud, y=ndf$Latitud)
prj <- CRS("+proj=longlat +datum=WGS84 +no_defs")
points <- SpatialPointsDataFrame(coords, ndf, proj4string = prj)
wildcats <- crop(points,ecoreg)
wildcats <- data.frame(wildcats)
wildcats <- wildcats[,1:6]

and then the observation points of the species to be studied are displayed on a dynamic map.

cf2 <- colorFactor(c("#ffa500","#13ED3F"), domain=wildcats$Nombre_Comun)

m2 <- leaflet(wildcats) %>% addTiles(group = "OSM (default)") %>%
  addProviderTiles(providers$Stamen.Toner, group = "Toner") %>%
  addProviderTiles(providers$Stamen.TonerLite, group = "Toner Lite") %>%
  setView(-63.54105072, -8.88123188, zoom = 2) %>% 
  addCircles(~Longitud, ~Latitud, popup=wildcats$Nombre_Comun, group = wildcats$Nombre_Comun ,weight = 3, radius=4, 
              color=cf2(wildcats$Nombre_Comun), stroke = TRUE, fillOpacity = 0.5)  %>%
  addLegend("bottomright", colors= c("#ffa500","#13ED3F"), labels=c("Jaguar","Puma"), title="Species") %>% 
  addLayersControl(
    baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
    overlayGroups = wildcats$Nombre_Comun,
    options = layersControlOptions(collapsed = TRUE))%>%
  suspendScroll()
m2

3.3.0.3 Database Visualization

First, an inclusion of new variables related to the issue of temporality is carried out

wildcats$Hora_Registro <- strftime(wildcats$Fecha_Registro, format="%H") #col7
wildcats$Fecha_Registro <- as.POSIXct(wildcats$Fecha_Registro, tz="", format="%Y-%m-%d") #col5
wildcats$Y.M.R <- strftime(wildcats$Fecha_Registro, format="%Y-%m") #col8
wildcats$M.R <- format(wildcats$Fecha_Registro, "%m") #col9
wildcats$Y.R <- format(wildcats$Fecha_Registro, "%Y") #col10

wildcats <- wildcats[, colnames(wildcats)[c(1:4,6,5,8,10,9,7)]] #Reorganizacion de las columnas

and then proceed to view the final database in a dynamic table.

datatable(wildcats, extensions = c('Buttons','FixedColumns', 'RowReorder'),
  filter = list(position = 'top', clear = FALSE),
  options = list(dom = 'Bfrtip',
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print'), dom = 't',
    scrollX = TRUE,
    fixedColumns = TRUE,
    rowReorder = TRUE, order = list(c(0 , 'asc'))))

4 Results

4.1 Spatio-temporal Statistics of the Studied Species

In this first section, analyzes are carried out on spatio-temporal statistics regarding the pumas and jaguars of a sector of the American continent.

4.1.1 Number of Species According to Location and Seasonality

The following graphs show the number of species by country

4.1.1.1 Species by Country and Month of Observation

In this first graph of this subsection, the amount per country is shown in relation to the month in which the observations occurred.

p_m <- plot_ly(data=wildcats, x= ~Pais, split= ~Nombre_Comun, color= I(cf2(wildcats$Nombre_Comun)), 
        frame=wildcats$M.R, type = "histogram", alpha = 0.6)%>%
        layout(barmode = "overlay", title = "Species by Country and Month of Observation")
p_m%>%animation_opts(frame= 2000, redraw = TRUE)%>%
        animation_slider(currentvalue = list(prefix = "Month ", font = list(color="red")))

4.1.1.2 Number of Jaguars per Country

In order to represent the number of species per country, a pre-selection of the data per species is carried out.

c.jaguars <- wildcats%>%
              dplyr::filter(Nombre_Comun == "Jaguar")%>%
              dplyr::group_by(Pais)%>%
              dplyr::summarise(Cant=n(), .groups = 'drop')
c.pumas <- wildcats%>%
              dplyr::filter(Nombre_Comun == "Puma")%>%
              dplyr::group_by(Pais)%>%
              dplyr::summarise(Cant=n(), .groups = 'drop')

Next, the jaguar information is graphed.

data(worldgeojson, package = "highcharter")

highchart() %>% 
  hc_title(text = "Number of Jaguars") %>% 
  hc_subtitle(text = "per Country") %>% 
  hc_add_series_map(map = worldgeojson, df = c.jaguars, name = "number of individuals",
                    value = "Cant", joinBy = c("name", "Pais"), allAreas=FALSE, animation=TRUE)%>% 
  hc_colorAxis(min= 1,
            type='logarithmic',
            stops = color_stops())%>% 
  hc_tooltip(useHTML = TRUE, headerFormat = "",
             pointFormat = "Country: {point.name}/ Number of Species: {point.Cant}")%>%
  hc_legend(layout= 'horizontal',
            verticalAlign= 'top')%>%
  hc_mapNavigation(enabled = TRUE) 
## Warning: `mutate_()` was deprecated in dplyr 0.7.0.
## Please use `mutate()` instead.
## See vignette('programming') for more help

4.1.1.3 Number of Pumas per Country

data(worldgeojson, package = "highcharter")

highchart() %>% 
  hc_title(text = "Number of Pumas") %>% 
  hc_subtitle(text = "per Country") %>% 
  hc_add_series_map(map = worldgeojson, df = c.pumas, name = "number of individuals",
                    value = "Cant", joinBy = c("name", "Pais"), allAreas= FALSE, animation=TRUE)%>% 
  hc_colorAxis(min= 1,
            type='logarithmic',
            stops = color_stops())%>% 
  hc_tooltip(useHTML = TRUE, headerFormat = "",
             pointFormat = "Country: {point.name}/ Number of Species: {point.Cant}")%>%
  hc_legend(layout= 'horizontal',
            verticalAlign= 'top')%>%
  hc_mapNavigation(enabled = TRUE) 

4.1.2 Number of Individuals Observed on a Timeline

In this subsection, quantities of the species observed are analyzed in relation to the temporality of the observations.

4.1.2.1 Species by Month and Year

As was done in a previous point, in order to represent the number of species according to the observation period, a preselection of the data is carried out by species.

c.jaguars <- wildcats%>%
              dplyr::filter(Nombre_Comun == "Jaguar")%>%
              dplyr::group_by(Fecha_Registro)%>%
              dplyr::summarise(Cant=n())
c.pumas <- wildcats%>%
              dplyr::filter(Nombre_Comun == "Puma")%>%
              dplyr::group_by(Fecha_Registro)%>%
              dplyr::summarise(Cant=n())

ct.jaguars <- ts(data=c.jaguars, start = c(2011,10), end = c(2019,12), frequency = 12)
ct.pumas <- ts(data=c.pumas, start = c(2011,10), end = c(2019,12), frequency = 12)

Next, a time series is plotted on a line of quantities of observed species.

highchart(type = "stock") %>% 
  hc_title(text = "Number of Species") %>% 
  hc_subtitle(text = "on the Observation Timeline") %>% 
  hc_add_series(ct.jaguars[,2], type = "line", pointInterval= 30*24*3600*1000, name="Number of Jaguars Observed to Date", color = "#ffa500")%>%
  hc_add_series(ct.pumas[,2], type = "line", pointInterval= 30*24*3600*1000, name="Number of Pumas Observed to Date", color = "#13ED3F")%>%
  hc_xAxis(title= list(text='Date'), type='datetime')%>%
  hc_yAxis(title= list(text='Number of Species Observed'), opposite = FALSE) %>% 
  hc_credits(enabled = TRUE, text = "By: Jose Francisco Nuñez O", position=list(align = "center"),
             href = "http://geoimaginarte.github.io/mysite") %>% 
  hc_legend(enabled=TRUE)

4.1.2.2 Decomposition and Behavior of Observations of Individuals on the Timeline

Likewise, a decomposition of the time series is carried out to observe the behavior of the time line according to the observed, the trend, the seasonal and the random.

decomp_jaguars <- decompose(ct.jaguars[,2])
decomp_pumas <- decompose(ct.pumas[,2])
plot(decomp_jaguars)
par(adj = 1)
title(sub = "Jaguars", cex.sub = 1.5, font.sub = 3, col.sub = "#ffa500")

plot(decomp_pumas, sub="Pumas")
par(adj = 1)
title(sub = "Pumas", cex.sub = 1.5, font.sub = 3, col.sub = "#13ED3F")

4.1.3 Species Distribution Pattern

In this section, the location of the observed species is analyzed in terms of distribution, density per occupied space (grouping), in addition to location in relation with the ecoregion.

4.1.3.1 Spatial Distribution of Species by Month

In these first maps the corresponding location of the individuals of the different species is visualized, as well as the location dynamics in relation to the month in which specie was observed.

#colour = cf2(Nombre_Comun)

e.m <- ggplot(wildcats, aes(x=Longitud, y=Latitud, frame=M.R, text = paste0("Country:", Pais))) +
  borders("world", xlim = c(-120, -30), ylim = c(-50, 25))+
  geom_point(data=filter(wildcats), shape = 21, colour= cf2(wildcats$Nombre_Comun) ,size = 1)+
  facet_grid(cols = vars(Nombre_Comun)) +
  labs(title = "Spatial Distribution of Species by Month")

ggplotly(e.m)%>% 
  animation_opts(
    2000, easing = "elastic", redraw = FALSE
  )%>%
  animation_slider(
    currentvalue = list(prefix = "Month ", font = list(color="red"))
  )

4.1.3.2 Geometric Distribution Pattern

Similar the previous maps of this Subpart, the location of the individuals of the different species is visualized, referring to the pattern of elliptic spatial distribution that such individuals follow as location dynamics according to the month in which specie was observed.

pd <- qplot(data = wildcats, x=Longitud, y=Latitud) +
  stat_ellipse(geom = "polygon", alpha = 1/2, aes(fill = Nombre_Comun, frame=M.R))+
  borders("world", xlim = c(-120, -30), ylim = c(-50, 25))+
  geom_point(data=filter(wildcats), aes(frame=M.R), shape = 21, fill = cf2(wildcats$Nombre_Comun), size = 2)+
  facet_grid(cols = vars(Nombre_Comun)) +
  labs(title = "Species Distribution Pattern")

ggplotly(pd)%>% 
  animation_opts(
    2000, easing = "elastic", redraw = FALSE
  )%>%
  animation_slider(
    currentvalue = list(prefix = "Month ", font = list(color="red"))
  )

4.1.3.3 Density of Each Species

In this pair of maps on each species studied, the level of density is displayed that shows the location of the individuals in the different months in which specie was observed.

dn <- ggplot(wildcats, aes(x=Longitud, y=Latitud)) +
  borders("world", regions=".", xlim = c(-120, -30), ylim = c(-50, 25))+
  stat_density2d(aes(fill = stat(level), frame=wildcats$M.R), geom="polygon", alpha=0.3)  +
  geom_point(data=filter(wildcats), aes(frame=M.R), shape = 21, fill = cf2(wildcats$Nombre_Comun), size = 2, alpha=0.3)+
  facet_grid(cols = vars(Nombre_Comun)) +
  scale_fill_viridis_c(option = "viridis") +
  theme(legend.position = "magma") +
  labs(title = "Density of Each Species")

ggplotly(dn)%>% 
  animation_opts(
    4000, easing = "elastic", redraw = FALSE
  )%>%
  animation_slider(
    currentvalue = list(prefix = "Month ", font = list(color="red"))
  )

4.1.3.4 Species According to Ecoregion

The categories of Ecoregions/Biomes:

1: Mediterranean Forests, Woodlands & Scrub 2: Tropical & Subtropical Dry Broadleaf Forests 3: Temperate Broadleaf & Mixed Forests 4: Tropical & Subtropical Coniferous Forests 5: Temperate Grasslands, Savannas & Shrublands 6: Mangroves 7: ( ) 8: Montane Grasslands & Shrublands 9: Tropical & Subtropical Grasslands, Savannas & Shrublands 10: Tropical & Subtropical Moist Broadleaf Forests 11: Flooded Grasslands & Savannas 12: Montane Grasslands & Shrublands 13: Magellanic subpolar forests 14: Rock and Ice 15: ( )

At this point, three different graphs are shown in which species area represented: 1) the spatial location of the individuals according to the ecoregion (in orange: jaguars, green: pumas), 2) the spatial location of the individuals by species differentiated by color according to ecoregion, and 3) the number of individuals per species according to ecoregion.

ecoreg_spec <- raster::extract(ecoreg, wildcats[,4:3]) %>% 
  cbind(wildcats, .) %>% 
  as.data.frame()

ecoreg_spec <- na.omit(ecoreg_spec)

ecoreg_spec$Ecoreg <- as.character(ecoreg_spec$.)

r_eco <- plot(ecoreg)+
  points(x=wildcats$Longitud, y=wildcats$Latitud, pch=20, col= cf2(wildcats$Nombre_Comun))

ggplot(ecoreg_spec, aes(x=Longitud, y=Latitud, color = ecoreg_spec$Ecoreg)) +
  borders("world", xlim = c(-120, -30), ylim = c(-50, 25))+
  geom_point(data=filter(ecoreg_spec), shape = 21, size = 1)+
  scale_colour_manual(values = lacroix_palette("PassionFruit", n=14))+
  facet_grid(cols = vars(Nombre_Comun)) +
  labs(title = "Species According to Ecoregion")

ggplot(ecoreg_spec, aes(y=Ecoreg, x=Nombre_Comun, colour = Ecoreg)) +
  geom_count(alpha=0.5) +
  scale_colour_manual(values = lacroix_palette("PassionFruit", n=14))+
  labs(title = "Number of Species According to Ecoregion",
       x = "Specie",
       y = "Ecoregions",
       size = "")

4.2 Correlation Analysis with Environmental Variables

This analysis is carried out in order to evaluate which are the most suitable variables for spatial modeling.

To achieve this analysis, it proceed to unite the observation points of the species together with the raster of the environmental variables. This process excludes ecoregion, which is a categorical variable. The wildcats database only works with a few variables that have to do with the species and its location. Once the result of the union is obtained, it proceed to eliminate the null values, which would be those pixels of the raster variables that do not achieve a point of spatial union with the individuals of the species.

#Relevant topics are extracted from environmental variables
select_r = dropLayer(st,3)
jp_swd <- raster::extract(select_r, wildcats[,4:3]) %>% 
  cbind(wildcats, .) %>% 
  as.data.frame()

#Rows with NULL values are deleted
jp_swd <- na.omit(jp_swd) 

4.2.1 Plotting the Correlation of Variables

With the previous result, it proceed to apply a multiple correlation analysis and similarly graph the result of said analysis. In the graphics we see a representation with circles that have different sizes and colors. The color scale is based on values that range from 1 to -1, where -1 is a multiproportional inverse correlation, that is, while one variable increases the other decreases, in turn, values close to 1 show a direct multiproportional correlation, that is, there is a strong correlation in which case while one variable increases the other variable also does, and for its part, values close to 0 have a rather weak correlation, that is, there is no association between the two variables. Moreover, with the size of the circle is possible to visualize the magnitude of the correlation, osea, between the larger the circle closest to -1 or 1.

mt <- cor(jp_swd[,11:23])
corrplot(mt, method = 'circle', type = 'upper')

4.2.2 Environmental Variables that are Least Interrelated

To perform this analysis, the linear regression VIF is used, which is the inflation factor of the variance. This quantifies the intensity of multicolinearity in a normal least squares regression analysis. It provides an index that measures the extent to which the variance (the square of the estimated standard deviation) of an estimated regression coefficient is increased because of collinearity. It is considering collinearity in the context of a statistical model is used to estimate the relationship between a response variable and a set of predictor variables (“independent” or “explanatory”), that is, describes the situation in which two or more predictor variables in a statistical model are linearly related. Given the case that VIF factors greater than 10 involve serious multicolinearity problems, the argument th, which refers to the correlation threshold, is set as 10, in this way it will be known which variables have correlation problems and these will be discarded for distribution modeling.

# Analisis VIF
vif.res <- vif(x = jp_swd[,11:ncol(jp_swd)])
vif.step <- vifstep(x = jp_swd[,11:ncol(jp_swd)], th = 10)
vrs <- vif.step@results$Variables %>% as.character()
vif.step
## 4 variables from the 13 input variables have collinearity problem: 
##  
## pre6190_ann tmp6190_ann tmn6190_ann tmx6190_ann 
## 
## After excluding the collinear variables, the linear correlation coefficients ranges between: 
## min correlation ( pre6190_l10 ~ pre6190_l1 ):  -0.009063609 
## max correlation ( vap6190_ann ~ frs6190_ann ):  -0.8508242 
## 
## ---------- VIFs of the remained variables -------- 
##     Variables      VIF
## 1 cld6190_ann 3.025670
## 2 dtr6190_ann 1.678499
## 3 frs6190_ann 4.102199
## 4       h_dem 2.447628
## 5  pre6190_l1 1.755003
## 6 pre6190_l10 5.388244
## 7  pre6190_l4 4.074227
## 8  pre6190_l7 2.896292
## 9 vap6190_ann 5.176151
vrs
## [1] "cld6190_ann" "dtr6190_ann" "frs6190_ann" "h_dem"       "pre6190_l1" 
## [6] "pre6190_l10" "pre6190_l4"  "pre6190_l7"  "vap6190_ann"

4.2.3 Database and Variables with Multicollinearity

From the previous step, it was possible to know which variables have collinearity problems while it was known which variables are better related to each other, which are printed in the following table together with the individuals of the species and their respective spatial locations.

jp_fnl <- jp_swd %>% 
  as_tibble %>%  
  dplyr::select(Nombre_Cientifico:Longitud, vrs)
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(vrs)` instead of `vrs` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
head(jp_fnl)

4.2.4 Correlation between Environmental Variables with Multicollinearity

Regarding the result of the correlation, the relationships between variables with multicollinearity are plotted:

*cld6190_ann : cloud cover, annual

*dtr6190_ann : daytime temperature range, annual

*frs6190_ann : frost frequency, annual

*h_dem : Digital Elevation Model

*pre6190_I1 : precipitation, january

*pre6190_I4 : precipitation, april

*pre6190_I7 : precipitation, july

*pre6190_I10 : precipitation, october

*vap6190_ann : vapor pressure, annual

r_corr= st[[c(1,2,4,5,7,8,9,10,14)]]
raster::pairs(r_corr)

4.3 Species Distribution Modeling

The first step in this modeling process is to extract the predictor values at the point locations. The predictors refer to the rasters that make up the environmental variables that will be used in the modeling, which are those that present multicollinearity.

In addition, 500 random points are created based on the predictors, which are used to evaluate the presence (1) and absence (0) of individuals of the studied species in relation to the environmental variables (predictors). At this point, tables are printed to view the presence and absence results, as well as a general summary of this analysis.

pvals <- raster::extract(r_corr, wildcats[,4:3]) %>% 
                          cbind(wildcats, .) %>% 
                          as.data.frame()
presvals <- pvals[,11:19]
predictors <- r_corr

backgr <- randomPoints(predictors, 500)
absvals <- raster::extract(predictors, backgr)
pb <- c(rep(1, nrow(presvals)), rep(0, nrow(absvals)))
sdmdata <- data.frame(cbind(pb, rbind(presvals, absvals)))
head(sdmdata)
tail(sdmdata)
summary(sdmdata)
##        pb          cld6190_ann     dtr6190_ann   frs6190_ann    
##  Min.   :0.0000   Min.   :32.00   Min.   : 54   Min.   :  0.00  
##  1st Qu.:0.0000   1st Qu.:58.00   1st Qu.: 99   1st Qu.:  0.00  
##  Median :1.0000   Median :64.00   Median :115   Median :  1.00  
##  Mean   :0.6736   Mean   :63.44   Mean   :112   Mean   : 12.93  
##  3rd Qu.:1.0000   3rd Qu.:68.00   3rd Qu.:123   3rd Qu.:  3.00  
##  Max.   :1.0000   Max.   :84.00   Max.   :171   Max.   :206.00  
##                   NA's   :9       NA's   :9     NA's   :9       
##      h_dem          pre6190_l1      pre6190_l10       pre6190_l4    
##  Min.   :   0.0   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.: 106.8   1st Qu.: 22.00   1st Qu.: 27.00   1st Qu.: 20.50  
##  Median : 195.0   Median : 51.00   Median : 41.00   Median : 34.00  
##  Mean   : 496.0   Mean   : 52.25   Mean   : 53.66   Mean   : 44.69  
##  3rd Qu.: 529.8   3rd Qu.: 75.00   3rd Qu.: 79.00   3rd Qu.: 62.00  
##  Max.   :4920.0   Max.   :173.00   Max.   :210.00   Max.   :189.00  
##  NA's   :52       NA's   :9        NA's   :9        NA's   :9       
##    pre6190_l7      vap6190_ann   
##  Min.   :  0.00   Min.   :  9.0  
##  1st Qu.:  6.00   1st Qu.:202.5  
##  Median : 28.00   Median :256.0  
##  Mean   : 42.95   Mean   :226.2  
##  3rd Qu.: 72.00   3rd Qu.:266.0  
##  Max.   :204.00   Max.   :307.0  
##  NA's   :9        NA's   :9

4.3.1 Fit Model

In this process, the methods take a formula that identifies the dependent and independent variables, accompanied by a data frame containing these variables.

For this, the function (glm) is used, which allows to fit generalized linear models, specified by a symbolic description of the linear predictor and a description of the error distribution.

In a first model, it is use the environmental variables with multicolinearity:

*cld6190_ann : cloud cover, annual

*dtr6190_ann : daytime temperature range, annual

*frs6190_ann : frost frequency, annual

*h_dem : Digital Elevation Model

*pre6190_I1 : precipitation, january

*pre6190_I4 : precipitation, april

*pre6190_I7 : precipitation, july

*pre6190_I10 : precipitation, october

*vap6190_ann : vapor pressure, annual

m1 <- glm ( pb ~ cld6190_ann +  dtr6190_ann +   frs6190_ann + h_dem + pre6190_l1 + pre6190_l10 + pre6190_l4 + pre6190_l7 + vap6190_ann, data = sdmdata )
class ( m1 )
## [1] "glm" "lm"
summary ( m1 )
## 
## Call:
## glm(formula = pb ~ cld6190_ann + dtr6190_ann + frs6190_ann + 
##     h_dem + pre6190_l1 + pre6190_l10 + pre6190_l4 + pre6190_l7 + 
##     vap6190_ann, data = sdmdata)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3055  -0.4267   0.1940   0.2883   1.0441  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.257e-01  1.595e-01  -5.176 2.58e-07 ***
## cld6190_ann  9.488e-03  1.864e-03   5.091 4.03e-07 ***
## dtr6190_ann  4.550e-03  7.590e-04   5.995 2.55e-09 ***
## frs6190_ann  2.016e-03  6.707e-04   3.006  0.00270 ** 
## h_dem       -6.598e-05  2.103e-05  -3.138  0.00174 ** 
## pre6190_l1   1.413e-03  4.492e-04   3.146  0.00169 ** 
## pre6190_l10  5.114e-03  5.104e-04  10.020  < 2e-16 ***
## pre6190_l4  -6.039e-03  5.832e-04 -10.354  < 2e-16 ***
## pre6190_l7   7.965e-04  4.239e-04   1.879  0.06046 .  
## vap6190_ann  1.257e-03  3.955e-04   3.177  0.00152 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.1811072)
## 
##     Null deviance: 322.70  on 1477  degrees of freedom
## Residual deviance: 265.87  on 1468  degrees of freedom
##   (54 observations deleted due to missingness)
## AIC: 1680.9
## 
## Number of Fisher Scoring iterations: 2
m1
## 
## Call:  glm(formula = pb ~ cld6190_ann + dtr6190_ann + frs6190_ann + 
##     h_dem + pre6190_l1 + pre6190_l10 + pre6190_l4 + pre6190_l7 + 
##     vap6190_ann, data = sdmdata)
## 
## Coefficients:
## (Intercept)  cld6190_ann  dtr6190_ann  frs6190_ann        h_dem   pre6190_l1  
##  -8.257e-01    9.488e-03    4.550e-03    2.016e-03   -6.598e-05    1.413e-03  
## pre6190_l10   pre6190_l4   pre6190_l7  vap6190_ann  
##   5.114e-03   -6.039e-03    7.965e-04    1.257e-03  
## 
## Degrees of Freedom: 1477 Total (i.e. Null);  1468 Residual
##   (54 observations deleted due to missingness)
## Null Deviance:       322.7 
## Residual Deviance: 265.9     AIC: 1681

4.3.2 Climate Envelope Modeling

The BIOCLIM algorithm calculates the similarity of a location by comparing the values of environmental variables at any location with a percentage distribution of the values at known locations of occurrence. The closer to the 50th percentile (the median), the more suitable the location. The tails of the distribution are not distinguished, that is, the 10th percentile is treated as equivalent to the 90th percentile.

For this case only presence data is used, then it is use ‘presvals’ instead of ‘sdmdata’.

bc <- bioclim ( presvals [, c ( 'cld6190_ann',  'dtr6190_ann',  'frs6190_ann', 'h_dem', 'pre6190_l1', 'pre6190_l10', 'pre6190_l4', 'pre6190_l7', 'vap6190_ann' )])
class ( bc )
## [1] "Bioclim"
## attr(,"package")
## [1] "dismo"
bc
## class    : Bioclim 
## 
## variables: cld6190_ann dtr6190_ann frs6190_ann h_dem pre6190_l1 pre6190_l10 pre6190_l4 pre6190_l7 vap6190_ann 
## 
## 
## presence points: 1002 
##    cld6190_ann dtr6190_ann frs6190_ann h_dem pre6190_l1 pre6190_l10 pre6190_l4
## 1           66         118           0   107         72          30         34
## 2           57          99           0    31        113         110         88
## 3           60         121           1   250         14          45         14
## 5           63         123          10   757         86          43         32
## 7           56          84           1    51         48          98         25
## 9           60         119           1   275         15          46         14
## 10          58          97           0     2        121         111         87
## 12          60          95           0     2         23          58         12
## 14          66         118           0   110         72          30         34
## 15          53         116           1   200         47         130         43
##    pre6190_l7 vap6190_ann
## 1           6         260
## 2         150         269
## 3          42         266
## 5           6         214
## 7         134         268
## 9          43         266
## 10        165         278
## 12         47         281
## 14          6         260
## 15        171         238
##   (... ...  ...)
pairs ( bc )

4.3.3 Model Prediction

All these “model” objects, regardless of their exact class, can be used with the “predict” function in order to make predictions for any combination of independent variable values.

Then, predictions are made with the glm model object ‘m1’ and for the bioclim model ‘bc’, for three records with values for the variables: cld6190_ann, dtr6190_ann, frs6190_ann, h_dem, pre6190_l1, pre6190_l10, pre6190_l4, pre6190np_l4, pre6190np0_l4 . The latter are shown in a first dataframe.

In the results of the prediction there are categories where 1 indicates less frequency to find species, 2 average frequency and 3 where it would be more frequent to find species.

cld6190_ann = c(40, 60, 80)
dtr6190_ann = c(40, 100, 140)
frs6190_ann = c(50, 125, 200)
h_dem = c(1000, 3000, 6000)
pre6190_l1 = c(50, 100, 150)
pre6190_l10 = c(50, 100, 150)
pre6190_l4 = c(50, 100, 150)
pre6190_l7 = c(50, 100, 150)
vap6190_ann = c(50, 100, 250)

pdt = data.frame ( cbind (cld6190_ann, dtr6190_ann, frs6190_ann,  h_dem, pre6190_l1, pre6190_l10, pre6190_l4, pre6190_l7, vap6190_ann))
pdt
predict (m1 , pdt) 
##          1          2          3 
## -0.1022662  0.5068132  1.0845869
predict (bc , pdt)

4.3.3.1 Response Graphs for Each Variable

Making such predictions for some environments can be very useful to explore and understand the predictions of the model, therefore at this point, the “response” function is used to create response graphs for each variable, with the other variables at their mean value, based on the Bioclim Model.

response (bc)

4.3.3.2 Suitability Score Map

Finally, the results of the prediction are represented on a map to be able to visualize the possible distribution of species based on environmental variables and the presence of observed individuals of the species studied.

In the color spectrum of the map, the highest values are observed, such as those where it is suitable to find these species, and on the contrary in the spectrum, where the presence of these is less suitable.

p <- predict (predictors,m1)
plot (p)