This document represents academic research whose main theme is an analysis of the location and frequency of observation, as well as a species distribution modeling (based on environmental variables) of two american wildcats species, Jaguar (Panthera onca) and Pumas (Felis concolor or Puma concolor). This study is used a database of observations recorded from the iNaturalist species information repository (https://www.inaturalist.org/) and geographical information about biodiversity informatics from the American Museum of Natural History website (https://biodiversityinformatics.amnh.org/).
library(dplyr)
library(tibble)
library(tidyr)
library(sp)
library(DT)
library(leaflet)
library(leaflet.extras)
library(raster)
library(rasterVis)
library(rgdal)
library(ggplot2)
library(plotly)
library(LaCroixColoR)
library(sf)
library(here)
library(proto)
library(highcharter)
library(rgeos)
library(velox)
library(usdm)
library(gtools)
library(corrplot)
library(Hmisc)
library(dismo)
Steps Related with Load Datasets.
df <- read.csv(paste0(dir, "/Datasets/samples/wildcats_df.csv"), stringsAsFactors=F, sep=",", na.strings=c("","NA"), encoding="UTF-8")
colnames(df)
## [1] "X.U.FEFF.common_name" "scientific_name" "latitude"
## [4] "longitude" "created_at" "updated_at"
## [7] "place_country_name"
Column Name Changes.
df <- df[,c(1:5,7)]
names(df)[1] <- "Nombre_Comun"
names(df)[2] <- "Nombre_Cientifico"
names(df)[3] <- "Latitud"
names(df)[4] <- "Longitud"
names(df)[5] <- "Fecha_Registro"
names(df)[6] <- "Pais"
colnames(df)
## [1] "Nombre_Comun" "Nombre_Cientifico" "Latitud"
## [4] "Longitud" "Fecha_Registro" "Pais"
Here, we show the data summary by species to know the quantity of these.
df%>%
dplyr::group_by(Nombre_Comun)%>%
dplyr::summarise(length(Nombre_Comun), .groups = 'drop')
Next, we procced to the correction of data in “Nombre_Comun” column in order to standardize the duplicate registers.
ndf <- df%>%
mutate(Nombre_Comun=ifelse(Nombre_Comun=="Puma de América del Norte", "Puma",Nombre_Comun))%>%
mutate(Nombre_Comun=ifelse(Nombre_Comun=="Puma de América del Sur", "Puma",Nombre_Comun))
and after that we created a bar chart with the data by species to know the result.
count_sp <- ndf%>%
group_by(Nombre_Comun)%>%
summarise(Count=length(Nombre_Comun), .groups = 'drop')
plot_ly(data=count_sp, x = ~Nombre_Comun, y = ~Count, type = 'bar', text = ~Count, textposition = 'auto',
marker = list(color = c('#ffa500','#13ED3F'),
line = list(color = c('rgba(204,204,204,1)', 'rgba(204,204,204,1)'),
width = 1.5))) %>%
layout(title = "Count of Register Species in the Inicial Dataframe",
xaxis = list(title = "Species"),
yaxis = list(title = "Count"))
cf <- colorFactor(c("#ffa500","#13ED3F"), domain=ndf$Nombre_Comun)
m <- leaflet(df) %>% addTiles(group = "OSM (default)") %>%
addProviderTiles(providers$Stamen.Toner, group = "Toner") %>%
addProviderTiles(providers$Stamen.TonerLite, group = "Toner Lite") %>%
setView(-63.54105072, -8.88123188, zoom = 2) %>%
addCircles(~Longitud, ~Latitud, popup=ndf$Nombre_Comun, group = ndf$Nombre_Comun ,weight = 3, radius=4,
color=cf(ndf$Nombre_Comun), stroke = TRUE, fillOpacity = 0.5) %>%
addLegend("bottomright", colors= c("#ffa500","#13ED3F"), labels=c("Jaguar","Puma"), title="Species") %>%
addLayersControl(
baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
overlayGroups = ndf$Nombre_Comun,
options = layersControlOptions(collapsed = TRUE))%>%
suspendScroll()
m
The study area corresponds to the territories of Central and South America as well as the Caribbean.
Some of the coverages used to know the habitat of the species are climatic variables that are derived from data provided by the Intergovernmental Panel on Climate Change and that were produced using interpolation based on readings taken from weather stations around the world from 1961 to 1990.
The coverages are:
*cld6190_ann : cloud cover, annual
*dtr6190_ann : daytime temperature range, annual
*ecoreg : ecoregions
*frs6190_ann : frequency of frosts, annual
*h_dem : digital elevation model
*pre6190_ann : precipitation, annual
*pre6190_I1 : precipitation, january
*pre6190_I4 : precipitation, april
*pre6190_I7 : precipitation, july
*pre6190_I10 : precipitation, october
*tmn6190_ann : mean temperature, annual
*tmp6190_ann : minimum temperature, annual
*tmx6190_ann : maximum temperature, annual
*vap6190_ann : vapor pressure, annual
r_files <- list.files(paste0(dir, "Datasets/coverages"),
full.names = T)
Next, the rasters that represent the environmental variables that will be used in the distribution modeling are graphed.
st <- raster::stack(r_files)
raster::plot(st)
The exemplary variable, ecoregion, has the following categories:
1: Mediterranean Forests, Woodlands & Scrub 2: Tropical & Subtropical Dry Broadleaf Forests 3: Temperate Broadleaf & Mixed Forests 4: Tropical & Subtropical Coniferous Forests 5: Temperate Grasslands, Savannas & Shrublands 6: Mangroves 7: ( ) 8: Montane Grasslands & Shrublands 9: Tropical & Subtropical Grasslands, Savannas & Shrublands 10: Tropical & Subtropical Moist Broadleaf Forests 11: Flooded Grasslands & Savannas 12: Montane Grasslands & Shrublands 13: Magellanic subpolar forests 14: Rock and Ice 15: ( )
ecoreg = raster(r_files[3])
raster::plot(ecoreg, main="Ecoregions of Study Area")
raster::hist(ecoreg, main="Histogram of Ecoregions in Study Area", col="green")
Next, a vector-type spatial file corresponding to the countries that make up the study area is loaded.
area_estudio <- sf::st_read(dsn= paste0(dir,"/Datasets/samples/map.geojson"), layer="map", quiet=TRUE)
The following map shows the countries that make up the study area.
map.area <- plot_geo(area_estudio, name=~pais, color = I("#89C5DA"))%>% hide_legend()%>%
layout(title = "Countries of Study Area")
map.area
In advance, a selection is made from the database of the species with respect to the study area
coords <- data.frame(x=ndf$Longitud, y=ndf$Latitud)
prj <- CRS("+proj=longlat +datum=WGS84 +no_defs")
points <- SpatialPointsDataFrame(coords, ndf, proj4string = prj)
wildcats <- crop(points,ecoreg)
wildcats <- data.frame(wildcats)
wildcats <- wildcats[,1:6]
and then the observation points of the species to be studied are displayed on a dynamic map.
cf2 <- colorFactor(c("#ffa500","#13ED3F"), domain=wildcats$Nombre_Comun)
m2 <- leaflet(wildcats) %>% addTiles(group = "OSM (default)") %>%
addProviderTiles(providers$Stamen.Toner, group = "Toner") %>%
addProviderTiles(providers$Stamen.TonerLite, group = "Toner Lite") %>%
setView(-63.54105072, -8.88123188, zoom = 2) %>%
addCircles(~Longitud, ~Latitud, popup=wildcats$Nombre_Comun, group = wildcats$Nombre_Comun ,weight = 3, radius=4,
color=cf2(wildcats$Nombre_Comun), stroke = TRUE, fillOpacity = 0.5) %>%
addLegend("bottomright", colors= c("#ffa500","#13ED3F"), labels=c("Jaguar","Puma"), title="Species") %>%
addLayersControl(
baseGroups = c("OSM (default)", "Toner", "Toner Lite"),
overlayGroups = wildcats$Nombre_Comun,
options = layersControlOptions(collapsed = TRUE))%>%
suspendScroll()
m2
First, an inclusion of new variables related to the issue of temporality is carried out
wildcats$Hora_Registro <- strftime(wildcats$Fecha_Registro, format="%H") #col7
wildcats$Fecha_Registro <- as.POSIXct(wildcats$Fecha_Registro, tz="", format="%Y-%m-%d") #col5
wildcats$Y.M.R <- strftime(wildcats$Fecha_Registro, format="%Y-%m") #col8
wildcats$M.R <- format(wildcats$Fecha_Registro, "%m") #col9
wildcats$Y.R <- format(wildcats$Fecha_Registro, "%Y") #col10
wildcats <- wildcats[, colnames(wildcats)[c(1:4,6,5,8,10,9,7)]] #Reorganizacion de las columnas
and then proceed to view the final database in a dynamic table.
datatable(wildcats, extensions = c('Buttons','FixedColumns', 'RowReorder'),
filter = list(position = 'top', clear = FALSE),
options = list(dom = 'Bfrtip',
buttons = c('copy', 'csv', 'excel', 'pdf', 'print'), dom = 't',
scrollX = TRUE,
fixedColumns = TRUE,
rowReorder = TRUE, order = list(c(0 , 'asc'))))
In this first section, analyzes are carried out on spatio-temporal statistics regarding the pumas and jaguars of a sector of the American continent.
The following graphs show the number of species by country
In this first graph of this subsection, the amount per country is shown in relation to the month in which the observations occurred.
p_m <- plot_ly(data=wildcats, x= ~Pais, split= ~Nombre_Comun, color= I(cf2(wildcats$Nombre_Comun)),
frame=wildcats$M.R, type = "histogram", alpha = 0.6)%>%
layout(barmode = "overlay", title = "Species by Country and Month of Observation")
p_m%>%animation_opts(frame= 2000, redraw = TRUE)%>%
animation_slider(currentvalue = list(prefix = "Month ", font = list(color="red")))
In order to represent the number of species per country, a pre-selection of the data per species is carried out.
c.jaguars <- wildcats%>%
dplyr::filter(Nombre_Comun == "Jaguar")%>%
dplyr::group_by(Pais)%>%
dplyr::summarise(Cant=n(), .groups = 'drop')
c.pumas <- wildcats%>%
dplyr::filter(Nombre_Comun == "Puma")%>%
dplyr::group_by(Pais)%>%
dplyr::summarise(Cant=n(), .groups = 'drop')
Next, the jaguar information is graphed.
data(worldgeojson, package = "highcharter")
highchart() %>%
hc_title(text = "Number of Jaguars") %>%
hc_subtitle(text = "per Country") %>%
hc_add_series_map(map = worldgeojson, df = c.jaguars, name = "number of individuals",
value = "Cant", joinBy = c("name", "Pais"), allAreas=FALSE, animation=TRUE)%>%
hc_colorAxis(min= 1,
type='logarithmic',
stops = color_stops())%>%
hc_tooltip(useHTML = TRUE, headerFormat = "",
pointFormat = "Country: {point.name}/ Number of Species: {point.Cant}")%>%
hc_legend(layout= 'horizontal',
verticalAlign= 'top')%>%
hc_mapNavigation(enabled = TRUE)
## Warning: `mutate_()` was deprecated in dplyr 0.7.0.
## Please use `mutate()` instead.
## See vignette('programming') for more help
data(worldgeojson, package = "highcharter")
highchart() %>%
hc_title(text = "Number of Pumas") %>%
hc_subtitle(text = "per Country") %>%
hc_add_series_map(map = worldgeojson, df = c.pumas, name = "number of individuals",
value = "Cant", joinBy = c("name", "Pais"), allAreas= FALSE, animation=TRUE)%>%
hc_colorAxis(min= 1,
type='logarithmic',
stops = color_stops())%>%
hc_tooltip(useHTML = TRUE, headerFormat = "",
pointFormat = "Country: {point.name}/ Number of Species: {point.Cant}")%>%
hc_legend(layout= 'horizontal',
verticalAlign= 'top')%>%
hc_mapNavigation(enabled = TRUE)
In this subsection, quantities of the species observed are analyzed in relation to the temporality of the observations.
As was done in a previous point, in order to represent the number of species according to the observation period, a preselection of the data is carried out by species.
c.jaguars <- wildcats%>%
dplyr::filter(Nombre_Comun == "Jaguar")%>%
dplyr::group_by(Fecha_Registro)%>%
dplyr::summarise(Cant=n())
c.pumas <- wildcats%>%
dplyr::filter(Nombre_Comun == "Puma")%>%
dplyr::group_by(Fecha_Registro)%>%
dplyr::summarise(Cant=n())
ct.jaguars <- ts(data=c.jaguars, start = c(2011,10), end = c(2019,12), frequency = 12)
ct.pumas <- ts(data=c.pumas, start = c(2011,10), end = c(2019,12), frequency = 12)
Next, a time series is plotted on a line of quantities of observed species.
highchart(type = "stock") %>%
hc_title(text = "Number of Species") %>%
hc_subtitle(text = "on the Observation Timeline") %>%
hc_add_series(ct.jaguars[,2], type = "line", pointInterval= 30*24*3600*1000, name="Number of Jaguars Observed to Date", color = "#ffa500")%>%
hc_add_series(ct.pumas[,2], type = "line", pointInterval= 30*24*3600*1000, name="Number of Pumas Observed to Date", color = "#13ED3F")%>%
hc_xAxis(title= list(text='Date'), type='datetime')%>%
hc_yAxis(title= list(text='Number of Species Observed'), opposite = FALSE) %>%
hc_credits(enabled = TRUE, text = "By: Jose Francisco Nuñez O", position=list(align = "center"),
href = "http://geoimaginarte.github.io/mysite") %>%
hc_legend(enabled=TRUE)
Likewise, a decomposition of the time series is carried out to observe the behavior of the time line according to the observed, the trend, the seasonal and the random.
decomp_jaguars <- decompose(ct.jaguars[,2])
decomp_pumas <- decompose(ct.pumas[,2])
plot(decomp_jaguars)
par(adj = 1)
title(sub = "Jaguars", cex.sub = 1.5, font.sub = 3, col.sub = "#ffa500")
plot(decomp_pumas, sub="Pumas")
par(adj = 1)
title(sub = "Pumas", cex.sub = 1.5, font.sub = 3, col.sub = "#13ED3F")
In this section, the location of the observed species is analyzed in terms of distribution, density per occupied space (grouping), in addition to location in relation with the ecoregion.
In these first maps the corresponding location of the individuals of the different species is visualized, as well as the location dynamics in relation to the month in which specie was observed.
#colour = cf2(Nombre_Comun)
e.m <- ggplot(wildcats, aes(x=Longitud, y=Latitud, frame=M.R, text = paste0("Country:", Pais))) +
borders("world", xlim = c(-120, -30), ylim = c(-50, 25))+
geom_point(data=filter(wildcats), shape = 21, colour= cf2(wildcats$Nombre_Comun) ,size = 1)+
facet_grid(cols = vars(Nombre_Comun)) +
labs(title = "Spatial Distribution of Species by Month")
ggplotly(e.m)%>%
animation_opts(
2000, easing = "elastic", redraw = FALSE
)%>%
animation_slider(
currentvalue = list(prefix = "Month ", font = list(color="red"))
)
Similar the previous maps of this Subpart, the location of the individuals of the different species is visualized, referring to the pattern of elliptic spatial distribution that such individuals follow as location dynamics according to the month in which specie was observed.
pd <- qplot(data = wildcats, x=Longitud, y=Latitud) +
stat_ellipse(geom = "polygon", alpha = 1/2, aes(fill = Nombre_Comun, frame=M.R))+
borders("world", xlim = c(-120, -30), ylim = c(-50, 25))+
geom_point(data=filter(wildcats), aes(frame=M.R), shape = 21, fill = cf2(wildcats$Nombre_Comun), size = 2)+
facet_grid(cols = vars(Nombre_Comun)) +
labs(title = "Species Distribution Pattern")
ggplotly(pd)%>%
animation_opts(
2000, easing = "elastic", redraw = FALSE
)%>%
animation_slider(
currentvalue = list(prefix = "Month ", font = list(color="red"))
)
In this pair of maps on each species studied, the level of density is displayed that shows the location of the individuals in the different months in which specie was observed.
dn <- ggplot(wildcats, aes(x=Longitud, y=Latitud)) +
borders("world", regions=".", xlim = c(-120, -30), ylim = c(-50, 25))+
stat_density2d(aes(fill = stat(level), frame=wildcats$M.R), geom="polygon", alpha=0.3) +
geom_point(data=filter(wildcats), aes(frame=M.R), shape = 21, fill = cf2(wildcats$Nombre_Comun), size = 2, alpha=0.3)+
facet_grid(cols = vars(Nombre_Comun)) +
scale_fill_viridis_c(option = "viridis") +
theme(legend.position = "magma") +
labs(title = "Density of Each Species")
ggplotly(dn)%>%
animation_opts(
4000, easing = "elastic", redraw = FALSE
)%>%
animation_slider(
currentvalue = list(prefix = "Month ", font = list(color="red"))
)
The categories of Ecoregions/Biomes:
1: Mediterranean Forests, Woodlands & Scrub 2: Tropical & Subtropical Dry Broadleaf Forests 3: Temperate Broadleaf & Mixed Forests 4: Tropical & Subtropical Coniferous Forests 5: Temperate Grasslands, Savannas & Shrublands 6: Mangroves 7: ( ) 8: Montane Grasslands & Shrublands 9: Tropical & Subtropical Grasslands, Savannas & Shrublands 10: Tropical & Subtropical Moist Broadleaf Forests 11: Flooded Grasslands & Savannas 12: Montane Grasslands & Shrublands 13: Magellanic subpolar forests 14: Rock and Ice 15: ( )
At this point, three different graphs are shown in which species area represented: 1) the spatial location of the individuals according to the ecoregion (in orange: jaguars, green: pumas), 2) the spatial location of the individuals by species differentiated by color according to ecoregion, and 3) the number of individuals per species according to ecoregion.
ecoreg_spec <- raster::extract(ecoreg, wildcats[,4:3]) %>%
cbind(wildcats, .) %>%
as.data.frame()
ecoreg_spec <- na.omit(ecoreg_spec)
ecoreg_spec$Ecoreg <- as.character(ecoreg_spec$.)
r_eco <- plot(ecoreg)+
points(x=wildcats$Longitud, y=wildcats$Latitud, pch=20, col= cf2(wildcats$Nombre_Comun))
ggplot(ecoreg_spec, aes(x=Longitud, y=Latitud, color = ecoreg_spec$Ecoreg)) +
borders("world", xlim = c(-120, -30), ylim = c(-50, 25))+
geom_point(data=filter(ecoreg_spec), shape = 21, size = 1)+
scale_colour_manual(values = lacroix_palette("PassionFruit", n=14))+
facet_grid(cols = vars(Nombre_Comun)) +
labs(title = "Species According to Ecoregion")
ggplot(ecoreg_spec, aes(y=Ecoreg, x=Nombre_Comun, colour = Ecoreg)) +
geom_count(alpha=0.5) +
scale_colour_manual(values = lacroix_palette("PassionFruit", n=14))+
labs(title = "Number of Species According to Ecoregion",
x = "Specie",
y = "Ecoregions",
size = "")
This analysis is carried out in order to evaluate which are the most suitable variables for spatial modeling.
To achieve this analysis, it proceed to unite the observation points of the species together with the raster of the environmental variables. This process excludes ecoregion, which is a categorical variable. The wildcats database only works with a few variables that have to do with the species and its location. Once the result of the union is obtained, it proceed to eliminate the null values, which would be those pixels of the raster variables that do not achieve a point of spatial union with the individuals of the species.
#Relevant topics are extracted from environmental variables
select_r = dropLayer(st,3)
jp_swd <- raster::extract(select_r, wildcats[,4:3]) %>%
cbind(wildcats, .) %>%
as.data.frame()
#Rows with NULL values are deleted
jp_swd <- na.omit(jp_swd)
With the previous result, it proceed to apply a multiple correlation analysis and similarly graph the result of said analysis. In the graphics we see a representation with circles that have different sizes and colors. The color scale is based on values that range from 1 to -1, where -1 is a multiproportional inverse correlation, that is, while one variable increases the other decreases, in turn, values close to 1 show a direct multiproportional correlation, that is, there is a strong correlation in which case while one variable increases the other variable also does, and for its part, values close to 0 have a rather weak correlation, that is, there is no association between the two variables. Moreover, with the size of the circle is possible to visualize the magnitude of the correlation, osea, between the larger the circle closest to -1 or 1.
mt <- cor(jp_swd[,11:23])
corrplot(mt, method = 'circle', type = 'upper')
From the previous step, it was possible to know which variables have collinearity problems while it was known which variables are better related to each other, which are printed in the following table together with the individuals of the species and their respective spatial locations.
jp_fnl <- jp_swd %>%
as_tibble %>%
dplyr::select(Nombre_Cientifico:Longitud, vrs)
## Note: Using an external vector in selections is ambiguous.
## i Use `all_of(vrs)` instead of `vrs` to silence this message.
## i See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
## This message is displayed once per session.
head(jp_fnl)
Regarding the result of the correlation, the relationships between variables with multicollinearity are plotted:
*cld6190_ann : cloud cover, annual
*dtr6190_ann : daytime temperature range, annual
*frs6190_ann : frost frequency, annual
*h_dem : Digital Elevation Model
*pre6190_I1 : precipitation, january
*pre6190_I4 : precipitation, april
*pre6190_I7 : precipitation, july
*pre6190_I10 : precipitation, october
*vap6190_ann : vapor pressure, annual
r_corr= st[[c(1,2,4,5,7,8,9,10,14)]]
raster::pairs(r_corr)
The first step in this modeling process is to extract the predictor values at the point locations. The predictors refer to the rasters that make up the environmental variables that will be used in the modeling, which are those that present multicollinearity.
In addition, 500 random points are created based on the predictors, which are used to evaluate the presence (1) and absence (0) of individuals of the studied species in relation to the environmental variables (predictors). At this point, tables are printed to view the presence and absence results, as well as a general summary of this analysis.
pvals <- raster::extract(r_corr, wildcats[,4:3]) %>%
cbind(wildcats, .) %>%
as.data.frame()
presvals <- pvals[,11:19]
predictors <- r_corr
backgr <- randomPoints(predictors, 500)
absvals <- raster::extract(predictors, backgr)
pb <- c(rep(1, nrow(presvals)), rep(0, nrow(absvals)))
sdmdata <- data.frame(cbind(pb, rbind(presvals, absvals)))
head(sdmdata)
tail(sdmdata)
summary(sdmdata)
## pb cld6190_ann dtr6190_ann frs6190_ann
## Min. :0.0000 Min. :32.00 Min. : 54 Min. : 0.00
## 1st Qu.:0.0000 1st Qu.:58.00 1st Qu.: 99 1st Qu.: 0.00
## Median :1.0000 Median :64.00 Median :115 Median : 1.00
## Mean :0.6736 Mean :63.44 Mean :112 Mean : 12.93
## 3rd Qu.:1.0000 3rd Qu.:68.00 3rd Qu.:123 3rd Qu.: 3.00
## Max. :1.0000 Max. :84.00 Max. :171 Max. :206.00
## NA's :9 NA's :9 NA's :9
## h_dem pre6190_l1 pre6190_l10 pre6190_l4
## Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. : 0.00
## 1st Qu.: 106.8 1st Qu.: 22.00 1st Qu.: 27.00 1st Qu.: 20.50
## Median : 195.0 Median : 51.00 Median : 41.00 Median : 34.00
## Mean : 496.0 Mean : 52.25 Mean : 53.66 Mean : 44.69
## 3rd Qu.: 529.8 3rd Qu.: 75.00 3rd Qu.: 79.00 3rd Qu.: 62.00
## Max. :4920.0 Max. :173.00 Max. :210.00 Max. :189.00
## NA's :52 NA's :9 NA's :9 NA's :9
## pre6190_l7 vap6190_ann
## Min. : 0.00 Min. : 9.0
## 1st Qu.: 6.00 1st Qu.:202.5
## Median : 28.00 Median :256.0
## Mean : 42.95 Mean :226.2
## 3rd Qu.: 72.00 3rd Qu.:266.0
## Max. :204.00 Max. :307.0
## NA's :9 NA's :9
In this process, the methods take a formula that identifies the dependent and independent variables, accompanied by a data frame containing these variables.
For this, the function (glm) is used, which allows to fit generalized linear models, specified by a symbolic description of the linear predictor and a description of the error distribution.
In a first model, it is use the environmental variables with multicolinearity:
*cld6190_ann : cloud cover, annual
*dtr6190_ann : daytime temperature range, annual
*frs6190_ann : frost frequency, annual
*h_dem : Digital Elevation Model
*pre6190_I1 : precipitation, january
*pre6190_I4 : precipitation, april
*pre6190_I7 : precipitation, july
*pre6190_I10 : precipitation, october
*vap6190_ann : vapor pressure, annual
m1 <- glm ( pb ~ cld6190_ann + dtr6190_ann + frs6190_ann + h_dem + pre6190_l1 + pre6190_l10 + pre6190_l4 + pre6190_l7 + vap6190_ann, data = sdmdata )
class ( m1 )
## [1] "glm" "lm"
summary ( m1 )
##
## Call:
## glm(formula = pb ~ cld6190_ann + dtr6190_ann + frs6190_ann +
## h_dem + pre6190_l1 + pre6190_l10 + pre6190_l4 + pre6190_l7 +
## vap6190_ann, data = sdmdata)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.3055 -0.4267 0.1940 0.2883 1.0441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.257e-01 1.595e-01 -5.176 2.58e-07 ***
## cld6190_ann 9.488e-03 1.864e-03 5.091 4.03e-07 ***
## dtr6190_ann 4.550e-03 7.590e-04 5.995 2.55e-09 ***
## frs6190_ann 2.016e-03 6.707e-04 3.006 0.00270 **
## h_dem -6.598e-05 2.103e-05 -3.138 0.00174 **
## pre6190_l1 1.413e-03 4.492e-04 3.146 0.00169 **
## pre6190_l10 5.114e-03 5.104e-04 10.020 < 2e-16 ***
## pre6190_l4 -6.039e-03 5.832e-04 -10.354 < 2e-16 ***
## pre6190_l7 7.965e-04 4.239e-04 1.879 0.06046 .
## vap6190_ann 1.257e-03 3.955e-04 3.177 0.00152 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1811072)
##
## Null deviance: 322.70 on 1477 degrees of freedom
## Residual deviance: 265.87 on 1468 degrees of freedom
## (54 observations deleted due to missingness)
## AIC: 1680.9
##
## Number of Fisher Scoring iterations: 2
m1
##
## Call: glm(formula = pb ~ cld6190_ann + dtr6190_ann + frs6190_ann +
## h_dem + pre6190_l1 + pre6190_l10 + pre6190_l4 + pre6190_l7 +
## vap6190_ann, data = sdmdata)
##
## Coefficients:
## (Intercept) cld6190_ann dtr6190_ann frs6190_ann h_dem pre6190_l1
## -8.257e-01 9.488e-03 4.550e-03 2.016e-03 -6.598e-05 1.413e-03
## pre6190_l10 pre6190_l4 pre6190_l7 vap6190_ann
## 5.114e-03 -6.039e-03 7.965e-04 1.257e-03
##
## Degrees of Freedom: 1477 Total (i.e. Null); 1468 Residual
## (54 observations deleted due to missingness)
## Null Deviance: 322.7
## Residual Deviance: 265.9 AIC: 1681
The BIOCLIM algorithm calculates the similarity of a location by comparing the values of environmental variables at any location with a percentage distribution of the values at known locations of occurrence. The closer to the 50th percentile (the median), the more suitable the location. The tails of the distribution are not distinguished, that is, the 10th percentile is treated as equivalent to the 90th percentile.
For this case only presence data is used, then it is use ‘presvals’ instead of ‘sdmdata’.
bc <- bioclim ( presvals [, c ( 'cld6190_ann', 'dtr6190_ann', 'frs6190_ann', 'h_dem', 'pre6190_l1', 'pre6190_l10', 'pre6190_l4', 'pre6190_l7', 'vap6190_ann' )])
class ( bc )
## [1] "Bioclim"
## attr(,"package")
## [1] "dismo"
bc
## class : Bioclim
##
## variables: cld6190_ann dtr6190_ann frs6190_ann h_dem pre6190_l1 pre6190_l10 pre6190_l4 pre6190_l7 vap6190_ann
##
##
## presence points: 1002
## cld6190_ann dtr6190_ann frs6190_ann h_dem pre6190_l1 pre6190_l10 pre6190_l4
## 1 66 118 0 107 72 30 34
## 2 57 99 0 31 113 110 88
## 3 60 121 1 250 14 45 14
## 5 63 123 10 757 86 43 32
## 7 56 84 1 51 48 98 25
## 9 60 119 1 275 15 46 14
## 10 58 97 0 2 121 111 87
## 12 60 95 0 2 23 58 12
## 14 66 118 0 110 72 30 34
## 15 53 116 1 200 47 130 43
## pre6190_l7 vap6190_ann
## 1 6 260
## 2 150 269
## 3 42 266
## 5 6 214
## 7 134 268
## 9 43 266
## 10 165 278
## 12 47 281
## 14 6 260
## 15 171 238
## (... ... ...)
pairs ( bc )
All these “model” objects, regardless of their exact class, can be used with the “predict” function in order to make predictions for any combination of independent variable values.
Then, predictions are made with the glm model object ‘m1’ and for the bioclim model ‘bc’, for three records with values for the variables: cld6190_ann, dtr6190_ann, frs6190_ann, h_dem, pre6190_l1, pre6190_l10, pre6190_l4, pre6190np_l4, pre6190np0_l4 . The latter are shown in a first dataframe.
In the results of the prediction there are categories where 1 indicates less frequency to find species, 2 average frequency and 3 where it would be more frequent to find species.
cld6190_ann = c(40, 60, 80)
dtr6190_ann = c(40, 100, 140)
frs6190_ann = c(50, 125, 200)
h_dem = c(1000, 3000, 6000)
pre6190_l1 = c(50, 100, 150)
pre6190_l10 = c(50, 100, 150)
pre6190_l4 = c(50, 100, 150)
pre6190_l7 = c(50, 100, 150)
vap6190_ann = c(50, 100, 250)
pdt = data.frame ( cbind (cld6190_ann, dtr6190_ann, frs6190_ann, h_dem, pre6190_l1, pre6190_l10, pre6190_l4, pre6190_l7, vap6190_ann))
pdt
predict (m1 , pdt)
## 1 2 3
## -0.1022662 0.5068132 1.0845869
predict (bc , pdt)
Making such predictions for some environments can be very useful to explore and understand the predictions of the model, therefore at this point, the “response” function is used to create response graphs for each variable, with the other variables at their mean value, based on the Bioclim Model.
response (bc)
Finally, the results of the prediction are represented on a map to be able to visualize the possible distribution of species based on environmental variables and the presence of observed individuals of the species studied.
In the color spectrum of the map, the highest values are observed, such as those where it is suitable to find these species, and on the contrary in the spectrum, where the presence of these is less suitable.
p <- predict (predictors,m1)
plot (p)