8 min read

Adding variables to a PresenceAbsence object

Once you have transformed species distribution data into a presence absence matrix (PAM) in PresenceAbsence format (see the first and second posts on how to do it), you may want to add variables to it. These variables are normally in raster format, for example the WorldClim bioclimatic data, or in shapefile format, for example ecorregions of the world.

Adding variables in raster format

To add variables in raster format to a PresenceAbsence object we can use the function lets.addvar from the letsR package. This function takes a raster object with any resolution and extent, and transform it to match the information in your PresenceAbsence object. Once this is done, the variables are included as additional columns containing the aggregate/summarize value of the variable(s) in each cell of the presence-absence matrix. Let’s see an example using the bioclimatic data from WorldClim that can be downloaded with the package raster.

library(letsR)

First we get the data.

r <- getData("worldclim", var = "bio", 
             res = 10)
plot(r)

Here I will use the PresenceAbsence object for Phyllomedusa species generated previously.

data(PAM)
plot(PAM, main = "Phyllomedusa\nRichness")

Let’s check the projection differences.

projection(r)
## [1] "+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"
projection(PAM$Richness_Raster)
## [1] "+proj=longlat +datum=WGS84"

Since the CRS descriptions differ, we have to fix them first. In this case the projections are actually the same, but the character describing them differ, so we can just change the name directly. But if the projections are actually different, you may want to use the function raster::projectRaster.

projection(PAM$Richness_Raster) <- projection(r)

We can now run the lets.addvar function.

PAM_env <- lets.addvar(PAM, r, fun = mean)

The climatic data have a higher resolution than our PAM. In this case, we could choose a function to aggregate the values with the argument fun. In most of the situations, people will be interested in averaging values to aggregate multiple cells, but in some specific cases you may want to sum them, or get the standard deviation, or use any another function.

The result is a presence absence matrix, where the last columns now include the raster values. Check the table:

head(PAM_env)
Longitude(x) Latitude(y) Phyllomedusa araguari Phyllomedusa atelopoides Phyllomedusa ayeaye Phyllomedusa azurea Phyllomedusa bahiana Phyllomedusa baltea Phyllomedusa bicolor Phyllomedusa boliviana Phyllomedusa burmeisteri Phyllomedusa camba Phyllomedusa centralis Phyllomedusa coelestis Phyllomedusa distincta Phyllomedusa duellmani Phyllomedusa ecuatoriana Phyllomedusa hypochondrialis Phyllomedusa iheringii Phyllomedusa itacolomi Phyllomedusa megacephala Phyllomedusa neildi Phyllomedusa nordestina Phyllomedusa oreades Phyllomedusa palliata Phyllomedusa perinesos Phyllomedusa rohdei Phyllomedusa sauvagii Phyllomedusa tarsius Phyllomedusa tetraploidea Phyllomedusa tomopterna Phyllomedusa trinitatis Phyllomedusa vaillantii Phyllomedusa venusta bio1_mean bio2_mean bio3_mean bio4_mean bio5_mean bio6_mean bio7_mean bio8_mean bio9_mean bio10_mean bio11_mean bio12_mean bio13_mean bio14_mean bio15_mean bio16_mean bio17_mean bio18_mean bio19_mean
-74.5 11.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 258.4000 91.65000 82.20000 526.7500 311.0500 200.1500 110.9000 258.7000 252.0000 263.5500 250.6000 1198.0500 250.7000 7.60000 80.50000 591.4000 34.15000 341.5500 108.0000
-69.5 11.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 262.8497 84.22171 71.63269 892.6027 322.6034 206.4553 116.1482 262.5537 254.9960 272.4501 250.0026 731.0237 122.2032 17.98552 56.41115 309.3956 75.65198 198.3499 145.9311
-68.5 11.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 266.2190 76.64375 73.65425 675.3595 319.1875 216.5294 102.6581 264.4078 261.5261 272.8083 256.0042 981.2149 166.8388 19.68305 57.18767 431.8224 98.00707 238.0230 229.3728
-75.5 10.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 277.1437 90.83067 80.08133 426.1697 330.8697 217.9157 112.9540 276.3097 273.5883 281.3817 270.9663 1055.2210 205.3570 4.30900 73.66500 486.6277 19.99833 258.0923 109.8290
-74.5 10.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 268.4548 108.15780 82.40520 445.8382 332.6014 202.0994 130.5020 266.5624 265.9968 273.0954 261.9868 1337.1702 237.9668 11.50960 66.55720 586.0000 52.64860 315.4482 273.1824
-69.5 10.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 238.6514 100.90767 79.92085 588.9459 299.3219 173.6980 125.6239 239.7355 231.9007 244.1322 229.8309 808.9125 127.8143 12.74922 55.21132 338.7148 52.07358 233.9582 106.5961

If you do not want the coordinates and species included you can set the argument onlyvar = TRUE.

climate <- lets.addvar(PAM, r, fun = mean, onlyvar = TRUE)
head(climate)
bio1_mean bio2_mean bio3_mean bio4_mean bio5_mean bio6_mean bio7_mean bio8_mean bio9_mean bio10_mean bio11_mean bio12_mean bio13_mean bio14_mean bio15_mean bio16_mean bio17_mean bio18_mean bio19_mean
258.4000 91.65000 82.20000 526.7500 311.0500 200.1500 110.9000 258.7000 252.0000 263.5500 250.6000 1198.0500 250.7000 7.60000 80.50000 591.4000 34.15000 341.5500 108.0000
262.8497 84.22171 71.63269 892.6027 322.6034 206.4553 116.1482 262.5537 254.9960 272.4501 250.0026 731.0237 122.2032 17.98552 56.41115 309.3956 75.65198 198.3499 145.9311
266.2190 76.64375 73.65425 675.3595 319.1875 216.5294 102.6581 264.4078 261.5261 272.8083 256.0042 981.2149 166.8388 19.68305 57.18767 431.8224 98.00707 238.0230 229.3728
277.1437 90.83067 80.08133 426.1697 330.8697 217.9157 112.9540 276.3097 273.5883 281.3817 270.9663 1055.2210 205.3570 4.30900 73.66500 486.6277 19.99833 258.0923 109.8290
268.4548 108.15780 82.40520 445.8382 332.6014 202.0994 130.5020 266.5624 265.9968 273.0954 261.9868 1337.1702 237.9668 11.50960 66.55720 586.0000 52.64860 315.4482 273.1824
238.6514 100.90767 79.92085 588.9459 299.3219 173.6980 125.6239 239.7355 231.9007 244.1322 229.8309 808.9125 127.8143 12.74922 55.21132 338.7148 52.07358 233.9582 106.5961

Now that we have the variables, we can use it to relate to our species data in many ways. For example, you could graph the relationship between precipitation/temperature and species richness.

library(ggplot2)
rich <- rowSums(PAM$P[, -(1:2)])

mpg1 <- data.frame("Temperature" = climate[, 1]/10,
                   "Precipitation" = climate[, 12],
                   "Richness" = rich)
f1 <- ggplot(mpg1, aes(Temperature, Richness))
f1 <- f1 + geom_smooth(model = lm) + 
  geom_point(col = rgb(0, 0, 0, .6)) + 
  theme_bw()

f2 <- ggplot(mpg1, aes(Precipitation, Richness))
f2 <- f2 + geom_smooth(model = lm) + 
  geom_point(col = rgb(0, 0, 0, .6)) + 
  theme_bw()

f1

f2

Adding variables in polygon format

Data in shapefile format like ecorregions, conservation units or countries, can be added to a PAM using the function lets.addpoly. This function adds polygons’ attributes as columns at the right-end of the matrix. The values represent the percentage of the cell covered by the polygon attribute used. As an example, we can use the South America countries map available in the package maptools.

library(maptools)
data("wrld_simpl")
SA <- c("Brazil", "Colombia",  "Argentina",
        "Peru", "Venezuela", "Chile",
        "Ecuador", "Bolivia", "Paraguay",
        "Uruguay", "Guyana", "Suriname",
        "French Guiana")
south_ame <- wrld_simpl[wrld_simpl@data$NAME %in% SA, ]
plot(south_ame)

Now we can add this variables to our PAM.

PAM_pol <- lets.addpoly(PAM, south_ame, "NAME")
head(PAM_pol)
Longitude(x) Latitude(y) Phyllomedusa araguari Phyllomedusa atelopoides Phyllomedusa ayeaye Phyllomedusa azurea Phyllomedusa bahiana Phyllomedusa baltea Phyllomedusa bicolor Phyllomedusa boliviana Phyllomedusa burmeisteri Phyllomedusa camba Phyllomedusa centralis Phyllomedusa coelestis Phyllomedusa distincta Phyllomedusa duellmani Phyllomedusa ecuatoriana Phyllomedusa hypochondrialis Phyllomedusa iheringii Phyllomedusa itacolomi Phyllomedusa megacephala Phyllomedusa neildi Phyllomedusa nordestina Phyllomedusa oreades Phyllomedusa palliata Phyllomedusa perinesos Phyllomedusa rohdei Phyllomedusa sauvagii Phyllomedusa tarsius Phyllomedusa tetraploidea Phyllomedusa tomopterna Phyllomedusa trinitatis Phyllomedusa vaillantii Phyllomedusa venusta Argentina Bolivia Brazil Chile Colombia Ecuador French Guiana Guyana Suriname Paraguay Peru Uruguay Venezuela
-74.5 11.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0.12 0 0 0 0 0 0 0 0.00
-69.5 11.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0.00 0 0 0 0 0 0 0 0.58
-68.5 11.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0.00 0 0 0 0 0 0 0 0.17
-75.5 10.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0.33 0 0 0 0 0 0 0 0.00
-74.5 10.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0.97 0 0 0 0 0 0 0 0.00
-69.5 10.5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0.00 0 0 0 0 0 0 0 1.00

This information can be used to calculate the number of species per country for example.

vars_col <- (ncol(PAM$P) + 1):ncol(PAM_pol)
n <- length(vars_col)
rich_count <- numeric(n)
for (i in 1:n) {
  rich_count[i] <- sum(colSums(PAM$P[PAM_pol[, vars_col[i]] > 0,
                                     -(1:2)]) > 0)
}
labs <- as.factor(colnames(PAM_pol)[vars_col])
names(rich_count) <- labs
mpg <- data.frame("Richness" = rich_count, "Country" = as.factor(labs))
g <- ggplot(mpg, aes(labs, Richness))
g + geom_bar(stat = "identity") + labs(x = "") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

To cite letsR in publications use: Bruno Vilela and Fabricio Villalobos (2015). letsR: a new R package for data handling and analysis in macroecology. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12401