Species distribution are largely available in online databases, such as the distributions ranges in IUCN, or the occurrence records in GBIF. However, to analyze these kind of data most of the time it is necessary to transform the spatial distribution of species into a presence absence matrix or into a grid format. In this tutorial I will show how to easily make this transformation using the R package
letsR, written by myself and Fabricio Villalobos.
First you will have to download the species distribution shapefiles from the IUCN website. This data can be loaded using the functions
raster::shapefile. Here I will use the data for the frogs of the family Phyllomedusa that is already loaded within the letsR package.
We can plot the data to see how it looks like.
# Plot ## Color settings and assignment colors <- rainbow(length(unique(Phyllomedusa@data$binomial)), alpha = 0.5) position <- match(Phyllomedusa@data$binomial, unique(Phyllomedusa@data$binomial)) colors <- colors[position] ## Plot call plot(Phyllomedusa, col = colors, lty = 0, main = "Spatial polygons of Phyllomedusa") map(add = TRUE)
Next step we can use the function
lets.presab to convert species’ ranges (in shapefile format) into a presence-absence matrix based on a user-defined grid system. A simple way to do this is to define the extent and resolution of the grid.
PAM <- lets.presab(Phyllomedusa, xmn = -93, xmx = -29, ymn = -57, ymx = 15, res = 1)
Note that if you have shapefiles with more species, or if you decide for a high resolution grid, the function may run very slowly. In this case, you may want to keep track of the analysis relative running time by setting the argument
count = TRUE.
lets.presab returns a
PresenceAbsence object (unless
show.matrix = TRUE, which in this case only a presence absence matrix is returned). The
PresenceAbsence object is basically a list containing a presence absence matrix, a raster with the geographical information, and the species names (for more information
?PresenceAbsence). We can use the function
summary to generate summary data about the PAM we just created.
## ## Class: PresenceAbsence ## _ _ ## Number of species: 32 ## Number of cells: 1168 ## Cells with presence: 1168 ## Cells without presence: 0 ## Species without presence: 0 ## Species with the largest range: Phyllomedusa hypochondrialis ## _ _ ## Grid parameters ## Resolution: 1, 1 (x, y) ## Extention: -93, -29, -57, 15 (xmin, xmax, ymin, ymax) ## Coord. Ref.: +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
You can also use the
plot function directly to the PAM object.
plot function also allow users to plot specific species distributions. For example, we can plot the map of Phyllomedusa hypochondrialis:
plot(PAM, name = "Phyllomedusa hypochondrialis")
As I said before, the PAM object contains the actual presence absence matrix, to access it we can use the following code:
presab <- PAM$P
The first two columns of the matrix contain the longitude (x) and latitude (y) of the cells’ centroid, the following columns include the species’ presence(1) and absence(0) information.
# Print only the first 5 rows and 3 columns presab[1:5, 1:3]
## Longitude(x) Latitude(y) Phyllomedusa araguari ## [1,] -74.5 11.5 0 ## [2,] -69.5 11.5 0 ## [3,] -68.5 11.5 0 ## [4,] -75.5 10.5 0 ## [5,] -74.5 10.5 0
Using different projections
Some users may want to use different projections to generate the presence absence matrix. The
lets.presab function allow users to do it by changing the
crs.grid argument. Check the example using the South America Equidistant Conic projection.
pro <- paste("+proj=eqdc +lat_0=-32 +lon_0=-60 +lat_1=-5", "+lat_2=-42 +x_0=0 +y_0=0 +ellps=aust_SA", "+units=m +no_defs") SA_EC <- CRS(pro) PAM_proj <- lets.presab(Phyllomedusa, xmn = -4135157, xmx = 4707602, ymn = -450000, ymx = 5774733, res = 100000, crs.grid = SA_EC)
## ## Class: PresenceAbsence ## _ _ ## Number of species: 32 ## Number of cells: 1376 ## Cells with presence: 1376 ## Cells without presence: 0 ## Species without presence: 0 ## Species with the largest range: Phyllomedusa hypochondrialis ## _ _ ## Grid parameters ## Resolution: 1e+05, 1e+05 (x, y) ## Extention: -4135157, 4664843, -425267, 5774733 (xmin, xmax, ymin, ymax) ## Coord. Ref.: +proj=eqdc +lat_0=-32 +lon_0=-60 +lat_1=-5 +lat_2=-42 +x_0=0 +y_0=0 +ellps=aust_SA +units=m +no_defs
plot(PAM_proj) # Add projected country boundaries library(maptools) data("wrld_simpl") world <- spTransform(wrld_simpl, SA_EC) plot(world, add = TRUE)
Note that I changed the extent and resolution parameters to match the new projection. A good way to determine both the extent and the resolution is to first transform the projection of a raster from a non-projected PresenceAbsence object, and see how the parameters changed. For example:
pr2 <- projectRaster(PAM$Richness_Raster, crs = SA_EC) mean(res(pr2)) # Resolution value
##  104900
extent(pr2) # Extent values (i.e. xmn, xmx, ymn, ymx)
## class : Extent ## xmin : -4716700 ## xmax : 4471700 ## ymin : -3568248 ## ymax : 5755752
Also note that the function assumes that the shapefiles are in WGS84 format, if this is not the case for your data you should change the
lets.presab has some other useful arguments. For example, some users may want to exclude parts of the range where the species are extinct, or only keep the breeding ranges. The arguments
seasonal allow users to filter the species distribution according to the IUCN classification of the different parts of a species range distribution. The specific values to use in these arguments may be obtained from the IUCN metadata files.
In some situations it is useful to only consider a species present in a cell if it covers more than a certain percentage value. Users can change this value using the argument
cover. Note that this option is only available when the coordinates are in degrees (longitude/latitude). UPDATE: the version on github now allow users to use the argument cover with other projections.
# 90% cover PAM_90 <- lets.presab(Phyllomedusa, xmn = -93, xmx = -29, ymn = -57, ymx = 15, res = 1, cover = 0.9) plot(PAM_90)
You can see now in the plot above that the cells close to the border of the continent do not indicate the presence of the species anymore.
When creating multiple
PresenceAbsence objects for different groups, users may want to keep the same grid. In this case it is important to keep the argument
remove.cells = FALSE, to avoid altering the grid. When
remove.cells = TRUE the final matrix will not contain cells in the grid with a value of zero (i.e. sites with no species present).
PAM_keep_cells <- lets.presab(Phyllomedusa, xmn = -93, xmx = -29, ymn = -57, ymx = 15, res = 1, remove.cells = FALSE)
Now you can use the summary function to verify if the empty cells were kept.
## ## Class: PresenceAbsence ## _ _ ## Number of species: 32 ## Number of cells: 4608 ## Cells with presence: 1168 ## Cells without presence: 3440 ## Species without presence: 0 ## Species with the largest range: Phyllomedusa hypochondrialis ## _ _ ## Grid parameters ## Resolution: 1, 1 (x, y) ## Extention: -93, -29, -57, 15 (xmin, xmax, ymin, ymax) ## Coord. Ref.: +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0
Also, if users want to keep the species that do not occur in any cell of the grid it is necessary to set
remove.sp = FALSE.
BirdLife species distribution data can be slightly different from the ones provided by IUCN. The main difference is that they are normally provided in separated shapefiles, rather than in one unique shapefile containing all the species.
letsR contains a specific function to deal with this kind of data. The function
lets.presab.birds is an analogous function to
lets.presab. The difference is that instead of a shapefile, users have to provide the path pointing to the location of all birds shapefiles. In the example below we will build a
PresenceAbsence object to Ramphastos birds.
# Attention: For your own files change the path. path_Ramphastos <- system.file("extdata", package = "letsR") PAM_birds <- lets.presab.birds(path_Ramphastos, xmn = -93, xmx = -29, ymn = -57, ymx = 25)
We can also use the functions
plot to the result of
## ## Class: PresenceAbsence ## _ _ ## Number of species: 11 ## Number of cells: 1173 ## Cells with presence: 1173 ## Cells without presence: 0 ## Species without presence: 0 ## Species with the largest range: Ramphastos culminatus ## _ _ ## Grid parameters ## Resolution: 1, 1 (x, y) ## Extention: -93, -29, -57, 25 (xmin, xmax, ymin, ymax) ## Coord. Ref.: NA
All the options in
lets.presab are also available in
Occurrence data (e.g. GBIF)
Another common source of spatial data are occurrence records. The function
lets.presab.points allows users to input occurrence records to generate a
PresenceAbsence object. To use this function you will need a two column matrix with longitude and latitude, and a vector indicating the species name of each occurrence record. The example below uses occurrence data from GBIF for Phyllomedusa, obtained using the R package
# Get occurrences for Phyllomedusa occurrrences <- occ(query = 'Phyllomedusa', from = 'gbif', limit = 5000) data <- occurrrences$gbif$data$Phyllomedusa # Remove NAs remove_na <- is.na(data$longitude) | is.na(data$latitude) data <- data[!remove_na, ] xy <- data[, c("longitude", "latitude")] # coordinates sp_name <- data$name
Now that we have the coordinates and species name, we can use the
PAM_points <- lets.presab.points(xy, sp_name, xmn = -93, xmx = -29, ymn = -57, ymx = 15, res = 1)
Using your own grid
For different reasons some users may want to create a presence absence matrix based on their own grid in shapefile format. The function
lets.presab.grid allow users to convert species’ ranges into a presence-absence matrix based on a grid in shapefile format. However, this function only returns the actual matrix of presence absence and the grid, not an
PresenceAbsence object. In some situations it is possible to convert this result to a
PresenceAbsence object, I will cover this in a new post soon. Let’s first create our grid:
# Grid sp.r <- rasterToPolygons(raster(xmn = -93, xmx = -29, ymn = -57, ymx = 15, resolution = 5)) # Give an ID to the cell slot(sp.r, "data") <- cbind("ID" = 1:length(sp.r), slot(sp.r, "data")) plot(sp.r, border = rgb(.5, .5, .5)) map(add = T)
The grid and the species ranges should be in the same projection.
data(Phyllomedusa) projection(Phyllomedusa) <- projection(sp.r)
Now we can build our presence absence matrix from the grid.
resu <- lets.presab.grid(Phyllomedusa, sp.r, "ID")
The result is a list with the presence absence matrix and the grid. To plot the richness map we can use the following code:
rich_plus1 <- rowSums(resu$PAM) + 1 colfunc <- colorRampPalette(c("#fff5f0", "#fb6a4a", "#67000d")) colors <- c("white", colfunc(max(rich_plus1))) plot(resu$grid, border = "gray40", col = colors[rich_plus1]) map(add = TRUE)
This covers all the functions to convert species distribution into presence absence matrix using the
letsR package. Let me know if you have any suggestion or comments, and please share if you like it.
To cite letsR in publications use: Bruno Vilela and Fabricio Villalobos (2015). letsR: a new R package for data handling and analysis in macroecology. Methods in Ecology and Evolution. DOI: 10.1111/2041-210X.12401