So you’ve read several posts about the Portal site and have even gone to the official GitHub repo for the data, but it still seems pretty intimidating to handle and do analyses on…
Never fear! The Weecology lab hears your concerns and we are actively working on a software package to smooth out the process. You can check out the project on GitHub.
FAQ
Q: What is it exactly?
A: The `portalr` project is a software package for the R programming language (http://cran.r-project.org). R is one of the most popular languages for ecology, statistics, and data science; it also has a large open-source community that creates free add-on packages to extend the base functionality.
Q: How do I get the `portalr` package?
A: There are some basic instructions on the repo page, but in short, the package is still in development and therefore not yet uploaded to CRAN (the Comprehensive R Archive Network). If you are unfamiliar with installing a package from GitHub, the easiest approach is to first install the `devtools` package, and then use one of its functions to install `portalr` from GitHub:

    install.packages("devtools")
    devtools::install_github("weecology/portalr")
Q: What can I do with the package?
A: Several different things! Mainly, it is designed to be a general-purpose interface to the Portal data for R users. It allows you to download the latest iteration of data from the data repo, summarize the data in different ways (e.g. by time, by space, by treatment), and integrate different data sources (e.g. rodents, plants, ants, weather).
Q: That sounds great! How do I get started with that?
A: Well, the package is still under development, but check out the demo below, and feel free to send us comments and suggestions (preferably as an issue on the GitHub repo). 👇
Demo
Initial setup
Load in the packages we’re going to use for data manipulation and plotting:
    library(tidyverse)
    library(cowplot)
    library(portalr)
Obtaining the data
To make sure we don’t unnecessarily download the data, we first check whether it might already exist, and if it does, whether the data matches the latest version on the GitHub repo:
    # use current folder to store downloaded data
    my_path <- "."
    rodent_file <- file.path("PortalData", "Rodents", "Portal_rodent.csv")
    path_to_rodent_file <- FullPath(rodent_file, my_path)

    # check if we already have the latest data
    if (!file.exists(path_to_rodent_file) ||
        observations_are_new(base_folder = my_path)) {
      download_observations(base_folder = my_path)
    }
Next, we read in the various data tables:
    rodent_data_all <- loadData(path = my_path)
    print(summary(rodent_data_all))

    ##                Length Class      Mode
    ## rodent_data    29     data.frame list
    ## species_table   8     data.frame list
    ## trapping_table  6     data.frame list
    ## newmoons_table  4     data.frame list
    ## plots_table     4     data.frame list
Rodent Abundances
The first table that we loaded (`rodent_data_all$rodent_data`) is a record of the observed macrofauna, including rodents, but also other taxa. We first filter the data for missing, unidentified, incomplete, or otherwise erroneous data:

    rodent_data_all$rodent_data %>%
      remove_suspect_entries() %>%
      process_unknownsp(rodent_data_all$species_table, TRUE) %>%
      remove_incomplete_censuses(rodent_data_all$trapping_table, FALSE) %>%
      {.} -> rodent_data
Next, we write a function to summarize the abundances for each species within each sampling trip:
    summarize_abundance <- function(rodent_data) {
      return(rodent_data %>%
               mutate(species = factor(species)) %>%
               group_by(period) %>%
               do(data.frame(x = table(.$species))) %>%
               ungroup() %>%
               select(period, species = x.Var1, abundance = x.Freq)
      )
    }

    rodent_abundance <- summarize_abundance(rodent_data)
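To see why `summarize_abundance()` converts `species` to a factor before counting, here is a minimal base-R sketch on made-up capture records. The `toy_captures` data frame and its values are hypothetical, not Portal data, and this is not how the package itself does the counting. It only illustrates that tabulating a factor reports every level, so a species that was not caught in a given period appears with an abundance of 0 instead of being dropped.

```r
# Toy capture records (made-up values, not Portal data)
toy_captures <- data.frame(
  period  = c(1, 1, 1, 2, 2),
  species = c("DM", "DM", "PP", "DM", "DO")
)

# Converting to a factor fixes the full set of species levels up front
toy_captures$species <- factor(toy_captures$species)

# Cross-tabulate period by species; combinations with no captures count as 0
toy_counts <- as.data.frame(
  table(period = toy_captures$period, species = toy_captures$species),
  responseName = "abundance"
)

print(toy_counts)
```

Every period ends up with one row per species, which is why each species has the same number of rows in the real summary and why the minimum abundance there is 0.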
Finally, we want to add the dates of each sampling trip (currently recorded as an index in the `period` column), as well as the scientific names for each species (currently recorded as a two-letter species code in the `species` column):

    join_census_date <- function(rodent_abundance, newmoons_table) {
      return(rodent_abundance %>%
               left_join(select(newmoons_table, "period", "censusdate"),
                         by = "period") %>%
               mutate(census_date = as.Date(censusdate))
      )
    }

    join_scientific_name <- function(rodent_abundance, species_table) {
      return(rodent_abundance %>%
               left_join(select(species_table, "species", "scientificname"),
                         by = "species") %>%
               rename(scientific_name = scientificname)
      )
    }

    rodent_abundance %>%
      join_census_date(rodent_data_all$newmoons_table) %>%
      join_scientific_name(rodent_data_all$species_table) %>%
      select(census_date, scientific_name, abundance) %>%
      {.} -> rodent_abundance

    print(summary(rodent_abundance))
    ##   census_date                     scientific_name    abundance
    ##  Min.   :1977-07-16   Baiomys taylori         : 438   Min.   :  0.000
    ##  1st Qu.:1987-05-28   Chaetodipus baileyi     : 438   1st Qu.:  0.000
    ##  Median :1996-06-02   Chaetodipus hispidus    : 438   Median :  0.000
    ##  Mean   :1997-03-06   Chaetodipus intermedius : 438   Mean   :  6.273
    ##  3rd Qu.:2007-06-16   Chaetodipus penicillatus: 438   3rd Qu.:  5.000
    ##  Max.   :2017-11-18   (Other)                 :7008   Max.   :285.000
    ##                       NA's                    : 438
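The two lookups above are both left joins: every abundance row is kept, and the matching date or name is attached where one exists. As a rough illustration, here is a base-R sketch that uses `merge(..., all.x = TRUE)` as a stand-in for `left_join()`. All three toy tables, their values, and the species code `XX` are made up for this example and are not Portal data.

```r
# Toy tables (hypothetical values, not Portal data)
toy_abundance <- data.frame(
  period    = c(1, 1, 2),
  species   = c("DM", "XX", "DM"),
  abundance = c(2, 1, 3)
)
toy_newmoons <- data.frame(
  period     = c(1, 2),
  censusdate = c("1977-07-16", "1977-08-19")
)
toy_species <- data.frame(
  species        = "DM",
  scientificname = "Dipodomys merriami"
)

# all.x = TRUE keeps every row of the left table, like a left join
toy_joined <- merge(toy_abundance, toy_newmoons, by = "period", all.x = TRUE)
toy_joined <- merge(toy_joined, toy_species, by = "species", all.x = TRUE)

# Convert the text dates into Date objects so they can go on a date axis
toy_joined$census_date <- as.Date(toy_joined$censusdate)

print(toy_joined[, c("census_date", "scientificname", "abundance")])
```

The row with the unmatched code `XX` is kept with a missing name, which is one way an `NA's` entry like the one in the summary above can arise.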
Plot
Finally, let’s create our plot of species abundances over time:
    my_plot <- ggplot(rodent_abundance,
                      aes(x = census_date, y = abundance)) +
      geom_line() +
      facet_wrap(~scientific_name, scales = "free_y", ncol = 3) +
      xlab("Date") +
      ylab("Abundance") +
      scale_x_date(breaks = seq(as.Date("1977-01-01"),
                                to = as.Date("2018-01-01"),
                                by = "5 years"),
                   date_labels = "%Y",
                   limits = as.Date(c("1977-01-01", "2018-01-01"))) +
      theme_cowplot() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
            legend.position = "bottom",
            legend.justification = "center",
            strip.text.x = element_text(size = 10))

    print(my_plot)
Conclusion
So that was just one way of viewing the rodent abundance data, but we plan to include many such examples with the `portalr` package. Our goal is to help everyone get up to speed quickly with the package's analyses and to use all of the different data sources effectively. Again, if you have questions or suggestions, please feel free to drop us a line at the GitHub issues page.