portalr – An R package for using the Portal data

August 13, 2019

Much effort has been made over the years to keep the Portal data a continuous, consistent time series.

Nonetheless, every field project has its quirks. And in 40 years, a lot of interesting stuff can happen. So some of that consistency has to happen post hoc. Naturally, over the course of decades of researchers using the data, some ‘best practices’ have been developed to deal with data cleaning on multiple levels.

Special Cases

You have to stop setting traps halfway through a plot because it’s in the middle of a lightning storm. You trap with no fences at all, because they’re being replaced. You catch a skink, or a cactus wren, or a snake(!), in a rodent trap.

Within-time series

We have made several improvements over the years to the ways data are collected. While not always affecting the consistency of the time series, those changes may affect the way the data get summarized to mesh with the previous methods.

Across time series

Of course, we just really collect a lot of data, of all types. These data are collected in different ways and at different time scales, but they can all be woven into one time series matrix, if you know what you’re doing.

We want to share these ‘best practices’ publicly along with the dataset, because we want it to be easily accessible to anyone who might want to use it. Not just those of us who know all its ‘secrets.’ Or those of us who can yell down the hall to the senior grad student “Hey, there were no fences during a census? What should I do about THAT?”

The best way to do that seemed to be an R package, which we’ve published on CRAN.

portalr

Now you can install it easily from CRAN:

install.packages("portalr")

The development version is also available directly from GitHub by using the remotes package:

# install.packages("remotes")
remotes::install_github("weecology/portalr")

There are functions to download the data, or to load it into R (including straight from the GitHub repo):

download_observations(".")

data_tables <- load_rodent_data("repo")

You can summarize the rodent data in many different ways. There are arguments for the table shape, whether or not to include unknowns, which treatment types to use, and much more. The possible combinations are endless.

abundance(".", level = "site", shape = "crosstab", time = "period")

You can also get the data as biomass, or even energy, rather than abundance:

biomass("repo", level = "plot", type = "granivores", shape = "flat", time = "date")

There are similar options for weather, plant, and ant data:

weather("Monthly", ".", fill = TRUE)

plant_abundance(".", shape = "flat", level = "quadrat")

There are more in-depth examples in the vignettes. Go check them out!

browseVignettes("portalr")

This is designed to be a quick way to get you off the ground, out of the data cleaning step, and into doing analyses. It also works as a good introduction to the data that are available. Of course, the raw data and their metadata are always available once you feel prepared to create more specific/complicated data summaries of your own. The methods and the data paper contain a great amount of detail, so you can always discover the provenance of our ‘best practices’ yourself, or decide to do something slightly differently if it fits your question better.

If you use the data in a way that we don’t provide, but you think may be generally useful, please feel free to submit a pull request, request an additional argument in a function, etc. We would love to know how you’re using it!

The Data Paper

June 15, 2018

The Portal Project is a living, breathing thing. Not only does the desert constantly keep us guessing, carefully curating the data keeps us on our toes as well.

The methods have changed slightly over the years as we’ve made some realizations about what works and what doesn’t, and, of course, as we’ve gained and lost, and gained, funding. How we collect weather data has changed.

We continue to discover details about the history of the quadrat and transect data. And as ecologists, we can hardly be expected to go 40 years without poking some things. While we’ve maintained half the plots in their original treatments, the other 12 plots have undergone a whirlwind adventure of experimental treatment changes: seed addition, plant removal, and targeted ant and rodent species removals.

Not only do taxonomic names change over time, but we also keep getting better at confirming our species identification.

The names of the people involved in the project continue to grow as well. Maintaining a monthly trapping schedule for this long has required an army of grad students, post docs, undergrads, and volunteers.

We want to make these data as easy to use as possible, while making sure we also give as many folks as possible credit for their contributions. And those who use our data want to be able to cite it in a conventional way. Until now, we were doing that with a more traditional data paper that we would rewrite and republish every time it ‘felt like time.’ But that just isn’t very satisfying. We’re getting new data on a monthly basis. Sometimes we discover that our description of a protocol wasn’t exactly right. We’re even getting new authors at a regular clip. It would bother our perfectionist minds that the latest data paper wasn’t its ‘best self.’

So we’ve decided to go live. We’ve published THE data paper to bioRxiv (the preprint server for biology), which we can modify with new versions but which will always maintain its DOI and citation. Now we’ve got a living document that we can improve, and add data to, and make perfect to our hearts’ content. Data users will be able to access and cite all the knowledge that we currently have about the dataset, not a snapshot in time from 7 years ago.

Data paper

Of course, keeping a data paper up to date is just one part of what it takes to curate a living dataset. We’ve also got a paper in the works that describes our entire data workflow for maintaining the data, which helps us provide new data to the public ASAP.

Data workflow paper

What are the rodents eating?

January 15, 2018

Last week I talked about our adventures in vouchering the plants of Portal. This week I’ll get a little more in depth about the project that prompted this collection push.

One of the neatest things about Portal is that we have so many species of rodents coexisting in the system (up to 21, in fact). Over the years, we’ve speculated on many potential reasons for this. When a fun new technique for “easily” analyzing diet from fecal samples (DNA metabarcoding) popped up on our radar, we started wondering if this was something we could use at our site to ask some neat questions.

For example:

  • What are the rodents actually eating?
  • Are different species of rodents eating different things? If so, is that possibly one reason so many species are able to coexist at the site?
  • Does the presence/absence of a behaviorally dominant species affect the diets of other species?
  • How does diet breadth change through time, especially with high seasonal and annual variability in food resources?
  • Etc., etc.

Once we started thinking of questions, it was hard to stop. To ask any of these questions, though, we needed to know what DNA metabarcoding even meant and whether it could give us the information that we were hoping for. We started looking more into DNA metabarcoding to try to figure out what it actually was, considering none of us have much experience with genetics. We came to understand that, for our purposes, DNA metabarcoding can best be thought of as the simultaneous identification of multiple species from a single sample (fecal samples, in our case). It offers an efficient and effective way to determine diet content without intensive observation or fatal sampling.

As it turns out, DNA metabarcoding has actually been around for over a decade, mostly being used in the microbiology world. By the mid-2000s, microbial ecologists had started bringing the technique into ecological circles. Since then, it has been used to enhance biodiversity surveys through environmental DNA, or eDNA, as well as to assess the diets of animals. What is just starting to be done with this technique is using it to compare diets between coexisting species.

In order to assess the diets, we needed to create a DNA reference library, with DNA from the plants at Portal. This way, we can compare the sequences that are extracted from the rodents’ fecal samples to the plants found at the site. This is where the plant vouchering came into play! We can also pull in data from huge international genetic databases, such as GenBank, to potentially find any plants we’ve missed.
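
The core idea of a reference library is simple: compare each sequence recovered from a sample against a catalog of known sequences. A toy base-R sketch of that idea (the species names and sequences below are invented for illustration; real metabarcoding pipelines use dedicated bioinformatics software, not exact string matching):

```r
# Toy sketch of the reference-library idea: match a barcode fragment
# recovered from a fecal sample against known plant sequences.
# Species names and sequences here are invented for illustration.
reference_library <- c(
  "plant_species_A" = "ATCGGCTA",
  "plant_species_B" = "GGCTATCG",
  "plant_species_C" = "TTGACCGA"
)

identify_fragment <- function(fragment, library) {
  hits <- names(library)[library == fragment]
  if (length(hits) == 0) "no match" else hits
}

identify_fragment("GGCTATCG", reference_library)  # "plant_species_B"
identify_fragment("AAAAAAAA", reference_library)  # "no match"
```

Fragments with no hit in our own library are the ones where a database like GenBank can help fill the gap.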

So far, we’ve done four rounds of fecal collections–usually during plant censuses when we have lots of helping hands! Once the fecal samples have gone through the black box of next generation, or high-throughput, DNA sequencing, what we get back is rows upon rows of data about which plants have been eaten by which rodents. While we’re still working through the results, the preliminary data looks promising! We’re able to identify quite a few of the plants that they are eating, and we might even be picking up some shifts in diet between treatment types. Once we have a better picture of what is happening, I’ll chime back in with an update.

Now we can vouch for the plants!

January 10, 2018

Early on in my dissertation work, I became interested in using a fairly new technique (DNA metabarcoding) to look at what our rodents are eating and if partitioning their diets might be one way so many species exist in our system. I’ll get into more of the details on that project in a subsequent post; for now, though, I want to tell you about a really fun ancillary project. When you’re interested in what the rodents are eating, it’s pretty important to know what they could be eating! Therefore, over the past few plant censuses, we’ve been collecting vouchers and DNA samples for as many of our plant species as we can find!

Having recorded plants at the site for roughly four decades, we are in a pretty fortunate position—we already have a nearly complete list of plants that could be found at the site. Since most of us know more about rodents than plants, however, we wanted to make sure we were correctly identifying our plant species. This requires collecting voucher specimens for every species we come across, pressing them in a plant press, and then dropping them off at the University of Arizona Herbarium for an official verdict from a botanist who specializes in Arizona plants. Once he has looked over our samples and identified them, the herbarium digitizes the specimens. So far, we’ve vouchered about 85% of the nearly 200 recorded plant species at the site.

For the most part, we’ve been doing a pretty great job identifying our plant species, considering none of us really identify as botanists. We’ve also had some fun surprises along the way, though! For example, for forty years, we thought we had two species of Acacia at the site: 1) whitethorn acacia, Acacia constricta (now Vachellia constricta), and 2) catclaw acacia, Acacia greggii (now Senegalia greggii). As it turns out, however, we’ve probably never had the catclaw acacia at the site! What we’ve been calling A. greggii is actually a species of mimosa, Mimosa aculeaticarpa.

This process has also made us more attentive to the plants surrounding us at the site. It was only at the last census that we noticed a large bush/small tree and realized that it was our first (and maybe only) desert willow tree, Chilopsis linearis, at the site. It was hard to believe we’d never noticed one of the biggest plants at the site as being different, but since it was just outside of a plot, there had never really been a reason to notice.

Example of a desert willow tree, Chilopsis linearis

We’re looking forward to more surprises in the future, though we still dread having to figure out how to make these changes in the database!

Community change: fast or slow?

January 7, 2018

In a previous post, I talked about a major trend over time at the Portal site: the slow and steady increase in shrub cover, which has gradually replaced the grassy landscapes of the 1970s. I also mentioned that this was accompanied by changes in the rodent species we caught: in the 1970s and 80s when shrub cover was still low, we found more individuals of grassland-loving species, whereas in recent years we see more individuals of shrub-loving species now that shrub cover is high. This seems like a straightforward story—species that love shrubs will become more abundant as shrubs cover more of the landscape—but it turns out there’s more to it. In this post, I’m going to describe how three pieces of evidence led me to questions about the dynamics of community change over time, and the new method I used to quantify it.

Evidence #1: the gradual increase in shrub cover + the change in species composition of the rodent community. Shrub cover has increased at least 3-fold since the time the Portal Project was established in 1977. Published studies have also noted that rodent species typical of arid grasslands have declined during that time (such as the banner-tailed kangaroo rat Dipodomys spectabilis and silky pocket mouse Perognathus flavus), while rodent species typical of arid shrubland have increased their populations (such as Merriam’s kangaroo rat Dipodomys merriami and desert pocket mouse Chaetodipus penicillatus). The obvious prediction is that the change in rodent species was caused by the change in shrub cover. However, so far all we have is a correlation; we would need more information to infer causation.

Evidence #2: the high mortality of rodents that occurred after a dramatic sheet flood swept through the Portal Project, during the monsoon season of August 1999. This is illustrated by the graph below, showing the relatively stable number of rodents at the site over the year and a half leading up to the storm, followed by their sharp decline.

Total rodent capture numbers over time: showing sharp decline after a flood in August 1999

Evidence #3: Populations quickly bounced back after the sheet flood of 1999—but things were different. Within two months, researchers were catching numbers of rodents comparable to those before the flood. However, the community of rodents that re-assembled after the flood was not the same community that had existed before. While the identity of species present remained approximately the same, the relative abundances of these species changed drastically. Most notably, two species of pocket mice which had been rare up to this time (desert pocket mouse and Bailey’s pocket mouse, Chaetodipus penicillatus and C. baileyi respectively), rapidly increased their populations to become numerical dominants in the community. (Details of the flood and its effects can be found in Thibault and Brown (2008)).

The first chapter of my dissertation explores how these three pieces of evidence–an increase in shrub cover, an extreme weather event, and a shift in rodent community structure–are connected. Piecing these together, we have slow change in habitat that we believe is impacting the rodent community, a disturbance that proved catastrophic for the rodents at Portal, and a dramatic change in community structure. How can we put these together to describe the dynamics of this rodent community overall?

Did the transition from grass-loving to shrub-loving rodent communities occur slowly, following the slow change in habitat, or in a series of quick bursts (one of which happened in August 1999)? This is an important distinction to make, because it provides information about the mechanisms behind community change. If the rodent community is changing slowly with habitat change, this indicates that habitat is driving rodent community change.

However, if change occurs in quick bursts instead, it implies that habitat change is not the most important driver, and perhaps some sort of trigger is needed to facilitate change (like a flood). Furthermore, if we want to predict how this community (and others) will change in the future, we need to know when/if change tends to happen slowly vs. quickly. These concepts apply not only to the rodents at Portal, but to broad questions about community structure for many taxa.

The Research Question: is change in the Portal rodent community slow or fast? The graph below shows examples of patterns I might see in the rodent data if the community has been changing gradually, compared to changing through a series of quick bursts. Unfortunately, measuring how populations of 21 rodent species have changed over a span of 40 years, and in relation to one another, is not easy.

Gradual, or discrete?

The Method: On a suggestion from my colleague Dave Harris, I decided to borrow a technique from the field of document analysis, called Latent Dirichlet Allocation (LDA for short). Document analysts use LDA to look for patterns of words within documents. The algorithm identifies words that tend to be found together in a document (and the relative proportions of those words), and infers “topics” from those patterns. Each document can then be described in terms of the “topics” it represents.

I co-opted this method to analyze patterns of species within sampling events. Given a table of counts of my 21 species captured in each sampling event (436 events in my case), LDA returns an estimate of which species are found together in specific proportions. I called these community-types instead of the “topics” used in document analysis. LDA then estimates how each sampling event is partitioned into these community-types. The result is that instead of describing each sampling event by listing how many of each of the 21 rodent species were caught, we can describe it in terms of a small number of community-types.

For example, one of our samples might be that in January of 2015 we captured 35 Merriam’s kangaroo rats, 1 pack rat, 6 desert pocket mice, 6 Bailey’s pocket mice, 5 grasshopper mice, and 12 cactus mice. After running this data point (with all 435 other data points) through LDA, this information is transformed to say that the collection of animals caught in January 2015 was 80% similar to community-type 1 and 20% similar to community-type 2. (For a much more thorough description of the LDA method applied to ecological data, see Valle et al 2014).
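
As a sketch of what this looks like in practice, here is a toy LDA fit in R using the topicmodels package (one readily available implementation; the counts below are invented, and the actual analysis in the paper has its own code and settings):

```r
library(topicmodels)

# Invented counts: 6 sampling events x 4 species (the real table is 436 x 21)
counts <- matrix(
  c(30,  2,  1,  0,
    28,  3,  0,  1,
     2, 25,  1,  3,
     1, 27,  2,  2,
     0,  1, 20, 15,
     1,  0, 18, 17),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("event_", 1:6), c("sp1", "sp2", "sp3", "sp4"))
)

# Fit an LDA with k = 2 community-types ("topics")
fit <- LDA(counts, k = 2, control = list(seed = 42))

# One row per sampling event: the proportion of each community-type,
# analogous to "80% community-type 1, 20% community-type 2"
round(posterior(fit)$topics, 2)
```

Each row of `posterior(fit)$topics` sums to 1, so plotting those proportions over time gives exactly the kind of community-type time series shown below.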

The crucial part for answering my question: I am interested in measuring dynamics over time, so the part of the LDA results I am interested in is the description of each sampling event in terms of the community-types. The results of my LDA analysis indicated that the Portal rodent data set is best described with four community-types. In the graph below, I plotted the prevalence of these four community-types at each sampling event, illustrated with four different colors. When a color has a value near 1.00 (along the vertical, y-axis), for example as the “light blue” community does near the beginning (left) of the time series (x-axis), it means the rodent community captured at those sampling times was very similar to the community-type represented by that color.

Results of LDA analysis. Four colors represent four community-types.

Description of the results: The first thing that jumps out is that the four rodent community-types display different dynamics at different times. There are periods where the dynamics appear to be stable: for example the light blue community-type remains close to 1.0 from approximately 1977-1984, and the dark blue community-type is likewise close to 1.0 from 1990-1999. The dynamics from 2000-2010 are complicated: dark blue, gold, and gray community-types take turns as the most prominent community-type at different times (remember that the value on the y-axis represents how similar the rodent sample was to the given community-type at that time). Starting in 2010 there is a strong seasonal signal in the dynamics: the dark blue community-type is close to 1.0 during winters, and the gray community-type is close to 1.0 during summers.

Interpretation: The prediction I mentioned earlier, that change in the rodent community from one type to another might be gradual and in line with the gradual change in habitat from 1977-present, is not supported by our LDA analysis. Some sections of the time series seem to show gradual, linear change, but not the entire time series overall. Second, there are moments when there seems to be a rapid change from one community-type to another. For example, in the 1990s the rodent community samples were most similar to the dark blue community-type, until late 1999 when it was replaced by the gold and gray community-types. This fits with the evidence from Thibault and Brown 2008: the catastrophic flood in August 1999 that triggered an overhaul of the rodent community.

What’s next: This analysis hasn’t answered all the questions about how the rodent community at the Portal Project has changed over 40 years. But it’s a starting point, a big-picture view that we can use to identify different types of dynamics occurring at different times, and start to dig deeper into the causes of these dynamics. We know that communities and ecosystems all over the world are currently changing; understanding the factors that might be driving these changes is a huge step toward being able to predict and even manage future change, and the Portal Project is an example of how long-term studies with a lot of data can be used to explore these questions.

For more information: My paper describing the LDA analysis in full is available online as a preprint: “Long-term community change through multiple rapid transitions in a desert rodent community” at https://doi.org/10.1101/163931

Special thanks to Joan Meiners for editing help on this post.

Literature cited:
Thibault, K.M., and J.H. Brown. (2008). Impact of an extreme climatic event on community assembly. Proceedings of the National Academy of Sciences 105, 3410–3415.

Valle, D., B. Baiser, C.W. Woodall, and R. Chazdon. (2014). Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method. Ecology Letters 17, 1591–1601.

Data Analysis with the portalr Package

December 21, 2017

So you’ve read several posts about the Portal site and have even gone to the official GitHub repo for the data, but it still seems pretty intimidating to handle and do analyses on…

Never fear! The Weecology lab hears your concerns and we are actively working on a software package to smooth out the process. You can check out the project on GitHub.

FAQ

Q: What is it exactly?

A: The portalr project is a software package for the R programming language (http://cran.r-project.org). R is one of the most popular languages for ecology, statistics, and data science; it also has a large open-source community that creates free add-on packages to extend the base functionality.

Q: How do I get the portalr package?

A: There are some basic instructions on the repo page, but in short, the package is still in development and therefore not yet uploaded to CRAN (the comprehensive R archive network). If you are unfamiliar with installing a package from GitHub, the easiest approach is to first install the devtools package, and then use one of its functions to install portalr from GitHub:

install.packages("devtools")
devtools::install_github("weecology/portalr")

Q: What can I do with the package?

A: Several different things! Mainly, it is designed to be a general-purpose interface to the Portal data for R users. It allows you to download the latest iteration of data from the data repo, summarize the data in different ways (e.g. by time, by space, by treatment), and integrate different data sources (e.g. rodents, plants, ants, weather).

Q: That sounds great! How do I get started with that?

A: Well, the package is still under development, but check out the demo below, and feel free to send us comments and suggestions (preferably as an issue here). 👇

Demo

Initial setup

Load in the packages we’re going to use for data manipulation and plotting:

library(tidyverse)
library(cowplot)
library(portalr)

Obtaining the data

To make sure we don’t unnecessarily download the data, we first check whether it might already exist, and if it does, whether the data matches the latest version on the GitHub repo:

# use current folder to store downloaded data
my_path <- "."
rodent_file <- file.path("PortalData", "Rodents", "Portal_rodent.csv")
path_to_rodent_file <- FullPath(rodent_file, my_path)

# check if we already have the latest data
if(!file.exists(path_to_rodent_file) ||
   observations_are_new(base_folder = my_path))
{
  download_observations(base_folder = my_path)
}

Next, we read in the various data tables:

rodent_data_all <- loadData(path = my_path)
print(summary(rodent_data_all))
##                Length Class      Mode
## rodent_data    29     data.frame list
## species_table   8     data.frame list
## trapping_table  6     data.frame list
## newmoons_table  4     data.frame list
## plots_table     4     data.frame list

Rodent Abundances

The first table that we loaded (rodent_data_all$rodent_data) is a record of the observed macrofauna, including rodents, but also other taxa. We first filter out missing, unidentified, incomplete, or otherwise erroneous records:

rodent_data_all$rodent_data %>%
  remove_suspect_entries() %>%  
  process_unknownsp(rodent_data_all$species_table, TRUE) %>%
  remove_incomplete_censuses(rodent_data_all$trapping_table, FALSE) %>%
  {.} -> rodent_data

Next, we write a function to summarize the abundances for each species within each sampling trip:

summarize_abundance <- function(rodent_data)
{
return(rodent_data %>%
         mutate(species = factor(species)) %>%
         group_by(period) %>%
         do(data.frame(x = table(.$species))) %>% 
         ungroup() %>%
         select(period, species = x.Var1, abundance = x.Freq)
  )
}
rodent_abundance <- summarize_abundance(rodent_data)

Finally, we want to add the dates of each sampling trip (currently recorded as an index in the period column), as well as the scientific names for each species (currently recorded as a two-letter species code in the species column):

join_census_date <- function(rodent_abundance, newmoons_table)
{
  return(rodent_abundance %>%
           left_join(select(newmoons_table, "period", "censusdate"),
                     by = "period") %>%
           mutate(census_date = as.Date(censusdate))
  )
}
join_scientific_name <- function(rodent_abundance, species_table)
{
  return(rodent_abundance %>%
           left_join(select(species_table, "species", "scientificname"), 
                     by = "species") %>%
           rename(scientific_name = scientificname)
  )
}

rodent_abundance %>%
  join_census_date(rodent_data_all$newmoons_table) %>%
  join_scientific_name(rodent_data_all$species_table) %>%
  select(census_date, scientific_name, abundance) %>%
  {.} -> rodent_abundance

print(summary(rodent_abundance))
## census_date                        scientific_name   abundance      
## Min.   :1977-07-16   Baiomys taylori         : 438   Min.   :  0.000  
## 1st Qu.:1987-05-28   Chaetodipus baileyi     : 438   1st Qu.:  0.000  
## Median :1996-06-02   Chaetodipus hispidus    : 438   Median :  0.000  
## Mean   :1997-03-06   Chaetodipus intermedius : 438   Mean   :  6.273  
## 3rd Qu.:2007-06-16   Chaetodipus penicillatus: 438   3rd Qu.:  5.000  
## Max.   :2017-11-18   (Other)                 :7008   Max.   :285.000  
##                      NA's                    : 438

Plot

Finally, let’s create our plot of species abundances over time:

my_plot <- ggplot(rodent_abundance, 
                  aes(x = census_date, y = abundance)) + 
  geom_line() + 
  facet_wrap(~scientific_name, scales = "free_y", ncol = 3) + 
  xlab("Date") + 
  ylab("Abundance") + 
  scale_x_date(breaks = seq(as.Date("1977-01-01"), 
               to = as.Date("2018-01-01"), "+5 years"), 
  date_labels = "%Y", 
  limits = as.Date(c("1977-01-01", "2018-01-01"))) + 
  theme_cowplot() + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5), 
        legend.position = "bottom", legend.justification = "center", 
        strip.text.x = element_text(size = 10))

print(my_plot)

Conclusion

So that was just one way of viewing the rodent abundance data, but we plan to include many such examples with the `portalr` package. Our goal is to help everyone get up to speed quickly with doing various analyses with the package, and to be able to use all of the different data sources effectively. Again, if you have questions or suggestions, please feel free to drop us a line at the GitHub issues page.

From field to repo – Portal data

December 4, 2017

Whenever we get new Portal data, we want to update our database as quickly as we can, without sacrificing data quality by adding data with errors or messing up the existing database when we try to update it. And, we want to make sure the process is reproducible and open, so that anyone can see what we do to maintain the data. This helps us keep the process consistent as new weecologists take over managing the data, and it lets anyone who wants to use the data understand how it’s been handled. We achieve these lofty goals through a combination of good old-fashioned record keeping and high-tech version control and open access through our GitHub repository.

All of our plant and rodent data begin on a paper datasheet that we fill in in the field. We keep those datasheets in binders, forever. We also scan them and archive digital copies. As a brand-new ecologist, I love these binders – going through them, I get to see forty years of fellow rodent enthusiasts, being enthusiastic about rodents! Beyond their sentimental value, these hard copies let us go back and quadruple-check for errors and notes whenever we find an anomaly in the data.

The Portal rodent data cache

When we come back from a Portal trip, two different weecologists enter a copy of the data into an Excel workbook. Double-entry lets us filter out the inevitable typo or several. We enter the data into a preformatted Excel template, which uses data validation functions to catch especially strange typos. If we tried to record an invalid entry – a kangaroo rat ten times too big, for example – the datasheet would prompt us to double-check our work.
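
The double-entry idea can be sketched in a few lines of base R (the column names and values below are hypothetical; the real workflow happens in the Excel templates and the lab's own scripts):

```r
# Two independently entered copies of the same field records
# (hypothetical tags, species codes, and weights)
entry_a <- data.frame(tag     = c("A123", "A124", "A125"),
                      species = c("DM", "DO", "PP"),
                      weight  = c(42, 51, 17))
entry_b <- data.frame(tag     = c("A123", "A124", "A125"),
                      species = c("DM", "DO", "PB"),  # typo in one copy
                      weight  = c(42, 51, 17))

# Flag rows where the two entered versions disagree in any column
mismatch <- which(rowSums(entry_a != entry_b) > 0)
entry_a$tag[mismatch]  # "A125" -- back to the paper datasheet
```

Any flagged row gets resolved by checking the original paper datasheet, which is why those binders matter so much.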

From there, we proof the data – check for errors – using a series of R scripts. The scripts are all continuously saved to our GitHub repository, which helps us maintain consistency and openness throughout the process. We use the scripts to check for typos and pick up on common errors in the new data, like forgetting to note that a rodent was a brand-new capture. Once the new data is cleaned up, we also check for discrepancies between the new data and our old records. For example, sometimes a rodent was identified as Dipodomys ordii in one census and Dipodomys merriami in another. Sometimes we can look at old data and resolve these discrepancies. If we can’t, we make a note to look very carefully at that rodent the next time we see it. Whenever we find a contradiction or make a change to the data, we keep a note of it in a notebook. If we change old records, the change is also recorded on GitHub.
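
A minimal sketch of that kind of consistency check, in base R (the tags and records below are hypothetical; the real proofing scripts live in the GitHub repository):

```r
# Hypothetical capture history for two tagged rodents across censuses
captures <- data.frame(
  tag     = c("B001", "B001", "B002", "B002", "B002"),
  period  = c(470, 471, 470, 471, 472),
  species = c("DO", "DM", "PP", "PP", "PP")
)

# Tags whose recorded species is not consistent across captures
n_ids <- tapply(captures$species, captures$tag, function(x) length(unique(x)))
conflicts <- names(which(n_ids > 1))
conflicts  # "B001" -- look extra carefully at this one next time
```

Flagged tags are exactly the ones that send us back to the binders, or onto the "check this rodent next census" list.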

Once the new records are clean and agree with old records, we go to add the new census to the master database. This is a potentially dangerous step: it’s pretty easy to mess up a dataset by introducing something tiny, like a comma in the wrong place. We use GitHub to protect the master database from catastrophic errors. Whoever has cleaned the new data submits a “pull request” on GitHub, which is essentially a request to make a set of changes to the master version of the database. GitHub lets us compare the old and new databases, and highlights the potential changes. We can make sure that the only changes being made are the ones we want to make before we approve the pull request. And even if something were to go wrong here, GitHub also allows us to revert to earlier versions of the repository.

At this stage, we enlist a nifty bot called Travis to run a final quality check and streamline updates to the whole database. When somebody opens a pull request to add rodent data, Travis automatically runs a set of scripts to make sure that there are no bugs in any changes we made to our data cleaning code. If those tests check out, it proceeds to automatically update the rest of the data tables in the database: records of the dates we trapped, which new moon we’re on, which plots were trapped (in case weather, or some other circumstance, kept us from trapping some plots) and which experimental treatments applied to which plots at the time. We also maintain weather records, which Travis automatically pipes in from the Portal weather station. Automatically updating all of these tables removes the possibility of introducing human error – and it’s much faster than having a human do it!

That’s it! New rodent reports come in about every four weeks. So if, like us, you have a burning desire to know what those rodents are up to – and if there are any more spectabs – it’s never a very long wait.

Morgan’s Favorite Portal Species

November 1, 2017 by

As part of our Portal 40th anniversary celebration, some of us will contribute our thoughts on our favorite species at Portal. We’ve already had several posts on the Banner-tailed Kangaroo Rat (a universally beloved species at the site). But, I have a confession, while I love Banner-tails, they are not my favorite species at Portal (cue collective gasp). No, my favorite species is the grasshopper mouse. We have two species of grasshopper mice at Portal, the Northern and the Southern Grasshopper Mouse (Onychomys leucogaster and Onychomys torridus).

They are similar in their biology and morphology. Both are small – 120-163 mm (about 5-6 inches) in length including the tail – and weigh less than 40 grams (0.088 lbs). Given their similarities, I like them equally well, so will simply refer to grasshopper mice generically for our purposes today.

Anyone who has interacted with a grasshopper mouse probably remembers the encounter. Grasshopper mice have sharp little teeth and love to use them. Keeping an eye on the front end wouldn’t be that hard if you didn’t also have to keep a sharp eye on the back end. The teeth are just a distraction from the fact that they are trying to coat you with liquidy, yellowish diarrhea. Oh, and did I mention that grasshopper mice REEK? Yes, I do mean reek. Their oily, acrid scent curdles the nose hairs and lingers after the little rodent is gone (probably because they managed to smear some poo on you in retribution before they headed off).

So right about now, you’re probably wondering why this reeking, vicious little rodent is my favorite species at Portal (or you’re wondering what this says about my personality). With so many amazing rodents to choose from at Portal, what makes the grasshopper mouse so special? The reason grasshopper mice are notably more aggressive than our other species is that grasshopper mice are predators – yes, predators. They will eat seeds when resources get scarce and cache seeds in their burrows (Ruffer 1965), but they actively hunt insects and other arthropods, small rodents, and even reptiles. They even hunt scorpions – see for yourself. The video below shows in sequence an adult, a subadult, and a juvenile attacking a scorpion. The adult knows to chew off the stinger quickly. The younger ones….well, it’s definitely a more difficult experience for them, though they eventually get their meal.

Why can these mice withstand scorpion stings? Without getting into sodium ion channel-level detail, they basically have a special protein that binds to the scorpion’s neurotoxin and changes how it works (Rowe and Rowe 2008). As a result, not only doesn’t the sting hurt, it actually ends up numbing the area of the sting (Rowe et al. 2013).

Grasshopper mice also have surprising social relationships. There are a variety of reports that male and female grasshopper mice form strong pair-bonds and that both sexes participate equally in offspring care (McCarty and Southwick 1977) and make the nest burrows together (Ruffer 1965). Grasshopper mice have a calling behavior using sounds that are almost ultrasonic (Hafner and Hafner 1979). Though members of a family group have similar calls, every individual has unique call characteristics, which means that grasshopper mice may be able to use these calls to communicate with family members over long distances (Hafner and Hafner 1979). When they call, they stand on their hind legs and throw their heads back:

Some mammalogists cannot see past the reeking, bitty little animal with the magical stinking poo that seems to get on you no matter how hard you try to avoid it. But when I see a grasshopper mouse, I see a little mouse who thinks it’s a coyote. I see some cool evolution at play that takes a normal mouse and turns it into a scorpion-resistant killing machine. I also see a brave little mouse fearlessly taking on a scary world. There’s something about it that just makes me smile. And then I go get some hand sanitizer.


Scientific Studies Cited in this Post:

Hafner, M.S., and D.J. Hafner. 1979. Vocalizations of Grasshopper Mice (Genus Onychomys). Journal of Mammalogy 60:85–94.

McCarty, R., and C. H. Southwick. 1977. Patterns of parental care in two cricetid rodents, Onychomys torridus and Peromyscus leucopus. Animal Behaviour 25:945–948.

Rowe, A. H., and M. P. Rowe. 2008. Physiological resistance of grasshopper mice (Onychomys spp.) to Arizona bark scorpion (Centruroides exilicauda) venom. Toxicon 52:597–605.

Rowe, A. H., Y. Xiao, M. P. Rowe, T. R. Cummins, and H. H. Zakon. 2013. Voltage-Gated Sodium Channel in Grasshopper Mice Defends Against Bark Scorpion Toxin. Science 342:441–446.

Ruffer, D. G. 1965. Burrows and Burrowing Behavior of Onychomys leucogaster. Journal of Mammalogy 46:241–247.


How do people use the Portal Data?

October 27, 2017 by

Every so often, someone asks me for a Portal reading list to come up to speed on what we know about the site. This seems like a simple question, but it is actually pretty difficult. We define a “Portal Paper” as a paper using data collected at the study site (whether or not Portal Project people were involved), or data collected near the study site if that data was collected by the project or with substantive assistance from our project. Over the years, the site has contributed to over 120 papers and book chapters (our current estimate is 123, but we still find older papers that we didn’t know existed).

How the Portal Data is used has been changing in recent years. Historically, most papers were by people affiliated with the group. As we’ve blogged about before, starting in 2009 we have been working on making our data openly available through a number of venues. We post all of our data on the Portal GitHub Repo and we have also published two Data Papers through Ecology’s Ecological Archives. The nice thing about Data Papers is that they are indexed and cited just like regular scholarly papers, which is important because it allows us to 1) document that the Portal Project is a valuable scientific resource to the community (which theoretically may be helpful on grant applications) and 2) keep informed of results coming out of the site that we’re not involved in.

So, how are scientists external to our group using our openly available data? Google Scholar lists 22 citations between the 2 data papers. (For the non-academics, citing other papers in our own papers is an important part of scientific publishing. It allows us to give credit to those whose ideas, data, or methods we are working with. It also allows us to provide proof or support that statements we make in our papers are supported by things other people have been finding. Google Scholar is a database that keeps track of these citations.) Here’s the breakdown of what Google Scholar says has been citing our Data Papers:

All the site-focused research (papers that use our data as the primary focus of their analysis) was done by, or in collaboration with, someone affiliated with the project. If other researchers are using our data, so far they tend to either use it as part of a meta-analysis (i.e. as one of many data points in the analysis) or to make a figure for their statistical or conceptual paper that has an empirical example of what they are talking about. (Three papers cite a Data Paper for reasons that defy classification. After reading their papers I have no idea why they cited us!) The number of citations listed by Google Scholar is probably a little lower than the data’s actual use in papers, because data citations for meta-analyses often get shoved off into the supplementary materials and are not indexed by Google Scholar as a result. Our usage in meta-analyses is probably higher, but it is unlikely that we’ve missed a paper focused solely on data from our site.

We are hoping to increase Portal’s usability and we have some things in the works, which we will blog about later, that we hope will make it easier for people to get the data they need to use Portal as part of their analyses. We love seeing the data used but know that the history of the site and all the manipulation changes can make it difficult to figure out how to extract the data you need.

 

The Portal Weather Station

October 20, 2017 by

For the entire history of the project, weather monitoring has accompanied the collection of rodent, plant, and ant data. At first, this was done manually. Portalites from 1980 to 1989 measured rain in a rain gauge, and used something called a hygrothermograph to measure temperature and humidity.

Hygrothermograph

Then things started to get fancy. In 1989, an automated weather station was installed. This is the desert though, and leaving expensive toys out in the rain, dust, and lightning takes its toll.

Sunset in the desert jungle

At least the lightning storms leave us with some nice scenery after they try to blow up our weather station.

All things considered, our weather stations have stood up pretty well. The first lasted from 1989 until 2002. And the station from 2002 is still limping along, although it’s had its moments (it tends to have a bit of a tantrum after being struck by lightning). We connect to the dataloggers for those stations directly. That is, as part of their monthly duties, the rodent RA connects to the datalogger, downloads the data, and brings it back to the lab for checking and appending to the database.

Anticipating the 2002 station’s impending demise, in August 2016 we upgraded to a new station, and took the opportunity to make some improvements.

The majestic new station

Of course we continue to collect data on precipitation, temperature, and humidity. But we’ve also gotten to add a wind sensor (wind speed and direction), pyranometer (solar radiation), and barometer (atmospheric pressure). Having these additional data means that we can also calculate things like evapotranspiration, sunshine hours, and windchill. We have also added a new program to collect fine-scale precipitation data during storms. When a precipitation event begins, the datalogger begins recording total precipitation every 5 minutes until the storm ends.
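The storm program's logic looks roughly like this toy Python sketch (the real datalogger runs its own program language, and the tip size here is a made-up assumption, not our gauge's actual calibration): start an event at the first bucket tip, record the cumulative total each 5-minute interval, and stop at the first rain-free interval.

```python
def storm_records(tips_per_interval, mm_per_tip=0.254):
    """Cumulative precipitation per 5-minute interval during a storm.

    `tips_per_interval` is the number of rain-gauge bucket tips in each
    consecutive 5-minute interval; the event starts at the first tip and
    ends at the first rain-free interval. The 0.254 mm-per-tip value is
    a hypothetical example, not the station's actual calibration.
    """
    records, total, in_storm = [], 0.0, False
    for tips in tips_per_interval:
        if tips > 0:
            in_storm = True
        elif in_storm:
            break  # a rain-free interval ends the event
        if in_storm:
            total += tips * mm_per_tip
            records.append(round(total, 3))
    return records


# A quiet interval, two rainy ones, then the storm ends:
print(storm_records([0, 2, 1, 0, 5]))
```

The later `5` never gets counted here; a real program would treat it as the start of a new event rather than stopping.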

The addition of a cellular modem is another major improvement. Rather than downloading the data monthly in the field, we access them remotely. The data trickle into our data repo whenever edits are made to trigger a new build, or at least once a week, and quality control happens automatically. Our station has a Wunderground account (from whence the fancy little widget in the sidebar comes). And we’ve mounted the phenocam (featured in an earlier post, and another widget in the sidebar) to it.

Aside from just being darn cool, the upgrades have improved our data collection. We can see what the weather has been at our exact location at any time. That means we can know what to expect from the plants before we go for a census (as much as that’s possible). And we can communicate with the datalogger at any time. If something is wrong with the weather station, we’ll know immediately. It may be possible to fix the problem remotely. If not, the rodent RA can plan to fix it while she’s down there, instead of discovering the problem at the site, waiting until the next month to fix it, and losing at least a month’s worth of data. And we can always send new programs to the datalogger, if we want to add new data tables or make improvements.

Find our weather data, updated sub-weekly, on the Portal Data GitHub repository.