Integrating and cleaning biodiversity data: Workflows to model ranges and merge associated ecological, phylogenetic, and trait information.


Cory Merow, University of Connecticut

Matt Aiello-Lammens, Pace University

Robert P. Anderson, City College of New York

Brad Boyle, University of Arizona

Brian J. Enquist, University of Arizona

Jamie M. Kass, City College of New York, CUNY

Brian McGill, University of Maine

Brian Maitner, University of Arizona

Naia Morueta-Holme, University of California, Berkeley

Andrew Kerkhoff, Kenyon College

Jens-Christian Svenning, Aarhus University


Large biodiversity databases are emerging that allow biogeographic research to cover larger spatial extents, higher spatial resolutions, and more taxonomic groups, using a variety of occurrence data types. In this workshop, we will use a series of vignettes to provide hands-on training in emerging ecoinformatics tools for biogeography. We will focus on modeling workflows, data-cleaning practices, and new modeling algorithms, with the collective aim of improving species' range predictions. We will use one of the largest botanical databases in the world, the Botanical Information and Ecology Network (BIEN), which holds over 83 million records covering occurrence, range, community, and trait data for all New World land plants. We will illustrate a wide range of tools for accessing these biodiversity data with the new R package RBIEN. This introduction will emphasize data-cleaning challenges, including protocols and tools that help users further refine data quality.

Next, these data will be integrated into a novel range-modeling workflow via the R-based software Wallace, which facilitates flexible, reproducible model building, evaluation, and visualization. Wallace is a valuable resource both for beginners who prefer point-and-click software and for experienced R users, who can develop new modules that add novel and alternative functionality. Participants will conduct niche/distributional modeling analyses, with a focus on how different modeling decisions influence predictions. Illustrations will include traditional Maxent models and recently developed ‘Minxent’ models that incorporate other occurrence data types alongside presence-only data. We will then illustrate synthetic biodiversity and community analyses that are facilitated by RBIEN and Wallace and draw on the ~90,000 range maps publicly available from BIEN.

We will close with a group discussion on how to develop these tools further to link trait, community, phylogenetic, and range data, and on how best to serve the research, conservation/management, and educational communities.