Introduction to the analysis of messy data


Dr. Péter Sólymos

Alberta Biodiversity Monitoring Institute and the Boreal Avian Modelling Project


Modeling biogeographical patterns and processes at large spatial and temporal scales often requires integrating disparate data sources and processing large numbers of observations. Heterogeneities introduced by data integration can bias analyses when data sets are combined inadequately, and can lead to information loss when the data are filtered and standardized to a common format.

Analysts of big and messy data sets need to be comfortable manipulating the data, need a full understanding of the mechanics of the models being used (i.e., they must be able to interpret results critically and acknowledge assumptions and limitations), and should be able to make informed choices when faced with methodological challenges.

In this workshop we will cover the following topics: data exploration and manipulation; an overview of occupancy, abundance, and presence-only models; dealing with nuisance variables and biases; quantifying uncertainty; and high-performance tools for efficient data processing and analysis.

The workshop will emphasize critical thinking and active learning through hands-on programming in R. We will use freely available, open-source R packages and publicly available data sets to demonstrate the manipulation and analysis of large-scale, messy data sets. The expected outcome of the workshop is a solid foundation for further professional development through increased confidence in applying the methods.