2020 Changelog
Data Version: 2020 (available June 2022)
Citation:
Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, O. Robinson, S.
Ligocki, W. Hochachka, L. Jaromczyk, C. Wood, I. Davies, M. Iliff, L.
Seitz. 2021. eBird Status and Trends, Data Version: 2020; Released:
2021. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.2173/ebirdst.2020
Workflow and Code Changes
Spatiotemporal Partitioning
- CHANGED: The adaptive partitioning algorithm (AdaSTEM) now uses an
icosahedral gnomonic projection, which generates partitions with largely
conformal stixel boundaries across the globe.
- CHANGED: The temporal width of AdaSTEM partitions has been changed
from 30.5 days to 28 days.
Model Ensemble
- CHANGED: The percent above threshold (PAT) cutoff that a 3km grid cell
must meet to be reported as present has changed from 0.1 to 0.143. This
accommodates the higher occurrence rates that result from including
hourly weather to account for variation in detection rates.
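The PAT rule can be illustrated with a minimal sketch. It assumes that PAT is the fraction of ensemble models whose occurrence estimate for a cell exceeds that model's own binary threshold, and that a cell meeting the cutoff exactly counts as present. The function names are hypothetical and not part of any eBird code.

```python
from collections.abc import Sequence

PAT_CUTOFF = 0.143  # previous version: 0.1


def percent_above_threshold(predictions: Sequence[float], thresholds: Sequence[float]) -> float:
    """Fraction of ensemble models whose occurrence estimate exceeds that model's threshold.

    Assumes one prediction and one threshold per model, in matching order.
    """
    if len(predictions) != len(thresholds):
        raise ValueError("expected one threshold per model")
    if not predictions:
        raise ValueError("the ensemble is empty")
    above = sum(1 for p, t in zip(predictions, thresholds) if p > t)
    return above / len(predictions)


def reported_present(predictions, thresholds, cutoff: float = PAT_CUTOFF) -> bool:
    """A cell is reported as present when its PAT reaches the cutoff (assumed inclusive)."""
    return percent_above_threshold(predictions, thresholds) >= cutoff


# 12 of 100 models exceed their threshold, so PAT = 0.12
preds = [0.5] * 12 + [0.05] * 88
thresholds = [0.2] * 100
print(percent_above_threshold(preds, thresholds))  # 0.12
print(reported_present(preds, thresholds))  # False under the new 0.143 cutoff
print(reported_present(preds, thresholds, cutoff=0.1))  # True under the old 0.1 cutoff
```

In this example, a cell that the old 0.1 cutoff would have reported as present is no longer reported as present under 0.143.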
Resident Methodology
- CHANGED: Resident species now have their own set of settings, designed
for species with strong spatiotemporal stationarity. These include:
- Each stixel loads the full year of training and test data, not just
the 28-day window associated with the given stixel.
- The DAY predictor is encoded cyclically using sine and cosine
transformations, allowing the model to wrap around the end of the year.
- The spatiotemporal grid sampling now seeks a maximum sample size of
65,000 checklists in a given stixel (for migrants this value is
5,000).
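The sketch below shows one common way to encode day of year cyclically with sine and cosine. It assumes a 365-day period and days numbered from 1; the function name is hypothetical, and the exact scaling used in the Status and Trends workflow is not stated in this changelog.

```python
import math

DAYS_PER_YEAR = 365.0  # assumption: leap years are ignored in this sketch


def encode_day_cyclic(day_of_year: int, period: float = DAYS_PER_YEAR) -> tuple[float, float]:
    """Map a day of year (1..365) onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * (day_of_year - 1) / period
    return math.sin(angle), math.cos(angle)


jan_1 = encode_day_cyclic(1)
dec_31 = encode_day_cyclic(365)
jul_2 = encode_day_cyclic(183)

# On the circle, Dec 31 sits right next to Jan 1, while early July lies on the opposite side.
print(round(math.dist(jan_1, dec_31), 4))
print(round(math.dist(jan_1, jul_2), 4))
```

With a single linear DAY value, Dec 31 and Jan 1 would be as far apart as possible; the two cyclic features remove that artificial break at the year boundary.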
Data Products
- CHANGED: The effort variables used for count model predictions are
now fixed at 1 hour and 1 kilometer. Previously, count model predictions
used the same effort values as the occurrence model, which sought to
maximize detection by optimizing the distance and duration effort
variables to capture as much signal as possible, up to 12 hours (6 hours
in this version) and 10 kilometers.
- CHANGED: Zeroes in data products that fall outside a species'
prediction area (also known as assumed zeroes) are now reported only
when, averaged across the up to 100 models in the ensemble, at least 0.5%
of 3km grid cells contain at least 1 checklist for a given week.
Previously, this threshold was 0.1% of 3km grid cells. The change gives a
more appropriately conservative representation of where absence can be
assumed based on overall data volume.
- ADDED: Locations (3km grid cells) with less than a 0.5% mean site
selection probability are now masked out of the final data products and
reported as NA. Mean site selection probability is calculated weekly in
a species-agnostic AdaSTEM workflow that estimates the probability that
a location of a given habitat configuration will be visited in a given
region and season.
- ADDED: Spatial representations of predictive performance metrics and
other individual model-level summaries are now generated as 27km
GeoTIFFs for each week of the year. Each stixel-level value is assigned
to every 27km grid cell within the stixel, and values are then averaged
across overlapping stixels to produce regional metrics.
- FIXED: The Caspian Sea is now masked out of all data products.
- CHANGED: Raw test data that does not receive model predictions has
been removed from the calculation of predictive performance metrics.
Previously, this type of test data was used as a form of assumed absence
in the calculation of binary predictive performance metrics.
- ADDED: Predictions to 3km grid cells now include a standardization
of hourly weather within each individual model. The hourly weather
values set for prediction are based on a maximization of occurrence
estimates between the 80th and 90th percentiles.
- CHANGED: Calculation of individual model partial dependencies now
uses out-of-bag training data. Previously, in-bag training data was used.
- ADDED: Predictor Importance and Partial Dependency products are now
included for both occurrence rate and count models. Previously, these
products were only available for the occurrence rate model.
- CHANGED: The time covariate used in the models, calculated as the
difference between the local checklist time and solar noon at the
checklist location, has been changed to use the temporal midpoint of the
checklist for the calculation. Previously, the time at the start of the
checklist had been used for this calculation.
- FIXED: The temporal centroid of individual models, used with
predictor importance and partial dependencies, now represents the mean
date of the in-bag training data. Previously, it was the mean over all
training data, test data, and all four weeks of 3km grid cell location
data.
- CHANGED: Regional habitat association charts are based on a weighted
summary of stixel-level predictor importance and partial dependence
estimates, with each stixel weighted by the proportion of the region it
covers. Previously, stixel centroids determined which stixels contributed
to a given region, and the overlap-based weighting was computed from
crude approximations of stixels as rectangles in lat-lon coordinates.
Now, the exact stixel shape, represented by the set of 27km grid cells
falling within each stixel, determines both the set of stixels used in a
region's habitat summary and their overlap-based weights.
- CHANGED: Habitat and regional abundance and range statistical
summaries are now computed for all species, globally, using the Natural
Earth Data Admin 1 data for summarization.
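The assumed-zero rule described under Data Products can be sketched as follows. This assumes each model supplies, for a given week, the number of checklists in each 3km grid cell of its extent; the coverage fraction is averaged across the ensemble and compared against the 0.5% threshold. Function names and the input layout are hypothetical.

```python
from collections.abc import Sequence
from statistics import fmean

MIN_MEAN_COVERAGE = 0.005  # 0.5% of 3km cells; previous version: 0.001 (0.1%)


def checklist_coverage(checklists_per_cell: Sequence[int]) -> float:
    """Fraction of a model's 3km grid cells holding at least one checklist this week."""
    if not checklists_per_cell:
        return 0.0
    filled = sum(1 for n in checklists_per_cell if n >= 1)
    return filled / len(checklists_per_cell)


def can_report_assumed_zero(per_model_counts: Sequence[Sequence[int]],
                            min_mean_coverage: float = MIN_MEAN_COVERAGE) -> bool:
    """True when ensemble-averaged checklist coverage is high enough to report a zero."""
    if not per_model_counts:
        return False
    mean_coverage = fmean(checklist_coverage(c) for c in per_model_counts)
    return mean_coverage >= min_mean_coverage


model_a = [1] * 3 + [0] * 997  # 0.3% of cells filled
model_b = [2] * 8 + [0] * 992  # 0.8% of cells filled
print(can_report_assumed_zero([model_a, model_b]))  # True: mean coverage is 0.55%
print(can_report_assumed_zero([model_a]))  # False at 0.5%, though it passes the old 0.1% rule
```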
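The site-selection mask described under Data Products amounts to a per-cell filter. This minimal sketch assumes weekly per-cell estimates and mean site selection probabilities arrive as parallel sequences, and uses NaN to stand in for NA. The names are hypothetical, and estimating the site selection probability itself (the species-agnostic AdaSTEM workflow) is not shown.

```python
import math
from collections.abc import Sequence

MIN_SITE_SELECTION = 0.005  # cells below a 0.5% mean site selection probability are masked


def mask_low_site_selection(estimates: Sequence[float],
                            site_selection: Sequence[float],
                            min_prob: float = MIN_SITE_SELECTION) -> list[float]:
    """Replace estimates with NaN (standing in for NA) where site selection is too low."""
    if len(estimates) != len(site_selection):
        raise ValueError("expected one site selection probability per cell")
    return [est if prob >= min_prob else math.nan
            for est, prob in zip(estimates, site_selection)]


weekly = mask_low_site_selection([0.20, 0.40, 0.0], [0.010, 0.004, 0.500])
print(weekly)  # [0.2, nan, 0.0]
```

Note that the third cell keeps its estimate of 0.0: the mask depends only on site selection, not on the species estimate.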
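The spatialization of stixel-level metrics onto the 27km grid, described under Data Products, can be sketched as below. It assumes each stixel's footprint is already available as a list of 27km grid cell IDs and takes a plain unweighted mean across the stixels covering a cell. The names are hypothetical, and writing the result out as a GeoTIFF is not shown.

```python
from collections import defaultdict
from collections.abc import Hashable, Iterable, Mapping


def spatialize(stixel_values: Mapping[Hashable, float],
               stixel_cells: Mapping[Hashable, Iterable[Hashable]]) -> dict:
    """Assign each stixel's value to its 27km cells, then average across overlapping stixels."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for stixel_id, value in stixel_values.items():
        for cell in stixel_cells.get(stixel_id, ()):
            totals[cell] += value
            counts[cell] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


# Two overlapping stixels that share 27km cell "c2"
metric = {"s1": 0.75, "s2": 0.25}
cells = {"s1": ["c1", "c2"], "s2": ["c2", "c3"]}
print(spatialize(metric, cells))  # {'c1': 0.75, 'c2': 0.5, 'c3': 0.25}
```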
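The change to the time covariate under Data Products shifts the reference point from the checklist start to its temporal midpoint. This minimal sketch assumes local solar noon for the checklist location and date is supplied as an input (computing it requires a solar-position routine that is not shown). The function name is hypothetical.

```python
from datetime import datetime, timedelta


def hours_from_solar_noon(start: datetime, duration_minutes: float, solar_noon: datetime) -> float:
    """Hours between the checklist's temporal midpoint and local solar noon (negative = before noon)."""
    midpoint = start + timedelta(minutes=duration_minutes / 2)
    return (midpoint - solar_noon).total_seconds() / 3600.0


start = datetime(2020, 5, 10, 7, 0)
noon = datetime(2020, 5, 10, 12, 30)  # assumed solar noon for the checklist location
print(hours_from_solar_noon(start, 120, noon))  # -4.5 (midpoint is 08:00)
print(hours_from_solar_noon(start, 0, noon))  # -5.5, the value the start-time method gives
```

For a 2-hour checklist starting at 07:00, using the midpoint moves the covariate a full hour closer to solar noon, which better reflects when most of the observing took place.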
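The overlap-based weighting for regional habitat associations, described under Data Products, can be sketched as a weighted mean. This assumes each stixel's exact shape is given as its set of 27km grid cell IDs and that each weight is the fraction of the region's cells falling inside that stixel. Normalizing by the total weight, so that overlapping stixels still yield a proper average, is an assumption of this sketch, and the names are hypothetical.

```python
from collections.abc import Hashable, Iterable, Mapping
from typing import Optional


def regional_summary(stixel_values: Mapping[Hashable, float],
                     stixel_cells: Mapping[Hashable, Iterable[Hashable]],
                     region_cells: Iterable[Hashable]) -> Optional[float]:
    """Overlap-weighted mean of stixel-level estimates for one region.

    Each stixel's weight is the share of the region's 27km cells that fall inside it;
    stixels with no cells in the region do not contribute.
    """
    region = set(region_cells)
    if not region:
        raise ValueError("the region contains no 27km grid cells")
    weighted_sum = 0.0
    weight_total = 0.0
    for stixel_id, value in stixel_values.items():
        overlap = len(region.intersection(stixel_cells.get(stixel_id, ())))
        if overlap == 0:
            continue  # stixel does not touch the region
        weight = overlap / len(region)
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        return None  # no stixel covers the region
    return weighted_sum / weight_total


region = ["c1", "c2", "c3", "c4"]
stixel_cells = {"s1": ["c1", "c2", "c3", "c9"], "s2": ["c4", "c8"], "s3": ["c7"]}
importance = {"s1": 0.5, "s2": 1.0, "s3": 0.9}
print(regional_summary(importance, stixel_cells, region))  # 0.625
```

Here s1 covers 3 of the region's 4 cells (weight 0.75), s2 covers 1 (weight 0.25), and s3 lies entirely outside the region, so it is excluded.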
Expert Review
- CHANGED: Animations are no longer being reviewed for resident
species.