This vignette explores the ws_monitor data model used throughout the PWFSLSmoke package to store and work with monitoring data.
The PWFSLSmoke package is designed to provide a compact, full-featured suite of utilities for working with PM 2.5 data used to monitor wildfire smoke. A uniform data model provides consistent access to monitoring data from different agencies. The core data model in this package is defined by the ws_monitor object, which stores data associated with groups of individual monitors.
To work efficiently with the package, it is important to understand the structure of this data object and which functions operate on it. Package functions whose names begin with monitor_ expect objects of class ws_monitor as their first argument. (‘ws_’ stands for ‘wildfire smoke’.)
Monitoring data will typically be obtained from an agency charged with archiving data acquired at monitoring sites. For wildfire smoke, the primary pollutant of interest is PM 2.5, and the sources archiving this data include EPA, AirNow, AIRSIS and WRCC.
The data model for monitoring data consists of an R list with two dataframes: data and meta.
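To make this layout concrete, the base R sketch below assembles a tiny object by hand with the same shape. The monitor IDs and values are invented, only three of the guaranteed meta columns are filled in, and in practice ws_monitor objects come from package functions rather than from code like this. The class is set with structure() so that class() reports the same two classes shown for WA in Example 1.

```r
# Hand-built stand-in for a ws_monitor object (hypothetical IDs and values)
ids <- c("000000001_01", "000000002_01")

# 'data': one row per hourly timestep; first column is 'datetime' in UTC
data <- data.frame(
  datetime = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = 3),
  a = c(5.1, 6.0, 7.2),
  b = c(12.4, NA, 15.3)
)
names(data) <- c("datetime", ids)  # data columns are named by monitorID

# 'meta': one row per monitor deployment (only a few columns shown here)
meta <- data.frame(
  monitorID = ids,
  longitude = c(-120.0, -121.0),
  latitude  = c(47.0, 48.0),
  stringsAsFactors = FALSE
)
rownames(meta) <- meta$monitorID   # meta rows are named by monitorID

# A list of the two dataframes, tagged with the ws_monitor class
ws <- structure(list(data = data, meta = meta), class = c("ws_monitor", "list"))
class(ws)
```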
The data dataframe contains all hourly measurements, organized with rows (the ‘unlimited’ dimension) as unique timesteps and columns as unique monitor deployments. The very first column is always named datetime and contains the POSIXct datetime in Coordinated Universal Time (UTC).
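Because datetime is stored in UTC, local clock times only appear when the values are formatted for a specific timezone. A minimal base R sketch, using an illustrative timestamp and the America/Los_Angeles timezone:

```r
# POSIXct values stored in UTC, as in the 'datetime' column (illustrative times)
datetime <- seq(as.POSIXct("2015-08-01 07:00:00", tz = "UTC"),
                by = "hour", length.out = 3)

attr(datetime, "tzone")  # the timezone attribute is "UTC"

# Display the same instants as Pacific clock times (PDT is UTC-7 in August)
local_labels <- format(datetime, format = "%Y-%m-%d %H:%M",
                       tz = "America/Los_Angeles")
local_labels
```

The underlying instants are never modified; format() only changes how they are displayed.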
The meta dataframe contains all metadata associated with monitor deployment sites and is organized with rows as unique monitor deployments and columns as site attributes. The following columns are guaranteed to exist in the meta dataframe:
monitorID – unique ID for each site-instrument combination
longitude – decimal degrees East
latitude – decimal degrees North
elevation – meters above sea level
timezone – Olson timezone
countryCode – ISO 3166-1 alpha-2 code
stateCode – ISO 3166-2 alpha-2 code
(The MazamaSpatialUtils package is used to assign timezones and state and country codes.)
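A quick way to confirm that a meta dataframe carries the guaranteed columns is a set difference against the list above. The helper name missing_meta_cols() and the toy values are hypothetical; this is not a package function.

```r
# Columns guaranteed to exist in 'meta', per the list above
required_meta_cols <- c("monitorID", "longitude", "latitude", "elevation",
                        "timezone", "countryCode", "stateCode")

# Hypothetical helper: return the guaranteed columns that 'meta' lacks
missing_meta_cols <- function(meta) {
  setdiff(required_meta_cols, colnames(meta))
}

# Toy metadata with every guaranteed column present (hypothetical values)
meta_ok <- data.frame(
  monitorID = "000000001_01", longitude = -120.5, latitude = 47.5,
  elevation = 500, timezone = "America/Los_Angeles",
  countryCode = "US", stateCode = "WA",
  stringsAsFactors = FALSE
)
# The same metadata with most guaranteed columns dropped
meta_bad <- meta_ok[, c("monitorID", "longitude", "latitude")]

missing_meta_cols(meta_ok)   # nothing missing
missing_meta_cols(meta_bad)  # the four dropped column names
```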
Starting with version 1.0 of the package, the following additional columns (mostly for internal use) will always exist:
siteName – familiar name for a monitoring site
countyName – county/province name
msaName – US Census Bureau ‘Metropolitan/Micropolitan Statistical Area’
agencyName – agency responsible for collecting the data
monitorType – broad instrument category (E-Sampler, EBAM or BAM-1020)
siteID – unique identifier for each site
instrumentID – sequential identifier for each instrument at a single site
aqsID – AQS site identifier (often used as the siteID)
pwfslID – PWFSL site identifier (used as the siteID for temporary monitors)
pwfslDataIngestSource – identifier for the source of monitoring data (e.g. AIRNOW, AIRSIS, WRCC_DUMPFILE)
telemetryAggregator – data provider for temporary monitors (e.g. ‘wrcc’ or ‘usfs.airsis’)
telemetryUnitID – unique ID for each monitoring site used within the telemetryAggregator
The contents of these additional columns vary much more and, depending on the data source, may include many missing values.
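To see how sparse the optional columns are in a given object, counting missing values per column is usually enough. The sketch below uses a made-up meta dataframe; with a real object you would pass its meta element instead.

```r
# Toy 'meta' with a few of the optional columns (hypothetical values)
meta <- data.frame(
  monitorID = c("000000001_01", "000000002_01", "000000003_01"),
  siteName  = c("Site A", NA, "Site C"),
  aqsID     = c("000000001", "000000002", NA),
  pwfslID   = c(NA, NA, "tmp.001"),
  stringsAsFactors = FALSE
)

# Number of missing values in each column
na_counts <- colSums(is.na(meta))
na_counts
```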
It is important to note that the monitorID acts as a unique key that connects data with metadata. The monitorID is used for column names in the data dataframe and for row names in the meta dataframe, so the following will always be true:
rownames(ws_monitor$meta) == ws_monitor$meta$monitorID
colnames(ws_monitor$data) == c('datetime', ws_monitor$meta$monitorID)
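The two invariants can be wrapped in a small check. is_consistent() is a hypothetical helper, not part of PWFSLSmoke. It uses identical() rather than the element-wise == shown above, so a length or ordering mismatch yields a single FALSE instead of a recycled comparison, and it assumes monitorID is stored as character.

```r
# Hypothetical helper: check the monitorID key linking 'meta' and 'data'
is_consistent <- function(ws) {
  ids <- ws$meta$monitorID
  identical(rownames(ws$meta), ids) &&
    identical(colnames(ws$data), c("datetime", ids))
}

# Toy object that satisfies both invariants (hypothetical IDs and values)
ids <- c("000000001_01", "000000002_01")
meta <- data.frame(monitorID = ids, stringsAsFactors = FALSE)
rownames(meta) <- ids
data <- data.frame(
  datetime = as.POSIXct(c("2015-08-01 00:00", "2015-08-01 01:00"), tz = "UTC"),
  x = c(5.1, 6.0),
  y = c(12.4, NA)
)
names(data) <- c("datetime", ids)
ws <- list(data = data, meta = meta)
is_consistent(ws)

# Reordering the data columns without reordering 'meta' breaks the key
ws_bad <- ws
ws_bad$data <- ws$data[, c("datetime", rev(ids))]
is_consistent(ws_bad)
```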
Example 1: Exploring ws_monitor Objects
We will use the built-in “Northwest_Megafires” dataset and the monitor_subset() function to subset a ws_monitor object, which we can then explore.
suppressPackageStartupMessages(library(PWFSLSmoke))
# Get some airnow data for Washington state in August 2015
# NOTE: 'tlim' is interpreted as UTC unless we specify 'timezone'
N_M <- monitor_subset(Northwest_Megafires,
                      tlim = c(20150801, 20150831),
                      timezone = "America/Los_Angeles")
WA <- monitor_subset(N_M, stateCodes = 'WA')
# 'ws_monitor' objects can be identified by their class
class(WA)
## [1] "ws_monitor" "list"
# Examine the 'meta' dataframe
dim(WA$meta)
## [1] 55 19
rownames(WA$meta)
## [1] "530330017_01" "530330080_01" "530050002_01" "530330024_01" "530330057_01"
## [6] "530332004_01" "530530029_01" "530530031_01" "530610005_01" "530611007_01"
## [11] "530630047_01" "530670013_01" "530531018_01" "530272002_01" "530310003_01"
## [16] "530730015_01" "530251002_01" "530650004_01" "530010003_01" "530750006_01"
## [21] "530750003_01" "530331011_01" "530210002_01" "530330037_01" "530710005_01"
## [26] "530750005_01" "530150015_01" "530470009_01" "530370002_01" "530090013_01"
## [31] "530610020_01" "530070010_01" "530770015_01" "530650002_01" "530470010_01"
## [36] "530770009_01" "530570015_01" "530130002_01" "530030004_01" "530110022_01"
## [41] "530579999_01" "530639997_01" "530299999_01" "530639996_01" "530410004_01"
## [46] "530770016_01" "530090015_01" "530450007_01" "530470013_01" "530570011_01"
## [51] "530350007_01" "530070011_01" "530330030_01" "530110024_01" "530090017_01"
colnames(WA$meta)
## [1] "monitorID" "longitude" "latitude"
## [4] "elevation" "timezone" "countryCode"
## [7] "stateCode" "siteName" "agencyName"
## [10] "countyName" "msaName" "monitorType"
## [13] "siteID" "instrumentID" "aqsID"
## [16] "pwfslID" "pwfslDataIngestSource" "telemetryAggregator"
## [19] "telemetryUnitID"
# Examine the 'data' dataframe
dim(WA$data)
## [1] 721 56
colnames(WA$data)
## [1] "datetime" "530330017_01" "530330080_01" "530050002_01" "530330024_01"
## [6] "530330057_01" "530332004_01" "530530029_01" "530530031_01" "530610005_01"
## [11] "530611007_01" "530630047_01" "530670013_01" "530531018_01" "530272002_01"
## [16] "530310003_01" "530730015_01" "530251002_01" "530650004_01" "530010003_01"
## [21] "530750006_01" "530750003_01" "530331011_01" "530210002_01" "530330037_01"
## [26] "530710005_01" "530750005_01" "530150015_01" "530470009_01" "530370002_01"
## [31] "530090013_01" "530610020_01" "530070010_01" "530770015_01" "530650002_01"
## [36] "530470010_01" "530770009_01" "530570015_01" "530130002_01" "530030004_01"
## [41] "530110022_01" "530579999_01" "530639997_01" "530299999_01" "530639996_01"
## [46] "530410004_01" "530770016_01" "530090015_01" "530450007_01" "530470013_01"
## [51] "530570011_01" "530350007_01" "530070011_01" "530330030_01" "530110024_01"
## [56] "530090017_01"
# This should always be true
all(rownames(WA$meta) == colnames(WA$data[,-1]))
## [1] TRUE
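Since monitorID is the key for both dataframes, the same ID string selects a site's metadata row and its data column. The sketch below uses a hand-built stand-in for WA with hypothetical IDs, so it runs without the package data.

```r
# Hand-built stand-in for WA (hypothetical IDs and values)
ids <- c("000000001_01", "000000002_01")
meta <- data.frame(monitorID = ids, siteName = c("Site A", "Site B"),
                   stringsAsFactors = FALSE)
rownames(meta) <- ids
data <- data.frame(
  datetime = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = 3),
  a = c(5.1, 6.0, 7.2),
  b = c(12.4, NA, 15.3)
)
names(data) <- c("datetime", ids)

# The same monitorID indexes a row of 'meta' and a column of 'data'
id <- "000000002_01"
site_meta   <- meta[id, ]                 # one row of metadata
site_series <- data[, c("datetime", id)]  # datetime plus that monitor's values
site_meta$siteName
```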
Example 2: Manipulating ws_monitor Objects
The PWFSLSmoke package has numerous functions that work with ws_monitor objects, all of which begin with monitor_. If you need to do something that the package functions do not provide, you can manipulate ws_monitor objects directly as long as you retain the structure of the data model.
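As an illustration of direct manipulation that preserves the data model, the hypothetical helper below subsets by monitorID and updates both dataframes together. It is a sketch for a hand-built object, not a replacement for monitor_subset(), which also handles time limits and other criteria.

```r
# Hypothetical helper (not part of PWFSLSmoke): keep only the given monitors,
# subsetting 'meta' rows and 'data' columns in the same order
subset_by_ids <- function(ws, ids) {
  ws$meta <- ws$meta[ids, , drop = FALSE]
  ws$data <- ws$data[, c("datetime", ids), drop = FALSE]
  ws  # the list keeps its class attribute
}

# Toy object with three monitors (hypothetical IDs and values)
all_ids <- c("000000001_01", "000000002_01", "000000003_01")
meta <- data.frame(monitorID = all_ids, stringsAsFactors = FALSE)
rownames(meta) <- all_ids
data <- data.frame(
  datetime = seq(as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
                 by = "hour", length.out = 2),
  a = c(1, 2), b = c(3, 4), c = c(5, 6)
)
names(data) <- c("datetime", all_ids)
ws <- structure(list(data = data, meta = meta), class = c("ws_monitor", "list"))

# Keep two monitors, in a new order; both dataframes stay aligned
sub <- subset_by_ids(ws, c("000000003_01", "000000001_01"))
colnames(sub$data)
```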
Functions that accept and return ws_monitor objects include:
monitor_aqi()
monitor_collapse()
monitor_dailyStatistic()
monitor_dailyThreshold()
monitor_join()
monitor_nowcast()
monitor_reorder()
monitor_replaceData()
monitor_rollingMean()
monitor_scaleData()
monitor_subset()
monitor_subsetBy()
monitor_subsetByDistance()
monitor_trim()
These functions can be used with the magrittr package %>% pipe, as in the following example:
library(magrittr)  # provides the %>% pipe
# Calculate daily means for the Methow Valley from monitors in Twisp and Winthrop
TwispID <- '530470009_01'
WinthropID <- '530470010_01'
Methow_Valley_JulyMeans <- Northwest_Megafires %>%
  monitor_subset(monitorIDs = c(TwispID, WinthropID)) %>%
  monitor_collapse(monitorID = 'MethowValley') %>%
  monitor_subset(tlim = c(20150701, 20150731), timezone = 'America/Los_Angeles') %>%
  monitor_dailyStatistic(minHours = 18)
# Look at the first week
Methow_Valley_JulyMeans$data[1:7,]
## datetime MethowValley
## 1 2015-07-01 5.06875
## 2 2015-07-02 5.45625
## 3 2015-07-03 6.64375
## 4 2015-07-04 10.60625
## 5 2015-07-05 10.33750
## 6 2015-07-06 13.93750
## 7 2015-07-07 30.68542
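The two package steps used above, collapsing monitors and computing a daily statistic with a minimum-hours requirement, can be approximated in base R. This is our own sketch of the idea, not the package's implementation: it averages monitors hour by hour, groups hours by local date, and reports a day's mean only when at least min_hours hourly values are present. The toy series and the threshold handling are assumptions.

```r
# Toy hourly values for two monitors over two local days (hypothetical values)
datetime <- seq(as.POSIXct("2015-07-01 07:00:00", tz = "UTC"),
                by = "hour", length.out = 48)   # 00:00 PDT on July 1 onward
twisp    <- rep(c(4, 6), times = 24)
winthrop <- rep(c(6, 8), times = 24)
winthrop[25:48] <- NA   # day 2: Winthrop reports nothing
twisp[25:36]    <- NA   # day 2: Twisp reports only 12 hours

# 1) Collapse: average across monitors at each hour
collapsed <- rowMeans(cbind(twisp, winthrop), na.rm = TRUE)
collapsed[is.nan(collapsed)] <- NA   # hours with no data at all

# 2) Daily mean by local date, requiring at least 'min_hours' valid hours
local_date <- format(datetime, "%Y-%m-%d", tz = "America/Los_Angeles")
daily_mean <- function(x, min_hours = 18) {
  if (sum(!is.na(x)) < min_hours) NA_real_ else mean(x, na.rm = TRUE)
}
daily <- sapply(split(collapsed, local_date), daily_mean, min_hours = 18)
daily
```

With these toy values the first day averages to 6, while the second day has only 12 valid hours and is reported as NA.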
The following code mixes package functions with direct manipulation of the ws_monitor object.
# Use special knowledge of AirNow IDs to subset airnow data for Spokane county monitors
# (these IDs begin with the state (53) and county (063) FIPS codes)
SpokaneCountyIDs <- N_M$meta$monitorID[stringr::str_detect(N_M$meta$monitorID, "^53063")]
Spokane <- monitor_subset(N_M, monitorIDs = SpokaneCountyIDs)
# Apply 3-hr rolling mean
Spokane_3hr <- monitor_rollingMean(Spokane, 3, align = "center")
# 1) Replace data columns with their squares (exponentiation is not supplied by the package)
Spokane_3hr_squared <- Spokane_3hr
Spokane_3hr_squared$data[,-1] <- (Spokane_3hr$data[,-1])^2 # exclude the 'datetime' column
# NOTE: Exponentiation is only used as an example. It does not generate a meaningful result.
# Create a daily averaged 'ws_monitor' object
Spokane_daily_3hr <- monitor_dailyStatistic(Spokane_3hr)
# 2) Check out the correlation between monitors (correlation is not supplied by the package)
data <- Spokane_daily_3hr$data[,-1] # exclude the 'datetime' column
cor(data, use = 'complete.obs')
## 530630047_01 530639997_01 530639996_01
## 530630047_01 1.0000000 0.9148673 0.9159997
## 530639997_01 0.9148673 1.0000000 0.9284175
## 530639996_01 0.9159997 0.9284175 1.0000000
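For reference, a centered 3-hour rolling mean like the one requested from monitor_rollingMean() above can be sketched with stats::filter(). This illustrates the technique only; how the package treats missing values and series endpoints is not shown here, and in this sketch any NA inside a window makes that window's result NA.

```r
# Toy hourly series (hypothetical values)
x <- c(1, 2, 3, 4, 5, 6)

# Centered 3-hour mean: each value averaged with one neighbour on each side.
# sides = 2 centres the three equal weights on the current hour; the first
# and last hours lack a full window and come back as NA.
rolling3 <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
rolling3
```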
This introduction to the ws_monitor data model should be enough to get you started. Many more details and examples are available in the package documentation.
Best of luck exploring and understanding PM 2.5 values associated with wildfire smoke!