library(dplyr)
library(tibble) #For printing output
library(stringr)
library(swfscAirDAS)
This document introduces you to the swfscAirDAS package. This package is intended to standardize and streamline processing of aerial survey DAS (AirDAS) data collected using the PHOCOENA, TURTLE, or CARETTA programs from the Southwest Fisheries Science Center. In DAS data (and thus AirDAS data), an event is only recorded when something changes or happens, which can complicate processing. Thus, the main theme of this package is enabling analyses and downstream processing by 1) determining associated state and condition information for each event and 2) pulling out event-specific information from the Data columns. The following examples generally use the default function arguments; see the documentation for the individual functions for a complete description of all processing options.
This package includes a sample AirDAS file, which we will use in this document. This file is formatted as a DAS file created using the TURTLE program.
y <- system.file("airdas_sample.das", package = "swfscAirDAS")
head(readLines(y))
#> [1] " 1C 123059 040915 N39:14.13 W123:09.38 Recorder: dd "
#> [2] " 2C 123059 040915 N39:14.13 W123:09.38 Not recording molas "
#> [3] " 3T.123059 040915 N39:14.13 W123:09.38 T1 "
#> [4] " 4V.123059 040915 N39:14.13 W123:09.38 g g g g g "
#> [5] " 5P.123059 040915 N39:14.13 W123:09.38 aa bb cc dd "
#> [6] " 6A.123059 040915 N39:14.13 W123:09.38 650 100 "
The first step in processing AirDAS data is to ensure that the DAS file has expected formatting and values, which you can do using the airdas_check function. The checks performed by this function are detailed in the function documentation, which can be accessed by running ?airdas_check. If you aren’t sure of the file type (format) of your data, you can check the format PDFs. These PDFs are available online at https://smwoodman.github.io/swfscAirDAS/, or see ?airdas_format_pdf for how to access a local copy.
# Code not run
y.check <- airdas_check(y, file.type = "turtle", skip = 0, print.transect = TRUE)
Once QA/QC is complete and you have fixed any data entry errors, you can begin to process the AirDAS data. The backbone of this package is the reading and processing steps: 1) the data from the DAS file are read into the columns of a data frame and 2) state and condition information are extracted for each event. This means that after processing, you can simply look at any event (row) and determine the Beaufort, viewing conditions, etc., at the time of the event. All other functions in the package depend on the AirDAS data being in this processed state.
# Read
y.read <- airdas_read(y, file.type = "turtle", skip = 0)
head(y.read)
#> Event EffortDot DateTime Lat Lon Data1 Data2 Data3 Data4
#> 1 C FALSE 2015-04-09 12:30:59 39.2355 -123.1563 Reco rder: dd <NA>
#> 2 C FALSE 2015-04-09 12:30:59 39.2355 -123.1563 Not recor ding molas
#> 3 T TRUE 2015-04-09 12:30:59 39.2355 -123.1563 T1 <NA> <NA> <NA>
#> 4 V TRUE 2015-04-09 12:30:59 39.2355 -123.1563 g g g g
#> 5 P TRUE 2015-04-09 12:30:59 39.2355 -123.1563 aa bb cc dd
#> 6 A TRUE 2015-04-09 12:30:59 39.2355 -123.1563 650 100 <NA> <NA>
#> Data5 Data6 Data7 EventNum file_das line_num file_type
#> 1 <NA> <NA> <NA> 1 airdas_sample.das 1 turtle
#> 2 <NA> <NA> <NA> 2 airdas_sample.das 2 turtle
#> 3 <NA> <NA> <NA> 3 airdas_sample.das 3 turtle
#> 4 g <NA> <NA> 4 airdas_sample.das 4 turtle
#> 5 <NA> <NA> <NA> 5 airdas_sample.das 5 turtle
#> 6 <NA> <NA> <NA> 6 airdas_sample.das 6 turtle
# Process
y.proc <- airdas_process(y.read, trans.upper = TRUE)
head(y.proc)
#> Event DateTime Lat Lon OnEffort Trans Bft CCover Jelly
#> 1 C 2015-04-09 12:30:59 39.2355 -123.1563 FALSE <NA> NA NA NA
#> 2 C 2015-04-09 12:30:59 39.2355 -123.1563 FALSE <NA> NA NA NA
#> 3 T 2015-04-09 12:30:59 39.2355 -123.1563 TRUE T1 NA NA NA
#> 4 V 2015-04-09 12:30:59 39.2355 -123.1563 TRUE T1 NA NA NA
#> 5 P 2015-04-09 12:30:59 39.2355 -123.1563 TRUE T1 NA NA NA
#> 6 A 2015-04-09 12:30:59 39.2355 -123.1563 TRUE T1 NA NA NA
#> HorizSun VertSun HKR Haze Kelp RedTide AltFt SpKnot ObsL ObsB ObsR Rec
#> 1 NA NA <NA> FALSE FALSE FALSE NA NA <NA> <NA> <NA> <NA>
#> 2 NA NA <NA> FALSE FALSE FALSE NA NA <NA> <NA> <NA> <NA>
#> 3 NA NA <NA> FALSE FALSE FALSE NA NA <NA> <NA> <NA> <NA>
#> 4 NA NA <NA> FALSE FALSE FALSE NA NA <NA> <NA> <NA> <NA>
#> 5 NA NA <NA> FALSE FALSE FALSE NA NA aa bb cc dd
#> 6 NA NA <NA> FALSE FALSE FALSE 650 100 aa bb cc dd
#> VLI VLO VB VRI VRO Data1 Data2 Data3 Data4 Data5 Data6 Data7 EffortDot
#> 1 <NA> <NA> <NA> <NA> <NA> Reco rder: dd <NA> <NA> <NA> <NA> FALSE
#> 2 <NA> <NA> <NA> <NA> <NA> Not recor ding molas <NA> <NA> <NA> FALSE
#> 3 <NA> <NA> <NA> <NA> <NA> T1 <NA> <NA> <NA> <NA> <NA> <NA> TRUE
#> 4 g g g g g g g g g g <NA> <NA> TRUE
#> 5 g g g g g aa bb cc dd <NA> <NA> <NA> TRUE
#> 6 g g g g g 650 100 <NA> <NA> <NA> <NA> <NA> TRUE
#> EventNum file_das line_num file_type
#> 1 1 airdas_sample.das 1 turtle
#> 2 2 airdas_sample.das 2 turtle
#> 3 3 airdas_sample.das 3 turtle
#> 4 4 airdas_sample.das 4 turtle
#> 5 5 airdas_sample.das 5 turtle
#> 6 6 airdas_sample.das 6 turtle
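The idea behind the processing step can be pictured with a small toy example. This is only a sketch of the general "carry the last recorded value forward" technique, not the implementation used by airdas_process. The data frame and the carry_forward helper are hypothetical; the only detail taken from the output above is that an A event sets the altitude (AltFt).

```r
# Toy event table (made-up values): AltFt is only known on the rows
# where an A event recorded it
events <- data.frame(
  Event = c("T", "A", "S", "A", "S"),
  AltFt = c(NA, 650, NA, 700, NA)
)

# Hypothetical helper: carry the most recently recorded value forward
carry_forward <- function(x) {
  idx <- cumsum(!is.na(x))       # how many values have been recorded so far
  c(NA, x[!is.na(x)])[idx + 1]   # idx 0 means nothing recorded yet, so NA
}

events$AltFt <- carry_forward(events$AltFt)
events$AltFt
# NA 650 650 700 700
```

After this step, each row carries the state that was in effect at the time of the event, which is the property the processed AirDAS data frame provides for every state and condition column.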
Once you have processed the AirDAS data, you can easily access a variety of information. For instance, you can look at the different Beaufort values that occurred in the data, or filter for specific events to get the beginning and ending points of each effort section.
# The number of events per Beaufort value
table(y.proc$Bft)
#>
#> 1 2
#> 58 19
# Filter for T/R and O/E events to extract lat/lon points
y.proc %>%
  filter(Event %in% c("T", "R", "E", "O")) %>%
  select(Event, Lat, Lon, Trans)
#> Event Lat Lon Trans
#> 1 T 39.23550 -123.1563 T1
#> 2 E 39.23400 -123.4047 T1
#> 3 R 39.23400 -123.4742 T1
#> 4 O 39.23283 -123.6033 <NA>
#> 5 T 39.22000 -123.6078 T2
#> 6 O 39.21700 -123.2417 <NA>
The swfscAirDAS package also contains specific functions for extracting and/or summarizing particular information from the processed data. First is airdas_sight, a function that returns a data frame with pertinent sighting data extracted to their own columns:
y.sight <- airdas_sight(y.proc)
y.sight %>%
  select(Event, SightNo:TurtleTail) %>%
  head()
#> Event SightNo Obs Angle SightStd Mixed SpCode GsTotal GsSp TurtleSize
#> 1 S 1 cc 67 TRUE FALSE mn 6 6 NA
#> 2 S 2 bb -60 TRUE FALSE mn 6 6 NA
#> 3 S 3 cc 38 TRUE FALSE mn 2 2 NA
#> 4 S 4 aa -50 TRUE FALSE mn 5 5 NA
#> 5 S 5 cc 49 TRUE FALSE bm 4 4 NA
#> 6 S 6 cc 42 TRUE TRUE gg 10 8 NA
#> TurtleDirection TurtleTail
#> 1 NA <NA>
#> 2 NA <NA>
#> 3 NA <NA>
#> 4 NA <NA>
#> 5 NA <NA>
#> 6 NA <NA>
In addition, you can chop the effort data into segments, and summarize the conditions and sightings on those segments using airdas_effort and airdas_effort_sight. These effort segments can be used for line transect estimates using the Distance software, species distribution modeling, or summarizing the number of harbor porpoises on each transect, among other uses.
airdas_effort chops continuous effort sections (the event sequence from T/R to E/O events) into effort segments using one of several different chopping methods: condition (a new effort segment every time a condition changes), equal length (effort segments of equal length), and section (each segment is a full continuous effort section, i.e. it runs from a T/R event to an E/O event).
airdas_effort_sight takes the output of airdas_effort and returns the number of sightings and animals per segment for specified species codes.
Both functions return a list of three data frames: segdata, sightinfo, and randpicks. Briefly, as these data frames and the different chopping methodologies are described in depth in the function documentation (?airdas_effort and ?airdas_effort_sight), segdata contains information about each effort segment, sightinfo contains information about the sightings on the segments, and randpicks contains information specific to the ‘equal length’ chopping method. These are separate functions to allow the user more control over the filters applied to determine which sightings should be included in the summaries (see ?airdas_effort).
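As a rough illustration of the ‘condition’ chopping method described above, the following toy sketch starts a new segment whenever one of the tracked conditions changes. It is an assumption-laden simplification rather than the airdas_effort algorithm: it ignores distances, effort section boundaries, and minimum segment lengths, and the cond data frame holds made-up values.

```r
# Toy condition values along a single stretch of effort (made up)
cond <- data.frame(
  Bft    = c(1, 1, 2, 2, 1),
  CCover = c(10, 10, 10, 20, 20)
)

# Combine the tracked conditions into one key per row
key <- paste(cond$Bft, cond$CCover)

# A row starts a new segment if its key differs from the previous row's key
changed <- c(TRUE, key[-1] != key[-length(key)])
cond$seg <- cumsum(changed)
cond$seg
# 1 1 2 3 4
```

Tracking more conditions generally produces more, shorter segments, which is why the conditions argument matters when choosing this method.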
# Chop the effort every time a condition changes
y.eff <- airdas_effort(
  y.proc, method = "condition", seg.min.km = 0,
  dist.method = "greatcircle", conditions = c("Bft", "CCover"),
  num.cores = 1
)
y.eff.sight <- airdas_effort_sight(y.eff, sp.codes = c("bm", "dc"))
head(y.eff.sight$segdata)
#> segnum section_id section_sub_id event transect file stlin
#> 1 1 1 1 T T1 airdas_sample.das 3
#> 2 2 1 2 T T1 airdas_sample.das 36
#> 3 3 2 1 R T1 airdas_sample.das 40
#> 4 4 2 2 R T1 airdas_sample.das 41
#> 5 5 2 3 R T1 airdas_sample.das 44
#> 6 6 3 1 T T2 airdas_sample.das 53
#> endlin lat1 lon1 DateTime1 lat2 lon2
#> 1 36 39.23550 -123.1563 2015-04-09 12:30:59 39.23400 -123.4033
#> 2 37 39.23400 -123.4033 2015-04-09 12:37:57 39.23400 -123.4047
#> 3 41 39.23400 -123.4742 2015-04-09 12:40:00 39.23333 -123.4914
#> 4 44 39.23333 -123.4914 2015-04-09 12:40:28 39.23317 -123.5143
#> 5 49 39.23317 -123.5143 2015-04-09 12:41:08 39.23283 -123.6032
#> 6 78 39.22000 -123.6078 2015-04-09 12:46:51 39.21683 -123.3605
#> DateTime2 mlat mlon mDateTime dist year month
#> 1 2015-04-09 12:37:57 39.23309 -123.2797 2015-04-09 12:34:28 21.2882 2015 4
#> 2 2015-04-09 12:38:00 39.23400 -123.4040 2015-04-09 12:37:58 0.1148 2015 4
#> 3 2015-04-09 12:40:28 39.23367 -123.4828 2015-04-09 12:40:14 1.4937 2015 4
#> 4 2015-04-09 12:41:08 39.23317 -123.5029 2015-04-09 12:40:48 1.9657 2015 4
#> 5 2015-04-09 12:43:43 39.23245 -123.5588 2015-04-09 12:42:25 7.6620 2015 4
#> 6 2015-04-09 12:54:04 39.21667 -123.4842 2015-04-09 12:50:27 21.3146 2015 4
#> day mtime maxdistBft maxdistCCover nSI_bm ANI_bm nSI_dc ANI_dc
#> 1 9 12:34:28 1 10 3 11 1 1
#> 2 9 12:37:58 2 10 0 0 0 0
#> 3 9 12:40:14 2 10 0 0 0 0
#> 4 9 12:40:48 2 20 0 0 0 0
#> 5 9 12:42:25 1 20 0 0 0 0
#> 6 9 12:50:27 1 20 3 13 0 0
head(y.eff.sight$sightinfo)
#> segnum mlat mlon Event DateTime Lat Lon
#> 1 1 39.23309 -123.2797 S 2015-04-09 12:31:35 39.23367 -123.1783
#> 2 1 39.23309 -123.2797 S 2015-04-09 12:31:47 39.23350 -123.1857
#> 3 1 39.23309 -123.2797 S 2015-04-09 12:31:56 39.23333 -123.1917
#> 4 1 39.23309 -123.2797 S 2015-04-09 12:32:20 39.23233 -123.2053
#> 5 1 39.23309 -123.2797 S 2015-04-09 12:32:30 39.23233 -123.2108
#> 6 1 39.23309 -123.2797 S 2015-04-09 12:32:42 39.23233 -123.2195
#> OnEffort Trans Bft CCover Jelly HorizSun VertSun HKR Haze Kelp RedTide
#> 1 TRUE T1 1 10 0 6 NA n FALSE FALSE FALSE
#> 2 TRUE T1 1 10 0 6 NA n FALSE FALSE FALSE
#> 3 TRUE T1 1 10 0 6 NA n FALSE FALSE FALSE
#> 4 TRUE T1 1 10 0 6 NA n FALSE FALSE FALSE
#> 5 TRUE T1 1 10 0 6 NA n FALSE FALSE FALSE
#> 6 TRUE T1 1 10 0 6 NA n FALSE FALSE FALSE
#> AltFt SpKnot ObsL ObsB ObsR Rec VLI VLO VB VRI VRO EffortDot EventNum
#> 1 650 100 aa bb cc dd g g g g g TRUE 9
#> 2 650 100 aa bb cc dd g g g g g TRUE 10
#> 3 650 100 aa bb cc dd g g g g g TRUE 12
#> 4 650 100 aa bb cc dd g g g g g TRUE 14
#> 5 650 100 aa bb cc dd g g g g g TRUE 15
#> 6 650 100 aa bb cc dd g g g g g TRUE 16
#> file_das line_num file_type SightNo Obs Angle SightStd Mixed SpCode
#> 1 airdas_sample.das 9 turtle 1 cc 67 TRUE FALSE mn
#> 2 airdas_sample.das 10 turtle 2 bb -60 TRUE FALSE mn
#> 3 airdas_sample.das 12 turtle 3 cc 38 TRUE FALSE mn
#> 4 airdas_sample.das 14 turtle 4 aa -50 TRUE FALSE mn
#> 5 airdas_sample.das 15 turtle 5 cc 49 TRUE FALSE bm
#> 6 airdas_sample.das 16 turtle 6 cc 42 TRUE TRUE gg
#> GsTotal GsSp TurtleSize TurtleDirection TurtleTail included
#> 1 6 6 NA NA <NA> FALSE
#> 2 6 6 NA NA <NA> FALSE
#> 3 2 2 NA NA <NA> FALSE
#> 4 5 5 NA NA <NA> FALSE
#> 5 4 4 NA NA <NA> TRUE
#> 6 10 8 NA NA <NA> FALSE
If you wanted to determine the number of humpback whales on each transect with a sighting angle between -60 and 60 degrees, for instance, you could do the following:
# 'Chop' the effort by continuous effort section
y.eff.section <- airdas_effort(
  y.proc, method = "section", dist.method = "greatcircle", conditions = NULL,
  num.cores = 1
)
# Exclude humpback ("mn") sightings whose absolute angle is greater than 60 degrees
y.eff.section$sightinfo <- y.eff.section$sightinfo %>%
  mutate(included = ifelse(.data$SpCode == "mn" & abs(.data$Angle) > 60, FALSE, .data$included))
y.eff.section.sight <- airdas_effort_sight(y.eff.section, sp.codes = c("mn"))
# The first list element is segdata; number the transects by counting T events
y.eff.section.sight[[1]] %>%
  mutate(transect_id = cumsum(.data$event == "T")) %>%
  group_by(transect_id) %>%
  summarise(dist_sum = sum(dist),
            mn_count = sum(ANI_mn)) %>%
  as.data.frame() #for printing format
#> transect_id dist_sum mn_count
#> 1 1 32.5244 20
#> 2 2 31.5482 18
In this example you could also simply group by the transect column, but that structure is not robust to analyzing multi-year data sets where the same transects are flown multiple times.
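A small toy example may make the difference concrete. The seg data frame below is made up, with transect T1 flown twice; it only mimics the event, transect, and dist columns of segdata and is not package output.

```r
# Made-up segments: transect T1 is flown twice (e.g., in different years)
seg <- data.frame(
  event    = c("T", "R", "T", "T"),
  transect = c("T1", "T1", "T2", "T1"),
  dist     = c(10, 5, 8, 12)
)

# Each T event starts a new transect pass, so a running count gives a unique id
seg$transect_id <- cumsum(seg$event == "T")

# Grouping by name merges both T1 passes: T1 = 27, T2 = 8
by_name <- aggregate(dist ~ transect, data = seg, FUN = sum)

# Grouping by id keeps the passes separate: 15, 8, 12
by_id <- aggregate(dist ~ transect_id, data = seg, FUN = sum)
```

Note that the R event (resuming effort on the same transect) does not increment the id, so a transect interrupted by off-effort time is still counted as a single pass.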
Comments
In AirDAS data, comments are a catch-all field, meaning they are used to record information that does not fit neatly into the AirDAS framework. For instance, users will often enter a comment indicating if they are or are not recording extra information such as mola mola sightings. Again, this information is not recorded in a systematic way, but you can still use swfscAirDAS functions to determine this information.
A comment indicating whether or not something is being recorded will likely contain “record” somewhere in the comment. Thus, you can use airdas_comments to extract comments, and then subset for comments that contain the pattern “record”.
You can extract data from ‘systematic’ comments using airdas_comments_process. This function supports both the codes of TURTLE/PHOCOENA data and the text string format of CARETTA data. Specifically, it includes a comment.format argument for users with CARETTA data, which allows you to provide a list specifying how the data was recorded, although the data must be separated by a delimiter. See below for the comment.format value for default CARETTA data. This function also by default looks for codes, e.g. “fb” or “cp”, in TURTLE/PHOCOENA data; see ?airdas_comments_process for more details.
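The pattern search for “record” mentioned above can be sketched on plain character strings. This example deliberately does not call any swfscAirDAS function, so it makes no assumption about the structure of the airdas_comments output. The first two strings mirror the first two comment lines of the sample file; the third is invented for illustration.

```r
# Toy comment strings (the third one is made up)
comments <- c("Recorder: dd", "Not recording molas", "Fog bank ahead")

# Case-sensitive match: misses "Recorder" because of the capital R
exact <- comments[grepl("record", comments)]
exact
# "Not recording molas"

# Case-insensitive match catches both spellings
any_case <- comments[grepl("record", comments, ignore.case = TRUE)]
any_case
# "Recorder: dd" "Not recording molas"
```

Because comments are free text, it is usually worth checking the case-insensitive matches by eye before relying on them in an analysis.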