Introduction to swfscAirDAS

library(dplyr)
library(tibble) #For printing output
library(stringr)
library(swfscAirDAS)

This document introduces you to the swfscAirDAS package. This package is intended to standardize and streamline processing of aerial survey DAS (AirDAS) data collected using the PHOCOENA, TURTLE, or CARETTA programs from the Southwest Fisheries Science Center. In DAS data (and thus AirDAS data), an event is only recorded when something changes or happens, which can complicate processing. Thus, the main theme of this package is enabling analyses and downstream processing by 1) determining associated state and condition information for each event and 2) pulling out event-specific information from the Data columns. The following examples generally use the default function arguments; see the documentation for the individual functions for a complete description of all processing options.

Data

This package includes a sample AirDAS file, which we will use in this document. This file is formatted as a DAS file created using the TURTLE program.

y <- system.file("airdas_sample.das", package = "swfscAirDAS")
head(readLines(y))
#> [1] "  1C 123059 040915 N39:14.13 W123:09.38 Recorder: dd                                "
#> [2] "  2C 123059 040915 N39:14.13 W123:09.38 Not recording molas                         "
#> [3] "  3T.123059 040915 N39:14.13 W123:09.38   T1                                        "
#> [4] "  4V.123059 040915 N39:14.13 W123:09.38    g    g    g    g    g                    "
#> [5] "  5P.123059 040915 N39:14.13 W123:09.38   aa   bb   cc   dd                         "
#> [6] "  6A.123059 040915 N39:14.13 W123:09.38  650  100                                   "

Check data format

The first step in processing AirDAS data is to ensure that the DAS file has expected formatting and values, which you can do using the airdas_check function. The checks performed by this function are detailed in the function documentation, which can be accessed by running ?airdas_check. If you aren’t sure of the file type (format) of your data, you can check the format PDFs. These PDFs are available online at https://smwoodman.github.io/swfscAirDAS/, or see ?airdas_format_pdf for how to access a local copy.

# Code not run
y.check <- airdas_check(y, file.type = "turtle", skip = 0, print.transect = TRUE)

Read and process data

Once QA/QC is complete and you have fixed any data entry errors, you can begin to process the AirDAS data. The backbone of this package is the reading and processing steps: 1) the data from the DAS file are read into the columns of a data frame and 2) state and condition information are extracted for each event. This means that after processing, you can simply look at any event (row) and determine the Beaufort, viewing conditions, etc., at the time of the event. All other functions in the package depend on the AirDAS data being in this processed state.

# Read 
y.read <- airdas_read(y, file.type = "turtle", skip = 0)
head(y.read)
#>   Event EffortDot            DateTime     Lat       Lon Data1 Data2 Data3 Data4
#> 1     C     FALSE 2015-04-09 12:30:59 39.2355 -123.1563  Reco rder:  dd    <NA>
#> 2     C     FALSE 2015-04-09 12:30:59 39.2355 -123.1563  Not  recor ding  molas
#> 3     T      TRUE 2015-04-09 12:30:59 39.2355 -123.1563    T1  <NA>  <NA>  <NA>
#> 4     V      TRUE 2015-04-09 12:30:59 39.2355 -123.1563     g     g     g     g
#> 5     P      TRUE 2015-04-09 12:30:59 39.2355 -123.1563    aa    bb    cc    dd
#> 6     A      TRUE 2015-04-09 12:30:59 39.2355 -123.1563   650   100  <NA>  <NA>
#>   Data5 Data6 Data7 EventNum          file_das line_num file_type
#> 1  <NA>  <NA>  <NA>        1 airdas_sample.das        1    turtle
#> 2  <NA>  <NA>  <NA>        2 airdas_sample.das        2    turtle
#> 3  <NA>  <NA>  <NA>        3 airdas_sample.das        3    turtle
#> 4     g  <NA>  <NA>        4 airdas_sample.das        4    turtle
#> 5  <NA>  <NA>  <NA>        5 airdas_sample.das        5    turtle
#> 6  <NA>  <NA>  <NA>        6 airdas_sample.das        6    turtle
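Comparing the raw line with the read output shows how the position and time fields are interpreted: `N39:14.13` (degrees and decimal minutes) becomes 39.2355, and the time `123059` with the date `040915` becomes 2015-04-09 12:30:59. The sketch below reproduces that arithmetic in base R. The helper `das_coord_to_decimal` is hypothetical and is not part of swfscAirDAS, and the UTC time zone is an assumption made only so the example is reproducible.

```r
# Hypothetical helper (not a swfscAirDAS function): convert a DAS coordinate
# string such as "N39:14.13" (degrees:decimal minutes) to signed decimal degrees
das_coord_to_decimal <- function(x) {
  hemi  <- substr(x, 1, 1)                                  # N, S, E, or W
  parts <- strsplit(substring(x, 2), ":", fixed = TRUE)
  degs  <- as.numeric(vapply(parts, `[`, character(1), 1))
  mins  <- as.numeric(vapply(parts, `[`, character(1), 2))
  sgn   <- ifelse(hemi %in% c("S", "W"), -1, 1)             # south/west are negative
  sgn * (degs + mins / 60)
}

lat <- das_coord_to_decimal("N39:14.13")
lon <- das_coord_to_decimal("W123:09.38")
round(c(lat, lon), 4)

# Date (mmddyy) and time (hhmmss) fields from the first sample line;
# the time zone is an assumption for this sketch
dt <- as.POSIXct("040915 123059", format = "%m%d%y %H%M%S", tz = "UTC")
format(dt, "%Y-%m-%d %H:%M:%S")
```

In practice airdas_read does this conversion for you; the sketch is only meant to make the column values easier to trace back to the raw file.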

# Process
y.proc <- airdas_process(y.read, trans.upper = TRUE)
head(y.proc)
#>   Event            DateTime     Lat       Lon OnEffort Trans Bft CCover Jelly
#> 1     C 2015-04-09 12:30:59 39.2355 -123.1563    FALSE  <NA>  NA     NA    NA
#> 2     C 2015-04-09 12:30:59 39.2355 -123.1563    FALSE  <NA>  NA     NA    NA
#> 3     T 2015-04-09 12:30:59 39.2355 -123.1563     TRUE    T1  NA     NA    NA
#> 4     V 2015-04-09 12:30:59 39.2355 -123.1563     TRUE    T1  NA     NA    NA
#> 5     P 2015-04-09 12:30:59 39.2355 -123.1563     TRUE    T1  NA     NA    NA
#> 6     A 2015-04-09 12:30:59 39.2355 -123.1563     TRUE    T1  NA     NA    NA
#>   HorizSun VertSun  HKR  Haze  Kelp RedTide AltFt SpKnot ObsL ObsB ObsR  Rec
#> 1       NA      NA <NA> FALSE FALSE   FALSE    NA     NA <NA> <NA> <NA> <NA>
#> 2       NA      NA <NA> FALSE FALSE   FALSE    NA     NA <NA> <NA> <NA> <NA>
#> 3       NA      NA <NA> FALSE FALSE   FALSE    NA     NA <NA> <NA> <NA> <NA>
#> 4       NA      NA <NA> FALSE FALSE   FALSE    NA     NA <NA> <NA> <NA> <NA>
#> 5       NA      NA <NA> FALSE FALSE   FALSE    NA     NA   aa   bb   cc   dd
#> 6       NA      NA <NA> FALSE FALSE   FALSE   650    100   aa   bb   cc   dd
#>    VLI  VLO   VB  VRI  VRO Data1 Data2 Data3 Data4 Data5 Data6 Data7 EffortDot
#> 1 <NA> <NA> <NA> <NA> <NA>  Reco rder:  dd    <NA>  <NA>  <NA>  <NA>     FALSE
#> 2 <NA> <NA> <NA> <NA> <NA>  Not  recor ding  molas  <NA>  <NA>  <NA>     FALSE
#> 3 <NA> <NA> <NA> <NA> <NA>    T1  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>      TRUE
#> 4    g    g    g    g    g     g     g     g     g     g  <NA>  <NA>      TRUE
#> 5    g    g    g    g    g    aa    bb    cc    dd  <NA>  <NA>  <NA>      TRUE
#> 6    g    g    g    g    g   650   100  <NA>  <NA>  <NA>  <NA>  <NA>      TRUE
#>   EventNum          file_das line_num file_type
#> 1        1 airdas_sample.das        1    turtle
#> 2        2 airdas_sample.das        2    turtle
#> 3        3 airdas_sample.das        3    turtle
#> 4        4 airdas_sample.das        4    turtle
#> 5        5 airdas_sample.das        5    turtle
#> 6        6 airdas_sample.das        6    turtle
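The processed output above shows the key idea: Trans is recorded once on the T event and AltFt once on the A event, yet both appear on every later row. A minimal base R sketch of that "carry the last value forward" logic follows. It uses a toy table, and `carry_forward` is a hypothetical helper, not the code airdas_process actually runs.

```r
# Toy event table: a state value is entered only on the event that sets it
events <- data.frame(
  Event = c("C", "T", "A", "S", "S"),
  Trans = c(NA, "T1", NA, NA, NA),
  AltFt = c(NA, NA, 650, NA, NA)
)

# Hypothetical helper: carry the last non-missing value forward
carry_forward <- function(x) {
  idx <- cumsum(!is.na(x))        # position of the most recent non-NA value
  c(NA, x[!is.na(x)])[idx + 1]    # idx of 0 maps to NA (nothing recorded yet)
}

events$Trans <- carry_forward(events$Trans)
events$AltFt <- carry_forward(events$AltFt)
events
```

After this step each sighting row carries the transect and altitude in effect at that time, which is what makes the row-wise filtering in the rest of this document possible.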

Once you have processed the AirDAS data, you can easily access a variety of information. For instance, you can look at the different Beaufort values that occurred in the data, or filter for specific events to get the beginning and ending points of each effort section.

# The number of events per Beaufort value
table(y.proc$Bft)
#> 
#>  1  2 
#> 58 19

# Filter for T/R and O/E events to extract lat/lon points
y.proc %>% 
  filter(Event %in% c("T", "R", "E", "O")) %>% 
  select(Event, Lat, Lon, Trans)
#>   Event      Lat       Lon Trans
#> 1     T 39.23550 -123.1563    T1
#> 2     E 39.23400 -123.4047    T1
#> 3     R 39.23400 -123.4742    T1
#> 4     O 39.23283 -123.6033  <NA>
#> 5     T 39.22000 -123.6078    T2
#> 6     O 39.21700 -123.2417  <NA>
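If you want one row per effort section rather than one row per event, the start (T/R) and end (E/O) events can be paired. The sketch below does this in base R on the six rows printed above, retyped by hand; it assumes that start and end events strictly alternate, which holds for this sample but should be checked for other data.

```r
# Effort-boundary events retyped from the output above
eff <- data.frame(
  Event = c("T", "E", "R", "O", "T", "O"),
  Lat   = c(39.23550, 39.23400, 39.23400, 39.23283, 39.22000, 39.21700),
  Lon   = c(-123.1563, -123.4047, -123.4742, -123.6033, -123.6078, -123.2417)
)

starts <- eff[eff$Event %in% c("T", "R"), ]
ends   <- eff[eff$Event %in% c("E", "O"), ]

# The pairing below is only valid if every start has exactly one end
stopifnot(nrow(starts) == nrow(ends))

sections <- data.frame(
  start_event = starts$Event, end_event = ends$Event,
  lat1 = starts$Lat, lon1 = starts$Lon,
  lat2 = ends$Lat,   lon2 = ends$Lon
)
sections
```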

Sightings

The swfscAirDAS package also contains functions for extracting and summarizing particular information from the processed data. The first is airdas_sight, which returns a data frame with pertinent sighting data extracted to their own columns:

y.sight <- airdas_sight(y.proc)

y.sight %>% 
  select(Event, SightNo:TurtleTail) %>% 
  head()
#>   Event SightNo Obs Angle SightStd Mixed SpCode GsTotal GsSp TurtleSize
#> 1     S       1  cc    67     TRUE FALSE     mn       6    6         NA
#> 2     S       2  bb   -60     TRUE FALSE     mn       6    6         NA
#> 3     S       3  cc    38     TRUE FALSE     mn       2    2         NA
#> 4     S       4  aa   -50     TRUE FALSE     mn       5    5         NA
#> 5     S       5  cc    49     TRUE FALSE     bm       4    4         NA
#> 6     S       6  cc    42     TRUE  TRUE     gg      10    8         NA
#>   TurtleDirection TurtleTail
#> 1              NA       <NA>
#> 2              NA       <NA>
#> 3              NA       <NA>
#> 4              NA       <NA>
#> 5              NA       <NA>
#> 6              NA       <NA>
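Because the sighting data are now in their own columns, simple summaries need only base R. As an illustration, the sketch below counts sightings and sums animals per species code for the six rows printed above, retyped by hand as a toy data frame. It is not a substitute for airdas_effort_sight, and the totals cover only those six rows.

```r
# SpCode and GsSp columns retyped from the six rows printed above
sight <- data.frame(
  SpCode = c("mn", "mn", "mn", "mn", "bm", "gg"),
  GsSp   = c(6, 6, 2, 5, 4, 8)
)

n_sightings <- table(sight$SpCode)                  # sightings per species code
n_animals   <- tapply(sight$GsSp, sight$SpCode, sum) # animals per species code

n_sightings
n_animals
```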

Effort

In addition, you can chop the effort data into segments and summarize the conditions and sightings on those segments using airdas_effort and airdas_effort_sight. These effort segments can be used for line transect estimates using the Distance software, species distribution modeling, or summarizing the number of harbor porpoises on each transect, among other uses.

airdas_effort chops continuous effort sections (the event sequence from T/R to E/O events) into effort segments using one of several chopping methods: condition (a new effort segment every time a condition changes), equal length (effort segments of equal length), and section (each segment is a full continuous effort section, i.e., it runs from a T/R event to an E/O event). airdas_effort_sight takes the output of airdas_effort and returns the number of sightings and animals per segment for specified species codes.

Both functions return a list of three data frames: segdata, sightinfo, and randpicks. In brief, segdata contains information about each effort segment, sightinfo contains information about the sightings on the segments, and randpicks contains information specific to the ‘equal length’ chopping method. These data frames and the chopping methods are described in depth in the function documentation (?airdas_effort and ?airdas_effort_sight). The two steps are separate functions so that you have more control over the filters that determine which sightings are included in the summaries (see ?airdas_effort).
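To picture the ‘condition’ method, the following base R sketch assigns a new segment id every time one of the tracked conditions changes between consecutive rows. It is a conceptual illustration on toy data only; the package's own implementation also handles distances, minimum segment lengths, and section boundaries, none of which are modeled here.

```r
# Toy on-effort rows with two tracked conditions
eff <- data.frame(
  Bft    = c(1, 1, 2, 2, 2, 1),
  CCover = c(10, 10, 10, 20, 20, 20)
)

# TRUE wherever either condition differs from the previous row
changed <- c(FALSE, diff(eff$Bft) != 0 | diff(eff$CCover) != 0)

# Each change starts a new segment
eff$seg_id <- cumsum(changed) + 1
eff
```

Tracking fewer conditions produces fewer, longer segments, which is why the conditions argument matters when you use this method.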

# Chop the effort every time a condition changes
y.eff <- airdas_effort(
  y.proc, method = "condition", seg.min.km = 0, 
  dist.method = "greatcircle", conditions = c("Bft", "CCover"), 
  num.cores = 1
)
y.eff.sight <- airdas_effort_sight(y.eff, sp.codes = c("bm", "dc"))

head(y.eff.sight$segdata)
#>   segnum section_id section_sub_id event transect              file stlin
#> 1      1          1              1     T       T1 airdas_sample.das     3
#> 2      2          1              2     T       T1 airdas_sample.das    36
#> 3      3          2              1     R       T1 airdas_sample.das    40
#> 4      4          2              2     R       T1 airdas_sample.das    41
#> 5      5          2              3     R       T1 airdas_sample.das    44
#> 6      6          3              1     T       T2 airdas_sample.das    53
#>   endlin     lat1      lon1           DateTime1     lat2      lon2
#> 1     36 39.23550 -123.1563 2015-04-09 12:30:59 39.23400 -123.4033
#> 2     37 39.23400 -123.4033 2015-04-09 12:37:57 39.23400 -123.4047
#> 3     41 39.23400 -123.4742 2015-04-09 12:40:00 39.23333 -123.4914
#> 4     44 39.23333 -123.4914 2015-04-09 12:40:28 39.23317 -123.5143
#> 5     49 39.23317 -123.5143 2015-04-09 12:41:08 39.23283 -123.6032
#> 6     78 39.22000 -123.6078 2015-04-09 12:46:51 39.21683 -123.3605
#>             DateTime2     mlat      mlon           mDateTime    dist year month
#> 1 2015-04-09 12:37:57 39.23309 -123.2797 2015-04-09 12:34:28 21.2882 2015     4
#> 2 2015-04-09 12:38:00 39.23400 -123.4040 2015-04-09 12:37:58  0.1148 2015     4
#> 3 2015-04-09 12:40:28 39.23367 -123.4828 2015-04-09 12:40:14  1.4937 2015     4
#> 4 2015-04-09 12:41:08 39.23317 -123.5029 2015-04-09 12:40:48  1.9657 2015     4
#> 5 2015-04-09 12:43:43 39.23245 -123.5588 2015-04-09 12:42:25  7.6620 2015     4
#> 6 2015-04-09 12:54:04 39.21667 -123.4842 2015-04-09 12:50:27 21.3146 2015     4
#>   day    mtime maxdistBft maxdistCCover nSI_bm ANI_bm nSI_dc ANI_dc
#> 1   9 12:34:28          1            10      3     11      1      1
#> 2   9 12:37:58          2            10      0      0      0      0
#> 3   9 12:40:14          2            10      0      0      0      0
#> 4   9 12:40:48          2            20      0      0      0      0
#> 5   9 12:42:25          1            20      0      0      0      0
#> 6   9 12:50:27          1            20      3     13      0      0

head(y.eff.sight$sightinfo)
#>   segnum     mlat      mlon Event            DateTime      Lat       Lon
#> 1      1 39.23309 -123.2797     S 2015-04-09 12:31:35 39.23367 -123.1783
#> 2      1 39.23309 -123.2797     S 2015-04-09 12:31:47 39.23350 -123.1857
#> 3      1 39.23309 -123.2797     S 2015-04-09 12:31:56 39.23333 -123.1917
#> 4      1 39.23309 -123.2797     S 2015-04-09 12:32:20 39.23233 -123.2053
#> 5      1 39.23309 -123.2797     S 2015-04-09 12:32:30 39.23233 -123.2108
#> 6      1 39.23309 -123.2797     S 2015-04-09 12:32:42 39.23233 -123.2195
#>   OnEffort Trans Bft CCover Jelly HorizSun VertSun HKR  Haze  Kelp RedTide
#> 1     TRUE    T1   1     10     0        6      NA   n FALSE FALSE   FALSE
#> 2     TRUE    T1   1     10     0        6      NA   n FALSE FALSE   FALSE
#> 3     TRUE    T1   1     10     0        6      NA   n FALSE FALSE   FALSE
#> 4     TRUE    T1   1     10     0        6      NA   n FALSE FALSE   FALSE
#> 5     TRUE    T1   1     10     0        6      NA   n FALSE FALSE   FALSE
#> 6     TRUE    T1   1     10     0        6      NA   n FALSE FALSE   FALSE
#>   AltFt SpKnot ObsL ObsB ObsR Rec VLI VLO VB VRI VRO EffortDot EventNum
#> 1   650    100   aa   bb   cc  dd   g   g  g   g   g      TRUE        9
#> 2   650    100   aa   bb   cc  dd   g   g  g   g   g      TRUE       10
#> 3   650    100   aa   bb   cc  dd   g   g  g   g   g      TRUE       12
#> 4   650    100   aa   bb   cc  dd   g   g  g   g   g      TRUE       14
#> 5   650    100   aa   bb   cc  dd   g   g  g   g   g      TRUE       15
#> 6   650    100   aa   bb   cc  dd   g   g  g   g   g      TRUE       16
#>            file_das line_num file_type SightNo Obs Angle SightStd Mixed SpCode
#> 1 airdas_sample.das        9    turtle       1  cc    67     TRUE FALSE     mn
#> 2 airdas_sample.das       10    turtle       2  bb   -60     TRUE FALSE     mn
#> 3 airdas_sample.das       12    turtle       3  cc    38     TRUE FALSE     mn
#> 4 airdas_sample.das       14    turtle       4  aa   -50     TRUE FALSE     mn
#> 5 airdas_sample.das       15    turtle       5  cc    49     TRUE FALSE     bm
#> 6 airdas_sample.das       16    turtle       6  cc    42     TRUE  TRUE     gg
#>   GsTotal GsSp TurtleSize TurtleDirection TurtleTail included
#> 1       6    6         NA              NA       <NA>    FALSE
#> 2       6    6         NA              NA       <NA>    FALSE
#> 3       2    2         NA              NA       <NA>    FALSE
#> 4       5    5         NA              NA       <NA>    FALSE
#> 5       4    4         NA              NA       <NA>     TRUE
#> 6      10    8         NA              NA       <NA>    FALSE
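As a rough sanity check on the dist column, you can compute a great-circle distance between the start and end points of a segment yourself. The sketch below uses the haversine formula with an assumed mean Earth radius of 6371 km; this is not necessarily the formula behind dist.method = "greatcircle", and it measures only the straight endpoint-to-endpoint distance, so it need not match the reported value exactly. The coordinates are those of the first segdata row above.

```r
# Haversine great-circle distance in km (assumed Earth radius: 6371 km)
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# Start and end points of segment 1 from the segdata output
d <- haversine_km(39.23550, -123.1563, 39.23400, -123.4033)
d
```

The result should land close to the 21.2882 km reported for that segment.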

For instance, to determine the number of humpback whales on each transect with a sighting angle between -60 and 60 degrees, you could do the following:

# 'Chop' the effort by continuous effort section
y.eff.section <- airdas_effort(
  y.proc, method = "section", dist.method = "greatcircle", conditions = NULL, 
  num.cores = 1
)
y.eff.section$sightinfo <- y.eff.section$sightinfo %>% 
  mutate(included = ifelse(.data$SpCode == "mn" & abs(.data$Angle) > 60, FALSE, .data$included))

y.eff.section.sight <- airdas_effort_sight(y.eff.section, sp.codes = c("mn"))

y.eff.section.sight[[1]] %>% 
  mutate(transect_id = cumsum(.data$event == "T")) %>% 
  group_by(transect_id) %>% 
  summarise(dist_sum = sum(dist), 
            mn_count = sum(ANI_mn)) %>% 
  as.data.frame() #for printing format
#>   transect_id dist_sum mn_count
#> 1           1  32.5244       20
#> 2           2  31.5482       18

In this example you could also simply group by the transect column, but that structure is not robust to analyzing multi-year data sets where the same transects are flown multiple times.
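The difference between the two groupings is easiest to see on toy data in which the same transect name appears twice. In the hypothetical segment table below, T1 is flown on two occasions: the running count of T events keeps the two passes apart, whereas grouping by name merges them. The distances are made up for illustration.

```r
# Toy segment table: transect T1 is flown twice (distances are made up)
seg <- data.frame(
  event    = c("T", "R", "T", "T", "R"),
  transect = c("T1", "T1", "T2", "T1", "T1"),
  dist     = c(20, 10, 30, 21, 9)
)

# Running count of T events gives each pass over a transect its own id
seg$transect_id <- cumsum(seg$event == "T")

by_id   <- tapply(seg$dist, seg$transect_id, sum)  # one total per pass
by_name <- tapply(seg$dist, seg$transect, sum)     # passes over T1 are merged

by_id
by_name
```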

Comments

In AirDAS data, comments are a catch-all field: they are used to record information that does not fit neatly into the AirDAS framework. For instance, users will often enter a comment indicating whether or not they are recording extra information such as mola mola sightings. This information is not recorded in a systematic way, but you can still use swfscAirDAS functions to extract it.

A comment indicating whether or not something is being recorded will likely contain “record” somewhere in the comment. Thus, you can use airdas_comments to extract comments, and then subset for comments that contain the pattern “record”.

y.comm <- airdas_comments(y.proc)
head(y.comm)
#>    Event            DateTime      Lat       Lon OnEffort Trans Bft CCover Jelly
#> 1      C 2015-04-09 12:30:59 39.23550 -123.1563    FALSE  <NA>  NA     NA    NA
#> 2      C 2015-04-09 12:30:59 39.23550 -123.1563    FALSE  <NA>  NA     NA    NA
#> 11     C 2015-04-09 12:31:48 39.23350 -123.1857     TRUE    T1   1     10     0
#> 22     C 2015-04-09 12:34:56 39.23317 -123.2968     TRUE    T1   1     10     2
#> 25     C 2015-04-09 12:35:32 39.23317 -123.3178     TRUE    T1   1     10     2
#> 37     C 2015-04-09 12:38:00 39.23400 -123.4047    FALSE    T1   2     10     0
#>    HorizSun VertSun  HKR  Haze  Kelp RedTide AltFt SpKnot ObsL ObsB ObsR  Rec
#> 1        NA      NA <NA> FALSE FALSE   FALSE    NA     NA <NA> <NA> <NA> <NA>
#> 2        NA      NA <NA> FALSE FALSE   FALSE    NA     NA <NA> <NA> <NA> <NA>
#> 11        6      NA    n FALSE FALSE   FALSE   650    100   aa   bb   cc   dd
#> 22        6      NA    r FALSE FALSE    TRUE   650    100   aa   bb   cc   dd
#> 25        6      NA    r FALSE FALSE    TRUE   650    100   aa   bb   cc   dd
#> 37        6      NA    n FALSE FALSE   FALSE   650    100   aa   bb   cc   dd
#>     VLI  VLO   VB  VRI  VRO Data1 Data2 Data3 Data4 Data5 Data6           Data7
#> 1  <NA> <NA> <NA> <NA> <NA>  Reco rder:  dd    <NA>  <NA>  <NA>            <NA>
#> 2  <NA> <NA> <NA> <NA> <NA>  Not  recor ding  molas  <NA>  <NA>            <NA>
#> 11    g    g    g    g    g  2 cp  <NA>  <NA>  <NA>  <NA>  <NA>            <NA>
#> 22    g    g    g    g    g  stre aky r ed ti de     <NA>  <NA>            <NA>
#> 25    g    g    g    g    g  shar k 11f t      <NA>  <NA>  <NA>            <NA>
#> 37    g    g    g    g    g  off  effor t to  circl e on  unide ntifed object  
#>    EffortDot EventNum          file_das line_num file_type
#> 1      FALSE        1 airdas_sample.das        1    turtle
#> 2      FALSE        2 airdas_sample.das        2    turtle
#> 11      TRUE       11 airdas_sample.das       11    turtle
#> 22      TRUE       21 airdas_sample.das       22    turtle
#> 25      TRUE       24 airdas_sample.das       25    turtle
#> 37     FALSE       37 airdas_sample.das       38    turtle
#>                                      comment_str
#> 1                                 Recorder: dd  
#> 2                            Not recording molas
#> 11                                          2 cp
#> 22                           streaky red tide   
#> 25                                shark 11ft    
#> 37  off effort to circle on unidentifed object

str_subset(y.comm$comment_str, "record") #Could also use grepl() here
#> [1] " Not recording molas"
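The base R equivalent uses grepl, which also makes it easy to ignore case. The sketch below uses comment strings retyped by hand from the output above, with surrounding whitespace trimmed. Note that the lowercase pattern misses "Recorder: dd", while a case-insensitive match picks it up; whether that is desirable depends on what you are searching for.

```r
# Comment strings retyped from the output above (whitespace trimmed)
comments <- c(
  "Recorder: dd", "Not recording molas", "2 cp",
  "streaky red tide", "shark 11ft",
  "off effort to circle on unidentifed object"
)

# Case-sensitive match, equivalent to the str_subset() call
comments[grepl("record", comments)]

# Case-insensitive match also returns the "Recorder: dd" comment
comments[grepl("record", comments, ignore.case = TRUE)]
```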

You can also extract data from ‘systematic’ comments using airdas_comments_process. This function supports both the codes of TURTLE/PHOCOENA data and the text string format of CARETTA data. For CARETTA data, its comment.format argument lets you provide a list specifying how the data were recorded; the data must be separated by a delimiter. See below for the comment.format value for default CARETTA data. By default, the function also looks for codes, e.g. “fb” or “cp”, in TURTLE/PHOCOENA data; see ?airdas_comments_process for more details.

# comment.format for default CARETTA data
comment.format <- list(
  n = 5, sep = ";", 
  type = c("character", "character", "numeric", "numeric", "character")
)
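To make the role of n, sep, and type concrete, here is a minimal base R sketch of how a delimited comment could be split and converted according to such a list. The comment string is made up, and the parsing logic is an assumption for illustration; it is not the code used by airdas_comments_process.

```r
# Same structure as the comment.format list above
fmt <- list(
  n = 5, sep = ";",
  type = c("character", "character", "numeric", "numeric", "character")
)

# Made-up delimited comment, for illustration only
comment_str <- "aa;bb;3;1.5;cc"

fields <- strsplit(comment_str, fmt$sep, fixed = TRUE)[[1]]
stopifnot(length(fields) == fmt$n)   # the comment must have exactly n pieces

# Convert each piece to its declared type
parsed <- unname(Map(
  function(value, type) if (type == "numeric") as.numeric(value) else value,
  fields, fmt$type
))
str(parsed)
```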

# TURTLE/PHOCOENA comment-data
head(airdas_comments_process(y.proc))
#>   Event            DateTime      Lat       Lon OnEffort Trans Bft CCover Jelly
#> 1     C 2015-04-09 12:31:48 39.23350 -123.1857     TRUE    T1   1     10     0
#> 2     C 2015-04-09 12:42:01 39.23267 -123.5433     TRUE    T1   1     20     0
#> 3     C 2015-04-09 12:42:01 39.23267 -123.5433     TRUE    T1   1     20     0
#>   HorizSun VertSun HKR  Haze  Kelp RedTide AltFt SpKnot ObsL ObsB ObsR Rec VLI
#> 1        6      NA   n FALSE FALSE   FALSE   650    100   aa   bb   cc  dd   g
#> 2        6      NA   n FALSE FALSE   FALSE   650    100   aa   bb   cc  dd   g
#> 3        6      NA   n FALSE FALSE   FALSE   650    100   aa   bb   cc  dd   g
#>   VLO VB VRI VRO Data1 Data2 Data3 Data4 Data5 Data6 Data7 EffortDot EventNum
#> 1   g  g   g   g  2 cp  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>      TRUE       11
#> 2   g  g   g   g  fb2s  fb1m  <NA>  <NA>  <NA>  <NA>  <NA>      TRUE       45
#> 3   g  g   g   g  fb2s  fb1m  <NA>  <NA>  <NA>  <NA>  <NA>      TRUE       45
#>            file_das line_num file_type comment_str     Misc1 Misc2 Value
#> 1 airdas_sample.das       11    turtle        2 cp  crab pot  <NA>     2
#> 2 airdas_sample.das       46    turtle   fb2s fb1m fish ball     s     2
#> 3 airdas_sample.das       46    turtle   fb2s fb1m fish ball     m     1
#>   flag_check
#> 1      FALSE
#> 2      FALSE
#> 3      FALSE
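The last two rows show one comment, "fb2s fb1m", expanded into one row per code. A small base R sketch of how such tokens could be pulled apart with a regular expression is given below. The pattern (two letters, a number, one letter) and the lookup table are assumptions inferred from this output alone; see ?airdas_comments_process for the rules the package actually applies.

```r
# Lookup inferred from the output above (an assumption, not the package's table)
code_names <- c(fb = "fish ball", cp = "crab pot")

# Split the comment into tokens and capture code, value, and trailing letter
tokens <- strsplit("fb2s fb1m", " ", fixed = TRUE)[[1]]
m <- regmatches(tokens, regexec("^([a-z]{2})([0-9]+)([a-z])$", tokens))

parsed <- data.frame(
  Misc1 = unname(code_names[vapply(m, `[`, character(1), 2)]),
  Misc2 = vapply(m, `[`, character(1), 4),
  Value = as.numeric(vapply(m, `[`, character(1), 3))
)
parsed
```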