The sfo package provides monthly summary statistics of air traffic passengers and landings at San Francisco International Airport (SFO) between 2005 and 2020.
Data source: San Francisco data portal - DataSF API
Install the stable version from CRAN:
install.packages("sfo")
or install the development version from GitHub:
# install.packages("remotes")
remotes::install_github("RamiKrispin/sfo")
The sfo package provides the following two datasets:
- sfo_passengers - air traffic passenger statistics
- sfo_stats - air traffic landing statistics

More information about the datasets is available in the package vignette.
The sfo_passengers dataset provides a monthly summary of the number of passengers at SFO by different categories (such as terminal, geo, type, etc.):
library(sfo)
data("sfo_passengers")
head(sfo_passengers)
#> activity_period operating_airline operating_airline_iata_code published_airline published_airline_iata_code geo_summary geo_region activity_type_code price_category_code terminal
#> 1 202012 United Airlines UA United Airlines UA Domestic US Enplaned Other Terminal 3
#> 2 202012 United Airlines UA United Airlines UA Domestic US Thru / Transit Other Terminal 3
#> 3 202012 United Airlines UA United Airlines UA International Asia Deplaned Other International
#> 4 202012 United Airlines UA United Airlines UA International Asia Enplaned Other International
#> 5 202012 United Airlines UA United Airlines UA International Australia / Oceania Deplaned Other International
#> 6 202012 United Airlines UA United Airlines UA International Australia / Oceania Enplaned Other International
#> boarding_area passenger_count
#> 1 F 117260
#> 2 F 61
#> 3 G 12763
#> 4 G 16050
#> 5 G 2733
#> 6 G 2878
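The activity_period column encodes the month as a YYYYMM number (for example, 202012 for December 2020), so numeric comparisons on it follow calendar order. When an actual date is needed, for example on a time axis, a minimal base-R sketch is shown below. period_to_date is a hypothetical helper, not part of sfo, and it assumes anchoring each month on its first day is acceptable.

```r
# Hypothetical helper (not part of sfo): convert a YYYYMM activity_period
# value into a Date set to the first day of that month
period_to_date <- function(period) {
  as.Date(paste0(period, "01"), format = "%Y%m%d")
}

period_to_date(c(201901, 202012))
#> [1] "2019-01-01" "2020-12-01"
```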
The sfo_stats dataset provides monthly statistics on air traffic landings at SFO:
data("sfo_stats")
head(sfo_stats)
#> activity_period operating_airline operating_airline_iata_code published_airline published_airline_iata_code geo_summary geo_region landing_aircraft_type aircraft_body_type aircraft_manufacturer
#> 1 202012 United Airlines UA United Airlines UA Domestic US Passenger Narrow Body Boeing
#> 2 202012 United Airlines UA United Airlines UA Domestic US Passenger Wide Body Boeing
#> 3 202012 United Airlines UA United Airlines UA Domestic US Passenger Wide Body Boeing
#> 4 202012 United Airlines UA United Airlines UA International Asia Passenger Wide Body Boeing
#> 5 202012 United Airlines UA United Airlines UA International US Passenger Wide Body Boeing
#> 6 202012 United Airlines UA United Airlines UA Domestic US Passenger Wide Body Boeing
#> aircraft_model aircraft_version landing_count total_landed_weight
#> 1 B753 - 78 17472000
#> 2 B763 - 19 6080000
#> 3 B772 - 96 43500000
#> 4 B772 - 1 460000
#> 5 B772 - 8 3680000
#> 6 B773 - 35 19390000
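Each sfo_stats row is already an aggregate (landing_count landings and their combined total_landed_weight), so an average landed weight per landing is best computed as a ratio of sums rather than a mean of per-row ratios. The base-R sketch below uses the six rows printed above; avg_weight is an illustrative column name, and the weight unit is whatever the source data uses.

```r
# The six rows printed by head(sfo_stats) above
stats_head <- data.frame(
  aircraft_model      = c("B753", "B763", "B772", "B772", "B772", "B773"),
  landing_count       = c(78, 19, 96, 1, 8, 35),
  total_landed_weight = c(17472000, 6080000, 43500000, 460000, 3680000, 19390000),
  stringsAsFactors    = FALSE
)

# Sum landings and weight per model first, then divide (ratio of sums)
by_model <- aggregate(cbind(landing_count, total_landed_weight) ~ aircraft_model,
                      data = stats_head, FUN = sum)
by_model$avg_weight <- by_model$total_landed_weight / by_model$landing_count
by_model
```

For B772 this gives 47640000 / 105, about 453714 per landing, whereas averaging the three per-row ratios (453125, 460000, 460000) would give about 457708 and overweight the rows with very few landings.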
The total number of passengers in the most recent month by activity_type_code and geo_region:
library(dplyr)
sfo_passengers %>%
filter(activity_period == max(activity_period)) %>%
group_by(activity_type_code, geo_region) %>%
summarise(total = sum(passenger_count), .groups = "drop")
#> # A tibble: 17 x 3
#> activity_type_code geo_region total
#> * <chr> <chr> <int>
#> 1 Deplaned Asia 24631
#> 2 Deplaned Australia / Oceania 2733
#> 3 Deplaned Canada 4688
#> 4 Deplaned Central America 4240
#> 5 Deplaned Europe 12385
#> 6 Deplaned Mexico 20789
#> 7 Deplaned Middle East 2179
#> 8 Deplaned US 364571
#> 9 Enplaned Asia 30418
#> 10 Enplaned Australia / Oceania 2878
#> 11 Enplaned Canada 4257
#> 12 Enplaned Central America 3684
#> 13 Enplaned Europe 13661
#> 14 Enplaned Mexico 29537
#> 15 Enplaned Middle East 2463
#> 16 Enplaned US 395548
#> 17 Thru / Transit US 61
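The summary above is in long format, with one row per direction and region. Pivoting it into a region-by-direction table makes comparisons such as enplaned minus deplaned passengers straightforward. A minimal base-R sketch using xtabs(), with the totals for three regions copied from the output above (recent, tab, and net are illustrative names):

```r
# Totals copied from the most-recent-month summary above (three regions)
recent <- data.frame(
  activity_type_code = rep(c("Deplaned", "Enplaned"), each = 3),
  geo_region         = rep(c("Asia", "Mexico", "US"), times = 2),
  total              = c(24631, 20789, 364571, 30418, 29537, 395548),
  stringsAsFactors   = FALSE
)

# Cross-tabulate: rows are regions, columns are directions, cells sum `total`
tab <- xtabs(total ~ geo_region + activity_type_code, data = recent)

# Net passengers leaving SFO (enplaned minus deplaned) per region
net <- tab[, "Enplaned"] - tab[, "Deplaned"]
net
```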
The sankey_ly function plots the distribution of a numeric variable across multiple categorical variables. The following example shows the distribution of total United Airlines passengers during 2019 by terminal, travel type (domestic and international), geo region, and travel direction (deplaned, enplaned, and transit):
sfo_passengers %>%
  filter(operating_airline == "United Airlines",
         activity_period >= 201901 & activity_period < 202001) %>%
  # Lower-case the terminal label so it stays distinct from the "International" value of geo_summary
  mutate(terminal = ifelse(terminal == "International", "international", terminal)) %>%
  group_by(operating_airline, activity_type_code, geo_summary, geo_region, terminal) %>%
  summarise(total = sum(passenger_count), .groups = "drop") %>%
  sankey_ly(cat_cols = c("operating_airline", "terminal", "geo_summary", "geo_region", "activity_type_code"),
            num_col = "total",
            title = "Dist. of United Airlines Passengers at SFO During 2019")
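sankey_ly takes the categorical columns in flow order plus one numeric column. One way to think about what such a diagram encodes is a link table in which each adjacent pair of columns contributes source -> target links weighted by the summed numeric value. The base-R sketch below builds that table; sankey_links is a hypothetical helper, not sfo's implementation, and the six rows are copied from head(sfo_passengers) above. It also suggests why the example above presumably lower-cases the terminal value: without it, the International terminal and the International geo_summary share one label and produce a self-loop.

```r
# Six rows copied from head(sfo_passengers) above
pax <- data.frame(
  terminal           = c("Terminal 3", "Terminal 3", rep("International", 4)),
  geo_summary        = c("Domestic", "Domestic", rep("International", 4)),
  activity_type_code = c("Enplaned", "Thru / Transit", "Deplaned",
                         "Enplaned", "Deplaned", "Enplaned"),
  passenger_count    = c(117260, 61, 12763, 16050, 2733, 2878),
  stringsAsFactors   = FALSE
)

# Hypothetical helper (not sfo's internal code): each adjacent pair of
# categorical columns contributes source -> target links whose value is
# the summed numeric column
sankey_links <- function(df, cat_cols, num_col) {
  links <- lapply(seq_len(length(cat_cols) - 1), function(i) {
    pair <- data.frame(source = df[[cat_cols[i]]],
                       target = df[[cat_cols[i + 1]]],
                       value  = df[[num_col]],
                       stringsAsFactors = FALSE)
    aggregate(value ~ source + target, data = pair, FUN = sum)
  })
  do.call(rbind, links)
}

cols  <- c("terminal", "geo_summary", "activity_type_code")
links <- sankey_links(pax, cols, "passenger_count")
# TRUE: "International" (terminal) links to "International" (geo_summary)
any(as.character(links$source) == as.character(links$target))

# After lower-casing the terminal value, the two labels no longer coincide
pax$terminal <- ifelse(pax$terminal == "International", "international", pax$terminal)
links <- sankey_links(pax, cols, "passenger_count")
any(as.character(links$source) == as.character(links$target))
```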
The total number of landings in the most recent month by aircraft_manufacturer:
library(plotly)

sfo_stats %>%
  filter(activity_period == max(activity_period),
         aircraft_manufacturer != "") %>%
  group_by(aircraft_manufacturer) %>%
  summarise(total_landing = sum(landing_count),
            .groups = "drop") %>%
  arrange(desc(total_landing)) %>%
  plot_ly(labels = ~ aircraft_manufacturer,
          values = ~ total_landing) %>%
  add_pie(hole = 0.6) %>%
  layout(title = "Landing Distribution by Aircraft Manufacturer during Dec 2020")
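In the donut chart, each slice's size corresponds to a category's share of total landings. The same kind of share can be computed explicitly with tapply() and prop.table(). The base-R sketch below applies it to aircraft_body_type in the six sfo_stats rows printed earlier (all Boeing, so grouping those rows by manufacturer would give a single 100% slice); body_type, landings, and share are illustrative names.

```r
# aircraft_body_type and landing_count from the six head(sfo_stats) rows above
body_type <- c("Narrow Body", rep("Wide Body", 5))
landings  <- c(78, 19, 96, 1, 8, 35)

# Total landings per category, then each category's fraction of the total
totals <- tapply(landings, body_type, sum)
share  <- prop.table(totals)
round(100 * share, 1)
```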
The following Sankey plot shows the distribution of the number of landings at SFO by geo summary, region, aircraft type, manufacturer, and body type during Dec 2020:
sfo_stats %>%
  filter(activity_period == max(activity_period)) %>%
  group_by(geo_summary, geo_region, landing_aircraft_type, aircraft_manufacturer, aircraft_body_type) %>%
  summarise(total_landing = sum(landing_count),
            .groups = "drop") %>%
  sankey_ly(cat_cols = c("geo_summary", "geo_region",
                         "landing_aircraft_type",
                         "aircraft_manufacturer",
                         "aircraft_body_type"),
            num_col = "total_landing",
            title = "Landing Summary by Geo Region and Aircraft Type During Dec 2020")