The covid19swiss R package provides a tidy-format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) pandemic outbreak in the cantons of Switzerland and the Principality of Liechtenstein (FL).
The covid19swiss dataset includes the following fields:
date - the timestamp of the case, a Date object
location - the abbreviation code of the Canton of Switzerland or the Principality of Liechtenstein (FL)
location_type - description of the location, either Canton of Switzerland or the Principality of Liechtenstein
location_code - a canton index code for merging geometry data from the rnaturalearth package, available only for Switzerland cantons
location_code_type - the name of the code in the rnaturalearth package for the Switzerland map
data_type - the type of case
value - the number of cases corresponding to the date and data_type fields
The data_type field takes the following values:
tested_total - cumulative number of tests performed as of the date
cases_total - cumulative confirmed Covid-19 cases as of the current date
hosp_new - new hospitalizations on the current date
hosp_current - current number of hospitalized patients as of the current date
icu_current - number of hospitalized patients in ICUs as of the current date
vent_current - number of hospitalized patients requiring ventilation as of the current date
recovered_total - cumulative number of patients recovered as of the current date
deaths_total - cumulative deaths due to Covid-19 as of the current date
The data is organized in a long format:
library(covid19swiss)
head(covid19swiss)
#> date location location_type location_code location_code_type
#> 1 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 2 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 3 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 4 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 5 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 6 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> data_type value
#> 1 tested_total 4
#> 2 cases_total NA
#> 3 hosp_new NA
#> 4 hosp_current NA
#> 5 icu_current NA
#> 6 vent_current NA
It is straightforward to transform the data into a wide format with the pivot_wider function from the tidyr package:
library(tidyr)
covid19swiss_wide <- covid19swiss %>%
  pivot_wider(names_from = data_type, values_from = value)
head(covid19swiss_wide)
#> # A tibble: 6 x 13
#> date location location_type location_code location_code_t… tested_total
#> <date> <chr> <chr> <chr> <chr> <int>
#> 1 2020-01-24 GE Canton of Sw… CH.GE gn_a1_code 4
#> 2 2020-01-25 GE Canton of Sw… CH.GE gn_a1_code 8
#> 3 2020-01-26 GE Canton of Sw… CH.GE gn_a1_code 11
#> 4 2020-01-27 GE Canton of Sw… CH.GE gn_a1_code 18
#> 5 2020-01-28 GE Canton of Sw… CH.GE gn_a1_code 27
#> 6 2020-01-29 GE Canton of Sw… CH.GE gn_a1_code 54
#> # … with 7 more variables: cases_total <int>, hosp_new <int>,
#> # hosp_current <int>, icu_current <int>, vent_current <int>,
#> # recovered_total <int>, deaths_total <int>
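Because the fields ending in _total are cumulative, a daily increment can be derived by subtracting the previous day's value. The following is a minimal sketch of that idea, not part of the package: it rebuilds the six Geneva tested_total values printed above as a small data frame so it runs without the package installed, and the column name tested_new is a hypothetical name chosen for illustration.

```r
library(dplyr)

# Cumulative test counts for Geneva, copied from the wide-format output above
ge_tests <- data.frame(
  date = as.Date("2020-01-24") + 0:5,
  tested_total = c(4L, 8L, 11L, 18L, 27L, 54L)
)

# Daily increment = cumulative value minus the previous day's cumulative value.
# The first row has no previous day, so its increment is NA.
# tested_new is a hypothetical column name, not a field of the dataset.
ge_daily <- ge_tests %>%
  arrange(date) %>%
  mutate(tested_new = tested_total - dplyr::lag(tested_total))

ge_daily
```

With the full dataset, the same mutate step would presumably need a group_by(location) first, so that differences are not taken across cantons.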
The following examples demonstrate simple methods for querying and summarising the data with the dplyr and tidyr packages.
The first example demonstrates how to query the total confirmed, recovered, and death cases by canton as of September 8th, 2020:
library(dplyr)
covid19swiss %>%
  filter(date == as.Date("2020-09-08"),
         data_type %in% c("cases_total", "recovered_total", "deaths_total")) %>%
  select(location, value, data_type) %>%
  pivot_wider(names_from = data_type, values_from = value) %>%
  arrange(desc(cases_total))
#> # A tibble: 26 x 3
#> location cases_total recovered_total
#> <chr> <int> <int>
#> 1 VD 8070 NA
#> 2 GE 7310 NA
#> 3 ZH 6652 NA
#> 4 TI 3565 929
#> 5 BE 2698 NA
#> 6 VS 2458 320
#> 7 AG 2260 NA
#> 8 FR 1920 164
#> 9 SG 1341 NA
#> 10 BS 1254 1154
#> # … with 16 more rows
Note: some fields, such as recovered_total or tested_total, are not available for some cantons and are marked as missing values (i.e., NA).
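To see which fields are reported for which location, the missing values can be counted per location and data_type. The sketch below uses a small made-up data frame with the same column names as the dataset; the location codes "AA" and "BB", the values, and the names n_reported and n_missing are all hypothetical.

```r
library(dplyr)

# Toy stand-in for covid19swiss; codes and values are made up, not real data
toy <- data.frame(
  location  = c("AA", "AA", "BB", "BB"),
  data_type = c("cases_total", "recovered_total",
                "cases_total", "recovered_total"),
  value     = c(100L, NA, 80L, 20L)
)

# Count reported and missing observations for each location / data_type pair
availability <- toy %>%
  group_by(location, data_type) %>%
  summarise(n_reported = sum(!is.na(value)),
            n_missing  = sum(is.na(value)),
            .groups = "drop")

availability
```

Replacing toy with covid19swiss should give the same kind of summary for the real data.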
In the next example, we will filter the dataset for the Canton of Geneva as of April 10th, 2020 and calculate the positive test rate, the recovery rate, and the death rate:
covid19swiss %>%
  dplyr::filter(location == "GE",
                date == as.Date("2020-04-10")) %>%
  dplyr::select(data_type, value) %>%
  tidyr::pivot_wider(names_from = data_type, values_from = value) %>%
  dplyr::mutate(positive_tested = round(100 * cases_total / tested_total, 2),
                death_rate = round(100 * deaths_total / cases_total, 2),
                recovery_rate = round(100 * recovered_total / cases_total, 2)) %>%
  dplyr::select(positive_tested, recovery_rate, death_rate)
#> # A tibble: 1 x 3
#> positive_tested recovery_rate death_rate
#> <dbl> <dbl> <dbl>
#> 1 23.7 10.1 3.83
Values are percentages.
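The same calculation can be applied to several locations at once by keeping the location column through the pivot. The sketch below is an illustration on made-up numbers, not real data; the location codes "AA" and "BB" are hypothetical. It also shows that a missing input propagates to the derived rate as NA.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for one date of covid19swiss; codes and values are made up
toy <- data.frame(
  location  = rep(c("AA", "BB"), each = 3),
  data_type = rep(c("tested_total", "cases_total", "deaths_total"), times = 2),
  value     = c(1000L, 200L, 10L, NA, 50L, 1L)
)

# One row per location, one column per data_type, then the rates in percent
rates <- toy %>%
  pivot_wider(names_from = data_type, values_from = value) %>%
  mutate(positive_tested = round(100 * cases_total / tested_total, 2),
         death_rate      = round(100 * deaths_total / cases_total, 2)) %>%
  select(location, positive_tested, death_rate)

rates
```

Here "BB" has no tested_total, so its positive_tested is NA while its death_rate is still computed.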
The raw data include both Switzerland and the Principality of Liechtenstein. The data can be separated by country using the location field:
switzerland <- covid19swiss %>% filter(location != "FL")
head(switzerland)
#> date location location_type location_code location_code_type
#> 1 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 2 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 3 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 4 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 5 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> 6 2020-01-24 GE Canton of Switzerland CH.GE gn_a1_code
#> data_type value
#> 1 tested_total 4
#> 2 cases_total NA
#> 3 hosp_new NA
#> 4 hosp_current NA
#> 5 icu_current NA
#> 6 vent_current NA
liechtenstein <- covid19swiss %>% filter(location == "FL")
head(liechtenstein)
#> date location location_type location_code
#> 1 2020-02-27 FL Principality of Liechtenstein <NA>
#> 2 2020-02-27 FL Principality of Liechtenstein <NA>
#> 3 2020-02-27 FL Principality of Liechtenstein <NA>
#> 4 2020-02-27 FL Principality of Liechtenstein <NA>
#> 5 2020-02-27 FL Principality of Liechtenstein <NA>
#> 6 2020-02-27 FL Principality of Liechtenstein <NA>
#> location_code_type data_type value
#> 1 gn_a1_code tested_total 3
#> 2 gn_a1_code cases_total NA
#> 3 gn_a1_code hosp_new NA
#> 4 gn_a1_code hosp_current NA
#> 5 gn_a1_code icu_current NA
#> 6 gn_a1_code vent_current NA
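An alternative to filtering on location is to split on the location_type field, which takes the two values shown in the outputs above. The following base R sketch uses a made-up three-row data frame with the relevant columns; the values are hypothetical and the object name by_country is chosen for illustration.

```r
# Toy stand-in with the columns needed here; the values are made up
toy <- data.frame(
  location      = c("GE", "ZH", "FL"),
  location_type = c("Canton of Switzerland",
                    "Canton of Switzerland",
                    "Principality of Liechtenstein"),
  value         = c(1L, 2L, 3L)
)

# split() returns a named list with one data frame per location_type
by_country <- split(toy, toy$location_type)

names(by_country)
by_country[["Principality of Liechtenstein"]]
```

This keeps both subsets in one list, which can be convenient when the same summary is applied to each country.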