The wcde package allows R users to easily download data from the Wittgenstein Centre for Demography and Human Capital Data Explorer. It also contains a number of helpful functions for working with education-specific demographic data.
You can install the released version of wcde
from CRAN with:
install.packages("wcde")
Install the development version with:
library(devtools)
install_github("guyabel/wcde", ref = "main")
The get_wcde() function can be used to download data from the Wittgenstein Centre Human Capital Data Explorer. It requires three user inputs:
indicator: a short code for the indicator of interest
scenario: a number referring to an SSP narrative; by default 2 is used (for SSP2)
country_code (or country_name): corresponding to the country of interest
library(wcde)
# download education specific tfr data
get_wcde(indicator = "etfr",
country_name = c("Brazil", "Albania"))
#> # A tibble: 204 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Brazil 76 No Education 2015-2020 2.47
#> 2 2 Albania 8 No Education 2015-2020 1.88
#> 3 2 Brazil 76 Incomplete Primary 2015-2020 2.47
#> 4 2 Albania 8 Incomplete Primary 2015-2020 1.88
#> 5 2 Brazil 76 Primary 2015-2020 2.47
#> 6 2 Albania 8 Primary 2015-2020 1.88
#> 7 2 Brazil 76 Lower Secondary 2015-2020 1.89
#> 8 2 Albania 8 Lower Secondary 2015-2020 1.9
#> 9 2 Brazil 76 Upper Secondary 2015-2020 1.37
#> 10 2 Albania 8 Upper Secondary 2015-2020 1.57
#> # … with 194 more rows
# download education specific survivorship rates
get_wcde(indicator = "eassr",
country_name = c("Niger", "Korea"))
#> # A tibble: 8,976 × 8
#> scenario name country_code age sex education period eassr
#> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr> <dbl>
#> 1 2 Niger 562 Newborn Male No Educat… 2015-… 91.6
#> 2 2 Republic of Korea 410 Newborn Male No Educat… 2015-… 99.4
#> 3 2 Niger 562 Newborn Male Incomplet… 2015-… 92
#> 4 2 Republic of Korea 410 Newborn Male Incomplet… 2015-… 99.5
#> 5 2 Niger 562 Newborn Male Primary 2015-… 92.5
#> 6 2 Republic of Korea 410 Newborn Male Primary 2015-… 99.5
#> 7 2 Niger 562 Newborn Male Lower Sec… 2015-… 93.4
#> 8 2 Republic of Korea 410 Newborn Male Lower Sec… 2015-… 99.6
#> 9 2 Niger 562 Newborn Male Upper Sec… 2015-… 95.2
#> 10 2 Republic of Korea 410 Newborn Male Upper Sec… 2015-… 99.7
#> # … with 8,966 more rows
The indicator input must match the short code from the indicator
table. The find_indicator()
function can be used to look up
short codes (given in the first column) from the
wic_indicators
data frame:
find_indicator(x = "tfr")
#> # A tibble: 2 × 3
#> indicator description definition
#> <chr> <chr> <chr>
#> 1 tfr Total Fertility Rate "The average number of children b…
#> 2 etfr Total Fertility Rate by Education "The average number of children b…
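Because wic_indicators is an ordinary data frame, it can also be searched directly. The following is a minimal sketch, not taken from the package documentation, that assumes only the indicator and description columns shown in the output above and matches descriptions case-insensitively with grepl():

```r
library(wcde)
library(dplyr)

# Search the description column of the bundled indicator table.
# Assumes only the `indicator` and `description` columns shown above.
fertility_codes <- wic_indicators %>%
  filter(grepl("fertility", description, ignore.case = TRUE)) %>%
  select(indicator, description)

fertility_codes
```

Any of the codes in the indicator column of the result can then be passed to the indicator argument of get_wcde().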
By default, get_wcde() returns data for all available years or periods. The filter() function in dplyr can be used to filter data for specific years or periods, for example:
library(tidyverse)
get_wcde(indicator = "e0",
country_name = c("Japan", "Australia")) %>%
filter(period == "2015-2020")
#> # A tibble: 4 × 6
#> scenario name country_code sex period e0
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Japan 392 Male 2015-2020 80.7
#> 2 2 Australia 36 Male 2015-2020 81.3
#> 3 2 Japan 392 Female 2015-2020 87.2
#> 4 2 Australia 36 Female 2015-2020 85
get_wcde(indicator = "sexratio",
country_name = c("China", "South Korea")) %>%
filter(year == 2020)
#> # A tibble: 44 × 6
#> scenario name country_code age year sexratio
#> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 2 China 156 All 2020 1.06
#> 2 2 Republic of Korea 410 All 2020 1
#> 3 2 China 156 0--4 2020 1.15
#> 4 2 Republic of Korea 410 0--4 2020 1.07
#> 5 2 China 156 5--9 2020 1.16
#> 6 2 Republic of Korea 410 5--9 2020 1.07
#> 7 2 China 156 10--14 2020 1.17
#> 8 2 Republic of Korea 410 10--14 2020 1.07
#> 9 2 China 156 15--19 2020 1.16
#> 10 2 Republic of Korea 410 15--19 2020 1.1
#> # … with 34 more rows
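Period labels such as "2015-2020" are character strings, so they can only be matched exactly. For numeric comparisons the start and end years have to be extracted first. The sketch below runs on a small toy tibble (the object name e0_toy is made up, and the values are copied from the e0 output above) rather than on a fresh download, and uses separate() from tidyr:

```r
library(dplyr)
library(tidyr)

# Toy stand-in for get_wcde(indicator = "e0", ...) output; values copied
# from the rows printed above so the sketch runs without a download.
e0_toy <- tibble(
  name   = c("Japan", "Australia"),
  sex    = "Male",
  period = "2015-2020",
  e0     = c(80.7, 81.3)
)

# Split the period label into integer start and end years,
# keeping the original period column.
e0_years <- e0_toy %>%
  separate(period, into = c("start", "end"), sep = "-",
           convert = TRUE, remove = FALSE)

# Numeric filtering is now possible
e0_years %>%
  filter(start >= 2015)
```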
Past data is only available for selected indicators. These can be viewed using the past column of the wic_indicators data frame:
wic_indicators %>%
filter(past) %>%
select(1:2)
#> # A tibble: 28 × 2
#> indicator description
#> <chr> <chr>
#> 1 pop Population Size (000's)
#> 2 bpop Population Size by Broad Age (000's)
#> 3 epop Population Size by Education (000's)
#> 4 prop Educational Attainment Distribution
#> 5 bprop Educational Attainment Distribution by Broad Age
#> 6 growth Average Annual Growth Rate
#> 7 nirate Average Annual Rate of Natural Increase
#> 8 sexratio Sex Ratio
#> 9 mage Population Median Age
#> 10 tdr Total Dependency Ratio
#> # … with 18 more rows
The filter() function can also be used to restrict indicators to specific age, sex or education groups:
get_wcde(indicator = "sexratio",
country_name = c("China", "South Korea")) %>%
filter(year == 2020,
age == "All")
#> # A tibble: 2 × 6
#> scenario name country_code age year sexratio
#> <dbl> <chr> <dbl> <chr> <dbl> <dbl>
#> 1 2 China 156 All 2020 1.06
#> 2 2 Republic of Korea 410 All 2020 1
Country names are guessed using the countrycode package, so abbreviations, misspellings and non-English names can be matched, for example:
get_wcde(indicator = "tfr",
country_name = c("U.A.E", "Espania", "Österreich"))
#> # A tibble: 90 × 5
#> scenario name country_code period tfr
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 2 United Arab Emirates 784 1950-1955 6.97
#> 2 2 Spain 724 1950-1955 2.53
#> 3 2 Austria 40 1950-1955 2.1
#> 4 2 United Arab Emirates 784 1955-1960 6.97
#> 5 2 Spain 724 1955-1960 2.7
#> 6 2 Austria 40 1955-1960 2.57
#> 7 2 United Arab Emirates 784 1960-1965 6.87
#> 8 2 Spain 724 1960-1965 2.81
#> 9 2 Austria 40 1960-1965 2.78
#> 10 2 United Arab Emirates 784 1965-1970 6.77
#> # … with 80 more rows
The get_wcde() function accepts ISO numeric codes for countries via the country_code argument:
get_wcde(indicator = "etfr", country_code = c(44, 100))
#> # A tibble: 204 × 6
#> scenario name country_code education period etfr
#> <dbl> <chr> <dbl> <chr> <chr> <dbl>
#> 1 2 Bahamas 44 No Education 2015-2020 2.71
#> 2 2 Bulgaria 100 No Education 2015-2020 1.72
#> 3 2 Bahamas 44 Incomplete Primary 2015-2020 2.71
#> 4 2 Bulgaria 100 Incomplete Primary 2015-2020 1.72
#> 5 2 Bahamas 44 Primary 2015-2020 2.71
#> 6 2 Bulgaria 100 Primary 2015-2020 1.72
#> 7 2 Bahamas 44 Lower Secondary 2015-2020 2.09
#> 8 2 Bulgaria 100 Lower Secondary 2015-2020 1.73
#> 9 2 Bahamas 44 Upper Secondary 2015-2020 1.76
#> 10 2 Bulgaria 100 Upper Secondary 2015-2020 1.44
#> # … with 194 more rows
A full list of available countries and region aggregates, and their
codes, can be found in the wic_locations
data frame.
wic_locations
#> # A tibble: 230 × 5
#> name isono continent region dim
#> <chr> <dbl> <chr> <chr> <chr>
#> 1 World 900 <NA> <NA> area
#> 2 Africa 903 <NA> <NA> area
#> 3 Asia 935 <NA> <NA> area
#> 4 Europe 908 <NA> <NA> area
#> 5 Latin America and the Caribbean 904 <NA> <NA> area
#> 6 Northern America 905 <NA> <NA> area
#> 7 Oceania 909 <NA> <NA> area
#> 8 Afghanistan 4 Asia South-Central Asia country
#> 9 Albania 8 Europe Southern Europe country
#> 10 Algeria 12 Africa Northern Africa country
#> # … with 220 more rows
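Since wic_locations is a regular data frame, the codes needed for country_code can be looked up with dplyr. The following is a rough sketch that assumes only the name, isono and dim columns shown in the output above; the object names areas and codes are made up for illustration:

```r
library(wcde)
library(dplyr)

# Region aggregates are the rows with dim == "area" in the table above
areas <- wic_locations %>%
  filter(dim == "area") %>%
  select(name, isono)

# Numeric codes for specific countries, matched on the name column
codes <- wic_locations %>%
  filter(name %in% c("Albania", "Algeria")) %>%
  pull(isono)

areas
codes
```

The resulting vector could then be passed on, for example as get_wcde(indicator = "tfr", country_code = codes).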
By default, get_wcde() returns data for the Medium (SSP2) scenario. Results for different SSP scenarios can be returned by passing a different value (or multiple values) to the scenario argument in get_wcde().
get_wcde(indicator = "growth",
country_name = c("India", "China"),
scenario = c(1:3, 21, 22)) %>%
filter(period == "2095-2100")
#> # A tibble: 10 × 5
#> scenario name country_code period growth
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 1 India 356 2095-2100 -0.7
#> 2 1 China 156 2095-2100 -1.1
#> 3 2 India 356 2095-2100 -0.5
#> 4 2 China 156 2095-2100 -1
#> 5 3 India 356 2095-2100 0.2
#> 6 3 China 156 2095-2100 -0.2
#> 7 21 India 356 2095-2100 -0.5
#> 8 21 China 156 2095-2100 -0.9
#> 9 22 India 356 2095-2100 -0.5
#> 10 22 China 156 2095-2100 -1
Set include_scenario_names = TRUE to include columns with the full names and abbreviations of the scenarios:
get_wcde(indicator = "tfr",
country_name = c("Kenya", "Nigeria", "Algeria"),
scenario = 1:3,
include_scenario_names = TRUE) %>%
filter(period == "2045-2050")
#> # A tibble: 9 × 7
#> scenario scenario_name scenario_abb name country_code period tfr
#> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl>
#> 1 1 Rapid Development (SSP1) SSP1 Kenya 404 2045-… 1.62
#> 2 1 Rapid Development (SSP1) SSP1 Nige… 566 2045-… 2.29
#> 3 1 Rapid Development (SSP1) SSP1 Alge… 12 2045-… 1.53
#> 4 2 Medium (SSP2) SSP2 Kenya 404 2045-… 2.36
#> 5 2 Medium (SSP2) SSP2 Nige… 566 2045-… 3.37
#> 6 2 Medium (SSP2) SSP2 Alge… 12 2045-… 1.77
#> 7 3 Stalled Development (SS… SSP3 Kenya 404 2045-… 3.33
#> 8 3 Stalled Development (SS… SSP3 Nige… 566 2045-… 4.65
#> 9 3 Stalled Development (SS… SSP3 Alge… 12 2045-… 2.41
Additional details of the pathways for each scenario numeric code can be found in the wic_scenarios object. Further background and links to the corresponding literature are provided in the Data Explorer.
wic_scenarios
#> # A tibble: 5 × 3
#> scenario_name scenario scenario_abb
#> <chr> <dbl> <chr>
#> 1 Rapid Development (SSP1) 1 SSP1
#> 2 Medium (SSP2) 2 SSP2
#> 3 Stalled Development (SSP3) 3 SSP3
#> 4 Medium - Zero Migration (SSP2 - ZM) 21 SSP2ZM
#> 5 Medium - Double Migration (SSP2 - DM) 22 SSP2DM
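Scenario labels can also be attached after the fact by joining on the numeric scenario column. This is only a sketch: the toy tibble growth_toy is made up for illustration, with growth values copied from the India rows of the 2095-2100 output above, and the join assumes nothing beyond the three wic_scenarios columns printed above:

```r
library(wcde)
library(dplyr)

# Toy stand-in for downloaded data (India, 2095-2100, values from above)
growth_toy <- tibble(
  scenario = c(1, 2, 3, 21, 22),
  name     = "India",
  growth   = c(-0.7, -0.5, 0.2, -0.5, -0.5)
)

# Attach scenario names and abbreviations by the numeric scenario code
labelled <- growth_toy %>%
  left_join(wic_scenarios, by = "scenario")

labelled
```

This can be handy when data were downloaded without include_scenario_names = TRUE and the labels are only needed later, for instance in a plot legend.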
Data for all countries can be obtained by not setting country_name or country_code:
get_wcde(indicator = "mage")
#> # A tibble: 7,099 × 5
#> scenario name country_code year mage
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 Bulgaria 100 1950 27.3
#> 2 2 Myanmar 104 1950 22.8
#> 3 2 Burundi 108 1950 19.5
#> 4 2 Belarus 112 1950 27.2
#> 5 2 Cambodia 116 1950 18.7
#> 6 2 Algeria 12 1950 19.4
#> 7 2 Cameroon 120 1950 20.8
#> 8 2 Canada 124 1950 27.7
#> 9 2 Cape Verde 132 1950 23
#> 10 2 Central African Republic 140 1950 22.5
#> # … with 7,089 more rows
The get_wcde() function needs to be called multiple times to download multiple indicators. This can be done using the map() function in purrr:
mi <- tibble(ind = c("odr", "nirate", "ggapedu25")) %>%
mutate(d = map(.x = ind, .f = ~get_wcde(indicator = .x)))
mi
#> # A tibble: 3 × 2
#> ind d
#> <chr> <list>
#> 1 odr <tibble [7,099 × 5]>
#> 2 nirate <tibble [6,870 × 5]>
#> 3 ggapedu25 <tibble [41,346 × 6]>
mi %>%
filter(ind == "odr") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 7,099 × 5
#> scenario name country_code year odr
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 Bulgaria 100 1950 0.1
#> 2 2 Myanmar 104 1950 0.05
#> 3 2 Burundi 108 1950 0.06
#> 4 2 Belarus 112 1950 0.13
#> 5 2 Cambodia 116 1950 0.05
#> 6 2 Algeria 12 1950 0.06
#> 7 2 Cameroon 120 1950 0.06
#> 8 2 Canada 124 1950 0.12
#> 9 2 Cape Verde 132 1950 0.13
#> 10 2 Central African Republic 140 1950 0.09
#> # … with 7,089 more rows
mi %>%
filter(ind == "nirate") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 6,870 × 5
#> scenario name country_code period nirate
#> <dbl> <chr> <dbl> <chr> <dbl>
#> 1 2 Bulgaria 100 1950-1955 11.1
#> 2 2 Myanmar 104 1950-1955 19.1
#> 3 2 Burundi 108 1950-1955 24.1
#> 4 2 Belarus 112 1950-1955 10.1
#> 5 2 Cambodia 116 1950-1955 25.9
#> 6 2 Algeria 12 1950-1955 27.1
#> 7 2 Cameroon 120 1950-1955 17.6
#> 8 2 Canada 124 1950-1955 18.9
#> 9 2 Cape Verde 132 1950-1955 26.9
#> 10 2 Central African Republic 140 1950-1955 10.7
#> # … with 6,860 more rows
mi %>%
filter(ind == "ggapedu25") %>%
select(-ind) %>%
unnest(cols = d)
#> # A tibble: 41,346 × 6
#> scenario name country_code year education ggapedu25
#> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 2 Bulgaria 100 1950 No Education -20
#> 2 2 Myanmar 104 1950 No Education -13
#> 3 2 Burundi 108 1950 No Education -6
#> 4 2 Belarus 112 1950 No Education -10
#> 5 2 Cambodia 116 1950 No Education -21
#> 6 2 Algeria 12 1950 No Education -2
#> 7 2 Cameroon 120 1950 No Education -13
#> 8 2 Canada 124 1950 No Education -2
#> 9 2 Cape Verde 132 1950 No Education -9
#> 10 2 Central African Republic 140 1950 No Education -1
#> # … with 41,336 more rows
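When many indicators are requested in one map() call, a single failed request (for example a mistyped code) stops the whole loop. One possible safeguard is purrr's possibly(), sketched below. To keep the example runnable offline it uses a hypothetical stand-in function, fake_get(), which is not part of wcde; in real use get_wcde would be wrapped instead.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for get_wcde(), so the sketch runs without a
# network connection. It errors for any code it does not "know".
fake_get <- function(indicator) {
  if (!indicator %in% c("odr", "nirate")) {
    stop("unknown indicator: ", indicator)
  }
  tibble(scenario = 2, name = "Bulgaria", indicator = indicator)
}

# possibly() returns NULL instead of raising an error
safe_get <- possibly(fake_get, otherwise = NULL)

inds <- c("odr", "nirate", "not_a_code")
res <- map(set_names(inds), ~ safe_get(indicator = .x))

# Drop the failed (NULL) entries, keeping a named list of tibbles
ok <- compact(res)
names(ok)
```

Replacing fake_get with get_wcde in the possibly() call should give the same behaviour for real downloads, with failed indicators silently dropped by compact().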
Population data for a range of age-sex-educational attainment combinations can be obtained by setting indicator = "pop" in get_wcde() and specifying the pop_age, pop_sex and pop_edu arguments. By default, each of the three population breakdown arguments is set to "total":
get_wcde(indicator = "pop", country_name = "India")
#> # A tibble: 31 × 5
#> scenario name country_code year pop
#> <dbl> <chr> <dbl> <dbl> <dbl>
#> 1 2 India 356 1950 376325.
#> 2 2 India 356 1955 409276.
#> 3 2 India 356 1960 449604.
#> 4 2 India 356 1965 497830.
#> 5 2 India 356 1970 553787.
#> 6 2 India 356 1975 621525.
#> 7 2 India 356 1980 697040.
#> 8 2 India 356 1985 781904.
#> 9 2 India 356 1990 870422.
#> 10 2 India 356 1995 960733.
#> # … with 21 more rows
The pop_age argument can be set to "all" to get population data broken down into five-year age groups. The pop_sex argument can be set to "both" to get population data broken down into female and male groups. The pop_edu argument can be set to "four", "six" or "eight" to get population data broken down into education categorizations with different levels of detail.
get_wcde(indicator = "pop", country_code = 900, pop_edu = "four")
#> # A tibble: 155 × 6
#> scenario name country_code year education pop
#> <dbl> <fct> <dbl> <dbl> <fct> <dbl>
#> 1 2 World 900 1950 Under 15 868844.
#> 2 2 World 900 1950 No Education 763612.
#> 3 2 World 900 1950 Primary 549510.
#> 4 2 World 900 1950 Secondary 329182.
#> 5 2 World 900 1950 Post Secondary 30143.
#> 6 2 World 900 1955 Under 15 984764.
#> 7 2 World 900 1955 No Education 762022.
#> 8 2 World 900 1955 Primary 600299.
#> 9 2 World 900 1955 Secondary 392261.
#> 10 2 World 900 1955 Post Secondary 38199.
#> # … with 145 more rows
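Population counts by education are often easier to compare as shares of the total. The following is a minimal sketch on a toy tibble (the name world_1950 is made up, and the counts are the rounded 1950 World values printed above), dividing each group by the yearly total:

```r
library(dplyr)

# Toy copy of the 1950 World rows shown above (population in thousands)
world_1950 <- tibble(
  year      = 1950,
  education = c("Under 15", "No Education", "Primary",
                "Secondary", "Post Secondary"),
  pop       = c(868844, 763612, 549510, 329182, 30143)
)

# Share of each education group within its year
shares <- world_1950 %>%
  group_by(year) %>%
  mutate(share = pop / sum(pop)) %>%
  ungroup()

shares
```

The same group_by() and mutate() steps should apply unchanged to the full output of get_wcde(indicator = "pop", pop_edu = "four"), since grouping by year keeps each year's shares summing to one.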
The population breakdown arguments can be used in combination to provide further breakdowns, for example sex- and education-specific population totals:
get_wcde(indicator = "pop", country_code = 900, pop_edu = "six", pop_sex = "both")
#> # A tibble: 434 × 7
#> scenario name country_code year sex education pop
#> <dbl> <fct> <dbl> <dbl> <fct> <fct> <dbl>
#> 1 2 World 900 1950 Male Under 15 443968.
#> 2 2 World 900 1950 Male No Education 317636.
#> 3 2 World 900 1950 Male Incomplete Primary 116692.
#> 4 2 World 900 1950 Male Primary 194902
#> 5 2 World 900 1950 Male Lower Secondary 104160
#> 6 2 World 900 1950 Male Upper Secondary 69384.
#> 7 2 World 900 1950 Male Post Secondary 21102.
#> 8 2 World 900 1950 Female Under 15 424877.
#> 9 2 World 900 1950 Female No Education 445976.
#> 10 2 World 900 1950 Female Incomplete Primary 81231.
#> # … with 424 more rows
The full age-sex-education specific data can also be obtained by
setting indicator = "epop"
in get_wcde()
.
Create population pyramids by setting male population values to their negative equivalents, so that the male and female columns diverge from the y-axis:
w <- get_wcde(indicator = "pop", country_code = 900,
pop_age = "all", pop_sex = "both", pop_edu = "four")
w
#> # A tibble: 6,510 × 8
#> scenario name country_code year age sex education pop
#> <dbl> <fct> <dbl> <dbl> <fct> <fct> <fct> <dbl>
#> 1 2 World 900 1950 0--4 Male Under 15 172362.
#> 2 2 World 900 1950 0--4 Male No Education 0
#> 3 2 World 900 1950 0--4 Male Primary 0
#> 4 2 World 900 1950 0--4 Male Secondary 0
#> 5 2 World 900 1950 0--4 Male Post Secondary 0
#> 6 2 World 900 1950 0--4 Female Under 15 166026.
#> 7 2 World 900 1950 0--4 Female No Education 0
#> 8 2 World 900 1950 0--4 Female Primary 0
#> 9 2 World 900 1950 0--4 Female Secondary 0
#> 10 2 World 900 1950 0--4 Female Post Secondary 0
#> # … with 6,500 more rows
w <- w %>%
mutate(pop_pm = ifelse(test = sex == "Male", yes = -pop, no = pop),
pop_pm = pop_pm/1e3)
w
#> # A tibble: 6,510 × 9
#> scenario name country_code year age sex education pop pop_pm
#> <dbl> <fct> <dbl> <dbl> <fct> <fct> <fct> <dbl> <dbl>
#> 1 2 World 900 1950 0--4 Male Under 15 172362. -172.
#> 2 2 World 900 1950 0--4 Male No Education 0 0
#> 3 2 World 900 1950 0--4 Male Primary 0 0
#> 4 2 World 900 1950 0--4 Male Secondary 0 0
#> 5 2 World 900 1950 0--4 Male Post Secondary 0 0
#> 6 2 World 900 1950 0--4 Female Under 15 166026. 166.
#> 7 2 World 900 1950 0--4 Female No Education 0 0
#> 8 2 World 900 1950 0--4 Female Primary 0 0
#> 9 2 World 900 1950 0--4 Female Secondary 0 0
#> 10 2 World 900 1950 0--4 Female Post Secondary 0 0
#> # … with 6,500 more rows
Use standard ggplot code to create a population pyramid with scale_x_symmetric() from the lemon package, to allow for equal male and female x-axes, and with the wic_col4 object in the wcde package, which contains the names of the colours used in the Wittgenstein Centre Human Capital Data Explorer, for the fill colours. Note that wic_col6 and wic_col8 objects also exist for equivalent plots of population data objects with corresponding numbers of education categories.
library(lemon)
w %>%
filter(year == 2020) %>%
ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_symmetric(labels = abs) +
scale_fill_manual(values = wic_col4, name = "Education") +
labs(x = "Population (millions)", y = "Age") +
theme_bw()
Add male and female labels on the x-axis by faceting the plot by sex with the strip labels placed at the bottom, and use geom_blank() to allow for equal x-axes and additional space at the end of the largest columns.
w <- w %>%
mutate(pop_max = ifelse(sex == "Male", -max(pop/1e3), max(pop/1e3)))
w %>%
filter(year == 2020) %>%
ggplot(mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_continuous(labels = abs, expand = c(0, 0)) +
scale_fill_manual(values = wic_col4, name = "Education") +
labs(x = "Population (millions)", y = "Age") +
facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
geom_blank(mapping = aes(x = pop_max * 1.1)) +
theme(panel.spacing.x = unit(0, "pt"),
strip.placement = "outside",
strip.background = element_rect(fill = "transparent"),
strip.text.x = element_text(margin = margin( b = 0, t = 0)))
Animate the pyramid through the past data and projection periods using the transition_time() function in the gganimate package:
library(gganimate)
ggplot(data = w,
mapping = aes(x = pop_pm, y = age, fill = fct_rev(education))) +
geom_col() +
geom_vline(xintercept = 0, colour = "black") +
scale_x_continuous(labels = abs, expand = c(0, 0)) +
scale_fill_manual(values = wic_col4, name = "Education") +
facet_wrap(facets = "sex", scales = "free_x", strip.position = "bottom") +
geom_blank(mapping = aes(x = pop_max * 1.1)) +
theme(panel.spacing.x = unit(0, "pt"),
strip.placement = "outside",
strip.background = element_rect(fill = "transparent"),
strip.text.x = element_text(margin = margin(b = 0, t = 0))) +
transition_time(time = year) +
labs(x = "Population (millions)", y = "Age",
title = 'SSP2 World Population {round(frame_time)}')