Implementing Synthetic Controls

In order to understand the policy impact on Russian Alcohol Consumption, we will first load the SCtools, Synth, and some of the tidyverse packages.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
library(SCtools)
#> Loading required package: future
library(Synth)
#> ##
#> ## Synth Package: Implements Synthetic Control Methods.
#> ## See http://www.mit.edu/~jhainm/software.htm for additional information.
theme_set(theme_minimal())

After those are loaded, we can look at the alcohol data that is inside of the SCtools package. We can build the graphic in the Guardian using these data to see the trend in alcohol consumption.

alcohol %>% 
  filter(country_name == "Russian Federation") %>% 
    ggplot(aes(year, consumption))+
    geom_line()+
    geom_vline(xintercept = 2006, color  = "orange")+
    labs(
        title = "Per Capita Alcohol Consumption in the Russian Federation Since 1990",
        subtitle = "The Russian government began a series of policy changes in 2003\nto reduce alcohol consumption",
        y = "Per Capita Consumption (L/person)",
        x = NULL,
        caption = "Data: World Health Organization"
    )

Trend of Alcohol Consumption in Russia

We can also examine some of the trends in our predictors.

alcohol %>% 
  filter(country_name == "Russian Federation") %>% 
    select(year, consumption,labor_force_participation_rate:manufacturing) %>% 
    tidyr::gather(predictor, value, -year) %>% 
    ggplot(aes(year, value))+
    geom_line()+
        facet_wrap(~predictor, scales = "free")

Trends in Predictors in Russian Federation

One feature of SCM is selecting donor states. These donor states help us to generate our synthetic control (synthetic version of Russian in the absence of the policy change). There are several key assumptions regarding choosing donor states including

No spillover effects (e.g. the policy change in Russia did not cause a huge migration of drinkers to another country, did not change trade patterns, etc)
Donor states do not enact similar policies

In the absence of evidence of other changes in other states, we can look at the potential donor states graphically:

p1 <- alcohol %>% 
    mutate(my_color = ifelse(country_code == "RUS", "Russia", "Other")) %>% 
    ggplot(aes(year, consumption, group = country_code, color = my_color))+
    geom_line() +
    scale_color_manual(values = c("grey", "black", "white"))+
    ylim(5,15)+
    xlim(1990,2005)
p1
#> Warning: Removed 3579 row(s) containing missing values (geom_path).

Review of Possible Donors

When we look at the potential donors we see the usual suspects: France, United States, as well as some of the neighbouring Baltic states. This makes some intuitive sense. Baltic states share similar cultural habits. SCM is insensitive to having donors that are not neighbouring, which will allow us to include countries like Great Britain and the United States in the donor pool.³

Experimental Design

Now that we have an idea of the donors, we can build the pool.

comparison_states <- c("USA", "UK", "UKR", "KAZ",
                                             "GBR", "ESP", "DEU", "POL",
                                             "FIN", "FRA", "GRC", "IRL",
                                             "LTU", "ROU", "GEO", "MDA",
                                             "SWE", "BEL", "BLR", "KGZ",
                                             "CZE", "MEX", "SVN")

control_ids <- alcohol %>% 
    select(country_code,country_num) %>% 
    filter(country_code %in% comparison_states) %>% 
    distinct() %>% 
    pull(country_num)

Now we will use the dataprep function from Synth to format our data.

dataprep.out<-dataprep(
        foo = as.data.frame(alcohol),
        predictors = c("labor_force_participation_rate",
                       "inflation",
                       "mobile_cellular_subscriptions",
                       "manufacturing"),
        predictors.op = "mean",
        dependent = "consumption",
        unit.variable = "country_num",
        time.variable = "year",
        treatment.identifier = 142,
        controls.identifier = control_ids,
        time.predictors.prior = c(1991:2005),
        time.optimize.ssr = c(2000:2005),
        special.predictors = list(
      list("consumption", 2000:2005 ,"mean")),
        unit.names.variable = "country_code",
        time.plot = 1992:2015
    )
#> 
#>  Missing data- treated unit; predictor: inflation ; for period: 1991 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: inflation ; for period: 1992 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1991 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1992 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1993 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1994 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1995 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1996 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1997 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1998 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 1999 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 2000 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data- treated unit; predictor: manufacturing ; for period: 2001 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data - control unit: 16 ; predictor: inflation ; for period: 1991 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.
#> 
#>  Missing data - control unit: 16 ; predictor: inflation ; for period: 1992 
#>  We ignore (na.rm = TRUE) all missing values for predictors.op.

There are some missing values, but that should be ok. Now we will run the synth function to generate our synthetic controls.

out <- synth(dataprep.out, Sigf.ipop = 3)

After running the SCM, we can look at the weights from the donors and see if we see anything that is surprising.

solution <- out$solution.w %>% 
    as.data.frame()

solution$country_num <- rownames(solution)

solution %>% 
    mutate(country_num = as.numeric(country_num)) %>% 
    left_join(alcohol %>% 
    select(country_code,country_num) %>% 
    filter(country_code %in% comparison_states) %>% 
        distinct(), by = "country_num") %>% 
    ggplot(aes(reorder(country_code, w.weight), w.weight))+
    geom_col()+
    coord_flip()+
    labs(
        title = "Donor Weights by Country",
        y = NULL,
        x = "Weight"
    )

Unsurprisingly, we can see that Kazakstan and Lithuania contribute to counterfactual Russia, however, interestingly Great Britain and the Chzech Republic are amongst the top donors to the counterfactual. This highlights one of the powerful features of Synthetic Control - the donors do not need to be geographically connected, Let’s continue with our analysis.

Now we can plot our real Russia vs our Synthetic Russia using the plot.path function from Synth. Here we see that the synthetic control matches 2000-2005 fairly well (which is what we specified to optimize), but there is some opportunity to more closely match actual Russia.

path.plot(synth.res = out, dataprep.res = dataprep.out, 
                    Xlab = "per Capita Alcohol Consumption", )

Synthetic Russia vs Actual Russia for Alcohol Consumption per Capita

We can also look at the delta between our real and synthetic Russia.

delta_out <- (dataprep.out$Y1plot - (dataprep.out$Y0plot %*% out$solution.w)) %>% 
    as.data.frame()

delta_out$year <- rownames(delta_out)

delta_out%>% 
    knitr::kable(caption = "Difference in Alcohol Consumption Between Synthetic and Actual Russia", digits = 2)

Difference in Alcohol Consumption Between Synthetic and Actual Russia
	142	year
1992	-3.28	1992
1993	-1.71	1993
1994	-1.24	1994
1995	1.49	1995
1996	-0.41	1996
1997	-0.60	1997
1998	-0.10	1998
1999	0.77	1999
2000	-0.50	2000
2001	-0.44	2001
2002	-0.06	2002
2003	0.34	2003
2004	0.35	2004
2005	0.53	2005
2006	1.02	2006
2007	1.19	2007
2008	1.27	2008
2009	1.00	2009
2010	0.76	2010
2011	0.86	2011
2012	0.92	2012
2013	0.09	2013
2014	-1.00	2014
2015	-1.46	2015

It appears that the policy took a while to take effect, with a decrease in alcohol consumed per capita beginning in 2014. However, it must be mentioned that this isn’t the “40%” that the Guardian reports and that one may glean from the WHO graph at first glance.

The Next Steps

The next steps in a SCM controls analysis is to permute the data set to understand the sensitivity of our Synthetic controls to assess if the results observed are plausible (e.g. are the results we obtained due from chance, or reflect the true effect). The SCtools provides the generate.placebos function to automate this process. This includes the option to parallelise the operation (which greatly speeds up processing time).

placebo <- generate.placebos(dataprep.out = dataprep.out,
                             synth.out = out, strategy = "multicore")

Now that we have our placebo object, we can represent it graphically with plot_placebos. Here we can see our donors (control unit) and our actual treated group.

plot_placebos(placebo)

Placebo Plot for Control Units

Equally important we can test the Mean Squared Prediction Error (MSPE). Additionally, because we generated the controls for the placebos, we can see how extreme our values are and thus generate a pseudo p-value to see if our results are significant. Now, it should be mentioned that this test is relatively underpowered as it looks only at the donor states.

test_out <- mspe.test(placebo)
test_out$p.val
#> [1] 0.6521739

Why look at single number summary, when you can look at a plot? With mspe.plot we can visualise the ratios of MSPE for each donor. Here we see very little difference in Russia, which supports our higher p-value.

mspe.plot(tdf = placebo)

Mean Squared Prediction Error Ratios for Russia and Donors

Available at https://www.theguardian.com/world/2019/oct/01/russian-alcohol-consumption-down-40-since-2003-who ↩︎
Yes, we all know as good economists that as goods get more expensive, consumers will buy less of them. However, how much less is another question.↩︎
I would recommend using the plotly package which allows you to hover over the different lines and see the country codes.↩︎

SCM Case Study on Alcohol Consumption

Introduction

Implementing Synthetic Controls

Experimental Design

The Next Steps

Conclusion