library(votesmart)
The first step to using the votesmart package is to register an API key and store it in an environment variable by following these instructions.
Let’s make sure our API key is set.
# If our key is not registered in this environment variable,
# the result of `Sys.getenv("VOTESMART_API_KEY")` will be `""` (i.e. a string of `nchar` 0)
key <- Sys.getenv("VOTESMART_API_KEY")
key_exists <- (nchar(key) > 0)
if (!key_exists) knitr::knit_exit()
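If the check above comes back empty, one way to supply a key for the current R session only is base R's `Sys.setenv()`. The sketch below is a minimal illustration: the helper name `has_key` and the placeholder key value are assumptions made for this example and are not part of the votesmart package.

```r
# Hypothetical helper (not part of votesmart):
# TRUE if the environment variable holds a non-empty string
has_key <- function(var = "VOTESMART_API_KEY") {
  nzchar(Sys.getenv(var))
}

# Only set a value if nothing is registered yet.
# "<your-api-key>" is a placeholder; replace it with your real key.
if (!has_key()) {
  Sys.setenv(VOTESMART_API_KEY = "<your-api-key>")
}
```

A value set this way lasts only until the R session ends, so a key registered per the instructions above remains the more durable option.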
We’ll also attach dplyr for working with dataframes.
suppressPackageStartupMessages(library(dplyr))
#> Warning: package 'dplyr' was built under R version 4.0.2
conflicted::conflict_prefer("filter", "dplyr")
#> [conflicted] Will prefer dplyr::filter over any other package
Some of these functions are necessary precursors to obtaining the data you actually want. For instance, to get candidates’ ratings by SIGs, you’ll first need office_level_ids in order to get office_ids, which are a required argument for getting candidate information with candidates_get_by_office_state. We’ll go through a typical example of how you might use the votesmart package.
There are currently three functions for getting data on VoteSmart candidates: candidates_get_by_lastname, candidates_get_by_levenshtein, and candidates_get_by_office_state.
Let’s search for former US House Rep Barney Frank using candidates_get_by_lastname.
From ?candidates_get_by_lastname, this function’s defaults are:
candidates_get_by_lastname(
last_names,
election_years = lubridate::year(lubridate::today()),
stage_ids = "",
all = TRUE,
verbose = TRUE
)
Since the default election year is the current year and Barney Frank left office in 2013, we’ll specify a few years in which he ran for office.
(franks <- candidates_get_by_lastname(
last_names = "frank",
election_years = c(2000, 2004)
)
)
#> Requesting data for {last_name: frank, election_year: 2000, stage_id: }.
#> Requesting data for {last_name: frank, election_year: 2004, stage_id: }.
#> # A tibble: 13 x 32
#> candidate_id first_name nick_name middle_name last_name suffix title
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 12063 A. T. <NA> <NA> Frank <NA> <NA>
#> 2 26897 Barney <NA> <NA> Frank <NA> <NA>
#> 3 54614 Floyd <NA> <NA> Frank <NA> <NA>
#> 4 36663 Jo Anne <NA> <NA> Frank <NA> <NA>
#> 5 1507 Lonnie Da… <NA> <NA> Frank <NA> <NA>
#> 6 54827 Terrence Terry D. Frank <NA> <NA>
#> 7 26897 Barney <NA> <NA> Frank <NA> <NA>
#> 8 50597 Craig <NA> A. Frank <NA> <NA>
#> 9 37152 Deborah <NA> L. Frank <NA> <NA>
#> 10 50318 Douglas <NA> <NA> Frank <NA> <NA>
#> 11 33210 Keith <NA> R. Frank <NA> <NA>
#> 12 1507 Lonnie Da… <NA> <NA> Frank <NA> <NA>
#> 13 51171 William Bill R. Frank <NA> <NA>
#> # … with 25 more variables: ballot_name <chr>, stage_id <chr>,
#> # election_year <chr>, preferred_name <chr>, election_parties <chr>,
#> # election_status <chr>, election_stage <chr>, election_district_id <chr>,
#> # election_district_name <chr>, election_office <chr>,
#> # election_office_id <chr>, election_state_id <chr>,
#> # election_office_type_id <chr>, election_special <lgl>, election_date <chr>,
#> # office_parties <chr>, office_status <chr>, office_district_id <chr>,
#> # office_district_name <chr>, office_state_id <chr>, office_id <chr>,
#> # office_name <chr>, office_type_id <chr>, running_mate_id <chr>,
#> # running_mate_name <chr>
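Because `election_years` accepts a vector (here `c(2000, 2004)`), a longer run of elections can be requested by building that vector programmatically. A minimal sketch in base R, assuming we want every even year from 2000 through 2012 (US House elections fall in even years); the variable names are illustrative, and the API call is left commented out since it needs a registered key.

```r
# Even-numbered years from 2000 to 2012, matching the US House election cycle
house_years <- seq(2000, 2012, by = 2)

# The current year in base R, comparable to the function's lubridate-based default
current_year <- as.integer(format(Sys.Date(), "%Y"))

# With a registered key, the vector could then be passed along (not run here):
# candidates_get_by_lastname(last_names = "frank", election_years = house_years)
```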
Looking at the first_name column, we see that a number of non-Barneys were returned. Next, we can filter our results to Barney.
(barneys <- franks %>%
filter(first_name == "Barney") %>%
select(
candidate_id, first_name, last_name,
election_year, election_state_id, election_office
)
)
#> # A tibble: 2 x 6
#> candidate_id first_name last_name election_year election_state_…
#> <chr> <chr> <chr> <chr> <chr>
#> 1 26897 Barney Frank 2000 MA
#> 2 26897 Barney Frank 2004 MA
#> # … with 1 more variable: election_office <chr>
The two rows returned correspond to the two election_years we specified. Each candidate gets their own unique candidate_id, which we can pull out.
(barney_id <- barneys %>%
pull(candidate_id) %>%
unique()
)
#> [1] "26897"
One of the most powerful things about VoteSmart is its wealth of information about candidates’ positions on issues as rated by a number of Special Interest Groups, or SIGs.
Given a candidate_id, we can ask for those ratings using rating_get_candidate_ratings.
(barney_ratings <- rating_get_candidate_ratings(
candidate_ids = barney_id,
sig_ids = "" # All SIGs
)
)
#> Requesting data for {candidate_id: 26897, sig_id: }.
#> # A tibble: 1,642 x 19
#> rating_id candidate_id sig_id rating rating_name timespan rating_text
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 8661 26897 2419 63 Positions 2013-20… Barney Fra…
#> 2 6598 26897 1985 9 Lifetime P… 2013 Bsed on le…
#> 3 6093 26897 1578 100 Lifetime P… 2012 <NA>
#> 4 6305 26897 2086 0 Positions 2012 <NA>
#> 5 6408 26897 2023 75 Positions 2012 <NA>
#> 6 6481 26897 1084 91 Positions 2012 <NA>
#> 7 6616 26897 2159 75 Positions … 2012 Barney Fra…
#> 8 6642 26897 230 50 Positions 2012 Barney Fra…
#> 9 6725 26897 1734 21 Positions 2012 Barney Fra…
#> 10 6732 26897 329 92 Global Iss… 2012 Barney Fra…
#> # … with 1,632 more rows, and 12 more variables: category_id_1 <chr>,
#> # category_name_1 <chr>, category_id_2 <chr>, category_name_2 <chr>,
#> # category_id_3 <chr>, category_name_3 <chr>, category_id_4 <chr>,
#> # category_name_4 <chr>, category_id_5 <chr>, category_name_5 <chr>,
#> # category_id_6 <chr>, category_name_6 <chr>
There are a lot of columns here because some ratings are tagged with multiple categories.
main_cols <- c("rating", "category_name_1", "sig_id", "timespan")
We’ll filter to Barney’s ratings on the environment using just the first category name.
(barney_on_env <- barney_ratings %>%
filter(category_name_1 == "Environment") %>%
select(main_cols)
)
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(main_cols)` instead of `main_cols` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> # A tibble: 39 x 4
#> rating category_name_1 sig_id timespan
#> <chr> <chr> <chr> <chr>
#> 1 92 Environment 1012 2012
#> 2 89 Environment 1012 2012
#> 3 91 Environment 1012 2011-2012
#> 4 100 Environment 1938 2011-2012
#> 5 88 Environment 1826 2011-2012
#> 6 71 Environment 922 2011-2012
#> 7 94 Environment 1012 2011
#> 8 92 Environment 1012 2011
#> 9 100 Environment 1197 2011
#> 10 96 Environment 1826 2011
#> # … with 29 more rows
Something to be aware of is that some SIGs give ratings as letter grades:
barney_ratings %>%
  filter(
    stringr::str_detect(rating, "[A-Z]")
  ) %>%
  select(rating, category_name_1)
#> # A tibble: 26 x 2
#> rating category_name_1
#> <chr> <chr>
#> 1 F Guns
#> 2 A Foreign Affairs
#> 3 F Social
#> 4 F Guns
#> 5 F- Guns
#> 6 A Foreign Affairs
#> 7 A+ Foreign Affairs
#> 8 F Fiscally Conservative
#> 9 C Foreign Affairs
#> 10 F Immigration
#> # … with 16 more rows
But using just Barney’s number grades, we can get his average rating on this category per timespan:
barney_on_env %>%
  group_by(timespan) %>%
  summarise(
    avg_rating = mean(as.numeric(rating), na.rm = TRUE)
  ) %>%
  arrange(desc(timespan))
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 23 x 2
#> timespan avg_rating
#> <chr> <dbl>
#> 1 2012 90.5
#> 2 2011-2012 87.5
#> 3 2011 95.5
#> 4 2010 86
#> 5 2009-2010 83.5
#> 6 2009 100
#> 7 2008 92
#> 8 2007-2008 88
#> 9 2007 90
#> 10 2006 100
#> # … with 13 more rows
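The averaging above relies on `as.numeric()` turning any non-numeric rating into `NA`, which `na.rm = TRUE` then drops. A minimal sketch of making that split explicit, written in base R so it runs without an API key; the small `ratings` data frame is made up for illustration and only stands in for a few rows of `barney_ratings`.

```r
# Toy data standing in for a few rows of `barney_ratings` (values are made up)
ratings <- data.frame(
  rating = c("92", "F", "100", "A+", "71"),
  timespan = c("2012", "2012", "2011", "2011", "2011"),
  stringsAsFactors = FALSE
)

# TRUE where the rating is a plain number, FALSE for letter grades
is_numeric_rating <- grepl("^[0-9]+(\\.[0-9]+)?$", ratings$rating)

# Keep only the numeric ratings and convert them
numeric_ratings <- ratings[is_numeric_rating, ]
numeric_ratings$rating <- as.numeric(numeric_ratings$rating)

# Average numeric rating per timespan
avg_by_timespan <- aggregate(rating ~ timespan, data = numeric_ratings, FUN = mean)
```

Filtering first avoids coercion warnings and makes it easy to inspect the letter-grade rows separately with `ratings[!is_numeric_rating, ]`.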
Keep in mind that these are ratings given by SIGs, which often have very different baseline stances on issues. For example, a pro-life group might give a candidate a rating of 0 whereas a pro-choice group might give that same candidate a 100.
barney_ratings %>%
  filter(category_name_1 == "Abortion") %>%
  select(
    rating, sig_id, category_name_1
  )
#> # A tibble: 36 x 3
#> rating sig_id category_name_1
#> <chr> <chr> <chr>
#> 1 100 1016 Abortion
#> 2 0 252 Abortion
#> 3 100 1016 Abortion
#> 4 0 252 Abortion
#> 5 0 1195 Abortion
#> 6 100 1016 Abortion
#> 7 0 1086 Abortion
#> 8 0 252 Abortion
#> 9 100 1016 Abortion
#> 10 0 1086 Abortion
#> # … with 26 more rows
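Since ratings from different SIGs sit on opposing scales, averaging across all of them within a category can be misleading. A minimal sketch of summarising per `sig_id` instead, in base R; the `abortion` data frame simply re-types the first five rows of the output shown above, and the variable names are illustrative.

```r
# First five rows of the Abortion output above, re-typed by hand
abortion <- data.frame(
  rating = c("100", "0", "100", "0", "0"),
  sig_id = c("1016", "252", "1016", "252", "1195"),
  stringsAsFactors = FALSE
)

# Mean rating per SIG, so each group's scale stays separate
avg_by_sig <- tapply(as.numeric(abortion$rating), abortion$sig_id, mean)

# For contrast: the pooled mean across all SIGs
pooled_mean <- mean(as.numeric(abortion$rating))
```

Here each SIG is internally consistent (all 100s or all 0s), while the pooled mean of 40 reflects only the mix of groups that happened to issue ratings.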
When it comes to the Special Interest Groups themselves, the result of rating_get_candidate_ratings only supplies us with a sig_id.
We can get more information about these SIGs given these IDs with rating_get_sig.
(some_sigs <- barney_ratings %>%
pull(sig_id) %>%
unique() %>%
sample(3)
)
#> [1] "834" "143" "1764"
rating_get_sig(
sig_ids = some_sigs
)
#> Requesting data for {sig_id: 834}.
#> Requesting data for {sig_id: 143}.
#> Requesting data for {sig_id: 1764}.
#> # A tibble: 3 x 14
#> sig_id name description state_id address city state zip phone_1 phone_2
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 834 Asso… The purpos… <NA> 440 1s… Wash… DC 20001 202-59… <NA>
#> 2 143 Busi… The work o… <NA> 1030 1… Wash… DC 20005 202-29… <NA>
#> 3 1764 PFLA… Parents, F… <NA> 1828 L… Wash… DC 20036 202-46… <NA>
#> # … with 4 more variables: fax <chr>, email <chr>, url <chr>,
#> # contact_name <chr>
Or, if we don’t yet know any sig_ids, we can get a dataframe of them with the function rating_get_sig_list.
That function requires a vector of issue category_ids, however, so let’s first get a vector of some category_ids.
(category_df <- rating_get_categories(
state_ids = NA # NA for national
) %>%
  distinct() %>%
  sample_n(nrow(.)) # Sampling so we can see multiple categories in the 10 rows shown here
)
#> Beginning to get categories for state NA.
#> # A tibble: 40 x 3
#> category_id name state_id
#> <chr> <chr> <chr>
#> 1 40 Immigration <NA>
#> 2 30 Environment <NA>
#> 3 73 Gambling and Gaming <NA>
#> 4 66 Veterans <NA>
#> 5 11 Business and Consumers <NA>
#> 6 53 Senior Citizens <NA>
#> 7 2 Abortion <NA>
#> 8 41 Technology and Communication <NA>
#> 9 25 Drugs <NA>
#> 10 37 Guns <NA>
#> # … with 30 more rows
Now we can get our dataframe of SIGs given some categories.
(some_categories <- category_df$category_id %>% sample(3))
#> [1] "22" "2" "25"
(sigs <- rating_get_sig_list(
category_ids = some_categories,
state_ids = NA
) %>%
  select(sig_id, name, category_id, state_id) %>%
  sample_n(nrow(.))
)
#> Requesting data for {category_id: 22, state_id: NA}.
#> Requesting data for {category_id: 2, state_id: NA}.
#> Requesting data for {category_id: 25, state_id: NA}.
#> # A tibble: 24 x 4
#> sig_id name category_id state_id
#> <chr> <chr> <chr> <chr>
#> 1 1946 Susan B. Anthony List 2 <NA>
#> 2 2368 Family Policy Alliance 2 <NA>
#> 3 1197 Women's Action for New Directions (WAND) 22 <NA>
#> 4 1578 Planned Parenthood Action Fund 2 <NA>
#> 5 1559 Democrats for Life of America 2 <NA>
#> 6 101 Council for a Livable World 22 <NA>
#> 7 1975 National Defense PAC 22 <NA>
#> 8 1957 One Nation PAC 22 <NA>
#> 9 2826 Tobacco Free Kids Action Fund 25 <NA>
#> 10 3020 RootsAction 22 <NA>
#> # … with 14 more rows
We already have the category names corresponding to those category_ids in our category_df, so we can join category_df onto sigs to attach a category_name_1 to each of those SIGs.
sigs %>%
  rename(
    sig_name = name
  ) %>%
  left_join(
    category_df,
    by = c("state_id", "category_id")
  ) %>%
  rename(
    category_name_1 = name
  ) %>%
  sample_n(nrow(.))
#> # A tibble: 24 x 5
#> sig_id sig_name category_id state_id category_name_1
#> <chr> <chr> <chr> <chr> <chr>
#> 1 1957 One Nation PAC 22 <NA> Defense
#> 2 725 Center for Security Policy 22 <NA> Defense
#> 3 1954 Republican National Coalition fo… 2 <NA> Abortion
#> 4 1946 Susan B. Anthony List 2 <NA> Abortion
#> 5 2368 Family Policy Alliance 2 <NA> Abortion
#> 6 3020 RootsAction 22 <NA> Defense
#> 7 1110 Peace Action 22 <NA> Defense
#> 8 1197 Women's Action for New Direction… 22 <NA> Defense
#> 9 1578 Planned Parenthood Action Fund 2 <NA> Abortion
#> 10 2231 Family Research Council (FRC) Ac… 2 <NA> Abortion
#> # … with 14 more rows
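The same lookup can be sketched without an API key on a few hand-typed rows. The example below is a simplified base R illustration, not the join used above: it matches on `category_id` only via `match()`, and the two toy data frames re-type values that appear in the outputs earlier in this walkthrough.

```r
# A few SIGs and their category IDs, re-typed from the output above
sigs_toy <- data.frame(
  sig_id = c("1946", "1197", "2826"),
  sig_name = c(
    "Susan B. Anthony List",
    "Women's Action for New Directions (WAND)",
    "Tobacco Free Kids Action Fund"
  ),
  category_id = c("2", "22", "25"),
  stringsAsFactors = FALSE
)

# Category ID to name lookup, re-typed from the outputs above
category_toy <- data.frame(
  category_id = c("2", "22", "25"),
  name = c("Abortion", "Defense", "Drugs"),
  stringsAsFactors = FALSE
)

# For each SIG, find the row of its category and pull the name across
idx <- match(sigs_toy$category_id, category_toy$category_id)
sigs_toy$category_name_1 <- category_toy$name[idx]
```

Any `category_id` with no match in the lookup table would get `NA` as its name, which mirrors how a left join keeps unmatched rows.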
For more info or to report a bug to VoteSmart, please refer to the VoteSmart API docs!