The goal of headliner is to help analysts translate facts into insights. In the comparison below (source),
both dashboards use the same underlying data, but they present the
information to the user very differently.
Right now, there isn’t anything out of the box to help users
dynamically create phrasing like that used in the “insights” version without
a bit of coding gymnastics. The many ways you could approach it, combined
with the steps required to say “if positive, show it like this; if
negative, show it like that”, increase the technical debt this kind of
code can add to a project. For this reason, headliner is
designed to deliver the building blocks required to create these phrases
for plot titles, value boxes in shiny, or section headers in a report.
You can install the dev version of headliner from GitHub with:
devtools::install_github("rjake/headliner")
For these examples, I will use a function called demo_data()
to build a data set based on the current date
(this was last run on 06/26/22).
library(headliner)
library(dplyr)
demo_data()
#> # A tibble: 10 × 5
#> group sales count on_sale date
#> <chr> <dbl> <dbl> <dbl> <date>
#> 1 a 101 35 1 2022-06-26
#> 2 a 102 34 0 2022-04-26
#> 3 b 103 33 1 2022-02-26
#> 4 b 104 32 0 2021-12-26
#> 5 c 105 31 1 2021-10-26
#> 6 c 106 30 0 2021-08-26
#> 7 d 107 29 1 2021-06-26
#> 8 d 108 28 0 2021-04-26
#> 9 e 109 27 1 2021-02-26
#> 10 e 110 26 0 2020-12-26
What we want is to say something like this:
#> We have seen a 5.6% decrease compared to the same time last year (101 vs. 107).
We can look at the data and see that about 12 months ago,
sales was 107, whereas today it is 101. We can give these
values to headline() and get a simple phrase:
headline(
  x = 101,
  y = 107
)
#> decrease of 6 (101 vs. 107)
To see how the sentence was constructed, we can look at the
components used under the hood. Setting return_data = TRUE
returns a named list, which I will condense with view_list():
headline(101, 107, return_data = TRUE) |>
view_list()
#> value
#> headline decrease of 6 (101 vs. 107)
#> x 101
#> y 107
#> delta 6
#> delta_p 5.6
#> article_delta a 6
#> article_delta_p a 5.6
#> raw_delta -6
#> raw_delta_p -5.6
#> article_raw_delta a -6
#> article_raw_delta_p a -5.6
#> sign -1
#> orig_values 101 vs. 107
#> trend decrease
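As a rough illustration of how these components relate to one another, the arithmetic can be sketched in base R. This is not headliner's internal code; it assumes the percentage is taken relative to y and rounded to one decimal place, which is consistent with the values listed above.

```r
# Base R sketch only; variable names mirror the components listed above
x <- 101
y <- 107

raw_delta   <- x - y                          # signed difference
raw_delta_p <- round(raw_delta / y * 100, 1)  # assumption: percent relative to y
delta       <- abs(raw_delta)                 # unsigned difference
delta_p     <- abs(raw_delta_p)               # unsigned percent difference
sign_xy     <- sign(raw_delta)                # -1, 0 or 1
trend       <- if (sign_xy > 0) "increase" else if (sign_xy < 0) "decrease" else "no difference"
orig_values <- paste(x, "vs.", y)

paste0(trend, " of ", delta, " (", orig_values, ")")
#> [1] "decrease of 6 (101 vs. 107)"
```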
We can compose it like this using glue::glue() syntax:
headline(
  x = 101,
  y = 107,
  headline = "We have seen {article_delta_p}% {trend} compared to the same time last year ({orig_values})."
)
#> We have seen a 5.6% decrease compared to the same time last year (101 vs. 107).
You might have noticed that there are multiple article_*
components available. article_delta is for the difference
between the two values (“a 6 person loss” vs
“an 8 person loss”), and article_delta_p is
for the percentage difference (“a 5.6%” vs
“an 8.6%”). You can also add articles to words using
add_article(). For example,
add_article("increase") gives us “an
increase” whereas add_article("decrease") gives us “a
decrease”.
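To make the “a” vs “an” choice concrete, here is a simplified base R sketch. pick_article() is a hypothetical helper written for this illustration, not a headliner function, and its rule (a leading vowel letter, or a number read aloud as “eight…”, “eleven” or “eighteen”) ignores exceptions such as “hour” or “user”.

```r
# Hypothetical helper for illustration; add_article() may use different rules
pick_article <- function(x) {
  txt <- as.character(x)
  starts_with_vowel <- grepl("^[aeiou]", txt, ignore.case = TRUE)
  # numbers starting with 8, or 11 / 18 not followed by another digit
  sounds_like_vowel <- grepl("^(8|(11|18)([^0-9]|$))", txt)
  paste(ifelse(starts_with_vowel | sounds_like_vowel, "an", "a"), txt)
}

pick_article(c(6, 8, 5.6, 8.6))
#> [1] "a 6"    "an 8"   "a 5.6"  "an 8.6"
pick_article(c("increase", "decrease"))
#> [1] "an increase" "a decrease"
```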
But let’s see if we can make the calculations more dynamic…
First, we can use a function called add_date_columns()
to calculate distances from the current date (or the reference date
specified) to the values in the date column. With these
new fields we can see that 04/26/22 was 61 days ago (or 8 weeks or 2
months, …) from the current date.
demo_data() |>
add_date_columns(date_col = date)
#> # A tibble: 10 × 11
#> group sales count on_sale date day week month quarter calendar_year
#> <chr> <dbl> <dbl> <dbl> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 a 101 35 1 2022-06-26 0 0 0 0 0
#> 2 a 102 34 0 2022-04-26 -61 -8 -2 0 0
#> 3 b 103 33 1 2022-02-26 -120 -17 -4 -1 0
#> 4 b 104 32 0 2021-12-26 -182 -26 -6 -2 -1
#> 5 c 105 31 1 2021-10-26 -243 -34 -8 -2 -1
#> 6 c 106 30 0 2021-08-26 -304 -43 -10 -3 -1
#> 7 d 107 29 1 2021-06-26 -365 -52 -12 -4 -1
#> 8 d 108 28 0 2021-04-26 -426 -60 -14 -4 -1
#> 9 e 109 27 1 2021-02-26 -485 -69 -16 -5 -1
#> 10 e 110 26 0 2020-12-26 -547 -78 -18 -6 -2
#> # … with 1 more variable: fiscal_year <dbl>
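For intuition, the day and month offsets in this table can be reproduced with a few lines of base R. This is only a sketch of the idea and may differ from how add_date_columns() is implemented; the month_index() helper is hypothetical, and the reference date is fixed to the run date mentioned above.

```r
# Sketch: offsets of each date from a fixed reference date
ref_date <- as.Date("2022-06-26")
dates    <- as.Date(c("2022-06-26", "2022-04-26", "2021-06-26"))

# difference in whole days (negative means in the past)
day_offset <- as.numeric(dates - ref_date)

# hypothetical helper: count calendar months since year 0
month_index <- function(d) {
  as.integer(format(d, "%Y")) * 12L + as.integer(format(d, "%m"))
}
month_offset <- month_index(dates) - month_index(ref_date)

data.frame(date = dates, day = day_offset, month = month_offset)
```

The three rows correspond to rows 1, 2 and 7 of the table above (0, -61 and -365 days; 0, -2 and -12 months).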
We can then identify some conditions for our group of interest
(x) and our reference group (y). This step
uses the kind of logic you would use in dplyr::filter() or
base::subset(), plus the logic used by dplyr::across().
yoy <- # year over year
  demo_data() |>
  add_date_columns(date) |>
  compare_conditions(
    x = (month == 0),   # this month
    y = (month == -12), # vs 12 months ago
    .cols = sales,      # the column(s) to aggregate
    .fns = lst(mean)    # the list of functions passed to summarise(across(...))
  )

yoy
#> # A tibble: 1 × 2
#> mean_sales_x mean_sales_y
#> <dbl> <dbl>
#> 1 101 107
The argument lst(mean) is equivalent to writing
list(mean = mean). The name (left side) determines how the column
is named; the right side is the function to use. If I had used
.fns = list(avg = mean), the names would have been
avg_sales_x and avg_sales_y, with
mean() used as the calculation. Because
compare_conditions() uses the mean as the default, I’ll
omit it going forward.
You may want to do other steps with the data frame, but if you want to
go right into a headline, you can use headline_list().
Another good option for adding headlines to data frames is
add_headline_column() (see next section).
yoy |>
  headline_list(
    headline = "We have seen {article_delta_p}% {trend} compared to the same time last year ({orig_values})."
  )
#> We have seen a 5.6% decrease compared to the same time last year (101 vs. 107).
If your result has more than 2 values, you can specify the values you need by calling their names:
car_stats <-
  mtcars |>
  compare_conditions(
    x = cyl == 4,
    y = cyl > 4,
    .cols = starts_with("d"),
    .fns = list(avg = mean, min = min)
  )

view_list(car_stats)
#> value
#> avg_disp_x 105.136364
#> avg_disp_y 296.504762
#> avg_drat_x 4.070909
#> avg_drat_y 3.348095
#> min_disp_x 71.100000
#> min_disp_y 145.000000
#> min_drat_x 3.690000
#> min_drat_y 2.760000
car_stats |>
  headline_list(
    x = avg_disp_x,
    y = avg_disp_y,
    headline = "Difference in avg. displacement of {delta}cu.in. ({orig_values})"
  )
#> Difference in avg. displacement of 191.4cu.in. (105.1 vs. 296.5)

car_stats |>
  headline_list(
    x = avg_drat_x,
    y = avg_drat_y,
    headline = "Difference in avg. rear axle ratio of {delta} ({orig_values})"
  )
#> Difference in avg. rear axle ratio of 0.7 (4.1 vs. 3.3)
compare_conditions() can also be used to compare
categorical criteria.
pixar_films |>
  compare_conditions(
    rating == "G",
    rating == "PG",
    .cols = rotten_tomatoes
  ) |>
  headline_list(
    headline =
      "Rotten Tomatoes has an avg. rating of {x} for G-rated films and {y} for PG-rated films \\
      ({delta} points {trend})",
    trend_phrases = trend_terms(more = "higher", less = "lower"),
    n_decimal = 0
  )
#> Rotten Tomatoes has an avg. rating of 87 for G-rated films and 91 for PG-rated films (4 points lower)
demo_data() |>
  compare_conditions(
    x = group == "a",
    y = group == "c",
    .cols = c(sales),
    .fns = sum
  ) |>
  headline_list(
    headline = "Group A is ${delta} {trend} Group C (${x} vs ${y})",
    trend_phrases = trend_terms(more = "ahead", less = "behind")
  )
#> Group A is $8 behind Group C ($203 vs $211)
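To show what this comparison amounts to, here is a hand-rolled base R version of the same calculation. It is a sketch of the “coding gymnastics” that headliner is meant to replace, not how the package works internally; the data frame simply re-creates the group and sales columns printed by demo_data() above.

```r
# Re-create the two relevant columns of demo_data() by hand
df <- data.frame(
  group = rep(c("a", "b", "c", "d", "e"), each = 2),
  sales = 101:110
)

# aggregate each condition separately
sum_x <- sum(df$sales[df$group == "a"])
sum_y <- sum(df$sales[df$group == "c"])

# derive the talking points manually
delta <- abs(sum_x - sum_y)
trend <- if (sum_x > sum_y) "ahead" else "behind"  # ties are not handled here

sprintf("Group A is $%d %s Group C ($%d vs $%d)", delta, trend, sum_x, sum_y)
#> [1] "Group A is $8 behind Group C ($203 vs $211)"
```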
You can also use add_headline_column() to append a
column to your data frame with headlines describing each row. You can
reference existing columns in the headline, and you can bring back
specific talking points using return_cols =. You can use
this to find the most interesting phrases.
pixar_films |>
  select(film, rotten_tomatoes, metacritic) |>
  add_headline_column(
    x = rotten_tomatoes,
    y = metacritic,
    headline = "{film} had a difference of {delta} points",
    return_cols = c(delta)
  ) |>
  arrange(desc(delta))
#> # A tibble: 22 × 5
#> film rotten_tomatoes metacritic headline delta
#> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 Onward 88 61 Onward had a difference… 27
#> 2 Monsters, Inc. 96 79 Monsters, Inc. had a di… 17
#> 3 Cars 2 40 57 Cars 2 had a difference… 17
#> 4 Finding Dory 94 77 Finding Dory had a diff… 17
#> 5 Coco 97 81 Coco had a difference o… 16
#> 6 A Bug's Life 92 77 A Bug's Life had a diff… 15
#> 7 Monsters University 80 65 Monsters University had… 15
#> 8 Incredibles 2 93 80 Incredibles 2 had a dif… 13
#> 9 Toy Story 4 97 84 Toy Story 4 had a diffe… 13
#> 10 Toy Story 2 100 88 Toy Story 2 had a diffe… 12
#> # … with 12 more rows
You can add phrases to customize your sentences.
The plural_phrases argument allows you to add new variables to the
list of components available, each one defined with plural_phrasing().
Here I am adding {people} for use in my headline.
headline(
  x = 9,
  y = 10,
  headline = "{delta_p}% {trend} ({delta} {people})",
  plural_phrases = list(
    people = plural_phrasing(single = "person", multi = "people")
  )
)
#> 10% decrease (1 person)
You can actually pass multiple trend_terms() and
plural_phrasing() options.
# lists to use
more_less <-
  list(
    an_increase = trend_terms("an increase", "a decrease"),
    more = trend_terms(more = "more", less = "less")
  )

are_people <-
  list(
    are = plural_phrasing(single = "is", multi = "are"),
    people = plural_phrasing(single = "person", multi = "people")
  )
# notice the difference in these two outputs
headline(
  x = 25,
  y = 23,
  headline = "There {are} {delta} {more} {people} ({an_increase} of {delta_p}%)",
  trend_phrases = more_less,
  plural_phrases = are_people
)
#> There are 2 more people (an increase of 8.7%)

headline(
  x = 25,
  y = 26,
  headline = "There {are} {delta} {more} {people} ({an_increase} of {delta_p}%)",
  trend_phrases = more_less,
  plural_phrases = are_people
)
#> There is 1 less person (a decrease of 3.8%)
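For comparison, the same two sentences can be produced by hand in base R. This sketch only illustrates the branching that trend_terms() and plural_phrasing() spare you from writing; describe_change() is a hypothetical helper, it assumes the percentage is relative to y, and it does not handle the case where x equals y.

```r
# Hypothetical helper: every phrase needs its own if/else branch
describe_change <- function(x, y) {
  delta     <- abs(x - y)
  delta_p   <- round(delta / y * 100, 1)  # assumption: percent relative to y
  increased <- x > y

  are         <- if (delta == 1) "is" else "are"
  people      <- if (delta == 1) "person" else "people"
  more        <- if (increased) "more" else "less"
  an_increase <- if (increased) "an increase" else "a decrease"

  paste0(
    "There ", are, " ", delta, " ", more, " ", people,
    " (", an_increase, " of ", delta_p, "%)"
  )
}

describe_change(25, 23)
#> [1] "There are 2 more people (an increase of 8.7%)"
describe_change(25, 26)
#> [1] "There is 1 less person (a decrease of 3.8%)"
```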
You can also adjust the text shown when the two numbers are the same:
headline(3, 3)
#> There was no difference
headline(3, 3, if_match = "There were no additional applicants ({x} total)")
#> There were no additional applicants (3 total)
valueBox()
Here’s an example that uses headliner to create a
valueBox() in shiny with dynamic colors and text.
show <- compare_values(101, 107)
box_color <- ifelse(show$sign == -1, "red", "blue")

valueBox(
  value =
    headline(
      show$x,
      show$y,
      headline = "{delta_p}% {trend}"
    ),
  subtitle = "vs. the same time last year",
  color = box_color
)