This vignette goes through the basics of getting started with
adobeanalyticsr
.
The basic setup is documented on the home page of adobeanalyticsr.com, which reviews the details for:
AW_CLIENT_ID
and AW_CLIENT_SECRET
environment variables (the recommended mechanism for this is through the
.Renviron
file). If these environment variables are not
available, then the client_id
and
client_secret
arguments in aw_token()
need to
be populated with the client ID and secret.AW_COMPANY_ID
and AW_REPORTSUITE_ID
environment variables (again, this is recommended to be done via a
.Renviron
file).Many adobeanalyticsr
functions require a company ID (the
Adobe account being accessed) and a report suite ID (the specific report
suite to pull data or information from) to run. As such, these are
arguments (company_id
and rsid
) within those
functions that default to the values for the AW_COMPANY_ID
and AW_REPORTSUITE_ID
environment variables.
This approach has some important ramifications:
AW_COMPANY_ID
and
AW_REPORTSUITE_ID
environment variables means you will
not have to set them on every function call, which can make for
shorter and cleaner code.company_id
or rsid
from what is specified as
an environment variable, all you need to do is call them explicitly
within the function calls. The values you specify for either of these
(or both of them) directly in a function call will take precedence over
any environment variables that are set up (this is core R behavior:
function arguments often have default values built into them, but any
value specified in the code will be used in place of those defaults;
this is not anything unique to adobeanalyticsr
).AW_COMPANY_ID
and
AW_REPORTSUITE_ID
environment variables created
and you do not specify values for them in the
adobeanalyticsr
function calls, then the functions will
fail.In this vignette, both AW_COMPANY_ID
and
AW_REPORTSUITE_ID
environment variables have been set up
(but not shown).
Once the package is loaded, authenticate with
aw_token()
. If you have not authenticated ever, or if you
have not authenticated within the last 24 hours, then a browser should
open requiring you to log in to your Adobe account, and it will then
redirect you to a web page that displays a lengthy token that you can
copy and paste back into an Enter your authorization
code: prompt in the R console.
library(dplyr) # Light cleanup of output
library(knitr) # Cleaner data tables
library(adobeanalyticsr)
aw_token()
The token will look something like this (with different letters and numbers and characters):
eyJ4NXUiOiJpbXNfbmExLWtleS0xLmNlciIsImFsZyI6IlJTMjU2In0.eyJpZCI6IjE2MTI3MTEyMjM5MzNfNGE2ZmVlM2-tYzdkOS00M2QyLWI5ODUtYzAzMDk2NDA2MGFjX3VlMSIsImNsaWVudF9pZCI6IjA2MTU0ZDgzMjl-OTQxZjViMGUzM2EwZGM2M2U3NmQxIiwidXNlcl9pZCI6IkY2MUJBOTU0NUE3QjMyM0QwQTQ5NUQ2Q0BBZG9iZUlEI-wic3RhdGUiOiIiLCJKeXBlIjoiYXV0aG9yaXphdGlvbl9jb2RlIiwiYXMiOiJpbXMtbmExIiwiZmciO-JWRlFSUkhCTVhMTzU2SDRDRTZSTFJZZUE0TT09PT09PSIsInNpZCI6IjE2MTI3MTEyMjM5MzNfMWE2ZDc4ZTUtNT-zZi00OWFkLWFmZTktYTk0MTJkZmYwMzA0X3VlMSIsIm90byI3InRydWUiLCJleHBpcmVzX2luIjoiMT-wMDAwMCIsImNyZWF0ZWRfYXQiOiIxNjEyNzExMjIzOTMzIiwic2NvcGUiOiJvcGVuaWQsQWRvYmVJRCxyZWFkX-9yZ2FuaXphdGlvbnMsYWRkaXRpb25hbF9pbmZvLnByb2plY3RlZFByb2R1Y3RDb250ZXh0LGFkZGl0aW9uY-xfaW5mby5qb2JfZnVuY3Rpb24ifQ.EvoihD7IPXkew_yFQuzjZGfiU_Q8-vlUkmytwfB_Y46DKePINPn7URq8bit0dXoO-tUdMUKVNOHEBbqww6ydDGBPHSCbOH1GgNsILoN96tjHTtA-jDxN8jrAnQ0PuCssA-GePnqryxCxOH9WyZIl2fog00ib8iZ3ZFAJLDrvthWwWHUw-ivu-K-F3UAtU7A4ma_7pe07D1rW5MhTcZOSr0pri68bjFA86cJqH5DHyMdp4F2d7QgZcYPMdrvVfXTwWXv9s5r6huvDcqv6nny-WOKZbKmoP6zwMzn5xa343wrQ5oXFbRxYem-tC154_dc7ekrC8YUX0pY5up9a-OUy0w
Confirm that the authorization worked by running
get_me()
, which will return two things if the authorization
was successful:
Your data is now available!
globalCompanyId
)
and company name (companyName
) for all of the companies
(accounts) to which you have access based on the login you used when you
called aw_token()
.Because the Adobe Analytics API works with the variable and segment IDs rather than the plain English names of dimensions, metrics, and segments, it is often useful, at least in the initial development of a project, to create data frames that contain all of the available values for these three types of variables.
<- aw_get_dimensions()
dims_df #> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Request failed [401]. Retrying in 1 seconds...
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Request failed [401]. Retrying in 1 seconds...
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Error in aw_call_api(req_path = urlstructure, debug = debug, company_id = company_id): Unauthorized (HTTP 401).
head(dims_df, 10) %>% select(id, name, type, category) %>% kable()
id | name | type | category |
---|---|---|---|
averagepagetime | Time Spent on Page - Bucketed | ordered-enum | Metrics |
browser | Browser | string | Audience |
browserheight | Browser Height - Granular | int | Audience |
browserheightbucketed | Browser Height - Bucketed | ordered-enum | Audience |
browsertype | Browser Type | enum | Audience |
browserwidth | Browser Width - Granular | int | Audience |
browserwidthbucketed | Browser Width - Bucketed | ordered-enum | Audience |
campaign | Tracking Code | string | Traffic Sources |
campaign.1 | Campaign Source | string | Traffic Sources |
campaign.2 | Campaign Medium | string | Traffic Sources |
This data frame includes:
id
values are
[classified variable].[num]
and they have a
non-NA
value for parent
(the
parent
column–not shown above–has the name of the
classified variable)The data frame can then be searched and filtered for specific dimensions for use in subsequent data calls.
<- aw_get_metrics()
metrics_df head(metrics_df, 10) %>% select(id, name, type, category) %>% kable()
id | name | type | category |
---|---|---|---|
averagepagedepth | Average Page Depth | int | Traffic |
averagetimespentonpage | Average Time Spent on Page (seconds) | int | Traffic |
averagetimespentonsite | Average Time Spent on Site (seconds) | int | Traffic |
averagevisitdepth | Average Visit Depth | int | Traffic |
bouncerate | Bounce Rate | percent | Traffic |
bounces | Bounces | int | Traffic |
campaigninstances | Campaign Click-throughs | int | Traffic Sources |
cartadditions | Cart Additions | int | Conversion |
cartremovals | Cart Removals | int | Conversion |
carts | Carts | int | Conversion |
This data frame includes:
aw_get_metrics()
does not return calculated
metrics, so those require a separate function call.
<- aw_get_calculatedmetrics()
calc_metrics_df head(calc_metrics_df, 10) %>% select(id, name, type) %>% kable()
id | name | type |
---|---|---|
cm300003965_557fc577e4b07e827d177ad0 | Crash Rate (Mobile) | percent |
cm300003965_557fc577e4b07e827d177ad3 | Average Session Length (Mobile) | time |
cm300003965_557fc578e4b0094eea4f5201 | Single Access (Calculated) | decimal |
cm300003965_557fc578e4b0094eea4f5204 | Crash Rate (Mobile) | percent |
cm300003965_557fc578e4b0416196926008 | Average Session Length (Mobile) | time |
cm300003965_557fc656e4b0adadba08f337 | Avg. Order Value | currency |
cm300003965_557fc656e4b0094eea4f58b6 | Average Order Value | decimal |
cm300003965_557fc656e4b0094eea4f58b8 | Bounce Rate | decimal |
cm300003965_557fc790e4b0094eea4f6222 | Crash Rate (Mobile) | percent |
cm300003965_557fc790e4b0416196926fac | Average Session Length (Mobile) | time |
Calculated metric IDs start with cm
and are assigned by
Adobe when the calculated metric is created.
These two metrics data frames can be searched and filtered for specific metrics for use in subsequent data calls.
The default limit for the number of segments returned by the
aw_get_segments()
function is 10, so, depending on the
number of segments you have, the limit value can be increased to return
a complete list of segments.
<- aw_get_segments(limit = 1000)
segments_df names(segments_df)
#> [1] "name" "description" "id" "owner" "migratedIds" "rsid"
Similar to the dimensions and metrics, aw_get_segments()
returns a data frame of available segments that can be searched and
filtered to identify the specific id
values to use in
subsequent data calls.
Just as in Analysis Workspace, the freeform table is the workhorse of
adobeanalyticsr
, and, as such, is the focus of the rest of
this vignette.
The following is a breakdown of visits by device category for the
last 30 full days for the report suite that is specified in the
AW_REPORTSUITE_ID
environment variable.
# Specify a start date and end date. These can be specified as Date
# objects or as string objects in YYYY-MM-DD format.
<- Sys.Date() - 30
start_date <- Sys.Date() - 1
end_date
<- aw_freeform_table(date_range = c(start_date, end_date),
df dimensions = "mobiledevicetype",
metrics = "visits")
#> A total of 3 rows have been pulled.
# Output the results as a formatted table
%>% kable() df
mobiledevicetype | visits |
---|---|
Other | 5546 |
Mobile Phone | 1509 |
Tablet | 28 |
Note that the top
argument for
aw_freeform_table()
defaults to 5
, so only the
top 5 values will be shown unless that argument is passed a higher
value.
Getting multiple metrics is simply a matter of passing a vector to
the metrics
argument rather than a single string:
<- aw_freeform_table(date_range = c(start_date, end_date),
df dimensions = "mobiledevicetype",
metrics = c("visits", "pageviews", "visitors"))
#> A total of 3 rows have been pulled.
# Output the results as a formatted table
%>% kable() df
mobiledevicetype | visits | pageviews | visitors |
---|---|---|---|
Other | 5546 | 10055 | 3829 |
Mobile Phone | 1509 | 2414 | 1216 |
Tablet | 28 | 95 | 23 |
The example above used standard metrics, but custom metrics (e.g., “event10”) and calculated metrics (e.g., “cm300003965_557fc578e4b0094efa4f5204”) can also be included in the vector of metrics as needed.
The default for the function is to return the API field names, but
the “pretty names” can be returned instead by setting the
prettynames
argument to TRUE
:
<- aw_freeform_table(date_range = c(start_date, end_date),
df dimensions = "mobiledevicetype",
metrics = c("visits", "pageviews", "visitors"),
prettynames = TRUE)
#> A total of 3 rows have been pulled.
# Output the results as a formatted table
%>% kable() df
Mobile Device Type | Visits | Page Views | Unique Visitors |
---|---|---|---|
Other | 5546 | 10055 | 3829 |
Mobile Phone | 1509 | 2414 | 1216 |
Tablet | 28 | 95 | 23 |
While the pretty names are, indeed, “prettier,” they can add
downstream complexity to the code. And, since custom metrics and
calculated metrics can have their names changed at any point, if
subsequent code references columns using these pretty names, the code is
subject to break in the future if the metric(s) it references gets
renamed. So, it is generally considered a best practice to work with the
API field names (prettynames = FALSE
) and then only convert
to more readable names at the point of the data being presented to the
end user.
What is time? This could be a deeply philosophical question, but, instead, we’ll treat it as a pragmatic one: time is a dimension. It’s just a dimension that gets some special treatment in this package.
The first thing to note is that the id
values for time
values are all prepended with daterange
.
%>% filter(grepl("^daterange.*", id)) %>%
dims_df select(id, name) %>%
kable()
id | name |
---|---|
daterangeday | Day |
daterangehour | Hour |
daterangeminute | Minute |
daterangemonth | Month |
daterangequarter | Quarter |
daterangeweek | Week |
daterangeyear | Year |
To get trended data for one or more metrics, simply use the
appropriate date dimension as a dimension. The first “special” thing
that happens here is that you don’t need to worry about setting the
top
argument to include all of the values. The package will
assume that you want to include all of the date values in the range and
will do this automatically.
What will not (yet) be done automatically is for the package
to return the results ordered by the date value (it will sort them by
the first metric or whatever is specified in the metricSort
argument), so we’ll need to do that sorting with the results:
<- aw_freeform_table(date_range = c(start_date, end_date),
df dimensions = "daterangeday",
metrics = "visits") %>%
arrange(daterangeday)
#> A total of 30 rows have been pulled.
# Output the results as a formatted table
%>% kable() df
daterangeday | visits |
---|---|
2021-01-23 | 66 |
2021-01-24 | 67 |
2021-01-25 | 268 |
2021-01-26 | 359 |
2021-01-27 | 331 |
2021-01-28 | 291 |
2021-01-29 | 257 |
2021-01-30 | 81 |
2021-01-31 | 99 |
2021-02-01 | 286 |
2021-02-02 | 336 |
2021-02-03 | 396 |
2021-02-04 | 334 |
2021-02-05 | 329 |
2021-02-06 | 85 |
2021-02-07 | 119 |
2021-02-08 | 372 |
2021-02-09 | 557 |
2021-02-10 | 352 |
2021-02-11 | 337 |
2021-02-12 | 244 |
2021-02-13 | 85 |
2021-02-14 | 106 |
2021-02-15 | 193 |
2021-02-16 | 260 |
2021-02-17 | 214 |
2021-02-18 | 307 |
2021-02-19 | 207 |
2021-02-20 | 65 |
2021-02-21 | 84 |
To include a non-date dimension and specify a top
value
for that dimension while also returning all of the date values, use
0
in the position of the date dimension when specifying the
top
argument.
The following code breaks down each day by Mobile Device Type, but only includes the top 2 values for each breakdown:
<- aw_freeform_table(date_range = c(start_date, end_date),
df dimensions = c("daterangeday", "mobiledevicetype"),
metrics = "visits",
top = c(0, 2)) %>%
arrange(daterangeday)
#> Estimated runtime: 24.8sec./0.41min.
#> 1 of 31 possible data requests complete. Starting the next 30 requests.
#> Request failed [429]. Retrying in 6 seconds...
#> A total of 60 rows have been pulled.
# Output the first few rows as a formatted table
head(df, 10) %>% kable()
daterangeday | mobiledevicetype | visits |
---|---|---|
2021-01-23 | Other | 38 |
2021-01-23 | Mobile Phone | 28 |
2021-01-24 | Other | 41 |
2021-01-24 | Mobile Phone | 25 |
2021-01-25 | Other | 236 |
2021-01-25 | Mobile Phone | 31 |
2021-01-26 | Other | 280 |
2021-01-26 | Mobile Phone | 79 |
2021-01-27 | Other | 266 |
2021-01-27 | Mobile Phone | 62 |
If the daterange...
dimension is not the first dimension
(and, for speed reasons, it may make sense to not have it there, as is
discussed in the next section), the 0
can simply be
re-located:
<- aw_freeform_table(date_range = c(start_date, end_date),
df dimensions = c("mobiledevicetype", "daterangeday"),
metrics = "visits",
top = c(2, 0)) %>%
arrange(mobiledevicetype, daterangeday)
#> Estimated runtime: 2.4sec./0.04min.
#> 1 of 3 possible data requests complete. Starting the next 2 requests.
#> A total of 60 rows have been pulled.
# Output the first few rows as a formatted table
head(df) %>% kable()
mobiledevicetype | daterangeday | visits |
---|---|---|
Mobile Phone | 2021-01-23 | 28 |
Mobile Phone | 2021-01-24 | 25 |
Mobile Phone | 2021-01-25 | 31 |
Mobile Phone | 2021-01-26 | 79 |
Mobile Phone | 2021-01-27 | 62 |
Mobile Phone | 2021-01-28 | 60 |
Working with multiple dimensions is much easier once you understand two fundamental aspects of the 2.0 API, which may seem contradictory at first:
The trick to reconciling these two statements is that any “single dimension” call can be filtered by an unlimited number of other dimensions.
A thought experiment to explain how this works is to imagine that you are in Analysis Workspace (or, actually go into Analysis Workspace and try this directly to make it a real experiment rather than an experiment of the mind):
Each drag-and-drop with your mouse triggers a new API call. To build a freeform table that has Marketing Channel broken down by Mobile Device Type might look something like the following:
At this point, you’re cursing the constraint of not being able to use
the <Shift>
key! But, each of those steps is actually
an API call that Analysis Worskpace is making to the 2.0 API:
These API calls all happen relatively quickly (wildly faster than API calls using the older v1.4 API), but they still all have to happen, and they happen one after another (serially rather than in parallel).
To push this experiment one step farther, think about what you would have to do if you wanted to drill down to a third dimension: Entry Page. You would have to repeatedly drag the Entry Page dimension onto each of:
For the first call above, the API call for the single dimension Entry Page filtered to only include results where Marketing Channel = Direct AND Mobile Device Type = Mobile Phone.
To build a freeform table in Analysis Workspace that has three
dimensions fully broken down would require dozens of drag-and-drop
actions! It’s possible that that tedium is one of the reasons that
you’re looking to adobeanalyticsr
in the first place. As
well you should! aw_freeform_table()
handles these multiple
API calls for you!
On the one hand, all of the exposition above may seem like overkill,
because all you have to do with aw_freeform_table()
to pull
multiple dimensions is to…pass a vector of dimensions to the
dimensions
argument!
Where things get a little trickier is when it comes to specifying how many rows to include for each dimension level, and, more importantly, for constructing the function calls so that they run as quickly as possible. To illustrate, we’ll explore a scenario where we want to get Visits broken down by Mobile Device Type and Browser.
First, let’s query each of the dimensions independently to see how many unique dimension values we’re dealing with. This isn’t something that is required, but we’re going to do a little math to illustrate the differences that dimension order can make.
<- aw_freeform_table(date_range = c(start_date, end_date),
device_types dimensions = "mobiledevicetype",
metrics = "visits",
# The default of 5 is probably going to get all of them,
# but set a higher cutoff just in case.
top = 10)
#> A total of 3 rows have been pulled.
<- aw_freeform_table(date_range = c(start_date, end_date),
browsers dimensions = "browser",
metrics = "visits",
# We want to get all of the browsers, so set top as a high
# value rather than the default of "5"
top = 1000)
#> A total of 113 rows have been pulled.
When making our actual call to get both dimensions at once, we have
two options for the dimensions
argument value:
dimensions = c("mobiledevicetype", "browser")
dimensions = c("browser", "mobiledevicetype")
Assuming we set top
appropriately to include all
possible values, we should get the same data, ultimately. But, the
number of API calls required behind the scenes and, therefore, the time
it will take for the function to run, will vary quite a bit between
these two!
For
dimensions = c("mobiledevicetype", "browser")
,
the API calls will be:
mobiledevicetype
values.mobiledevicetype
values to get each value broken down by browser
.This will result in a total of 4 API calls.
Let’s try it out, including logging how long the function takes to run.
<- Sys.time()
start_time
<- aw_freeform_table(date_range = c(start_date, end_date),
df dimensions = c("mobiledevicetype", "browser"),
metrics = "visits",
top = c(10, 1000))
#> Estimated runtime: 8.8sec./0.15min.
#> 1 of 11 possible data requests complete. Starting the next 3 requests.
#> A total of 125 rows have been pulled.
<- Sys.time()
end_time
# Show the summary for the resulting df for comparison to the next approach.
summary(df)
#> mobiledevicetype browser visits
#> Length:125 Length:125 Min. : 1.00
#> Class :character Class :character 1st Qu.: 1.00
#> Mode :character Mode :character Median : 3.00
#> Mean : 56.66
#> 3rd Qu.: 11.00
#> Max. :3488.00
# Output how long it took to run the query
- start_time
end_time #> Time difference of 4.994079 secs
Now, instead, let’s consider what happens if we swap the order of our
dimensions and, instead, use
dimensions = c("browser", "mobiledevicetype")
.
Now, the API calls will be:
browser
values.browser
values
to get each value broken down by mobiledevicetype
.This will result in a total of 114 API calls!!!
Let’s try it out:
<- Sys.time()
start_time
<- aw_freeform_table(date_range = c(start_date, end_date),
df dimensions = c("browser", "mobiledevicetype"),
metrics = "visits",
top = c(1000, 10))
#> Estimated runtime: 800.8sec./13.35min.
#> 1 of 1001 possible data requests complete. Starting the next 113 requests.
#> A total of 125 rows have been pulled.
<- Sys.time()
end_time
# Show the summary for the resulting df for comparison to the next approach.
summary(df)
#> browser mobiledevicetype visits
#> Length:125 Length:125 Min. : 1.00
#> Class :character Class :character 1st Qu.: 1.00
#> Mode :character Mode :character Median : 3.00
#> Mean : 56.66
#> 3rd Qu.: 11.00
#> Max. :3488.00
# Output how long it took to run the query
- start_time
end_time #> Time difference of 1.615242 mins
This is a BIG difference in run-time even though, aside from the column order being slightly different, the resulting data is identical.
If you followed the Analysis Workspace experiment in the previous
section, then another way to think about this is that (without the
<Shift>
key), breaking down Mobile Device
Type by Browser would require a lot
less clicking and dropping than breaking down Browser
by Mobile Device Type. In the latter, you would have to
drag Mobile Device Type onto each of the
numerous Browser values one at a time!
The good (great?) news is that you can string as many dimensions together as you would like and then let R just take it from there and get a resulting data frame with multiple dimensions! The order of your dimensions just can have a dramatic effect on how long you have to wait for the results to be returned.