Getting Started

This vignette goes through the basics of getting started with adobeanalyticsr.

Basic Setup

The basic setup is documented on the home page of adobeanalyticsr.com, which reviews the details for:

Creating an Adobe Console API project through https://developer.adobe.com/console (this is available with all Adobe Analytics accounts, but requires the appropriate level of admin access to set up the project)
Adding AW_CLIENT_ID and AW_CLIENT_SECRET environment variables (the recommended mechanism for this is through the .Renviron file). If these environment variables are not available, then the client_id and client_secret arguments in aw_token() need to be populated with the client ID and secret.
Adding AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables (again, this is recommended to be done via a .Renviron file).

Understanding AW_COMPANY_ID and AW_REPORTSUITE_ID

Many adobeanalyticsr functions require a company ID (the Adobe account being accessed) and a report suite ID (the specific report suite to pull data or information from) to run. As such, these are arguments (company_id and rsid) within those functions that default to the values for the AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables.

This approach has some important ramifications:

If you primarily access a single company and/or a single report suite, then creating AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables means you will not have to set them on every function call, which can make for shorter and cleaner code.
If you periodically (or often) need to use a different company_id or rsid from what is specified as an environment variable, all you need to do is call them explicitly within the function calls. The values you specify for either of these (or both of them) directly in a function call will take precedence over any environment variables that are set up (this is core R behavior: function arguments often have default values built into them, but any value specified in the code will be used in place of those defaults; this is not anything unique to adobeanalyticsr).
If you do not have AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables created and you do not specify values for them in the adobeanalyticsr function calls, then the functions will fail.

In this vignette, both AW_COMPANY_ID and AW_REPORTSUITE_ID environment variables have been set up (but not shown).

Load the Package and Authenticate

Once the package is loaded, authenticate with aw_token(). If you have not authenticated ever, or if you have not authenticated within the last 24 hours, then a browser should open requiring you to log in to your Adobe account, and it will then redirect you to a web page that displays a lengthy token that you can copy and paste back into an Enter your authorization code: prompt in the R console.

library(dplyr)            # Light cleanup of output
library(knitr)            # Cleaner data tables
library(adobeanalyticsr)
aw_token()

The token will look something like this (with different letters and numbers and characters):

eyJ4NXUiOiJpbXNfbmExLWtleS0xLmNlciIsImFsZyI6IlJTMjU2In0.eyJpZCI6IjE2MTI3MTEyMjM5MzNfNGE2ZmVlM2-tYzdkOS00M2QyLWI5ODUtYzAzMDk2NDA2MGFjX3VlMSIsImNsaWVudF9pZCI6IjA2MTU0ZDgzMjl-OTQxZjViMGUzM2EwZGM2M2U3NmQxIiwidXNlcl9pZCI6IkY2MUJBOTU0NUE3QjMyM0QwQTQ5NUQ2Q0BBZG9iZUlEI-wic3RhdGUiOiIiLCJKeXBlIjoiYXV0aG9yaXphdGlvbl9jb2RlIiwiYXMiOiJpbXMtbmExIiwiZmciO-JWRlFSUkhCTVhMTzU2SDRDRTZSTFJZZUE0TT09PT09PSIsInNpZCI6IjE2MTI3MTEyMjM5MzNfMWE2ZDc4ZTUtNT-zZi00OWFkLWFmZTktYTk0MTJkZmYwMzA0X3VlMSIsIm90byI3InRydWUiLCJleHBpcmVzX2luIjoiMT-wMDAwMCIsImNyZWF0ZWRfYXQiOiIxNjEyNzExMjIzOTMzIiwic2NvcGUiOiJvcGVuaWQsQWRvYmVJRCxyZWFkX-9yZ2FuaXphdGlvbnMsYWRkaXRpb25hbF9pbmZvLnByb2plY3RlZFByb2R1Y3RDb250ZXh0LGFkZGl0aW9uY-xfaW5mby5qb2JfZnVuY3Rpb24ifQ.EvoihD7IPXkew_yFQuzjZGfiU_Q8-vlUkmytwfB_Y46DKePINPn7URq8bit0dXoO-tUdMUKVNOHEBbqww6ydDGBPHSCbOH1GgNsILoN96tjHTtA-jDxN8jrAnQ0PuCssA-GePnqryxCxOH9WyZIl2fog00ib8iZ3ZFAJLDrvthWwWHUw-ivu-K-F3UAtU7A4ma_7pe07D1rW5MhTcZOSr0pri68bjFA86cJqH5DHyMdp4F2d7QgZcYPMdrvVfXTwWXv9s5r6huvDcqv6nny-WOKZbKmoP6zwMzn5xa343wrQ5oXFbRxYem-tC154_dc7ekrC8YUX0pY5up9a-OUy0w

Confirm that the authorization worked by running get_me(), which will return two things if the authorization was successful:

A message: Your data is now available!
A data frame showing the company ID (globalCompanyId) and company name (companyName) for all of the companies (accounts) to which you have access based on the login you used when you called aw_token().

Get Available Dimensions, Metrics, and Segments

Because the Adobe Analytics API works with the variable and segment IDs rather than the plain English names of dimensions, metrics, and segments, it is often useful, at least in the initial development of a project, to create data frames that contain all of the available values for these three types of variables.

Get Available Dimensions

dims_df <- aw_get_dimensions()
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Request failed [401]. Retrying in 1 seconds...
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Request failed [401]. Retrying in 1 seconds...
#> Auto-refreshing stale OAuth token.
#> Warning: Unable to refresh token: unauthorized_client
#> Error in aw_call_api(req_path = urlstructure, debug = debug, company_id = company_id): Unauthorized (HTTP 401).
head(dims_df, 10) %>% select(id, name, type, category) %>% kable()

id	name	type	category
averagepagetime	Time Spent on Page - Bucketed	ordered-enum	Metrics
browser	Browser	string	Audience
browserheight	Browser Height - Granular	int	Audience
browserheightbucketed	Browser Height - Bucketed	ordered-enum	Audience
browsertype	Browser Type	enum	Audience
browserwidth	Browser Width - Granular	int	Audience
browserwidthbucketed	Browser Width - Bucketed	ordered-enum	Audience
campaign	Tracking Code	string	Traffic Sources
campaign.1	Campaign Source	string	Traffic Sources
campaign.2	Campaign Medium	string	Traffic Sources

This data frame includes:

Standard, out-of-the-box dimensions
Custom dimensions (eVars and s.props)
Classifications of dimensions: these are identifiable because their id values are [classified variable].[num] and they have a non-NA value for parent (the parent column–not shown above–has the name of the classified variable)

The data frame can then be searched and filtered for specific dimensions for use in subsequent data calls.

Get Available Metrics

metrics_df <- aw_get_metrics()
head(metrics_df, 10) %>% select(id, name, type, category) %>% kable()

id	name	type	category
averagepagedepth	Average Page Depth	int	Traffic
averagetimespentonpage	Average Time Spent on Page (seconds)	int	Traffic
averagetimespentonsite	Average Time Spent on Site (seconds)	int	Traffic
averagevisitdepth	Average Visit Depth	int	Traffic
bouncerate	Bounce Rate	percent	Traffic
bounces	Bounces	int	Traffic
campaigninstances	Campaign Click-throughs	int	Traffic Sources
cartadditions	Cart Additions	int	Conversion
cartremovals	Cart Removals	int	Conversion
carts	Carts	int	Conversion

This data frame includes:

Standard, out-of-the-box metrics
Custom metrics (events)

aw_get_metrics() does not return calculated metrics, so those require a separate function call.

calc_metrics_df <- aw_get_calculatedmetrics()
head(calc_metrics_df, 10) %>% select(id, name, type) %>% kable()

id	name	type
cm300003965_557fc577e4b07e827d177ad0	Crash Rate (Mobile)	percent
cm300003965_557fc577e4b07e827d177ad3	Average Session Length (Mobile)	time
cm300003965_557fc578e4b0094eea4f5201	Single Access (Calculated)	decimal
cm300003965_557fc578e4b0094eea4f5204	Crash Rate (Mobile)	percent
cm300003965_557fc578e4b0416196926008	Average Session Length (Mobile)	time
cm300003965_557fc656e4b0adadba08f337	Avg. Order Value	currency
cm300003965_557fc656e4b0094eea4f58b6	Average Order Value	decimal
cm300003965_557fc656e4b0094eea4f58b8	Bounce Rate	decimal
cm300003965_557fc790e4b0094eea4f6222	Crash Rate (Mobile)	percent
cm300003965_557fc790e4b0416196926fac	Average Session Length (Mobile)	time

Calculated metric IDs start with cm and are assigned by Adobe when the calculated metric is created.

These two metrics data frames can be searched and filtered for specific metrics for use in subsequent data calls.

Get Available Segments

The default limit for the number of segments returned by the aw_get_segments() function is 10, so, depending on the number of segments you have, the limit value can be increased to return a complete list of segments.

segments_df <- aw_get_segments(limit = 1000)
names(segments_df)
#> [1] "name"        "description" "id"          "owner"       "migratedIds" "rsid"

Similar to the dimensions and metrics, aw_get_segments() returns a data frame of available segments that can be searched and filtered to identify the specific id values to use in subsequent data calls.

Freeform Table Data Pull

Just as in Analysis Workspace, the freeform table is the workhorse of adobeanalyticsr, and, as such, is the focus of the rest of this vignette.

Single Dimension, Single Metric

The following is a breakdown of visits by device category for the last 30 full days for the report suite that is specified in the AW_REPORTSUITE_ID environment variable.

# Specify a start date and end date. These can be specified as Date
# objects or as string objects in YYYY-MM-DD format.
start_date <- Sys.Date() - 30
end_date <- Sys.Date() - 1

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = "visits")
#> A total of 3 rows have been pulled.

# Output the results as a formatted table
df %>% kable()

mobiledevicetype	visits
Other	5546
Mobile Phone	1509
Tablet	28

Note that the top argument for aw_freeform_table() defaults to 5, so only the top 5 values will be shown unless that argument is passed a higher value.

Single Dimension, Multiple Metrics

Getting multiple metrics is simply a matter of passing a vector to the metrics argument rather than a single string:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = c("visits", "pageviews", "visitors"))
#> A total of 3 rows have been pulled.

# Output the results as a formatted table
df %>% kable()

mobiledevicetype	visits	pageviews	visitors
Other	5546	10055	3829
Mobile Phone	1509	2414	1216
Tablet	28	95	23

The example above used standard metrics, but custom metrics (e.g., “event10”) and calculated metrics (e.g., “cm300003965_557fc578e4b0094efa4f5204”) can also be included in the vector of metrics as needed.

The default for the function is to return the API field names, but the “pretty names” can be returned instead by setting the prettynames argument to TRUE:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = c("visits", "pageviews", "visitors"),
                        prettynames = TRUE)
#> A total of 3 rows have been pulled.

# Output the results as a formatted table
df %>% kable()

Mobile Device Type	Visits	Page Views	Unique Visitors
Other	5546	10055	3829
Mobile Phone	1509	2414	1216
Tablet	28	95	23

While the pretty names are, indeed, “prettier,” they can add downstream complexity to the code. And, since custom metrics and calculated metrics can have their names changed at any point, if subsequent code references columns using these pretty names, the code is subject to break in the future if the metric(s) it references gets renamed. So, it is generally considered a best practice to work with the API field names (prettynames = FALSE) and then only convert to more readable names at the point of the data being presented to the end user.

Trending Metrics Over Time

What is time? This could be a deeply philosophical question, but, instead, we’ll treat it as a pragmatic one: time is a dimension. It’s just a dimension that gets some special treatment in this package.

The first thing to note is that the id values for time values are all prepended with daterange.

dims_df %>% filter(grepl("^daterange.*", id)) %>% 
  select(id, name) %>% 
  kable()

id	name
daterangeday	Day
daterangehour	Hour
daterangeminute	Minute
daterangemonth	Month
daterangequarter	Quarter
daterangeweek	Week
daterangeyear	Year

To get trended data for one or more metrics, simply use the appropriate date dimension as a dimension. The first “special” thing that happens here is that you don’t need to worry about setting the top argument to include all of the values. The package will assume that you want to include all of the date values in the range and will do this automatically.

What will not (yet) be done automatically is for the package to return the results ordered by the date value (it will sort them by the first metric or whatever is specified in the metricSort argument), so we’ll need to do that sorting with the results:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "daterangeday",
                        metrics = "visits") %>% 
  arrange(daterangeday)
#> A total of 30 rows have been pulled.

# Output the results as a formatted table
df %>% kable()

daterangeday	visits
2021-01-23	66
2021-01-24	67
2021-01-25	268
2021-01-26	359
2021-01-27	331
2021-01-28	291
2021-01-29	257
2021-01-30	81
2021-01-31	99
2021-02-01	286
2021-02-02	336
2021-02-03	396
2021-02-04	334
2021-02-05	329
2021-02-06	85
2021-02-07	119
2021-02-08	372
2021-02-09	557
2021-02-10	352
2021-02-11	337
2021-02-12	244
2021-02-13	85
2021-02-14	106
2021-02-15	193
2021-02-16	260
2021-02-17	214
2021-02-18	307
2021-02-19	207
2021-02-20	65
2021-02-21	84

To include a non-date dimension and specify a top value for that dimension while also returning all of the date values, use 0 in the position of the date dimension when specifying the top argument.

The following code breaks down each day by Mobile Device Type, but only includes the top 2 values for each breakdown:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("daterangeday", "mobiledevicetype"),
                        metrics = "visits",
                        top = c(0, 2)) %>% 
  arrange(daterangeday)
#> Estimated runtime: 24.8sec./0.41min.
#> 1 of 31 possible data requests complete. Starting the next 30 requests.
#> Request failed [429]. Retrying in 6 seconds...
#> A total of 60 rows have been pulled.

# Output the first few rows as a formatted table
head(df, 10) %>% kable()

daterangeday	mobiledevicetype	visits
2021-01-23	Other	38
2021-01-23	Mobile Phone	28
2021-01-24	Other	41
2021-01-24	Mobile Phone	25
2021-01-25	Other	236
2021-01-25	Mobile Phone	31
2021-01-26	Other	280
2021-01-26	Mobile Phone	79
2021-01-27	Other	266
2021-01-27	Mobile Phone	62

If the daterange... dimension is not the first dimension (and, for speed reasons, it may make sense to not have it there, as is discussed in the next section), the 0 can simply be re-located:

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("mobiledevicetype", "daterangeday"),
                        metrics = "visits",
                        top = c(2, 0)) %>% 
  arrange(mobiledevicetype, daterangeday)
#> Estimated runtime: 2.4sec./0.04min.
#> 1 of 3 possible data requests complete. Starting the next 2 requests.
#> A total of 60 rows have been pulled.

# Output the first few rows as a formatted table
head(df) %>% kable()

mobiledevicetype	daterangeday	visits
Mobile Phone	2021-01-23	28
Mobile Phone	2021-01-24	25
Mobile Phone	2021-01-25	31
Mobile Phone	2021-01-26	79
Mobile Phone	2021-01-27	62
Mobile Phone	2021-01-28	60

Multiple Dimensions, Multiple Metrics

Working with multiple dimensions is much easier once you understand two fundamental aspects of the 2.0 API, which may seem contradictory at first:

Each API call is limited to a single dimension.
The API can be used to pull data with an unlimited number of dimensions.

The trick to reconciling these two statements is that any “single dimension” call can be filtered by an unlimited number of other dimensions.

Consider Multiple Dimensions in Analysis Workspace

A thought experiment to explain how this works is to imagine that you are in Analysis Workspace (or, actually go into Analysis Workspace and try this directly to make it a real experiment rather than an experiment of the mind):

Create a blank freeform table.
Using only your mouse (no keyboard keys allowed!) create a freeform table report that has two dimensions.

Each drag-and-drop with your mouse triggers a new API call. To build a freeform table that has Marketing Channel broken down by Mobile Device Type might look something like the following:

[Click and drag] Marketing Channel onto the freeform table. [API Call #1]
[Click and drag] Mobile Device Type onto the first marketing channel in the list: Direct (for instance). [API Call #2]
[Click and drag] Mobile Device Type onto the second marketing channel in the list: Paid Search. [API Call #3]
[Click and drag] Mobile Device Type onto the third marketing channel the list: Natural Search. [API Call #4]
[Click and drag] Mobile Device Type onto the third marketing channel the list: Display. [API Call #5]

At this point, you’re cursing the constraint of not being able to use the <Shift> key! But, each of those steps is actually an API call that Analysis Worskpace is making to the 2.0 API:

API Call #1 is a simple, single-dimension call that returns a table listing all of the Marketing Channel values.
API Call #2 is also a single-dimension API call, but the single dimension being queried is now Mobile Device type. But, that query has a constraint on it, in that it is filtered to only include results where Marketing Channel = Direct (the first marketing channel in the list; that’s the value we dragged and dropped Mobile Device Type on).
API Call #3 is similar to #2, except the results are filtered for Marketing Channel = Page Search).
And so on…

These API calls all happen relatively quickly (wildly faster than API calls using the older v1.4 API), but they still all have to happen, and they happen one after another (serially rather than in parallel).

To push this experiment one step farther, think about what you would have to do if you wanted to drill down to a third dimension: Entry Page. You would have to repeatedly drag the Entry Page dimension onto each of:

Direct > Mobile Phone
Direct > Other
Direct > Tablet
Paid Search > Mobile Phone
Paid Search > Other
Paid Search > Tablet
Natural Search > Mobile Phone
Natural Search > Other
Natural Search > Tablet
Etc.

For the first call above, the API call for the single dimension Entry Page filtered to only include results where Marketing Channel = Direct AND Mobile Device Type = Mobile Phone.

To build a freeform table in Analysis Workspace that has three dimensions fully broken down would require dozens of drag-and-drop actions! It’s possible that that tedium is one of the reasons that you’re looking to adobeanalyticsr in the first place. As well you should! aw_freeform_table() handles these multiple API calls for you!

Multiple Dimensions with aw_freeform_table()

On the one hand, all of the exposition above may seem like overkill, because all you have to do with aw_freeform_table() to pull multiple dimensions is to…pass a vector of dimensions to the dimensions argument!

Where things get a little trickier is when it comes to specifying how many rows to include for each dimension level, and, more importantly, for constructing the function calls so that they run as quickly as possible. To illustrate, we’ll explore a scenario where we want to get Visits broken down by Mobile Device Type and Browser.

First, let’s query each of the dimensions independently to see how many unique dimension values we’re dealing with. This isn’t something that is required, but we’re going to do a little math to illustrate the differences that dimension order can make.

device_types <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "mobiledevicetype",
                        metrics = "visits",
                        # The default of 5 is probably going to get all of them,
                        # but set a higher cutoff just in case.
                        top = 10)
#> A total of 3 rows have been pulled.

browsers <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = "browser",
                        metrics = "visits",
                        # We want to get all of the browsers, so set top as a high 
                        # value rather than the default of "5"
                        top = 1000)
#> A total of 113 rows have been pulled.

When making our actual call to get both dimensions at once, we have two options for the dimensions argument value:

dimensions = c("mobiledevicetype", "browser")
dimensions = c("browser", "mobiledevicetype")

Assuming we set top appropriately to include all possible values, we should get the same data, ultimately. But, the number of API calls required behind the scenes and, therefore, the time it will take for the function to run, will vary quite a bit between these two!

For dimensions = c("mobiledevicetype", "browser"), the API calls will be:

One call to get the full list of mobiledevicetype values.
One query for each of the 3 mobiledevicetype values to get each value broken down by browser.

This will result in a total of 4 API calls.

Let’s try it out, including logging how long the function takes to run.

start_time <- Sys.time()

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("mobiledevicetype", "browser"),
                        metrics = "visits",
                        top = c(10, 1000))
#> Estimated runtime: 8.8sec./0.15min.
#> 1 of 11 possible data requests complete. Starting the next 3 requests.
#> A total of 125 rows have been pulled.

end_time <- Sys.time()

# Show the summary for the resulting df for comparison to the next approach.
summary(df)
#>  mobiledevicetype     browser              visits       
#>  Length:125         Length:125         Min.   :   1.00  
#>  Class :character   Class :character   1st Qu.:   1.00  
#>  Mode  :character   Mode  :character   Median :   3.00  
#>                                        Mean   :  56.66  
#>                                        3rd Qu.:  11.00  
#>                                        Max.   :3488.00

# Output how long it took to run the query
end_time - start_time
#> Time difference of 4.994079 secs

Now, instead, let’s consider what happens if we swap the order of our dimensions and, instead, use dimensions = c("browser", "mobiledevicetype"). Now, the API calls will be:

One call to get the full list of browser values.
One query for each of the 113 browser values to get each value broken down by mobiledevicetype.

This will result in a total of 114 API calls!!!

Let’s try it out:

start_time <- Sys.time()

df <- aw_freeform_table(date_range = c(start_date, end_date),
                        dimensions = c("browser", "mobiledevicetype"),
                        metrics = "visits",
                        top = c(1000, 10))
#> Estimated runtime: 800.8sec./13.35min.
#> 1 of 1001 possible data requests complete. Starting the next 113 requests.
#> A total of 125 rows have been pulled.

end_time <- Sys.time()

# Show the summary for the resulting df for comparison to the next approach.
summary(df)
#>    browser          mobiledevicetype       visits       
#>  Length:125         Length:125         Min.   :   1.00  
#>  Class :character   Class :character   1st Qu.:   1.00  
#>  Mode  :character   Mode  :character   Median :   3.00  
#>                                        Mean   :  56.66  
#>                                        3rd Qu.:  11.00  
#>                                        Max.   :3488.00

# Output how long it took to run the query
end_time - start_time
#> Time difference of 1.615242 mins

This is a BIG difference in run-time even though, aside from the column order being slightly different, the resulting data is identical.

If you followed the Analysis Workspace experiment in the previous section, then another way to think about this is that (without the <Shift> key), breaking down Mobile Device Type by Browser would require a lot less clicking and dropping than breaking down Browser by Mobile Device Type. In the latter, you would have to drag Mobile Device Type onto each of the numerous Browser values one at a time!

The good (great?) news is that you can string as many dimensions together as you would like and then let R just take it from there and get a resulting data frame with multiple dimensions! The order of your dimensions just can have a dramatic effect on how long you have to wait for the results to be returned.