# timelineR: timeline visualizations in R
This package helps visualize multivariate time series data that contain numeric and factor variables.
The main function for plotting time series is `plot_timeline`.
A data helper function, `match_grep`, is provided to extract names from large data frames.
The `plot_timeline` function helps visualize multivariate time series that have numeric and state variables. The data used for demonstration describes pollution levels in Delhi and was downloaded from http://www.cpcb.gov.in/.
This package uses the `futile.logger` package for logging because it provides more granular control over logging. This is useful when the package runs in production systems, but you can also treat the logs like normal R logs.
To work correctly, the package expects the data to be structured in a specific way: a single timestamp column and one or more state and numeric variables.
State variables are represented by `factor` or `character` columns and are categorical in nature.
Numeric variables are represented by `numeric` columns and are numeric and ordinal in nature.
The input data frame should also have one column of type `POSIXct` that represents the time of occurrence of each observation.
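As a rough illustration of this layout, the sketch below builds a small data frame with one `POSIXct` column, one state variable, and one numeric variable. The column names and values (`time`, `machine_state`, `temperature`) are invented for this example and are not part of the package or its sample data.

```r
# Hypothetical example data; column names and values are invented for illustration.
example_data <- data.frame(
  time = as.POSIXct("2016-03-05 00:00:00", tz = "UTC") + 3600 * 0:5,  # hourly timestamps
  machine_state = factor(c("ON", "ON", "OFF", "OFF", "ON", "ON")),     # state variable
  temperature = c(21.5, 22.1, 19.8, 19.2, 20.7, 21.9)                  # numeric variable
)

# Count the timestamp columns; the layout described above expects exactly one
is_time <- vapply(example_data, function(col) inherits(col, "POSIXct"), logical(1))
sum(is_time)
str(example_data)
```

A check like `sum(is_time)` can be a quick way to confirm that a data frame has a single timestamp column before plotting.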
library(dplyr, quietly = TRUE)
# Locate the sample data shipped with the package
data_path <- system.file("extdata/delhi_air_pollution.csv", package = "timelineR")
# Parse timestamps as POSIXct and keep observations before May 2016
air_pollution <- read.csv(data_path) %>%
  mutate(date = as.POSIXct(date)) %>%
  filter(date < as.POSIXct("2016-05-01"))
str(air_pollution)
plot_timeline
By default, the `plot_timeline` function plots a univariate time series for every variable, in the order the variables appear in the data frame. It returns a `grob` object (from the `grid` package).
{r, fig.width=8, fig.height=10}
require(timelineR)
plot_grob <- plot_timeline(air_pollution)
The columns to plot can be restricted by passing their names in the `data_cols` argument. By default, all variables are plotted.
{r, fig.width=8, fig.height=4}
data_cols = c("pm10", "no")
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols)
The time range to plot can be restricted by passing the start and end times in the `start_time` and `end_time` arguments, respectively.
{r, fig.width=8, fig.height=4}
data_cols = c("pm10", "no")
start_time = as.POSIXct("2016-03-05")
end_time = as.POSIXct("2016-03-10")
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols,
                           start_time = start_time, end_time = end_time)
The Y-axis limits for numeric plots can be passed in the `ylimits` argument as a named list, where each name is a column and each value is a vector holding the lower and upper limits.
{r, fig.width=8, fig.height=4}
data_cols = c("pm10", "no")
start_time = as.POSIXct("2016-03-05")
end_time = as.POSIXct("2016-03-10")
ylimits = list("pm10" = c(120, 180))
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols,
                           start_time = start_time, end_time = end_time,
                           ylimits = ylimits)
To bring multiple numeric variables to a comparable level, they can be scaled. The scaling information is passed in the `scale_vals` argument as a named vector whose names are column names and whose values are the corresponding multiplication factors.
{r, fig.width=8, fig.height=5}
data_cols = c("pm10", "no", "odd_even")
scale_vals = c("pm10" = 0.5, "no" = 2)
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols, scale_vals = scale_vals)
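The effect of a multiplication factor can be sketched in base R as below. This only illustrates the arithmetic described above on made-up values; how `plot_timeline` applies the factors internally is not shown in this document, so the loop here is an assumption about the semantics and not the package's implementation.

```r
# Made-up values standing in for two numeric columns
readings <- data.frame(pm10 = c(100, 200, 300), no = c(10, 20, 30))
scale_vals <- c("pm10" = 0.5, "no" = 2)

# Multiply each named column by its factor; columns not named stay unchanged
scaled <- readings
for (col in names(scale_vals)) {
  scaled[[col]] <- readings[[col]] * scale_vals[[col]]
}
scaled
```

With these factors, the two columns end up in a similar range, which is the point of scaling before comparing them in adjacent plots.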
For state variables, the `add_legend` argument specifies whether a legend is shown. By default, legends are shown.
{r, fig.width=8, fig.height=4}
data_cols = c("pm10", "odd_even")
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols, add_legend = FALSE)
To change the fill colors in plots of state variables, pass a color mapping. A color mapping can be defined for one or more state variables. However, for a given state variable, the mapping must cover every possible state of that variable.
The package uses the ggplot2 package internally and accepts colors in any format supported by ggplot2.
{r, fig.width=8, fig.height=2}
data_cols = c("odd_even")
color_mapping = list("odd_even" = c("NORMAL" = "#E67E22", "ODD-EVEN" = "green4"))
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols, color_mapping = color_mapping)
The time series for numeric variables can be plotted in three ways: as a line, step, or point plot.
The type of plot is specified in the `numeric_plot_type` argument. Only three types are supported, so the value must be one of `line`, `step`, or `point`. By default, the plot type is `line`.
{r, fig.width=8, fig.height=2}
data_cols = c("pm10")
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols, numeric_plot_type = "step")
{r, fig.width=8, fig.height=2}
data_cols = c("co")
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols, numeric_plot_type = "point")
By default, the Y-axis label for state and numeric plots is `State` and `Numeric`, respectively. This can be changed by passing a named vector in the `ylabels` argument, with the plot names as names and the labels as values.
{r, fig.width=8, fig.height=6}
data_cols = c("co", "pm25", "odd_even")
ylabels = c("pm25" = "concentration", "odd_even" = "day type")
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols, ylabels = ylabels)
Sometimes the plot should not be drawn when the function is called. In that case, drawing can be suppressed by passing the `plot_output` argument as `FALSE`. By default, the plot is drawn.
{r, fig.width=8, fig.height=6}
data_cols = c("co", "pm25", "odd_even")
ylabels = c("pm25" = "concentration", "odd_even" = "day type")
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols, ylabels = ylabels,
                           plot_output = FALSE)
When visualizing time series data, it is often useful to study the relation between a state variable and a numeric variable. For this, `plot_timeline` can overlay the plot of a numeric variable on that of a state variable.
To draw overlapping plots, pass a named list of vectors in the `overlap_plots_names` argument. Each name in the list is the name of an overlapping plot. Each element is a vector of two elements: the first is the name of the state variable and the second is the name of the numeric variable. Both variables must be present in `data_cols`.
{r, fig.width=8, fig.height=6}
data_cols = c("pm25", "odd_even")
overlapping_plot_names = list("pm25_with_odd_even" = c("odd_even", "pm25"))
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols,
                           overlap_plots_names = overlapping_plot_names)
Each plot can be assigned a title. By default, the title is the name of the variable for univariate plots and the name of the plot for overlapping plots. The titles are passed in the `titles` argument as a named vector, with the plot names as names and the titles as values.
{r, fig.width=8, fig.height=6}
data_cols = c("pm25", "odd_even")
titles = c("pm25" = "Concentration of particulate 2.5 matter",
           "pm25_with_odd_even" = "Study of concentration of PM 2.5 matter with odd-even policy")
overlapping_plot_names = list("pm25_with_odd_even" = c("odd_even", "pm25"))
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols,
                           overlap_plots_names = overlapping_plot_names, titles = titles)
When visualizing data, it often helps to arrange the plots in a particular order. The order is specified in the `order_plots` argument as a vector containing the names of variables and overlapping plots in the required order. Only the plots named in `order_plots` are drawn.
By default, the univariate plots are arranged in the order the variables appear in the data frame, followed by the overlapping plots.
{r, fig.width=8, fig.height=6}
data_cols = c("pm25", "odd_even")
titles = c("pm25" = "Concentration of particulate 2.5 matter",
           "pm25_with_odd_even" = "Study of concentration of PM 2.5 matter with odd-even policy")
overlapping_plot_names = list("pm25_with_odd_even" = c("odd_even", "pm25"))
order_plots = c("pm25_with_odd_even", "pm25")
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols,
                           overlap_plots_names = overlapping_plot_names,
                           titles = titles, order_plots = order_plots)
To emphasize some of the plots, their relative sizes can be adjusted. The relative sizes are passed in the `plot_size_ratios` argument as a named vector, with the plot names as names and the relative ratios as values. By default, each plot has a relative size of 1.
{r, fig.width=8, fig.height=7}
data_cols = c("pm25", "odd_even")
titles = c("pm25" = "Concentration of particulate 2.5 matter",
           "pm25_with_odd_even" = "Study of concentration of PM 2.5 matter with odd-even policy")
overlapping_plot_names = list("pm25_with_odd_even" = c("odd_even", "pm25"))
plot_size_ratios = c("pm25_with_odd_even" = 2, "odd_even" = 0.5)
plot_grob <- plot_timeline(air_pollution, data_cols = data_cols,
                           overlap_plots_names = overlapping_plot_names,
                           titles = titles, plot_size_ratios = plot_size_ratios)
The plot can be saved directly from the function. The file path to save to is passed in the `save_path` argument. Only the PNG format is supported.
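Because `plot_timeline` returns a grid grob, one possible workaround when another format is needed is to draw the returned grob on a different graphics device yourself. The sketch below uses a placeholder grob in place of a real `plot_timeline` result, and the file name is hypothetical. It relies only on the `grid` and `grDevices` packages that ship with R and is not a feature of timelineR itself.

```r
library(grid)

# Stand-in for the grob returned by plot_timeline(); any grid grob works here
placeholder_grob <- rectGrob(gp = gpar(fill = "grey90"))

# Hypothetical output file in a temporary directory
out_file <- file.path(tempdir(), "timeline_sketch.pdf")

pdf(out_file, width = 8, height = 4)  # open a PDF device
grid.draw(placeholder_grob)           # draw the grob on it
dev.off()                             # close the device to flush the file

file.exists(out_file)
```

In practice, `placeholder_grob` would be replaced by the object returned from `plot_timeline`, ideally created with `plot_output = FALSE` so the plot is not also drawn on the active device.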
To make it easier to extract the names that are passed as arguments to the `plot_timeline` function, a helper function based on regular expressions, `match_grep`, is provided.
data_path <- system.file("extdata/test_data.csv", package = "timelineR")
test_data <- read.csv(data_path)
test_data %>% str
match_grep
The first argument, `grep_vec`, is the named vector whose names are searched for. The second argument, `actual_names`, is the vector in which the search is performed. A named vector is returned whose names are the matched names and whose values are those given in `grep_vec`.
grep_vec = c("state" = 1, "num" = 2)
match_grep(grep_vec, names(test_data))
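The behavior described above can be approximated in base R as follows. This is a hypothetical stand-in written only from the description in this document: the function `match_names_sketch` and the column names are invented for illustration, and the real `match_grep` may differ in details such as ordering or handling of unmatched patterns.

```r
# Hypothetical stand-in for illustration; not the package's match_grep().
match_names_sketch <- function(patterns, actual_names) {
  out <- c()
  for (pattern in names(patterns)) {
    # Names that match the regular expression given as the element name
    hits <- grep(pattern, actual_names, value = TRUE)
    # Give every matched name the value attached to its pattern
    out <- c(out, setNames(rep(patterns[[pattern]], length(hits)), hits))
  }
  out
}

cols <- c("time", "state_a", "state_b", "num_x")  # made-up column names
result <- match_names_sketch(c("state" = 1, "num" = 2), cols)
result
```

Here each column name that matches a pattern becomes a name in the result, carrying the value that was attached to the pattern.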
grep_vec
To search for the values of the vector instead of its names, set the `use_values` argument to `TRUE`.
grep_vec = c("state", "num")
match_grep(grep_vec, names(test_data), use_values = TRUE)
To return just the matched values from `actual_names` instead of the named vector, pass the `return_names` argument as `TRUE`.
grep_vec = c("state", "num")
match_grep(grep_vec, names(test_data), use_values = TRUE, return_names = TRUE)
To show, for each value searched, which values in `actual_names` did or did not match, set the `echo` argument to `TRUE`.
grep_vec = c("state", "num")
match_grep(grep_vec, names(test_data), use_values = TRUE, return_names = TRUE, echo = TRUE)