library(manymodelr)
agg_by_group
As the name suggests, this function provides an easy way to manipulate grouped data. For instance, we can count the observations in each group of the yields data set. The formula takes the form x ~ y, where y is the grouping variable (in this case, normal). A formula can be supplied as shown next.
# Load the yields dataset
data("yields")
# Apply length to every other column (.) within each level of normal
head(agg_by_group(yields,.~normal,length))
#> Grouped By[1]: normal
#>
#> normal height weight yield
#> 1 No 500 500 500
#> 2 Yes 500 500 500
# Sum cyl within each combination of hp and vs
head(agg_by_group(mtcars,cyl~hp+vs,sum))
#> Grouped By[2]: hp vs
#>
#> hp vs cyl
#> 1 91 0 4
#> 2 110 0 12
#> 3 150 0 16
#> 4 175 0 22
#> 5 180 0 24
#> 6 205 0 8
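For readers more familiar with base R, stats::aggregate() accepts the same x ~ y formula style. The sketch below runs it on mtcars purely as a point of comparison. It does not call manymodelr and is not how agg_by_group is implemented. Details may differ: for example, aggregate's formula method drops incomplete rows by default (na.action = na.omit).

```r
# Base-R comparison only; this does not call manymodelr
# Apply length to every column (.) for each level of cyl
counts <- aggregate(. ~ cyl, data = mtcars, FUN = length)
counts[, c("cyl", "mpg", "hp")]

# Sum cyl within each hp/vs combination, mirroring the agg_by_group call above
sums <- aggregate(cyl ~ hp + vs, data = mtcars, FUN = sum)
head(sums)
```

On mtcars, the second call produces the same six rows printed above for agg_by_group.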
rowdiff
This function computes differences between rows. The direction argument specifies how the subtractions are made, while the exclude argument specifies the column classes that should be removed before the calculation. Using direction="reverse" subtracts the preceding row from each row, that is, row x minus row x - 1, where x is the row number. The first row has no predecessor, so its result is NA.
# Drop factor columns, then subtract the previous row from each row
head(rowdiff(yields,exclude = "factor",direction = "reverse"))
#> height weight yield
#> 1 NA NA NA
#> 2 -0.04212634 0.24042659 -15.808303
#> 3 0.01516059 0.09649856 11.170825
#> 4 0.25961718 0.03008764 6.578424
#> 5 -0.11495811 -0.02971837 -19.584090
#> 6 0.57638627 -0.42979818 6.825719
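The reverse direction can be illustrated with a small base-R sketch. Here, reverse_rowdiff_sketch() is a hypothetical helper, not part of manymodelr. It drops columns whose class appears in exclude, then uses diff() to subtract row x - 1 from row x, padding the first row with NA to match the rowdiff output above.

```r
# Hypothetical helper, not part of manymodelr
reverse_rowdiff_sketch <- function(df, exclude = "factor") {
  # Keep only columns whose class is not listed in `exclude`
  keep <- !vapply(df, function(col) any(class(col) %in% exclude), logical(1))
  m <- as.matrix(df[keep])
  # diff() on a matrix subtracts row x - 1 from row x, column by column;
  # the first row has no predecessor, so it is padded with NA
  as.data.frame(rbind(NA, diff(m)))
}

toy <- data.frame(a = c(1, 4, 9, 16), b = c(10, 8, 5, 1),
                  g = factor(c("u", "v", "u", "v")))
reverse_rowdiff_sketch(toy)
```

The factor column g is dropped, and each remaining entry is the difference between a row and the one above it.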
na_replace
This function lets the user conveniently replace missing values. The current options for the how argument are ffill, which replaces each NA with the next non-missing value; samples, which samples from the data to perform the replacement; and value, which fills NAs with a specific value. Other common mathematical methods such as min, max, get_mode, and sd are no longer supported. They are now available, with more flexibility, in the standalone package mde.
# Fill every NA with the string "Missing" and show the first 8 rows
head(na_replace(airquality, how="value", value="Missing"),8)
#> Ozone Solar.R Wind Temp Month Day
#> 1 41 190 7.4 67 5 1
#> 2 36 118 8.0 72 5 2
#> 3 12 149 12.6 74 5 3
#> 4 18 313 11.5 62 5 4
#> 5 Missing Missing 14.3 56 5 5
#> 6 28 Missing 14.9 66 5 6
#> 7 23 299 8.6 65 5 7
#> 8 19 99 13.8 59 5 8
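The output above shows that after filling with a string such as "Missing", Ozone and Solar.R cannot remain numeric. An R vector holds a single type, so a character replacement turns a numeric column into a character one. The hypothetical helper below (not part of manymodelr) sketches value-based replacement in base R and makes that coercion visible. It only touches columns that actually contain NA, which may or may not match what na_replace does.

```r
# Hypothetical helper, not part of manymodelr
replace_na_with_value <- function(df, value) {
  df[] <- lapply(df, function(col) {
    # Only modify columns that contain NAs; assigning a string to a
    # numeric column coerces the whole column to character
    if (anyNA(col)) col[is.na(col)] <- value
    col
  })
  df
}

out <- replace_na_with_value(head(airquality, 8), "Missing")
sapply(out, class)
```

In these eight rows, Ozone and Solar.R come back as character, while Wind, which has no NAs there, stays numeric.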
na_replace_grouped
This function provides a convenient way to replace missing values within groups.
test_df <- data.frame(A=c(NA,1,2,3), B=c(1,5,6,NA),groups=c("A","A","B","B"))
# Replace NAs by group, using ffill within each group
na_replace_grouped(df=test_df,group_by_cols = "groups",how="ffill")
#> groups A B
#> 1 A 1 1
#> 2 A 1 5
#> 3 B 2 6
#> 4 B 3 6
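Judging from the output above, ffill in na_replace_grouped handles two cases. When a group starts with NA (column A, group A), the gap is filled from the following value. When a group ends with NA (column B, group B), it is filled from the preceding value. The base-R sketch below reproduces that behaviour on test_df. The helper fill_down_up() is hypothetical and inferred from the printed result; it is not manymodelr's implementation.

```r
# Hypothetical helper inferred from the output above, not part of manymodelr:
# carry the last non-NA value forward, then fill any leading NAs
# with the first non-NA value that follows them
fill_down_up <- function(x) {
  seen <- cumsum(!is.na(x))                # number of non-NA values seen so far
  filled <- c(NA, x[!is.na(x)])[seen + 1]  # latest non-NA value at each position
  first_ok <- which(!is.na(x))[1]
  if (!is.na(first_ok)) filled[seq_len(first_ok - 1)] <- x[first_ok]
  filled
}

test_df <- data.frame(A = c(NA, 1, 2, 3), B = c(1, 5, 6, NA),
                      groups = c("A", "A", "B", "B"))
filled <- test_df
for (col in c("A", "B")) {
  # ave() applies fill_down_up separately within each level of groups
  filled[[col]] <- ave(test_df[[col]], test_df$groups, FUN = fill_down_up)
}
filled
```

The filled values match the na_replace_grouped output. Unlike that output, this sketch keeps the original column order, with groups last.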
Using mean, sd, etc. is no longer supported. Use mde instead, which focuses on missing data.