The largeList package is designed to handle large list objects in R. In many business and engineering scenarios, huge amounts of unstructured data need to be stored in list objects, which leads to both high RAM consumption and long running times. The package serializes, compresses and saves the elements of a list separately, which makes it possible to access individual elements stored in a file at random.
R objects are serialized in an uncompressed or compressed (zlib, default level) non-ASCII little-endian format, similar to the one used by saveRDS. Two ordered tables are written at the end of the data for quick lookups, one for indices and one for element names. Note that all names are truncated to 16 characters.
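As a rough picture of the per-element format described above, the following base R sketch serializes one element in binary form, compresses it in memory and restores it. It uses only base R's serialize, memCompress and substr; the variable names are made up for the example, and it is an illustration under those assumptions rather than largeList's actual writer.

```r
# Illustration only: base R calls standing in for the package's internal writer.
element <- data.frame(col_1 = 1:2, col_2 = 3:4)

# Binary (non-ASCII) serialization; xdr = FALSE uses the native byte order,
# which is little-endian on most current platforms.
raw_bytes <- serialize(element, connection = NULL, xdr = FALSE)

# zlib-style compression in memory, then the reverse path.
compressed <- memCompress(raw_bytes, type = "gzip")
restored <- unserialize(memDecompress(compressed, type = "gzip"))

# Names longer than 16 characters are cut. Two long names that share their
# first 16 characters can no longer be told apart afterwards.
long_names <- c("measurement_2020_a", "measurement_2020_b")
truncated <- substr(long_names, 1, 16)
```

Because of the truncation, it may be safer to keep element names at 16 characters or fewer and unique within that length.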
Given the indices or names of elements, their positions are either read directly from the index table or found by binary search in the name-position table. The required elements are then located and unserialized, so the whole list never has to be restored into memory.
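The lookup described above can be sketched in base R with a toy file layout: each element is serialized on its own, its start position and size are recorded, and a name table sorted by name is binary searched before seeking into the file. The layout and the helpers find_name and read_by_name are hypothetical and only illustrate the idea; they do not reproduce largeList's real file format.

```r
# Illustration only: a toy file layout, not largeList's real format.
elements <- list(B = "abc", A = c(1, 2), C = list(1, 2))

# Serialize each element separately and record where it starts in the file.
blobs <- lapply(elements, function(x) serialize(x, connection = NULL))
sizes <- lengths(blobs, use.names = FALSE)
starts <- cumsum(c(0, sizes[-length(sizes)]))

path <- tempfile(fileext = ".bin")
con <- file(path, open = "wb")
for (blob in blobs) writeBin(blob, con)
close(con)

# Name-position table, ordered by name so that it can be binary searched.
ord <- order(names(elements))
name_table <- data.frame(name = names(elements)[ord], start = starts[ord],
                         size = sizes[ord], stringsAsFactors = FALSE)

# Plain binary search over the sorted names; returns a row number or NA.
find_name <- function(sorted_names, key) {
  lo <- 1L
  hi <- length(sorted_names)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (sorted_names[mid] == key) return(mid)
    if (sorted_names[mid] < key) lo <- mid + 1L else hi <- mid - 1L
  }
  NA_integer_
}

# Read a single element by name without unserializing the others.
read_by_name <- function(path, name_table, key) {
  row <- find_name(name_table$name, key)
  if (is.na(row)) return(NULL)
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = name_table$start[row], origin = "start")
  unserialize(readBin(con, what = "raw", n = name_table$size[row]))
}

element_a <- read_by_name(path, name_table, "A")
```

Only the bytes belonging to the requested element are read and unserialized, which is why access time does not have to grow with the size of the whole list.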
In the current version, only basic data types are supported, including NULL, integer, numeric, character, complex, raw, logical, factor, list, matrix, array and data.frame. Types such as function and data.table are not supported.
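Before saving, it may help to screen a list for elements of the unsupported types. The helper below is hypothetical and not part of largeList; it checks only for the two types named above (functions and data.table objects) and recurses into plain lists, so a TRUE result does not guarantee that every element can be stored.

```r
# Hypothetical helper, not part of largeList. It screens only for the two
# types named above as unsupported: functions and data.table objects.
is_storable <- function(x) {
  if (is.function(x) || inherits(x, "data.table")) return(FALSE)
  if (is.list(x) && !is.data.frame(x)) {
    # Recurse into plain lists; an empty list counts as storable.
    return(all(vapply(x, is_storable, logical(1))))
  }
  TRUE
}

ok <- is_storable(list(A = 1:3, B = data.frame(x = 1)))
bad <- is_storable(list(A = 1:3, f = mean))
```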
The maximum supported size of each R object stored in the list is \(2^{31} - 1\) bytes.
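The limit equals R's largest integer value. A minimal check might look as follows, assuming the limit applies to the serialized size of an element (the text does not say whether it is measured before or after compression); fits_size_limit is a hypothetical helper, not a largeList function.

```r
# Hypothetical helper, not part of largeList.
max_bytes <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1

# Assumption: the limit is compared against the serialized size in bytes.
fits_size_limit <- function(x) {
  length(serialize(x, connection = NULL)) <= max_bytes
}

small_ok <- fits_size_limit(list(1, "abc"))
```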
There are basically two ways to use the package: calling the basic functions directly or using the overloaded operators.
Basic functions include:
In saveList, if the parameter append is FALSE, the file is created if it does not exist, or truncated if it already exists. If append = TRUE, the list object is appended to the existing file using the compression setting stored in that file.
# save list_1 to a new file called example.llo using compression.
list_1 <- list("A" = c(1,2),
"B" = "abc",
list(1, 2))
saveList(object = list_1, file = "example.llo", append = FALSE, compress = TRUE)
# append list_2 to the existing file example.llo, compress option will be extracted from the file.
list_2 <- list("C" = data.frame(col_1 = 1:2, col_2 = 3:4),
"D" = matrix(0, nrow = 2, ncol = 2))
saveList(object = list_2, file = "example.llo", append = TRUE)
Different kinds of indices can be used in readList to access data.
# all elements
list_read <- readList(file = "example.llo")
# by numeric indices
list_read <- readList(file = "example.llo", index = c(1, 3))
# by names
list_read <- readList(file = "example.llo", index = c("A", "B"))
# by logical indices
list_read <- readList(file = "example.llo", index = c(T, F, T, F, T))
Elements can also be removed with removeFromList using different kinds of indices. This function may relocate all the data in the stored file and can therefore be very slow. Consider calling it once with a batch of indices instead of once per index.
# copy the file
file.copy(from = "example.llo", to = "example_remove.llo")
# [1] TRUE
# by numeric indices
removeFromList(file = "example_remove.llo", index = c(2))
# by names
removeFromList(file = "example_remove.llo", index = c("A", "D"))
# by logical indices
removeFromList(file = "example_remove.llo", index = c(T, F))
# remove file
file.remove("example_remove.llo")
# [1] TRUE
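Besides speed, there is a second reason to pass all indices in one call: positions shift after each removal. The base R sketch below shows this on an ordinary list. It assumes that a stored list renumbers its elements after a removal in the same way, which is not stated explicitly here.

```r
x <- list(A = 1, B = 2, C = 3, D = 4, E = 5)

# Batch: indices 2 and 4 both refer to the original positions (B and D).
batch <- x[-c(2, 4)]

# One by one: after B is removed, position 4 holds E, not D.
one_by_one <- x
for (i in c(2, 4)) one_by_one[[i]] <- NULL

names(batch)
# [1] "A" "C" "E"
names(one_by_one)
# [1] "A" "C" "D"
```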
modifyInList replaces the elements at the given indices with the values provided in the parameter object. If there are fewer replacement values than indices, the values are recycled. This function may relocate all the data in the stored file and can therefore be very slow. Consider calling it once with a batch of indices instead of once per index.
# copy the file
file.copy(from = "example.llo", to = "example_modify.llo")
# [1] TRUE
# by numeric indices
modifyInList(file = "example_modify.llo", index = c(1, 2), object = list("AA", "BB"))
# by names
modifyInList(file = "example_modify.llo", index = c("C","D"), object = list("C","D"))
# by logical indices
modifyInList(file = "example_modify.llo", index = c(T, F), object = list(1, 2))
# remove file
file.remove("example_modify.llo")
# [1] TRUE
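The circular reuse of replacement values, and of short logical indices such as c(T, F), follows the usual R recycling rule. A base R sketch of that rule, using only rep and which and made-up example values:

```r
index <- c(1, 2, 3, 4, 5)
replacement <- list("x", "y")

# The replacement values are repeated until every index has one.
recycled <- rep(replacement, length.out = length(index))

# A short logical index is repeated over the whole list in the same way,
# so c(TRUE, FALSE) on five elements selects positions 1, 3 and 5.
selected <- which(rep(c(TRUE, FALSE), length.out = 5))
```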
modifyNameInList replaces the names of the elements at the given indices with the values provided in the parameter name. If there are fewer replacement values than indices, the values are recycled.
# copy the file
file.copy(from = "example.llo", to = "example_modify_name.llo")
# [1] TRUE
# by numeric indices
modifyNameInList(file = "example_modify_name.llo", index = c(1, 2), name = c("new_name_A", "new_name_B"))
# by logical indices
modifyNameInList(file = "example_modify_name.llo", index = c(T, F), name = c("new_name_C", "new_name_D"))
# remove file
file.remove("example_modify_name.llo")
# [1] TRUE
getListName("example.llo")
# [1] "A" "B" "" "C" "D"
getListLength("example.llo")
# [1] 5
# remove file
file.remove("example.llo")
# [1] TRUE
Through operator overloading, list objects stored in a file can be manipulated much like ordinary R list objects.
getList creates an R object of class “largeList” and binds it to a file.
# by setting truncate == TRUE, file will be truncated if exists.
largelist_object <- getList("example.llo", verbose = TRUE, truncate = TRUE)
# file does not exist, create an empty list and store into the file.
# by setting truncate == FALSE, it will bind to existing file.
largelist_object <- getList("example.llo", verbose = TRUE, truncate = FALSE)
# file exists, file head and version will be examed.
The syntax for saving and appending differs slightly from that of the basic list type.
# save list
largelist_object[[]] <- list("A" = 1, "B" = 2)
# append list
largelist_object[] <- list("C" = 3, "D" = 4)
As with the list type, [] returns a sublist and [[]] returns a single element.
# For print just use largelist_object, for assignment, use largelist_object[]
largelist_object
# $A
# [1] 1
#
# $B
# [1] 2
#
# $C
# [1] 3
#
# $D
# [1] 4
object_copy <- largelist_object[]
# by numeric indices
largelist_object[c(1,2)]
# $A
# [1] 1
#
# $B
# [1] 2
largelist_object[[1]]
# [1] 1
# by names
largelist_object[c("A", "E")]
# $A
# [1] 1
#
# $<NA>
# NULL
largelist_object[["A"]]
# [1] 1
# by logical indices
largelist_object[c(T, F)]
# $A
# [1] 1
#
# $C
# [1] 3
As with the list type, elements are removed by assigning NULL to them.
# by numeric indices
largelist_object[1] <- NULL
# by names
largelist_object["B"] <- NULL
# by logical indices
largelist_object[c(T,F)] <- NULL
As with the list type, elements are changed or appended depending on the indices.
largelist_object[[]] <- list("A" = 1, "B" = 2, "C" = 3, "D" = 4)
# by numeric indices
largelist_object[c(1, 5)] <- list(1, "E" = 5)
# by names
largelist_object[c("C","F")] <- c(5, 7)
# by logical indices
largelist_object[c(T, F)] <- c(8)
print(largelist_object)
# $A
# [1] 8
#
# $B
# [1] 2
#
# $C
# [1] 8
#
# $D
# [1] 4
#
# $<NA>
# [1] 8
#
# $F
# [1] 7
largelist_object[[]] <- list("A" = 1, "B" = 2)
# get names
names(largelist_object)
# [1] "A" "B"
# modify names
names(largelist_object)[c(1, 2)] <- c("AA", "BB")
names(largelist_object)[c(F, T)] <- c("DD")
print(largelist_object)
# $AA
# [1] 1
#
# $DD
# [1] 2
Other functions such as print, length, head and tail are also available.
largelist_object[[]] <- list("A" = 1, "B" = 2)
# maximal number to print can be changed by setting option largeList.max.print.
print(largelist_object)
# $A
# [1] 1
#
# $B
# [1] 2
length(largelist_object)
# [1] 2
head(largelist_object)
# $A
# [1] 1
#
# $B
# [1] 2
tail(largelist_object)
# $A
# [1] 1
#
# $B
# [1] 2
# remove object and file
rm(largelist_object)
file.remove("example.llo")
# [1] TRUE
Progress is reported to the console when an operation takes a long time. This can be switched off by setting the option largeList.report.progress to FALSE (options(list(largeList.report.progress = FALSE))).
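If progress reporting should be silenced only for part of a script, the previous setting can be saved and restored with base R's options and getOption. This is a small sketch; only the option name is taken from the text above, and the variable names are made up for the example.

```r
# Switch progress reporting off and remember the previous setting.
old <- options(largeList.report.progress = FALSE)
progress_off <- getOption("largeList.report.progress")

# ... long-running largeList calls would go here ...

# Restore whatever was set before (the option is removed again if it was unset).
options(old)
```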