chunk.apply {iotools}    R Documentation

Process input by applying a function to each chunk

Description

chunk.apply processes input in chunks and applies FUN to each chunk, collecting the results.

Usage

chunk.apply(input, FUN, ..., CH.MERGE = rbind, CH.MAX.SIZE = 33554432, parallel = 1)

chunk.tapply(input, FUN, ..., sep = "\t", CH.MERGE = rbind, CH.MAX.SIZE = 33554432)

Arguments

input

Either a chunk reader, or a file name or connection that will be used to create a chunk reader.

FUN

Function to apply to each chunk

...

Additional parameters passed to FUN

sep

For chunk.tapply, the separator that delimits the key. Each line is split at the first separator, and the part before it is treated as the key over which the function is applied.

CH.MERGE

Function to call to merge results from all chunks. Common values are list to get lapply-like behavior, rbind for table-like output or c for a long vector.

CH.MAX.SIZE

Maximal size of each chunk in bytes (the default, 33554432, is 32 MiB).

parallel

The number of parallel processes to use in the calculation (Unix-like systems only).
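
The splitting rule described for sep can be illustrated in base R. This is only a sketch of the rule, not the package's implementation; it assumes the key is the text before the first separator, and the input lines are made up for illustration.

```r
# Base-R illustration of the splitting rule only (hypothetical input lines).
lines <- c("a\t1\t10", "a\t2\t20", "b\t3\t30")
sep <- "\t"

pos  <- regexpr(sep, lines, fixed = TRUE)      # position of the FIRST separator
keys <- substr(lines, 1, pos - 1)              # text before it: the key
rest <- substr(lines, pos + 1, nchar(lines))   # everything after it

# Group the remaining text by key, as a tapply-style operation would
groups <- split(rest, keys)
str(groups)
```

Later separators stay inside the remaining text, so FUN would still need to split those fields itself.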

Value

The result of calling CH.MERGE on all chunk results.
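
To see how the choice of CH.MERGE shapes the return value, the following base-R sketch merges two hypothetical per-chunk results in three ways. It assumes the merge function receives the per-chunk results as separate arguments, which do.call mimics here; the values and names are invented for illustration.

```r
# Hypothetical per-chunk results, e.g. three quantiles from each of two chunks
chunk_results <- list(
  c(q25 = 1, q50 = 2, q75 = 3),
  c(q25 = 4, q50 = 5, q75 = 6)
)

as_table  <- do.call(rbind, chunk_results)  # matrix with one row per chunk
as_list   <- do.call(list,  chunk_results)  # list with one element per chunk
as_vector <- do.call(c,     chunk_results)  # single long vector

print(as_table)
```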

Note

The input to FUN is the raw chunk, so it is typically advisable to use mstrsplit or a similar function as the first step in FUN.
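
A minimal sketch of that pattern follows, assuming the iotools package is installed. The helper name first_col_mean and the temporary file contents are hypothetical; the calls are guarded so the code is skipped when the package is unavailable.

```r
# FUN receives each chunk unparsed, so parse it first and then compute.
first_col_mean <- function(chunk) {
  m <- iotools::mstrsplit(chunk, sep = "|")  # split lines into fields
  mean(as.numeric(m[, 1]))                   # summarise the first field
}

if (requireNamespace("iotools", quietly = TRUE)) {
  tmp <- tempfile()
  writeLines(c("1|a", "2|b", "3|c"), tmp)    # made-up input data
  res <- iotools::chunk.apply(tmp, first_col_mean, CH.MERGE = c)
  print(res)
  unlink(tmp)
}
```

Using c as CH.MERGE here yields one number per chunk; with a file this small there is only a single chunk.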

Author(s)

Simon Urbanek

Examples

## Not run: 
## compute quantiles of the first variable for each chunk
## of at most 100kB size (CH.MAX.SIZE is given in bytes)
chunk.apply("input.file.txt",
            function(o) {
              m = mstrsplit(o)
              quantile(as.numeric(m[,1]), c(0.25, 0.5, 0.75))
            }, CH.MAX.SIZE=1e5)

## End(Not run)

[Package iotools version 0.2-4 Index]