ctapply.Rd
ctapply is a fast replacement for tapply that assumes contiguous input,
i.e. all occurrences of each unique value in the index are adjacent and
never separated by any other values. This avoids an expensive split step
since both the value and the index chunks can be created on the fly. It also
cuts a few corners to allow very efficient copying of values. This
makes it many orders of magnitude faster than the classical
lapply(split(), ...) implementation.
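To make the contiguity assumption concrete, here is a minimal base R sketch of a check for it. The helper name is_contiguous is hypothetical and not part of any package; it assumes the index contains no NA values.

```r
# Hypothetical helper (not part of ctapply's package): returns TRUE when
# every distinct value of `index` occupies a single unbroken run.
# Assumes `index` has no NA values.
is_contiguous <- function(index) {
  # rle() collapses consecutive equal values into runs; if a value
  # starts more than one run, it appears twice in `runs`.
  runs <- rle(as.character(index))$values
  !anyDuplicated(runs)
}

is_contiguous(c("a", "a", "b", "b", "c"))  # TRUE: each value forms one run
is_contiguous(c("a", "b", "a"))            # FALSE: "a" is split by "b"
```

A check like this costs a full pass over the index, so it is probably only worth running when the origin of the data is uncertain.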
ctapply(X, INDEX, FUN, ..., MERGE=c)
X: an atomic object, typically a vector
INDEX: numeric or character vector of the same length as X
FUN: the function to be applied
...: additional arguments to FUN. They are passed as-is, i.e., without replication or recycling
MERGE: function to merge the resulting vector, or NULL if the arguments to such a function are to be returned instead
Note that ctapply supports integer, real, or character
vectors as indices (factors are integer vectors and thus
supported, but you do not need to convert character vectors). Unlike
tapply, it does not take a list of factors. If you want to use
a cross-product of factors, create the product first, e.g. using
paste(i1, i2, i3, sep='\01') or multiplication, whichever
method is convenient for the input types.
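As an illustration of the cross-product approach, the following base R sketch builds a single combined key from two grouping vectors. The vectors i1 and i2 are made-up example data; the '\01' separator is presumably chosen because it is unlikely to occur in real values, so distinct combinations do not collide.

```r
# Made-up example data: two grouping vectors of equal length.
i1 <- c("x", "x", "x", "y", "y")
i2 <- c(1L, 1L, 2L, 1L, 1L)

# One key per element; "\01" is the control character with code 1.
key <- paste(i1, i2, sep = "\01")

length(unique(key))  # 3 distinct combinations: (x,1), (x,2), (y,1)
rle(key)$lengths     # 2 1 2: each combination forms one contiguous run
```

Here the combined key happens to be contiguous already; in general the data would need to be ordered by the key before it can serve as INDEX.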
ctapply requires the INDEX to be contiguous. One (slow) way
to achieve that is to use sort or order.
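A minimal sketch of the order-based approach, using base R only. The vectors x and idx are made-up example data; the key point is that the same permutation must be applied to both the values and the index.

```r
# Made-up example data: values and a non-contiguous index.
x   <- c(5, 1, 4, 2, 3)
idx <- c("b", "a", "b", "a", "c")

# Compute one permutation and apply it to both vectors so that
# values stay aligned with their index entries.
o <- order(idx)
x_sorted   <- x[o]    # 1 2 5 4 3
idx_sorted <- idx[o]  # "a" "a" "b" "b" "c"

# x_sorted and idx_sorted now satisfy the contiguity requirement and
# could be passed on, e.g. ctapply(x_sorted, idx_sorted, sum).
```

Sorting costs more than the grouped apply itself for large inputs, which is why it pays off mostly when the data arrives grouped already.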
ctapply also supports X being a matrix, in which case it
is split row-wise based on INDEX. The number of rows must match
the length of INDEX. Note that the indexed matrices behave as
if drop=FALSE was used, and currently dimnames are only
honored if rownames are present.
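For readers unfamiliar with drop=FALSE, the base R sketch below shows the difference it makes for a group with a single row. It uses plain matrix indexing rather than ctapply, and the matrix m is made-up example data.

```r
# Made-up example data: 3 rows, 2 columns, with rownames as the index.
m <- matrix(1:6, ncol = 2, dimnames = list(c("A", "A", "C"), NULL))

# Default indexing drops a single selected row to a plain vector.
dropped <- m[rownames(m) == "C", ]
is.null(dim(dropped))  # TRUE: no longer a matrix

# With drop = FALSE the result stays a 1 x 2 matrix and keeps its rowname.
kept <- m[rownames(m) == "C", , drop = FALSE]
dim(kept)              # 1 2
rownames(kept)         # "C"
```

This matches the behaviour described above: FUN receives a matrix for every group, including groups with only one row.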
This function has been moved to the fastmatch package!
i = rnorm(4e6)
names(i) = as.integer(rnorm(1e6))
i = i[order(names(i))]
system.time(tapply(i, names(i), sum))
#> user system elapsed
#> 0.202 0.028 0.230
system.time(ctapply(i, names(i), sum))
#> user system elapsed
#> 0.052 0.008 0.061
## ctapply() also works on matrices (unlike tapply)
m=matrix(c("A","A","B","B","B","C","A","B","C","D","E","F","","X","X","Y","Y","Z"),,3)
ctapply(m, m[,1], identity, MERGE=list)
#> $A
#> [,1] [,2] [,3]
#> [1,] "A" "A" ""
#> [2,] "A" "B" "X"
#>
#> $B
#> [,1] [,2] [,3]
#> [1,] "B" "C" "X"
#> [2,] "B" "D" "Y"
#> [3,] "B" "E" "Y"
#>
#> $C
#> [,1] [,2] [,3]
#> [1,] "C" "F" "Z"
#>
ctapply(m, m[,1], identity, MERGE=rbind)
#> [,1] [,2] [,3]
#> [1,] "A" "A" ""
#> [2,] "A" "B" "X"
#> [3,] "B" "C" "X"
#> [4,] "B" "D" "Y"
#> [5,] "B" "E" "Y"
#> [6,] "C" "F" "Z"
m2=m[,-1]
rownames(m2)=m[,1]
colnames(m2) = c("V1","V2")
ctapply(m2, rownames(m2), identity, MERGE=list)
#> $A
#> V1 V2
#> A "A" ""
#> A "B" "X"
#>
#> $B
#> V1 V2
#> B "C" "X"
#> B "D" "Y"
#> B "E" "Y"
#>
#> $C
#> V1 V2
#> C "F" "Z"
#>
ctapply(m2, rownames(m2), identity, MERGE=rbind)
#> V1 V2
#> A "A" ""
#> A "B" "X"
#> B "C" "X"
#> B "D" "Y"
#> B "E" "Y"
#> C "F" "Z"