coalesce.Rd
coalesce makes sure that a given index vector is coalesced, i.e.,
identical values are grouped into contiguous blocks. This can be used
as a much faster alternative to sort.list where the goal is to group
identical values, but not necessarily in a pre-defined order. The
algorithm is linear in the length of the vector.
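To make the difference between sorting and grouping concrete, here is a minimal base-R illustration. It does not call coalesce itself: order(match(x, unique(x))) is used only as a slow stand-in (an assumption for illustration) that produces a grouping permutation without sorting the values.

```r
x <- c("b", "a", "b", "c", "a")

# Sorting groups identical values, but also imposes a sorted order.
by_sort <- sort.list(x)
x[by_sort]

# Grouping only: match() assigns each element the id of its value's first
# occurrence, and order() on those ids puts equal values next to each other.
# This is a stand-in for illustration, not the package's implementation.
by_group <- order(match(x, unique(x)))
x[by_group]
```

Both results have identical values in contiguous blocks; only the order of the blocks differs.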
coalesce(x)
The current implementation takes two passes through the vector. In the
first pass it creates a hash table for the values of x, counting the
occurrences in the process. In the second pass it assigns indices for
every element based on the index stored in the hash table.
The order of the groups of unique values is defined by the first
occurrence of each unique value, hence it is identical to the order of
unique.
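The two passes described above can be sketched in plain R. This is only an illustration of the idea under stated assumptions, not the package's code: coalesce_sketch is a hypothetical name, match() against unique() stands in for the hash table, and the explicit loop will be far slower than a compiled implementation.

```r
# Hypothetical helper: a two-pass, counting-style grouping permutation.
coalesce_sketch <- function(x) {
  keys <- unique(x)                  # groups, in order of first occurrence
  g <- match(x, keys)                # group id per element (hash-table stand-in)

  # Pass 1: count the occurrences of each group and derive the offset
  # at which each group's block starts in the output.
  counts <- tabulate(g, nbins = length(keys))
  start <- cumsum(c(0L, counts[-length(counts)]))

  # Pass 2: place each element's index into the next free slot of its block.
  res <- integer(length(x))
  nxt <- start
  for (i in seq_along(x)) {
    k <- g[i]
    nxt[k] <- nxt[k] + 1L
    res[nxt[k]] <- i
  }
  res
}

x <- c("b", "a", "b", "c", "a")
p <- coalesce_sketch(x)
x[p]                                 # identical values are now contiguous
identical(unique(x[p]), unique(x))   # groups follow the order of unique(x)
```

Because each block's start offset is known after the first pass, the second pass never needs to compare values, which is why the overall work stays linear in the length of the vector.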
One common use of coalesce is to allow the use of arbitrary vectors in
ctapply via ctapply(x[coalesce(x)], ...).
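Why contiguity matters for a run-based apply can be shown with a small base-R sketch. The helper run_apply below is hypothetical and only mimics the general idea of applying a function to each contiguous run of an index; it is not ctapply and makes no claim about how ctapply is implemented. The grouping permutation is again built with order(match(...)) as a stand-in for coalesce.

```r
# Hypothetical helper: apply FUN to each contiguous run of INDEX.
run_apply <- function(X, INDEX, FUN) {
  r <- rle(as.character(INDEX))      # runs of identical neighbouring keys
  ends <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1L
  out <- mapply(function(s, e) FUN(X[s:e]), starts, ends)
  names(out) <- r$values
  out
}

keys <- c("b", "a", "b", "a")
vals <- 1:4

# Without grouping first, each key is split across several runs.
split_runs <- run_apply(vals, keys, sum)

# Grouping first (stand-in for coalesce) gives one run per key.
p <- order(match(keys, unique(keys)))
grouped <- run_apply(vals[p], keys[p], sum)
grouped
```

With ungrouped input the helper returns four partial results instead of two, which is the situation the coalescing step is meant to avoid.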
Integer vector with the resulting permutation. x[coalesce(x)] gives x
with contiguous unique values.
i = rnorm(2e6)
names(i) = as.integer(rnorm(2e6))
## compare sorting and coalesce
system.time(o <- i[order(names(i))])
#> user system elapsed
#> 1.528 0.012 1.540
system.time(o <- i[coalesce(names(i))])
#> user system elapsed
#> 0.055 0.003 0.059
## more fair comparison taking the coalesce time (and copy) into account
system.time(tapply(i, names(i), sum))
#> user system elapsed
#> 0.231 0.008 0.239
system.time({ o <- i[coalesce(names(i))]; ctapply(o, names(o), sum) })
#> user system elapsed
#> 0.075 0.000 0.074
## in fact, using ctapply() on a dummy vector is faster than table() ...
## believe it or not ... (and that is actually wasteful, since coalesce
## already computed the table internally anyway ...)
ftable <- function(x) {
  t <- ctapply(rep(0L, length(x)), x[coalesce(x)], length)
  t[sort.list(names(t))]
}
system.time(table(names(i)))
#> user system elapsed
#> 0.118 0.008 0.127
system.time(ftable(names(i)))
#> user system elapsed
#> 0.069 0.000 0.069