dstrsplit {iotools} | R Documentation |
dstrsplit
Takes a raw or character vector and splits it into a data frame according to the separators.
dstrsplit(x, col_types, sep="|", nsep=NA, strict=TRUE, skip=0L, nrows=-1L,
quote="")
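| Argument | Description |
|---|---|
| x | character vector (each element is treated as a row) or a raw vector (newlines separate rows) |
| col_types | required character vector or a list. A vector of classes to be assumed for the output data frame. If it is a list, the (first) class of each element determines the class of the corresponding column. Possible values are "NULL" (the column is skipped), one of the six atomic vector types ("character", "numeric", "logical", "integer", "complex", "raw") or "POSIXct". |
| sep | single character: field (column) separator. Set to NA for no separator, i.e. a single column. |
| nsep | index name separator (single character) or NA if no index names are included |
| strict | logical: if FALSE, dstrsplit will not fail on parsing errors; otherwise, input not matching the format (e.g. more columns than expected) causes an error. |
| skip | integer: the number of lines of the data file to skip before beginning to read data. |
| nrows | integer: the maximum number of rows to read in. Negative and other invalid values are ignored. |
| quote | the set of quoting characters as a length 1 vector. To disable quoting altogether, use quote = "" (the default). |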
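The x, skip and nrows arguments together decide which input rows are read. The sketch below uses only base R (it does not call iotools) to show the same three rows in both accepted input shapes, a character vector and a newline-separated raw vector, and to mimic the documented skip/nrows behaviour. The helper select_rows() is hypothetical, and treating a negative or NA nrows as "no limit" is an assumption about what counts as an invalid value.

```r
# Hypothetical base-R illustration of the input shapes; not how iotools works internally.
lines <- c("apple\t2|2.7", "pear\t7|3e3", "plum\t5|1.1")

# Character input: each element is one row.
chr_input <- lines

# Raw input: the same rows as bytes, with newlines separating rows.
raw_input <- charToRaw(paste0(paste(lines, collapse = "\n"), "\n"))

# Recover the rows from the raw vector by splitting on newlines
# (strsplit drops the empty string after the final newline).
raw_rows <- strsplit(rawToChar(raw_input), "\n", fixed = TRUE)[[1]]

# Hypothetical helper mimicking skip/nrows: drop the first `skip` rows,
# then keep at most `nrows`. Assumption: negative or NA nrows means no limit.
select_rows <- function(rows, skip = 0L, nrows = -1L) {
  if (skip > 0L) rows <- rows[-seq_len(min(skip, length(rows)))]
  if (!is.na(nrows) && nrows >= 0L) rows <- head(rows, nrows)
  rows
}

select_rows(raw_rows, skip = 1L, nrows = 1L)  # returns "pear\t7|3e3"
```

Both input shapes describe the same rows, which is why a raw vector read straight from a file can be passed to dstrsplit without first converting it to a character vector.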
If nsep is specified, all characters up to (but excluding) the first occurrence of nsep are treated as the index name. The remaining characters are split into fields (columns) using the sep character. dstrsplit fails with an error if any line contains more columns than expected, unless strict is FALSE, in which case the excess columns are ignored. Lines may contain fewer columns, in which case the missing columns are set to NA.
Note that it is legal to use the same separator for sep and nsep, in which case the first field is treated as a row name and subsequent fields as data columns.
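To make these splitting rules concrete, here is a base-R sketch of how a single input line is broken into an index name and fields. It is an illustration under stated assumptions, not iotools' implementation: split_line() is a hypothetical helper, it returns fields as character strings without any type conversion, it splits the index at the first occurrence of nsep, and it ignores quoting.

```r
# Hypothetical helper illustrating the documented splitting rules.
# Not part of iotools: fields stay character and quoting is ignored.
split_line <- function(line, ncols, sep = "|", nsep = NA, strict = TRUE) {
  index <- NA_character_
  if (!is.na(nsep)) {
    pos <- regexpr(nsep, line, fixed = TRUE)
    if (pos > 0L) {
      # Everything before the first nsep is the index name.
      index <- substr(line, 1L, pos - 1L)
      line <- substr(line, pos + 1L, nchar(line))
    }
  }
  fields <- strsplit(line, sep, fixed = TRUE)[[1]]
  if (length(fields) > ncols) {
    if (strict) stop("line has more columns than expected")
    fields <- fields[seq_len(ncols)]  # strict = FALSE: extra columns are ignored
  }
  length(fields) <- ncols             # short lines are padded with NA
  list(index = index, fields = fields)
}

split_line("apple\t2|2.7|horse", ncols = 3, nsep = "\t")
split_line("pear\t7|3e3", ncols = 3, nsep = "\t")           # third field is NA
split_line("a|1|2", ncols = 2, sep = "|", nsep = "|")       # same separator: "a" is the index
split_line("a|1|2|3", ncols = 2, nsep = "|", strict = FALSE) # "3" is dropped
```

With strict = TRUE (the default), the last call would instead stop with an error, matching the behaviour described above.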
If nsep is specified, the output of dstrsplit contains an extra column called 'rowindex' holding the row index. This is used instead of row names to allow duplicated indices, which a data frame checks for and rejects (unlike a matrix).
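The reason for a separate 'rowindex' column can be checked directly in base R (no iotools needed): a matrix accepts duplicated row names, while data.frame() refuses them. The snippet uses two rows sharing the index name "pear", as in the examples; the variable names are illustrative only.

```r
# Two rows that share the index name "pear".
idx <- c("pear", "pear")

# A matrix accepts duplicated row names.
m <- matrix(1:4, nrow = 2, dimnames = list(idx, c("V1", "V2")))
rownames(m)

# A data frame rejects them; capture the error instead of stopping.
df_try <- tryCatch(data.frame(V1 = 1:2, row.names = idx),
                   error = function(e) e)
conditionMessage(df_try)

# Keeping the index as an ordinary column sidesteps the restriction.
df <- data.frame(rowindex = idx, V1 = 1:2, stringsAsFactors = FALSE)
df
```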
dstrsplit returns a data.frame with as many rows as there are lines in the input and as many columns as there are non-NULL values in col_types, plus an additional column if nsep is specified. The column names (other than the row index) are set to 'V' concatenated with the column number, unless col_types is a named vector, in which case the names are inherited.
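The shape rule above can be restated as a small calculation. expected_shape() is a hypothetical base-R helper, not part of iotools. It assumes col_types is a character vector that uses the string "NULL" to skip a column, counts the remaining entries, adds one column when nsep is given, and inherits data column names only when col_types is named; it deliberately does not reproduce the 'V' numbering used for unnamed col_types.

```r
# Hypothetical helper restating the documented output shape; not part of iotools.
# Assumes col_types is a character vector using "NULL" to skip a column.
expected_shape <- function(n_lines, col_types, nsep = NA) {
  keep <- col_types != "NULL"
  list(
    nrow = n_lines,
    # one column per non-"NULL" type, plus 'rowindex' when nsep is given
    ncol = sum(keep) + !is.na(nsep),
    # named col_types: names are inherited; unnamed col_types get 'V' names
    # in iotools, which this sketch does not try to reproduce
    data_names = if (is.null(names(col_types))) NULL else names(col_types)[keep]
  )
}

s <- expected_shape(3, c("integer", "numeric", "NULL", "raw", "complex", "POSIXct"),
                    nsep = "\t")
s$ncol  # 6: five data columns plus 'rowindex'

named <- expected_shape(2, c(id = "integer", skip_me = "NULL", value = "numeric"))
named$data_names  # "id" "value"
```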
Taylor Arnold and Simon Urbanek
input = c("apple\t2|2.7|horse|0d|1|2015-02-05 20:22:57",
          "pear\t7|3e3|bear|e4|1+3i|2015-02-05",
          "pear\te|1.8|bat|77|4.2i|2001-02-05")
z = dstrsplit(x = input,
              col_types = c("integer", "numeric", "character", "raw", "complex", "POSIXct"),
              sep = "|", nsep = "\t")
lapply(z,class)
z
# Ignoring the third column:
z = dstrsplit(x = input,
              col_types = c("integer", "numeric", "NULL", "raw", "complex", "POSIXct"),
              sep = "|", nsep = "\t")
z