Summarize {FSA} | R Documentation |

Summary statistics for a single numeric variable, possibly separated by the levels of a factor variable or variables. This function is very similar to `summary` for a numeric variable.

    Summarize(object, ...)

    ## Default S3 method:
    Summarize(object, digits = getOption("digits"), na.rm = TRUE,
      exclude = NULL, nvalid = c("different", "always", "never"),
      percZero = c("different", "always", "never"), ...)

    ## S3 method for class 'formula'
    Summarize(object, data = NULL, digits = getOption("digits"),
      na.rm = TRUE, exclude = NULL,
      nvalid = c("different", "always", "never"),
      percZero = c("different", "always", "never"), ...)

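`object` | A vector of numeric data (default method) or a formula (formula method). |

`...` | Not implemented. |

`digits` | A single numeric that indicates the number of decimals to which the numeric summaries are rounded. |

`na.rm` | A logical that indicates whether numeric missing values (`NA`) should be removed (`TRUE`, the default) or not. |

`exclude` | A string that contains the level that should be excluded from a factor variable. |

`nvalid` | A string that indicates how the “validn” result will be handled. If `"always"`, it is always shown; if `"never"`, it is never shown; if `"different"` (the default), it is shown only when it differs from the sample size (i.e., when there are `NA`s). |

`percZero` | A string that indicates how the “percZero” result will be handled. If `"always"`, it is always shown; if `"never"`, it is never shown; if `"different"` (the default), it is shown only when there are zeros. |

`data` | A data.frame that contains the variables in the formula. |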
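The three-valued `nvalid` and `percZero` arguments follow the common R idiom of listing the choices in the default and taking the first. The sketch below illustrates that idiom with `match.arg`; `show_nvalid` is a hypothetical helper written for this illustration, not a function in FSA, and it assumes the behaviour described in the example comments on this page (the valid sample size is hidden when there are no `NA`s unless forced with `"always"`).

```r
# Hypothetical helper (not part of FSA): decides whether a "valid n"
# column would be displayed under each of the three settings.
show_nvalid <- function(x, nvalid = c("different", "always", "never")) {
  nvalid <- match.arg(nvalid)      # missing argument resolves to "different"
  n <- length(x)                   # total sample size
  n_valid <- sum(!is.na(x))        # sample size minus number of NAs
  switch(nvalid,
         always    = TRUE,
         never     = FALSE,
         different = n_valid != n) # show only when NAs are present
}

show_nvalid(c(1, 2, NA))           # default setting, NA present
show_nvalid(c(1, 2, 3))            # default setting, no NA
show_nvalid(c(1, 2, 3), "always")  # forced display
```

Because `match.arg` validates the string, a misspelled setting such as `"allways"` raises an error rather than being silently ignored.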

This function is primarily used with formulas of the following types (where `quant` and `factor` generically represent quantitative/numeric and factor variables, respectively):

Formula | Description of Summary |

`~quant` | Numerical summaries (see below) of `quant`. |

`quant~factor` | Summaries of `quant` separated by levels in `factor`. |

`quant~factor1*factor2` | Summaries of `quant` separated by the combined levels in `factor1` and `factor2`. |

Numerical summaries include all results from `summary` (min, Q1, mean, median, Q3, and max) and the sample size, valid sample size (sample size minus the number of `NA`s), and standard deviation (i.e., `sd`). `NA` values are removed from the calculations with `na.rm=TRUE` (the default). The number of digits in the returned results is controlled with `digits=`.
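The statistics listed above can be reproduced with base R alone. The following is a minimal sketch, not FSA's implementation: `summarize_sketch` is a hypothetical name, the element names and their order are illustrative, and quartiles use the default method of `quantile`.

```r
# Hypothetical helper (not part of FSA): base-R version of the
# statistics described above, with NAs removed before calculation.
summarize_sketch <- function(x, digits = getOption("digits")) {
  stopifnot(is.numeric(x))
  valid <- x[!is.na(x)]                       # drop missing values
  res <- c(n      = length(x),                # sample size
           nvalid = length(valid),            # n minus number of NAs
           mean   = mean(valid),
           sd     = sd(valid),
           min    = min(valid),
           Q1     = unname(quantile(valid, 0.25)),
           median = median(valid),
           Q3     = unname(quantile(valid, 0.75)),
           max    = max(valid))
  round(res, digits)                          # round to 'digits' decimals
}

x <- c(2, 4, 4, NA, 10)
res <- summarize_sketch(x, digits = 3)
print(res)
```

The result is a named numeric vector, which matches the kind of object the page describes for a single quantitative variable.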

A named vector or data frame (when a quantitative variable is separated by one or two factor variables) of summary statistics for numeric data.

Students often need to examine basic statistics of a quantitative variable separately for different levels of a categorical variable. These results may be obtained with `tapply`, `by`, or `aggregate` (or with functions in other packages), but these functions are either not obvious for beginning students to use or return results in a format that is not obvious to them. Thus, the formula method of `Summarize` allows beginning students to use a common notation (i.e., a formula) to easily compute summary statistics for a quantitative variable separated by the levels of a factor.
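For comparison, the base R routes mentioned above look like the following on a small made-up data frame (the data and object names are illustrative only). Each call takes its arguments differently and returns a different kind of object, which is the inconvenience the formula method is meant to avoid.

```r
# Made-up data: one quantitative variable and one factor
d <- data.frame(
  y = c(1, 2, 3, 10, 20, 30),
  g = factor(c("A", "A", "A", "B", "B", "B"))
)

# tapply(): vector and factor passed separately; returns a named array
means_tapply <- tapply(d$y, d$g, mean)

# aggregate(): formula interface; returns a data.frame with columns g and y
means_aggr <- aggregate(y ~ g, data = d, FUN = mean)

# by(): returns a list-like object of class "by", one summary per level
summ_by <- by(d$y, d$g, summary)

print(means_tapply)
print(means_aggr)
print(summ_by)
```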

Derek H. Ogle, derek@derekogle.com

See `summary` for related one-dimensional functionality. See `tapply`, `summaryBy` in doBy, `describe` in psych, `describe` in prettyR, and `basicStats` in fBasics for similar “by” functionality.

    ## Create a data.frame of "data"
    n <- 102
    d <- data.frame(y=c(0,0,NA,NA,NA,runif(n-5)),
                    w=sample(7:9,n,replace=TRUE),
                    v=sample(0:2,n,replace=TRUE),
                    g1=factor(sample(c("A","B","C",NA),n,replace=TRUE)),
                    g2=factor(sample(c("male","female","UNKNOWN"),n,replace=TRUE)),
                    g3=sample(c("a","b","c","d"),n,replace=TRUE),
                    stringsAsFactors=FALSE)

    # typical output of summary() for a numeric variable
    summary(d$y)

    # this function
    Summarize(d$y,digits=3)
    Summarize(~y,data=d,digits=3)
    Summarize(y~1,data=d,digits=3)

    # note that nvalid is not shown if there are no NAs and
    # percZero is not shown if there are no zeros
    Summarize(~w,data=d,digits=3)
    Summarize(~v,data=d,digits=3)

    # note that the nvalid and percZero results can be forced to be shown
    Summarize(~w,data=d,digits=3,nvalid="always",percZero="always")

    ## Numeric vector by levels of a factor variable
    Summarize(y~g1,data=d,digits=3)
    Summarize(y~g2,data=d,digits=3)
    Summarize(y~g2,data=d,digits=3,exclude="UNKNOWN")

    ## Numeric vector by levels of two factor variables
    Summarize(y~g1+g2,data=d,digits=3)
    Summarize(y~g1+g2,data=d,digits=3,exclude="UNKNOWN")

    ## What happens if RHS of formula is not a factor
    Summarize(y~w,data=d,digits=3)

    ## Summarizing multiple variables in a data.frame (must reduce to numerics)
    lapply(as.list(d[,1:3]),Summarize,digits=4)

[Package *FSA* version 0.8.18 Index]