pcurve {pcurve}    R Documentation

Principal Curve Analysis

Description

Fits a principal curve to a numeric multivariate dataset in arbitrary dimensions. Produces diagnostic plots.

Usage

pcurve(x, xcan = NULL, start = "ca", rank = FALSE, cv.fit = FALSE,
       penalty = 1, cv.all = FALSE, df = "vary", fit.meth = "spline",
       canfit = "lm", candf = FALSE, vary.adj = FALSE, subset,
       robust = FALSE, lowf = 0.5, min.df, max.df, max.df.cv.fit,
       ext.dist = TRUE, ext.dc = 0.9, metric = "bray", latent = FALSE,
       plot.pca = TRUE, thresh = 0.001, plot.true = TRUE,
       plot.init = FALSE, plot.segs = TRUE, plot.resp = TRUE,
       plot.cov = TRUE, maxit = 10, stretch = 2, fits = FALSE,
       prnt.fits = TRUE, trace = TRUE, trace.all = FALSE, pch = 1,
       row.chk0 = FALSE, col.chk0 = TRUE, use.loc = FALSE)
 

Arguments

x

numeric data matrix or data.frame.

xcan

data.frame or matrix of explanatory variables to be used in constrained PCs.

start

specifies how to determine the starting configuration (the locations of the points on the initial curve):
  "ca" = correspondence analysis;
  "pca" = principal components analysis with Euclidean metric;
  "pca.bc" = principal components analysis with Bray-Curtis metric;
  "mds" = non-metric multidimensional scaling with Euclidean metric;
  "mds.bc" = non-metric multidimensional scaling with Bray-Curtis metric;
  "cs.bc" = classical scaling (metric multidimensional scaling) with Bray-Curtis metric;
  "ran" = random start.
Alternatively, if start is numeric and of length dim(x)[1] (the number of rows of x), it is used as a user-supplied configuration.

rank

if TRUE, the starting configuration is transformed to ranks.
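To illustrate what a starting configuration is (one location per row of x), the following base-R sketch computes two one-dimensional configurations for simulated data: first principal component scores, resembling start = "pca", and classical scaling of Euclidean distances. It also applies a rank transform like rank = TRUE. It is an assumption that pcurve computes its starts this way; the data and object names are hypothetical. For Euclidean distances, classical scaling reproduces the PC1 scores up to sign, which the last line shows.

```r
# Sketch only: base-R analogues of starting configurations, not pcurve's code.
set.seed(1)
g <- runif(60)                                   # hidden gradient (simulated)
x <- cbind(10 * g, 10 * (1 - g), 20 * g * (1 - g)) +
  matrix(rnorm(60 * 3, sd = 0.5), ncol = 3)

# "pca"-style start: scores on the first principal component
init_pca <- prcomp(x)$x[, 1]

# classical (metric) scaling of Euclidean distances to one dimension;
# for Euclidean distances this equals the PC1 scores up to a sign change
init_cs <- cmdscale(dist(x), k = 1)[, 1]

# rank = TRUE analogue: replace each location by its rank
init_rank <- rank(init_pca)

# a numeric vector of length nrow(x), such as init_pca, could instead be
# supplied directly, e.g. pcurve(x, start = init_pca)
cor(init_pca, init_cs)   # +1 or -1
```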

cv.fit

if TRUE, a final iteration is performed using cross-validation.

penalty

penalty for the smoothing spline. A value of 1 corresponds to no penalty, and values > 1 give a smoother fit (fewer df). Increasing the penalty for small data sets can therefore reduce over-fitting. If penalty = "np", the penalty depends on the number of observations N (rows of x): penalty = 2 for N <= 100, penalty = 4 - log(N, 10) for 100 < N <= 1000, and penalty = 1 for N > 1000.
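The "np" rule can be written as a small function of the number of observations N. np_penalty is a hypothetical helper written for illustration, not a function exported by pcurve.

```r
# The penalty = "np" rule as a stand-alone function of N.
# 'np_penalty' is a hypothetical name used only for illustration.
np_penalty <- function(N) {
  ifelse(N <= 100, 2,                       # N <= 100: penalty 2
         ifelse(N <= 1000, 4 - log(N, 10),  # 100 < N <= 1000: 4 - log10(N)
                1))                         # N > 1000: no extra penalty
}

np_penalty(c(50, 100, 500, 1000, 5000))
```

The rule is continuous: the first two branches both give 2 at N = 100, and the last two both give 1 at N = 1000.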

cv.all

if TRUE, a cross-validated smoothing spline fit is used at each iteration.

df

if numeric specifies the df for the smoothing spline.

fit.meth

specifies the smoother: "spline" = smooth.spline, "poisson" = Poisson generalized additive model, "binomial" = binomial generalized additive model, "lowess" = lowess smoother (this argument is overridden by robust = TRUE).
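A minimal sketch of the two base-R smoothers named above, each smoothing one simulated response against locations along a curve. Using smooth.spline with a fixed df and passing lowf as the f (span) argument of lowess are assumptions about how pcurve calls them; the Poisson and binomial GAM options are not shown.

```r
# Sketch: smoothing one variable against curve locations (simulated data).
set.seed(2)
lambda <- sort(runif(80, 0, 10))                 # locations along a curve
y <- 20 * exp(-(lambda - 5)^2 / 4) + rnorm(80)   # Gaussian response + noise

fit_spline <- smooth.spline(lambda, y, df = 5)   # like fit.meth = "spline"
fit_lowess <- lowess(lambda, y, f = 0.5)         # like robust = TRUE, span 0.5

spline_hat <- predict(fit_spline, lambda)$y      # fitted values at lambda
lowess_hat <- fit_lowess$y                       # lowess returns sorted x; lambda is sorted
```

By default lowess performs three robustifying iterations (iter = 3), so outlying values have less influence on its fit than on the smoothing spline.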

canfit

"lm" or "gam", model used to relate pc to xcan.

candf

if canfit = "gam", the df for the model. May be a single value or a vector of FALSE or positive integers giving the df for each explanatory variable in xcan. FALSE is equivalent to fx = FALSE in gam, in which case the df are selected by GCV/UBRE.

vary.adj

if FALSE the same df are used for the smooth of each variable, otherwise each variable has its own df.

subset

used to take a subset of x (and of start, if start is numeric).

robust

if TRUE, lowess smooths are used; if FALSE, smoothing splines are used.

lowf

specifies the span of the lowess smooth.

min.df

specifies the min df for the smoothing.

max.df

specifies the max df for smoothing during cross-validation.

max.df.cv.fit

specifies the max df for the smoothing.

ext.dist

if TRUE, extended dissimilarities, computed using the flexible shortest path, are used to calculate the initial configuration. If FALSE, standard dissimilarities are used (see De'ath, 1999b, and stepacross in package vegan).

ext.dc

critical distance, the toolong argument in stepacross.
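The shortest-path idea can be sketched in a few lines. Dissimilarities at or above the critical distance are treated as unknown, and every dissimilarity is then replaced by the shortest path through intermediate sites, computed here with the Floyd-Warshall algorithm. shortest_path_dissim is a hypothetical helper written for illustration. vegan's stepacross (path = "shortest") is the implementation the text refers to, and details such as its handling of values exactly at toolong may differ.

```r
# Sketch of flexible shortest-path dissimilarities; not vegan's code.
# 'shortest_path_dissim' is a hypothetical helper.
shortest_path_dissim <- function(d, toolong = 0.9) {
  d <- as.matrix(d)
  d[d >= toolong] <- Inf                 # too long: treat as unknown
  for (k in seq_len(nrow(d))) {          # Floyd-Warshall: allow a step via site k
    d <- pmin(d, outer(d[, k], d[k, ], "+"))
  }
  d                                      # Inf remains only if sites are disconnected
}

# Three sites: 1-3 is "too long", but 1-2 and 2-3 are short enough
m <- matrix(c(0,    0.6,  0.95,
              0.6,  0,    0.7,
              0.95, 0.7,  0), nrow = 3, byrow = TRUE)
ext <- shortest_path_dissim(m, toolong = 0.9)
ext[1, 3]   # replaced by the path 0.6 + 0.7
```

If the remaining short dissimilarities do not connect all sites, some entries stay Inf, so the sites must remain connected for every pair to receive a finite value.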

metric

dissimilarity index, passed as the method argument of vegdist in package vegan.
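For reference, the Bray-Curtis dissimilarity used by default (metric = "bray") between rows i and k is sum_j |x_ij - x_kj| / sum_j (x_ij + x_kj). The base-R sketch below computes it for every pair of rows. bray_curtis is a hypothetical helper for illustration; in practice vegan::vegdist(x, method = "bray") would be used.

```r
# Sketch: Bray-Curtis dissimilarity for all pairs of rows, in base R.
# 'bray_curtis' is a hypothetical helper, not part of pcurve or vegan.
bray_curtis <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # sum of absolute differences over sum of totals for the two rows
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)   # keep the lower triangle as a "dist" object
}

m <- rbind(c(1, 2, 3), c(3, 2, 1), c(0, 0, 6))
d <- as.matrix(bray_curtis(m))
d[1, 2]   # (2 + 0 + 2) / 12
```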

latent

if FALSE locations are rescaled after each iteration to give distance along the curve; if TRUE no rescaling is done.

plot.pca

if TRUE the fitting is plotted (assuming plot.true = TRUE) in the first 2 dimensions of PCA space.

thresh

convergence threshold: iteration ceases when the change in the cross-validation criterion between successive iterations is less than this value.

plot.true

if TRUE the fitting process is plotted.

plot.init

if TRUE the initial fits to each variable are plotted.

plot.segs

if TRUE segments linking the fitted points on the curves to their corresponding data points are plotted.

plot.resp

if TRUE the final response curves are plotted.

plot.cov

if TRUE covariate partial effects are plotted (only if xcan is not null).

maxit

specifies the maximum number of iterations.

stretch

end segments of the curve are stretched by this factor at each iteration.

fits

if TRUE, the value returned by pcurve includes diagnostics for each variable.

prnt.fits

if TRUE, statistics on the model fits are printed.

trace

if TRUE, useful fitting diagnostics are printed at each iteration.

trace.all

if TRUE prints out all curve details at each iteration.

pch

symbol for plots

row.chk0

if TRUE, checks for and removes rows of x that are identically 0.

col.chk0

if TRUE, checks for and removes columns of x that are identically 0.
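The checks matter because correspondence analysis (the default start) divides by row and column totals, and the Bray-Curtis dissimilarity between two all-zero rows is 0/0. A minimal base-R sketch of removing such rows and columns follows; pcurve's own check may differ in detail.

```r
# Sketch of what row.chk0 / col.chk0 remove (base R, not pcurve's code).
x <- rbind(c(0, 1, 2),
           c(0, 0, 0),    # all-zero row, e.g. an empty sample
           c(0, 3, 1))    # note that the first column is also all zero
keep_rows <- rowSums(x != 0) > 0     # like row.chk0 = TRUE
keep_cols <- colSums(x != 0) > 0     # like col.chk0 = TRUE
x_clean <- x[keep_rows, keep_cols, drop = FALSE]
dim(x_clean)   # 2 rows and 2 columns remain
```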

use.loc

if TRUE pauses during the fitting displays (left mouse-click to progress to next plot).

Details

See De'ath (1999a) for a full discussion of the functions and their application.
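The principal curve algorithm of Hastie and Stuetzle (1989) alternates a smoothing step, in which each variable is smoothed against the current locations lambda, with a projection step, in which each point receives the location of its closest point on the curve. The sketch below is a simplified base-R version written for illustration, not pcurve's implementation. It starts from first principal component scores, uses smooth.spline with a fixed df, projects onto the polyline through the fitted points, and (as an assumption) stops on the relative change in the sum of squared distances rather than on a cross-validation criterion. The returned component names mirror those described under Value; hs_curve is a hypothetical name.

```r
# Simplified sketch of the Hastie & Stuetzle (1989) iteration; NOT pcurve's code.
hs_curve <- function(x, df = 5, maxit = 10, thresh = 0.001) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  lambda <- prcomp(x)$x[, 1]                    # start: PC1 scores
  dist_old <- Inf
  for (it in seq_len(maxit)) {
    # Smoothing step: each variable as a smooth function of lambda.
    s <- vapply(seq_len(p), function(j)
      predict(smooth.spline(lambda, x[, j], df = df), lambda)$y, numeric(n))
    # Polyline through the fitted points, ordered along the curve.
    v <- s[order(lambda), , drop = FALSE]
    a <- v[-n, , drop = FALSE]                  # segment start points
    seg <- v[-1, , drop = FALSE] - a            # segment direction vectors
    len2 <- rowSums(seg^2)                      # squared segment lengths
    arc <- c(0, cumsum(sqrt(len2)))             # arc length at each vertex
    # Projection step: closest point on the polyline for each observation.
    proj <- matrix(0, n, p)
    d2min <- numeric(n)
    for (i in seq_len(n)) {
      xi <- matrix(x[i, ], n - 1, p, byrow = TRUE)
      tt <- ifelse(len2 > 0, rowSums((xi - a) * seg) / len2, 0)
      tt <- pmin(pmax(tt, 0), 1)                # stay within each segment
      q <- a + tt * seg                         # closest point on each segment
      d2 <- rowSums((xi - q)^2)
      k <- which.min(d2)
      proj[i, ] <- q[k, ]
      d2min[i] <- d2[k]
      lambda[i] <- arc[k] + tt[k] * sqrt(len2[k])
    }
    dist_new <- sum(d2min)                      # sum of squared distances
    if (is.finite(dist_old) && abs(dist_old - dist_new) <= thresh * dist_old)
      break
    dist_old <- dist_new
  }
  list(s = proj, tag = order(lambda), lambda = lambda, dist = dist_new)
}

# Usage on simulated points scattered around a half circle.
set.seed(3)
t0 <- runif(100, 0, pi)
x <- cbind(cos(t0), sin(t0)) + matrix(rnorm(200, sd = 0.05), ncol = 2)
fit <- hs_curve(x)
fit$dist / nrow(x)   # mean squared distance from the fitted curve
```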

Value

An object of class principal curve: a list comprising

s

fitted values (the points on the curve onto which the data are projected)

tag

order of points along the curve

lambda

locations of the points along the curve

dist

sum of squared distances of points from the curve

c

call to pcurve

x

data to which the curve was fitted

df

degrees of freedom for the smoothers used in the fit

fit.list

diagnostics for each variable, only included if fits = TRUE.

Author(s)

R port by Chris Walsh <cwalsh@unimelb.edu.au> from the S+ library by Glenn De'ath <g.death@aims.gov.au>. Original S code for principal curve analysis by Trevor Hastie <hastie@stat.stanford.edu>.

References

De'ath, G. 1999a Principal Curves: a new technique for indirect and direct gradient analysis. Ecology 80, 2237–2253.

De'ath, G. 1999b Extended dissimilarity: method of robust estimation of ecological distances with high beta diversity. Plant Ecology 144, 191–199.

Gittins, R. 1985 Canonical Analysis. A review with applications in ecology. Berlin: Springer-Verlag.

Hastie, T.J. and Tibshirani, R.J. 1990 Generalized Additive Models. London: Chapman and Hall.

Hastie, T.J. and Stuetzle, W. 1989 Principal Curves. Journal of the American Statistical Association 84, 502–516.

See Also

pcdiags.plt, vegdist, stepacross

Examples

# A simulated dataset with 4 response variables (taxa 1-4),
# n = 100. The response curves are Gaussian and the noise is Poisson.
    data(sim4var)
    sim4fit <- pcurve(sim4var, plot.init = FALSE, use.loc = TRUE)

# Limestone grassland community example worked by De'ath (1999a),
# from data in Gittins (1985)
    data(soilspec)
    species <- sqrt(soilspec[, 2:9])
    envvar <- soilspec[, 10:12]
# Indirect gradient analysis
    spec.fit <- pcurve(species, start = "mds.bc", plot.init = FALSE,
                       use.loc = TRUE)
# Direct gradient analysis
    soilspec.fit <- pcurve(species, xcan = envvar,
                           start = "mds.bc", plot.init = FALSE,
                           fits = TRUE, prnt.fits = TRUE,
                           use.loc = TRUE)

[Package pcurve version 0.6-5 Index]