glm.pred.JGR {CADStat} | R Documentation |

The regression prediction tool provides a way to model natural variations in an environmental parameter (e.g., stream temperature) in reference (i.e., least-disturbed) sites. Then, one can use the reference site model to determine whether observations of that same parameter in test sites are within the range of variation defined by the reference sites.

The linear regression tool has more options available for diagnosing and assessing the fit of a regression model on the quality data. The regression prediction tool assumes that such diagnostics have been performed to define a regression model, and the goal is now to apply the model to new data.

**User Interface**

Select *Analysis Tools -> Regression Prediction* from the menus.
A dialog box will open. Select the data set of interest from
the pull-down menu, or browse for a tab-delimited text file.
The Data Subsetting tab can be used to select a subset of the
data file by choosing a variable from the pull-down menu and
then selecting the levels of that variable to include. You can
hold down the <CTRL> key to select several levels.

Select the **Dependent** variable and the **Independent** variables as in the
linear regression tool.

By default, an intercept is included in the model. The intercept can be
excluded by selecting **Remove Intercept** under the Analysis Options, in
which case at least one independent variable must have been selected.

Select a **Reference Variable**: a variable whose (character) values
identify which rows correspond to reference data.
Once the Reference Variable is selected, select the levels of that
variable that correspond to reference data. You can hold down the
<CTRL> key to select multiple levels that indicate reference.
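
The reference selection amounts to partitioning the rows of the data set by the levels of one variable. The sketch below is illustrative only and is not CADStat code; the column names (`site`, `status`, `temp`), the level labels, and the helper `split_reference` are all hypothetical.

```python
# Hypothetical rows; the column names and level labels are illustrative only.
rows = [
    {"site": "A1", "status": "REF", "temp": 14.2},
    {"site": "A2", "status": "ref-alt", "temp": 15.0},
    {"site": "B1", "status": "TEST", "temp": 19.3},
]


def split_reference(rows, reference_variable, reference_levels):
    """Partition rows into (reference, test) using the chosen levels."""
    levels = set(reference_levels)
    reference = [r for r in rows if r[reference_variable] in levels]
    test = [r for r in rows if r[reference_variable] not in levels]
    return reference, test


# Two levels are treated as reference, mirroring a multi-level selection.
reference, test = split_reference(rows, "status", ["REF", "ref-alt"])
```

Rows whose reference-variable value is not among the selected levels are treated as test (non-reference) data.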

Choose a **Significance Level** for identification of anomalous data points
in the new data. Note that the significance level corresponds to a
level for a single prediction, not a family-wise significance level; no
correction is performed for multiple comparisons when multiple
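predictions are being made. Keep in mind that some new data are expected
to be flagged as significant even when every test site is consistent with
reference: on average, about the number of predictions times the
significance level will appear significant. How much the actual count
varies around that average depends on the correlation among predictions.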
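
As a rough illustration of the arithmetic, the sketch below computes the expected number of flags when no test site truly differs from reference, together with the chance of seeing at least one flag. The second function assumes independent predictions, which is an assumption of this sketch and not a property of the tool; both function names are hypothetical.

```python
def expected_false_flags(n_predictions, significance_level):
    """Expected count of flagged predictions when none truly differ."""
    return n_predictions * significance_level


def prob_at_least_one_flag(n_predictions, significance_level):
    """Chance of one or more flags, assuming independent predictions."""
    return 1.0 - (1.0 - significance_level) ** n_predictions


# Example: 200 test predictions screened at a 0.05 significance level.
flags = expected_false_flags(200, 0.05)
# Example: 20 predictions at the same level.
any_flag = prob_at_least_one_flag(20, 0.05)
```

With 200 predictions at a level of 0.05, roughly ten flags are expected by chance alone, so a handful of flagged rows is not by itself evidence of departure from reference.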

The **ID Variable** allows for a label variable to be specified, so that
points that differ significantly from reference can be identified on the
plot. If no ID Variable is selected, then the row number will be used
instead.

The output of the regression prediction tool is a table for the test (non-reference) data. For each row, the table includes the observed and predicted values, the standard error associated with the prediction, a p-value for the prediction, an indication of whether the sample differs significantly from reference, an indication of whether the independent variables for the new data are within the range of the data used to fit the model, and finally an indication of whether the normal approximation used to calculate the p-value is appropriate (for Poisson and binomial regression only). Note: for binomial data, the observed value is converted from a count to a proportion (dependent variable divided by sample size). A plot of observed versus predicted values is also produced to help assess the model fit and identify anomalous data points.
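
To make the output columns concrete, here is a minimal sketch of how such a table row could be computed. It is not the tool's implementation: it assumes ordinary least squares with a single independent variable, the textbook standard error for predicting a new observation, and a two-sided p-value from a standard normal distribution (the tool may use a different reference distribution and supports other model families). The data and the names `fit_reference` and `score_test_point` are hypothetical.

```python
import math
from statistics import NormalDist


def fit_reference(x, y):
    """Least-squares fit of y on x using reference-site data only."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return {
        "slope": slope, "intercept": intercept, "n": n,
        "x_bar": x_bar, "sxx": sxx,
        "sigma2": resid_ss / (n - 2),  # residual variance estimate
        "x_min": min(x), "x_max": max(x),
    }


def score_test_point(model, x0, y0, alpha=0.05):
    """Build one output row for a test observation (x0, y0)."""
    predicted = model["intercept"] + model["slope"] * x0
    # Standard error for predicting a single new observation at x0.
    se = math.sqrt(model["sigma2"] * (
        1 + 1 / model["n"] + (x0 - model["x_bar"]) ** 2 / model["sxx"]))
    z = (y0 - predicted) / se
    # Assumption: normal reference distribution, two-sided test.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "observed": y0, "predicted": predicted, "std_error": se,
        "p_value": p_value, "significant": p_value < alpha,
        "in_range": model["x_min"] <= x0 <= model["x_max"],
    }


# Hypothetical reference data and two hypothetical test observations.
model = fit_reference([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
typical = score_test_point(model, x0=3, y0=6.0)
unusual = score_test_point(model, x0=8, y0=12.0)
```

The second test point lies outside the range of the reference predictor values, so its `in_range` entry is false; a significance flag on such a row reflects extrapolation as well as any real difference from reference.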

[Package *CADStat* version 2.2-5 Index]