Data on 475 prostate cancer patients

data(cancer.df)

Format

A data.frame with 475 rows and 12 columns:

age

Age in years

wt

Weight in pounds

pf

Patient activity

hx

Family history of cancer

sbp

Systolic blood pressure

dbp

Diastolic blood pressure

ekg

Electrocardiogram code

hg

Serum haemoglobin

sz

Size of primary tumour

sg

Index of tumour stage and histolic grade

ap

Serum prostatic acid phosphatase

bm

Bone metastatses

Source

D.P. Byar and S.B. Green 'The choice of treatment for cancer patients based on covariate information - application to prostate cancer', Bulletin du Cancer 1980: 67:477--490, reproduced in D.A. Andrews and A.M. Herzberg 'Data: a collection of problems from many fields for the student and research worker' p.261--274 Springer series in statistics, Springer-Verlag. New York.

Details

There are twelve pre-trial covariates measured on each patient, seven may be taken to be continuous, four to be discrete, and one variable (SG) is an index nearly all of whose values lie between 7 and 15, and which could be considered either discrete or continuous. We will treat SG as a continuous variable.

A preliminary inspection of the data showed that the sizeof the primary tumour (SZ) and serum prostatic acid phosphatase (AP) were both skewed variables. These variables have therefore been transformed. A square root transformation was used for SZ, and a logarithmic transformation was used for AP to achieve approximate normality. (As for correlation, skewness over the whole data set does not necessarily mean skewness within clusters. But when clusters were formed, within-cluster skewness was observed for these variables.)

Observations that had missing values in any of the twelve pretreatment covariates were omitted from furtheranalysis, leaving 475 out of the original 506 observations available.

The categorical variable Patient activity had 4 levels: 'Normally Active', 'Bed rest below 50 or more', and 'Confined to bed'. The numbers of the 475 in these groups were 428, 32, 12, and 3. The least active two groups are grouped in our data, giving 3 groups of size 428, 32, and 15.