Data on 475 prostate cancer patients
A data.frame with 475 rows and 12 columns:
Age in years
Weight in pounds
Patient activity
Family history of cancer
Systolic blood pressure
Diastolic blood pressure
Electrocardiogram code
Serum haemoglobin
Size of primary tumour
Index of tumour stage and histologic grade
Serum prostatic acid phosphatase
Bone metastases
D.P. Byar and S.B. Green, 'The choice of treatment for cancer patients based on covariate information: application to prostate cancer', Bulletin du Cancer 67 (1980), 477--490. Reproduced in D.F. Andrews and A.M. Herzberg, 'Data: A Collection of Problems from Many Fields for the Student and Research Worker', pp. 261--274, Springer Series in Statistics, Springer-Verlag, New York.
Twelve pre-trial covariates were measured on each patient. Seven may be taken as continuous and four as discrete. The remaining variable (SG) is an index whose values nearly all lie between 7 and 15, so it could be considered either discrete or continuous; we treat SG as continuous.
A preliminary inspection of the data showed that the size of the primary tumour (SZ) and serum prostatic acid phosphatase (AP) were both skewed, so both were transformed to achieve approximate normality: a square root transformation for SZ and a logarithmic transformation for AP. (As with correlation, skewness over the whole data set does not necessarily imply skewness within clusters. However, when clusters were formed, within-cluster skewness was observed for these variables.)
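The two transformations can be sketched as follows. This is a minimal illustration, not the code used to build the data frame: it assumes each patient record is a Python dict keyed by the abbreviations "SZ" and "AP", that AP is strictly positive (the logarithm is undefined otherwise), and that the natural logarithm is meant, since the text does not state the base. A moment-based skewness helper is included to check that a transformation reduces skew.

```python
import math
from typing import Iterable, Mapping


def transform_covariates(row: Mapping[str, float]) -> dict[str, float]:
    """Return a copy of one patient record with SZ and AP transformed.

    Assumptions (not from the source): records are dicts keyed by "SZ"
    and "AP", AP > 0, and "logarithmic" means the natural log.
    """
    out = dict(row)
    out["SZ"] = math.sqrt(row["SZ"])  # square root for tumour size
    out["AP"] = math.log(row["AP"])   # log for acid phosphatase
    return out


def sample_skewness(values: Iterable[float]) -> float:
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    xs = list(values)
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5
```

Other keys in a record pass through unchanged, so the function can be mapped over whole records before clustering.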
Observations with missing values in any of the twelve pretreatment covariates were omitted from further analysis, leaving 475 of the original 506 observations.
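Omitting every observation with a missing covariate is listwise deletion (complete-case analysis). A minimal sketch, assuming records are dicts and that a missing value is encoded as None, NaN, or an absent key; the actual encoding in the original 506-row data is not stated here:

```python
import math
from typing import Optional, Sequence

# Hypothetical record type: covariate name -> value (None if missing).
Record = dict[str, Optional[float]]


def is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing (an assumed encoding)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def complete_cases(records: Sequence[Record], covariates: Sequence[str]) -> list[Record]:
    """Keep only records with no missing value in any listed covariate."""
    # r.get(c) returns None for an absent key, which also counts as missing.
    return [r for r in records if not any(is_missing(r.get(c)) for c in covariates)]
```

Applied to the original records with all twelve covariates listed, this filter would, per the text above, retain 475 of the 506 observations.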
The categorical variable Patient activity had 4 levels: 'Normally Active', 'Bed rest below 50% of daytime', 'Bed rest 50% of daytime or more', and 'Confined to bed'. The numbers of the 475 patients in these groups were 428, 32, 12, and 3. The two least active groups are combined in this data set, giving 3 groups of sizes 428, 32, and 15.
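Merging the two least active levels can be sketched as a recoding step followed by re-tabulation. The integer coding 0 (most active) to 3 (confined to bed) is a hypothetical encoding chosen for illustration; the group counts are the ones given in the text.

```python
from collections import Counter


def collapse_activity(code: int) -> int:
    """Map a 4-level activity code to 3 levels by merging levels 2 and 3.

    Hypothetical coding: 0 = most active, ..., 3 = confined to bed.
    """
    if code not in (0, 1, 2, 3):
        raise ValueError(f"unexpected activity code: {code}")
    return min(code, 2)  # levels 2 and 3 both become level 2


def collapse_counts(counts: Counter) -> Counter:
    """Re-tabulate level counts after collapsing."""
    out = Counter()
    for code, n in counts.items():
        out[collapse_activity(code)] += n
    return out


original = Counter({0: 428, 1: 32, 2: 12, 3: 3})
print(dict(collapse_counts(original)))  # {0: 428, 1: 32, 2: 15}
```

Recoding per patient with `collapse_activity` and then counting gives the same three group sizes, which still sum to 475.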