Prostate cancer patient data — cancer.df • multimix

Data on 475 prostate cancer patients

data(cancer.df)

Format

A data.frame with 475 rows and 12 columns:

age: Age in years
wt: Weight in pounds
pf: Patient activity
hx: Family history of cancer
sbp: Systolic blood pressure
dbp: Diastolic blood pressure
ekg: Electrocardiogram code
hg: Serum haemoglobin
sz: Size of primary tumour
sg: Index of tumour stage and histologic grade
ap: Serum prostatic acid phosphatase
bm: Bone metastases

Source

D.P. Byar and S.B. Green, 'The choice of treatment for cancer patients based on covariate information - application to prostate cancer', Bulletin du Cancer 1980; 67:477-490. Reproduced in D.F. Andrews and A.M. Herzberg, 'Data: a collection of problems from many fields for the student and research worker', pp. 261-274, Springer Series in Statistics, Springer-Verlag, New York.

Details

There are twelve pre-trial covariates measured on each patient: seven may be taken to be continuous and four to be discrete, while one variable (SG) is an index, nearly all of whose values lie between 7 and 15, that could be considered either discrete or continuous. We treat SG as a continuous variable.
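The split described above can be written down as vectors of column names. This is only a sketch: the assignment of individual columns to the continuous and discrete groups is an assumption inferred from the variable descriptions in the Format section, and is not stated on this page.

```r
# Column names as documented in the Format section.
all_vars <- c("age", "wt", "pf", "hx", "sbp", "dbp",
              "ekg", "hg", "sz", "sg", "ap", "bm")

# Assumed grouping (inferred from the variable descriptions, not documented).
continuous_vars <- c("age", "wt", "sbp", "dbp", "hg", "sz", "ap")
discrete_vars   <- c("pf", "hx", "ekg", "bm")

# SG is treated as continuous, as stated in the text.
continuous_vars <- c(continuous_vars, "sg")

# Every documented column should fall in exactly one group.
stopifnot(setequal(c(continuous_vars, discrete_vars), all_vars))
stopifnot(!anyDuplicated(c(continuous_vars, discrete_vars)))
```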

A preliminary inspection of the data showed that the size of the primary tumour (SZ) and serum prostatic acid phosphatase (AP) were both skewed variables. These variables have therefore been transformed to achieve approximate normality: a square root transformation was used for SZ, and a logarithmic transformation was used for AP. (As with correlation, skewness over the whole data set does not necessarily mean skewness within clusters. But when clusters were formed, within-cluster skewness was observed for these variables.)
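A minimal sketch of the two transformations in base R follows. The `raw` values are hypothetical and are not taken from cancer.df, the natural logarithm is an assumption (the text does not give the base), and the `skewness` helper is one common moment-based definition written here for illustration.

```r
# Hypothetical raw values, for illustration only (not taken from cancer.df).
raw <- data.frame(sz = c(1, 4, 9, 25, 49),
                  ap = c(1, 2, 5, 40, 300))

# Square root for tumour size, natural logarithm for acid phosphatase.
transformed <- transform(raw, sz = sqrt(sz), ap = log(ap))

# Simple moment-based sample skewness (hypothetical helper).
skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

sapply(raw, skewness)          # skewness before transforming
sapply(transformed, skewness)  # skewness after transforming
```

Comparing the two `sapply()` results shows how much each transformation reduces the skewness of the pooled values. As noted above, this says nothing by itself about skewness within clusters.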

Observations that had missing values in any of the twelve pretreatment covariates were omitted from further analysis, leaving 475 of the original 506 observations.
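Complete-case filtering of this kind can be sketched in base R with `complete.cases()`. The `patients` data frame below is hypothetical and only illustrates the mechanism; it is not the original set of 506 records.

```r
# Hypothetical data with missing values (not the original 506 records).
patients <- data.frame(age = c(71, NA, 68, 75),
                       wt  = c(180, 165, NA, 172),
                       hg  = c(13.4, 12.1, 14.0, 11.8))

# Keep only rows with no missing value in any covariate.
complete <- patients[complete.cases(patients), ]

# Number of rows dropped because of missing values.
n_dropped <- nrow(patients) - nrow(complete)
n_dropped
```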

The categorical variable Patient activity had 4 levels: 'Normally Active', 'Bed rest below 50%', 'Bed rest 50% or more', and 'Confined to bed'. The numbers of the 475 patients in these groups were 428, 32, 12, and 3. The two least active groups are merged in our data, giving 3 groups of size 428, 32, and 15.
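The merging of the two least active groups can be sketched with a base R factor. The counts come from the paragraph above; the level labels in `lv` and the merged label are illustrative assumptions and need not match the coding used in cancer.df.

```r
# Rebuild a four-level variable from the counts quoted above.
# Level labels are illustrative assumptions.
lv <- c("Normally Active", "Bed rest < 50%",
        "Bed rest >= 50%", "Confined to bed")
pf4 <- factor(rep(lv, times = c(428, 32, 12, 3)), levels = lv)

# Merge the two least active levels into a single level.
pf3 <- pf4
levels(pf3) <- list(
  "Normally Active"             = "Normally Active",
  "Bed rest < 50%"              = "Bed rest < 50%",
  "Bed rest >= 50% or confined" = c("Bed rest >= 50%", "Confined to bed")
)

table(pf3)
```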