Prepare data for use with multimix

data_organise(
  dframe,
  numClusters,
  numIter = 1000,
  cdep = NULL,
  lcdep = NULL,
  minpstar = 1e-09
)

Arguments

dframe

a data frame containing the data set you wish to model.

numClusters

the number of clusters you wish to fit.

numIter

the maximum number of steps that the EM algorithm will run before terminating.

cdep

a list of multivariate normal cells.

lcdep

a list of location cells.

minpstar

the minimum denominator for application of Bayes' Rule.
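To give some intuition for `minpstar`, here is a minimal sketch. It assumes (this page does not confirm it) that the denominator in question is the mixture density of an observation, sum_k pi_k f_k(x), used when computing posterior cluster-membership probabilities in the E step. The function `posterior_probs` and its arguments are hypothetical and are not part of multimix.

```r
# Hypothetical sketch (assumed usage, not multimix code): posterior cluster
# probabilities via Bayes' Rule, with the denominator floored at minpstar.
posterior_probs <- function(dens, props, minpstar = 1e-09) {
  # dens:  n x K matrix of component densities f_k(x_i)
  # props: length-K vector of mixing proportions pi_k
  joint <- sweep(dens, 2, props, `*`)       # pi_k * f_k(x_i) for every i, k
  denom <- pmax(rowSums(joint), minpstar)   # floor each row's denominator
  joint / denom                             # divide row i by denom[i]
}

dens <- rbind(c(0.20, 0.05),
              c(0, 0))   # second observation has zero density in every cluster
post <- posterior_probs(dens, props = c(0.5, 0.5))
print(post)
```

Without the floor, the second row would evaluate to 0/0 = NaN. With it, that observation simply receives zero posterior weight in every cluster.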

Value

An object of class multimixSettings, which is a list with the following elements:

  • cdep --- a list of multivariate normal cells.

  • clink --- column numbers of univariate normal variables.

  • cprods --- a list over MVN cells containing a matrix of pair-wise products of columns in the cell, columns ordered by pair.index.

  • cvals --- a list over MVN cells containing a matrix of columns of variables in the cell

  • cvals2 --- a list over MVN cells containing a matrix of squared columns of variables in the cell

  • dframe --- the data.frame of variables

  • discvar --- logical: TRUE if the variable is discrete, FALSE otherwise

  • dlevs --- for discrete cells: number of levels

  • dlink --- column numbers of univariate discrete variables

  • dvals --- a list over discrete cells of level indicator matrices

  • lc --- logical: TRUE if the continuous variable belongs to an OT cell, FALSE otherwise

  • lcdep --- a list of OT cells

  • lcdisc --- column numbers of discrete variables in OT cells

  • lclink --- column numbers of continuous variables in OT cells

  • lcprods --- a list over OT cells containing a matrix of pair-wise products of continuous columns in the cell, columns ordered by pair.index

  • lcvals --- a list over OT cells containing a matrix of continuous columns of variables in the cell

  • lcvals2 --- a list over OT cells containing a matrix of squared continuous columns of variables in the cell

  • ld --- logical: TRUE if the discrete variable belongs to an OT cell, FALSE otherwise

  • ldlevs --- for discrete variables in OT cells: number of levels

  • ldlink --- column numbers of OT discrete variables

  • ldvals --- a list over OT cells of level indicator matrices

  • ldxc --- a list over OT cells whose members are lists over levels; each element is the matrix of the cell's continuous variables with every column multiplied by that level's indicator column

  • mc --- logical: TRUE if the continuous variable is not in an OT cell, FALSE otherwise

  • md --- logical: TRUE if the discrete variable is not in an OT cell, FALSE otherwise

  • minpstar --- minimum denominator for application of Bayes' Rule

  • n --- number of observations

  • numIter --- the maximum number of steps that the EM algorithm will run before terminating

  • oc --- logical: TRUE if the continuous variable is in a univariate cell, FALSE otherwise

  • olink --- column numbers of continuous univariate cells

  • op --- length(olink)

  • ovals --- n by op matrix of continuous univariate variables

  • ovals2 --- n by op matrix of squared continuous univariate variables

  • numClusters --- the number of clusters in the model.
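The per-cell matrices listed above (cvals, cvals2, cprods, dvals, ldxc) are simple transformations of the data columns. The sketch below builds analogous objects for a toy data frame to show their shape. Several points are assumptions rather than multimix behaviour: a cell is taken to be a vector of column numbers, the pair ordering is assumed to follow `combn()` (the actual `pair.index` ordering is not documented here), and the same continuous columns are reused as if they formed an OT cell when building the ldxc-style object. All object names are illustrative only.

```r
# Hypothetical sketch: objects shaped like cvals, cvals2, cprods, dvals and
# ldxc, built for a toy data frame. Cell format and pair ordering are assumed.
df <- data.frame(x1 = c(1, 2, 3),
                 x2 = c(4, 5, 6),
                 x3 = c(7, 8, 9),
                 g  = factor(c("a", "b", "a")))

cell   <- c(1, 2, 3)                # assumed: a cell given as column numbers
cvals  <- as.matrix(df[, cell])     # columns of the variables in the cell
cvals2 <- cvals^2                   # squared columns

# Pairwise column products, assuming combn order: (1,2), (1,3), (2,3)
pairs  <- combn(length(cell), 2)
cprods <- cvals[, pairs[1, ]] * cvals[, pairs[2, ]]

# Level indicator matrix for a discrete variable: one 0/1 column per level
dvals <- sapply(levels(df$g), function(l) as.numeric(df$g == l))

# ldxc-style object: for each level, the continuous columns with every
# row multiplied by that level's indicator (zero where the level is absent)
ldxc <- lapply(seq_len(ncol(dvals)), function(j) cvals * dvals[, j])
```

Because R recycles a length-n vector down each column of an n-row matrix, `cvals * dvals[, j]` multiplies every column by the indicator column, which matches the ldxc description.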

Author

Murray Jorgensen

Examples

data(cancer.df)
D = data_organise(cancer.df, numClusters = 2)