IMA.methy450PP {IMA}    R Documentation

Data preprocessing and quality control

Description

This function lets the user choose among several filtering steps and adjust the filtering criteria for specific quality control purposes. The options are: whether to filter probes based on detection P-value; whether to remove loci on the X chromosome, the Y chromosome, or both; whether to perform peak correction; whether to transform the raw β values using either the arcsine square root or the logit; whether to perform quantile normalization; whether to remove loci containing missing β values; and whether to filter out loci whose methylation levels are measured by probes containing SNP(s) at or near the targeted CpG site. The user selects the preprocessing route and the corresponding cutoffs through the arguments of this function.

Usage

IMA.methy450PP(data, na.omit = TRUE, peakcorrection = FALSE, normalization = FALSE,
    transfm = c(FALSE, "arcsinsqr", "logit"), samplefilterdetectP = c(FALSE, 1e-05),
    samplefilterperc = 0.75, sitefilterdetectP = c(FALSE, 0.05), sitefilterperc = 0.75,
    locidiff = c(FALSE, 0.01), locidiffgroup = list("g1", "g2"),
    XYchrom = c(FALSE, "X", "Y", c("X", "Y")), snpfilter = c(FALSE, "snpsites.txt"))

Arguments

data

An object of class exprmethy450, as returned by the IMA.methy450R function.

na.omit

If TRUE, remove the sites containing missing values.

peakcorrection

If TRUE, peak correction is performed based on the paper by Sarah Dedeurwaerder et al.

normalization

If TRUE, quantile normalization is performed.

transfm

If FALSE, no transformation is performed. If "arcsinsqr", the arcsine square root transformation is applied to the β values. If "logit", the logit transformation is applied to the β values.

samplefilterdetectP

Default is FALSE, i.e., no sample filtering by detection P-value. Otherwise, the cutoff for the detection P-value.

samplefilterperc

Keep the samples in which at least the specified percentage of sites have a detection P-value less than samplefilterdetectP.

sitefilterdetectP

Default is FALSE, i.e., no site filtering by detection P-value. Otherwise, the cutoff for the detection P-value.

sitefilterperc

Remove the sites for which the specified percentage of samples have a detection P-value greater than sitefilterdetectP.

locidiff

If FALSE, do not filter sites by the difference in β value between groups. Otherwise, remove the sites with a β value difference greater than the specified value.

locidiffgroup

Specifies which two groups are compared when checking the loci difference. Used only when locidiff is not FALSE.

XYchrom

If "X", remove the sites on chromosome X. If "Y", remove the sites on chromosome Y. If c("X","Y"), remove the sites on both chromosomes X and Y.

snpfilter

If FALSE, keep the loci whose methylation levels are measured by probes containing SNP(s) at or near the targeted CpG site. Otherwise, filter out the SNP-containing loci listed in a file, given by its name and location.
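
The detection P-value filters described by samplefilterdetectP, samplefilterperc, sitefilterdetectP and sitefilterperc can be illustrated on a toy matrix in base R. This is only a sketch and not the IMA implementation: all object names are hypothetical, the order of the two steps (samples first, then sites) is an assumption, and so is the reading of sitefilterperc as "at least this fraction of samples fail".

```r
# Toy detection P-value matrix: rows are sites, columns are samples.
# All names below are hypothetical and are not part of the IMA API.
detectP <- matrix(
  c(0.001, 0.200, 0.001,
    0.300, 0.300, 0.200,
    0.001, 0.400, 0.001,
    0.001, 0.001, 0.001),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("cg", 1:4), c("S1", "S2", "S3"))
)

sample_cutoff <- 0.05  # plays the role of samplefilterdetectP
sample_perc   <- 0.75  # plays the role of samplefilterperc
site_cutoff   <- 0.05  # plays the role of sitefilterdetectP
site_perc     <- 0.75  # plays the role of sitefilterperc

# Keep samples in which at least sample_perc of the sites pass the cutoff.
good_fraction <- colMeans(detectP < sample_cutoff)
keep_samples  <- good_fraction >= sample_perc

# Among the retained samples, drop sites where at least site_perc of the
# samples fail the cutoff (assumed interpretation of sitefilterperc).
kept         <- detectP[, keep_samples, drop = FALSE]
bad_fraction <- rowMeans(kept > site_cutoff)
keep_sites   <- bad_fraction < site_perc

filtered <- kept[keep_sites, , drop = FALSE]
print(dim(filtered))
```

In this toy input, sample S2 passes the cutoff at only one of four sites and is dropped, and site cg2 fails in both remaining samples and is dropped, so three sites and two samples remain.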

Details

This function lets the user choose among several filtering steps and adjust the filtering criteria for specific quality control purposes. By default, IMA filters out loci with missing β values, loci on the X chromosome, and loci with a median detection P-value greater than 0.05. Users can also choose to filter out loci whose methylation levels are measured by probes containing SNP(s) at or near the targeted CpG site. An option for sample-level quality control is also provided. Although the raw β values are analyzed as recommended by Illumina, users can choose the arcsine square root transformation when modeling the methylation level as the response in a linear model. The logit transformation is also available as an option. By default, the IMA package performs no normalization. Quantile normalization is available as an alternative preprocessing option, but several published studies report that it does not remove unwanted technical variation between samples in methylation analysis.
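
The two transformations mentioned above can be written in a few lines of base R. The helpers below are a minimal sketch, not the code used inside IMA: the function names are hypothetical, and the small clamping constant eps in the logit is an assumed guard against β values of exactly 0 or 1.

```r
# Hypothetical helpers, not taken from the IMA source.

# Arcsine square root transformation of beta values in [0, 1].
arcsine_sqrt <- function(beta) {
  asin(sqrt(beta))
}

# Logit transformation; eps is an assumed guard that clamps beta away
# from 0 and 1 so that the result stays finite.
logit_beta <- function(beta, eps = 1e-6) {
  b <- pmin(pmax(beta, eps), 1 - eps)
  log(b / (1 - b))
}

beta <- c(0, 0.25, 0.5, 0.75, 1)
print(arcsine_sqrt(beta))
print(logit_beta(beta))
```

Both functions are vectorized, so they can be applied directly to a whole matrix of β values.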

Value

This function returns an object of class methy450batch containing:

bmatrix

a matrix of β values for individual sites

detectP

a matrix of detection P-values for individual sites

annot

a matrix of annotation information for individual sites

groupinfo

a list of the sample ID and phenotype of each sample

TSS1500Ind

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the TSS1500 region of each gene

TSS200Ind

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the TSS200 region of each gene

UTR5Ind

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the 5' UTR region of each gene

EXON1Ind

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the first exon of each gene

UTR3Ind

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the 3' UTR region of each gene

GENEBODYInd

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the gene body region of each gene

ISLANDInd

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the ISLAND region of each UCSC_CPG_ISLAND

NSHOREInd

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the N Shore region of each UCSC_CPG_ISLAND

SSHOREInd

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the S Shore region of each UCSC_CPG_ISLAND

NSHELFInd

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the N Shelf region of each UCSC_CPG_ISLAND

SSHELFInd

two lists of IDs - SID (site IDs) and PID (Position IDs) belonging to the S Shelf region of each UCSC_CPG_ISLAND

Author(s)

Dan Wang, Li Yan, Qiang Hu, Dominic J Smiraglia, Song Liu

Maintainer: Dan Wang <wangdan412@gmail.com>

See Also

IMA.methy450R, regionswrapper, testfunc, sitetest, indexregionfunc, annotfunc

Examples

setwd(system.file("extdata", package = "IMA"))
MethyFileName = "SampleMethFinalReport.txt"
PhenoFileName = "SamplePhenotype.txt"
data = IMA.methy450R(file = MethyFileName,
    columnGrepPattern = list(beta = ".AVG_Beta", detectp = ".Detection.Pval"),
    groupfile = PhenoFileName)
dataf = IMA.methy450PP(data, na.omit = TRUE, normalization = FALSE, transfm = FALSE,
    peakcorrection = TRUE, samplefilterdetectP = 1e-5, samplefilterperc = 0.75,
    sitefilterdetectP = 0.05, sitefilterperc = 0.5, locidiff = FALSE,
    XYchrom = FALSE, snpfilter = FALSE)

[Package IMA version 3.1.2 Index]