IMA: An R package for high-throughput analysis of Illumina’s 450K Infinium methylation data.
***The package is maintained on rforge.net: https://www.rforge.net/IMA/
1 Abstract
The Illumina Infinium HumanMethylation450 BeadChip is a newly designed high-density microarray for quantifying the methylation level of over 450,000 CpG sites within the human genome. IMA (Illumina Methylation Analyzer) is a computational package designed to automate the pipeline for analyzing site-level and region-level methylation changes in epigenetic studies that use the 450K DNA methylation microarray. The pipeline loads data from the Illumina platform and provides user-customized functions commonly required to perform exploratory methylation analysis and summarization for individual sites as well as annotated regions.
Note that the main purpose of the IMA package is not to recommend a specific analysis method, but to provide a range of commonly used Infinium methylation microarray analysis options that users can choose from for automated exploratory analysis and summarization. It is therefore in users' best interest to consult an experienced bioinformatician or statistician about which specific analysis option or route should be chosen for their 450k microarray data.
2 Installation
***Prerequisites:
1. The IMA package requires R version >= 2.13 on Windows systems, and R version >= 2.11.0 on Linux-like systems.
2. The IMA package requires the following packages to be installed: WriteXLS, limma, MASS, bioDist, preprocessCore, dplR. If they are not installed on your system, the easiest way to install them is to issue the following commands at the R prompt:
source("http://bioconductor.org/biocLite.R")
biocLite(c("limma","bioDist","preprocessCore"))
install.packages(c("WriteXLS","MASS","dplR"),repos="http://cran.r-project.org")
3. The WriteXLS package requires Perl to be installed.
**Install options: There are two ways to install the IMA package:
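As a minimal sketch of the prerequisite check in item 2 (not part of IMA; the helper name missing_packages is hypothetical), the following base-R snippet reports which of the required packages are not yet installed:

```r
# Packages listed as prerequisites above
required <- c("WriteXLS", "limma", "MASS", "bioDist", "preprocessCore", "dplR")

# Hypothetical helper: return the subset of `pkgs` that is not installed
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

not_installed <- missing_packages(required)
if (length(not_installed) == 0) {
  message("All prerequisite packages are installed.")
} else {
  message("Missing packages: ", paste(not_installed, collapse = ", "))
}
```

Only the packages reported as missing then need to be installed with the commands shown in item 2.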
Option 1): Issue the following command at the R prompt:
install.packages("IMA",repos=c("http://rforge.net"))
Option 2): Download the package here and issue the following command at the R prompt:
install.packages("IMA_3.1.2.tar.gz",repos=NULL,type = "source")
3 Tutorial
4 Annotation file
The region-level annotation library for the 450k microarray can be produced by issuing the code below after reading your raw methylation data into R.
>dataf2 = IMA.methy450PP(data, peakcorrection = FALSE, na.omit = FALSE, normalization = FALSE, transfm = FALSE, samplefilterdetectP = FALSE, locidiff = FALSE, XYchrom = FALSE, snpfilter = FALSE)
>fullannot = dataf2@annot
>temp = c("TSS1500Ind", "TSS200Ind", "UTR5Ind", "EXON1Ind", "GENEBODYInd", "UTR3Ind", "ISLANDInd", "NSHOREInd", "SSHOREInd", "NSHELFInd", "SSHELFInd")
>for(i in seq_along(temp)){assign(temp[i], slot(dataf2, temp[i]))}   # copy each region index slot into a variable of the same name
>save(list = c("fullannot", temp), file = "fullannotInd.rda")
Alternatively, the region-level annotation library for the 450k microarray can be downloaded from here. Users can then load the region-level annotation library by issuing the following command at the R prompt:
>load("./fullannotInd.rda")
It is recommended to produce the annotation library from your own data, because the annotation produced by GenomeStudio is likely to differ slightly between users.
5 Pipeline
The pipeline loads data from the Illumina platform and provides user-customized functions commonly required to perform differential methylation analysis and summarization for individual sites as well as annotated regions. The user can either run the pipeline with the default settings or specify optional routes in the parameter file. Note that it is in users' best interest to consult an experienced bioinformatician or statistician about which specific analysis option should be chosen for their 450k microarray data.
To run the pipeline file, users can simply type "R --no-save < pipeline.R" at the Linux/Unix prompt. Alternatively, users can copy the commands in pipeline.R and paste them at the R prompt.
6 Citations
Wang D, Yan L, Hu Q, Sucheston LE, Higgins MJ, Ambrosone CB, Johnson CS, Smiraglia DJ, Liu S. IMA: an R package for high-throughput analysis of Illumina's 450K Infinium methylation data. Bioinformatics. 2012 Mar 1;28(5):729-30
7 Frequently Asked Questions
1. There are a total of ~65k probes on the 450k platform that contain SNPs at/near the target CpG site and are unlikely to measure DNA methylation at all. Should this issue be considered?
Answer:
Users can choose to filter out loci whose methylation levels are measured by probes containing SNP(s) at/near the targeted CpG site. An optional route for filtering out these SNP-containing probes is included in Version 2.1.0 or above. The list of SNP-containing probes (based on dbSNP v132) was provided by Ali Torkamani at Scripps Institute and can be downloaded from here or located by issuing the following command in R:
>snpfilter = system.file("extdata/snpsites.txt",package ="IMA")
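To illustrate what this filtering amounts to, here is a minimal base-R sketch on made-up data. It does not use IMA's own filtering route or the format of snpsites.txt; the matrix, the probe IDs and the variable names are all hypothetical.

```r
# Toy beta-value matrix: rows are probes, columns are samples (made-up values)
beta <- matrix(c(0.10, 0.80, 0.45, 0.12, 0.79, 0.50),
               nrow = 3,
               dimnames = list(c("cg0001", "cg0002", "cg0003"), c("S1", "S2")))

# Hypothetical list of SNP-containing probe IDs
snp_probes <- c("cg0002")

# Keep only the probes that are not on the SNP list
beta_filtered <- beta[!rownames(beta) %in% snp_probes, , drop = FALSE]
print(rownames(beta_filtered))
```

The drop = FALSE argument keeps the result a matrix even if only one probe remains.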
2. I need to run a paired analysis of my samples, and usually I would adjust for this using a block or some other factor in LIMMA. However, I do not see where I can add that type of information now. If possible, I would appreciate some guidance on this; otherwise, can the object be used outside the package as input to LIMMA?
Answer:
We have included optional routes for paired analysis in Version 2.1.0 or above.
3. I found a list of Island regions differentially expressed using IMA testfunc. However, the results only contain the chromosome regions instead of the probe IDs within the differentially expressed regions. Would it be possible to have some option to get the probe IDs and their corresponding annotations within these regions?
Answer:
An example of how to extract the probe IDs and corresponding annotation information within the differentially expressed region(s) has been added to the Vignette.
4. Have you considered peak correction in the data preprocessing step?
Answer:
The peak correction option has been added to the preprocessing step. For details of the peak correction method, see "Evaluation of the Infinium Methylation 450k technology" by Sarah Dedeurwaerder et al. A bug in the peak correction option was fixed after version 3.1.1; please use a version of IMA later than 3.1.1 if you set peakcorrection = TRUE.
5. I have some data generated by the Methyl27k arrays. Can I use IMA for these as well?
Answer:
To make IMA usable with the 27k array, we first mapped the loci annotation of the 27k array to that of the 450k array. The 27k array has a total of 27578 loci, 1600 of which could not be mapped to the 450k array. For unmapped loci, we keep the original annotation from the 27k array; for mapped loci, we use the annotation from the 450k array. The annotation for the 27k array can be downloaded from here. Using IMA with the 27k array is similar to using it with the 450k array, except that the following two commands need to be issued between the reading step (IMA.methy450R) and the preprocessing step (IMA.methy450PP):
load("./annot27k_mapped.Rdata")
data@annot = as.matrix(annotout)
6. I have a project where we have methylation data in .idat file format. I am interested in using your IMA package for region-wide analysis. Is there a way to convert .idat to .txt files in R, as we don't have the GenomeStudio software?
Answer:
The R package "minfi" can be used to load .idat files and normalize the data with "swan". Users can then use the "convert.minfi.to.IMA" function here to convert a minfi object to an IMA object. After that, the IMA.methy450PP() function can be used for preprocessing steps such as filtering.
7. I get the error: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'a real', got '6,70E+01'. Can you give me any suggestions for resolving this? Is there a way for R to tell me which lines cause the error?
Answer:
It looks like some of the input methylation data (beta values/p-values) use "," as the decimal point, while read.delim expects ".".
If all decimal points are ",", just add the option dec="," to the IMA.methy450R() function call.
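As a small illustration of what the dec option does, the sketch below uses base R's read.delim on a made-up two-row table. The column name and values are invented, and IMA.methy450R itself is not called here.

```r
# Two-row toy table written with "," as the decimal mark (made-up values)
txt <- "TargetID\tS1.AVG_Beta\ncg0001\t6,70E-01\ncg0002\t1,25E-01\n"

# With dec = "," the values parse as numbers instead of text
beta <- read.delim(text = txt, dec = ",", row.names = 1, check.names = FALSE)
str(beta)
```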
If most of your data use "." as the decimal point but a few values use ",", you may need to change those back to "." manually. To find out which rows cause the error, use something similar to the code below.
>fileName = "example_sample_file.txt"   # placeholder: path to your input file
>temp = readLines(fileName, n = 20)
>nskip = grep(".AVG_Beta", temp, ignore.case = TRUE) - 1   # number of lines before the header row
>titleLine = temp[nskip + 1]
>allcolname = unlist(strsplit(titleLine, "\t"))
>betacol = grep(".AVG_Beta", allcolname)
>pvalcol = grep(".Detection.Pval", allcolname)
>annotcol = grep("ILMNID", allcolname):(length(allcolname))
>colClasses = rep("NULL", length(allcolname))
>colClasses[1] = "character"
>colClasses[betacol] = "numeric"
># Read one more row on each pass; the last value of i printed before the error is the offending row
>for(i in 1:4485586){
>cat(i,"\n")
>betamatrix <- read.delim(file = fileName, header = TRUE, skip = nskip, colClasses = colClasses, na.strings = c("NA", ""), check.names = FALSE, row.names = 1,nrows = i,dec = ".")[, 1:length(betacol)]
>}
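Because the loop above re-reads the file on every pass, it can be slow on large files. A possible alternative, sketched here with a hypothetical helper (find_comma_decimals is not part of IMA) and made-up input lines, is to scan the raw text for a digit, a comma and another digit. It assumes that no annotation column legitimately contains that pattern; if one does, restrict the search to the beta and p-value columns.

```r
# Hypothetical helper: indices of lines containing a comma-decimal number such as "6,70E+01"
find_comma_decimals <- function(lines) {
  grep("[0-9],[0-9]", lines)
}

# Toy input standing in for the lines of an exported file (made-up values)
lines <- c("TargetID\tS1.AVG_Beta\tS1.Detection.Pval",
           "cg0001\t0.67\t0.001",
           "cg0002\t6,70E+01\t0.002",
           "cg0003\t0.12\t0,5")

bad <- find_comma_decimals(lines)
print(bad)
```

On a real file, the toy vector would be replaced by the result of readLines() on the input file, and the returned indices are line numbers in that file (header lines included).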
Date: 2012-12-13 15:03:26 EST