1. RRBS Data Analysis
Sample Information | 15 RRBS samples (paired-end sequencing, 75 nt) ?? Patients datasets X Normal / X Tumors |
Performed by | Loh Wan Yi (Benoukraf’s Lab) |
Date Data Received | 29th June, 2015 |
2. QC
Quality control was performed using FastQC. As shown in Figure 1, the sequenced reads (R1 and R2) are of very good quality: all base positions fall within the green zone, with quality scores ranging from 32 to 40.
Figure 2 shows the proportion of each base at each read position in the FASTQ files.
3. Read Alignment
RRBS samples were mapped against the human reference genome (hg19) using Bismark. Table 1 shows, for each sample, the total number of input reads to Bismark, the number of paired-end alignments with a unique best hit, and the mapping efficiency. The average mapping efficiency of the RRBS samples is ~63%.
Samples | 27 | 87 | 89 | 90 | FG014 | FG058 | FG060 | FG064 | FG070 | FG093 | LO1iT | NP111 |
Input reads | 31459427 | 31227992 | 53315689 | 31985150 | 56610577 | 57764919 | 55927417 | 57553068 | 56625979 | 34302067 | 52061596 | 52191564 |
Uniquely mapped | 21246583 | 19379149 | 33510024 | 20075250 | 37527305 | 36032670 | 33940745 | 36614700 | 34147955 | 21298918 | 32691864 | 33204931 |
Mapping efficiency | 67.5% | 62.1% | 62.9% | 62.8% | 66.3% | 62.4% | 60.7% | 63.6% | 60.3% | 62.1% | 62.8% | 63.6% |
Technical Information:
– Genome : hg19
– Software : Bismark
– Command Line : for i in *; do bismark /home/jason/bismark14-bowtie2_hg19/ --bam --bowtie2 -p 4 -1 "$i"/*1.fq.gz -2 "$i"/*2.fq.gz -o "$i"; done
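As a sanity check on Table 1, the mapping efficiency can be recomputed as the number of uniquely mapped pairs divided by the number of input pairs. The sketch below is a minimal illustration; the function name mapping_efficiency is a hypothetical helper, not part of Bismark, and the counts are copied from Table 1 rather than parsed from a Bismark report.

```shell
# Hypothetical helper (not part of Bismark): mapping efficiency in percent,
# computed as uniquely mapped pairs / input pairs, rounded to one decimal.
# Usage: mapping_efficiency <uniquely_mapped> <input_reads>
mapping_efficiency() {
  awk -v mapped="$1" -v total="$2" 'BEGIN {
    # Guard against a zero or negative read count.
    if (total <= 0) { print "NA"; exit }
    printf "%.1f\n", 100 * mapped / total
  }'
}

# Counts for sample 27, copied from Table 1.
mapping_efficiency 21246583 31459427
```

Running the same function over the other columns of Table 1 should give values that agree with the reported efficiencies to one decimal place.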
4. Methylation Scoring
The B-score (DNA methylation score) was calculated from the BAM files obtained in Step 3 using GBSA. Comparative analysis of the output files was then performed using methylKit.
- Clustering Samples:
Figure 3 shows the hierarchical clustering of the samples by the similarity of their methylation profiles.
- PCA
Figure 4 shows a scree plot of the importance of the principal components.
Figure 5 shows a scatter plot of the samples on the principal components.
Technical Information:
- file.list=list('methyl_form_27_bscore_CpG_File.txt.gz','methyl_form_FG014_bscore_CpG_File.txt.gz','methyl_form_FG070_bscore_CpG_File.txt.gz','methyl_form_87_bscore_CpG_File.txt.gz','methyl_form_FG058_bscore_CpG_File.txt.gz','methyl_form_FG093_bscore_CpG_File.txt.gz','methyl_form_89_bscore_CpG_File.txt.gz','methyl_form_FG060_bscore_CpG_File.txt.gz','methyl_form_LO1iT_bscore_CpG_File.txt.gz','methyl_form_90_bscore_CpG_File.txt.gz','methyl_form_FG064_bscore_CpG_File.txt.gz','methyl_form_NP111_bscore_CpG_File.txt.gz')
- myobj=read(file.list, sample.id=list('27','FG014','FG070','87','FG058','FG093','89','FG060','LO1iT','90','FG064','NP111'), assembly='hg19', treatment=c(1,1,1,1,1,1,1,1,1,1,1,1), context='CpG')
- meth=unite(myobj, destrand=FALSE)
- clusterSamples(meth, dist='correlation', method='ward', plot=TRUE)
- PCASamples(meth, screeplot=TRUE)
- PCASamples(meth)