Session 3: Data visualization



Goal: View the peaks in their genomic context to help with the biological interpretation of the results.

Although many genome data analysis tasks can be accomplished with automated processes, some steps continue to require human judgment and are frequently rate-limiting. Beyond facilitating the interpretation and contextualization of genome-anchored data, genome browsers also provide a common platform for investigators to share, store and publish scientific discoveries. In this session, we will:

  • define the criteria to bear in mind when choosing one (or several) genome browsers;
  • illustrate the use of 3 different genome browsers to better understand the outcome of some of the analyses performed during the two previous sessions:

  1. IGV, a local browser
  2. UCSC genome browser
  3. ZENBU genome browser

Choosing a genome browser

Local vs online solutions

There are several options for genome browsers. They can be divided into two main categories:

  • local browsers, which require the installation of a program (e.g. IGV);
  • online browsers, which can be accessed via common web browsers (e.g. UCSC genome browser, Ensembl, ZENBU).

This distinction matters because each category comes with its own advantages and drawbacks in terms of performance and ease of use.

You may find yourself using both types, depending on your aim and on where the bulk of the data to be visualized is located at different steps of your project. If the data are on your computer, it is easier to visualize them locally (IGV) and avoid lengthy data transfers. If the aim is to share the results with your collaborators, or to view many tracks in the context of many existing annotations, online genome browsers may be more suitable.

  • IGV. Runs locally on your computer; complementary data can be retrieved from remote servers.
  • UCSC genome browser. Probably the most popular web-based genome browser; it benefits from large collections of publicly available datasets and annotations. It requires data to be loaded either one file at a time or by setting up a web-accessible folder.
  • ZENBU genome browser. A newer web-based browser that lets registered users upload large amounts of data through a simple interface.

Note that if you are working on a non-model organism, a local viewer will often be the only practical choice.

Genome browsers and data formats: tight vs loose coupling between a data format and its rendering, and use of “exotic” vs common formats

Genome browsers differ in their ability to handle different data formats (.bam, .bed, .narrowPeak, .bigWig, .bigBed, .bedGraph) and in the way they are able to render them. We will see several examples of their strengths and shortcomings when handling different types of data and formats.

  • IGV. The range of available renderings is limited and depends on the format in which the data were loaded.
  • UCSC genome browser. The rendering of the data is tightly coupled to the format in which they were uploaded. This makes for a simple user interface but seriously limits how easily the data can be explored.
  • ZENBU genome browser. Rendering and upload format are independent, allowing for greater flexibility (which also comes with a more complex user interface). This ability is one of the key strengths of ZENBU.

 

Ease of viewing potentially large amounts of data

This question touches upon both the capabilities inherent to the browser being online or local and its data rendering capabilities.

Online genome browsers require you to transfer the data to the server hosting the web service you connect to. For large datasets, a local genome browser such as IGV, which loads data directly from your hard disk without any transfer, might therefore be a better choice. For small amounts of data (a few files), online genome browsers provide simple upload interfaces. The UCSC genome browser and IGV can also connect to your own data repositories exposed as web-accessible “data hubs”, which also make it easier to share data and visualizations with collaborators.
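As an illustration of the data-hub approach, the sketch below writes the three plain-text files of a minimal UCSC track hub (hub.txt, genomes.txt and one trackDb.txt per assembly) describing bigWig tracks. It is a sketch under stated assumptions: the hub name, e-mail address, output folder and track names are placeholders, the bigWig files are assumed to sit next to trackDb.txt (relative bigDataUrl), and the folder still has to be served over HTTP(S) before the URL of its hub.txt can be given to the browser.

```python
from pathlib import Path


def write_track_hub(hub_dir, hub_name, email, genome, bigwigs):
    """Write a minimal UCSC track hub and return the path of hub.txt.

    bigwigs maps a track name (no spaces) to a bigWig URL, either absolute
    or relative to the directory holding trackDb.txt.
    """
    hub_dir = Path(hub_dir)
    (hub_dir / genome).mkdir(parents=True, exist_ok=True)

    # hub.txt: hub-level description, pointing to genomes.txt
    (hub_dir / "hub.txt").write_text(
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {hub_name} tracks\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    # genomes.txt: one stanza per assembly, pointing to its trackDb.txt
    (hub_dir / "genomes.txt").write_text(
        f"genome {genome}\ntrackDb {genome}/trackDb.txt\n"
    )
    # trackDb.txt: one stanza per track, stanzas separated by blank lines
    stanzas = [
        f"track {name}\n"
        f"bigDataUrl {url}\n"
        f"shortLabel {name[:17]}\n"  # keep short labels short (UCSC suggests at most 17 characters)
        f"longLabel {name}\n"
        "type bigWig\n"
        for name, url in bigwigs.items()
    ]
    (hub_dir / genome / "trackDb.txt").write_text("\n".join(stanzas))
    return hub_dir / "hub.txt"
```

For example, write_track_hub("my_hub", "cebpa_workshop", "you@example.org", "mm9", {"GMP_WT_Cebpa": "GMP_WT_Cebpa_mm9_chr19.bigWig"}) (all hypothetical values) creates my_hub/hub.txt, my_hub/genomes.txt and my_hub/mm9/trackDb.txt.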

When a large number of data sources must be viewed simultaneously, the typical “1 track / 1 data source” model quickly becomes obsolete.

  • IGV. The data representation can be compressed into track sets in which hundreds of data sources are displayed concurrently.
  • UCSC genome browser. Because the rendering of the data is tightly coupled to the upload format, the only way to display large amounts of data concurrently is to create ad hoc data aggregations, which can take a lot of time to prepare, upload … and update.
  • ZENBU genome browser. Because rendering is decoupled from the upload format, any data can be merged seamlessly, including with publicly available datasets of interest.

 

From exporting visualizations for reports and publications to sharing visualizations of the data with collaborators 

All 3 genome browsers can export visualizations as bitmap images that can be integrated into presentations. For more demanding purposes (better quality, possibility of re-editing), they also allow exporting in a scalable vector format (PDF for UCSC, and the more convenient SVG format for ZENBU and IGV).

Beyond mere screenshots of loci, you may want to share your results with collaborators. In that case, you can give them access to the visualization itself (your selection of tracks at a given locus), with the possibility for them to browse other loci, add more data for you to inspect, etc.

 

 

 

Viewing the mapped reads and MACS-called peaks using a local genome browser (IGV)

In this part of the session, we will use IGV to visualize the results of read mapping and peak calling.

 

1. Download and start IGV

IGV can be downloaded from the Broad Institute (http://www.broadinstitute.org/software/igv/download), but you can also retrieve it faster from our server.

Download the IGV software to your desktop and uncompress it. The file IGV_2.3.32/igv.bat is the Windows executable to launch.

A console opens and shows details of the execution of the Java program.


2. Load the desired genome

The files in IGV_genome.mm9_chr19_only/* that you just downloaded contain information pertaining to the reference genome we will be using. It should load by default, but you may need to load it yourself (for example if your IGV defaulted to loading the hg18 reference).

Load the file IGV_genome.mm9_chr19_only/mm9_chr19.genome into IGV via the "Genomes > Load genome from File" menu.


Note that we could also have loaded predefined genomes from one of the Broad Institute servers (this may take substantially more time, as they contain all the chromosomes and need to be fetched online).

This .genome file allows the reference to be indexed and browsed. It contains the locations of RefSeq transcripts and the chromosome cytobands.

You can browse along the genome either by typing the name of a RefSeq transcript (indexed in the .genome reference file), by using the zoom/pan controls in the upper right corner of the browser, or by selecting an area to zoom into in the upper track.

 

3. Visualizing genome mapping
Download the mapping files archive GMP_WT_Cebpa_mm9_chr19.tar.gz to your desktop and uncompress it.

This archive contains the .bam, .bedGraph and .bigWig files generated during the first session.
Alternatively, feel free to use the files you created yourself.

3.1. Loading files
Load the BAM file GMP_WT_Cebpa_mm9_chr19.psort.bam into IGV via the "File > Load from File..." menu.


BAM files are automatically rendered as a coverage histogram (upper part of the track) and as a “raw” read pile-up (bottom part). Right-click on the track name to open a menu that lets you alter the rendering, rename the track, etc.

Note that IGV requires BAM files:

  • to be sorted by position
  • to be indexed
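Both requirements can be checked before loading. The sketch below reads the header of a BAM file with the Python standard library only (BAM files are BGZF-compressed, which the gzip module can read as ordinary multi-member gzip), reports the sort order declared in the @HD header line, and looks for a .bai index next to the file. Note that the SO tag is only what the header declares, not a verification of the actual read order; if a file turns out to be unsorted or unindexed, samtools sort and samtools index are the usual fixes.

```python
import gzip
import struct
from pathlib import Path


def bam_header_text(path):
    """Return the SAM-format header text stored at the start of a BAM file."""
    # BGZF blocks are valid gzip members, so gzip can stream through them.
    with gzip.open(path, "rb") as fh:
        if fh.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not look like a BAM file")
        (l_text,) = struct.unpack("<i", fh.read(4))  # little-endian int32
        return fh.read(l_text).decode("ascii", errors="replace").rstrip("\x00")


def bam_sort_order(path):
    """Sort order declared in the @HD header line (e.g. 'coordinate'), or 'unknown'."""
    for line in bam_header_text(path).splitlines():
        if line.startswith("@HD"):
            for field in line.split("\t")[1:]:
                if field.startswith("SO:"):
                    return field[3:]
    return "unknown"


def has_bai_index(path):
    """True if an index named file.bam.bai or file.bai sits next to the BAM."""
    p = Path(path)
    return p.with_name(p.name + ".bai").exists() or p.with_suffix(".bai").exists()
```

For a position-sorted file such as GMP_WT_Cebpa_mm9_chr19.psort.bam, bam_sort_order is expected to return "coordinate".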


Note also that when zoomed out too far (large genomic windows), the reads cannot be displayed, because too much data would have to be loaded into memory. To circumvent this shortcoming, we can load reduced formats that contain only the signal intensity, either at each position (.bigWig, an indexed binary format) or over contiguous regions of constant intensity (.bedGraph, a plain-text format).
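To make the idea behind the bedGraph reduction concrete, here is a minimal sketch (not the tool used to produce the workshop files) that collapses per-base read depths into bedGraph records, one per run of identical non-zero depth. In practice, such files are typically produced with tools like bedtools genomecov (-bg option) and converted to bigWig with UCSC's bedGraphToBigWig.

```python
def depths_to_bedgraph(chrom, depths, offset=0):
    """Collapse per-base depths into (chrom, start, end, value) runs.

    depths[i] is the depth at 0-based position offset + i. Runs of identical
    value become one half-open [start, end) record; zero-depth runs are skipped.
    """
    records = []
    run_start, run_value = 0, None
    for i, depth in enumerate(depths):
        if depth != run_value:
            if run_value:  # close the previous non-zero run
                records.append((chrom, offset + run_start, offset + i, run_value))
            run_start, run_value = i, depth
    if run_value:  # close the final run
        records.append((chrom, offset + run_start, offset + len(depths), run_value))
    return records


def format_bedgraph(records):
    """Render records as tab-separated bedGraph lines."""
    return "".join(f"{c}\t{s}\t{e}\t{v}\n" for c, s, e, v in records)


records = depths_to_bedgraph("chr19", [0, 0, 3, 3, 3, 1, 0, 2], offset=1000)
print(records)
# [('chr19', 1002, 1005, 3), ('chr19', 1005, 1006, 1), ('chr19', 1007, 1008, 2)]
```

Eight positions collapse into three records here; on real coverage tracks, long stretches of constant depth are what make bedGraph so much smaller than the reads themselves.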

Load the bigWig file GMP_WT_Cebpa_mm9_chr19.bigWig and the bedGraph file GMP_WT_Cebpa_mm9_chr19.bedGraph into IGV via the "File > Load from File..." menu.
3.2. Inspecting the outcome of the data filtering and comparing signal with control

Let us repeat the steps in 3.1 with the mapping files archives containing


Pay particular attention to the scale of the data, which can be misleading. In particular, have a look at the regions identified as potential artifacts in the previous session:

chr19 53933711 53933911 macs2_bw300dup1p0.001_Cebpa_vs_IgG/macs2_bw300dup1p0.001_Cebpa_vs_IgG_peak_1399 30 . 2.57541 3.03847 0.90463 0
chr19 42912219 42912364 macs2_bw300dup1p0.001_Cebpa_vs_IgG/macs2_bw300dup1p0.001_Cebpa_vs_IgG_peak_1160 31 . 2.41011 3.15615 0.97725 56
chr19 41337608 41337748 macs2_bw300dup1p0.001_Cebpa_vs_IgG/macs2_bw300dup1p0.001_Cebpa_vs_IgG_peak_1092 35 . 2.73377 3.56270 1.34256 91
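The three lines above are narrowPeak records (chrom, start, end, name, score, strand, signal value, -log10 p-value, -log10 q-value, summit offset). BED-like files use 0-based, half-open coordinates, whereas the IGV locus box expects 1-based, inclusive positions, so a small helper avoids off-by-one errors when jumping to these regions. This is a sketch with our own field and function names, and the 500 bp flank in the example is an arbitrary choice.

```python
from dataclasses import dataclass


@dataclass
class NarrowPeak:
    chrom: str
    start: int      # 0-based, inclusive (BED convention)
    end: int        # 0-based, exclusive
    name: str
    score: int
    strand: str
    signal: float   # enrichment signal value (column 7)
    pvalue: float   # -log10
    qvalue: float   # -log10
    peak: int       # summit offset from start, -1 if none


def parse_narrowpeak(line):
    """Parse one narrowPeak line (tab- or space-separated)."""
    f = line.split()
    return NarrowPeak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), int(f[9]))


def summit(p):
    """0-based summit position, or None when no point source was called."""
    return None if p.peak == -1 else p.start + p.peak


def igv_locus(p, flank=0):
    """IGV-style locus string: 1-based, inclusive, optionally padded."""
    start = max(1, p.start + 1 - flank)
    return f"{p.chrom}:{start}-{p.end + flank}"


line = ("chr19 53933711 53933911 macs2_bw300dup1p0.001_Cebpa_vs_IgG/"
        "macs2_bw300dup1p0.001_Cebpa_vs_IgG_peak_1399 30 . 2.57541 3.03847 0.90463 0")
peak = parse_narrowpeak(line)
print(igv_locus(peak), igv_locus(peak, flank=500), summit(peak))
# chr19:53933712-53933911 chr19:53933212-53934411 53933711
```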

 

4. Load the BED-formatted files obtained by calling peak summits with MACS

Let us look at the peaks called by MACS. IGV can load .bed files (in which the summits have been stored) as well as .narrowPeak files.

Download them from the server, or use the files you created in the previous session, and load them into IGV:
macs2_bw300dup1p0.001_Cebpa_peaks.narrowPeak
macs2_bw300dup1p0.001_Cebpa_summits.bed
macs2_bw300dup1q0.001summit_Cebpa_vs_IgG_peaks.narrowPeak
macs2_bw300dup1q0.001summit_Cebpa_vs_IgG_summits.bed

Let’s have another look at the dubious “peaks” that were found when the IgG control was omitted from the peak-calling step.
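One way to list such peaks programmatically is to keep the peaks called without the IgG control that overlap no peak of the Cebpa-vs-IgG call set. The sketch below does this with a plain interval-overlap test on (chrom, start, end) tuples. It is quadratic per chromosome, which is fine for a single chromosome such as chr19; bedtools intersect -v is the usual tool for whole genomes. The function names are ours.

```python
def read_intervals(path):
    """Read (chrom, start, end) from a BED-like file, skipping track/comment lines."""
    intervals = []
    with open(path) as fh:
        for line in fh:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            f = line.split()
            intervals.append((f[0], int(f[1]), int(f[2])))
    return intervals


def overlaps(a, b):
    """True if half-open intervals a and b on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def only_without_control(no_control, with_control):
    """Peaks from the no-control call set that overlap no peak of the control-based set."""
    by_chrom = {}
    for q in with_control:
        by_chrom.setdefault(q[0], []).append(q)
    return [p for p in no_control
            if not any(overlaps(p, q) for q in by_chrom.get(p[0], []))]
```

For example, only_without_control(read_intervals("macs2_bw300dup1p0.001_Cebpa_peaks.narrowPeak"), read_intervals("macs2_bw300dup1q0.001summit_Cebpa_vs_IgG_peaks.narrowPeak")) returns the candidate artifacts to inspect in IGV.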


5. Exporting results and saving your session

IGV lets you export views of interest in a high-quality, easy-to-edit SVG format for publication, as well as in lower-resolution PNG. Importantly, your work can be saved as a session that you can easily come back to.
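When the same loci have to be exported repeatedly (for example the suspected artifacts listed in 3.2), an IGV batch script can produce the snapshots for you; it can be run from IGV's Tools > Run Batch Script menu. The sketch below only writes the script text, using the batch commands new, genome, load, snapshotDirectory, goto and snapshot. The file names and output directory are placeholders, paths containing spaces are not handled, and IGV picks PNG or SVG output from the snapshot file extension.

```python
def igv_batch_script(genome, tracks, loci, snapshot_dir, fmt="svg"):
    """Return IGV batch commands that load tracks and snapshot each locus."""
    lines = ["new", f"genome {genome}"]
    lines += [f"load {track}" for track in tracks]
    lines.append(f"snapshotDirectory {snapshot_dir}")
    for locus in loci:
        # e.g. chr19:53933712-53933911 -> chr19_53933712_53933911.svg
        name = locus.replace(":", "_").replace("-", "_")
        lines += [f"goto {locus}", f"snapshot {name}.{fmt}"]
    return "\n".join(lines) + "\n"


script = igv_batch_script(
    "IGV_genome.mm9_chr19_only/mm9_chr19.genome",
    ["GMP_WT_Cebpa_mm9_chr19.bigWig", "macs2_bw300dup1p0.001_Cebpa_peaks.narrowPeak"],
    ["chr19:53933712-53933911", "chr19:42912220-42912364"],
    "snapshots",
)
print(script)
```

Save the printed text to a file (e.g. igv_snapshots.txt, a hypothetical name) and run it from IGV.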


Viewing the mapped reads and MACS-called peaks via web-based browsers (UCSC Genome Browser and ZENBU)

 

Among the web-based genome browsers, UCSC’s is probably the most popular and benefits from large collections of publicly available datasets and annotations. It requires data to be loaded either one file at a time or by setting up a web-accessible folder. The ZENBU genome browser is a newer web-based browser that lets registered users upload large amounts of data through a simple interface.

1. Data presentation in web-based browsers

The main advantage of web-based systems lies in their easy access to publicly available data. They differ in how they present the data:

  • UCSC is organized in tracks, each containing a “single” data source. Tracks are assembled into views that can be exported and shared.
  • ZENBU has a more complex organization, separating data sources (annotations, NGS datasets) from tracks. Tracks are also grouped into views. All three (data sources, tracks and views) can be saved and shared independently.


http://tblab-csi.nus.edu.sg/zenbu/

 


http://genome.ucsc.edu/cgi-bin/hgTracks

 

 

2.1 Loading your data into the UCSC genome browser
From the home page or from an existing view, click on "Manage custom tracks".

This opens an upload panel.

Load the bedGraph file GMP_WT_Cebpa_mm9_chr19.bedGraph and the bigWig file GMP_WT_Cebpa_mm9_chr19.bigWig (UCSC does not accept .bigWig files as direct uploads: they have to be placed on a web-accessible server and referenced by their URL), as well as the files you created in the previous session:
macs2_bw300dup1p0.001_Cebpa_peaks.narrowPeak
macs2_bw300dup1p0.001_Cebpa_summits.bed
macs2_bw300dup1q0.001summit_Cebpa_vs_IgG_peaks.narrowPeak
macs2_bw300dup1q0.001summit_Cebpa_vs_IgG_summits.bed
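Custom tracks are described by a “track” line placed at the top of the uploaded text. The sketch below builds such lines for the formats used here, following the UCSC convention that bigWig, bigBed and BAM tracks are not uploaded but referenced through a bigDataUrl pointing to a web-accessible copy of the file. The track names, descriptions and the example.org URL are placeholders.

```python
REMOTE_ONLY = {"bigWig", "bigBed", "bam"}  # must be referenced by URL
INLINE = {"bedGraph", "narrowPeak"}        # text formats declared with type=


def ucsc_track_line(kind, name, description, big_data_url=None):
    """Return a UCSC custom-track header line for a few common formats."""
    attrs = f'name="{name}" description="{description}"'
    if kind in REMOTE_ONLY:
        if not big_data_url:
            raise ValueError(f"{kind} tracks must be referenced by URL (bigDataUrl)")
        return f"track type={kind} {attrs} bigDataUrl={big_data_url}"
    if kind in INLINE:
        return f"track type={kind} {attrs}"
    if kind == "bed":
        return f"track {attrs}"  # plain BED needs no type attribute
    raise ValueError(f"unsupported track kind: {kind}")


def with_track_line(text, track_line):
    """Prepend a track line to the contents of a text track file."""
    return track_line + "\n" + text


print(ucsc_track_line("bigWig", "Cebpa signal", "GMP WT Cebpa chr19",
                      "https://example.org/GMP_WT_Cebpa_mm9_chr19.bigWig"))
```

For the text formats (.bedGraph, .narrowPeak, .bed), with_track_line can be applied to the file contents before pasting or uploading them; for the bigWig, only the single track line is submitted.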

 

2.2 Loading data into ZENBU

Loading data into ZENBU requires a user account (the data have already been loaded, so you may skip this step).

Click on "sign in" and provide an email address. You will shortly receive an email with a link inviting you to change your password.


ZENBU can load BAM files directly, so there is no need to convert them to bedGraph or bigWig. Filtering on mapQ values or removing duplicates can be done “on the fly”.
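To clarify what these two filters mean, here is a generic sketch operating on simplified read records; it is not ZENBU's implementation. Reads below a mapping-quality threshold are dropped (the threshold of 20 is an arbitrary example), and reads sharing the same chromosome, strand and 5′ end are treated as PCR duplicates, keeping the first one seen. The Read record and function name are hypothetical.

```python
from typing import NamedTuple


class Read(NamedTuple):
    chrom: str
    start: int   # 0-based leftmost position
    end: int     # exclusive
    strand: str  # "+" or "-"
    mapq: int


def filter_reads(reads, min_mapq=20, remove_duplicates=True):
    """Drop low-mapQ reads and, optionally, duplicates sharing the same 5' end."""
    seen = set()
    kept = []
    for r in reads:
        if r.mapq < min_mapq:
            continue
        # The 5' end is the leftmost base on "+" and the rightmost base on "-".
        five_prime = r.start if r.strand == "+" else r.end - 1
        key = (r.chrom, five_prime, r.strand)
        if remove_duplicates and key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept
```

Doing this at display time, as ZENBU does, means the thresholds can be changed without regenerating any file.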

Once signed in, click on "User upload and share" and then "Data upload" to load the BAM files.

To load .narrowPeak files, which are simply tab-delimited files, you can instruct ZENBU to load any tab-delimited file and keep the meaning of each field by providing a header that it recognizes.

Open the .narrowPeak file in a text editor (Notepad, for example) and add the following lines at the top of the file, replacing __NAME__ with a "pretty name" (the fields of the last line must be separated by tabs, not spaces):
##ParameterValue[filetype] = osc
##ParameterValue[display_name] = __NAME__
##ExperimentMetadata[x][eedb:display_name] = __NAME__
##ColumnVariable[eedb:chrom] = chromosome name
##ColumnVariable[eedb:start.0base] = chromosome start in 0base coordinate system
##ColumnVariable[eedb:end] = chromosome end
##ColumnVariable[eedb:strand] = chromosome strand
##ColumnVariable[eedb:score] = score or significance of the feature
##ColumnVariable[exp.signal.x] = measurement of overall (usually, average) enrichment for the region
##ColumnVariable[exp.qvalue.x] = measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned
##ColumnVariable[exp.pvalue.x] = measurement of statistical significance (-log10). Use -1 if no pValue is assigned
##ColumnVariable[point_source] = point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called
eedb:chrom	eedb:start.0base	eedb:end	eedb:name	eedb:score	eedb:strand	exp.signal.x	exp.pvalue.x	exp.qvalue.x	point_source
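Instead of editing each file by hand, the header can be prepended with a short script. The sketch below reproduces the header lines shown above, substitutes the display name for __NAME__ and joins the column names with tabs. The function name is ours, and the narrowPeak body is copied unchanged (it is assumed to be tab-separated already, as MACS writes it).

```python
OSC_HEADER_TEMPLATE = """\
##ParameterValue[filetype] = osc
##ParameterValue[display_name] = __NAME__
##ExperimentMetadata[x][eedb:display_name] = __NAME__
##ColumnVariable[eedb:chrom] = chromosome name
##ColumnVariable[eedb:start.0base] = chromosome start in 0base coordinate system
##ColumnVariable[eedb:end] = chromosome end
##ColumnVariable[eedb:strand] = chromosome strand
##ColumnVariable[eedb:score] = score or significance of the feature
##ColumnVariable[exp.signal.x] = measurement of overall (usually, average) enrichment for the region
##ColumnVariable[exp.qvalue.x] = measurement of statistical significance using false discovery rate (-log10). Use -1 if no qValue is assigned
##ColumnVariable[exp.pvalue.x] = measurement of statistical significance (-log10). Use -1 if no pValue is assigned
##ColumnVariable[point_source] = point-source called for this peak; 0-based offset from chromStart. Use -1 if no point-source called
"""

# narrowPeak column order, one name per column (joined with tabs below)
COLUMNS = ["eedb:chrom", "eedb:start.0base", "eedb:end", "eedb:name", "eedb:score",
           "eedb:strand", "exp.signal.x", "exp.pvalue.x", "exp.qvalue.x", "point_source"]


def narrowpeak_to_osc(narrowpeak_text, display_name):
    """Return narrowPeak text preceded by the ZENBU header shown above."""
    header = OSC_HEADER_TEMPLATE.replace("__NAME__", display_name)
    return header + "\t".join(COLUMNS) + "\n" + narrowpeak_text.rstrip("\n") + "\n"
```

For example, narrowpeak_to_osc(open("macs2_bw300dup1q0.001summit_Cebpa_vs_IgG_peaks.narrowPeak").read(), "Cebpa vs IgG peaks") returns text that can be saved to a new file and uploaded.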
3.1 Visualizing data with ZENBU

Let us start with a simple pre-constructed view.

Go to the Data explorer and search for the view named "mm9 basic".


We will then add all the bam files related to this workshop.

Click on "Configure New Track". Select "data source type: experiment only" and "collaborative project: Public sharing". Type "Hassemann" in the search box.

This generates a single track that we can then filter as needed.

Select "Signal-Histogram", which automatically switches the stream processing script to "expression binning mode".
3.2 Looking at specific loci (saved views)

Let’s look at some nice examples (thanks, Samuel):
http://tblab-csi.nus.edu.sg/zenbu/dex/#section=Views;search=workshop;collab=curated

3.3 Complex data rendering using ZENBU

http://tblab-csi.nus.edu.sg/zenbu/gLyphs/#config=BlQbWqPuCdauFWI5jGtZXC;loc=mm9::chr19:4683175..4905285
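The link above encodes both the saved configuration (config=) and the displayed region (loc=assembly::chrom:start..end). Assuming that pattern holds for other views (it is inferred from this single URL, not from ZENBU documentation, and whether the coordinates are 0- or 1-based is not stated here), the sketch below builds links to the same configuration at other loci, for example to send collaborators straight to a peak of interest.

```python
def zenbu_view_url(base, config_id, assembly, chrom, start, end):
    """Build a ZENBU gLyphs link following the pattern of the URL above (assumed)."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{base}/gLyphs/#config={config_id};loc={assembly}::{chrom}:{start}..{end}"


url = zenbu_view_url("http://tblab-csi.nus.edu.sg/zenbu", "BlQbWqPuCdauFWI5jGtZXC",
                     "mm9", "chr19", 4683175, 4905285)
print(url)
# http://tblab-csi.nus.edu.sg/zenbu/gLyphs/#config=BlQbWqPuCdauFWI5jGtZXC;loc=mm9::chr19:4683175..4905285
```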
