BIOINFORMATICS WORKSHOP (APR 2019)
Visualize and Share Large Raw Sequencing Datasets
Raw sequencing datasets generated by genome-seq, exome-seq, RNA-seq, ChIP-seq and similar experiments consist of sequences assigned to genomic locations (BAM files). These files are usually too large to be manipulated by non-bioinformaticians. Nonetheless, assessing the quality of an experiment and getting a first overview of the data are within reach of a larger audience. Manipulating BAM files can help biologists understand their data and troubleshoot their experiments.
In this mini-workshop, we will explain how these files can be handled by biologists without bioinformatics knowledge using a conventional computer.
In detail, attendees will be trained to:
- Perform and interpret quality control on BAM files
- Create their own UCSC genome browser track to navigate through BAM files
- Transfer/share large BAM files via public servers.
Speakers:
Touati Benoukraf, Ph.D., Canada Research Chair (Tier II) in Bioinformatics for Personalised Medicine; Assistant Professor, Faculty of Medicine, Discipline of Genetics, Memorial University of Newfoundland; Visiting Assistant Professor, Cancer Science Institute of Singapore, NUS
Denis Thieffry, Ph.D., Group Leader and Professor, Institute of Biology, Ecole Normale Superieure de Paris
Venue: NUS, Centre for Translational Medicine (MD6), #04-01 SMART Classroom, 14 Medical Drive, S117599
Date: 16 April 2019, Tuesday
Time: 1 pm – 5 pm
Workshop Exercises
The aim of this workshop is to learn how to preprocess a raw sequencing file (FASTQ) and visualize it. As an example, we will use an RNA-seq dataset from the ENCODE consortium, generated in the K562 cell line.
To speed up all processes during this workshop, we will provide only a fragment of the data (chromosome 19 only).
As explained during the lecture, you will use the Galaxy platform (usegalaxy.org) to perform quality control and to generate the files needed for visualization. The files will then be uploaded to Cyverse, a cloud system that can serve data to genome browsers such as the UCSC Genome Browser.
Step 1:
Download both FASTQ files from the following link and run FastQC on them using Galaxy (https://usegalaxy.org/).
https://tinyurl.com/y2yq7fka
Step 2:
Go to usegalaxy.org and upload your FASTQ files.
Step 3: Perform a QC using FastQC
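To give an idea of what the "per base sequence quality" plot of a QC report summarizes, here is a minimal Python sketch that computes the mean Phred score at each read position. It is not how FastQC itself is implemented; it assumes Phred+33 encoded qualities and well-formed four-line FASTQ records, and the two reads in the example are hypothetical, not taken from the workshop dataset.

```python
from statistics import mean


def parse_fastq(lines):
    """Yield (read_id, sequence, quality_string) from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header[1:], seq, qual


def per_base_mean_quality(records, offset=33):
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    columns = []
    for _read_id, _seq, qual in records:
        for i, ch in enumerate(qual):
            if i == len(columns):
                columns.append([])
            columns[i].append(ord(ch) - offset)
    return [mean(col) for col in columns]


# Hypothetical two-read example: 'I' encodes Q40, '5' encodes Q20, '!' encodes Q0.
example = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "II5!",
]
means = per_base_mean_quality(parse_fastq(example))
print(means)
```

A drop in the mean score toward the end of the reads, as in this toy example, is the kind of pattern to look for in the real report.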
Step 4: Load BAM files
Due to time constraints, we will not align the FASTQ reads to the reference genome.
Please download the BAM files from the previous link.
Here, we will use bamCoverage, a tool that piles up reads into a coverage signal. In this specific example, the piled-up reads will represent transcription intensities.
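The pileup idea can be sketched in a few lines of Python: count how many reads cover each base, then average the depth in fixed-size bins. This is only an illustration under simplifying assumptions (reads given as hypothetical half-open `(start, end)` intervals on one chromosome, no spliced alignments, no normalization); bamCoverage itself handles all of these.

```python
def binned_coverage(reads, region_start, region_end, bin_size=10):
    """Average per-base read depth in fixed-size bins over [region_start, region_end)."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        # Clip each read to the region before counting it.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1

    n_bins = (len(depth) + bin_size - 1) // bin_size
    bins = []
    for b in range(n_bins):
        chunk = depth[b * bin_size:(b + 1) * bin_size]
        bins.append(sum(chunk) / len(chunk))
    return bins


# Hypothetical reads as half-open intervals (start, end).
reads = [(0, 10), (5, 15), (5, 25)]
signal = binned_coverage(reads, region_start=0, region_end=30, bin_size=10)
print(signal)
```

Larger bins give a smoother and smaller signal file, at the cost of resolution.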
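Important note: After generating a BAM file, the BAM has to be “optimized” in 2 steps: i) sorting and ii) indexing. Sorting orders the reads by genomic position; indexing creates a companion file (.bai) that records where each region sits in the main file, so that tools can jump directly to it.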
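Why sorting and indexing matter can be shown with a toy model in Python. Once reads are ordered by chromosome and position, a small index allows a region to be retrieved by binary search instead of scanning every read. This is a conceptual sketch only: the real BAM index uses a binning scheme over compressed file offsets, and the read tuples and function names below are hypothetical.

```python
from bisect import bisect_left


def sort_reads(reads):
    """Sort (chrom, start, name) records by chromosome, then start position."""
    return sorted(reads, key=lambda r: (r[0], r[1]))


def build_index(sorted_reads):
    """Per chromosome: offset of its first record and the list of start positions."""
    index = {}
    for i, (chrom, start, _name) in enumerate(sorted_reads):
        entry = index.setdefault(chrom, {"first": i, "starts": []})
        entry["starts"].append(start)
    return index


def fetch(sorted_reads, index, chrom, start, end):
    """Return reads whose start lies in [start, end), located by binary search."""
    entry = index.get(chrom)
    if entry is None:
        return []
    lo = bisect_left(entry["starts"], start)
    hi = bisect_left(entry["starts"], end)
    base = entry["first"]
    return sorted_reads[base + lo:base + hi]


# Hypothetical unsorted reads: (chromosome, start, read name).
reads = [("chr19", 500, "r3"), ("chr1", 100, "r1"),
         ("chr19", 120, "r2"), ("chr19", 900, "r4")]
ordered = sort_reads(reads)
index = build_index(ordered)
hits = fetch(ordered, index, "chr19", 100, 600)
print([name for _chrom, _start, name in hits])
```

The lookup only works because the reads were sorted first, which is why indexing an unsorted BAM file fails.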
Step 5: Convert BAM files into coverage files
Note: For RNA-seq, coverage can be computed separately for each strand.
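Strand separation relies on the FLAG field of each alignment, where bit 0x10 marks a read aligned to the reverse strand. The sketch below splits hypothetical single-end reads by that bit. Note the assumption: for paired-end or stranded libraries, which transcript strand a read reports also depends on the library protocol and on the mate number, which this toy example ignores.

```python
REVERSE_FLAG = 0x10  # SAM FLAG bit: read aligned to the reverse strand


def read_strand(flag):
    """Return '-' if the reverse-strand bit is set in the SAM FLAG, else '+'."""
    return "-" if flag & REVERSE_FLAG else "+"


def split_by_strand(reads):
    """Group (name, flag) records by alignment strand (single-end assumption)."""
    by_strand = {"+": [], "-": []}
    for name, flag in reads:
        by_strand[read_strand(flag)].append(name)
    return by_strand


# Hypothetical reads: (read name, SAM FLAG). 256 marks a secondary alignment.
reads = [("r1", 0), ("r2", 16), ("r3", 256), ("r4", 272)]
groups = split_by_strand(reads)
print(groups)
```

Each group can then be piled up on its own to obtain one coverage track per strand.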
Then, convert your coverage file into BigWig:
Step 6: Create a Cyverse Hub for UCSC Genome Browser.
Log in to Cyverse.
Create the following folder and files:
folder hg38
file genomes.txt
file hub.txt
File contents:
hub.txt
hub Project-Name
shortLabel RNAseq-test
longLabel RNAseq-test
genomesFile https://de.cyverse.org/dl/d/0D9281C4-D365-433C-9A1E-765AEDD717B2/genomes.txt
email tbenoukraf@mun.ca
descriptionUrl ucscHub.html
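If you prefer to generate hub.txt rather than type it, the following Python sketch renders the settings above, one per line, and checks that the fields UCSC requires (hub, shortLabel, longLabel, genomesFile, email) are present. It also builds a UCSC Genome Browser link that loads a hub through the `hubUrl` parameter. The function names are ours, and the hub address `https://example.org/hub.txt` is a placeholder: replace it with the public link to your own hub.txt.

```python
from urllib.parse import urlencode

REQUIRED = ("hub", "shortLabel", "longLabel", "genomesFile", "email")


def render_hub(settings):
    """Render hub.txt content, one 'key value' pair per line."""
    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        raise ValueError("missing hub settings: " + ", ".join(missing))
    return "\n".join(f"{key} {value}" for key, value in settings.items()) + "\n"


def ucsc_hub_link(hub_url, db="hg38"):
    """Build a UCSC Genome Browser URL that attaches the hub at hub_url."""
    query = urlencode({"db": db, "hubUrl": hub_url})
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + query


settings = {
    "hub": "Project-Name",
    "shortLabel": "RNAseq-test",
    "longLabel": "RNAseq-test",
    "genomesFile": "https://de.cyverse.org/dl/d/0D9281C4-D365-433C-9A1E-765AEDD717B2/genomes.txt",
    "email": "tbenoukraf@mun.ca",
    "descriptionUrl": "ucscHub.html",
}
hub_text = render_hub(settings)
print(hub_text)

# Placeholder address (assumption): use the public link to your own hub.txt.
link = ucsc_hub_link("https://example.org/hub.txt")
print(link)
```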