The goal of this first exercise is to learn how to predict potential regulatory motifs with no a priori knowledge of the regulating factor(s).
A typical situation is a group of co-expressed genes (e.g. from microarray or RNA-seq data) for which we want to identify a possible common regulatory motif. These approaches are also commonly used with genome-wide detection of binding regions, such as ChIP-chip or ChIP-seq data.
You will use the Regulatory Sequence Analysis Tools (RSA-tools, commonly called “RSAT”). From this suite, you will use two programs:
- oligo-analysis, to discover over-represented words
- dyad-analysis, to discover over-represented dyads (spaced motifs)
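To make the notion of a "word" concrete before starting, here is a minimal Python sketch that counts every overlapping oligonucleotide of length k in a set of sequences. It is a toy illustration only, not the RSAT implementation: the function name `count_oligos` and the toy sequences are made up for this example, only one strand is scanned, and overlapping occurrences are all counted.

```python
from collections import Counter


def count_oligos(sequences, k):
    """Count every overlapping k-letter word across all sequences.

    Hypothetical helper for illustration: single strand only,
    words containing letters other than A, C, G, T are skipped.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):
                counts[word] += 1
    return counts


# Toy input (invented sequences, not the exercise data).
toy_sequences = ["TTGACAAATTGACA", "GGTTGACATT"]
oligo_counts = count_oligos(toy_sequences, 6)
print(oligo_counts["TTGACA"])  # 3
```

A raw count like this says nothing about over-representation by itself: it only becomes informative once it is compared with the number of occurrences expected by chance.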
1 – Discovering over-represented oligonucleotides
You will discover a potential regulatory motif in the genes regulated by the transcription factor (TF) Spo0A, the main regulator of sporulation in the bacterium Bacillus subtilis. The list of target genes was obtained from a ChIP-chip experiment.
You will partly follow this protocol: “Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences” by Defrance et al., Nature Protocols (2008).
- Read the Introduction. Study case 2 can be skipped, as this course does not cover phylogenetic footprints. Be sure not to skip the section “Other applications of this protocol”!
- You will use this sequence file: Bacillus_subtilis_Spo0A_ChIP-chip_target_upseq.fasta
- Follow the procedure from steps 1 to 13 with the program oligo-analysis (option A).
Analyzing the results
How many sequences were used? Tip: look at the information above the table containing the discovered words.
If we are looking for over-represented oligomers of size k, what is the maximum order of the background Markov model?
Look at the top result. How many times was this oligonucleotide found in the input set? How many times was it expected? How was this expected number calculated?
In the feature map, how do you explain the fact that some discovered oligonucleotides overlap?
- To finish interpreting the above results, read section Anticipated results, application 1, option A
- Read Box 1, Box 2 and Box 3
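Once you have answered the questions above, the following Python sketch may help you check your understanding of the expected number of occurrences. It assumes one common definition: the probability of the word under a background Markov model, multiplied by the number of positions where the word could occur in the input set. This is a simplified illustration and not the RSAT code: the function names are hypothetical, the model is estimated from the toy background without pseudo-counts, and only one strand is considered. Refer to the protocol for the computation actually performed by oligo-analysis.

```python
from collections import Counter


def kmer_counts(sequences, k):
    """Count all overlapping k-mers in the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def markov_word_probability(word, background, order):
    """Probability of `word` under a Markov model of the given order.

    The model is estimated from the `background` sequences
    (hypothetical helper, no pseudo-counts). Order 0 corresponds
    to independent nucleotides.
    """
    if order >= len(word):
        raise ValueError("order must be smaller than the word length")
    transitions = kmer_counts(background, order + 1)
    prob = 1.0
    if order > 0:
        # Probability of the first `order` letters of the word.
        prefixes = kmer_counts(background, order)
        prob = prefixes[word[:order]] / sum(prefixes.values())
    for i in range(order, len(word)):
        context = word[i - order:i]
        context_total = sum(transitions[context + x] for x in "ACGT")
        if context_total == 0:
            return 0.0  # context never seen in the background
        prob *= transitions[context + word[i]] / context_total
    return prob


def expected_occurrences(word, sequences, background, order):
    """Word probability times the number of possible positions."""
    positions = sum(max(len(s) - len(word) + 1, 0) for s in sequences)
    return positions * markov_word_probability(word, background, order)


# Toy data (invented, not the exercise data).
background = ["ACGT" * 25]
sequences = ["ACGACGACG", "TTTTTTTT"]
expected = expected_occurrences("ACG", sequences, background, order=0)
print(expected)  # 0.203125
```

In this toy example the word ACG is observed 3 times but expected only about 0.2 times with independent nucleotides. Changing `order` changes the expected value, which is why the choice of background model matters when you interpret the results table.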
2 – Discovering over-represented dyads (spaced motifs)
- You will use this sequence file: Escherichia_coli_K12_FNR_RegulonDB_target_upseq.fasta
- Follow (again!) the procedure from steps 1 to 13 with the program dyad-analysis (option B). Note that you are now working with Escherichia coli K12.
Analyzing the results
Did you find any significant spaced motifs?
How were the dyads assembled to obtain the final motif?
- To finish interpreting the above results, read section Anticipated results, application 1, option B
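To visualise what a dyad is, the Python sketch below counts pairs of short words separated by a fixed number of unspecified positions. It is only a toy illustration and not the dyad-analysis implementation: the function name `count_dyads`, the tuple representation `(left, spacing, right)`, the default parameter values and the toy sequences are all assumptions made for this example, and only one strand is scanned.

```python
from collections import Counter


def count_dyads(sequences, monad_len=3, min_spacing=0, max_spacing=20):
    """Count dyads, i.e. two words of length `monad_len` separated
    by `spacing` arbitrary positions.

    Each dyad is stored as a (left, spacing, right) tuple.
    Hypothetical helper for illustration, single strand only.
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for spacing in range(min_spacing, max_spacing + 1):
            span = 2 * monad_len + spacing
            for i in range(len(seq) - span + 1):
                left = seq[i:i + monad_len]
                right = seq[i + monad_len + spacing:i + span]
                counts[(left, spacing, right)] += 1
    return counts


# Toy input (invented sequences, not the exercise data).
toy_sequences = ["ACGAAAATGT", "CCACGCCCCTGTCC"]
dyad_counts = count_dyads(toy_sequences)
print(dyad_counts[("ACG", 4, "TGT")])  # 2
```

Here the pair ACG and TGT, separated by 4 positions, occurs in both toy sequences even though the spacer differs (AAAA versus CCCC). An oligonucleotide count would miss this shared signal, which is the motivation for analysing dyads separately.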
1. Defrance, M., Janky, R., Sand, O. & van Helden, J. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nature Protocols 3, 1589–1603 (2008).