Oligo-analysis: discovering over-represented k-mers



The goal of this first exercise is to learn how to predict potential regulatory motifs with no a priori knowledge of the regulating factor(s).

A typical situation is a group of co-expressed genes (e.g. from microarray or RNA-seq data) for which we want to identify a possible common regulatory motif. These approaches are also commonly applied to binding regions detected genome-wide, for example with ChIP-chip or ChIP-seq.

You will use the RSA-tools suite (commonly called “RSAT”). From this suite, you will use two programs:
oligo-analysis, to discover over-represented words (oligonucleotides)
dyad-analysis, to discover over-represented dyads (= spaced motifs)
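
To make the notion of "word" concrete, here is a minimal sketch of k-mer counting in Python. It is not RSAT code: the function name `count_kmers` and the toy sequences are invented for illustration, and the sketch only counts overlapping occurrences on one strand.

```python
from collections import Counter

def count_kmers(sequences, k):
    """Count every overlapping word of length k in a list of DNA strings."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence, one base at a time.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

# Hypothetical toy sequences (not taken from the course data set).
toy_sequences = ["CCGATTACAGT", "TGATTACACCG", "AAGGATTACAT"]

# Print the three most frequent 7-mers with their occurrence counts.
for word, occurrences in count_kmers(toy_sequences, 7).most_common(3):
    print(word, occurrences)
```

Raw counts alone do not show over-representation: a word is only interesting if it occurs more often than expected under a background model, which is what the programs above evaluate.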

1 – Discovering over-represented oligonucleotides

You will discover a potential regulatory motif of the genes regulated by the TF Spo0A, the main regulator of sporulation in the bacterium Bacillus subtilis. The list of target genes was obtained from a ChIP-chip experiment.
You will partly follow this protocol: “Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences” by Defrance et al., Nature Protocols (2008) (PDF)

  1. Read the Introduction. Study case 2 can be skipped, as this course does not cover the topic of phylogenetic footprints. Be sure not to skip the section Other applications of this protocol!
  2. You will use this sequence file: Bacillus_subtilis_Spo0A_ChIP-chip_target_upseq.fasta
  3. Follow the procedure from steps 1-13 with the program oligo-analysis (option A)

    Analyzing the results
    How many sequences were used? Tip: look at the information above the table containing the discovered words.
    If we are looking for over-represented oligomers of size k, what is the maximum order of the background Markov model?
    Look at the top result. How many times was this oligonucleotide found in the input set? How many times was it expected? How was this expected number calculated?
    In the feature-map, how do you explain the fact that some discovered oligonucleotides overlap?
  4. To finish interpreting the above results, read section Anticipated results, application 1, option A
  5. Read Box 1, Box 2 and Box 3 
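
The questions above revolve around the expected number of occurrences of a word under a background Markov model. The sketch below shows one generic way to compute it in Python. It is an illustration under stated assumptions, not RSAT's implementation: the function names, the choice of estimating the model from a list of background sequences, and the toy strings in the usage lines are all hypothetical, and RSAT offers several background model options that are not reproduced here.

```python
from collections import Counter

ALPHABET = "ACGT"

def count_words(sequences, size):
    """Count overlapping words of the given size (size 0 yields no counts)."""
    counts = Counter()
    if size == 0:
        return counts
    for seq in sequences:
        for i in range(len(seq) - size + 1):
            counts[seq[i:i + size]] += 1
    return counts

def markov_word_probability(word, background, order):
    """Probability of `word` under a Markov model of the given order,
    with parameters estimated from the `background` sequences."""
    prefix_counts = count_words(background, order)
    transition_counts = count_words(background, order + 1)
    prob = 1.0
    if order > 0:
        # Probability of the first `order` letters, taken as a block.
        total = sum(prefix_counts.values())
        if total == 0:
            return 0.0
        prob = prefix_counts[word[:order]] / total
    for i in range(order, len(word)):
        # Probability of each following letter given the preceding context.
        context = word[i - order:i]
        denom = sum(transition_counts[context + base] for base in ALPHABET)
        if denom == 0:
            return 0.0
        prob *= transition_counts[context + word[i]] / denom
    return prob

def expected_occurrences(word, sequences, background, order):
    """Expected count = word probability x number of possible positions."""
    positions = sum(max(len(s) - len(word) + 1, 0) for s in sequences)
    return positions * markov_word_probability(word, background, order)

# Hypothetical toy input: order 0 means independent nucleotides.
print(expected_occurrences("AC", ["ACAC"], ["ACGT"], 0))
```

Note that a model of order m is estimated from the frequencies of words of length m + 1, which is worth keeping in mind when thinking about the maximum order question.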

2 – Discovering over-represented dyads (spaced motifs)

FNR represses genes involved in aerobic respiration and activates genes required for anaerobic respiration. You will discover a potential spaced motif in the promoters of 98 target genes of the factor FNR, in Escherichia coli K12.
  1. You will use this sequence file: Escherichia_coli_K12_FNR_RegulonDB_target_upseq.fasta
  2. Follow (again!) the procedure from steps 1-13 with the program dyad-analysis (option B)
    Note that you are now working with Escherichia coli K12.

    Analyzing the results
    Did you find significant spaced motif(s)?
    How were the dyads assembled to obtain the final motif?

  3. To finish interpreting the above results, read section Anticipated results, application 1, option B 
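
As a complement to the exercise above, here is a minimal Python sketch of what counting dyads means: every pair of short words separated by a spacer of fixed width is counted, for each spacing in a range. This is not RSAT code. The function name `count_dyads`, the default parameters (trinucleotides, spacings 0 to 20) and the toy sequences are assumptions chosen for illustration.

```python
from collections import Counter

def count_dyads(sequences, monad_len=3, min_spacing=0, max_spacing=20):
    """Count (left word, spacing, right word) triples in DNA strings.
    The spacer content is ignored; only its width matters."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for spacing in range(min_spacing, max_spacing + 1):
            span = 2 * monad_len + spacing  # total width of the dyad
            for i in range(len(seq) - span + 1):
                left = seq[i:i + monad_len]
                right = seq[i + monad_len + spacing:i + span]
                counts[(left, spacing, right)] += 1
    return counts

# Hypothetical toy sequences sharing ACG, a 4-base spacer, then TGC.
toy_sequences = ["GGACGTTTTTGCAA", "ACGCCCCTGCGG", "TTACGAGAGTGCT"]
dyads = count_dyads(toy_sequences)
print(dyads[("ACG", 4, "TGC")])
```

In the toy data the spacer differs in every sequence, so no contiguous word of the full width is repeated, whereas the dyad is found in all three. This is the kind of signal that motivates a spaced-motif search.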

 

Bibliography

1. Defrance, M., Janky, R., Sand, O. & van Helden, J. Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences. Nature Protocols 3, 1589–1603 (2008).