forkedTF integration with TFregulomeR

forkedTF utilizes many of the resources and funtions in TFregulomeR. Therefore, TFregulomeR is automatically called when loading forkedTF with library(forkedTF)

Co-Factor exploration

In order to explore the diversity of binding partners of that a give TF might have in a particular cell line, we implemented the function miniCofactorReport(). This function serves as a convenient lite version of cofactorReport included in TFregulomeR. Just by specifying the TF and cell line of interest, miniCofactorReport() will access the “TFregulomeR compendium”, identify the top 9 co-binding partners and generate a PDF report.

library(forkedTF)

miniCofactorReport( TF = "CEBPB", cell = "K562" )
MiniCoFactor report for CEBPB in K562.

MiniCoFactor report for CEBPB in K562.

You can inspect all the different cellLine-TF combinations with TFregulomeR’s dataBrowser() function. For example, to retireve all the TF available for “K562” run:

all_record <- dataBrowser() # Extract all the entries in the TFregulomeR compendium
cell_tf <- cbind(all_record$cell_tissue_name, all_record$TF) # keep the relevant information in a data frame

cell_of_interest <- "K562" # cell line of interest
cell_tf[grep(cell_of_interest,cell_tf),2] 
  [1] "AFF1"     "ARID2"    "ARID3A"   "ATF1"     "ATF2"     "ATF3"    
  [7] "ATF4"     "ATF7"     "BACH1"    "BHLHE40"  "C11orf30" "CBFA2T3" 
 [13] "CEBPB"    "CEBPD"    "CREB1"    "CREB3L1"  "CREM"     "CTCF"    
 [19] "CTCFL"    "CUX1"     "DACH1"    "DPF2"     "E2F1"     "E2F4"    
 [25] "E2F6"     "E2F7"     "E2F8"     "E4F1"     "EGR1"     "ELF1"    
 [31] "ELF4"     "ELK1"     "ESRRA"    "ETV6"     "FOS"      "FOSL1"   
 [37] "FOXA1"    "FOXK2"    "FOXM1"    "GABPA"    "GABPB1"   "GATA1"   
 [43] "IRF1"     "IRF2"     "JUN"      "JUNB"     "JUND"     "KLF1"    
 [49] "KLF13"    "MAFF"     "MAFG"     "MAFK"     "MAX"      "MAZ"     
 [55] "MEF2A"    "MEF2D"    "MEIS2"    "MITF"     "MNT"      "MXI1"    
 [61] "MYBL2"    "MYC"      "MYNN"     "NCOA1"    "NCOA2"    "NFATC3"  
 [67] "NFE2"     "NFE2L1"   "NFE2L2"   "NFIA"     "NFIC"     "NFYB"    
 [73] "NR2C1"    "NR2F1"    "NR2F2"    "NR3C1"    "NRF1"     "NUFIP1"  
 [79] "PBX2"     "PHF21A"   "PKNOX1"   "RBM25"    "REST"     "RFX1"    
 [85] "RFX5"     "RUNX1"    "SIX5"     "SMAD1"    "SMARCA5"  "SMARCC2" 
 [91] "SP1"      "SPI1"     "SRF"      "TAL1"     "TCF12"    "TCF7"    
 [97] "TEAD2"    "TFDP1"    "THAP1"    "THRA"     "USF1"     "USF2"    
[103] "VEZF1"    "YY1"      "ZBTB2"    "ZBTB40"   "ZBTB5"    "ZBTB7A"  
[109] "ZFP36"    "ZFP91"    "ZFX"      "ZHX1"     "ZKSCAN1"  "ZKSCAN8" 
[115] "ZMYM3"    "ZNF143"   "ZNF148"   "ZNF184"   "ZNF24"    "ZNF263"  
[121] "ZNF316"   "ZNF354B"  "ZNF407"   "ZNF410"   "ZNF507"   "ZNF592"  
[127] "ZNF639"   "ZNF740"   "ZNF75A"   "ZNF83"    "ZSCAN29" 

Creation of a FPWM

In order to create a Forked Positioning Weight Matrix (FPWM), we construct multiple PWM, one for each group of peaks from the main TF that overlap with the binding partners of interest. We have created a function called createFPWM that takes as an input the main TF of interest mainTF, a list or vector of binding partners partners, a cell line cell and the postion of the binding site to be forked forkPosition. We foresee that a common application would be to find the most common co-binding partners of a TF in a cell line using miniCofactorReport() and then create FPWM object based on the co-factor report. Here we create a FPWM object based on the top binding partners we observed in the co-factor report above.

createFPWM can take as inputs the IDs in the TFregulomeR compendium.

FPWM_CEBPB_K562 <- createFPWM(mainTF ="CEBPB",
                            partners = c("CEBPD","ATF4","ATF7","ATF3","JUND","FOS"),
                                cell = "K562", 
                        forkPosition = 5)

Aditionally, we can create the same FPWM providing as input the IDs in the TFregulomeR compendium. In this mode use mainTF_MMID to provide the ID of the main TF and partners_MMID for the list or vector of IDs for the binding partners. It is not necessary to specify the cell line as that information is already included in the IDs.

FPWM_CEBPB_K562 <- createFPWM(mainTF_MMID ="MM1_HSA_K562_CEBPB",
                            partners_MMID = c("MM1_HSA_K562_CEBPD",
                                              "MM1_HSA_K562_ATF4",
                                              "MM1_HSA_K562_ATF7",
                                              "MM1_HSA_K562_ATF3",
                                              "MM1_HSA_K562_JUND",
                                              "MM1_HSA_K562_FOS"),
                             forkPosition = 5)

It is very common that a particular TF is not found in a cell line but is present in another. To overcome this limitation, it is possible to compare peaks from different cell lines. To create a FPWM with ChIP-Seq peaks from different cell lines give as input the IDs in TFregulomeR as shown below:

FPWM_different_celllines <- createFPWM(mainTF_MMID ="MM1_HSA_K562_CEBPB",
                                     partners_MMID = c("MM1_HSA_HCT116_CEBPB",
                                                       "MM1_HSA_K562_ATF4",
                                                       "MM1_HSA_GM12878_ATF3"),
                                      forkPosition = 5)

Reverse complement FPWM when the core motif is on the second half of the motif positions.

It might happen that the core motif appears in the second half of the motifs positions while the partners motif is at the beginning. Such is the case for the cofactors of JUND in HepG2 as can be seen in this miniCofactorReport:

miniCofactorReport( TF = "JUND", cell = "HepG2" )
MiniCoFactor report for CEBPB in K562.

MiniCoFactor report for CEBPB in K562.

In such cases, FPWM must be created with the parameter flipMatrix = TRUE to reverse-complement the individual PWM. forkPosition should be set to the position just before the core main TF motif, in this case position 6.

flipped_fpwm <- createFPWM(mainTF ="JUND",
                        partners = c("ATF7","ATF2","JUN","FOSL2","FOS"),
                        cell = "HepG2", 
                        forkPosition = 6,
                        flipMatrix = TRUE)

The resulting FPWM can be drawn with plotFPWM.

plotFPWM(flipped_fpwm, pdfName="flipped_fpwm.pdf")
MiniCoFactor report for CEBPB in K562.

MiniCoFactor report for CEBPB in K562.

Writing FPWMs into a FPWM-Transfac format and standard Transfac format.

One of the most common ways to represent a TF binding motif is through the Transfac format. Therefore, we propose a modified version of this format called FPWMtransfac. In this format the mainTF ID is stored as the “parentLogo” whereas the partners are called “leafLogos”. Information regarding overlap percentage, the number of bases pairs found in the overlapped peaks and the total number of overlapped peaks are found in the “overlappingScore”, “numberOfBasePairs” and “numberOfOverlappingPeaks” fields respectively. Each comma separated value in those fields corresponds to the TF in that position in leafLogos. For example, the first value in overlappingScore corresponds to the first TF in leafLogos, the second value to the second TF and so forth. This is also the case for the nucleotide frequency matrix. From position 1 until the forkPosition corresponds to the mainTF shown in the parentLogo field whereas the looping values after the forkPosition correspond to the binding partners in the order shown in the leafLogos field.

FPWM_CEBPB_2partners <- createFPWM(mainTF = "CEBPB",
                                 partners = c("ATF4","JUND"),
                                     cell = "K562", 
                             forkPosition = 5)

write.FPWM( FPWM = FPWM_CEBPB_2partners, format = "FPWMtransfac", fileName = "FPWM_CEBPB_2partners.FPWMtransfac")
This is how the output in FPWMtransfac format looks like:
FPWMtransfac format.

FPWMtransfac format.

There is a myriad of sequence analysis tools that employ the Transfac format as input. Therefore, we have provided the option to write FPWM in the standard transfact format where each combination of mainTF + partner is given as an independent matrix together with all the other information found in a FPWM.

write.FPWM( FPWM = FPWM_CEBPB_2partners, format = "transfac", fileName = "FPWM_CEBPB_2partners.transfac")
This is how the output looks in transfac format:
FPWM in Transfac format.

FPWM in Transfac format.

Creating FPWMs with three different matrix value options.

In a standard transfact matrix, the sum of all the elements in a row gives the same result for every row. This is because all the positions in the matrix are built from the same number of peaks/binding regions. However, in FPWMs, parts of the matrix are constructed with variable numbers of peaks/binding regions depending on the overlap between the main TF and their partners. We have provided 3 different options to output the matrix values:

# Count Matrix option [Default]
FPWM_CEBPB_2partners <- createFPWM(mainTF = "CEBPB",
                                             partners = c("ATF4","JUND"),
                                                 cell = "K562", 
                                         forkPosition = 5)
                                    
write.FPWM( FPWM = FPWM_CEBPB_2partners, 
            format = "FPWMtransfac", 
            fileName = "FPWM_CEBPB_2partners.FPWMtransfac")
# Probability Matrix option
FPWM_CEBPB_2partners_probability <- createFPWM(mainTF = "CEBPB",
                                             partners = c("ATF4","JUND"),
                                                 cell = "K562", 
                                         forkPosition = 5,
                                    probabilityMatrix = TRUE) # Probability option

write.FPWM( FPWM = FPWM_CEBPB_2partners_probability, 
            format = "FPWMtransfac", fileName = 
              "FPWM_CEBPB_2partners_probability.FPWMtransfac")
# Scaled Count Matrix option
FPWM_CEBPB_2partners_scaledCounts <- createFPWM(mainTF = "CEBPB",
                                              partners = c("ATF4","JUND"),
                                                  cell = "K562", 
                                          forkPosition = 5,
                                  scaleFrequencyCounts = TRUE) # Scaled Count option

write.FPWM( FPWM = FPWM_CEBPB_2partners_scaledCounts, 
            format = "FPWMtransfac", 
            fileName = "FPWM_CEBPB_2partners_scaledCounts.FPWMtransfac")
Different options for the values in the matrix.

Different options for the values in the matrix.

Saving FPWMs as RDS.

Lastly, FPWM objects can be easily saved/read into an RDS using R built-in functions:

# To save in RDS:
saveRDS(FPWM_CEBPB_K562,"FPWM_CEBPB_K562.rds")

# To read RDS:
FPWM_CEBPB_K562 <- readRDS("FPWM_CEBPB_K562.rds")