Home

General information:

--- Part I (Detection) ---

This part of the pipeline performs quality trimming of reads, mapping, fusion calling, read counting, and insert size estimation.

Input:
Output:

Output is stored in the following folder structure:

  • arriba (Fusion calls by Arriba)
  • featurecounts (Read counts by FeatureCounts)
  • fusioncatcher (Fusion calls by FusionCatcher)
  • insertsizes (Estimated insert sizes by Picard)
  • mapping (BAM files of mapped reads by STAR)
  • pipeline_logs (Log files of each tool)
  • trimmedfastq (Reads trimmed by FastP)
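
As a quick sanity check after a run, the folder structure above can be verified with a small shell function. This is a minimal sketch, not part of the pipeline: the function name check_output_layout is hypothetical, and it assumes the seven subfolders sit directly under the output folder. A folder may legitimately be absent if the corresponding step was skipped.

```shell
# Hypothetical helper: list expected subfolders that are missing.
# Assumption: the subfolders sit directly under the folder given as $1.
check_output_layout() {
    col_missing=0
    for col_sub in arriba featurecounts fusioncatcher insertsizes \
                   mapping pipeline_logs trimmedfastq; do
        if [ ! -d "$1/$col_sub" ]; then
            echo "missing: $col_sub"
            col_missing=1
        fi
    done
    # Exit status 0 if everything is present, 1 otherwise.
    return "$col_missing"
}
```

Usage: check_output_layout /path/to/outputfolder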

--- Part II (Filtering) ---

This part of the pipeline filters fusion events using the built-in filters of the callers, a custom-generated blacklist, and three metrics: Promiscuity Score (PS), Fusion Transcript Score (FTS) and Robustness Score (RS). A description of these metrics can be found here.

Input:
  • Annotation file that was used in Part I
  • Path to output folder
  • Excel file containing sample information such as karyotype (ISCN) and results from molecular diagnostics (FISH/PCR). This information is used to report in the output whether a fusion event is supported by karyotype and/or molecular diagnostics (see Format specifications).
  • Optional: Custom blacklist of fusion genes (Excel file, first column contains fusions in the format "Gene1-Gene2" with HGNC symbols as gene names)
Output:

See Output files
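
The "Gene1-Gene2" label format expected in the first column of the optional custom blacklist can be checked before the labels are entered into the Excel file. The following is a rough sketch under stated assumptions: is_fusion_label is a hypothetical helper, and it only checks the overall shape (no whitespace, a hyphen with a non-empty name on each side). It does not verify that the names are valid HGNC symbols, and because some gene symbols themselves contain hyphens, it makes no attempt to split the label into its two genes.

```shell
# Hypothetical helper: succeed if $1 looks like a "Gene1-Gene2" label.
is_fusion_label() {
    case "$1" in
        *[[:space:]]*) return 1 ;;  # labels must not contain whitespace
        -*|*-)         return 1 ;;  # neither gene name may be empty
        *-*)           return 0 ;;  # at least one hyphen separating two names
        *)             return 1 ;;  # no hyphen at all
    esac
}
```

Usage: is_fusion_label "BCR-ABL1" && echo "label looks well-formed"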


Usage:

--- Part I / Detection pipeline ---

Requirements
  • Singularity installed (version >=3.6)
  • 40GB of RAM
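
Both requirements can be checked up front on a Linux host. The sketch below is illustrative only: version_ge and check_requirements are hypothetical helper names, the script assumes that the last field printed by `singularity --version` is the version string, and it reads total memory from /proc/meminfo, treating 40GB as 40 GiB.

```shell
# Succeed if version $1 >= version $2 (major.minor compared numerically).
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        if (x[1] + 0 != y[1] + 0) exit !(x[1] + 0 > y[1] + 0)
        exit !(x[2] + 0 >= y[2] + 0)
    }'
}

check_requirements() {
    # Assumption: the version is the last field of the --version output.
    sing_version=$(singularity --version 2>/dev/null | awk '{print $NF}')
    if [ -z "$sing_version" ] || ! version_ge "$sing_version" 3.6; then
        echo "Singularity >= 3.6 not found (detected: ${sing_version:-none})"
        return 1
    fi
    # Linux only: MemTotal in /proc/meminfo is reported in kB.
    mem_gb=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
    if [ "$mem_gb" -lt 40 ]; then
        echo "Only ${mem_gb} GB of RAM detected; 40 GB are required"
        return 1
    fi
    echo "Requirements look satisfied"
}
```

Calling check_requirements before starting a run prints the first unmet requirement. A machine sold as "40GB" may report slightly less in MemTotal, so the threshold may need adjusting.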

Set the parameters in FP_run.sh before executing:

# Required:
threads=<integer>      # Number of threads for running the detection pipeline
outputfolder=<string>  # Path to output folder
genomebuild=<string>   # ["hg19", "hg38"]
sample_name=<string>   # Name of sample
fastq_folder=<string>  # Path to folder containing the fastq files
strandness=<integer>   # [0 -> unstranded, 1 -> stranded, 2 -> reverse-stranded]
ref=<string>           # Path to genome reference file in Fasta format (GENCODE)
anno=<string>          # Path to gene annotation file in GTF format (GENCODE)
starindex=<string>     # Path to folder containing the STAR index
fcdata=<string>        # Path to folder containing the genomic database as required by FusionCatcher

# Steps to perform (0 skips the corresponding step)
FusionCatcher=1     # Fusion calling by FusionCatcher
FastP=1             # Read trimming before STAR mapping
STAR=1              # Mapping by STAR (If FastP=0 mapping is performed on untrimmed reads)
Arriba=1            # Fusion calling by Arriba (Preceding mapping required)
FeatureCounts=1     # Read counting (Preceding mapping required)
Picard=1            # Insert size estimation (Preceding mapping required)
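
For illustration, a filled-in parameter block might look like the sketch below. All values and paths are hypothetical placeholders, and validate_run_params is not part of FP_run.sh; it is an assumed helper that only checks the value ranges documented in the comments above and prints a reminder when mapping-dependent steps are enabled without STAR.

```shell
# Hypothetical example values; every path and name is a placeholder.
threads=8
outputfolder=/path/to/output
genomebuild=hg38
sample_name=sample01
fastq_folder=/path/to/fastq
strandness=2
ref=/path/to/genome.fa
anno=/path/to/annotation.gtf
starindex=/path/to/star_index
fcdata=/path/to/fusioncatcher_data
STAR=1; Arriba=1; FeatureCounts=1; Picard=1

# Hypothetical helper: check the documented value ranges.
validate_run_params() {
    case "$threads" in
        ''|*[!0-9]*|0) echo "threads must be a positive integer"; return 1 ;;
    esac
    case "$genomebuild" in
        hg19|hg38) ;;
        *) echo "genomebuild must be hg19 or hg38"; return 1 ;;
    esac
    case "$strandness" in
        0|1|2) ;;
        *) echo "strandness must be 0, 1 or 2"; return 1 ;;
    esac
    # Arriba, FeatureCounts and Picard require a preceding mapping.
    if [ "$STAR" = 0 ] && { [ "$Arriba" = 1 ] || [ "$FeatureCounts" = 1 ] || [ "$Picard" = 1 ]; }; then
        echo "note: STAR=0, but enabled steps require a preceding mapping" >&2
    fi
    return 0
}
```

Running validate_run_params after setting the parameters reports the first value outside its documented range.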

--- Part II / Filtering pipeline ---

Part II requires that Part I has been performed on at least two samples.
Set the parameters in FP_filter.sh before executing:

# Required:
anno=<string>            # Path to gene annotation file in GTF format (GENCODE) as used in Part I
outputfolder=<string>    # Path to output folder generated by the detection pipeline in Part I
clintable=<string>       # Path to clinical information table in Excel format
# Optional:
debug_flag=0             # Set to 1 for saving R workspace
internal_BL=1            # Whether to use the internal blacklist of fusion genes
user_BL=<string>         # Path to own fusion blacklist (xlsx file, first column with fusion labels)
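
Before starting the filtering step, the paths set above can be checked for existence. This is a minimal sketch, assuming the variables have already been set as in FP_filter.sh; validate_filter_params is a hypothetical helper and not part of the pipeline. It does not inspect the contents of the Excel files or count the samples in the output folder.

```shell
# Hypothetical helper: check that the configured paths exist.
validate_filter_params() {
    [ -f "${anno:-}" ] || { echo "anno: file not found"; return 1; }
    [ -d "${outputfolder:-}" ] || { echo "outputfolder: directory not found"; return 1; }
    [ -f "${clintable:-}" ] || { echo "clintable: file not found"; return 1; }
    # user_BL is optional; if set, it should point to an existing xlsx file.
    if [ -n "${user_BL:-}" ]; then
        case "$user_BL" in
            *.xlsx) [ -f "$user_BL" ] || { echo "user_BL: file not found"; return 1; } ;;
            *) echo "user_BL: expected an .xlsx file"; return 1 ;;
        esac
    fi
    return 0
}
```

Usage: validate_filter_params || exit 1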

Related

Wiki: Format specifications
Wiki: Output files
