Fusion Detection Pipeline Wiki

Fusion gene detection pipeline bundled into a Singularity container.

Status: Beta

Brought to you by: pkerbs

Home

General information:

--- Part I (Detection) ---

This part of the pipeline performs quality trimming of reads, mapping, fusion calling, read counting, and insert size estimation.

Input:

Sample name
Path to fastq folder
- Contains gzipped fastq files
- Filenames contain 'R1' for first read or 'R2' for second read
- Filenames contain string matching the provided sample name
Genome reference in Fasta format (https://www.gencodegenes.org/)
Gene annotation file in GTF format (https://www.gencodegenes.org/)
STAR index (build the index without the '--sjdbGTFfile' and '--sjdbOverhang' parameters; see the STAR manual)
Genomic databases folder as required by FusionCatcher (download the newest build: https://sourceforge.net/projects/fusioncatcher/files/data/)
Path to output folder
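
The naming requirements for the fastq folder can be checked before starting a run. The following is a minimal sketch, not part of the pipeline: the function names `list_reads` and `check_fastq_folder` are hypothetical, and the exact matching rule used inside the container is an assumption based on the three requirements listed above.

```shell
# Print gzipped files in a folder whose names contain both the sample name
# and a read tag (R1 or R2). The matching rule is an assumption derived from
# the naming requirements on this page, not taken from the pipeline code.
list_reads() {
    folder=$1
    sample=$2
    tag=$3
    for f in "$folder"/*"$sample"*.gz; do
        [ -e "$f" ] || continue    # the glob matched nothing
        case "$(basename "$f")" in
            *"$tag"*) printf '%s\n' "$f" ;;
        esac
    done
}

# Return 0 only if at least one R1 file and one R2 file exist for the sample.
check_fastq_folder() {
    [ -n "$(list_reads "$1" "$2" R1)" ] && [ -n "$(list_reads "$1" "$2" R2)" ]
}
```

For example, `check_fastq_folder /path/to/fastq MySample || echo "no R1/R2 pair found"` reports a sample without a complete read pair. Note that a sample name that itself contains 'R1' or 'R2' would match both tags in this sketch.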

Output:

Output is stored in the following folder structure:

arriba (Fusion calls by Arriba)
featurecounts (Read counts by FeatureCounts)
fusioncatcher (Fusion calls by FusionCatcher)
insertsizes (Estimated insert size by Picard)
mapping (BAM files of mapped reads by STAR)
pipeline_logs (Log files of each tool)
trimmedfastq (Reads trimmed by FastP)
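
After a run, the folder structure above can be verified with a few lines of shell. This is a sketch under two assumptions: that the subfolders sit directly under the output folder, and that a step which was skipped leaves its folder absent. The name `missing_outputs` is hypothetical.

```shell
# Subfolder names as listed on this page; assumed to sit directly under
# the output folder.
expected_dirs="arriba featurecounts fusioncatcher insertsizes mapping pipeline_logs trimmedfastq"

# Print the name of every expected subfolder that does not exist.
missing_outputs() {
    for d in $expected_dirs; do
        [ -d "$1/$d" ] || printf '%s\n' "$d"
    done
}
```

Calling `missing_outputs /path/to/outputfolder` prints nothing when all seven folders are present.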

--- Part II (Filtering) ---

This part of the pipeline filters fusion events using the built-in filters of the callers, a custom-generated blacklist, and three metrics: Promiscuity Score (PS), Fusion Transcript Score (FTS), and Robustness Score (RS). A description of these metrics can be found here.

Input:

Annotation file that was used in Part I
Path to output folder
Excel file containing sample information such as karyotype (ISCN) and results from molecular diagnostics (FISH/PCR). It is needed so that the output can report whether a fusion event is supported by karyotype and/or molecular diagnostics (see Format specifications).
Optional: Custom blacklist of fusion genes (Excel file, first column contains fusions in the format "Gene1-Gene2" with HGNC symbols as gene names)

Output:

see Output files

Usage:

--- Part I / Detection pipeline ---

Requirements

Singularity installed (version >=3.6)
40GB of RAM
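
The version requirement can be checked with a small helper. The sketch below compares only the major and minor version numbers; `version_ge` is a hypothetical name, and the usage line assumes that the last word printed by `singularity --version` is the version number.

```shell
# Return 0 if version $1 (e.g. "3.8.0") is at least major.minor $2 (e.g. "3.6").
# Only the first two numeric components are compared.
version_ge() {
    have_major=${1%%.*}
    case $1 in
        *.*) rest=${1#*.}; have_minor=${rest%%.*} ;;
        *)   have_minor=0 ;;
    esac
    need_major=${2%%.*}
    case $2 in
        *.*) rest=${2#*.}; need_minor=${rest%%.*} ;;
        *)   need_minor=0 ;;
    esac
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Usage (assumes the version number is the last word of the output):
#   version_ge "$(singularity --version | awk '{print $NF}')" 3.6 \
#       || echo "Singularity >= 3.6 required" >&2
```

On Linux, the installed memory can be read from the `MemTotal` line of `/proc/meminfo` (reported in kB) to compare against the 40GB requirement.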

Set the parameters in FP_run.sh before executing:

# Required:
threads=<integer>      # Number of threads for running the detection pipeline
outputfolder=<string>  # Path to output folder
genomebuild=<string>   # ["hg19", "hg38"]
sample_name=<string>   # Name of sample
fastq_folder=<string>  # Path to folder containing the fastq files
strandness=<integer>   # [0 -> unstranded, 1 -> stranded, 2 -> reversely stranded]
ref=<string>           # Path to genome reference file in Fasta format (GENCODE)
anno=<string>          # Path to gene annotation file in GTF format (GENCODE)
starindex=<string>     # Path to folder containing the STAR index
fcdata=<string>        # Path to folder containing the genomic database as required by FusionCatcher

# Steps to perform (0 skips the corresponding step)
FusionCatcher=1     # Fusion calling by FusionCatcher
FastP=1             # Read trimming before STAR mapping
STAR=1              # Mapping by STAR (If FastP=0, mapping is performed on untrimmed reads)
Arriba=1            # Fusion calling by Arriba (Preceding mapping required)
FeatureCounts=1     # Read counting (Preceding mapping required)
Picard=1            # Insert size estimation (Preceding mapping required)
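
The `starindex` parameter expects an index that was built from the genome Fasta file alone, without '--sjdbGTFfile' and '--sjdbOverhang'. As a sketch, the helper below prints a STAR `genomeGenerate` command of that form so it can be reviewed before running. The function name `star_index_cmd` and its argument order are hypothetical. The STAR version inside the container is not stated on this page, so compatibility of a locally built index with that version is an assumption to verify.

```shell
# Print a STAR command that builds a genome index from the Fasta file only,
# i.e. without --sjdbGTFfile and --sjdbOverhang.
# Arguments: <genome fasta> <index folder> <threads>
star_index_cmd() {
    fasta=$1
    index_dir=$2
    n_threads=$3
    printf 'STAR --runMode genomeGenerate --runThreadN %s --genomeDir %s --genomeFastaFiles %s\n' \
        "$n_threads" "$index_dir" "$fasta"
}
```

For example, `star_index_cmd genome.fa star_index 8` prints the command for an eight-thread build into the folder `star_index`; both names are placeholders. The printed line is plain text, so paths containing spaces would need quoting before the command is run.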

--- Part II / Filtering pipeline ---

This will only work if you have performed Part I on at least two samples.
Set the parameters in FP_filter.sh before executing:

# Required:
anno=<string>            # Path to gene annotation file in GTF format (GENCODE) as used in Part I
outputfolder=<string>    # Path to output folder generated by the detection pipeline in Part I
clintable=<string>       # Path to clinical information table in Excel format

# Optional:
debug_flag=0             # Set to 1 for saving R workspace
internal_BL=1            # Whether to use the internal blacklist of fusion genes
user_BL=<string>         # Path to own fusion blacklist (xlsx file, first column with fusion labels)
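
Before executing FP_filter.sh, the paths set above can be checked for existence. This is a minimal sketch with a hypothetical function name, `check_filter_inputs`. It only tests that the files and the folder exist; it does not inspect the Excel content or confirm that Part I was run on at least two samples.

```shell
# Arguments: <anno> <outputfolder> <clintable> [user_BL]
# Returns 0 if all given paths exist, 1 otherwise; problems go to stderr.
check_filter_inputs() {
    status=0
    [ -f "$1" ] || { echo "anno: file not found: $1" >&2; status=1; }
    [ -d "$2" ] || { echo "outputfolder: not a folder: $2" >&2; status=1; }
    [ -f "$3" ] || { echo "clintable: file not found: $3" >&2; status=1; }
    # user_BL is optional, so it is only checked when a path is given
    if [ -n "${4:-}" ] && [ ! -f "$4" ]; then
        echo "user_BL: file not found: $4" >&2
        status=1
    fi
    return "$status"
}
```

A call such as `check_filter_inputs "$anno" "$outputfolder" "$clintable" "$user_BL"` could be placed after the parameter block, assuming the variables are set as shown above.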

Related:

Wiki: Format specifications
Wiki: Output files
