SOAPfuse Wiki

a tool for identifying fusion transcripts from paired-end RNA-Seq data

Brought to you by: jwl890427

Run_SOAPfuse

Run SOAPfuse

To run SOAPfuse, first prepare the config file; SOAPfuse then runs according to the settings in it.

Check the config file
```
$ cd /PATH_WHERE_YOU_PUT_THE_PACKAGE/SOAPfuse-vX.X/config/
$ less -S config.txt
```
Note:
- All lines prefixed by '#' are comments.
- Parameter name and value are separated by '='; modify only the value after the '='.
- Some values take 'yes' or 'no', and some can be left at their defaults.
- Check the prefix of each parameter.
  There are five kinds of prefixes: 'DB', 'PG', 'PS', 'PD' and 'PA'.
    a. 'DB' means the info of DataBase.
    b. 'PG' means the info of ProGrams.
    c. 'PS' means the info of Pipeline Steps.
    d. 'PD' means the info of Pipeline Directories.
    e. 'PA' means the info of PArameters.
  ^# The 'DB', 'PG', 'PS' and 'PD' types describe the database, programs, pipeline steps and
    directories, so SOAPfuse can run successfully once these parameters are set correctly. The 'PA'
    type holds the parameters of each step. They are already set to default values, so you can
    ignore them on your first try.
    However, 'PA_all_fq_postfix', which defines the postfix of the RNA-Seq data files, must be set
    according to your RNA-Seq files before running.
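
To review the parameters of one prefix at a time, a small shell helper can filter the config file. This is only a sketch: the function name `list_params` is made up for illustration, and it assumes each parameter sits on its own line starting with its name.

```shell
# list_params FILE PREFIX
# Print the non-comment lines of FILE whose parameter name starts with
# PREFIX_ (PREFIX is one of DB, PG, PS, PD, PA).
# Hypothetical helper, not part of the SOAPfuse package.
list_params() {
    # Drop comment lines first, then keep lines that begin with the prefix.
    grep -v '^[[:space:]]*#' "$1" | grep "^[[:space:]]*${2}_"
}
```

For example, `list_params config.txt PD` would show only the pipeline-directory settings.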
Modify the config file

Now we presume that you have unpacked the SOAPfuse package and obtained the SOAPfuse-vX.X directory. We call the absolute path of this directory 'TOOL_DIR'.
Download the database package ('hgXX-XX.for.SOAPfuse.tar.gz') from the links above and unpack it to get the hgXX-database directory. We call the absolute path of this directory 'DATABASE_DIR'.
^# You can also follow the guide to construct your own database files in DATABASE_DIR.

Then, modify the config file as below:
- Define 'DB' prefix info
```
DB_db_dir = /DATABASE_DIR/
```
- Define 'PG' prefix info
```
PG_pg_dir = /TOOL_DIR/source/bin
```
- Define 'PS' prefix info
```
PS_ps_dir = /TOOL_DIR/source
```
- Define 'PD' prefix info^*
```
PD_all_out = /out_directory/
```
- Define 'PA_all_fq_postfix' info
```
PA_all_fq_postfix = PostFix
```
^* PD_all_out is the directory you have prepared to store all SOAPfuse results. You can also set it via
the '-o' option of the main program, introduced below, which takes priority over the config file. SOAPfuse
automatically creates the sub-directories for each step in out_directory when it runs.
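
If you prefer to script these changes rather than edit the file by hand, something like the following could work. It is a sketch under assumptions: `set_param` is a hypothetical helper, it expects the `name = value` layout shown above, and it will misbehave if the new value contains `|` or `&`.

```shell
# set_param FILE NAME VALUE
# Rewrite the line "NAME = ..." in FILE so that it reads "NAME = VALUE".
# Comment lines and other parameters are left untouched.
# Hypothetical helper, not part of the SOAPfuse package.
set_param() {
    file=$1
    name=$2
    value=$3
    # '|' is the sed delimiter so that paths containing '/' need no escaping.
    # -i.bak edits in place and keeps a backup copy as FILE.bak.
    sed -i.bak "s|^${name}[[:space:]]*=.*|${name} = ${value}|" "$file"
}
```

A call such as `set_param config.txt DB_db_dir /DATABASE_DIR/` would then update a single setting and leave `config.txt.bak` as a fallback.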

Run SOAPfuse

You can find the main script 'SOAPfuse-RUN.pl' in TOOL_DIR. Use perl to run it.

From v1.27, SOAPfuse packages part of its functions into a SOAPfuse perl module,
so you must set the Perl library path as this post describes.
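
Perl searches the directories listed in the `PERL5LIB` environment variable for modules, so one common way to set the library path looks like the sketch below. The directory is a placeholder: `SOAPFUSE_PERL_MODULE_DIR` is a made-up variable name, and the real module location must be taken from the post mentioned above.

```shell
# Assumption: SOAPFUSE_PERL_MODULE_DIR is a placeholder for the directory that
# holds the SOAPfuse perl module. Replace the default with the real location.
SOAPFUSE_PERL_MODULE_DIR=${SOAPFUSE_PERL_MODULE_DIR:-/PATH_TO/SOAPfuse_perl_module}

# Prepend the directory to PERL5LIB, keeping any entries that are already set.
PERL5LIB="$SOAPFUSE_PERL_MODULE_DIR${PERL5LIB:+:$PERL5LIB}"
export PERL5LIB
```

Putting these lines in your shell start-up file would make the setting persist across sessions.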

Command:

$ perl SOAPfuse-RUN.pl -c  <config_file>
                       -fd <WHOLE_SEQ-DATA_DIR>
                       -l  <sample_list>
                       -o  <out_directory>
                           [Other Options]

Options:

-c  [s] Config file for running this pipeline. <required>
-fd [s] Directory which stores Paired-end Sequenced Read Files. <required>
         Sequenced Reads Format can be fastq or fasta.
         Files can be gzip-compressed or plain text.
-l  [s] The information list of the sample(s) you want to process. <required>
         This list can include information of one or more samples.
         It is suggested to include one sample/patient in each sample list file.
-o  [s] Directory which will store all results.
         It takes priority over 'PD_all_out'; if omitted, set 'PD_all_out' in the config file.
-fs [i] The step you want to start from. [1]
-es [i] The step you want to end at. [9]
         Step 9 is the last step of the SOAPfuse pipeline.
-tp [s] The name-postfix of temp directory^*. [data +%s.'_'.int(rand(1000)+1)]
         Do not set the same string for different sample-info-list files.
         It is suggested to set this parameter to the SampleID, so that the scripts
         of different samples are easy to tell apart in the common case where one
         sample-info-list file includes just one sample.
-fm     Sign to enable perl fork management. [disabled]
-h      Display this help info.

^* We suggest setting -tp to the sample-ID or patient-ID so that the temp directory is easy to identify, since we recommend preparing one list for each sample or patient (in somatic mode).
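
Following the one-list-per-sample suggestion, the command line for each sample can be generated from its list file. The sketch below only prints the command (a dry run). It assumes a hypothetical naming scheme in which each list file is called `<sampleID>.list`; the variable names and default paths are placeholders, not SOAPfuse defaults.

```shell
# Placeholders (assumptions): adjust these to your own layout.
TOOL_DIR=${TOOL_DIR:-/TOOL_DIR}
CONFIG=${CONFIG:-/TOOL_DIR/config/config.txt}
SEQ_DIR=${SEQ_DIR:-/WHOLE_SEQ-DATA_DIR}
OUT_ROOT=${OUT_ROOT:-/out_directory}

# soapfuse_cmd LIST_FILE
# Print the SOAPfuse command for one sample list file.
# The sample ID is taken from the file name (<sampleID>.list) and reused
# both for the output sub-directory and for the -tp temp-directory postfix.
soapfuse_cmd() {
    list=$1
    sample=$(basename "$list" .list)
    printf 'perl %s/SOAPfuse-RUN.pl -c %s -fd %s -l %s -o %s/%s -tp %s\n' \
        "$TOOL_DIR" "$CONFIG" "$SEQ_DIR" "$list" "$OUT_ROOT" "$sample" "$sample"
}
```

For example, `for l in lists/*.list; do soapfuse_cmd "$l"; done` prints one command per list file, which you can inspect before running them.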

Other Command:

To check the version of SOAPfuse
```
$ perl SOAPfuse-RUN.pl -c version
```
