SOAPfuse / Blog: Recent posts

SOAPfuse perl module applied from SOAPfuse_v1.27

From v1.27, SOAPfuse bundles part of its functions and operations into a Perl module named SOAPfuse.
The module is already included in the SOAPfuse tar.gz package (v1.27 or later) that you download from this SourceForge project.

You therefore need to add the module folder to your Perl library path: add the following line to your ~/.bashrc file.

PERL5LIB=$PERL5LIB:/PATH_WHERE_YOU_PUT_THE_PACKAGE/source/bin/perl_module; export PERL5LIB... [read more](/p/soapfuse/blog/2016/01/soapfuse-perl-module-applied-from-soapfusev127/)
Posted by NOBEL89 2016-01-19

strategy for repeated transcript_name and gene_name in Ensembl gtf file

Note: The previous 'SOAPfusexxSOAPfuse' postfix is abandoned from SOAPfuse_v1.27, and a new, meaningful postfix is applied instead: the ENS_id. Because the ENS_id is unique in the GTF database, it is the best marker to distinguish duplicated gene_names, for example MATR3_ENSG00000015479 and MATR3_ENSG00000280987. Click here to know about it. This post now serves only to document the existence of repeated (duplicated) gene_names and transcript_names. (Added by Jia on 2016-01-19)... read more
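As a rough illustration of the naming idea in the note above, here is a minimal Python sketch. It is not SOAPfuse's own code (SOAPfuse is written in Perl). The function name is hypothetical, and the choice to append the ENS_id only to gene_names shared by more than one ENS_id is an assumption drawn from the wording of the note, not a confirmed rule. The TP53 record is only added for contrast.

```python
from collections import defaultdict


def disambiguate_gene_names(records):
    """Map each ENS_id to a display name (hypothetical helper).

    records: iterable of (gene_name, ens_id) pairs.
    Assumption: only gene_names shared by several ENS_ids get the
    '_<ENS_id>' postfix; unique names are left unchanged.
    """
    ids_by_name = defaultdict(set)
    for gene_name, ens_id in records:
        ids_by_name[gene_name].add(ens_id)

    labels = {}
    for gene_name, ens_ids in ids_by_name.items():
        for ens_id in ens_ids:
            if len(ens_ids) > 1:
                labels[ens_id] = f"{gene_name}_{ens_id}"
            else:
                labels[ens_id] = gene_name
    return labels


records = [
    ("MATR3", "ENSG00000015479"),
    ("MATR3", "ENSG00000280987"),
    ("TP53", "ENSG00000141510"),
]
labels = disambiguate_gene_names(records)
```

Keying the result by ENS_id keeps the lookup unambiguous even when two genes share one gene_name.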

Posted by NOBEL89 2016-01-19

New PSL file format applied by SOAPfuse

From v1.27, SOAPfuse adopts a new format for the PSL file. The format is backward-compatible with previous versions of SOAPfuse: it only adds more gene and transcript information in the columns that were reserved in the former format.

The new PSL files are created automatically by the SOAPfuse database construction procedure in v1.27.

Check the transcript.psl file format:

#------- Format defination of trans PSL file -----------#
#------- customized by SOAPfuse ------------------------#
# Note: all the smallest edge position of region has been subtracted by 1.
#       such as Column NO.16, NO.20, NO.21.
#       Other info, i.e, largest edge (NO.17) or point position (e.g., start_codon and
#       stop_codon) remains no changes.
#
#  column_NO.    Description
#           1    transcript source
#           2    transcript version
#           3           Reserved
#           4           Reserved
#           5    start_codon info, if available
#           6    stop_codon info, if available
#           7    protein ENS_id, if available
#           8           Reserved
#           9    sense strand
#          10    transcript name for bioinformatics analysis
#          11    transcript ENS_id
#          12    transcript name (original) from the GTF file
#          13    biotype
#          14    refseg
#          15    cytoband on refseg
#          16    the smallest position on the plus strand of the refseg, and subtract 1
#          17    the largest  position on the plus strand of the refseg, no modification.
#          18    the amount of exon
#          19    length of each exon, corresponding to the NO.21 column
#          20    CDS region info, available when transcript's biotype is protein_coding.
#                Format: smallest_pos(length), sorted from small to large
#          21    the smallest position on the plus strand of the refseg of each exon,
#                sorted from small to large
#          22    gene name for bioinformatics analysis
#-------------------------------------------------------#
    ... [read more](/p/soapfuse/blog/2016/01/new-psl-file-format-applied-by-soapfuse/)
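To make the column table above easier to use, here is a minimal Python sketch that reads one line of such a file. It is not part of SOAPfuse. The field names, the tab separator, and the comma-separated lists in columns 19 and 21 are assumptions, and the example record is invented purely for illustration. The sketch converts the "subtracted by 1" positions (columns 16 and 21) back to 1-based coordinates.

```python
# Column numbers follow the format table above (1-based).
# The field names on the right are hypothetical, chosen for readability.
PSL_COLUMNS = {
    9: "strand",
    10: "transcript_name",
    11: "transcript_ens_id",
    13: "biotype",
    14: "refseg",
    16: "start0",        # smallest position, already subtracted by 1
    17: "end",           # largest position, unmodified
    18: "exon_count",
    19: "exon_lengths",
    21: "exon_starts0",  # per-exon smallest position, subtracted by 1
    22: "gene_name",
}


def parse_trans_psl_line(line):
    """Parse one line; assumes tab-separated columns and comma-separated lists."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 22:
        raise ValueError(f"expected at least 22 columns, got {len(fields)}")
    rec = {name: fields[no - 1] for no, name in PSL_COLUMNS.items()}
    rec["start"] = int(rec["start0"]) + 1  # back to a 1-based position
    rec["end"] = int(rec["end"])
    lengths = [int(x) for x in rec["exon_lengths"].split(",") if x]
    starts0 = [int(x) for x in rec["exon_starts0"].split(",") if x]
    # 1-based inclusive exon intervals on the plus strand of the refseg
    rec["exons"] = [(s + 1, s + n) for s, n in zip(starts0, lengths)]
    return rec


# Invented example record; "." marks columns this sketch does not use.
example_fields = ["."] * 22
example_fields[9 - 1] = "+"
example_fields[10 - 1] = "TRANS_A"
example_fields[14 - 1] = "chr5"
example_fields[16 - 1] = "999"
example_fields[17 - 1] = "1500"
example_fields[18 - 1] = "2"
example_fields[19 - 1] = "101,201,"
example_fields[21 - 1] = "999,1299,"
example_fields[22 - 1] = "GENE_A"
example = parse_trans_psl_line("\t".join(example_fields))
```

Adding 1 to the start while leaving the end untouched follows the note at the top of the format definition: only the smallest edge of each region is shifted.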
Posted by NOBEL89 2016-01-19

a manual solution when encountering the startcodon_sequence_error

Note: The 'startcodon_sequence_error' has been fixed since v1.27, as described at the end of this post. This post therefore applies only to SOAPfuse versions before 1.27. (Added by Jia on 2016-01-19)

I have written one post about the startcodon_sequence_error reported by SOAPfuse. Click here to know about it.... read more

Posted by NOBEL89 2013-09-14

SOAPfuse v1.26 has been released

Now, SOAPfuse v1.26 has been released on SourceForge.
Please check the Download Wiki page and Update_logs.

Note:

  • The database files have changed; please reconstruct your database.
  • The new version of the config file is required.
  • The official website of SOAPfuse has not been updated yet.... read more
Posted by NOBEL89 2013-08-01

start_codon sequence error in Ensembl human GTF file

Note: From v1.27, SOAPfuse handles this problem during SOAPfuse_database_construction. This post now serves only to explain the cause of abnormal codons in the GTF file.

Thanks to Aditya again for reporting this case.

The 'wrong start_codon' error still occurred when he reran SOAPfuse after applying the update from the last post, which concerns the 'start_codon base number error'.... read more

Posted by NOBEL89 2013-07-31

Start_codon base-number error in Ensembl Released human GTF file

Note: From v1.27, SOAPfuse handles this problem during SOAPfuse_database_construction. This post now serves only to explain the cause of abnormal codons in the GTF file.

Thanks to Qianqian Ou and Aditya for reporting this error.

Recently, some users emailed me to report that SOAPfuse failed and stopped at step s08.
At first I was surprised, because SOAPfuse rarely fails at this step.
Then I checked the step-log file:
TEMP/config_shell.xxx/fusion.s08.singlefork/fusion.s08.sh.log... read more

Posted by NOBEL89 2013-07-28

the perl module usage in some perl scripts in v1.25

Note: These problems have already been fixed from v1.26.

Many thanks to Lei Deng for reporting this bug.

SOAPfuse uses some Perl modules, e.g. SVG.pm, that are not included in the standard Perl module library. That is why the source/bin/perl_module directory contains several modules for the SOAPfuse workflow.

How do the scripts find them?
I wrote two statements at the beginning of some scripts:

    use FindBin qw/$RealBin $RealScript/;
    use lib "$RealBin/bin/perl_module";

so it works.... read more

Posted by NOBEL89 2013-07-19

Update of pretreatment of original reads in SOAPfuse

First, many thanks to Joshua and Eric for reporting this bug.

We always start our bioinformatic analysis from original reads in fastq/fasta format. Although the basic structure of the format is more or less the same, there are several variants that differ from each other in how the read ID is written.

For example, I always use reads like this:

@FCD055NACXX:6:1101:1165:2217#NNNNGNNN/1
TGAAAACATGAATCCAGATGGCATGGTTGCTCTATTGGACTACCGTGAGGATG
+
EGGGIIIIIHIHIIIIIIIIIIHIIIIIIHIIIHHIIHFHIHIHIHIIIIFHH... [read more](/p/soapfuse/blog/2013/05/update-of-pretreatment-of-original-reads-in-soapfuse/)
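To show what "different read ID writing" can mean in practice, here is a minimal Python sketch that splits a read header into a pair name and a mate number. It is not the pretreatment code of SOAPfuse. The function name is hypothetical, and the second header style handled here (mate number after a space, as written by newer Illumina pipelines) is an assumption about which variants matter, since the list of variants is not shown in this excerpt.

```python
def split_read_id(header):
    """Return (pair_name, mate) for a fastq/fasta header line.

    mate is 1 or 2 when it can be recognised, otherwise None.
    Hypothetical helper; only two common header styles are handled.
    """
    header = header.strip()
    if header.startswith(("@", ">")):
        header = header[1:]
    name, _, comment = header.partition(" ")
    # Style 1: mate number written as a '/1' or '/2' suffix of the name
    if name.endswith(("/1", "/2")):
        return name[:-2], int(name[-1])
    # Style 2 (assumed): mate number starts the comment, e.g. '2:N:0:ACGT'
    if comment[:2] in ("1:", "2:"):
        return name, int(comment[0])
    return name, None


first = split_read_id("@FCD055NACXX:6:1101:1165:2217#NNNNGNNN/1")
# Invented header in the space-separated style, for illustration only
second = split_read_id("@INSTR:1:FLOWCELL:1:1101:1000:2000 2:N:0:ACGT")
third = split_read_id("@read42")
```

Both mates of a pair then share the same pair name regardless of the style, which is the property a paired-end workflow needs.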
Posted by NOBEL89 2013-05-30 Labels: readid pretreat bug fixed

Is the parameter '-tp' important?

Yes, I think it is important.

I strongly suggest that users set it to the sample-id (single-sample analysis mode) or the patient-id (tumor-vs-control analysis mode).

All the temp files, including the shell commands of the SOAPfuse workflow and the step-logs, are stored in a sub-dir of the TEMP dir in the out_dir you set for SOAPfuse.

By default, this sub-dir is named as a random number concatenated with the output of 'date +%s'. That makes it difficult for you to distinguish the sub-dirs of different samples/patients, or even of different runs of the same sample. Of course, it also makes it difficult for me to fix your problem when you email me with a question.... read more

Posted by NOBEL89 2013-05-29 Labels: debug -tp parameter

step s09 is just for further analysis of fusion

SOAPfuse reports all final fusion genes at step s08, while step s09 only performs further analysis of those fusion genes. So please use '-es 8' when you compare SOAPfuse with other software: step s09 does not change any final fusion genes, and it is not part of the standard workflow of detecting fusion genes. It does, however, cost a comparable amount of CPU time, so including it may make SOAPfuse's computing-resource usage look worse. Maximum memory usage is not affected, because the peak always appears at step s01, s02 or s07.... read more

Posted by NOBEL89 2013-05-21 Labels: compare cpu-time

a small bug in SOAPfuse v1.25

As I said in the update log on the official website, from v1.24 SOAPfuse is able to detect fusion junction points located in the flanking introns near exons, as long as you set the parameter 'PA_all_intron_len_extend_from_exon_edge' to a non-zero value in the config file.

Because step s08 does not (so far, v1.25) need this parameter, I removed its Getoption receptor variable from the s08 main script (perl). Unfortunately, I forgot to make the same change in SOAPfuse-RUN.pl, so if you set 'PA_all_intron_len_extend_from_exon_edge' to a non-zero value, SOAPfuse-RUN.pl still passes it to the s08 main script, which leads to an error report.... read more

Posted by NOBEL89 2013-04-15 Labels: bug alert fixed

SOAPfuse v1.25 has been released, welcome to use

Now, SOAPfuse v1.25 has been released on the official website:
http://soap.genomics.org.cn/soapfuse.html

You are welcome to use it.
Some bugs are fixed, several imperfect steps are improved, and some new features are added.
Please check the update log:
http://soap.genomics.org.cn/down/update_log.for.SOAPfuse.txt

Note:
Requirements for database files have changed slightly in v1.25. Sorry for that.
Please use the script (SOAPfuse-S00-Generate_SOAPfuse_database.pl) in SOAPfuse v1.25 to reconstruct your database in one step.... read more

Posted by NOBEL89 2013-04-14 Labels: update v1.25