socs 2.2 execution time and progress

2014-07-09
  • Hugo Lopez Fernandez

    Hello,

    I am using socs 2.2 to analyze one sample (made up of the files TEST_L50_Q40_CG20_CH80.csfasta and TEST_L50_Q40_CG20_CH80.qual) against the whole hg19.fa (about 3GB).

    The command I ran was:

    socs -p -r ./reference/hg19.fa -c ./data/single_end/TEST_L50_Q40_CG20_CH80/TEST_L50_Q40_CG20_CH80.csfasta -q ./data/single_end/TEST_L50_Q40_CG20_CH80/TEST_L50_Q40_CG20_CH80.qual -d "analysis-socs/TEST_L50_Q40_CG20_CH80" -T 4

    It has been running for 24 hours and the only files generated are the following (note that alignments.txt was last modified 10 hours ago):

    analysis-socs/TEST_L50_Q40_CG20_CH80:
    total 1,8M
    -rw-rw-r-- 1 bssequtils bssequtils 1,8M jul 9 00:52 alignments.txt
    drwxr-xr-x 3 bssequtils bssequtils 4,0K jul 8 10:55 index

    analysis-socs/TEST_L50_Q40_CG20_CH80/index:
    total 8,0K
    drwxr-xr-x 2 bssequtils bssequtils 4,0K jul 8 10:55 contigs
    -rw-rw-r-- 1 bssequtils bssequtils 12 jul 8 10:55 files.csv

    analysis-socs/TEST_L50_Q40_CG20_CH80/index/contigs:
    total 4,0K
    -rw-rw-r-- 1 bssequtils bssequtils 1,7K jul 8 10:55 hg19.fa.csv

    Two things strike me: the long execution time (with almost no files being modified) and that no progress is reported on standard output.

    I have no idea whether socs is performing any operation at the moment. I hope you can help me.

    Thank you very much in advance.

    Regards,

    Hugo.
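
    One way to check from outside whether a long run is still doing work is to look at the process itself and at the age of its output file. The following is only a sketch under assumptions not stated in this thread: the process is named exactly `socs`, and the machine is Linux with procps `ps` and GNU `stat`. The function names are hypothetical helpers, not part of SOCS.

```shell
#!/bin/sh
# Print how many seconds ago a file was last modified.
# Uses GNU stat (-c %Y); on BSD/macOS the equivalent is `stat -f %m`.
seconds_since_modified() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 1
    echo $((now - mtime))
}

# Show elapsed time, CPU and memory share, and state of running socs processes.
# Assumption: the executable's process name is exactly "socs".
socs_status() {
    ps -C socs -o pid,etime,pcpu,pmem,stat,args || echo "no socs process found"
}
```

    For example, `socs_status` followed by `seconds_since_modified analysis-socs/TEST_L50_Q40_CG20_CH80/alignments.txt`. A process that shows steady CPU use is most likely still computing even if the output file has not changed for hours.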

     
  • Brian Ondov

    Brian Ondov - 2014-07-09

    Hi Hugo,

    For the human genome, I would recommend adding -m 10 to limit the number of matches per read, since it will store all of them by default. I would also adjust the RAM (-R, in MB) to more closely match your system, since the default is only 1000 (1GB). Turning the sensitivity (-s) down to 1 or 2 will also help, though probably at the expense of matches found. You can counter that with more aggressive filtering and trimming (-y and -x). So, for example, I would add:

    -m 10 -R 32000 -T 16 -s 2 -y 15 -x 5

    ...for a machine with 16 cores and 32GB RAM. That said, SOCS is quite slow in bisulfite mode for the human genome, and will do better on a compute cluster if you have access to one. You could try the above in interactive mode (without -p) first to get an idea of how it's progressing, but the complete run could still take days.
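
    Putting the suggested flags together with the paths from the original command might look like the sketch below. It only prints the command line rather than running it, and it leaves out -p so the run is interactive, as suggested above. The function name `build_socs_cmd` and the idea of passing threads and RAM as arguments are illustrative assumptions; the flag values are the ones from this reply.

```shell
#!/bin/sh
# Print a socs command line using the flags suggested in this thread.
# $1 = number of threads for -T, $2 = RAM in MB for -R.
build_socs_cmd() {
    threads=$1
    ram_mb=$2
    sample=TEST_L50_Q40_CG20_CH80
    echo "socs -r ./reference/hg19.fa" \
         "-c ./data/single_end/$sample/$sample.csfasta" \
         "-q ./data/single_end/$sample/$sample.qual" \
         "-d analysis-socs/$sample" \
         "-m 10 -R $ram_mb -T $threads -s 2 -y 15 -x 5"
}

# Values for the 16-core, 32GB machine used as the example above.
build_socs_cmd 16 32000
```

    On Linux, the two arguments could be taken from the machine itself, for example `build_socs_cmd "$(nproc)" "$(awk '/MemTotal/ {print int($2 / 1024)}' /proc/meminfo)"`, although leaving some RAM free for the operating system is probably wise.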

     
  • Hugo Lopez Fernandez

    Thank you very much for your valuable and detailed response.

    The analysis has just finished correctly (apart from a warning, which I include at the end of this post), so I will use the parameters you suggested (with RAM and number of threads adjusted to my workstation's capabilities) for the analysis of the next sample.

    Thank you very much.

    cat analysis-socs/TEST_L50_Q40_CG20_CH80/warnings.log
    [-- Warning 1 --] 1855239059 substrings of hg19.fa ignored
    due to 1706774051 character(s) other than [ACGT].
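
    To see where a count like this could come from, the reference can be checked directly for characters outside a given alphabet. This is a sketch only: `count_other_than` is a hypothetical helper, and whether SOCS treats lowercase (soft-masked) bases as "other than [ACGT]" is not stated in this thread, so the two counts are offered for comparison rather than as an explanation.

```shell
#!/bin/sh
# Count sequence characters in a FASTA file that are NOT in the given set.
# $1 = characters to treat as valid (e.g. ACGT), $2 = FASTA file.
# Header lines (starting with '>') and line endings are ignored.
count_other_than() {
    grep -v '^>' "$2" | tr -d "$1\n\r" | wc -c | tr -d ' '
}

# Usage (on a 3GB file each call takes a while):
#   count_other_than ACGT     ./reference/hg19.fa   # uppercase only
#   count_other_than ACGTacgt ./reference/hg19.fa   # case-insensitive
```

    If the uppercase-only count is close to the figure in the warning, the ignored characters probably include lowercase bases as well as N.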

     
