
how to leave out insert size in qsv .ini file

xuan wang
2017-02-06
  • xuan wang

    xuan wang - 2017-02-06

    Hi developers,

    I'm trying to use qsv on my tumor data.

    Here is the content of my test .ini file.

    [general]
    log = DT.1.log
    loglevel = DEBUG
    sample = DT.1
    sv_analysis = both
    output = /home/beta/tmp
    reference = /home/beta/pub/genome/dna.toplevel.withMaskedY.SimpleID.fa
    platform = illumina

    [pair]
    pairing_type = pe
    mapper = bwa-mem

    [clip]
    blatpath = /home/beta/bin/x86_64/
    blatserver = 127.0.0.1
    blatport = 3456

    [test]
    name = DT.1
    input_file = /data/projects/bwamap/DT.1.out/DT.1.srt.dedup.withdbsnprealign.withdbsnpNchiprecal.bam

    [control]
    name = DH.1
    input_file = /data/projects/bwamap/DH.1.out/DH.1.srt.dedup.withdbsnprealign.withdbsnpNchiprecal.bam

    When I run:
    java -Xmx40g -jar /full/path/qsv-0.3.jar -ini /full/path/test.ini -tmp /tmp/directory
    I get the following (INFO lines such as "16:20:13.027 [main] INFO org.qcmg.qsv.QSV" are omitted):

    usage: qsv [OPTIONS] --ini [ini_file] --tmp [temporary_directory]
    org.qcmg.qsv.QSVException: No insert sizes were provided in the ini file
    at org.qcmg.qsv.QSVParameters.getISizesFromIniFile(QSVParameters.java:177)
    at org.qcmg.qsv.QSVParameters.<init>(QSVParameters.java:125)
    at org.qcmg.qsv.QSVPipeline.setQSVParameters(QSVPipeline.java:142)
    at org.qcmg.qsv.QSVPipeline.<init>(QSVPipeline.java:110)
    at org.qcmg.qsv.QSV.runQSV(QSV.java:89)
    at org.qcmg.qsv.QSV.main(QSV.java:36)
    16:20:13.094 [main] SEVERE org.qcmg.qsv.QSV - org.qcmg.qsv.QSVException: No insert sizes were provided in the ini file
    at org.qcmg.qsv.QSVParameters.getISizesFromIniFile(QSVParameters.java:177)
    at org.qcmg.qsv.QSVParameters.<init>(QSVParameters.java:125)
    at org.qcmg.qsv.QSVPipeline.setQSVParameters(QSVPipeline.java:142)
    at org.qcmg.qsv.QSVPipeline.<init>(QSVPipeline.java:110)
    at org.qcmg.qsv.QSV.runQSV(QSV.java:89)
    at org.qcmg.qsv.QSV.main(QSV.java:36)

    16:20:13.095 [main] EXEC org.qcmg.qsv.QSV - StopTime 2017-02-06 16:20:13
    16:20:13.095 [main] EXEC org.qcmg.qsv.QSV - TimeTaken 00:00:00
    16:20:13.095 [main] EXEC org.qcmg.qsv.QSV - ExitStatus 1

    Could you please tell me how to write the ini file so that qsv calculates the insert size automatically?

    Thanks!
    Xuan

     


  • Ollie Holmes

    Ollie Holmes - 2017-02-07

    Hi Xuan,
    At present, qSV will not automatically detect the insert size.
    It used to do this, but we found that the results were problematic.
    I have updated the wiki page to more accurately reflect this.

    There are a number of different ways of getting this information externally, e.g. using qProfiler or Picard's CollectInsertSizeMetrics (http://broadinstitute.github.io/picard/command-line-overview.html#CollectInsertSizeMetrics).
    You will then need to update your ini file with the gathered isize information and try running qsv again.

    Please let me know if you experience any further issues.
    Thanks for using qSV!
    Cheers,
    Oliver Holmes

     
    • xuan wang

      xuan wang - 2017-02-09

      I tested the previous bug and found that the qprofiler output file must have an .xml suffix.
      However, the HTML file may also have a bug. It uses the systemsbiology-visualizations utility hosted on Google Code to show graphs and information, but Google Code has shut down and the project has moved to GitHub: https://github.com/IlyaLab/systemsbiology-visualizations

      Please update the related q-software :)

       
  • xuan wang

    xuan wang - 2017-02-09

    Hello Oliver,
    Thank you for helping me and sharing your tool!

    About insert size: CollectInsertSizeMetrics in Picard can't give results by read group.
    ReadGroupProperties in GATK can only give the median isize by RG.
    samtools stats -S RG the.bam can give the mean isize and SD by RG.
    qProfiler seems OK, but its XML output is hard to read, and qvisualise doesn't work on my server. Log output:

    11:05:04.749 [main] EXEC org.qcmg.qvisualise.QVisualise - Uuid fdb9da05_0d9c_4578_8b59_1b52de2e518f
    11:05:04.749 [main] EXEC org.qcmg.qvisualise.QVisualise - StartTime 2017-02-09 11:05:04
    11:05:04.749 [main] EXEC org.qcmg.qvisualise.QVisualise - OsName Linux
    11:05:04.749 [main] EXEC org.qcmg.qvisualise.QVisualise - OsArch amd64
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - OsVersion 3.10.0-229.20.1.el7.x86_64
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - RunBy wangxuan
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - ToolName qvisualise
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - ToolVersion 0.1pre (118)
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - CommandLine qvisualise -i /home/wangxuan/tmp/qp.out -log /home/wangxuan/tmp/qp.log
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - JavaHome /home/wangxuan/software/jre1.8.0_111
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - JavaVendor Oracle Corporation
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - JavaVersion 1.8.0_111
    11:05:04.750 [main] EXEC org.qcmg.qvisualise.QVisualise - host centaurus
    11:05:04.793 [main] WARNING org.qcmg.qprofiler.QProfiler - qVisualise failed for qprofiler output: /home/wangxuan/tmp/qp.out
    11:05:04.793 [main] EXEC org.qcmg.qprofiler.QProfiler - StopTime 2017-02-09 11:05:04
    11:05:04.793 [main] EXEC org.qcmg.qprofiler.QProfiler - TimeTaken 00:21:09
    11:05:04.793 [main] EXEC org.qcmg.qprofiler.QProfiler - ExitStatus 0

    Could you please help me again?

    BTW, what exactly do the "upper/lower" values mean? Which statistic are they based on: SD, MAD, or some percentile?

    If I use qsv in my paper, which paper should I cite? This one?
    http://www.biotechniques.com/BiotechniquesJournal/2014/July/A-workflow-to-increase-verification-rate-of-chromosomal-structural-rearrangements-using-high-throughput-next-generation-sequencing/biotechniques-352784.html

    Cheers,
    Xuan

     
  • Ollie Holmes

    Ollie Holmes - 2017-02-13

    Hi Xuan,

    If you run Picard with the "METRIC_ACCUMULATION_LEVEL" option set to "READ_GROUP" then you should hopefully get readgroup specific isize metrics.
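
    Once Picard has written a metrics file at read-group level, the per-read-group rows can be pulled out with a few lines of Python. This is only a sketch: the sample text below is a made-up, shortened excerpt in the Picard metrics layout (a real file has more columns), and the function name read_group_metrics is hypothetical.

```python
import csv

# Made-up, shortened excerpt in the Picard metrics-file layout.
# A real CollectInsertSizeMetrics file has more columns; the numbers are illustrative only.
SAMPLE_METRICS = (
    "## METRICS CLASS\tpicard.analysis.InsertSizeMetrics\n"
    "MEDIAN_INSERT_SIZE\tMEAN_INSERT_SIZE\tSTANDARD_DEVIATION\tSAMPLE\tLIBRARY\tREAD_GROUP\n"
    "300\t305.2\t60.1\t\t\t\n"
    "290\t295.0\t55.0\tS1\tlib1\trg_a\n"
    "310\t315.4\t65.2\tS1\tlib1\trg_b\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
    "insert_size\tAll_Reads.fr_count\n"
    "100\t5\n"
)


def read_group_metrics(text):
    """Return {read_group: row} for rows of the metrics table that name a read group."""
    lines = text.splitlines()
    # The metrics table starts on the line after the "## METRICS CLASS" marker.
    start = next(i for i, line in enumerate(lines) if line.startswith("## METRICS CLASS"))
    table = []
    for line in lines[start + 1:]:
        if not line.strip():
            break  # a blank line ends the metrics table
        table.append(line)
    reader = csv.DictReader(table, delimiter="\t")
    # Rows with an empty READ_GROUP column are aggregate rows, so skip them.
    return {row["READ_GROUP"]: row for row in reader if row.get("READ_GROUP")}


if __name__ == "__main__":
    for rg, row in read_group_metrics(SAMPLE_METRICS).items():
        print(rg, row["MEDIAN_INSERT_SIZE"], row["MEAN_INSERT_SIZE"], row["STANDARD_DEVIATION"])
```

    For a real run, read the metrics file into a string and pass it to the function instead of SAMPLE_METRICS.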

    We are hoping to release a newer version of qprofiler and qvisualise soon, which should address the issues that you are experiencing.

    The Kelly Quek paper that you linked to is fine for citing qsv, as the qsv paper itself has yet to be published...

    Hope this helps.
    Cheers,
    Oliver Holmes

     
    • xuan wang

      xuan wang - 2017-02-15

      Oliver, could you please tell me the definition of the "upper/lower" insert size values? How should I set them when I get so many statistics from Picard?
      Thanks a lot!

       
  • Ollie Holmes

    Ollie Holmes - 2017-02-15

    Hi Xuan,
    You want to set the insert size bounds so that the majority of reads would be within the range.
    For example, in the following plot, a lower value of 100, and an upper value of 450 would be reasonable.
    https://duckduckgo.com/?q=picard+CollectInsertSizeMetrics&ia=images&iax=1&iai=http%3A%2F%2Fblog.amelieff.jp%2Fimages%2Finsertsize.png
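
    For batch use, the same idea (bounds that contain the majority of read pairs) can be approximated numerically from an insert-size histogram instead of by eye. The sketch below is an assumption-laden illustration, not qsv's own method: the function name insert_size_bounds and the default 95% coverage are choices made here, and the histogram values are invented.

```python
def insert_size_bounds(histogram, coverage=0.95):
    """Return (lower, upper) insert sizes bracketing the central `coverage` share of pairs.

    `histogram` maps insert size -> number of read pairs.
    The coverage fraction is an assumption; pick whatever "majority" suits the data.
    """
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("empty histogram")
    tail = (1.0 - coverage) / 2.0  # share of pairs trimmed from each side
    lower = upper = None
    running = 0
    for size in sorted(histogram):
        running += histogram[size]
        if lower is None and running >= tail * total:
            lower = size
        if running >= (1.0 - tail) * total:
            upper = size
            break
    return lower, upper


if __name__ == "__main__":
    # Invented histogram: most pairs sit around an insert size of 300.
    example = {100: 1, 200: 8, 300: 80, 400: 10, 500: 1}
    print(insert_size_bounds(example, coverage=0.9))
```

    The histogram could come from the HISTOGRAM section of a Picard metrics file; the resulting pair is then a candidate for the lower and upper values in the ini file.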

    Hope this helps,
    Oliver Holmes

     
    • xuan wang

      xuan wang - 2017-02-19

      Hi Oliver,
      Visual estimation may not be very accurate or usable in batch, but I did that.
      Now clip mode is okay, but pair mode is still in trouble.

      I use bwa mem -a -M -R "my readgroup" ref.fa read1.fq read2.fq >out.sam to align all my HiSeq X Ten paired-end data. Is that okay for qsv?
      Then I remove duplicates with Picard, realign indels and do BQSR with GATK. In the end, one of my reads looks like this:
      ST-E00144:112:HF5FJCCXX:2:2123:14702:21895 163 1 135 0 4S146M = 193 208 AAAACATCTTACTTTTGAGAGTTGAGCTGACCCCCAGTCCCTCACAGTTCCACACTGCCTGCAGAGTGAGTTTCCCATGTCTTCACCAGAGACTTTTGCCAGAGGCTTCTGAGACGCAAGTTAACAATGCAGACATGGAGGGTATCTCCA =<,<;===>==:=<==<<=7=:;=:=;;==<:::==<:=6;>:=<==<=/;=<=<>=;:>;8====<===-7+<618<=<:<===<:==6=;<>====<=====<;>=9<=;>;=9698=<=>),:),4-)+9<)7=69=<=7;<0<89= MC:Z:150M BD:Z:DD>>IGKJGFDIGE==FHECDGEEGEIIHHGGAAAGHHHGAGEHDDHHFEGGDDDGHIGGHIIHDEHDHEHF=EGAGHIEHEEEHDGGHDEDGGE==FIGGHDEGGIEEEHHEEHGFJHEIGDEFEHFJJJIFIFJKJHHJDJLJMEEGG MD:Z:9G0C13C4T100C15 PG:Z:MarkDuplicates RG:Z:DT.1.novo.250_L2 BI:Z:GGDDGGHHFEFFHEBBFHEEEHFFHEHHGHEFCCCHHHGFCGEHDFHHFFFHDFDHGFFGGFHHEEHFHEIFCGGDIHHGHGFGIEGIIFFFFIFCCGGGIIFFHHIFGGHIFFFHGIHGJHHHGHIIIHJJGGIJJJJIKHKKKKGFGI NM:i:5 MQ:i:0 AS:i:121 XS:i:125

      bwa mem doesn't output the SM, NH, X0 and XA fields anymore, so qsv stopped with this information:

      13:22:45.443 [pool-1-thread-2] SEVERE org.qcmg.qsv.annotate.AnnotateFilterMT - org.qcmg.qsv.QSVException: No discordant pair records passed the filter.
      at org.qcmg.qsv.annotate.AnnotateFilterMT.run(AnnotateFilterMT.java:182)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at java.lang.Thread.run(Thread.java:745)

      usage: qsv [OPTIONS] --ini [ini_file] --tmp [temporary_directory]
      org.qcmg.qsv.QSVException: Exception observed when annotating and filtering BAMs
      at org.qcmg.qsv.QSVPipeline.annotateAndFilterBams(QSVPipeline.java:414)
      at org.qcmg.qsv.QSVPipeline.runPipeline(QSVPipeline.java:237)
      at org.qcmg.qsv.QSV.runQSV(QSV.java:90)
      at org.qcmg.qsv.QSV.main(QSV.java:36)
      13:22:45.444 [main] SEVERE org.qcmg.qsv.QSV - org.qcmg.qsv.QSVException: Exception observed when annotating and filtering BAMs
      at org.qcmg.qsv.QSVPipeline.annotateAndFilterBams(QSVPipeline.java:414)
      at org.qcmg.qsv.QSVPipeline.runPipeline(QSVPipeline.java:237)
      at org.qcmg.qsv.QSV.runQSV(QSV.java:90)
      at org.qcmg.qsv.QSV.main(QSV.java:36)

      13:22:45.444 [main] EXEC org.qcmg.qsv.QSV - StopTime 2017-02-19 13:22:45
      13:22:45.444 [main] EXEC org.qcmg.qsv.QSV - TimeTaken 00:27:25
      13:22:45.445 [main] EXEC org.qcmg.qsv.QSV - ExitStatus 1

      I think we need another filter query that is compatible with the widely used bwa-mem. What's your opinion?
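
      Before changing a filter, it can help to confirm which optional tags the records really carry. The sketch below lists the tags on one SAM line and reports which of SM, NH, X0 and XA are absent. The record is a shortened copy of the one above (SEQ and QUAL replaced with "*", the long BD and BI tags dropped), real SAM lines are tab-separated, and the function name optional_tags is hypothetical. Which tags qsv's filter actually needs is not stated in this thread, so the wanted set is simply the list from this post.

```python
def optional_tags(sam_line):
    """Return the set of two-letter optional tag names found on one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE.
    return {field.split(":", 1)[0] for field in fields[11:]}


# Shortened copy of the record from this post; SEQ and QUAL are replaced with "*".
RECORD = "\t".join([
    "ST-E00144:112:HF5FJCCXX:2:2123:14702:21895", "163", "1", "135", "0",
    "4S146M", "=", "193", "208", "*", "*",
    "MC:Z:150M", "MD:Z:9G0C13C4T100C15", "PG:Z:MarkDuplicates",
    "RG:Z:DT.1.novo.250_L2", "NM:i:5", "MQ:i:0", "AS:i:121", "XS:i:125",
])

# Tags named in this post as no longer written by bwa mem.
WANTED = {"SM", "NH", "X0", "XA"}

if __name__ == "__main__":
    present = optional_tags(RECORD)
    print("present:", sorted(present))
    print("missing:", sorted(WANTED - present))
```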

      Best wishes!
      Xuan

       

      Last edit: xuan wang 2017-02-19
  • Ollie Holmes

    Ollie Holmes - 2017-02-19

    Hi Xuan,
    I agree, you will need to tune the filters to suit your BAM files.

    Details on the filtering options are available on the wiki:
    https://sourceforge.net/p/adamajava/wiki/qsv/#filter-options

    Please let me know if you have any further questions.
    Cheers,
    Oliver Holmes

     
