Menu

Potential Bug

Emily
2013-07-04
2013-08-29
  • Emily

    Emily - 2013-07-04

    Hi,

    I have just started using srst this week and I think it is great!

    However I think that there might be a minor bug in srst.py. Basically, it crashes when it can't find a particular allele. Having looked at the code I believe that this is because the location of samtools.pl is incorrect, all of the installation of samtools that I have worked with hold this perl script in the misc folder rather than with the samtools executable.

    I guess the user can move samtools.pl up a directory or modify the script to point to the correct place.

    I hope that this is useful.

    Thanks,

    Emily

     
  • Lee Katz

    Lee Katz - 2013-08-29

    I am also finding a similar error. I really want it to work, so I am debugging it now with my limited knowledge of Python. I believe that it is due to using samtools.pl to generate a fastq file (and also, the samtools.pl path is incorrect; you are right). However, the authors should use the command that Samtools recommends:

    samtools mpileup -uf ref.fa aln.bam | bcftools view -cg - | vcfutils.pl vcf2fq > cns.fq

    Additionally, SRST should make sure that the user has samtools, bcftools, and vcftools.pl in the PATH.

     
  • Lee Katz

    Lee Katz - 2013-08-29

    I fixed the script by altering the following subroutine. In essence, samtools is not being used correctly. I believe that the current version of SRST does not support the newest version of samtools, and this should help.

    Additionally, I added the following lines near line 430 denoted with #LK

    while pos < here:
      holes += 1
      pos += 1
    if len(t) < 5: # LK
      t.append("") # LK
    v = pile(t[2], t[4])
    
    
    # the subroutine I changed
    def getNovelAllele(variant,locus,ranges,files,workDir,logFile,samtools,bwa,resBase,nameSep,refSeqs,paired,insertSize):
      allele_name = locus + nameSep + variant ###CHECK DELIMITER VARIABLE NAME
      # get reference sequence = closest match
      typeFn = allele_name + ".fa"
      typeFile = open(typeFn, "w")
      s = refSeqs[allele_name]
      Bio.SeqIO.write([Bio.SeqRecord.SeqRecord(s, id=allele_name)], typeFile, "fasta")
      typeFile.close()
      # index it
      null = open(os.devnull, 'w')
      p = subprocess.call([bwa, 'index', typeFn], stderr=null)
      # align reads to it
      sais = alignReads(workDir, resBase+"."+allele_name, paired, typeFn, files, logFile, bwa)
      bam = bamify(workDir, resBase+"."+allele_name, files, sais, typeFn, False, logFile, insertSize, bwa, samtools)
      bamFn = resBase + "." + allele_name + ".bam"
      p = subprocess.call(['mv',bam,bamFn],stderr=null) # copy to current directory from working dir
      print >> logFile, "Alignments are in " + bamFn
      # generate pileup and fastq
      piFn = resBase + "." + allele_name + '.pileup'
      vcfFn = resBase + "." + allele_name + '.vcf'
      fqFn = resBase + "." + allele_name + '.fastq'
      print >> logFile, 'Writing pileup to ' + piFn
      # index
      p = subprocess.Popen([samtools, 'index', bamFn], stdout=subprocess.PIPE, stderr=null)
      # create vcf
      p = subprocess.Popen([samtools, 'mpileup', '-uf', typeFn, bamFn], stdout=subprocess.PIPE)
      p = subprocess.Popen(['bcftools', 'view', '-cg', '-'], stdin=p.stdout, stdout=subprocess.PIPE)
      vcf = open(vcfFn, 'w')
      for l in p.stdout:
      vcf.write(l)
      vcf.close()
      # create the fastq file
      p = subprocess.Popen(['vcfutils.pl', 'vcf2fq', vcfFn], stdout=subprocess.PIPE)
      f = open(fqFn, 'w')
      for l in p.stdout:
      f.write(l)
      f.close()
      fq = Bio.SeqIO.read(fqFn,"fastq")
      slice = fq[ ranges[allele_name][0] : ranges[allele_name][1] ]
      slice.id = resBase+"."+allele_name
      Bio.SeqIO.write(slice, fqFn, "fastq")
    
     

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.