phred scale

daweonline
2014-01-15
2014-01-16
  • daweonline

    daweonline - 2014-01-15

    Dear James,
    thanks for fqzcomp. I'm compressing a lot of old fastq files and it is working fine. I noticed I have some phred-64 encoded sequences, will fqzcomp properly work with them? Should I convert them (on the fly) to phred33?

    Thanks

    d
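One way to approach the question above is to detect the encoding and, for Phred+64 files, rewrite the quality lines to Phred+33 on the fly before compression. The C sketch below is illustrative only. `guess_qual_offset`, `qual64_to_33` and `fastq64_to_33` are hypothetical helpers, not part of fqzcomp. It assumes plain 4-line FASTQ records without line wrapping, and Phred+64 data (Illumina 1.3 to 1.7). Older Solexa+64 scores can be negative and need a non-linear conversion, so the sketch rejects any quality character below '@'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: guess the ASCII offset of one FASTQ quality string.
 * Returns 33, 64, or 0 when the string alone cannot tell them apart.
 * Heuristic: nothing below ';' (59) occurs in +64 encodings, and
 * characters above 'J' (74, Phred 41) are unusual in Illumina Phred+33. */
int guess_qual_offset(const char *qual)
{
    int lo = 255, hi = 0;
    for (const unsigned char *p = (const unsigned char *)qual;
         *p && *p != '\n' && *p != '\r'; p++) {
        if (*p < lo) lo = *p;
        if (*p > hi) hi = *p;
    }
    if (hi == 0)
        return 0;   /* empty line: no evidence either way */
    if (lo < ';')
        return 33;  /* impossible in +64 data */
    if (hi > 'J')
        return 64;  /* above the usual Phred+33 ceiling */
    return 0;       /* every character in ';'..'J': ambiguous */
}

/* Hypothetical helper: convert one quality line in place from Phred+64
 * to Phred+33 (subtract 64, add 33), stopping at the line terminator.
 * Returns -1 on a character below '@' (Phred 0 in +64), e.g. Solexa
 * scores; the line may then be partially rewritten. */
int qual64_to_33(char *qual)
{
    for (char *p = qual; *p && *p != '\n' && *p != '\r'; p++) {
        if (*p < '@')
            return -1;
        *p = (char)(*p - 31);
    }
    return 0;
}

/* Hypothetical helper: copy 4-line FASTQ records from `in` to `out`,
 * converting every fourth line (the quality line). Returns 0 on
 * success, -1 on bad input or a read error. Assumes unwrapped records
 * whose lines fit in `line`. */
int fastq64_to_33(FILE *in, FILE *out)
{
    char line[65536];
    unsigned long n = 0;
    while (fgets(line, sizeof line, in)) {
        if (n % 4 == 3 && qual64_to_33(line) != 0)
            return -1;
        fputs(line, out);
        n++;
    }
    return ferror(in) ? -1 : 0;
}
```

Because `guess_qual_offset` only sees one string, in practice you would run it over many reads and trust only a definite answer. Wrapping `fastq64_to_33(stdin, stdout)` in a small `main` gives a filter that could sit in a pipe in front of the compressor, assuming the compressor accepts input that way. Provided the converted scores stay in the normal Phred+33 range, this avoids changing QMAX at all.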

     
    • James Bonfield

      James Bonfield - 2014-01-16

      Hello,

      On Wed, Jan 15, 2014 at 07:48:30PM +0000, daweonline wrote:

      > thanks for fqzcomp. I'm compressing a lot of old fastq files and it
      > is working fine. I noticed I have some phred-64 encoded sequences,
      > will fqzcomp properly work with them? Should I convert them (on the
      > fly) to phred33?

      A good question and one I had to look up.

      The code has a #define QMAX 64 near the top:

          /* Keep as a power of 2 */
          //#define QMAX 128
          #define QMAX 64

      You can see I had it as 128 for a while, but commented that out. I
      believe you'd need to revert to 128 as QMAX to support both
      phred-64 and phred-33 encoded sequences. Doing so, however, would make
      the output incompatible with other fqzcomp builds, which limits you to
      using it for local storage only. (Otherwise people won't be able to
      decode your files without also changing their code.)
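To see why 128 would cover both encodings, here is a small C model of the limit. It assumes, as the figure of 32 for phred-64 data implies but the thread does not state outright, that each quality is stored as its ASCII code minus 33 and must stay below QMAX. `max_phred` and `qual_fits` are hypothetical names, not fqzcomp functions.

```c
/* Assumed storage rule (illustration only, not fqzcomp's code):
 * stored value = ASCII code - 33, and it must be < qmax. */
#define STORE_OFFSET 33

/* Highest Phred score representable for a file written with
 * `file_offset` (33 or 64) under a given qmax. */
int max_phred(int qmax, int file_offset)
{
    return qmax - 1 - (file_offset - STORE_OFFSET);
}

/* Return 1 if every character of a quality string fits under qmax. */
int qual_fits(const char *qual, int qmax)
{
    for (const unsigned char *p = (const unsigned char *)qual; *p; p++)
        if (*p < STORE_OFFSET || *p - STORE_OFFSET >= qmax)
            return 0;
    return 1;
}
```

Under this model `max_phred(64, 64)` is 32 while `max_phred(64, 33)` is 63, so phred-33 data fits comfortably but a phred-64 Q40 ('h') does not; with a qmax of 128 the phred-64 ceiling rises to 96.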

      A maximum quality of 64 means Illumina phred-64 encodings could support
      values up to 32; anything higher would wrap around (I think). The
      best strategy is simply to try encoding, decoding and then comparing
      with the original. I believe it should be quite obvious if it has
      truncated the data.
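The encode/decode/compare check can be done with any diff tool. The hypothetical C helper below goes one step further and reports the first differing line, which for 4-line FASTQ records shows whether the damage sits on a quality line (1-based line numbers divisible by 4). It assumes lines fit in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Compare two FASTQ streams line by line, e.g. the original file and
 * the output of a compress/decompress round trip. Returns 0 if they
 * are identical, the 1-based number of the first differing line
 * otherwise, or -1 on a read error. */
long first_diff_line(FILE *a, FILE *b)
{
    char la[65536], lb[65536];
    long n = 0;
    for (;;) {
        char *ra = fgets(la, sizeof la, a);
        char *rb = fgets(lb, sizeof lb, b);
        n++;
        if (!ra || !rb) {
            if (ferror(a) || ferror(b))
                return -1;
            return (ra || rb) ? n : 0;  /* nonzero if one file ended early */
        }
        if (strcmp(la, lb) != 0)
            return n;
    }
}
```

Open the original FASTQ and the decompressed copy with `fopen(path, "r")` and pass both streams; a return value of 0 means the round trip was lossless.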

      James

      PS. The reason for limiting it to 64 instead of 128 is simply that it
      uses less memory.

      --
      James Bonfield (jkb@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
                                        | Plurima gyrabant gymbolitare vabo;
      A Staden Package developer:       | Et Borogovorum mimzebant undique formae,
      https://sf.net/projects/staden/   | Momiferique omnes exgrabure Rathi.


       
