Staden Package / Discussion / Open Discussion: Reading FASTQ data

Will Stokes - 2016-05-23

Does Staden io_lib support reading FASTQ content into a Read object? If not, can you point me to where in the code I would begin to add such support myself? It appears there are multiple "standards" for encoding the quality data and it would be nice if I could use a library like io_lib to handle that nonsense for me. :-) I see there is a program to convert scf to fastq, but didn't see anything for the other way around.

- James Bonfield - 2016-05-23
  
  On Mon, May 23, 2016 at 01:12:39PM +0000, Will Stokes wrote:
  
  > Does Staden io_lib support reading FASTQ content into a Read object?
  
  No, sorry. It would probably be a bit of a heavyweight API for fastq
  too.
  
  > If not, can you point me to where in the code I would begin to add
  > such support myself? It appears there are multiple "standards" for
  > encoding the quality data and it would be nice if I could use a
  > library like io_lib to handle that nonsense for me. :-) I see
  > there is a program to convert scf to fastq, but didn't see anything
  > for the other way around.
  
  The only code in the Staden Package that deals with fastq would be in
  Gap5 itself (in the confusingly named staden/src/gap5/fasta.c).
  Possibly that should have been added to io_lib, but for whatever
  reason at the time I didn't (I can't think why - probably just
  laziness).
  
  However, it doesn't deal with the multiple ways of encoding quality.
  Frankly, I'd be inclined to ignore every fastq variant except the
  standard qval + 33 encoding (so a quality of 0 is stored as '!'). The
  others were invented by Illumina (and subsequently dropped again, I
  believe).
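  A minimal sketch of decoding the standard qval + 33 encoding into
  numeric quality values. `decode_qual_phred33` is a hypothetical helper
  written for illustration, not part of io_lib or Gap5, and it
  deliberately handles only the +33 encoding recommended above.

  ```c
  #include <stddef.h>

  /* Hypothetical helper (not io_lib API): decode FASTQ quality characters
   * into numeric values, assuming the "qval + 33" encoding
   * ('!' = Q0, '+' = Q10, 'I' = Q40).
   * Returns the number of values written to qvals, or -1 if a character
   * lies outside the printable range '!'..'~' that FASTQ allows. */
  int decode_qual_phred33(const char *qstr, size_t len, int *qvals) {
      for (size_t i = 0; i < len; i++) {
          unsigned char c = (unsigned char)qstr[i];
          if (c < '!' || c > '~')
              return -1;      /* not a valid FASTQ quality character */
          qvals[i] = c - 33;  /* remove the +33 offset */
      }
      return (int)len;
  }
  ```

  The caller supplies a `qvals` array with room for at least `len`
  values, i.e. one per base.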
  
  Processing fastq yourself isn't hard, though, provided you make sure
  to use the length of the sequence as the indicator of how many quality
  values you should expect. I.e. don't fall into the pitfall of assuming
  that a line of qualities starting with "@" is the next sequence
  identifier when we haven't yet read enough quality values ("@" is a
  valid quality character, Q31 in the +33 encoding). Other than that,
  it's such a simple format that you can roll your own in relatively few
  lines.
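  A minimal sketch of such a hand-rolled reader, assuming the whole file
  is already in a NUL-terminated memory buffer. The `fastq_rec` type and
  the `fastq_next`/`fastq_free` functions are hypothetical names for
  illustration, not io_lib or Gap5 API. It accepts sequence and quality
  split across several lines, and it keeps reading quality lines until
  it has exactly as many characters as there are bases, so a quality
  line beginning with '@' is never mistaken for the next header.

  ```c
  #include <stdlib.h>
  #include <string.h>

  /* Hypothetical record type (not io_lib API); strings are malloc'd. */
  typedef struct {
      char *name; /* header line without the leading '@' */
      char *seq;  /* bases, with line breaks removed */
      char *qual; /* quality characters, exactly strlen(seq) of them */
  } fastq_rec;

  void fastq_free(fastq_rec *r) {
      free(r->name);
      free(r->seq);
      free(r->qual);
      r->name = r->seq = r->qual = NULL;
  }

  /* Locate the next line at *p ("\n" or "\r\n" stripped) and advance *p
   * past it. Returns 0 once the input is exhausted. */
  static int next_line(const char **p, const char **line, size_t *len) {
      const char *s = *p;
      if (*s == '\0')
          return 0;
      const char *nl = strchr(s, '\n');
      size_t n = nl ? (size_t)(nl - s) : strlen(s);
      *p = nl ? nl + 1 : s + n;
      if (n > 0 && s[n - 1] == '\r')
          n--;
      *line = s;
      *len = n;
      return 1;
  }

  /* Append len bytes of src to the growable string *buf (length *used). */
  static int append(char **buf, size_t *used, const char *src, size_t len) {
      char *nb = realloc(*buf, *used + len + 1);
      if (!nb)
          return -1;
      memcpy(nb + *used, src, len);
      *used += len;
      nb[*used] = '\0';
      *buf = nb;
      return 0;
  }

  /* Parse the next record at *p into *r.
   * Returns 1 on success, 0 at end of input, -1 on a malformed record. */
  int fastq_next(const char **p, fastq_rec *r) {
      const char *line;
      size_t len, nlen = 0, slen = 0, qlen = 0;

      r->name = r->seq = r->qual = NULL;

      /* Skip blank lines, then expect an '@' header. */
      do {
          if (!next_line(p, &line, &len))
              return 0;
      } while (len == 0);
      if (line[0] != '@' || append(&r->name, &nlen, line + 1, len - 1) < 0)
          goto fail;

      /* The sequence may span several lines, up to the '+' separator. */
      for (;;) {
          if (!next_line(p, &line, &len))
              goto fail; /* truncated record */
          if (len > 0 && line[0] == '+')
              break;
          if (append(&r->seq, &slen, line, len) < 0)
              goto fail;
      }

      /* Count quality characters against the sequence length instead of
       * looking for the next '@', since '@' is also a valid quality. */
      while (qlen < slen) {
          if (!next_line(p, &line, &len) ||
              append(&r->qual, &qlen, line, len) < 0)
              goto fail;
      }
      if (qlen != slen)
          goto fail; /* more qualities than bases */

      /* Give zero-length reads empty strings rather than NULL. */
      if ((!r->seq && append(&r->seq, &slen, "", 0) < 0) ||
          (!r->qual && append(&r->qual, &qlen, "", 0) < 0))
          goto fail;
      return 1;

  fail:
      fastq_free(r);
      return -1;
  }
  ```

  Call `fastq_next` repeatedly, releasing each record with `fastq_free`;
  it returns 1 per record, 0 at end of input and -1 for a truncated or
  malformed record. Reading from a `FILE *` instead would only change how
  `next_line` fetches its lines.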
  
  James
  
  --
  James Bonfield (jkb@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
  | Plurima gyrabant gymbolitare vabo;
  A Staden Package developer: | Et Borogovorum mimzebant undique formae,
  https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.
  
  --
  The Wellcome Trust Sanger Institute is operated by Genome Research
  Limited, a charity registered in England with number 1021457 and a
  company registered in England with number 2742969, whose registered
  office is 215 Euston Road, London, NW1 2BE.
  
  - Will Stokes - 2016-05-24
    
    Thanks, I appreciate the quick reply and suggestion.
    
    Will Stokes
    Chief Software Architect
    
    Follow us: Facebook https://www.facebook.com/SnapGene, Twitter
    https://twitter.com/SnapGene, Newsletter
    http://www.snapgene.com/company/newsletter/subscribe_to_our_newsletter/
    