Thanks for the new release. I really like the support for RTA data and indexed reads.
I have been playing around with creating SRF files that include the raw information (-b) and wanted to check whether it worked, especially with paired and indexed runs.
I've tried to use the srf2fastq tool provided with the io_lib package, but I get the following error message:
"Zero or greater than one CNF chunks found."
I also get this message with the SRF files generated by the GA-Pipeline v.1.4 SRF_ARCHIVE_REQUIRED option, so I don't know if I'm doing something wrong.
Do you know the best way to test if the generated SRF files are valid?
This is srf2fastq not being very friendly. You need to use the '-c' option to tell it to read the 'calibrated' confidence values from the SRF file (i.e. the ones originally stored in the qseq.txt files).
Run 'srf2fastq -h' to get a list of the other options that it understands.
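Once srf2fastq -c produces output, a quick structural check of the fastq can catch truncated or mismatched records. Below is a minimal Python sketch, assuming plain four-line records; the function name iter_fastq is hypothetical and not part of io_lib. It checks the '@' header, the '+' separator, that sequence and quality lengths match, and that qualities are printable ASCII. It cannot tell you whether the SRF file itself is complete (e.g. whether every read of a paired or indexed run made it in).

```python
from typing import Iterable, Iterator, Tuple


def iter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from plain four-line fastq records.

    Raises ValueError at the first structural problem found.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines, e.g. a trailing empty line
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it).rstrip("\r\n")
            qual = next(it).rstrip("\r\n")
        except StopIteration:
            raise ValueError(f"truncated record starting at {header!r}") from None
        if not header.startswith("@"):
            raise ValueError(f"header does not start with '@': {header!r}")
        if not plus.startswith("+"):
            raise ValueError(f"separator does not start with '+': {plus!r}")
        if len(seq) != len(qual):
            raise ValueError(
                f"{header}: sequence length {len(seq)} != quality length {len(qual)}"
            )
        # Phred+33 qualities must be printable ASCII, '!' (33) to '~' (126)
        if any(not 33 <= ord(c) <= 126 for c in qual):
            raise ValueError(f"{header}: non-printable quality character")
        yield header[1:], seq, qual
```

Usage, with a hypothetical file name: `with open("reads.fastq") as fh: n = sum(1 for _ in iter_fastq(fh))` counts the records and stops with a ValueError at the first malformed one.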
Thanks for the reply. I had missed the -c option; now it's working for me.
Do you get fastq with proper qualities using srf2fastq?
Also, how do you produce srf files?
Apparently srf2fastq is converting scores in a weird way… I believe it's converting to Sanger format while assuming Solexa-format input, or worse (e.g. scores like 'b' become 'D').
The quality values produced by srf2fastq are correct.
SRF is a binary format, so it stores the actual quality values as integers, along with a flag indicating whether they use the (now very obsolete) log-odds scale or the Phred scale. srf2fastq takes this data and outputs it as a Sanger-formatted fastq file (i.e. Phred+33).
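To illustrate the idea, here is a minimal Python sketch that takes integer qualities plus a scale flag and produces a Phred+33 string. It is not srf2fastq's actual code: the function names and the log_odds parameter are hypothetical, and the Solexa-to-Phred conversion uses the standard formula Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), as described in the NAR fastq article.

```python
import math
from typing import Sequence


def solexa_to_phred(q_solexa: int) -> int:
    """Convert a Solexa log-odds quality to the nearest integer Phred quality.

    Uses Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1).
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def to_sanger_quality(quals: Sequence[int], log_odds: bool) -> str:
    """Encode integer qualities as a Sanger (Phred+33) fastq quality string.

    `log_odds` stands in for the SRF scale flag (hypothetical parameter):
    True means the integers are Solexa log-odds values, False means Phred.
    """
    phred = [solexa_to_phred(q) if log_odds else q for q in quals]
    for q in phred:
        if not 0 <= q <= 93:  # 93 + 33 = 126 ('~'), the highest printable code
            raise ValueError(f"Phred quality {q} cannot be encoded as Phred+33")
    return "".join(chr(q + 33) for q in phred)
```

For example, Phred values [40, 30, 20, 10, 0] encode as "I?5+!". Because the offset changes from 64 to 33, characters in a Phred+64 file and the corresponding Sanger file are expected to differ even when the underlying quality is the same ('h', Q40 at offset 64, is 'I' at offset 33).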
For information on fastq encodings, see the Wikipedia fastq page, or the Nucleic Acids Research article on the fastq format (doi:10.1093/nar/gkp1137).
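If you want to check which offset a fastq file uses (for instance, to compare srf2fastq output against qualities taken from other pipeline files), a common trick is to look at the range of quality characters. A rough Python sketch follows; the function name is hypothetical, and the thresholds are a heuristic assumption: anything below ';' (59) cannot be a +64 encoding, and Illumina-era Phred+33 data rarely goes above 'J' (74, Q41). A narrow range can stay ambiguous.

```python
from typing import Iterable


def guess_quality_offset(quality_strings: Iterable[str]) -> str:
    """Heuristically guess the fastq quality encoding from character codes."""
    codes = [ord(c) for q in quality_strings for c in q]
    if not codes:
        return "unknown"
    lo, hi = min(codes), max(codes)
    if lo < 59:
        # Below ';' (Solexa -5 at offset 64) only an offset-33 encoding fits
        return "phred+33"
    if hi > 74:
        # Above 'J': assumed too high for Phred+33 Illumina data, so offset 64;
        # codes 59-63 occur in Solexa+64 but not in Phred+64
        return "solexa+64" if lo < 64 else "phred+64"
    return "ambiguous"
```

For example, a file whose qualities run from '@' to 'h' would be reported as "phred+64", while output containing '!' or '5' is "phred+33".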