It appears that v4.5 does not correctly handle Windows line endings, which contain carriage returns. After compression and decompression, an 'N' is added to the sequence and an '!' to the quality string. A carriage return is kept at the end of the sequence ID line but is absent from the optional read ID line that precedes the quality string.
Cheers,
Nathan
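To make the symptom concrete, here is a minimal Python sketch (not the tool's own code) of how a FASTQ record with Windows line endings looks to a parser that splits only on '\n'. The record contents and the helper name `split_lf_only` are made up for illustration. The point is just that every field ends up carrying a stray '\r', so the sequence and quality strings each appear one character longer than they really are; whether that is exactly what produces the extra 'N' and '!' in v4.5 is an assumption.

```python
# Hypothetical 4-line FASTQ record saved with Windows (CRLF) line endings.
record = b"@read1\r\nACGT\r\n+read1\r\nIIII\r\n"

def split_lf_only(data: bytes) -> list[bytes]:
    """Split on '\n' only, as a parser unaware of CRLF would."""
    return data.split(b"\n")[:-1]  # drop the empty piece after the final '\n'

name, seq, plus, qual = split_lf_only(record)

print(seq)        # b'ACGT\r'
print(len(seq))   # 5
```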
Oops. Thanks for the bug report, Nathan.
I'll investigate the issue.
My plan is to auto-detect whether the file contains '\r' characters and to set a flag in the header so that they are reproduced on output. This means you get out exactly what you put in, but you can't encode a Windows-format file and extract it in Unix format, or vice versa.
Is that sufficient?
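A rough Python sketch of that idea, for illustration only: the real header layout is not described in this thread, so the one-byte flag, the constant name and both function names below are hypothetical. Compression itself is left out; the sketch only shows detecting '\r\n', recording it in a flag, and reproducing it on output.

```python
CRLF_FLAG = 0x01  # hypothetical header bit: input used "\r\n" line endings

def encode(data: bytes) -> bytes:
    """Prefix a one-byte flag and store the text with '\n' endings only."""
    uses_crlf = b"\r\n" in data
    body = data.replace(b"\r\n", b"\n") if uses_crlf else data
    flags = CRLF_FLAG if uses_crlf else 0
    return bytes([flags]) + body

def decode(blob: bytes) -> bytes:
    """Restore the original line endings according to the stored flag."""
    flags, body = blob[0], blob[1:]
    if flags & CRLF_FLAG:
        body = body.replace(b"\n", b"\r\n")
    return body

windows_text = b"@read1\r\nACGT\r\n+\r\nIIII\r\n"
unix_text = b"@read1\nACGT\n+\nIIII\n"

# Each file comes back with the line endings it went in with.
assert decode(encode(windows_text)) == windows_text
assert decode(encode(unix_text)) == unix_text
```

Round-tripping works for files that use one line ending throughout; a file mixing both styles would come back entirely as '\r\n' in this sketch.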
In the end I reverted that decision after experimenting with it in code; it was not very clean.
Now it simply reads either \r\n or \n line endings and works regardless (provided the file doesn't change format midway). Extraction always uses Unix-style \n line endings, though, with no option for Windows format.
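In Python terms, the behaviour described above might look like the following sketch. It is not the tool's source; `read_lines` and `write_unix` are hypothetical names, and the only logic shown is dropping an optional '\r' before each '\n' on input and always writing '\n' on output.

```python
import io

def read_lines(stream):
    """Yield lines without their terminator, accepting '\r\n' or '\n'."""
    for raw in stream:
        line = raw.rstrip(b"\n")
        if line.endswith(b"\r"):
            line = line[:-1]  # drop only the '\r' that was part of the line ending
        yield line

def write_unix(lines, stream):
    """Write every line with a Unix-style '\n' terminator."""
    for line in lines:
        stream.write(line + b"\n")

windows_input = io.BytesIO(b"@read1\r\nACGT\r\n+read1\r\nIIII\r\n")
unix_input = io.BytesIO(b"@read1\nACGT\n+read1\nIIII\n")

out_a, out_b = io.BytesIO(), io.BytesIO()
write_unix(read_lines(windows_input), out_a)
write_unix(read_lines(unix_input), out_b)

# Both inputs produce identical Unix-format output.
assert out_a.getvalue() == out_b.getvalue() == b"@read1\nACGT\n+read1\nIIII\n"
```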
I also fixed a checksum error that occurred when read names are present after the + line.
See the uploaded 4.6 tarball. This is binary compatible with the 4.5 output.