
#21 Spot gzipped input when --seqcomp not given

open
nobody
None
1
2012-11-11
2012-10-01
Peter
No

If I (accidentally) run mrfast with a gzipped FASTQ file as the --seq argument but omit the --seqcomp argument, it runs and tries to parse the file anyway. This gives nonsense reads and terminates with a malloc error, e.g.

Sequence length: 619 bp. Error threshold is set to 25 bp.
You can override this value using the -e parameter.
mrfast(36815) malloc: *** error for object 0x7f81743fc058: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

or:

Sequence length: 2 bp. Error threshold is set to 1 bp.
You can override this value using the -e parameter.
mrfast(36751) malloc: *** error for object 0x7ff9d3cfc078: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6

It is trivial to inspect the first few bytes of the sequence file and spot GZIP compression, and then issue a clear and specific error in this situation.
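As a rough sketch of that check (not mrfast's actual code): a gzip stream starts with the two magic bytes 0x1f 0x8b (RFC 1952), so reading the first two bytes of the file is enough to spot it. The function name `looks_gzipped` is hypothetical, and how it would hook into mrfast's argument handling is left as an assumption.

```c
#include <stdio.h>

/* Hypothetical helper, not part of mrfast.
 * Returns 1 if the file starts with the gzip magic bytes 0x1f 0x8b,
 * 0 if it does not (including files shorter than two bytes),
 * and -1 if the file cannot be opened. */
int looks_gzipped(const char *path)
{
    unsigned char magic[2];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;

    /* Only the first two bytes are needed for the check. */
    n = fread(magic, 1, sizeof magic, fp);
    fclose(fp);

    return n == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

If this returns 1 and --seqcomp was not given, the program could print a specific message (for example, that the --seq file appears to be gzip compressed and --seqcomp should be added) and exit before attempting to parse any reads.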

Ideally I would like the --seqcomp argument to be automatically set based on the input data, but I appreciate that is a bit more work (especially if you want to support streaming input from stdin).
