Hi. As you may have inferred from ticket #63, I am trying to build a FASTQ QC pipeline so that I don't spend additional compute on FASTQs that are bound to fail for structural reasons. One of the edge cases I want to cover is a gzipped FASTQ that was only partially downloaded. Here is the snippet:
$ ./bbmap/reformat.sh in=unit_test_fq/reads-corrupted_R1.fastq.gz in2=unit_test_fq/reads-corrupted_R2.fastq.gz
java -ea -Xmx300m -Xms300m -cp /home/max/fastq_qc_redux/bbmap/current/ jgi.ReformatReads in=unit_test_fq/reads-corrupted_R1.fastq.gz in2=unit_test_fq/reads-corrupted_R2.fastq.gz
Executing jgi.ReformatReads [in=unit_test_fq/reads-corrupted_R1.fastq.gz, in2=unit_test_fq/reads-corrupted_R2.fastq.gz]
No output stream specified. To write to stdout, please specify 'out=stdout.fq' or similar.
Set INTERLEAVED to false
[E::bgzf_read] Read block operation failed with error 4 after 5 of 65280 bytes
Error 5 in block starting at offset 33(21)
Input is being processed as paired
[E::bgzf_read] Read block operation failed with error 4 after 5 of 65280 bytes
Error 5 in block starting at offset 33(21)
Input: 0 reads 0 bases
Output: 0 reads (NaN%) 0 bases (NaN%)
Time: 0.721 seconds.
Reads Processed: 0 0.00k reads/sec
Bases Processed: 0 0.00m bases/sec
As you can see, the [E::bgzf_read] error is printed. The problem is that the exit code still reports success:
$ echo $?
0
I can redirect this text to stdout and use grep, awk, etc. to handle such errors, but my question is: how heavy a lift would it be to propagate error codes from what look like HTSlib-style calls underneath?
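For reference, the grep workaround could look roughly like the wrapper below. It is a minimal sketch, assuming the `[E::bgzf_read]` lines arrive on stderr (which the need to redirect them suggests); `run_and_scan` is a hypothetical helper name, not part of BBTools. It fails if the wrapped command exits non-zero or if its stderr contains the HTSlib marker seen in the log above.

```shell
#!/usr/bin/env bash
# run_and_scan: hypothetical helper, NOT part of BBTools.
# Runs a command and returns non-zero if the command fails OR if its stderr
# contains "[E::bgzf_read]" (assumed to be where the HTSlib message goes).
run_and_scan() {
    local log status rc=0
    log=$(mktemp) || return 2
    "$@" 2>"$log"          # capture stderr; stdout passes through untouched
    status=$?
    cat "$log" >&2         # replay the captured messages once the run ends
    if [ "$status" -ne 0 ]; then
        rc=$status         # the command itself reported failure
    elif grep -qF '[E::bgzf_read]' "$log"; then
        rc=1               # exit code said success, but the log shows a read error
    fi
    rm -f "$log"
    return "$rc"
}
```

Usage would be something like `run_and_scan ./bbmap/reformat.sh in=unit_test_fq/reads-corrupted_R1.fastq.gz in2=unit_test_fq/reads-corrupted_R2.fastq.gz || echo "skipping sample" >&2`. One trade-off of buffering stderr in a temp file is that the tool's progress messages only appear after it finishes.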
Forgive me again. I tried to use ChatGPT to work out what was going on in the source, but I'm not fluent in Java.
As always, thank you for your time and help.
Another alternative for me might be to run
gzip -dct ${FASTQ_PATH}
on each gzipped FASTQ I'd like to analyze. This will catch gzip corruption. However, it might be useful to propagate errors like these directly through the BBMap suite.

Good suggestion; I'm opening a new process for bgzip and piping the input. It shouldn't be too hard to catch the error code.
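Both ideas (checking each file up front with gzip, and decompressing in a child process while catching its exit code) can be sketched in shell. This is an illustration under stated assumptions, not how BBTools implements it: `check_gz` and `count_fastq_reads` are hypothetical names, plain `gzip` stands in for bgzip, and the read count assumes ordinary 4-line FASTQ records. `gzip -t` performs the same integrity test as the `gzip -dct` command above and exits non-zero on a truncated stream.

```shell
#!/usr/bin/env bash
# check_gz: hypothetical pre-check. Returns 1 if any listed file is a
# truncated or corrupt gzip stream (gzip -t exits non-zero in that case).
check_gz() {
    local f bad=0
    for f in "$@"; do
        if ! gzip -t "$f" 2>/dev/null; then
            echo "corrupt or truncated gzip: $f" >&2
            bad=1
        fi
    done
    return "$bad"
}

# count_fastq_reads: hypothetical streaming variant. gzip runs as a separate
# process whose output is piped onward; its exit code is propagated instead
# of being lost behind the last command in the pipeline.
count_fastq_reads() {
    local lines status
    # pipefail is set only inside the command substitution's subshell, so the
    # caller's shell options are untouched; the pipeline now reports gzip's
    # failure rather than wc's success.
    lines=$(set -o pipefail; gzip -dc "$1" | wc -l)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "decompression failed for $1 (exit $status)" >&2
        return "$status"
    fi
    echo $(( lines / 4 ))   # assumes plain 4-line FASTQ records
}
```

In a pipeline this might be used as `check_gz "$R1" "$R2" && ./bbmap/reformat.sh in="$R1" in2="$R2"`, so a truncated download is rejected before any BBTools compute is spent.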