I'm using reformat.sh for a FastQ to SAM conversion. For proper results, I need to pre-process the FastQ files, and I would like to avoid writing them to disk. To do this, I need to provide two streams of data (one per read end) to reformat.sh, so I cannot use stdin.
The easy way would be to use bash's process substitution:
reformat.sh in1=<(preprocess "$fq1") in2=<(preprocess "$fq2") ...
But this produces no output at all, presumably because reformat.sh reads the first bytes to detect the format and quality encoding of the input, and then resets the stream expecting to start over. That works on regular files, but not on streams.
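To make the suspected failure mode concrete, here is a small bash sketch. The function sniff_then_count is hypothetical: it only mimics what reformat.sh is assumed to do (peek at the start of the input, then open the path again for the real pass), not how BBTools is actually implemented. With a regular file the second pass sees every line; with a process substitution the first pass has already drained the pipe, so nothing is left for the second one.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a tool that "sniffs" its input before reading it:
# it opens the path once to look at the first line, then opens it again
# for the real pass (here: counting lines).
sniff_then_count() {
  head -n 1 "$1" > /dev/null   # first pass: inspect the header
  wc -l < "$1"                 # second pass: reopen and read everything
}

fq=$(mktemp)
printf '%s\n' '@read1' 'ACGT' '+' 'IIII' > "$fq"

# A regular file can be reopened, so the second pass sees all 4 lines.
sniff_then_count "$fq"

# The same data through a process substitution is a pipe: head already
# consumed it, so the second pass reads nothing.
sniff_then_count <(cat "$fq")

rm -f "$fq"
```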
Adding the parameter qin=33 does not help: the format still has to be detected, and the result is the same.
Adding the format to the input file name (as in stdin.fq) also fails:
reformat.sh in1=<(preprocess "$fq1").fq in2=<(preprocess "$fq2").fq qin=33 ...
In this case everything is known in advance, so there is no real need to read the file before the actual processing. However, the file name now becomes /dev/fd/63.fq, and opening it fails because the file that actually exists is /dev/fd/63.
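This follows from how bash expands the word: <(...) is replaced by a path such as /dev/fd/63, and anything written right after it, like .fq, is simply appended to that path. The helper inspect_arg below is hypothetical; it prints the argument a program would receive and checks which path exists. The fd number (and, on systems without /dev/fd, the location of the pipe) may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the argument a program receives for a
# key=value option and check which file actually exists.
inspect_arg() {
  local path=${1#*=}   # drop the "in1=" part, keep the path
  printf 'argument: %s\n' "$1"
  if [ -e "$path" ]; then echo 'as given: exists'; else echo 'as given: missing'; fi
  if [ -e "${path%.fq}" ]; then echo 'without .fq: exists'; else echo 'without .fq: missing'; fi
}

# The argument ends in .fq and does not exist; the same path without
# the suffix is the pipe bash created.
inspect_arg in1=<(true).fq
```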
Finally, the workaround I ended up using is named pipes:
tmp1=$(mktemp --dry-run tmp.XXXXXX_1.fastq)
tmp2=$(mktemp --dry-run tmp.XXXXXX_2.fastq)
mkfifo "$tmp1" "$tmp2"
preprocess "$fq1" > "$tmp1" &
preprocess "$fq2" > "$tmp2" &
reformat.sh in1="$tmp1" in2="$tmp2" ...
wait
rm "$tmp1" "$tmp2"
But this is cumbersome: creating temporary names, creating the pipes, removing them afterwards (ideally with a trap on EXIT), sending jobs to the background and waiting for them. That is a lot of structural code, and it is more error-prone than the simple process-substitution approach.
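One way to keep the named-pipe workaround while hiding the boilerplate is to wrap it in a function. The sketch below, run_paired, is a hypothetical helper (not part of BBTools); it assumes bash, mkfifo and mktemp -d are available, and that the consumer accepts in1= and in2= arguments the way reformat.sh does. Writing the body as ( ... ) runs it in a subshell, so the EXIT trap removes the pipes however the function ends.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run_paired PRE FQ1 FQ2 CMD [ARGS...]
# Streams "PRE FQ1" and "PRE FQ2" through two named pipes and runs
#   CMD ARGS... in1=<pipe1> in2=<pipe2>
# The body is a subshell, so the EXIT trap fires when the function returns.
run_paired() (
  pre=$1 fq1=$2 fq2=$3
  shift 3
  dir=$(mktemp -d) || exit 1
  trap 'rm -rf "$dir"' EXIT
  # Keep a .fq extension so name-based format detection still sees FASTQ.
  r1=$dir/r1.fq r2=$dir/r2.fq
  mkfifo "$r1" "$r2" || exit 1
  "$pre" "$fq1" > "$r1" & p1=$!
  "$pre" "$fq2" > "$r2" & p2=$!
  "$@" in1="$r1" in2="$r2"
  status=$?
  # If the consumer failed, a writer may still be blocked opening its
  # pipe; kill the writers so that wait does not hang.
  if [ "$status" -ne 0 ]; then kill "$p1" "$p2" 2>/dev/null; fi
  wait "$p1" || status=1
  wait "$p2" || status=1
  exit "$status"
)
```

It would be called as run_paired preprocess "$fq1" "$fq2" reformat.sh followed by the remaining reformat.sh options; in1= and in2= are appended at the end, and the exit status is non-zero if the consumer or either pre-processing job fails.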
It would be good to have a way to specify all the properties of the input and output files through parameters, instead of relying on file naming or auto-detection. That would allow the simpler approaches to work.
I just discovered the extin=fq parameter, but adding it does not solve the issue either.
Last edit: Jordi Camps 2020-09-24