issue with in silico PCR on ONT reads
BBMap short read aligner, and other bioinformatic tools.
Brought to you by:
brian-jgi
Dear Brian,
I have genomic data from ONT from which I want to extract the 16S subset for training purpose
I tried applying your method from (https://www.biostars.org/p/216039/#216054) on these PROM reads (ERR3152365 from Loman . et al).
The first msa.sh command works after specifying phred33 with 'qin=33 'as reported by an error message
msa.sh -Xmx20g qin=33 in=ERR3152365.fastq.gz out=forward.sam literal="AGAGTTTGATCMTGGCTCAG" rcomp=t
java -ea -Xmx20g -cp /opt/biotools/miniconda3/envs/atwork3/opt/bbmap-38.67-0/current/ jgi.FindPrimers -Xmx20g qin=33 in=ERR3152365.fastq.gz out=forward.sam literal=AGAGTTTGATCMTGGCTCAG rcomp=t
Executing jgi.FindPrimers [-Xmx20g, qin=33, in=ERR3152365.fastq.gz, out=forward.sam, literal=AGAGTTTGATCMTGGCTCAG, rcomp=t]
and is still running with a growing sam output
but the reverse primer command fails at start even with 'qin=33'
msa.sh -Xmx20g qin=33 in=ERR3152365.fastq.gz out=reverse.sam literal="CGGTWACCTTGTTACGACTT" rcomp=t
java -ea -Xmx20g -cp /opt/biotools/BBMap/current/ jgi.FindPrimers -Xmx20g qin=33 in=ERR3152365.fastq.gz out=reverse.sam literal=CGGTWACCTTGTTACGACTT rcomp=t
Executing jgi.FindPrimers [-Xmx20g, qin=33, in=ERR3152365.fastq.gz, out=reverse.sam, literal=CGGTWACCTTGTTACGACTT, rcomp=t]
Warning! Changed from ASCII-33 to ASCII-64 on input ?: 63 -> 32
Up to 12 prior reads may have been generated with incorrect qualities.
If this is a problem you may wish to re-run with the flag 'qin=33' or 'qin=64'.
***WARNING***: The ASCII quality encoding offset (64) may not be set correctly.
Problematic read number 12: @ERR3152365.13 0194f8dc-5342-405e-9286-621d87a7b34e/1
Exception in thread "Thread-1" java.lang.ArrayIndexOutOfBoundsException: -22
at stream.Read.validateCommonCase_branchless(Read.java:351)
at stream.Read.validate(Read.java:115)
at stream.Read.<init>(Read.java:78)
at stream.Read.<init>(Read.java:52)
at stream.FASTQ.quadToRead_slow(FASTQ.java:896)
at stream.FASTQ.toReadList(FASTQ.java:741)
at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:107)
at stream.FastqReadInputStream.hasMore(FastqReadInputStream.java:73)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:664)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:653)
Do you have an idea what is wrong here?
Do the two threads conflict and I should wait for 1 to finish?
Thanks in advance
My apologies, I was running an old msa.sh version 38.32 because I was outside of my conda env for the second command (separate shell). It is now running right with version 38.63
Last edit: Stephane Plaisance 2019-09-11