filterbysequence.sh does not accept padding dashes
demuxbyname.sh Exception in thread "main" java.lang.AssertionError: 22, 23, '^@'=0
I still spend all of my time developing BBTools. However, due to internal politics, the JGI website has changed to not feature any JGI-developed bioinformatics software (BBTools, MetaHipmer, etc). Feel free to send an email to JGI's director (nmouncey@lbl.gov) if you think it is helpful for JGI's users to have bioinformatics software such as BBTools on its website, because helping users is supposed to be our goal.
Okay, the above URL linked from the project website still does not work. However, I have found what appears to be the original content at https://archive.jgi.doe.gov/data-and-tools/software-tools/bbtools/ . I'm not sure what it implies that this is now on the "archive" website - does that mean the project has been wound up? That would not be consistent with the number of recent updates to BBMap/BBTools since the above post. William
I'm asking because I see very different duplication rates when checking a specific file. FastQC reports over 90% duplication, fastp reports only 47.3% duplicated reads, and Clumpify reports a 0.213% duplication rate. I can't explain these differences, unless I'm using the wrong parameters for detecting duplicates. The commands for the two tools are as follows: fastp --in1 ${prefix}${file}_R1.fastq.gz...
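One possible reason for gaps like this is that the tools are not counting the same thing. As a rough illustration only (this is not the actual algorithm of FastQC, fastp, or Clumpify), the Python sketch below computes a duplication rate for the same paired reads under two hypothetical definitions: one that looks at read 1 alone, and one that requires both mates to be identical. The function names and the toy reads are assumptions made up for the example.

```python
from collections import Counter


def duplication_rate(keys):
    """Return extra copies divided by total entries (0.0 for empty input)."""
    keys = list(keys)
    if not keys:
        return 0.0
    counts = Counter(keys)
    # Every occurrence beyond the first of a key counts as a duplicate.
    extra = sum(n - 1 for n in counts.values())
    return extra / len(keys)


def single_end_rate(pairs):
    """Hypothetical definition 1: only read 1 has to match."""
    return duplication_rate(r1 for r1, _ in pairs)


def paired_rate(pairs):
    """Hypothetical definition 2: both mates have to match."""
    return duplication_rate((r1, r2) for r1, r2 in pairs)


# Toy data: three pairs share read 1, but only two of them share read 2.
pairs = [
    ("ACGTACGT", "TTGACCAA"),
    ("ACGTACGT", "GGCATTCA"),
    ("ACGTACGT", "TTGACCAA"),
    ("CCATGGTA", "AACCGGTT"),
]

print(single_end_rate(pairs))  # 0.5
print(paired_rate(pairs))      # 0.25
```

The same four pairs give 50% under one definition and 25% under the other, so comparing percentages across tools is only meaningful once the definition each tool uses is known.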
dupedist value for AVITI machine
Great. Thank you Brian!
Found the problem in the parser; it will be fixed in 39.20.
OK! I'll look into it and get back to you.
Hi Brian, thanks for the reply. It's the newest version, 39.19. I just installed it with conda yesterday.
That's odd; looks to me like it should be working. What version are you using?
Parameter 'stats2' in rqcfilter2.sh is not recognized
Does filterbyname remove all reads in the fastq files that match a read name in the txt file? Are duplicates in the input files an issue?
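To make the question concrete, here is a minimal Python sketch of name-based filtering. It is not filterbyname.sh's implementation; `filter_by_name` and its `include` parameter are hypothetical names chosen for this example. Because the names are loaded into a set, a name listed twice in the text file behaves like a name listed once, and every input read whose name is in the set is dropped, however many times it occurs.

```python
def filter_by_name(records, names, include=False):
    """Yield (name, seq, qual) records.

    With include=False, records whose name is listed are dropped.
    With include=True, only records whose name is listed are kept.
    """
    wanted = set(names)  # duplicates in the name list collapse here
    for name, seq, qual in records:
        if (name in wanted) == include:
            yield (name, seq, qual)


records = [
    ("read1", "ACGT", "IIII"),
    ("read2", "GGCC", "IIII"),
    ("read1", "ACGT", "IIII"),  # the same name appears twice in the input
    ("read3", "TTAA", "IIII"),
]
names = ["read1", "read1"]      # the same name appears twice in the list

kept = list(filter_by_name(records, names))
print([r[0] for r in kept])  # ['read2', 'read3']
```

Whether the real tool treats repeated names the same way is something only its documentation or its author can confirm.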
Project website for BBMap appears to have disappeared
repair.sh swaps sequences in the output fastqs
java.lang.AssertionError with forcemerge=t
39.15 is out now.
reformat.sh producing output with 4 reads having the same id
Hmmm... looks like the issue here is that the intermediate fastq file is not really interleaved. Sam/bam files are not interleaved in the first place and should never be treated as such; the order of records is arbitrary and reformatting them as fastq (with samtools) doesn't change that. If you force Reformat to process a noninterleaved file as interleaved, strange things will happen; in this case, the records for that pair are nonadjacent and that causes the header replication, as per the sam specification...
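A quick way to see whether a FASTQ file is really interleaved is to compare the names of adjacent records before forcing any tool to treat it as such. The Python sketch below is one way to do that, not how Reformat checks pairing; the helper names are hypothetical, and it assumes mates are marked either by a `/1` and `/2` suffix or by a comment after the first space.

```python
def base_name(header):
    """Strip the leading '@' and the mate marker from a FASTQ header."""
    name = header.lstrip("@").split()[0]  # drop any comment after a space
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def first_unpaired_record(headers):
    """Return the index of the first record that lacks a mate next to it.

    Returns None when every consecutive pair of headers shares a base name.
    """
    for i in range(0, len(headers) - 1, 2):
        if base_name(headers[i]) != base_name(headers[i + 1]):
            return i
    if len(headers) % 2 != 0:
        return len(headers) - 1  # trailing record with no mate
    return None


interleaved = ["@r1/1", "@r1/2", "@r2 1:N:0:ACGT", "@r2 2:N:0:ACGT"]
not_interleaved = ["@r1/1", "@r2/1", "@r1/2", "@r2/2"]

print(first_unpaired_record(interleaved))      # None
print(first_unpaired_record(not_interleaved))  # 0
```

In the second list both mates of each pair are present, but they are not adjacent, which is the situation described above for fastq files converted from unsorted sam/bam records.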
:-O Blazing fast! Thanks a lot! We will typically run this with paired-end data. Let's see how it goes. The flag to classify reads as duplicates only if UMIs match will be useful too. There are a lot of ways to decide if two different UMIs are the same, but this basic method (exact identity) will be more than enough for a quick classification of non-aligned data.
Issues when reading IDs with UMIs
All fixed; will be released in BBTools 39.15. Along with the new flags "umi" and "umisubs" so that you can require reads to only be classified as duplicates if their UMIs match.
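For readers who have not met UMIs in read headers, the Python sketch below shows the general idea of requiring UMIs to match before calling two reads duplicates. It is not the BBTools code. It assumes the UMI is the last colon-separated field of the first header token, as in common Illumina output, and it assumes a tolerance expressed as a maximum number of substitutions; the parameter `max_subs`, the function names, and the example headers are all hypothetical.

```python
def extract_umi(header):
    """Return the last colon-separated field of the first header token."""
    first_token = header.lstrip("@").split()[0]
    return first_token.rsplit(":", 1)[-1]


def hamming(a, b):
    """Count mismatching positions; None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def umis_match(header1, header2, max_subs=0):
    """True if the two UMIs differ by at most max_subs substitutions."""
    distance = hamming(extract_umi(header1), extract_umi(header2))
    return distance is not None and distance <= max_subs


h1 = "@A00123:45:HXXXXXXX:1:1101:1000:2000:ACGTACGT 1:N:0:GATTACA"
h2 = "@A00123:45:HXXXXXXX:1:1101:1500:2100:ACGTACGA 1:N:0:GATTACA"

print(extract_umi(h1))                  # ACGTACGT
print(umis_match(h1, h2))               # False
print(umis_match(h1, h2, max_subs=1))   # True
```

With exact identity (`max_subs=0`) the two reads are kept apart even if their sequences are identical; allowing one substitution treats them as the same molecule.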
Thanks for this report... JGI doesn't use UMIs, so I haven't seen them before in Illumina reads. I've reproduced the error and am modifying my header parsers to support UMIs, so that will work correctly in the next release.
I was looking for a Java-based alternative to either cutadapt or ngmerge, and bbmerge seems to be able to do much of what both other tools can do (primarily I was looking for fixed read trimming, quality trimming, and paired read dovetail trimming). A few suggested improvements: * Allow simultaneous fixed trimming and q-trimming. Right now it seems like q-trimming supersedes fixed trimming if both are specified, regardless of which end you want to trim in each case; but there are situations where...
I was looking for a java-based global short-read aligner and this software performs excellently, with lots of neat features that many other aligners lack. In descending order of priority, here are some ideas for improvement: * Update SAM spec to v1.6 (currently 1.3/1.4), primarily to produce an updated @HD line to indicate query grouped output (GO:query). Many downstream tools (e.g. some fgbio tools) rely on this and won't work unless this is explicitly specified, and resorting just to add this annotation...
When I wrote that, PacBio did not have paired reads. They have a new sequencing machine now for short reads that I think does produce pairs but I have not seen any data for it so I'm not sure of the header structure.
I will check in with the lab, but my understanding is that these came from a NextSeq or NovaSeq and didn't have any modifications. Thanks for the quick response. I took a peek at FASTQ.java and saw the following code block: // Here we try to weed out PacBio, which will differ after the last slash: for (int i = idxSlash1 + 2; i < len1; i++) { if (id1.charAt(i) != id2.charAt(i)) { return false; } } I am using reformat.sh to do the following: - make sure reads are paired - count the number of reads/bases...