filterbysequence.sh does not accept padding dashes
demuxbyname.sh Exception in thread "main" java.lang.AssertionError: 22, 23, '^@'=0
I still spend all of my time developing BBTools. However, due to internal politics, the JGI website has changed to not feature any JGI-developed bioinformatics software (BBTools, MetaHipmer, etc.). Feel free to send an email to JGI's director (nmouncey@lbl.gov) if you think it is helpful for JGI's users to have bioinformatics software such as BBTools on its website, because helping users is supposed to be our goal.
Okay, the above URL linked from the project website still does not work. However, I have found what appears to be the original content at https://archive.jgi.doe.gov/data-and-tools/software-tools/bbtools/ . I'm not sure of the implication of this now being on the "archive" website: does that mean the project has been wound up? That doesn't appear to be consistent with the number of recent updates to BBMap/BBTools since the above post, though. William
I'm asking because I see very different duplication rates when checking a specific file. FastQC reports over 90% duplication, fastp reports only 47.3% duplicated reads, and Clumpify reports a 0.213% duplication rate. I can't explain these differences, unless I'm using the wrong parameters for detecting duplicates. The commands for the two tools are as follows: fastp --in1 ${prefix}${file}_R1.fastq.gz...
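One reason duplication numbers can disagree this much is that "percent duplicated" has more than one reasonable definition, and tools also differ in whether they look at one read or the whole pair. The exact definitions FastQC, fastp and Clumpify use are not stated in this thread, so the following is only a hypothetical Python sketch (the function name `duplication_metrics` is made up) showing how two common definitions, and single-read versus pair-level keys, already give different percentages on the same toy data.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Tuple


def duplication_metrics(keys: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (removable_pct, in_duplicate_groups_pct) for a list of read keys.

    removable_pct: share of reads that would be discarded if exactly one copy
        of each distinct key were kept, i.e. (total - distinct) / total.
    in_duplicate_groups_pct: share of reads whose key occurs more than once,
        counting the copy that would be kept as well.
    """
    total = len(keys)
    if total == 0:
        return 0.0, 0.0
    counts = Counter(keys)
    removable = 100.0 * (total - len(counts)) / total
    in_groups = 100.0 * sum(c for c in counts.values() if c > 1) / total
    return removable, in_groups


if __name__ == "__main__":
    r1 = ["AAAA", "AAAA", "CCCC", "GGGG", "GGGG", "GGGG", "TTTT"]
    r2 = ["TTTT", "GGGG", "TTTT", "CCCC", "CCCC", "ACGT", "AAAA"]
    # Key on R1 alone: exact-sequence duplicates within a single file.
    print("R1 only:", duplication_metrics(r1))
    # Key on the pair: both mates must match to count as a duplicate.
    print("pairs:  ", duplication_metrics(list(zip(r1, r2))))
```

On this toy data, keying on R1 alone gives about 42.9% (removable) versus 71.4% (in duplicate groups), while keying on the pair drops that to about 14.3% versus 28.6%. Real tools add further differences (subsampling, truncating reads, allowing mismatches), so the sketch illustrates the general effect rather than reproducing any one tool.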
dupedist value for AVITI machine
Great. Thank you Brian!
Found the problem in the parser; it will be fixed in 39.20.
OK! I'll look into it and get back to you.
Hi Brian, thanks for the reply. It's the newest version, 39.19. I just installed it via conda yesterday.
That's odd; looks to me like it should be working. What version are you using?
Parameter 'stats2' in rqcfilter2.sh is not recognized
Does filterbyname remove all reads in the fastq files that match a read name in the txt file? Are duplicates in the input files an issue?
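The question is whether name-based filtering acts on every matching record or only on the first one. A filter that tests each record independently against a set of names removes all matches, and duplicate reads in the input are simply each matched on their own. The following is a hypothetical Python model of that behavior, not BBTools code; in particular it assumes names are compared on the first whitespace-delimited token of the header, which may not match filterbyname's actual rules or options.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group FASTQ text lines into 4-line records."""
    buf: List[str] = []
    for line in lines:
        buf.append(line.rstrip("\n"))
        if len(buf) == 4:
            yield (buf[0], buf[1], buf[2], buf[3])
            buf = []
    if buf:
        raise ValueError("truncated FASTQ: trailing partial record")


def read_name(header: str) -> str:
    # Assumption: compare on the first whitespace-delimited token of the
    # header, without the leading '@'.
    return header.lstrip("@").split()[0]


def filter_by_name(records: Iterable[Record], names: Iterable[str],
                   include: bool = False) -> Iterator[Record]:
    """Drop (include=False) or keep (include=True) every record whose name is listed.

    Each record is tested on its own, so if several input records share a
    listed name, all of them are dropped (or kept). Repeated entries in the
    name list collapse into one set member and have no extra effect.
    """
    wanted = set(names)
    for rec in records:
        if (read_name(rec[0]) in wanted) == include:
            yield rec
```

With this design, duplicated names in the input are not a problem for correctness; they only mean that more records match.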
Project website for BBMap appears to have disappeared
repair.sh swaps sequences in the output fastqs
java.lang.AssertionError with forcemerge=t
39.15 is out now.
reformat.sh producing output with 4 reads having the same id
Hmmm... looks like the issue here is that the intermediate fastq file is not really interleaved. SAM/BAM files are not interleaved in the first place and should never be treated as such; the order of records is arbitrary, and reformatting them as fastq (with samtools) doesn't change that. If you force Reformat to process a noninterleaved file as interleaved, strange things will happen; in this case, the records for that pair are nonadjacent, and that causes the header replication, as per the sam specification...
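The key point above is that "interleaved" means mates sit in adjacent records, which SAM-derived fastq does not guarantee. As a rough illustration, the hypothetical Python helpers below check adjacency by read name and, separately, pair records by name regardless of order (conceptually similar to what repair.sh is for, though this is not its implementation). They assume mates share a name apart from an optional trailing /1 or /2 and anything after the first whitespace.

```python
from typing import Dict, List, Sequence, Tuple


def base_name(header: str) -> str:
    """Strip the leading '@', anything after whitespace, and a trailing /1 or /2."""
    name = header.lstrip("@").split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def is_interleaved(headers: Sequence[str]) -> bool:
    """True only if records come in adjacent pairs that share a base name."""
    if len(headers) % 2:
        return False
    return all(base_name(headers[i]) == base_name(headers[i + 1])
               for i in range(0, len(headers), 2))


def pair_by_name(headers: Sequence[str]) -> Tuple[List[Tuple[str, ...]], List[str]]:
    """Group records by base name, ignoring file order.

    Returns (pairs, singletons); names seen more than twice are left out.
    """
    groups: Dict[str, List[str]] = {}
    for h in headers:
        groups.setdefault(base_name(h), []).append(h)
    pairs = [tuple(v) for v in groups.values() if len(v) == 2]
    singletons = [v[0] for v in groups.values() if len(v) == 1]
    return pairs, singletons
```

For example, records in SAM-like order `@a/1, @b/1, @a/2, @b/2` fail `is_interleaved`, whereas `pair_by_name` still recovers both pairs.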
:-O Blazing fast! Thanks a lot! We will typically run this with paired-end data. Let's see how it goes. The flag to classify reads as duplicates only if their UMIs match will be useful too. There are many ways to decide whether two UMIs should be considered the same, but this basic method (exact identity) will be more than enough for a quick classification of non-aligned data.
Issues when reading IDs with UMIs
All fixed; this will be released in BBTools 39.15, along with the new flags "umi" and "umisubs", so that you can require reads to be classified as duplicates only if their UMIs match.
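To make the idea of UMI-aware duplicate classification concrete, here is a hypothetical Python sketch. It treats a read as a duplicate only when an earlier kept read has the same sequence and a UMI within a given number of substitutions. The names `find_duplicates` and `max_subs` are made up for illustration, and the assumption that "umisubs" means something like an allowed substitution count is a guess from its name, not documented behavior.

```python
from typing import Dict, List, Sequence, Tuple


def umis_match(u1: str, u2: str, max_subs: int = 0) -> bool:
    """Equal-length UMIs match if they differ at no more than max_subs positions."""
    if len(u1) != len(u2):
        return False
    return sum(a != b for a, b in zip(u1, u2)) <= max_subs


def find_duplicates(reads: Sequence[Tuple[str, str]], max_subs: int = 0) -> List[int]:
    """Return indices of reads flagged as duplicates.

    reads: (sequence, umi) tuples. A read is a duplicate if an earlier kept
    read has the same sequence and a matching UMI; otherwise it is kept.
    With max_subs > 0 the result depends on input order (first seen wins).
    """
    kept_umis: Dict[str, List[str]] = {}
    dups: List[int] = []
    for i, (seq, umi) in enumerate(reads):
        kept = kept_umis.setdefault(seq, [])
        if any(umis_match(umi, k, max_subs) for k in kept):
            dups.append(i)
        else:
            kept.append(umi)
    return dups
```

With `max_subs=0` this is the exact-identity rule mentioned above; allowing one substitution would also merge UMIs that differ by a single sequencing error, at the cost of occasionally merging truly distinct molecules.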
Thanks for this report... JGI doesn't use UMIs, so I haven't seen them before in Illumina reads. I've reproduced the error and am modifying my header parsers to support UMIs, so that will work correctly in the next release.
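For context, Illumina read names normally carry seven colon-separated fields (instrument, run, flowcell, lane, tile, x, y), and when UMIs are written into the read name they appear as an eighth field before the space. A parser that assumes exactly seven fields, or that treats the last field as the y coordinate, is one plausible way such headers break. The sketch below is a hypothetical Python parser (`parse_illumina_header` is an illustrative name, not a BBTools function) that accepts both forms.

```python
from typing import NamedTuple, Optional


class IlluminaHeader(NamedTuple):
    instrument: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x: int
    y: int
    umi: Optional[str]  # None when the read name carries no UMI


def parse_illumina_header(header: str) -> IlluminaHeader:
    """Parse '@inst:run:flowcell:lane:tile:x:y[:UMI] <comment>'."""
    name = header.lstrip("@").split()[0]  # drop the comment after the space
    fields = name.split(":")
    if len(fields) not in (7, 8):
        raise ValueError(f"unexpected field count {len(fields)} in {name!r}")
    umi = fields[7] if len(fields) == 8 else None
    return IlluminaHeader(fields[0], int(fields[1]), fields[2], int(fields[3]),
                          int(fields[4]), int(fields[5]), int(fields[6]), umi)
```

Keeping the UMI as an optional trailing field means headers without UMIs still parse exactly as before.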