I'm trying to create a reference index with the following command (openjdk version "1.8.0_232"):
bbmap.sh ref=Lancer.fixed.fasta
And I receive the following error:
java -ea -Xmx414496m -Xms414496m -cp /home/brook/src/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 ref=Lancer.fixed.fasta
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, ref=Lancer.fixed.fasta]
Version 38.70
No output file.
Writing reference.
Executing dna.FastaToChromArrays2 [Lancer.fixed.fasta, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=true, minscaf=1, midpad=300, startpad=8000, stoppad=8000, nodisk=false]
Set genScaffoldInfo=true
Set genome to 1
Loaded Reference: 0.009 seconds.
Exception in thread "main" java.lang.AssertionError: 1, 0
The reference file appears to be empty.
at align2.BBIndex.loadIndex(BBIndex.java:96)
at align2.BBMap.loadIndex(BBMap.java:372)
at align2.BBMap.main(BBMap.java:33)
I am able to run the same command with the PhiX reference fasta included with BBMap, but I cannot spot any relevant differences between the example fasta and my fasta. The fasta I am using contains 22 (wheat) chromosomes named chr1A, chr1B, chr1D, etc. Any help would be appreciated.
Hi - I'm really sorry about that, but BBMap does not support wheat, as it has a chromosome longer than 500 Mbp, the current limit. It's the only organism I'm aware of that has this issue. I'll try to clarify the error message. You could break the chromosome at the centromere, but you're probably better off using a different aligner.
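To make the two suggestions above concrete, here is a minimal Python sketch that lists which sequences in a FASTA exceed the limit and splits one sequence at a chosen position. It rests on assumptions: the limit is taken from the `maxlen=536670912` value printed in the log (the reply only says roughly 500 Mbp), and the function names, the `_part1`/`_part2` suffixes, and the break position are hypothetical, not part of BBMap.

```python
import io

# Assumption: taken from "maxlen=536670912" in the log above;
# the reply itself only states a limit of about 500 Mbp.
MAX_LEN = 536_670_912


def fasta_lengths(handle):
    """Return {sequence_name: length} for an open FASTA file handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def too_long(lengths, max_len=MAX_LEN):
    """Names of sequences longer than max_len, in file order."""
    return [n for n, length in lengths.items() if length > max_len]


def split_sequence(name, seq, break_at):
    """Split one sequence into two named parts at a 0-based position.

    The part names are hypothetical; pick any scheme that stays unique.
    """
    if not 0 < break_at < len(seq):
        raise ValueError("break_at must fall inside the sequence")
    return [(f"{name}_part1", seq[:break_at]), (f"{name}_part2", seq[break_at:])]


# Tiny in-memory demo; for a real file use open("Lancer.fixed.fasta").
demo = io.StringIO(">chr1A test\nACGT\nAC\n>chr1B\nACG\n")
lengths = fasta_lengths(demo)
print(too_long(lengths, max_len=5))
print(split_sequence("chr1A", "ACGTAC", 4))
```

If you do split a chromosome, keep in mind that alignment coordinates reported on the second part are offset by the break position relative to the original chromosome, so they need to be shifted back in any downstream analysis. For chromosomes hundreds of Mbp long, streaming the pieces to disk would be kinder to memory than holding a whole sequence in one string as this sketch does.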
I am running into the same issue. The chromosomes of wheat go up to 830 Mbp. Would it be possible to increase the limit with a command?
A lot of plant genomes have chromosomes of that size, see https://en.wikipedia.org/wiki/List_of_sequenced_plant_genomes and https://www.researchgate.net/publication/321833590_List_of_plant_genome_sequenced_with_genome_size_and_chromosome_numbers
Hi!
Is this still the current chromosome size limit? I'm trying to use BBSplit for a dual RNA-seq analysis (human contamination), and the indexing step also fails with a “The reference file appears to be empty” error.
Some chromosomes of the other genome I'm using exceed the 1,500 Mbp mark. Should I use a different aligner then?
Thanks.
Last edit: Samuel Ruiz-Pérez 2022-05-17