ergatis / Bugs / #81 bsml2fasta fails to match FASTA header

#81 bsml2fasta fails to match FASTA header

Status: open

Owner: Joshua Orvis

Labels: Component problems

Priority: 5

Updated: 2009-04-07

Created: 2009-04-07

Creator: Chris Hemmerich

Private: No

When a FASTA header contains spaces, the bsml2fasta component tries to match either the id attribute of the Sequence tag in the BSML (e.g. "g1_a") or the identifier attribute of the Seq-data-import tag (e.g. "g1|a") against the description line of the FASTA file. Neither matches, because both attributes have everything after the first space truncated, while the FASTA file retains the full line (e.g. "g1|a stuff"). The entire definition line is available in an Attribute element under the Sequence element.
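To make the reported mismatch concrete, here is a minimal Python sketch (not the Perl from bsml2fasta.pl). It uses only the example values quoted above; the helper name first_token is hypothetical and does not exist in Ergatis. It shows that neither BSML value equals the full definition line, while the Seq-data-import identifier does equal the part before the first space.

```python
# Illustrative values copied from the examples in this report.
fasta_defline = "g1|a stuff"   # full description line in the FASTA file
bsml_sequence_id = "g1_a"      # id attribute of the Sequence tag
bsml_identifier = "g1|a"       # identifier attribute of the Seq-data-import tag


def first_token(defline: str) -> str:
    """Return the part of a definition line before the first whitespace.

    Hypothetical helper, used only for this illustration.
    """
    parts = defline.split(None, 1)
    return parts[0] if parts else ""


candidates = (bsml_sequence_id, bsml_identifier)

# Comparing against the whole definition line: nothing matches.
exact_matches = [c for c in candidates if c == fasta_defline]

# Comparing against the truncated definition line: the identifier matches.
token_matches = [c for c in candidates if c == first_token(fasta_defline)]

print(exact_matches)
print(token_matches)
```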

This was found running bsml2fasta.prediction_CDS in the prokaryotic annotation pipeline.

Discussion

Nobody/Anonymous - 2009-04-07

Oops, I jumped the gun on this. I got the same error with a FASTA file that has no spaces in the description (BSML::Indexer::Fasta truncates the description at the first space). I tracked it down to a bug in the parse_multi_fasta subroutine in bsml2fasta.pl. The script builds a $sequencelookup hash keyed on the scrubbed fasta_id, and parse_multi_fasta then tries to access it with the unscrubbed version. The fix was relatively simple:

- if(exists $e{$h{$sequencelookup->{$specified_header}->{'fasta_id'}}}){
+ if(exists $e{$h{$specified_header}}){
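The key mismatch described in this comment can be sketched in a few lines of Python. This is an illustration only, not the code in bsml2fasta.pl: the scrub function and its rule (replace non-word characters with an underscore) are assumptions, chosen because they turn "g1|a" into "g1_a" as in the examples above, and the name sequence_lookup merely mirrors $sequencelookup.

```python
import re


def scrub(header: str) -> str:
    """Assumed scrubbing rule: replace each non-word character with '_'."""
    return re.sub(r"\W", "_", header)


headers = ["g1|a"]

# Lookup table keyed on the scrubbed ID, as the comment describes.
sequence_lookup = {scrub(h): {"fasta_id": h} for h in headers}

specified_header = "g1|a"  # unscrubbed header, as read from the FASTA file

# Accessing the table with the unscrubbed header misses the entry...
found_unscrubbed = specified_header in sequence_lookup

# ...while the scrubbed form of the same header finds it.
found_scrubbed = scrub(specified_header) in sequence_lookup

print(found_unscrubbed, found_scrubbed)
```

Note that the patch above takes a different route from this sketch: it drops the $sequencelookup access entirely and uses $specified_header directly, so the sketch shows only why the original lookup came back empty.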
