regex6.py (sourceforge.net)
BLAST results parsing:
This script uses regular expressions to parse BLAST results in text format. It extracts the sequence identifier, the BLAST accession number and the species name, and writes them to a comma-separated file that can be used to load the data into a BioSQL database.
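A minimal sketch of the extraction step follows. It assumes each hit occupies one line of the summary table, as in the example below: a database tag, the accession between pipes, a description, then the organism in square brackets. The pattern HIT_RE and the functions parse_hit and parse_blast_hits are hypothetical and not taken from regex6.py. The query identifier is passed in rather than read from the report, and reducing the organism to its first two words is an assumed heuristic.

```python
import re

# Assumed layout of one hit line in the BLAST summary table:
#   gb|AEC74743.1| cytochrome oxidase subunit 1 [Hymenoptera sp. BOL... 226 2e-57
# BLAST truncates long descriptions with "...", so the closing "]" may be missing.
HIT_RE = re.compile(
    r"^\w+\|(?P<accession>[^|]+)\|"                    # "gb|AEC74743.1|"
    r"[^\[]*\[(?P<organism>[^\]]*?)(?:\]|\.\.\.|$)"    # "[Hymenoptera sp. BOL..."
)


def parse_hit(line):
    """Return (accession, species) for one hit line, or None if it does not match."""
    match = HIT_RE.match(line.strip())
    if match is None:
        return None
    # Assumed heuristic (not from regex6.py): keep the first two words of the
    # organism text, so "Hymenoptera sp. BOL" becomes "Hymenoptera sp."
    species = " ".join(match.group("organism").split()[:2])
    return match.group("accession"), species


def parse_blast_hits(lines, query_id):
    """Return (query_id, accession, species) tuples for every matching line."""
    rows = []
    for line in lines:
        hit = parse_hit(line)
        if hit is not None:  # header and score lines do not match and are skipped
            rows.append((query_id, *hit))
    return rows
```

Run on the four example lines with query_id="1", this yields the accessions and the "Hymenoptera sp." and "Araneae sp." names shown in the output below. A name that BLAST has cut off stays cut off, though: the second line gives "Virilastacus rucapi", because the full name is not present on that summary line.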
Here is an example of a file before parsing:
gb|AEC74743.1| cytochrome oxidase subunit 1 [Hymenoptera sp. BOL... 226 2e-57
gb|ABU51883.1| cytochrome oxidase subunit I [Virilastacus rucapi... 226 2e-57
gb|AEC59043.1| cytochrome oxidase subunit 1 [Araneae sp. BOLD:AA... 226 3e-57
gb|AEC74632.1| cytochrome oxidase subunit 1 [Hymenoptera sp. BOL... 226 3e-57
And after parsing:
1 , AEC74743.1 , Hymenoptera sp.
1 , ABU51883.1 , Virilastacus rucapihuelensis
1 , AEC59043.1 , Araneae sp.
1 , AEC74632.1 , Hymenoptera sp.
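The parsed output separates fields with " , " (a comma padded by spaces) rather than a bare comma, so anything that loads it has to strip the padding. The sketch below writes and reads that layout. The function names format_rows and read_rows are hypothetical, and it assumes species names never contain a comma.

```python
def format_rows(rows):
    """Render (query_id, accession, species) records in the layout shown above."""
    return "".join(" , ".join(row) + "\n" for row in rows)


def read_rows(lines):
    """Parse such lines back into tuples, stripping the spaces around each comma."""
    # Assumes no field contains a comma; blank lines are skipped.
    return [
        tuple(field.strip() for field in line.split(","))
        for line in lines
        if line.strip()
    ]
```

To produce a file, the string from format_rows can be written with an ordinary open(path, "w"). read_rows accepts any iterable of lines, including an open file, and returns clean tuples ready for loading.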