Aaron Gussman - 2009-06-22

As of revision 6012, the script removes strain information from the organism name, unless doing so would remove the entire organism name or the entire species name.
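The rule above could be sketched roughly as follows. This is a minimal illustration in Python, not the script's actual code: the function name strip_strain is hypothetical, and "would remove the entire species" is interpreted here, as an assumption, to mean that fewer than two words (genus plus species) would remain.

```python
def strip_strain(organism: str, strain: str) -> str:
    """Return organism with the strain text removed.

    The organism name is returned unchanged when removal would leave
    too little of it behind.
    """
    if not strain or strain not in organism:
        return organism

    # Remove the strain text and collapse any leftover whitespace.
    stripped = " ".join(organism.replace(strain, " ").split())

    # Guard (assumed interpretation): keep the original name if nothing
    # is left, or if less than "Genus species" would remain.
    if len(stripped.split()) < 2:
        return organism
    return stripped
```

For example, strip_strain("Escherichia coli K-12", "K-12") gives "Escherichia coli", while strip_strain("K-12", "K-12") leaves the name untouched.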

A special regex handles the peculiar way Influenza H#N# strain information is stored. From GenBank record CY003441:

SOURCE Influenza A virus (A/New York/425/1999(H3N2))
ORGANISM Influenza A virus (A/New York/425/1999(H3N2))

/organism="Influenza A virus (A/New York/425/1999(H3N2))"
/mol_type="genomic RNA"
/strain="A/New York/425/1999"
/serotype="H3N2"

The script explicitly looks for the pattern H\dN\d (for example, H3N2) and handles it accordingly.
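One way such a check might look, as a hedged sketch in Python. The pattern and the name split_influenza_name are assumptions for illustration, not the script's actual regex. Note that this sketch uses \d+ so that multi-digit subtypes such as H10N7 also match, whereas H\dN\d as written above matches single digits only.

```python
import re

# Assumed layout: "<name> (<strain>(<serotype>))", as in the CY003441 example.
FLU_RE = re.compile(
    r"^(?P<name>.+?)\s*"            # e.g. "Influenza A virus"
    r"\((?P<strain>[^()]+)"         # e.g. "A/New York/425/1999"
    r"\((?P<serotype>H\d+N\d+)\)"   # e.g. "H3N2"
    r"\)\s*$"
)


def split_influenza_name(organism):
    """Split an influenza organism string into (name, strain, serotype).

    Returns None when the string does not follow the assumed layout.
    """
    match = FLU_RE.match(organism)
    if match is None:
        return None
    return match.group("name"), match.group("strain"), match.group("serotype")
```

Applied to "Influenza A virus (A/New York/425/1999(H3N2))", this yields the name "Influenza A virus", the strain "A/New York/425/1999", and the serotype "H3N2".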

On further consideration, it might be better to parse the /serotype qualifier and check for its presence instead.
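A rough sketch of that alternative, in Python. The helper names parse_qualifiers and has_serotype are hypothetical, and the sketch assumes each qualifier sits on a single line in the form /key="value", as in the excerpt above; qualifier values that wrap across lines are not handled.

```python
import re

# Matches one single-line qualifier such as /serotype="H3N2".
QUALIFIER_RE = re.compile(r'^\s*/(?P<key>\w+)="(?P<value>[^"]*)"\s*$')


def parse_qualifiers(lines):
    """Collect /key="value" qualifiers from feature lines into a dict."""
    qualifiers = {}
    for line in lines:
        match = QUALIFIER_RE.match(line)
        if match:
            qualifiers[match.group("key")] = match.group("value")
    return qualifiers


def has_serotype(lines):
    """Return True if a /serotype qualifier is present."""
    return "serotype" in parse_qualifiers(lines)
```

With the four qualifier lines from CY003441, parse_qualifiers would return both the strain and the serotype as separate values, so the organism name would not need to be pattern-matched at all.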