Victor Norman - 2010-11-11

All,

Thanks for your library.  A very nice piece of coding.

I am a professor of computer science, and NOT a chemist, so a lot of this stuff is mysterious to me.  But I'm trying to use your library to parse the PDB file for hemolysin 7AHL (http://www.rcsb.org/pdb/explore/explore.do?pdbId=7AHL).

When I use FileIO.py's test_module() to parse the file, I get a bunch of lines like this:

invalid amino acid atom name H1
invalid amino acid atom name H2
invalid amino acid atom name H3
invalid amino acid atom name HD21
invalid amino acid atom name HD22
invalid amino acid atom name HE21
invalid amino acid atom name HE22
invalid amino acid atom name HH11
invalid amino acid atom name HH12
invalid amino acid atom name HH21
invalid amino acid atom name HH22
invalid amino acid atom name HZ1
invalid amino acid atom name HZ2
invalid amino acid atom name HZ3

This is just a summary of the output.  I actually get 1345 lines, all of them duplicates of the lines above.
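
A deduplicated list like the one above can be produced with something along these lines.  This is only a sketch: it assumes the warnings have already been captured as a list of strings, and summarize_warnings is a made-up helper name, not part of the library.

```python
from collections import Counter


def summarize_warnings(lines):
    """Return (message, count) pairs for each distinct non-empty line.

    Hypothetical helper, not part of the library: it only collapses
    repeated warning lines so the distinct messages are easy to read.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return sorted(counts.items())


# Assumed sample input: a few captured warning lines, with one repeat.
captured = [
    "invalid amino acid atom name H1",
    "invalid amino acid atom name H2",
    "invalid amino acid atom name H1",
]

for message, count in summarize_warnings(captured):
    print(f"{count:5d}  {message}")
```

The per-message counts make it easy to see which atom names trigger the warning most often.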

Can you tell me if these output lines are "important"?  I.e., do these lines indicate that I am not getting a full representation of the protein in my data structure?  If so, how can this be fixed?

Thanks.

Vic Norman
Calvin College
Assistant Professor of Computer Science