[Psisoft-devl] Interface Database?


Hi,

We have previously sent round emails about PDB files that are, for
various reasons, unsuitable for use in PSIMAP.

I have just run into (again) the old problem of amino acids in the
interface that have multiple 'alternate location' forms (residues or
short stretches with some conformational shift in the crystal structure).
If you are not careful you can end up counting such amino acids multiple
times, and if you *are* careful, which location should you actually use?
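
One possible convention for the alternate location problem, sketched in
Python: for each atom keep only the form with the highest occupancy (the
first one seen wins a tie). This is an assumption for illustration only,
not what PSIMAP currently does. The names pick_altloc, count_residues and
atom_line are made up here, and the column slices follow the fixed-width
ATOM/HETATM record layout of the PDB format.

```python
def atom_line(serial, name, altloc, resname, chain, resseq, occ,
              icode=" ", x=0.0, y=0.0, z=0.0, b=0.0):
    """Hypothetical helper: build one fixed-width ATOM record for testing."""
    return (f"ATOM  {serial:>5}  {name:<3}{altloc:1}{resname:>3} {chain:1}"
            f"{resseq:>4}{icode:1}   {x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")


def pick_altloc(pdb_lines):
    """Keep one record per atom: highest occupancy, first seen on ties."""
    best = {}  # atom key -> (occupancy, line); dicts keep insertion order
    for line in pdb_lines:
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        # chain, residue number, insertion code, atom name.
        # The residue name is left out on purpose, so alternate forms
        # that differ in residue type still count as one position.
        key = (line[21], line[22:26], line[26], line[12:16])
        try:
            occ = float(line[54:60])
        except ValueError:
            occ = 0.0  # missing or unreadable occupancy field
        if key not in best or occ > best[key][0]:
            best[key] = (occ, line)
    return [line for _, line in best.values()]


def count_residues(pdb_lines):
    """Count distinct residues, so alternate forms are never counted twice."""
    return len({(l[21], l[22:26], l[26]) for l in pdb_lines
                if l.startswith(("ATOM  ", "HETATM"))})


if __name__ == "__main__":
    demo = [
        atom_line(1, "N", " ", "SER", "A", 10, 1.00),
        atom_line(2, "CA", "A", "SER", "A", 10, 0.40),
        atom_line(3, "CA", "B", "SER", "A", 10, 0.60),
    ]
    for line in pick_altloc(demo):
        print(line)
    print("residues:", count_residues(demo))
```

Counting residues by (chain, residue number, insertion code) avoids the
multiple counting whichever form is kept. Choosing by occupancy is only
one option; always taking form 'A' would be another.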

Another potential problem is that some SCOP Folds are garbage folds, and
should not be used in the way intended (evolutionary/structural units).
For example some split chain domains have not been properly annotated by
Alexy, and he is totally happy because the problem lies in the PDB file.

Also, some domains are only fragments of a particular fold, and actually
assume a different conformation in the 'full length' version of the
protein. However, this is not clear from SCOP.

Finally, we don't have a proper assessment of PSIMAP against known
(expertly assessed and experimentally verified) oligomerization states and
(therefore) crystal contacts.

I would like to make a database of interfaces which records all of this
kind of information.

We can start by looking at all the papers relating to the determination of
crystal contacts, and store all their hard work in determining (from the
literature) the known oligomerization state for the proteins in our
database (along with the citation for the state assigned).

We also need a strategy to cluster and filter the interfaces so we can
remove specific problem instances from the dataset (when a simpler
version of the same interface can be used instead).

We should remove all potential garbage from problematic SCOP entries.

This is quite a big job, and not too exciting, but the resulting dataset
(if we can organize it so that we can collectively maintain it) will be
very useful to have.

For example, we should each pick a different paper on the study of protein
interfaces and copy out the list of PDB files used and the category
assigned to each interface in the paper. Such a simple library (which is
essential for detailed analysis of protein-protein interactions) could be
very useful for the whole community, as currently no such manually
created, expertly defined, machine-readable database exists.

It would be very simple to put a web interface on a simple underlying
MySQL database.
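
To make the idea concrete, here is a rough sketch of what the underlying
tables might look like. It uses Python's built-in sqlite3 module purely as
a stand-in so that it runs without a server; the real thing would be
MySQL. All table, column and function names are assumptions, and the PDB
ID and reference in the demo are placeholders, not real data.

```python
import sqlite3

# Hypothetical layout: one row per citation, one row per
# (PDB entry, interface category) taken from that citation.
SCHEMA = """
CREATE TABLE citation (
    citation_id INTEGER PRIMARY KEY,
    reference   TEXT NOT NULL UNIQUE
);
CREATE TABLE interface_annotation (
    annotation_id    INTEGER PRIMARY KEY,
    pdb_id           TEXT NOT NULL,
    category         TEXT NOT NULL,   -- category assigned in the paper
    oligomeric_state TEXT,            -- known state, if the paper gives one
    citation_id      INTEGER NOT NULL REFERENCES citation(citation_id),
    UNIQUE (pdb_id, category, citation_id)
);
"""


def open_db(path=":memory:"):
    """Open a database and create the (assumed) tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_annotation(conn, pdb_id, category, reference, oligomeric_state=None):
    """Record one curated annotation together with its citation."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute("INSERT OR IGNORE INTO citation (reference) VALUES (?)",
                 (reference,))
    (cid,) = conn.execute(
        "SELECT citation_id FROM citation WHERE reference = ?",
        (reference,)).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO interface_annotation "
        "(pdb_id, category, oligomeric_state, citation_id) "
        "VALUES (?, ?, ?, ?)",
        (pdb_id.lower(), category, oligomeric_state, cid))
    conn.commit()


def annotations_for(conn, pdb_id):
    """Return (category, oligomeric_state, reference) rows for one entry."""
    return conn.execute(
        "SELECT a.category, a.oligomeric_state, c.reference "
        "FROM interface_annotation a JOIN citation c USING (citation_id) "
        "WHERE a.pdb_id = ? ORDER BY a.annotation_id",
        (pdb_id.lower(),)).fetchall()


if __name__ == "__main__":
    db = open_db()
    # Placeholder values only, not taken from any paper.
    add_annotation(db, "0XYZ", "crystal contact",
                   "placeholder reference", "monomer")
    print(annotations_for(db, "0xyz"))
```

Keeping the citation in its own table means every assigned state or
category can be traced back to the paper it came from, and the same entry
can carry conflicting annotations from different papers.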

Please let me know what you think,

All the best,
Dan.