Re: [Psisoft-devl] Interface Database?


On Sat, 27 Nov 2004, Dan Bolser wrote:

>
>Hi,
>
>Previously we have sent round emails about bad PDB files for use in
>PSIMAP (bad for various reasons). 
>
>At the moment I have come across (again) the old problem of amino acids in
>the interface with multiple 'alternate location' forms (those residues or
>short stretches with some conformational shift in the crystal structure).
>If you are not careful you can end up counting such amino acids multiple
>times, and if you *are* careful which actual location should you use?
>
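
One possible answer to the "which location" question, as a minimal
sketch only: keep, for each atom, the alternate location with the
highest occupancy, and the first one seen on a tie. This is a common
convention, not something PSIMAP is known to do. The function name
select_altloc is hypothetical; the column positions are the standard
fixed-width PDB ATOM/HETATM fields.

```python
def select_altloc(pdb_lines):
    """Return ATOM/HETATM lines with at most one alternate location per atom.

    Assumption: the highest-occupancy conformer is kept; on a tie the
    first conformer encountered wins. Other record types are dropped.
    """
    best = {}   # atom key -> (occupancy, line)
    order = []  # atom keys in order of first appearance
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Key ignores the altLoc column (index 16) on purpose:
        # chain ID, residue number, insertion code, atom name.
        key = (line[21], line[22:26], line[26], line[12:16])
        try:
            occupancy = float(line[54:60])
        except ValueError:
            occupancy = 0.0  # missing or malformed occupancy field
        if key not in best:
            order.append(key)
            best[key] = (occupancy, line)
        elif occupancy > best[key][0]:
            best[key] = (occupancy, line)
    return [best[key][1] for key in order]
```

Each atom then contributes to the interface count exactly once. Note
that picking per atom can mix conformers within one residue if the
occupancies differ between atoms; choosing per residue would be the
stricter alternative.
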
>Another potential problem is that some SCOP Folds are garbage folds, and
>should not be used in the way intended (evolutionary/structural units).
>For example some split chain domains have not been properly annotated by
>Alexy, and he is totally happy because the problem lies in the PDB file.
>
>Also some domains are only fragments of a particular fold, and actually
>assume a different conformation in the 'full length' version of the
>protein. However, this is not clear from SCOP.
>
>Finally, we don't have a proper assessment of PSIMAP in terms of known
>(expertly assessed and experimentally verified) oligomerization state and
>(therefore) crystal contacts.
>
>I would like to make a database of interfaces which adds all this kind of
>information. 
>
>We can start by looking at all the papers relating to the determination of
>crystal contacts, and store all their hard work in determining (from the
>literature) the known oligomerization state for the proteins in our
>database (along with the citation for the state assigned).
>
>We also need a strategy to cluster and filter the interfaces so we can
>remove specific problem instances from the dataset (when a simpler
>version of the same interface can be used instead).
>
>We should remove all potential garbage from problematic SCOP entries.
>
>This is quite a big job, and not too exciting, but the resulting dataset
>(if we can organize it so that we can maintain it collectively) will be
>very useful to have.

I just checked, and I found that there are only around 3000 distinct
domain-domain pairs at the 40% sequence identity threshold. That means
that between us we could quite quickly check each one and manually
classify the interfaces.

Such a dataset would be totally unique (as far as I know) and invaluable
for future work.

Now we just have to decide what to call the database ;)

>
>For example, we should each pick a different paper on the study of protein
>interfaces and copy out the list of PDB files used and the category
>assigned to the interface in the paper. Such a simple library (which is
>essential for detailed analysis of protein-protein interaction) could be
>very useful for the whole community, as currently no such manually
>created, expertly defined, machine-readable database exists.
>
>It would be very simple to put a web interface on a simple underlying
>MySQL database.
>
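
To make the "simple underlying database" concrete, here is a rough
sketch of a single annotation table: one row per interface per paper,
with the category assigned and the citation it came from. It uses
Python's built-in sqlite3 module as a stand-in for MySQL so that it
runs without a server. The table name, column names and helper
functions are all hypothetical, not an agreed schema.

```python
import sqlite3

# Hypothetical schema: one curated annotation per (interface, citation).
SCHEMA = """
CREATE TABLE interface (
    id        INTEGER PRIMARY KEY,
    pdb_id    TEXT NOT NULL,   -- PDB entry the interface was taken from
    domain_a  TEXT NOT NULL,   -- domain identifiers, stored in sorted order
    domain_b  TEXT NOT NULL,
    category  TEXT NOT NULL,   -- category assigned in the cited paper
    citation  TEXT NOT NULL,   -- source of the assignment
    curator   TEXT,            -- who copied the entry out
    note      TEXT,            -- e.g. altloc or SCOP annotation problems
    UNIQUE (pdb_id, domain_a, domain_b, citation)
);
"""

def open_db(path=":memory:"):
    """Create the table in a new database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def add_annotation(conn, pdb_id, domain_a, domain_b, category, citation,
                   curator=None, note=None):
    """Insert one annotation; the domain pair is stored order-independently."""
    a, b = sorted((domain_a, domain_b))
    conn.execute(
        "INSERT INTO interface "
        "(pdb_id, domain_a, domain_b, category, citation, curator, note) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (pdb_id.lower(), a, b, category, citation, curator, note),
    )
    conn.commit()

def annotations_for(conn, pdb_id):
    """Return (domain_a, domain_b, category, citation) rows for one entry."""
    cur = conn.execute(
        "SELECT domain_a, domain_b, category, citation "
        "FROM interface WHERE pdb_id = ? ORDER BY id",
        (pdb_id.lower(),),
    )
    return cur.fetchall()
```

The UNIQUE constraint lets two papers disagree about the same
interface (two rows, two citations) while stopping the same paper
from being entered twice by different curators.
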
>Please let me know what you think,
>
>All the best,
>Dan.