From: Dan B. <dm...@mr...> - 2004-11-27 23:03:48
On Sat, 27 Nov 2004, Dan Bolser wrote:

>Hi,
>
>Previously we have sent round emails about bad PDB files for use in
>PSIMAP (bad for various reasons).
>
>At the moment I have come across (again) the old problem of amino acids in
>the interface with multiple 'alternate location' forms (those residues or
>short stretches with some conformational shift in the crystal structure).
>If you are not careful you can end up counting such amino acids multiple
>times, and if you *are* careful, which actual location should you use?
>
>Another potential problem is that some SCOP Folds are garbage folds, and
>should not be used in the way intended (evolutionary/structural units).
>For example some split chain domains have not been properly annotated by
>Alexy, and he is totally happy because the problem lies in the PDB file.
>
>Also some domains are only fragments of a particular fold, and actually
>assume a different conformation in the 'full length' version of the
>protein. However, this is not clear from SCOP.
>
>Finally, we don't have a proper assessment of PSIMAP in terms of known
>(expertly assessed and experimentally verified) oligomerization state and
>(therefore) crystal contacts.
>
>I would like to make a database of interfaces which adds all this kind of
>information.
>
>We can start by looking at all the papers relating to the determination of
>crystal contacts, and store all their hard work in determining (from the
>literature) the known oligomerization state for the proteins in our
>database (along with the citation for the state assigned).
>
>We also need a strategy to cluster and filter the interfaces so we can
>remove specific problem instances from the dataset (when a simpler
>version of the same interface can be used instead).
>
>We should remove all potential garbage from problematic SCOP entries.
>
>This is quite a big job, and not too exciting, but the resulting dataset
>(if we can organize it so that we can collectively maintain it) will be
>very useful to have.

I just checked, and I found that there are only around 3000 distinct
domain-domain pairs at the 40% sequence identity threshold. That means
that between us we could quite quickly check each one and manually
classify the interfaces. Such a dataset would be unique (as far as I
know) and invaluable for future work. Now we just have to decide what to
call the database ;)

>
>For example we should each pick a different paper on the study of protein
>interfaces and copy out the list of PDB files used and the category
>assigned to the interface in the paper. Such a simple library (which is
>essential for detailed analysis of protein-protein interaction) could be
>very useful for the whole community, as currently no such manually
>created, expertly defined, machine readable database exists.
>
>It would be very simple to put a web interface on a simple underlying
>mysql database.
>
>Please let me know what you think,
>
>All the best,
>Dan.
>
>_______________________________________________
>Psisoft-devl mailing list
>Psi...@li...
>https://lists.sourceforge.net/lists/listinfo/psisoft-devl
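
On the alternate-location problem raised in the quoted message, here is a
minimal sketch of one possible policy, not something PSIMAP is known to do:
for each residue, keep only the alternate location with the highest summed
occupancy (ties go to the label seen first), so each residue is counted
once. The column positions follow the fixed-width ATOM/HETATM layout of the
PDB format; the function names `pick_altlocs` and `make_atom` are
hypothetical, and `make_atom` exists only to build toy records for the
example.

```python
from collections import defaultdict


def residue_key(line):
    # Chain ID (col 22), residue number (cols 23-26), insertion code (col 27).
    return (line[21], line[22:26], line[26])


def pick_altlocs(pdb_lines):
    """Return ATOM/HETATM lines with one alternate location kept per residue.

    Assumed policy: keep the altLoc label with the highest summed occupancy
    within the residue; atoms with a blank altLoc are always kept.
    """
    atoms = [l for l in pdb_lines if l.startswith(("ATOM", "HETATM"))]

    # residue key -> {altLoc label: summed occupancy}
    totals = defaultdict(dict)
    for line in atoms:
        altloc = line[16]  # altLoc indicator, column 17
        if altloc != " ":
            occ = float(line[54:60])  # occupancy, columns 55-60
            per_res = totals[residue_key(line)]
            per_res[altloc] = per_res.get(altloc, 0.0) + occ

    # max() returns the first maximal label, so ties go to the first one seen.
    chosen = {key: max(t, key=t.get) for key, t in totals.items()}

    return [
        line for line in atoms
        if line[16] == " " or chosen[residue_key(line)] == line[16]
    ]


def make_atom(serial, name, altloc, resname, chain, resseq, occ):
    # Toy ATOM record with zeroed coordinates, for illustration only.
    return (f"ATOM  {serial:5d} {name:<4}{altloc}{resname:>3} {chain}"
            f"{resseq:4d}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{occ:6.2f}{0.0:6.2f}")


example = [
    make_atom(1, "N", " ", "SER", "A", 10, 1.0),
    make_atom(2, "CA", "A", "SER", "A", 10, 0.4),
    make_atom(3, "CA", "B", "SER", "A", 10, 0.6),
    make_atom(4, "CA", " ", "GLY", "A", 11, 1.0),
]
kept = pick_altlocs(example)
```

Choosing per residue rather than per atom keeps each residue in a single
self-consistent conformation, which seems preferable when measuring
interface contacts. Whether highest occupancy is the right choice for the
database is still an open question.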