From: Dan B. <dm...@mr...> - 2004-11-27 19:00:46
Hi,

Previously we have sent round emails about PDB files that are bad for use in PSIMAP (for various reasons). I have now come across (again) the old problem of amino acids in the interface with multiple 'alternate location' forms (residues or short stretches that show some conformational shift in the crystal structure). If you are not careful you can end up counting such amino acids multiple times, and if you *are* careful, which actual location should you use?

Another potential problem is that some SCOP folds are garbage folds and should not be used in the way intended (as evolutionary/structural units). For example, some split-chain domains have not been properly annotated by Alexy, and he is totally happy because the problem lies in the PDB file. Also, some domains are only fragments of a particular fold and actually assume a different conformation in the 'full length' version of the protein. However, this is not clear from SCOP.

Finally, we don't have a proper assessment of PSIMAP in terms of known (expertly assessed and experimentally verified) oligomerization state and (therefore) crystal contacts.

I would like to make a database of interfaces that adds all this kind of information. We can start by looking at all the papers relating to the determination of crystal contacts, and store all their hard work in determining (from the literature) the known oligomerization state for the proteins in our database (along with the citation for the state assigned). We also need a strategy to cluster and filter the interfaces so that we can remove specific problem instances from the dataset (when a simpler version of the same interface can be used instead). We should remove all potential garbage from problematic SCOP entries.

This is quite a big job, and not too exciting, but the resulting dataset (if we can organize it so that we can collectively maintain it) will be very useful to have.
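
To make the alternate-location problem raised earlier concrete, here is a minimal sketch of one possible convention, not a settled answer to the question of which location to use. For each residue it keeps only the alternate-location label with the highest mean occupancy (ties go to the alphabetically first label), so that no atom is counted twice. The function name `select_altloc` and the tie-breaking rule are my own assumptions. The column positions are those of the standard fixed-width PDB ATOM/HETATM record.

```python
from collections import defaultdict


def select_altloc(pdb_lines):
    """Return ATOM/HETATM lines with at most one alternate location per residue.

    Assumed convention (hypothetical, for illustration only): keep the altLoc
    label with the highest mean occupancy in each residue; ties are broken
    alphabetically. Atoms with a blank altLoc are always kept.
    """
    occupancies = defaultdict(lambda: defaultdict(list))
    atoms = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        altloc = line[16]                            # column 17: altLoc
        res_key = (line[21], line[22:26], line[26])  # chain, resSeq, iCode
        atoms.append((res_key, altloc, line))
        if altloc != " ":
            # columns 55-60: occupancy
            occupancies[res_key][altloc].append(float(line[54:60]))

    chosen = {}
    for res_key, by_label in occupancies.items():
        chosen[res_key] = min(
            by_label,
            key=lambda a: (-sum(by_label[a]) / len(by_label[a]), a),
        )

    return [
        line
        for res_key, altloc, line in atoms
        if altloc == " " or chosen[res_key] == altloc
    ]
```

Choosing per residue rather than per atom keeps each side chain internally consistent. Other conventions (always taking label 'A', or keeping the first label seen) are just as easy to write; the important thing is that the whole dataset uses the same one.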
For example, we should each pick a different paper on the study of protein interfaces and copy out the list of PDB files used and the category assigned to each interface in the paper. Such a simple library (which is essential for detailed analysis of protein-protein interactions) could be very useful for the whole community, as currently no such manually created, expertly defined, machine-readable database exists. It would be very simple to put a web interface on a simple underlying MySQL database.

Please let me know what you think,

All the best,
Dan.
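
As a rough illustration of how small the underlying database could be, here is a sketch of a possible schema. It uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs without a server. All table names, column names and helper functions are hypothetical, and the records used to try it out should be treated as placeholders, not real annotations.

```python
import sqlite3

# Hypothetical schema: one row per curated interface, each tied to the
# citation that assigned its category and oligomerization state.
SCHEMA = """
CREATE TABLE citation (
    citation_id INTEGER PRIMARY KEY,
    reference   TEXT NOT NULL
);
CREATE TABLE interface (
    interface_id     INTEGER PRIMARY KEY,
    pdb_id           TEXT NOT NULL,
    category         TEXT NOT NULL,   -- category assigned in the paper
    oligomeric_state TEXT,            -- known state, if reported
    flagged          INTEGER NOT NULL DEFAULT 0,  -- 1 = problem entry
    note             TEXT,            -- why it was flagged, if it was
    citation_id      INTEGER NOT NULL REFERENCES citation(citation_id)
);
"""


def create_db(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_interface(conn, pdb_id, category, reference,
                  oligomeric_state=None, flagged=False, note=None):
    """Store one curated interface together with its source citation."""
    cur = conn.execute(
        "INSERT INTO citation (reference) VALUES (?)", (reference,)
    )
    conn.execute(
        "INSERT INTO interface (pdb_id, category, oligomeric_state,"
        " flagged, note, citation_id) VALUES (?, ?, ?, ?, ?, ?)",
        (pdb_id, category, oligomeric_state, int(flagged), note,
         cur.lastrowid),
    )
    conn.commit()


def usable_interfaces(conn):
    """Return (pdb_id, category) for every entry not flagged as a problem."""
    return conn.execute(
        "SELECT pdb_id, category FROM interface"
        " WHERE flagged = 0 ORDER BY pdb_id"
    ).fetchall()
```

Keeping a `flagged` column and a free-text `note`, instead of deleting problem entries, means the reason for excluding something (alternate locations, a fragment domain, a split chain) stays on record for whoever maintains the dataset next.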