I've looked into the homodimer conservation issue. It's not really a bug, but
it is definitely unexpected behaviour. The root of the problem is the hmmsearch
program, which picks up different Pfam hits for the complete homodimer than for
the single monomer. When given the whole dimer, hmmsearch
- usually only identifies the first hit of a Pfam domain but not the second
- may find some additional (usually low-conservation) hits to the whole dimer
>>> m = PDBModel('1a59.pdb')
>>> m0 = m.takeChains( [0] )
>>> PDBDope( m ).addConservation()
>>> PDBDope( m0 ).addConservation()
>>> hits = m['cons_ent','hmmHits']
>>> hits0 = m0['cons_ent','hmmHits']
>>> for h in hits:
...     print '%15s - %25r : %10r' % (h, hits[h], hits0.get(h, '') )
...
Selenoprotein_S - [[139, 325]] : [[139, 325]]
DUF1265 - [[143, 191]] : [[143, 191]]
Phos_pyr_kin - [[112, 269]] : [[112, 269]]
Oxidored_nitro - [[38, 363]] : [[38, 363]]
HGD-D - [[110, 510]] : ''
NDUFA12 - [[131, 210]] : [[131, 210]]
Citrate_synt - [[7, 364], [384, 741]] : [[7, 364]]
Orbi_VP7 - [[93, 362]] : [[93, 362]]
Peptidase_C15 - [[163, 311]] : [[163, 311]]
Aminotran_4 - [[80, 290]] : [[80, 290]]
CobD_Cbib - [[156, 350]] : [[156, 350]]
DUF1749 - [[372, 614]] : ''
'm' is the complete 1a59 model; 'm0' is only its first chain. The list gives
the Pfam hits identified for the full-length model (left of the colon) and for
the single-chain model (right of the colon). The numbers are the start and end
positions of each hit. The first monomer ends at residue position 377, so only
HGD-D, DUF1749 and the second Citrate_synt hit reach into the second monomer.
The second monomer is therefore hardly covered and receives very low
conservation scores (see attached plot).
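The imbalance can be counted directly from the listing above. This is a minimal
sketch in plain Python with no Biskit calls: 'families_per_chain' is a made-up
helper name, the positions are assumed to be 1-based and inclusive, and the
dimer length of 754 is an assumption (two chains of 377 residues).

```python
# Hits reported for the full-length dimer, copied from the listing above.
dimer_hits = {
    'Selenoprotein_S': [[139, 325]],
    'DUF1265': [[143, 191]],
    'Phos_pyr_kin': [[112, 269]],
    'Oxidored_nitro': [[38, 363]],
    'HGD-D': [[110, 510]],
    'NDUFA12': [[131, 210]],
    'Citrate_synt': [[7, 364], [384, 741]],
    'Orbi_VP7': [[93, 362]],
    'Peptidase_C15': [[163, 311]],
    'Aminotran_4': [[80, 290]],
    'CobD_Cbib': [[156, 350]],
    'DUF1749': [[372, 614]],
}


def families_per_chain(hits, first, last):
    """Count Pfam families with at least one hit overlapping first..last."""
    count = 0
    for ranges in hits.values():
        # two inclusive ranges overlap if each starts before the other ends
        if any(start <= last and end >= first for start, end in ranges):
            count += 1
    return count


# Assumption: chain 1 is 1..377 and chain 2 is 378..754.
chain1 = families_per_chain(dimer_hits, 1, 377)
chain2 = families_per_chain(dimer_hits, 378, 754)
print('chain 1: %d families, chain 2: %d families' % (chain1, chain2))
```

For the listing above this gives 12 families touching the first chain and only
3 touching the second.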
- For homodimers, conservation should be calculated on monomers
- The profile from the first chain should then simply be copied into the second
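The two points above amount to computing the profile once and repeating it for
every chain. A minimal sketch, with a plain list standing in for the
per-residue conservation profile: 'tile_profile' is a hypothetical helper, not
a Biskit function, and it assumes that all chains have the same length and
residue order.

```python
def tile_profile(monomer_profile, n_chains):
    """Repeat a per-residue profile once for each identical chain."""
    if n_chains < 1:
        raise ValueError('n_chains must be at least 1')
    return list(monomer_profile) * n_chains


# Toy values standing in for the conservation scores of a 4-residue monomer.
monomer_cons = [0.9, 0.2, 0.5, 0.7]
dimer_cons = tile_profile(monomer_cons, 2)
print(dimer_cons)
```

The resulting list has one value per residue of the dimer, so it could be
attached to the full model as a residue profile in place of the one calculated
from the dimer sequence.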
Perhaps one can alleviate the issue by adjusting the hmmsearch thresholds, but
I don't know the program well enough...
Dr. Raik Gruenberg