

ACP and friends

  • A summary of what's going on in mail exchanges...

    I mean compounds that are partly or entirely proteins. These take part in reactions and are modified in the process, so we need to have them in ChEBI. Some examples:
    3-oxoacyl-[acyl-carrier-protein]  (there are many more types of [ACP])
    protein N(6)-methyl-L-lysine
    oxidized ferredoxin
    reduced ferredoxin

    A few of the reactions in which they participate:
    (3R)-3-hydroxyacyl-[acyl-carrier-protein] + NADP(+) = 3-oxoacyl-[acyl-carrier-protein] + NADPH + H(+).

    (1) Protein N(6),N(6)-dimethyl-L-lysine + 2-oxoglutarate + O(2) = protein N(6)-methyl-L-lysine + succinate + formaldehyde + CO(2).
    (2) Protein N(6)-methyl-L-lysine + 2-oxoglutarate + O(2) = protein L-lysine + succinate + formaldehyde + CO(2).

    A malonyl-[acyl-carrier protein] + a biotinyl-[protein] = an acetyl-[acyl-carrier protein] + a carboxybiotinyl-[protein].

    Pyruvate + CoA + 2 oxidized ferredoxin = acetyl-CoA + CO(2) + 2 reduced ferredoxin + 2 H(+).

    NH(3) + 2 H(2)O + 6 oxidized ferredoxin = nitrite + 6 reduced ferredoxin + 7 H(+).

    We really need to be able to handle these compounds; otherwise we limit ourselves too much. Ferredoxins currently participate in 29 reactions and ACP in 26, and these numbers will increase.

    We already have a number of entries in ChEBI involving acyl-carrier protein. Please take a look at these:

    CHEBI:13534 acyl-carrier protein
    CHEBI:16018 acyl-[acyl-carrier protein]
    CHEBI:4349 decanoyl-[acyl-carrier protein]
    CHEBI:7725 octanoyl-[acyl-carrier protein]
    CHEBI:5697 palmitoyl-[acyl-carrier protein]
    CHEBI:50651 myristoyl-[acyl-carrier protein]
    CHEBI:17330 carboxyacetyl-[acyl-carrier protein]
    CHEBI:16759 lauroyl-[acyl-carrier protein]
    CHEBI:17093 acetyl-[acyl-carrier protein]

    All these have status CHECKED. Another 28 currently have status UNCHECKED. Please take a look at the checked ones and tell me what extra data you need for them; then I will have a clearer idea of what I need to do to check the rest.

    For protein N(6)-methyl-L-lysine we have CHEBI:8555, which currently only has status OK. If we can regard this as a modified L-lysine residue, then I can probably check it quickly. Let me know.
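
    If this entry is treated as a modified L-lysine residue, reactions (1) and (2) above can be balanced on the residue alone, with the rest of the polypeptide carried along unchanged. A minimal Python sketch of that check, assuming hand-derived residue formulas (in-chain lysine as C6H12N2O, each N(6)-methyl adding CH2), neutral side chains, dianionic 2-oxoglutarate and succinate, and a hypothetical pseudo-element `Prot` for the unchanged protein. None of these formulas are taken from ChEBI:

```python
from collections import Counter

# (element counts, net charge); illustrative, hand-derived formulas.
Formula = tuple[dict[str, int], int]

# In-chain L-lysine residue: C6H12N2O. Each N(6)-methyl swaps an H for CH3 (+C, +2 H).
# "Prot" is a hypothetical pseudo-element for the rest of the polypeptide.
LYS: Formula = ({"C": 6, "H": 12, "N": 2, "O": 1, "Prot": 1}, 0)
ME_LYS: Formula = ({"C": 7, "H": 14, "N": 2, "O": 1, "Prot": 1}, 0)
DIME_LYS: Formula = ({"C": 8, "H": 16, "N": 2, "O": 1, "Prot": 1}, 0)
OXOGLUTARATE: Formula = ({"C": 5, "H": 4, "O": 5}, -2)
SUCCINATE: Formula = ({"C": 4, "H": 4, "O": 4}, -2)
O2: Formula = ({"O": 2}, 0)
FORMALDEHYDE: Formula = ({"C": 1, "H": 2, "O": 1}, 0)
CO2: Formula = ({"C": 1, "O": 2}, 0)


def totals(side: list[tuple[int, Formula]]) -> tuple[Counter, int]:
    """Sum element counts and charge over (coefficient, formula) pairs."""
    atoms: Counter = Counter()
    charge = 0
    for coeff, (elements, z) in side:
        for element, n in elements.items():
            atoms[element] += coeff * n
        charge += coeff * z
    return atoms, charge


def is_balanced(left: list[tuple[int, Formula]], right: list[tuple[int, Formula]]) -> bool:
    """True if both sides have identical element counts and net charge."""
    return totals(left) == totals(right)


# (1) protein N(6),N(6)-dimethyl-L-lysine -> protein N(6)-methyl-L-lysine
reaction_1 = (
    [(1, DIME_LYS), (1, OXOGLUTARATE), (1, O2)],
    [(1, ME_LYS), (1, SUCCINATE), (1, FORMALDEHYDE), (1, CO2)],
)
# (2) protein N(6)-methyl-L-lysine -> protein L-lysine
reaction_2 = (
    [(1, ME_LYS), (1, OXOGLUTARATE), (1, O2)],
    [(1, LYS), (1, SUCCINATE), (1, FORMALDEHYDE), (1, CO2)],
)
print(is_balanced(*reaction_1), is_balanced(*reaction_2))  # True True
```

    Because `Prot` appears once on each side, it cancels, which is what makes the residue view workable for balancing.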

    Oxidized and reduced ferredoxin both have entries in ChEBI with status CHECKED: CHEBI:17908 and CHEBI:17513, respectively. Please take a look and tell me what extra information you need for your purposes.

    The next problem is the compounds that are still missing, and the missing formulas for a number of the compounds below. Even if a compound is itself checked, that does not help Rhea if it has no formula: reactions involving it cannot be balanced, and so cannot be made public.
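
    The dependency on formulas can be sketched in a few lines of Python. This is an illustration, not Rhea's actual balancing code: the compound table, the `check_balance` helper and the `Fd` placeholder are hypothetical, and the ferredoxin charges assume each ferredoxin carries one electron, as the 6:6 stoichiometry of the nitrite reaction implies. Only the charge difference between the two forms matters, so the oxidized form is arbitrarily given charge 0.

```python
from collections import Counter
from typing import Optional

Formula = tuple[dict[str, int], int]  # (element counts, net charge)

# Hypothetical compound table: None mimics an entry that is checked
# but has no formula.
COMPOUNDS: dict[str, Optional[Formula]] = {
    "ammonia": ({"N": 1, "H": 3}, 0),
    "water": ({"H": 2, "O": 1}, 0),
    "nitrite": ({"N": 1, "O": 2}, -1),
    "H+": ({"H": 1}, 1),
    "oxidized ferredoxin": None,
    "reduced ferredoxin": None,
}


def check_balance(left, right, compounds) -> Optional[bool]:
    """Return True/False if every participant has a formula, else None."""
    sides = []
    for side in (left, right):
        atoms: Counter = Counter()
        charge = 0
        for coeff, name in side:
            formula = compounds[name]
            if formula is None:
                return None  # cannot be balanced without a formula
            elements, z = formula
            for element, n in elements.items():
                atoms[element] += coeff * n
            charge += coeff * z
        sides.append((atoms, charge))
    return sides[0] == sides[1]


NITRITE_REACTION = (
    [(1, "ammonia"), (2, "water"), (6, "oxidized ferredoxin")],
    [(1, "nitrite"), (6, "reduced ferredoxin"), (7, "H+")],
)
print(check_balance(*NITRITE_REACTION, COMPOUNDS))  # None

# Placeholder formulas: "Fd" is a hypothetical pseudo-element; the reduced
# form carries one extra electron (charge one lower than the oxidized form).
with_placeholders = {
    **COMPOUNDS,
    "oxidized ferredoxin": ({"Fd": 1}, 0),
    "reduced ferredoxin": ({"Fd": 1}, -1),
}
print(check_balance(*NITRITE_REACTION, with_placeholders))  # True
```

    A formula-less participant makes the result undecidable (`None`) rather than unbalanced, which is exactly why such reactions cannot be made public.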

    Here is a preliminary list of ACP compounds needed in Rhea.
    malonyl-[acyl-carrier protein]; missing
    trans-dec-2-enoyl-[acyl-carrier protein]; chebi:10724
    cis-dec-3-enoyl-[acyl-carrier-protein]; missing
    oct-2-enoyl-[acyl-carrier-protein]; missing
    stearoyl-[acyl-carrier-protein]; chebi:16276
    acetyl-[acyl-carrier protein]; chebi:17093. This compound even has a structure :-)
    hexadec-2-enoyl-[acyl-carrier-protein]; missing

    Perhaps this is nonsense, but what about abstracting away the protein part and using the malonyl group (CHEBI:25134), stearoyl group (CHEBI:26753), etc.?
    The reactions in the EC classification specifically use ACP, but we could use the class of reactions instead.
    For example, instead of the EC reaction

       A malonyl-[acyl-carrier protein] + a biotinyl-[protein] = an acetyl-[acyl-carrier protein] + a carboxybiotinyl-[protein].

    use a Rhea reaction like:

       malonyl group + biotinyl group = acetyl group + carboxybiotinyl group

    This would describe a class of reactions, without specifying where the groups are attached.

    The problem is that this is too general: not just any malonyl group can be used, only a malonyl group attached to ACP. We have to keep this information; otherwise the reactions are only approximate.
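
    The objection can be made concrete with a toy data model (hypothetical, not how ChEBI or Rhea store compounds): a reaction written only in terms of groups cannot distinguish malonyl-[acyl-carrier protein] from malonyl-CoA, while one that keeps the carrier can.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conjugate:
    """Toy model: an acyl group attached to a carrier (e.g. ACP or CoA)."""
    group: str
    carrier: str


def matches_group_only(substrate: Conjugate, group: str) -> bool:
    # Abstracted reaction ("malonyl group + ..."): only the group is checked.
    return substrate.group == group


def matches_conjugate(substrate: Conjugate, required: Conjugate) -> bool:
    # Carrier-specific reaction: group and carrier must both match.
    return substrate == required


malonyl_acp = Conjugate(group="malonyl", carrier="acyl-carrier protein")
malonyl_coa = Conjugate(group="malonyl", carrier="CoA")

print(matches_group_only(malonyl_coa, "malonyl"))   # True: too general
print(matches_conjugate(malonyl_coa, malonyl_acp))  # False
print(matches_conjugate(malonyl_acp, malonyl_acp))  # True
```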


    More on the ACPs

    (3R)-3-hydroxyoctanoyl-[acyl-carrier protein] chebi:17463: please reinstate, as it is used in Rhea 11564 EC
    malonyl-[acyl-carrier protein] is identical to carboxyacetyl-[acyl-carrier protein] chebi:17330
    (3R)-3-hydroxyacyl-[acyl-carrier protein] chebi:17718: please reinstate, as it is used in Rhea 17397 EC
    (3R)-3-hydroxypalmitoyl-[acyl-carrier protein] chebi:18017: please reinstate
    (Z)-hexadec-11-enoyl-[acyl-carrier-protein]: please create
    (Z)-3-oxooctadec-13-enoyl-[acyl-carrier-protein]: please create; both are used in Rhea:14565 EC

    reduced adrenodoxin chebi:16906, please check
    oxidized adrenodoxin chebi:16341, idem

    There are many more protein compounds, and we will have to deal with these too. I will try to link some more compounds to ChEBI, possibly via KEGG.

    We talked about the protein issue at our recent EBI team meeting. We are
    concerned about what Paula nicely termed 'scope creep': extending ChEBI
    beyond what it was originally intended for, namely small molecules (i.e.
    excluding proteins and other macromolecules such as nucleic acids). We
    really do not want to go beyond our boundaries, and we decided that we
    need to devise an alternative way of handling these protein compounds for
    Rhea. So please bear with us in the meantime while we consider this
    further.

    Sorry if this holds you up.

    Just on another note: do you guys know of any resources that deal with these types of structures?

    • I checked how other reaction databases out there handle ACP:

      · Reactome: ACP derivatives are treated as complex entities, with the protein and small-molecule parts linked to different databases (e.g. http://www.reactome.org/cgi-bin/eventbrowser?DB=gk_current&ID=76173; in this case, acetoacetate is linked to acetoacetic acid, not to the acetoacetyl group)

      · KEGG: the ACP part of the molecule is summarised as -S-R in the structure (http://www.genome.jp/dbget-bin/www_bget?compound:C01271)

      · MACiE: uses KEGG's compounds, so the same -S-R approach applies (e.g. http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/MACiE/getPage.pl?id=M0136)

      · MetaCyc: the protein part appears as [a holo-[acyl-carrier protein]] (e.g. http://biocyc.org/META/NEW-IMAGE?type=REACTION&object=3-OXOACYL-ACP-REDUCT-RXN)

      · PathwayCommons: acyl-ACP from H. sapiens (http://www.pathwaycommons.org/pc/record2.do?id=555821) has a formula-like synonym, CH3(CH2)xCO-S-ACP; so does holo-ACP (http://www.pathwaycommons.org/pc/record2.do?id=555825): ACP-SH.

      · ExplorEnz: no structures there, just plain text reactions.

      Rhea could handle compounds imported from sources other than ChEBI, but they would need to be abstract in some way. For example, UniProt returns over 15000 different ACPs (http://www.uniprot.org/uniprot/?query=name%3A%22acyl+carrier+protein%22&sort=score), each from a different species, tissue or organelle. Since Rhea is species-agnostic, I would rule out UniProt IDs.

      Another possibility is to create every needed protein in ChEBI (but that's not really ChEBI's scope...) and then either apply an R-formula or allow unchecked stoichiometry for protein-containing reactions.
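
      The R-formula option can be sketched by treating each protein part as a pseudo-element that must be conserved like any real element. In this Python sketch, `ACP` and `BioProt` are hypothetical pseudo-elements, formulas are written for neutral (protonated) forms so charges are ignored, and the hypothetical `acyl_acp` helper expands the formula-like synonym CH3(CH2)xCO-S-ACP that PathwayCommons uses. The example balances the malonyl/biotinyl reaction quoted earlier.

```python
from collections import Counter

# "ACP" and "BioProt" are hypothetical pseudo-elements for the protein parts;
# they must be conserved like real elements. Neutral forms, charges ignored.


def acyl_acp(x: int) -> Counter:
    """Formula of CH3(CH2)xCO-S-ACP: CH3 + x CH2 + CO + S + ACP."""
    return Counter({"C": x + 2, "H": 2 * x + 3, "O": 1, "S": 1, "ACP": 1})


MALONYL_ACP = Counter({"C": 3, "H": 3, "O": 3, "S": 1, "ACP": 1})  # HOOC-CH2-CO-S-ACP
BIOTINYL_PROTEIN = Counter({"BioProt": 1})
# N1'-carboxylation replaces N-H with N-COOH: a net gain of one C and two O.
CARBOXYBIOTINYL_PROTEIN = BIOTINYL_PROTEIN + Counter({"C": 1, "O": 2})


def side_total(*formulas: Counter) -> Counter:
    """Add up the element counts of all participants on one side."""
    total: Counter = Counter()
    for formula in formulas:
        total += formula
    return total


# malonyl-[ACP] + biotinyl-[protein] = acetyl-[ACP] + carboxybiotinyl-[protein]
left = side_total(MALONYL_ACP, BIOTINYL_PROTEIN)
right = side_total(acyl_acp(0), CARBOXYBIOTINYL_PROTEIN)  # acetyl-[ACP] is x = 0
print(left == right)  # True
print(dict(acyl_acp(14)))  # palmitoyl-[ACP], x = 14
```

      Since `ACP` is conserved explicitly, the carrier is not lost the way it is when the reaction is abstracted to bare groups.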

      • pmatos

        So Reactome has Homo sapiens, and that's how they get away with the species issue.

        Rafael, perhaps you could ask the PDB guys whether they have anything like that. I remember one of them wanted access to IntEnz.