
#95 S. pombe protein complex terms (many)

None
pending
None
5
2018-03-23
2015-03-25
No

Hello,

For GO annotations and network representation (we're using esyN - www.esyn.org), we would find it very useful to have a set of PRO entries for S. pombe complexes. We maintain a list of GO cellular component complex terms and annotated genes that we hope is a good starting point.

May we have PRO terms/ids/etc. for the pombe versions of the complexes in this list (one per GO term)? For us, the ones with PMID references in the "source" column are higher priority than the rest.

ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Complexes/Complex_annotation

(with explanation ftp://ftp.ebi.ac.uk/pub/databases/pombase/pombe/Complexes/README)

If you need us to attach or email a copy of the file, or if you have any problems or questions, please let us know.

Thanks!
Midori (and the rest of the PomBase curators)

Discussion

  • Darren Natale

    Darren Natale - 2015-03-25

    Hi Midori,

    This can be done, even automated, but I see several questionable cases that lead me to think the list isn't complete in terms of complex components, and this is not even counting the lack of cardinality or modification information. For example, how likely is it that GO:0071014 "post-mRNA release spliceosomal complex" contains only a single type of protein? I see other complexes with similar issues. Technically speaking, we don't need to even indicate the subunits, and don't even need to indicate all the components, so the terms can be made, but probably you want more than just "protein complex X (S. pombe)" and probably want to avoid the misleading look of failing to indicate all.

    Please let me know how you'd like to proceed.

    Darren

     
  • Midori Harris

    Midori Harris - 2015-03-25

    Hi Darren,

    Thanks for commenting so promptly on this request, and for the excellent questions. I'll have to ask Val to address your concerns, because the complex inventory file is something she's been curating sort-of-manually from PomBase's GO cellular component annotation set. I therefore don't have nearly as good a sense as she would of how complete the inventory is, and in particular, of the balance between incompleteness that reflects incomplete curation and incompleteness that reflects the limited knowledge available for us to curate.

    We may be able to provide a shorter list of complexes that are more nearly completely characterized, and that we most want to see represented in PRO (for example, I have a GO annotation I could hang on an S. pombe RFC complex ID from a paper I was just reading an hour ago).

    Midori

     
  • Valerie Wood

    Valerie Wood - 2015-03-25

    Sorry, we should have sent you only the experimental ones for starters, that is, anything in the file with a "PMID". So ignore anything with PomBase GO_REF:0000002/IEA or GO_REF:0000024/ISO.

    All of the ISO data are manually curated, and I usually try to identify every subunit in a complex when I do ISO from SGD. However, for some things (like spliceosomal subcomplexes) I have not done this thoroughly if there is a splicing complex grouping term. I will annotate the other subunits of the 'spliceosomal disassembly complex' tomorrow.

    I just noticed also that the final 2 column headers are incorrect
    xref_dbname source
    should be
    source xref_dbname
    I thought that this was corrected so we need to check that the file was correctly updated. I am pretty certain that it wasn't as the new version of the file should have "|" separated PMIDs if there are multiple papers, and I don't detect any pipes in the file on our ftp site....

    Apologies....

    Val

     
  • Valerie Wood

    Valerie Wood - 2015-03-25

    The version of the file is correct; the PMIDs are comma separated, not pipe separated,
    e.g. PMID:16079914,PMID:16079914
    It is just the headers that are incorrect. I will get this fixed.
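
A minimal sketch of how a consumer of the file might read the reference column, given the comma-separated format described above. The function name `split_source` is hypothetical, and the sketch assumes the field contains only references separated by commas; it is not taken from PomBase or PRO tooling.

```python
def split_source(field: str) -> list[str]:
    """Split a comma-separated reference field, dropping repeats but keeping order."""
    refs = [ref.strip() for ref in field.split(",") if ref.strip()]
    # dict.fromkeys preserves first-seen order while removing duplicates
    return list(dict.fromkeys(refs))


# The example value quoted above lists the same paper twice.
print(split_source("PMID:16079914,PMID:16079914"))
```

Keeping the original order means the first reference listed for a gene stays first after cleaning.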

    Val

     
  • Midori Harris

    Midori Harris - 2015-03-25

    I'm pretty sure the file headers are correct; they're just a bit cryptic. "Source" means the reference.

     
  • Valerie Wood

    Valerie Wood - 2015-03-26

    You are correct. Will clarify the column headers.
    Sorry for the confusion.
    Val

     
  • Darren Natale

    Darren Natale - 2015-03-26

    Val, I'm not sure I understood your message about which to ignore. On the one hand you say to ignore GO_REF:0000024/ISO, but then you say you manually verified these (by the way, I also see GO_REF:0000024 associated with ISM). If they are verified, should they not be included?

    Another question: how should I handle cases where, say, one component is IEA but all others have "good" codes? For example, hcr1 in GO:0070993 is IEA while others are experimental (note: I'm ignoring the IEA part of those that have multiple codes; not sure why there are things like 'IEA,IEA').
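
The handling described here (collapsing repeats such as 'IEA,IEA', and ignoring IEA when a better code is also present) could look roughly like the following. `clean_evidence` is a hypothetical helper, and the assumption that codes are comma separated comes from the 'IEA,IEA' example rather than from the file's README.

```python
def clean_evidence(field: str) -> list[str]:
    """Deduplicate evidence codes; drop IEA when any other code is present."""
    codes = list(dict.fromkeys(c.strip() for c in field.split(",") if c.strip()))
    if len(codes) > 1:
        # A gene with both IEA and a non-IEA code is treated as non-IEA.
        codes = [c for c in codes if c != "IEA"]
    return codes


print(clean_evidence("IEA,IEA"))   # a lone IEA is kept once
print(clean_evidence("IDA,IEA"))   # IEA is ignored next to another code
```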

     
  • Darren Natale

    Darren Natale - 2015-03-26

    Some numbers: There are 446 GO complexes listed. Of these, if we ignore those that have any IEA or ISO component, we lose about one-fourth. If we ignore those that do not have a PMID for all components, we lose half.
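
For illustration only, the two filters behind these counts can be sketched as below. The record layout (`Component`), the placeholder identifiers, and the toy data are all hypothetical; the real file's columns and the real counts are not reproduced here. The sketch also assumes evidence codes have already been cleaned so that a component is not flagged as IEA when it also has a better code.

```python
from typing import NamedTuple


class Component(NamedTuple):
    gene: str
    evidence: frozenset  # cleaned evidence codes for this gene
    sources: tuple       # references, e.g. "PMID:..." or "GO_REF:..."


INFERRED = {"IEA", "ISO"}


def has_inferred_component(components) -> bool:
    """True if any component is supported by IEA or ISO evidence."""
    return any(c.evidence & INFERRED for c in components)


def all_have_pmid(components) -> bool:
    """True if every component has at least one PMID reference."""
    return all(any(s.startswith("PMID:") for s in c.sources) for c in components)


# Toy data with placeholder identifiers (not real GO terms or genes).
complexes = {
    "COMPLEX_1": [
        Component("geneA", frozenset({"IDA"}), ("PMID:1",)),
        Component("geneB", frozenset({"IEA"}), ("GO_REF:0000002",)),
    ],
    "COMPLEX_2": [Component("geneC", frozenset({"IPI"}), ("PMID:2",))],
}

no_inferred = [cid for cid, comps in complexes.items() if not has_inferred_component(comps)]
fully_pmid = [cid for cid, comps in complexes.items() if all_have_pmid(comps)]
print(len(complexes), len(no_inferred), len(fully_pmid))
```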

     
    • Valerie Wood

      Valerie Wood - 2015-03-26

      Cross posted, my comments should address this one too

       
  • Valerie Wood

    Valerie Wood - 2015-03-26

    Hi Darren,

    Sorry I wasn't clear. I hope the following clears up the confusion.

    1. I had spotted the duplicated evidence codes and have already reported this.

    2. All of the ISO/ISM annotations are manually curated, but inferred from sequence similarity. There is likely no experimental data for these as yet, but so far, for complexes in S. cerevisiae whose members are conserved 1:1 in pombe, the complexes have had identical composition.

    3. For some EXP described complexes we also have cardinality data, but we have not exported this. We can make this available to you too.

    4. There are only 112 IEAs. We will try to resolve these over the next couple of months by manually annotating the ones that are split between experimental and IEA codes, and suppressing some that are to 'generic' grouping complex or component terms.

    5. What type of modification data do you include in the complex entries?

    There is no hurry for this, we just wanted to get this in motion so we could make annotations to complexes and create complex pages in PomBase.

    We can clean up this file over the next couple of months and take it from there.

    Best

    Val

     
  • Darren Natale

    Darren Natale - 2015-03-26

    Some comments on your list of comments:
    1) The duplicate codes don't bother me; already wrote a script to clean them.

    2) IMO anything manually verified is good to go. In PRO we do have complexes that are inferred by comparison with those in other organisms.

    3) Cardinality would be excellent!

    4) Not sure what you mean by "generic grouping complex or component terms." Of the 112 IEAs, only 33 are associated with complexes with otherwise 'better' evidences.

    5) We can include all kinds of modifications. For example, you might want to specify that a particular complex contains a phosphorylated form of a protein. See PR:000037300 for an example. It would be no problem to make the complexes first, then change the components to something more specific later, if you'd like.

    Consider it in motion! I won't make a further move on it until I get the word from you. My preference is to do all the eligible ones at once rather than something like "only those with PMIDs first, then ISOs later." However, if the need for a specific complex (or limited set of them) arises before the bulk are ready, we'll make them right away.

     
  • Valerie Wood

    Valerie Wood - 2015-03-26

    Re 4)

    Too general /not a ‘specific complex’ /not sure that they are a complex in pombe/
    or will be replaced by a more specific annotation in the term set below
    will filter
    GO:0000015 phosphopyruvate hydratase complex
    GO:0000148 1,3-beta-D-glucan synthase complex
    GO:0000159 protein phosphatase type 2A complex
    GO:0000786 nucleosome
    GO:0002178 palmitoyltransferase complex
    GO:0005891 voltage-gated calcium channel complex
    GO:0005952 cAMP-dependent protein kinase complex
    GO:0030118 clathrin coat
    GO:0030119 AP-type membrane coat adaptor complex
    GO:0030130 clathrin coat of trans-Golgi network vesicle
    GO:0030131 clathrin adaptor complex
    GO:0031515 tRNA (m1A) methyltransferase complex
    GO:0032300 mismatch repair complex
    GO:0032301 MutSalpha complex
    GO:0032302 MutSbeta complex
    GO:0033573 high-affinity iron permease complex
    GO:0034703 cation channel complex
    GO:0034704 calcium channel complex
    GO:0034707 chloride channel complex
    GO:0042765 GPI-anchor transamidase complex
    GO:0043527 tRNA methyltransferase complex
    GO:0071010 prespliceosome
    GO:1902562 H4 histone acetyltransferase complex
    GO:0097346 INO80-type complex
    GO:0043189 H4/H2A histone acetyltransferase complex
    GO:0031332 RNAi effector complex

    Will manually annotate (more specifically in some cases), or remove
    GO:0000930 gamma-tubulin complex
    GO:0000932 cytoplasmic mRNA processing body
    GO:0000346 transcription export complex
    GO:0000347 THO complex
    GO:0000444 MIS12/MIND type complex
    GO:0000930 gamma-tubulin complex
    GO:0000932 cytoplasmic mRNA processing body
    GO:0005643 nuclear pore
    GO:0005663 DNA replication factor C complex
    GO:0005665 DNA-directed RNA polymerase II, core complex
    GO:0005680 anaphase-promoting complex
    GO:0005684 U2-type spliceosomal complex
    GO:0005760 gamma DNA polymerase complex
    GO:0005852 eukaryotic translation initiation factor 3 complex
    GO:0005960 glycine cleavage complex
    GO:0008180 COP9 signalosome
    GO:0008280 cohesin core heterodimer
    GO:0008622 epsilon DNA polymerase complex
    GO:0016282 eukaryotic 43S preinitiation complex
    GO:0016442 RISC complex
    GO:0016591 DNA-directed RNA polymerase II, holoenzyme
    GO:0016602 CCAAT-binding factor complex
    GO:0022627 cytosolic small ribosomal subunit
    GO:0030119 AP-type membrane coat adaptor complex
    GO:0030688 preribosome, small subunit precursor
    GO:0030870 Mre11 complex
    GO:0031011 Ino80 complex
    GO:0031515 tRNA (m1A) methyltransferase complex
    GO:0032040 small-subunit processome
    GO:0033290 eukaryotic 48S preinitiation complex
    GO:0035267 NuA4 histone acetyltransferase complex
    GO:0043564 Ku70:Ku80 complex
    GO:0043599 nuclear DNA replication factor C complex
    GO:0070390 transcription export complex 2
    GO:0070993 translation preinitiation complex
    GO:0071004 U2-type prespliceosome
    GO:1990077 primosome complex
    GO:0000812 Swr1 complex
    GO:0042575 DNA polymerase complex

    unsure but one of the above will happen with these:
    GO:0009316 3-isopropylmalate dehydratase complex
    GO:0009331 glycerol-3-phosphate dehydrogenase complex
    GO:0009349 riboflavin synthase complex
    GO:0032777 Piccolo NuA4 histone acetyltransferase complex
    GO:0032797 SMN complex
    GO:0035339 SPOTS complex (I don’t know what this is)
    GO:0097361 CIA complex

     
  • Valerie Wood

    Valerie Wood - 2015-03-26

    Re 1) The duplicate codes don't bother me; already wrote a script to clean them.

    This should now be fixed in our next release

    Re 3) Cardinality

    What we have is along these lines
    heteromeric(2) Thakurta AG et al. (2004)
    So we do not say explicitly which subunits these apply to, but the info is there, and at least we know which publication it is in. We will extend this so we also know which complex it applies to if there are multiple complexes. A small oversight!
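
As a rough sketch, a string in this shape could be split into its parts as follows. The pattern is inferred from the single example above (`heteromeric(2) Thakurta AG et al. (2004)`), so the field names `kind`, `count`, and `reference` and the function `parse_cardinality` are assumptions, and other values in the real export may not fit.

```python
import re

# kind(count) followed by a free-text reference
CARDINALITY = re.compile(r"^(?P<kind>\w+)\((?P<count>\d+)\)\s+(?P<reference>.+)$")


def parse_cardinality(text: str):
    """Return (kind, count, reference), or None if the text does not match."""
    match = CARDINALITY.match(text.strip())
    if match is None:
        return None
    return match["kind"], int(match["count"]), match["reference"]


print(parse_cardinality("heteromeric(2) Thakurta AG et al. (2004)"))
```

Returning None for unmatched text lets a caller log unexpected values instead of failing on them.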

    Re 5) We will arrange for the modification data to be exported, this has been on the to do list for a while. It will be in this format:
    http://www.pombase.org/submit-data/modification-bulk-upload-file-format

    Give us a while to tidy the IEAs and we will send you a new version of the file.

    Thanks for your speed and attention, as always!

    Val

     
  • Darren Natale

    Darren Natale - 2018-03-22
    • status: open --> pending
    • assigned_to: Darren Natale
    • Group: -->
     
  • Darren Natale

    Darren Natale - 2018-03-22

    Going through old requests and closing those that were finished long ago or marking as "pending" those that await input from the requester. If your request is marked Pending, please advise as to whether the request has been satisfactorily addressed or is no longer needed.

     
  • Midori Harris

    Midori Harris - 2018-03-23

    I think we would still like to have the requested terms eventually, but it isn't urgent for us. "Pending" status fits our situation just fine.

     
