Hi
Following the UK jamboree annotations, I would like to request that the following terms be created:
GO:0032993 protein-DNA complex
> NTR: protein-regulatory region DNA complex
> NTR: protein-promoter complex (or protein-DNA complex at the promoter, or promoter specific protein-DNA complex)
> GO:0034206 enhanceosome
with additional synonym: protein-enhancer complex
I am not convinced that we can bring GO:0005667 transcription factor complex into this ontology, as the definition suggests the complex exists without binding to DNA.
Thanks
Ruth
Ruth,
I'd like to have some input from others, on names, definitions, and utility of the proposed terms, before we proceed.
Karen,
I'm assigning this to you mainly to get the sort of insight I'm after. Maybe also ask Eurie, since "regulatory region" could refer to regulating processes other than transcription? If you think any terms should be added, and want me to do the edits, let me know.
thanks,
m
OK Midori, I'll take a look
-Karen
Regarding non-transcription based regulatory DNA regions, there are regulatory regions flanking the mating type loci that are proposed to regulate recombination at the mating locus. There are also recombination hotspots that can be considered regulatory regions. So 'protein-regulatory region DNA complex' could be used for broader things - is this your intent?
I also have a bunch of questions while looking at the structure.
For the 'protein-promoter complex' term, are you suggesting that it be a child of the 'protein-regulatory region' term? And how do you expect general transcription factors, such as TFIID, TFIIH, RNA polymerase complexes, etc. to fit into this structure? I just ask because I see that the different DNA polymerase complexes are in the same branch.
Also, how would this correspond to the equivalent MF terms? There are 'promoter binding' (GO:0010843) and 'recombination hotspot binding' (GO:0010844) terms, just to name a few. Are these concepts better captured as MF terms instead of CC terms?
Hi Eurie and Karen
I hadn't considered a protein-DNA complex involved in regulation of recombination, but I think it would be good to have the definition encompass this and also the appropriate child term.
At present the TFIID, TFIIH and RNA polymerase complexes are all described as protein complexes which 'act at a promoter' or 'associate with DNA'. These terms do not have any protein-DNA complex parents, and I haven't annotated enough genes here to be confident that these complexes always form in association with DNA. If these complexes are always bound to DNA, and do not pre-form and then attach to the DNA, then they could be included as children of these new component terms.
These new terms address the experimental evidence that we come across. For example, a ChIP experiment may show a protein in a complex with DNA (and more specifically a promoter), but this does not mean it binds to DNA directly, so the function term 'promoter binding' cannot be used. However, I agree that the function term 'promoter binding' could in theory be a child of the component term protein-promoter complex, assuming that only proteins would be annotated as 'promoter binding' and that this cross-ontology relationship was allowed.
Ruth
Hi,
We annotate a lot of ChIP experiments for S. cerevisiae. Based on what
Eurie and I have seen, we don't think that we would want to define
complexes based on ChIP results. While it is true that protein X that
was studied may be localized to the promoter (or the region of
interest), I don't think that you can conclude that protein X is in a
complex with the promoter DNA. It, or the complex that it is part of,
may be recruited to the promoter, or other DNA, via protein-protein
interactions with some other complex that is already localized to the
DNA. When we come across these types of experiments, we generally
annotate to "nuclear chromatin" since all you can really say is that
protein X was found in association with a region of chromatin.
I also wanted to bring up the outcome of a previous discussion on
somewhat similar terms. I believe it was prior to the 2007 Jesus
College meeting that Karen Eilbeck and I had an extended discussion on the
idea of making terms for things like "promoter region", "5'-region",
and similar to indicate specific regions of a transcription unit in a
way that is generic for the parts that occur in all/most transcription
units regardless of which gene it is. The outcome at that time was
that it would be better not to instantiate GO terms that duplicate SO
terms, but instead to use column 16 and a SO ID for the DNA region of
interest.
Thus, I think that if you feel the need to give more information about
which specific region of chromatin protein X is found at, I would
suggest using column 16 in conjunction with an annotation to "nuclear
chromatin", or "chromatin" if that is more appropriate for a specific
experiment.
-Karen
Hi Karen
I am pleased to hear that you have annotated a lot of ChIP experiments. Could you provide an example of a paper you have annotated, so that I can see what annotations you do get from these experiments?
I am attaching a review PDF which I think refutes the idea that these experiments do not provide evidence that protein X is in a complex with promoter DNA. There are a great many experiments being undertaken to address the probable binding sites of transcription factors; do you consider these all worthless? I agree that I would not want to annotate every protein identified by ChIP analysis to the promoter, and would expect appropriate controls and protein characteristics to inform this annotation decision. A well known transcription factor or histone deacetylase identified by ChIP analysis as located in the promoter region is likely to be located there, isn't it?
Which proteins are within a 'complex', rather than binding to a complex, is determined by the definition of the complex. In Figure 1 of the attached article there is a protein-DNA complex which includes a transcription factor complex and a histone acetylase complex. This does not mean that we can only annotate to one or other of these complexes; in theory we could annotate to a new term 'transcription factor with histone acetylase complex'. The terms I have suggested will enable the information being generated by methods such as ChIP analysis to be captured by GO through curator judgement and interpretation of author intent.
With respect to your comment about using SO and column 16, I think this highlights an area of GO which really has not been discussed fully by curators, annotators and users! I agree with the use of column 16 to improve the specificity of annotations; however, I would suggest that more specific terms than 'nuclear chromatin' should be provided by GO to improve the effectiveness of GO.
Taking this logic backwards, would we then say all enzymes have 'enzyme activity' and then just add the EC number to column 16?
I guess I just feel that I am on the outside of a discussion about improving the specificity of GO by reducing the specificity of the actual GO terms and increasing the specificity of the ontologies included in column 16, without feeling like I have any idea how specific or general GO terms will be. Will enhanceosome be deleted and replaced with nuclear chromatin plus SO:0000165 enhancer?
Ruth
Figure 1 from PMID: 19668247
Sorry, the full paper was too big to upload, so I have uploaded Figure 1 from PMID: 19668247.
Ruth
Hi Ruth,
To me, this review is consistent with the ChIP papers I have read
previously. Thus, I still do not feel that these types of ChIP
experiments from genomic DNA in vivo allow you to draw any conclusions
about what is in a complex with what. They are very indirect, and
generally rely on chemical cross-linking to bring proteins down with
the DNA. Figure 1 appears to be a model. While the results of many
ChIP experiments are consistent with this type of model, and ChIP
experiments are used to computationally derive likely consensus
sequences of known, or presumed, DNA binding transcription factors, in
vivo genomic ChIP experiments do not provide evidence that the
putative DNA binding factor bound the DNA directly.
At SGD, we feel that these experiments allow you to conclude that
protein x was found localized to a particular region of "chromatin".
In cerevisiae, we feel comfortable using the more specific "nuclear
chromatin". That might be fine for transcription factors in mammalian
cells also, assuming transcription does not occur during cellular
division when the nuclear envelope breaks down, in contrast to
cerevisiae where this nuclear envelope breakdown does not occur even
during cell division.
The suggestion to use SO IDs in column 16 was discussed at the January
2007 GO meeting. Clearly this was before column 16 was actually
implemented, but equally clearly we agreed not to create terms for
things which were identical to SO terms, e.g. "promoter".
For an example of annotations at SGD from ChIP, see the GO annotations
page for FHL1.
http://www.yeastgenome.org/cgi-bin/GO/goAnnotation.pl?dbid=S000006308
Two papers are cited; if I remember correctly, I added the one from
Martin et al. 2005 and reviewed and confirmed the one from Rudra et
al. 2005. The Martin et al. paper was also used to make an annotation
to this GO term for the CRF1 gene.
You can also look at the GO term page for "nuclear chromatin". There
are a number of papers with IDA evidence used to make annotations to
this term. Some of these may also be based on ChIP experiments.
-Karen
Hi Ruth,
Looks like the consensus is not to add these terms, therefore we're closing this item.
Feel free to comment if you feel strongly about re-opening it.
Thanks,
J, B, P