From: Helen P. <par...@eb...> - 2007-01-22 13:15:03
Dear all,

Here are the collated comments as promised. These are mostly minor, except for 4 and 5. No-one has objected to the suggestion in 4, though 3 people have expressed a preference. Please see our comments in response to point 5.

I think the next step could be a phone call to discuss these. If we need this, I suggest Thursday 25th, 4pm GMT. Please could you indicate your availability.

cheers
Helen

1. Clarification of date format in response to Joe White.

YYYY-MM-DD with time optional is correct.

3. Suggestion to modify the format of the mapping file and/or provide some notes

"In the mapping file it might be helpful to have some description of the MAGEv1.1 items, i.e. class.association.attribute. In some cases we follow several associations. Unless you know MAGE fairly well, it might be difficult to understand what the mapped values refer to. In all cases, the value starts with a MAGE class and ends with some MAGE attribute. There will be 0 or more associations in between.

3) In the mapping file, the [...] tend to look like separate columns."

This can be modified if needed. We think the target audience is MAGE literate anyway, so it's a minor addition of some explanatory notes.

4. Suggestion from Tim to indicate a source database for protocol or array design accessions

One possible alteration which has come up is a means of indicating a source database for protocol or array design accessions, where such information is reused between experiments. I'd like to propose that we allow the Protocol REF and Array Design REF columns to refer to the IDF Term Source Name using either square brackets or parentheses, e.g.:

Protocol REF [ArrayExpress]
Array Design REF [GEO]

where ArrayExpress or GEO are explicitly listed in the IDF as Term Sources. I'd also suggest that in the absence of such tags it is assumed that the identifier is local to the context in which the SDRF is used, e.g. assuming ArrayExpress accessions for submissions to ArrayExpress.
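To make the proposal in point 4 concrete, here is a minimal Python sketch of how a reader of the SDRF might interpret such a column heading. It is only an illustration under stated assumptions, not part of the specification: the function name parse_ref_heading, its parameters, and the choice to reject a tag that is not listed in the IDF are all hypothetical.

```python
import re

# Hypothetical pattern: a "Protocol REF" or "Array Design REF" heading,
# optionally followed by a Term Source Name in [square brackets] or (parentheses).
_REF_HEADING = re.compile(
    r"^\s*(Protocol REF|Array Design REF)\s*"
    r"(?:\[\s*([^\]]+?)\s*\]|\(\s*([^)]+?)\s*\))?\s*$"
)


def parse_ref_heading(heading, idf_term_sources, default_source=None):
    """Split an SDRF column heading into (column name, source database).

    idf_term_sources: Term Source Names listed in the IDF.
    default_source:   assumed source when no tag is given, i.e. the local
                      context (e.g. "ArrayExpress" for ArrayExpress submissions).
    """
    match = _REF_HEADING.match(heading)
    if match is None:
        raise ValueError(f"not a recognised REF heading: {heading!r}")
    column, bracketed, parenthesised = match.groups()
    source = bracketed or parenthesised
    if source is None:
        # No tag: the identifier is taken to be local to the current context.
        return column, default_source
    if source not in idf_term_sources:
        # Assumption of this sketch: a tag must be declared in the IDF.
        raise ValueError(f"{source!r} is not listed as a Term Source in the IDF")
    return column, source


if __name__ == "__main__":
    sources = {"ArrayExpress", "GEO"}
    print(parse_ref_heading("Protocol REF [ArrayExpress]", sources))
    print(parse_ref_heading("Array Design REF (GEO)", sources))
    print(parse_ref_heading("Protocol REF", sources, default_source="ArrayExpress"))
```

The untagged case returns whatever default the caller supplies, which mirrors the suggestion that identifiers without a tag are local to the context in which the SDRF is used.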
Note that there is scope for using the Protocol REF:namespace syntax to add an external namespace to identifiers in the SDRF, but that doesn't really work for accessions which don't have namespaces (for good or ill).

OR to allow Protocol REF and Array Design REF to be associated with Term Source REF columns. It's more flexible and only a minor addition to the specification. Michael prefers this option, so do Helen and Tim.

5. Set of comments from Michael, my comments inline

the additional set of fields for the IDF are to specify a set of files that carry additional annotation information on the Material fields of the SDRF. the use case is perhaps an additional MAGE-ML file whose BioMaterial identifier matches up to the identifier of one of the source, sample or extract names (including the specified or default <authority field) and simply contains <OntologyEntry elements with no reference elements (those are in the SDRF file). the other example type of file might be a CDISC SEND formatted file.

i would propose that the IDF be able to include along with the SDRF file, an 'Annotation File' row and an 'Annotation File Type' ("MAGE-ML" or "CDISC-SEND Clinical Pathology") row which could have multiple entries.

-------------------------------------------------------------------------

**This is a major extension of the core proposal. Tim and Helen have reservations:

5.1. About modifying the core proposal at this point - we are on a tight deadline for our EBI services review and the discussion required might compromise our implementation being ready on time.

5.2. Mixing and matching MAGE and/or other formats - MAGE is not human readable and should not be mixed and matched with MAGE-TAB in our view. Either it's MAGE-TAB or MAGE-ML, not a mix. Anyone's local implementation is of course up to them, but this is a representation format, not an implementation. One could use a Comment[CDISC file] for this in the IDF, for example, if support is needed right away.

5.3.
CDISC is an interesting case. This should be investigated, and maybe a MAGE-TAB 1.1 could reference such a format. There will probably be other such interesting cases. We (AE) don't want to commit to supporting such formats at this point without a group discussion, and some examples should be carefully examined. We are not happy to add this to the spec, especially as it's already published with no mention of this. Is there an available parser API? It would be good to initiate a discussion with CDISC as well. So we're not ruling this out, but we would prefer not to do it for this version. In fact it might be better discussed as MAGE2 and MAGE2's TAB representation, where we might consider such extensions.

6. Michael's general editing comments, all OK in principle.

===============
Section 1.2 (ADF)

If the investigation uses arrays for which a description has been previously provided, cross-references to entries in a public repository (e.g., an ArrayExpress accession number) can be included instead of explicit array descriptions.

becomes:

If the investigation uses arrays for which a description has been previously provided, cross-references to entries in a public repository (e.g., an ArrayExpress accession number), such as a standard commercial array, can be included instead of explicit array descriptions.

===
paragraph beginning with "The main weight..."

in the e.g. it looks like 'row' should be 'raw'

===
Section 1.2 ('The degree of nodes')

One example has the source nodes having 10 outgoing nodes, so it and reference nodes both might have a large number plus the usual max outside of source and reference nodes is probably more like 4 than 3.

===
Many of the figures (1,4,7,20.b,22,etc) don't have all the rows and columns with clear separator lines.
===
2.3.6

the example is confusing to me, it is the variation in ChIP-chip which probably is better as one diagram to show the gap, i think a better example is when there are a lot of annotation columns where breaking it up clearly on a sample or extract as the last column and beginning with that same column in the second file might be less confusing.

===
2.3.7

last sentence says "Alternatively...", shouldn't that be "In addition..."?

===
2.4 1st para

2nd sentence says "abundance", wouldn't "presence" be better?

===
2.3.5 and Notes on Table 7

"gaps (or the - symbol)" might be clearer "gaps (or the - symbol) separated by tabs"

===
2.4 3rd para

2nd sentence says 'Composite Elements and Reporters' and figure in 2.5 has column Composite Element Name before Map2Reporter. stylistically (and for clarity) it might be more consistent to always have a Reporter mention before a Composite Element mention (sorry, my english master degree speaking out)

===
3.1, 5th bullet

if annotation files are added, mention annotation files here in addition

===
new section 3.1.3 added to mention annotation files

===
Figure 1 and 24, if annotation files added, adding to figures and example file

===
3.1.5

add at end that "this allows specifying <authority in these cases". some of the earlier sections in 3.1 might do to mention how different <authority modifiers to the <name field come in.

===
3.2.3

end of first sentence add "and one or more ArrayDesigns"

===
3.3.1 3rd para, 5th sentence(?)

"umber" should be "number"

===
3.3.2 para after figure 26

it is also possible in distinguishing type that when there are two different types at the same level, to resolve this just means moving the node representation to a higher level where there is already a matching type.

===
table 7

this is a bit confusing, might be better to have a table of the top, non-modifying columns, then the set of columns that modify the top level columns, then the set of columns that modify that set and so on.
--
Helen Parkinson, PhD
Curation Coordinator
Microarray Informatics Team, EBI
EBI 01223 494672
Skype: helen.parkinson.ebi