From: Helen P. <par...@eb...> - 2007-01-22 13:15:03
|
Dear all,

here are the collated comments as promised. These are mostly minor, excepting 4 and 5. No-one has objected to the suggestion in 4, though 3 people have expressed a preference; please see our comments in response to point 5. I think the next step could be a phone call to discuss these. If we need this, I suggest Thursday 25th, 4pm GMT. Please could you indicate your availability.

cheers

Helen

1. Clarification of date format in response to Joe White. YYYY-MM-DD with time optional is correct.

3. Suggestion to modify the format of the mapping file and/or provide some notes

" In the mapping file it might be helpful to have some description of the MAGEv1.1 items, i.e. class.association.attribute. In some cases we follow several associations. Unless you know MAGE fairly well, it might be difficult to understand what the mapped values refer to. In all cases, the value starts with a MAGE class, and ends with some MAGE attribute. There will be 0 or more associations in between. 3) In the mapping file, the [...] tend to look like separate columns. "

This can be modified if needed. We think the target audience are MAGE literate anyway, so it's a minor addition of some explanatory notes.

4. Suggestion from Tim to indicate a source database for protocol or array design accessions

One possible alteration which has come up is a means of indicating a source database for protocol or array design accessions, where such information is reused between experiments. I'd like to propose that we allow the Protocol REF and Array Design REF columns to refer to the IDF Term Source Name using either square brackets or parentheses, e.g.:

  Protocol REF [ArrayExpress]

  Array Design REF [GEO]

where ArrayExpress or GEO are explicitly listed in the IDF as Term Sources. I'd also suggest that in the absence of such tags it is assumed that the identifier is local to the context in which the SDRF is used, e.g. assuming ArrayExpress accessions for submissions to ArrayExpress.

Note that there is scope for using the Protocol REF:namespace syntax to add an external namespace to identifiers in the SDRF, but that doesn't really work for accessions which don't have namespaces (for good or ill).

OR

to allow Protocol REF and Array Design REF to be associated with Term Source REF columns. It's more flexible and only a minor addition to the specification.

Michael prefers this option, so do Helen and Tim

5. Set of comments from Michael, my comments inline

 the additional set of fields for the IDF are to specify a set of files
 that carry additional annotation information on the Material fields of
 the SDRF. the use case is perhaps an additional MAGE-ML file whose
 BioMaterial identifier matches up to the identifier of one of the
 source, sample or extract names (including the specified or default
 <authority field) and simply contains <OntologyEntry elements with no
 reference elements (those are in the SDRF file). the other example type
 of file might be a CDISC SEND formatted file.

 i would propose that the IDF be able to include along with the SDRF
 file, an 'Annotation File' row and an 'Annotation File Type' ("MAGE-ML"
 or "CDISC-SEND Clinical Pathology") row which could have multiple
 entries.

-------------------------------------------------------------------------
**This is a major extension of the core proposal. Tim and Helen have reservations:

5.1. About modifying the core proposal at this point - we are on a tight deadline for our EBI services review, and the discussion required might compromise our implementation being ready on time.

5.2. Mixing and matching MAGE and/or other formats - MAGE is not human readable and should not be mixed and matched with MAGE-TAB in our view. Either it's MAGE-TAB or MAGE-ML, not a mix. Anyone's local implementation is of course up to them, but this is a representation format, not an implementation. One could use a Comment[CDISC file] for this in the IDF, for example, if support is needed right away.

5.3. CDISC is an interesting case; this should be investigated, and maybe a MAGE-TAB 1.1 could reference such a format. There will probably be other such interesting cases. We (AE) don't want to commit to supporting such formats at this point without a group discussion, and some examples should be carefully examined. We are not happy to add this to the spec, especially as it's already published with no mention of this. Is there an available parser API? It would be good to initiate a discussion with CDISC as well. So we're not ruling this out, but we would prefer not for this version. In fact it might be better discussed as MAGE2 and MAGE2's TAB representation, where we might consider such extensions.

6. Michael's general editing comments, all OK in principle.

===============
Section 1.2 (ADF)
 If the investigation uses arrays for which a description has
 been previously provided, cross-references to entries in a public
 repository (e.g., an ArrayExpress
 accession number) can be included instead of explicit array
 descriptions.

becomes:

 If the investigation uses arrays for which a description has
 been previously provided, cross-references to entries in a public
 repository (e.g., an ArrayExpress
 accession number), such as a standard commercial array, can be included
 instead of explicit array descriptions.
===
paragraph beginning with "The main weight..."
in the e.g. it looks like 'row' should be 'raw'
===
Section 1.2 ('The degree of nodes')
One example has the source nodes having 10 outgoing nodes, so it and reference nodes both might have a large number, plus the usual max outside of source and reference nodes is probably more like 4 than 3.
===
Many of the figures (1,4,7,20.b,22,etc) don't have all the rows and columns with clear separator lines.
===
2.3.6
the example is confusing to me, it is the variation in ChIP-chip which probably is better as one diagram to show the gap. i think a better example is when there are a lot of annotation columns, where breaking it up clearly on a sample or extract as the last column and beginning with that same column in the second file might be less confusing.
===
2.3.7
last sentence says "Alternatively...", shouldn't that be "In addition..."?
===
2.4
1st para 2nd sentence says "abundance", wouldn't "presence" be better?
===
2.3.5 and Notes on Table 7
"gaps (or the - symbol)"
might be clearer
"gaps (or the - symbol) separated by tabs"
===
2.4
3rd para 2nd sentence says 'Composite Elements and Reporters' and figure in 2.5 has column Composite Element Name before Map2Reporter.

stylistically (and for clarity) it might be more consistent to always have a Reporter mention before a Composite Element mention (sorry, my english master's degree speaking out)
===
3.1, 5th bullet
if annotation files are added, mention annotation files here in addition
===
new section 3.1.3 added to mention annotation files
===
Figure 1 and 24,
if annotation files added, adding to figures and example file
===
3.1.5
add at end that "this allows specifying <authority in these cases".
some of the earlier sections in 3.1 might do to mention how different <authority modifiers to the <name field come in.
===
3.2.3
end of first sentence add "and one or more ArrayDesigns"
===
3.3.1
3rd para, 5th sentence(?) "umber" should be "number"
===
3.3.2
para after figure 26, it is also possible in distinguishing type that when there are two different types at the same level, to resolve this just means moving the node representation to a higher level where there is already a matching type.
===
table 7
this is a bit confusing, might be better to have a table of the top, non-modifying columns, then the set of columns that modify the top level columns, then the set of columns that modify that set and so on.

--
Helen Parkinson, PhD
Curation Coordinator
Microarray Informatics Team, EBI
EBI 01223 494672
Skype: helen.parkinson.ebi
|
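To make the first option in item 4 concrete, here is a minimal Python sketch of how a reader of an SDRF might interpret a tagged column header such as "Protocol REF [ArrayExpress]". It is an illustration under stated assumptions, not part of the proposal or of any existing parser: the function name, the regular expression, and the choice to reject tags that are not listed in the IDF are all hypothetical.

```python
import re

# Hypothetical pattern: one of the two REF column names, optionally followed
# by a Term Source Name in square brackets or parentheses (item 4, option 1).
HEADER_RE = re.compile(
    r"^\s*(?P<column>Protocol REF|Array Design REF)\s*"
    r"(?:\[(?P<square>[^\]]+)\]|\((?P<paren>[^)]+)\))?\s*$"
)


def parse_ref_header(header, idf_term_sources, local_source):
    """Return (column name, term source) for a REF column header.

    idf_term_sources: Term Source Names declared in the IDF.
    local_source: the source assumed when no tag is present, i.e. the
        context in which the SDRF is used (an assumption of this sketch).
    """
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a Protocol REF / Array Design REF header: {header!r}")

    tag = match.group("square") or match.group("paren")
    if tag is None:
        # No tag: the identifier is taken to be local to the current context.
        return match.group("column"), local_source

    tag = tag.strip()
    if tag not in idf_term_sources:
        # Assumed policy: a tag must be explicitly listed in the IDF.
        raise ValueError(f"term source {tag!r} is not listed in the IDF")
    return match.group("column"), tag


if __name__ == "__main__":
    sources = {"ArrayExpress", "GEO"}
    print(parse_ref_header("Protocol REF [ArrayExpress]", sources, "ArrayExpress"))
    print(parse_ref_header("Array Design REF (GEO)", sources, "ArrayExpress"))
    print(parse_ref_header("Protocol REF", sources, "ArrayExpress"))
```

An untagged header falls back to the local context, which mirrors the suggestion that ArrayExpress accessions be assumed for submissions to ArrayExpress.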
From: Joe W. <jw...@ji...> - 2007-01-22 15:09:49
|
Hi Helen,

Regarding item 4, I thought the Protocol REF elements actually DID refer to the IDF. So using that option makes sense to me. But I also agree with Tim's idea of allowing a Term Source column in the SDRF as an alternative--that's what we did with other ontology terms. For the AD, we need the Term Source column, since the AD isn't listed in the IDF. So I prefer the same option that you, Michael, and Tim do; however, the default should be that Protocol REF is listed in IDF and the default Term Source is ArrayExpress--since that's where these sheets are going anyway. Alternatively, the default Term Source could be listed in the IDF, if AE is not the destination repository.

Cheers,
Joe
|
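The Term Source REF option, together with the defaults Joe suggests, could be sketched in Python roughly as follows. Everything here is an assumption made for illustration: the function name, the rule that a Term Source REF column qualifies the REF column immediately to its left, and the placeholder accessions are hypothetical and not taken from the specification.

```python
def resolve_term_sources(headers, row, idf_default=None, destination="ArrayExpress"):
    """Pair each Protocol REF / Array Design REF value with a term source.

    Assumed fallback order for this sketch:
      1. a non-empty Term Source REF cell directly to the right of the column,
      2. a default Term Source listed in the IDF (idf_default),
      3. the destination repository.
    """
    resolved = []
    for i, name in enumerate(headers):
        if name not in ("Protocol REF", "Array Design REF"):
            continue
        source = ""
        # Assumption: a Term Source REF column qualifies the column before it.
        if i + 1 < len(headers) and headers[i + 1] == "Term Source REF":
            source = row[i + 1].strip()
        if not source:
            source = idf_default or destination
        resolved.append((name, row[i], source))
    return resolved


if __name__ == "__main__":
    # Placeholder accessions, not real identifiers.
    headers = ["Source Name", "Protocol REF", "Array Design REF", "Term Source REF"]
    row = ["sample 1", "PROTOCOL-1", "ARRAY-1", "GEO"]
    for column, accession, source in resolve_term_sources(headers, row):
        print(f"{column}: {accession} (from {source})")
```

In this example the array design carries an explicit source, while the protocol has no Term Source REF column of its own and so falls back to the destination repository.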
From: Helen P. <par...@eb...> - 2007-01-22 15:13:45
|
Hi

In the interests of not being ArrayExpress centric, I'd be interested to see what those who plan to consume/provide these sheets from multiple sources think. Junmin, UPenn people, do you have an opinion?

cheers

Helen

--
Helen Parkinson, PhD
Curation Coordinator
Microarray Informatics Team, EBI
EBI 01223 494672
Skype: helen.parkinson.ebi
|
From: Junmin L. <ju...@pc...> - 2007-01-25 19:59:07
|
Hi, Helen and others

Sorry for the late comments.

In terms of the Protocol REF, I agree with Joe that the protocol is listed in the IDF, and we can assume its default term source by destination or source repository.

But I have a question on the term source for Array Design REF. For example, if ArrayExpress gets a MAGE-TAB that refers to an array design in GEO, you can't load it until AE loads this array design into AE locally first, can you? So you have to convert it to AE's array design id, right?

---junmin
>>> === >>> 3.2.3 >>> end of first sentence add "and one or more ArrayDesigns" >>> === >>> 3.3.1 >>> 3rd para, 5th sentence(?) "umber" should be "number" >>> === >>> 3.3.2 >>> para after figure 26, it is also possible in distinguishing type that >>> when there are two different types at the same level, to resolve this >>> just means moving the node representation to a higher level where there >>> is already a matching type. >>> === >>> table 7 >>> this is a bit confusing, might be better to have a table of the top, >>> non-modifying columns, then the set of columns that modify the top level >>> columns, then the set of columns that modify that set and so on. >>> >>> >>> >>> >> >> >> ------------------------------------------------------------------------- >> Take Surveys. Earn Cash. Influence the Future of IT >> Join SourceForge.net's Techsay panel and you'll get the chance to share your >> opinions on IT & business topics through brief surveys - and earn cash >> http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV >> _______________________________________________ >> Mged-MAGE2 mailing list >> Mge...@li... >> https://lists.sourceforge.net/lists/listinfo/mged-mage2 >> > > -- > Helen Parkinson, PhD > Curation Coordinator > Microarray Informatics Team, > EBI > > EBI 01223 494672 > Skype: helen.parkinson.ebi > > > ------------------------------------------------------------------------- > Take Surveys. Earn Cash. Influence the Future of IT > Join SourceForge.net's Techsay panel and you'll get the chance to share your > opinions on IT & business topics through brief surveys - and earn cash > http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV > _______________________________________________ > Mged-MAGE2 mailing list > Mge...@li... > https://lists.sourceforge.net/lists/listinfo/mged-mage2 > |
From: Tim R. <ra...@eb...> - 2007-01-31 13:46:17
|
Hi,

We've discussed the various comments on the previous version of the MAGE-TAB spec (Jan 8), and it appears that the consensus from ArrayExpress is as follows:

1. Since we need to get a version 1.0 specification finalised so that implementation deadlines are met, we feel that full support for external annotation files should be deferred to version 1.1. This will allow for more complete discussion of the requirements, in particular in light of any considerations from the FuGE crowd. In the meantime, a minor note has been added to section 3.1.1 suggesting the use of a Comment[] tag to support these.

2. Regarding adding a mandatory .idf extension for the IDF file, I think this is not necessary, since it's pretty easy to design a submission system which would track the IDF without an extension (e.g., the current Tab2MAGE submissions system already does this, in effect). Additionally, a new file extension would have to be mapped to Excel or OpenOffice by the end user for it to be any use to them (I believe this is true of both Windows and Mac). This may not be a huge deal, but it's another barrier to the casual user. Such mapping does not always guarantee that a document opens in the desired application either (OpenOffice, I'm looking at you...).

3. Use of quotes in IDF and SDRF: while maybe they could be dispensed with in SDRF, they will be needed in IDF to allow e.g. newlines in protocol text (and believe me, users will want this!). The original Tab2MAGE implementation didn't allow fields to be quoted like this to preserve special characters (i.e., newlines and tabs), and it was awful. An additional advantage of using quotes is that "text" date fields such as "2007-01-31" will be preserved by spreadsheet software, while 2007-01-31 is often corrupted by such "helpful" applications. I have added a new section (3.1.6) which briefly discusses use of quotes to escape data fields. On the bright side, this format tends to be the default for "tab-delimited" export from Excel and OpenOffice in any case.

4. We agreed that the authority:namespace component of the resulting MAGE identifiers would be left to the implementation of any parser; while this precludes sharing e.g. Sources defined in MAGE-TAB documents, this is a relatively minor use case. Again, this could be revisited for a 1.1 specification.

5. Regarding Junmin's comment about array design accessions, it is true that submissions to us will be using ArrayExpress accessions exclusively (at least for the foreseeable future), but this is not necessarily true of other users, e.g. those downloading and distributing MAGE-TAB documents from us.

6. As discussed, the specification does indeed allow SDRFs to be split (on any "Name" column) into as many sub-SDRF documents as necessary.

I've made the other modifications suggested to the list and put up a new specification document (Jan 31) here:

http://www.ebi.ac.uk/systems-srv/mp/file-exchange/MAGE-TABv1.0.tar.gz

Unless there are serious arguments to the contrary, I personally will be treating this as a finalised version 1.0 specification. Thanks very much for all your comments,

Tim

--
Tim Rayner, Ph.D.
Scientific Database Curator
Microarray Informatics Team
European Bioinformatics Institute
|
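To illustrate the point about quoted fields, here is a minimal sketch of reading a tab-delimited fragment whose fields are wrapped in double quotes, using Python's standard csv module. The row labels and values are made up for illustration, and nothing here is taken from the specification text itself; it only shows that a quoted field can carry a newline and that a quoted date comes through as plain text.

```python
import csv
import io

# Hypothetical IDF-style fragment (illustrative values only): the protocol
# description contains a newline, and the date is quoted so it stays text.
idf_text = (
    '"Protocol Name"\t"P-1"\n'
    '"Protocol Description"\t"Grow cells.\nExtract RNA."\n'
    '"Public Release Date"\t"2007-01-31"\n'
)

def read_rows(text):
    """Parse tab-delimited text in which fields may be double-quoted."""
    # newline="" leaves line endings alone so the csv module can handle
    # newlines that occur inside quoted fields.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quotechar='"')
    return [row for row in reader]

rows = read_rows(idf_text)
for row in rows:
    print(row)
```

The embedded newline stays inside the second field of the description row instead of starting a new record, which is the behaviour a parser that simply splits on tabs and line breaks cannot provide.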
From: Miller, M. D (Rosetta) <Michael_Miller@Rosettabio.com> - 2007-01-31 17:46:21
|
hi tim,

thanks again for all your great work on this. my only real problem point is 4). 2) can be lived with as long as it is a recommended extension, and 3) will just add complexity to the parsing of the documents.

> 1. ... full support for external annotation files should be deferred to
> version 1.1. ...

this sounds fine.

> 2. Regarding adding a mandatory .idf extension for the IDF file, I think
> this is not necessary, since it's pretty easy to design a submission
> system which would track the IDF without an extension

this is not true of automated pipeline systems, which are a huge use case (as much or more gene expression data and annotation are loaded via these pipelines). in windows, anyway, it is very easy to set this up to open automatically in excel or whatever spreadsheet program is desired.

i would be happy with a recommendation that the file extension be 'idf'. if this could be added to the 1st paragraph of section 3.1.1, that would be great.

> 3. Use of quotes in IDF and SDRF; while maybe they could be dispensed with
> in SDRF, they will be needed in IDF to allow e.g. newlines in protocol
> text (and believe me, users will want this!).

just as the addition of information on additional annotation files in the idf was considered a late addition without time for comment, this is a rather late addition; it was never part of the specification proper. quotes can not be made optional, they either must be mandatory or not.

i would like to see a handful of fields (like protocol descriptions) where they are required and the rest where they must not be used, but i can live with them being mandatory in the IDF and not used in the SDRF. there also must be a provision for escaping quotes within the quoted field; believe me, they will definitely occur (the escape can be as simple as '\"'). less likely, but possible, a quote will be the first actual character of the field, which is why quoting must not be optional for fields.

> 4. We agreed that the authority:namespace component of the resulting MAGE
> identifiers would be left to the implementation of any parser; while this
> precludes sharing e.g. Sources defined in MAGE-TAB documents, this is a
> relatively minor use case. Again, this could be revisited for a 1.1
> specification.

this is not a relatively minor use case for many who would want to use MAGETAB--it may be for ArrayExpress right now but it is a common use case for investigations into how to organize a microarray experiment (MAQC!!!). the sharing of sources also occurs for a variety of other reasons. this, i thought, would be a minor addition to the IDF file to define the default naming authority. this is not a parser issue--a parser is developed in accordance with the specification. this would be very bad to leave until 1.1, it will cause valuable biological information to be lost.

> 5. Regarding Junmin's comment about array design accessions, ...
> 6. As discussed, the specification does indeed allow SDRFs to ...

fine.

cheers,
michael

Michael Miller
Lead Software Developer
Rosetta Biosoftware Business Unit
www.rosettabio.com
|
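The escaping requirement raised above can be sketched as follows. This is only an illustration under stated assumptions: every field is quoted, and an embedded double quote is written as \" as suggested in the message; the specification may settle on a different convention. The names DIALECT, encode_row and decode_row are hypothetical helpers, and the field values are made up.

```python
import csv
import io

# Assumed convention (taken from the suggestion in the thread, not from the
# specification): all fields are quoted and an embedded quote becomes \".
DIALECT = dict(
    delimiter="\t",
    quotechar='"',
    escapechar="\\",
    doublequote=False,
    quoting=csv.QUOTE_ALL,
)

def encode_row(fields):
    """Write one tab-delimited row with every field quoted."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n", **DIALECT).writerow(fields)
    return buffer.getvalue()

def decode_row(line):
    """Read back one row written by encode_row."""
    return next(csv.reader(io.StringIO(line, newline=""), **DIALECT))

# The second field starts with a quote character, the awkward case
# mentioned in the message.
fields = ["Protocol Description", '"Standard" wash, then scan']
line = encode_row(fields)
print(line, end="")
print(decode_row(line))
```

Because every field is quoted, a field whose first real character is a quote is unambiguous: the parser always treats the first quote as the delimiter and the escaped one as data.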
From: Miller, M. D (Rosetta) <Michael_Miller@Rosettabio.com> - 2007-01-22 21:38:27
|
hi all,

first, a question: why are the fields in the example sdrf and idf surrounded by quotes? since the file is tab-delimited, the quotes are not needed and only tend to get in the way.

in reference to point 5 i should explain my motivation a bit more. it is very clear that what MAGETAB specifically cannot do is provide full, rich annotation of BioMaterial. for many organizations that will be interchanging data, there will be a need to use more than one format to be able to fully exchange all the information relevant to a large investigation.

MAGETAB will be very useful for the minimum information about the gene expression experiment itself, but without the ability to get the full clinical data on the sources it would only be marginally useful. my thought was that these optional columns would help group together files that were meant to go together. if an application couldn't read them, that would be fine, since the MAGETAB would be the minimum they would need. but if one had a queue of MAGETAB documents, how would one group the other annotation files with them? since an import application wouldn't be required to read these other files, the ArrayExpress import would be free not to.

i mentioned MAGE-ML below because if all one was doing was exporting annotation information, the burden is nowhere near as large as exporting an entire experiment.

"MAGE is not human readable"

not particularly true, but some of the data files that will come as raw data certainly won't be human readable, so that's a moot point.

this emphasis on MAGETAB seems to be having the side effect of dumbing down the information shared, which i don't think we really want to see as a trend.

cheers,
michael
|
From: Miller, M. D (Rosetta) <Michael_Miller@Rosettabio.com> - 2007-01-23 22:48:16
|
hi all, one more issue i've run into: in section 3.1.3: "...and define <authority>:[<namespace>] for the whole file at once." where does this global <authority> get defined? since it is common to have the same sample used in separate experiments/investigations it would be good to be able to define this in terms of the organization that put together the MAGE-TAB file, but i don't see where that is done. cheers, michael > -----Original Message----- > From: mge...@li...=20 > [mailto:mge...@li...] On Behalf=20 > Of Helen Parkinson > Sent: Monday, January 22, 2007 5:15 AM > To: mged-mage2; 'MGED-mage' > Subject: [Mged-mage2] Collated comments on MAGETAB spec >=20 >=20 >=20 >=20 > Dear all, >=20 > here are the collated comments as promised. These are mostly minor=20 > excepting 4 and 5. No-one has objected to the suggestion in=20 > 4, though 3=20 > people have expressed a preference, please see our comments=20 > in response=20 > to point 5. I think the next step could be a phone call to discuss=20 > these, if we need this, I suggest Thursday 25th 4pm GMT, please could=20 > you indicate your availibility, >=20 > cheers >=20 > Helen >=20 >=20 >=20 > 1. Clarification of date format in response to Joe White. YYYY-MM-D=20 > with time optional is correct. >=20 >=20 > 3. Suggestion to modify the format of the mapping file/and=20 > or provide=20 > some notes >=20 > " In the mapping file it might be helpful to have some=20 > description of=20 > the MAGEv1.1 items, ie class.association.attribute. In some cases we=20 > follow several associations. Unless you know MAGE fairly=20 > well, it might=20 > be difficult to understand what the mapped values refer to. In all=20 > cases, the value starts with a MAGE class, and ends with some MAGE=20 > attribute. There will be 0 of more associations in between.=20 > 3) In the=20 > mapping file, the [...] tend to look like separate columns. " >=20 > This can be modified if needed. 
We think the target audience are MAGE=20 > literate anyway so it's a minor addition of some explanatory notes. >=20 > 4. Suggestion from Tim to indicate a source database for=20 > protocol or ad=20 > accessions >=20 > One possible alteration which has come up is a means of indicating a=20 > source database for protocol or array design accessions, where such=20 > information is reused between experiments. I'd like to=20 > propose that we=20 > allow the Protocol REF and Array Design REF columns to refer=20 > to the IDF=20 > Term Source Name using either square brackets or parentheses, e.g.: >=20 > Protocol REF [ArrayExpress] >=20 > Array Design REF [GEO] >=20 > where ArrayExpress or GEO are explicitly listed in the IDF as Term=20 > Sources. I'd also suggest that in the absence of such tags it=20 > is assumed=20 > that the identifier is local to the context in which the SDRF=20 > is used,=20 > e.g. assuming ArrayExpress accessions for submissions to ArrayExpress. >=20 > Note that there is scope for using the Protocol=20 > REF:namespace syntax to=20 > add an external namespace to identifiers in the SDRF, but=20 > that doesn't=20 > really work for accessions which don't have namespaces (for=20 > good or ill). >=20 >=20 > OR >=20 > to allow Protocol REF and Array Design REF to be associated=20 > with Term=20 > Source REF columns. It's more flexible and only a minor=20 > addition to the=20 > specification. >=20 > Michael prefers this option, so do Helen and Tim >=20 > 5. Set of comments from Michael, my comments in line >=20 > the additional set of fields for the IDF are to specify a=20 > set of files > that carry additional annotation information on the Material=20 > fields of > the SRDF. 
the use case is perhaps an additional MAGE-ML file whose > BioMaterial identifier matches up to the identifier of one of the > source, sample or extract names (including the specified or default > <authority field) and simply contains <OntologyEntry elements with no > reference elements (those are in the SRDF file). the other=20 > example type > of file might be a CDISC SEND formatted file. >=20 > i would propose that the IDF be able to include along with the SDRF > file, an 'Annotation File' row and an 'Annotation File Type'=20 > ("MAGE-ML" > or "CDISC-SEND Clinical Pathology") row which could have multiple > entries. >=20 > -------------------------------------------------------------- > ----------- > **This is a major extension of the core proposal. Tim and Helen have=20 > reservations: >=20 > 5.1. About modifying the core proposal at this point - we are=20 > on a tight=20 > deadline for our EBI services review and the discussion=20 > required might=20 > compromise our implementation being ready on time. >=20 > 5.2. Mix and matching MAGE and or other formats - MAGE is not human=20 > readable and should not be mixed and matched with MAGE-TAB in=20 > our view.=20 > Either it's MAGE-TAB or MAGE-ML not a mix. Anyone's local=20 > implemenatation is of course up to them, but this is a representation=20 > format not an implementation. One could use a Comment[CDISC file] for=20 > this in the IDF for example if support is needed right away. >=20 > 5.3. CDISC is an interesting case, this should be=20 > investigated and maybe=20 > a MAGE-TAB 1.1 could reference such a format. There will probably be=20 > other such interesting cases We (AE) don't want to commit to=20 > supporting=20 > such formats at this point without a group discusson and some=20 > examples=20 > should be carefully examined. We are not happy to add this to=20 > the spec,=20 > especially as it's already published with no mention of this.=20 > Is there=20 > an available parser API? 
> It would be good to initiate a discussion with CDISC as well. So we're
> not ruling this out, but we would prefer not for this version. In fact
> it might be better discussed as MAGE2 and MAGE2's TAB representation,
> where we might consider such extensions.
>
>
> 6. Michael's general editing comments, all OK in principle.
> ===============
> Section 1.2 (ADF)
> If the investigation uses arrays for which a description has
> been previously provided, cross-references to entries in a public
> repository (e.g., an ArrayExpress
> accession number) can be included instead of explicit array
> descriptions.
>
> becomes:
>
> If the investigation uses arrays for which a description has
> been previously provided, cross-references to entries in a public
> repository (e.g., an ArrayExpress
> accession number), such as a standard commercial array, can be included
> instead of explicit array descriptions.
> ===
> paragraph beginning with "The main weight..." in the e.g. it looks like
> 'row' should be 'raw'
> ===
> Section 1.2 ('The degree of nodes')
> One example has the source nodes having 10 outgoing nodes, so it and
> reference nodes both might have a large number plus the usual max
> outside of source and reference nodes is probably more like 4 than 3.
> ===
> Many of the figures (1,4,7,20.b,22,etc) don't have all the rows and
> columns with clear separator lines.
> ====
> 2.3.6
> the example is confusing to me, it is the variation in ChIP-chip which
> probably is better as one diagram to show the gap, i think a better
> example is when there are a lot of annotation columns where breaking it
> up clearly on a sample or extract as the last column and beginning with
> that same column in the second file might be less confusing.
> ===
> 2.3.7
> last sentence says "Alternatively...", shouldn't that be "In
> addition..."?
> ===
> 2.4
> 1st para 2nd sentence says "abundance", wouldn't "presence" be better?
> ===
> 2.3.5 and Notes on Table 7
> "gaps (or the - symbol)"
> might be clearer
> "gaps (or the - symbol) separated by tabs"
> ===
> 2.4
> 3rd para 2nd sentence says 'Composite Elements and Reporters' and figure
> in 2.5 has column Composite Element Name before Map2Reporter.
>
> stylistically (and for clarity) it might be more consistent to always
> have a Reporter mention before a Composite Element mention (sorry, my
> english master degree speaking out)
> ===
> 3.1, 5th bullet
> if annotation files are added, mention annotation files here in addition
> ===
> new section 3.1.3 added to mention annotation files
> ===
> Figure 1 and 24,
> if annotation files added, adding to figures and example file
> ===
> 3.1.5
> add at end that "this allows specifying <authority in these cases".
> some of the earlier sections in 3.1 might do to mention how different
> <authority modifiers to the <name field come in.
> ===
> 3.2.3
> end of first sentence add "and one or more ArrayDesigns"
> ===
> 3.3.1
> 3rd para, 5th sentence(?) "umber" should be "number"
> ===
> 3.3.2
> para after figure 26, it is also possible in distinguishing type that
> when there are two different types at the same level, to resolve this
> just means moving the node representation to a higher level where there
> is already a matching type.
> ===
> table 7
> this is a bit confusing, might be better to have a table of the top,
> non-modifying columns, then the set of columns that modify the top level
> columns, then the set of columns that modify that set and so on.
>
> --
> Helen Parkinson, PhD
> Curation Coordinator
> Microarray Informatics Team,
> EBI
>
> EBI 01223 494672
> Skype: helen.parkinson.ebi

_______________________________________________
Mged-MAGE2 mailing list
Mge...@li...
https://lists.sourceforge.net/lists/listinfo/mged-mage2
|
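To make the header syntax proposed in point 4 concrete, here is a minimal Python sketch of how a parser might read a heading such as "Protocol REF [ArrayExpress]". It is only an illustration under stated assumptions: the function name `parse_ref_heading`, the `default_source` parameter and the error handling are hypothetical and are not taken from the specification or from any existing MAGE-TAB tool.

```python
import re

# Hypothetical pattern: a column heading, optionally followed by a Term Source
# Name in square brackets or parentheses. For brevity it does not check that
# the opening and closing bracket are of the same kind.
HEADING = re.compile(
    r"^\s*(?P<column>[^\[\(]+?)\s*"
    r"(?:[\[\(]\s*(?P<source>[^\]\)]+?)\s*[\]\)])?\s*$"
)


def parse_ref_heading(heading, term_sources, default_source=None):
    """Split a heading into (column name, term source name).

    term_sources stands for the Term Source Names listed in the IDF.
    default_source models the suggestion that an untagged identifier is
    local to the context in which the SDRF is used.
    """
    match = HEADING.match(heading)
    if match is None:
        raise ValueError(f"unrecognised heading: {heading!r}")
    column = match.group("column")
    source = match.group("source")
    if source is None:
        return column, default_source
    if source not in term_sources:
        raise ValueError(f"{source!r} is not listed as a Term Source in the IDF")
    return column, source


idf_term_sources = {"ArrayExpress", "GEO"}
print(parse_ref_heading("Protocol REF [ArrayExpress]", idf_term_sources))
print(parse_ref_heading("Array Design REF (GEO)", idf_term_sources))
print(parse_ref_heading("Protocol REF", idf_term_sources, default_source="ArrayExpress"))
```

The alternative option (a separate Term Source REF column next to Protocol REF or Array Design REF) would not need any heading parsing at all, which may be part of why it was preferred.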
From: Philippe <ro...@eb...> - 2007-01-27 16:52:02
|
Hello all,

We just had a very productive meeting with folks from the FDA and reviewed the MAGE-TAB specifications. A use case put to us was how to represent the situation where one applies two different normalizations to overlapping subsets of raw data files (raw data files 1, 2 and 3 normalized using method A, producing derived data file dA, and raw data files 2, 3 and 4 normalized using method B, producing derived data file dB).

To report this case in MAGE-TAB, one would have to duplicate the set of columns related to the normalization event:

Source-Sample-Extract-LE-Hyb-Norm-Norm

This does not seem consistent with the basic mechanism of the SDRF, whereby columns added on the right qualify all the columns on the left.

The question is the following: in such a situation, should one create a second SDRF to report the second normalization event? More generally, will MAGE-TAB support cover submissions with several SDRFs, possibly corresponding to a split at every 'pivotal' element (Source, Sample, Extract, LE, Hyb...)?

Thanks

Philippe
|
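The use case above can be written down as a small set of edges, which makes the overlap explicit. This is only a sketch: the identifiers `raw1` to `raw4` and the function names are made up for illustration, and nothing here is part of MAGE-TAB itself.

```python
from collections import defaultdict

# Edges of the use case: (raw data file, normalization method, derived data file).
EDGES = [
    ("raw1", "A", "dA"), ("raw2", "A", "dA"), ("raw3", "A", "dA"),
    ("raw2", "B", "dB"), ("raw3", "B", "dB"), ("raw4", "B", "dB"),
]


def derived_files_by_raw(edges):
    """Map each raw data file to the set of derived files computed from it."""
    by_raw = defaultdict(set)
    for raw, _method, derived in edges:
        by_raw[raw].add(derived)
    return dict(by_raw)


def shared_raw_files(edges):
    """Raw data files that feed more than one derived file."""
    return sorted(
        raw for raw, derived in derived_files_by_raw(edges).items()
        if len(derived) > 1
    )


print(shared_raw_files(EDGES))
```

Files 2 and 3 are the ones that take part in both normalization events, which is where a single left-to-right chain of columns stops being sufficient.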
From: Philippe <ro...@eb...> - 2007-01-27 16:54:16
|
Hello again,

Another thing that probably needs to be changed or added: the investigation file should have a specific extension (idf), so that the parser handling a submission can identify it immediately.

Cheers

Philippe
|
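A parser could use such an extension roughly as follows. This is a minimal sketch, assuming the extension is written `.idf` and that a submission contains exactly one investigation file; the function name `find_idf` and the example file names are hypothetical.

```python
from pathlib import Path


def find_idf(filenames):
    """Return the single file whose extension is .idf (case-insensitive)."""
    candidates = [name for name in filenames if Path(name).suffix.lower() == ".idf"]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one .idf file, found {len(candidates)}")
    return candidates[0]


# Hypothetical submission contents.
print(find_idf(["experiment.idf", "samples.txt", "raw1.cel"]))
```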
From: Ugis S. <ug...@eb...> - 2007-01-29 09:18:40
|
Alvis Brazma wrote:

>Philippe et al,
>
>I think this example falls out of the 90% or so of typical cases for which
>MAGE-TAB has been really designed for. Have we got a single one like this
>in ArrayExpress? We have many where two or more different normalisations
>are applied to all raw files, but why would one want to do data treatment
>like you describe? (There may be a case of course, but how typical?).
>
>That said, as you point out yourself, in principle MAGE-TAB can describe
>this via two SDRF files. The way I have thought of the specification,
>there is nothing to prevent having two SDRFs.
>
>The very worst would be to forget the principle that MAGE-TAB should deal
>well with the typical cases and start complicating it because somebody can
>come up with a rare case which cannot be described easily. However, I
>think that this particular case can be described quite well via 2 SDRFs
>
>

I think Philippe's example is in fact just like what Alvis mentions, i.e. two or more normalisations are applied to all raw data files. In either case some raw data files are used more than once; from the graph representation point of view there is a split at the raw data column level. 2 SDRFs work well, and in general any proper MAGE-TAB parser should be able to deal with any number of SDRFs that define nodes and edges of the material/data manipulation graph.

Ugis

>Cheers,
>- Alvis
>
>
>On Sat, 27 Jan 2007, Philippe wrote:
>
>
>
>>Hello all,
>>
>>We just had a very productive meeting with folks from FDA and reviewed
>>MAGE-TAB specifications.
>>A use case put to us was how to represent the situation where one
>>applies 2 different normalizations on overlapping subsets of raw data files.
>>(raw datafiles 1,2,3 normalized using method A producing derived
>>datafile dA, and raw datafiles 2,3,4 normalized using method B
>>producing derived datafile dB)
>>to report this case in MAGE tab, one would have to duplicate a set of
>>columns related to normalization event.
>>Source-Sample-Extract-LE-Hyb-Norm-Norm
>>This seems not to be consistent with the basic mechanism of the sdrf
>>whereby columns added on the right qualify all the columns on the left.
>>The question here is the following: In such situation, is it the case of
>>creating a second SDRF to report the second normalization event ?
>>More generally, will MAGE-TAB support cover submissions with several
>>SDRFs, possibly corresponding splitting on every 'Pivotal' element
>>(Source, Sample, Extract, LE, Hyb...)
>>
>>Thanks
>>
>>Philippe
>>
>>
|
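Ugis's remark that a parser should accept any number of SDRFs defining nodes and edges of one graph can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a MAGE-TAB parser: it assumes every column is a node column (real SDRFs also carry attribute columns), the column headings and file contents are made up for the example, and the function names are hypothetical.

```python
import csv
import io


def sdrf_edges(text):
    """Read one tab-delimited table and return its edges.

    Simplifying assumption: each non-empty cell is a node identified by
    (column heading, value), and adjacent non-empty cells in a row are linked.
    """
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(rows)
    edges = set()
    for row in rows:
        nodes = [(column, value) for column, value in zip(header, row) if value]
        edges.update(zip(nodes, nodes[1:]))
    return edges


def merge_sdrfs(texts):
    """Union the edges of any number of tables into one graph."""
    edges = set()
    for text in texts:
        edges |= sdrf_edges(text)
    nodes = {node for edge in edges for node in edge}
    return nodes, edges


# Illustrative tables for the two normalization events discussed above.
SDRF_A = "Array Data File\tDerived Array Data File\nraw1\tdA\nraw2\tdA\nraw3\tdA\n"
SDRF_B = "Array Data File\tDerived Array Data File\nraw2\tdB\nraw3\tdB\nraw4\tdB\n"

nodes, edges = merge_sdrfs([SDRF_A, SDRF_B])
print(len(nodes), "nodes,", len(edges), "edges")
```

Because nodes are identified by heading and value, raw files 2 and 3 appear once in the merged graph with two outgoing edges each, which is the split at the raw data column level described in the message.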
From: Paul S. <PTS...@lb...> - 2007-01-29 19:39:47
|
We talked about allowing SDRFs to be split, i.e. starting a second set with new headers for the various steps (but still in the same document). I think this is essential and much better than making a second file, because there may be cases that require dozens of SDRFs, e.g. a paper comparing normalization methods.

paul

On Jan 29, 2007, at 1:18 AM, Ugis Sarkans wrote:

> Alvis Brazma wrote:
>
>> Philippe et al,
>>
>> I think this example falls out of the 90% or so of typical cases
>> for which MAGE-TAB has been really designed for. Have we got a single
>> one like this in ArrayExpress? We have many where two or more different
>> normalisations are applied to all raw files, but why would one want to
>> do data treatment like you describe? (There may be a case of course,
>> but how typical?).
>>
>> That said, as you point out yourself, in principle MAGE-TAB can
>> describe this via two SDRF files. The way I have thought of the
>> specification, there is nothing to prevent having two SDRFs.
>>
>> The very worst would be to forget the principle that MAGE-TAB
>> should deal well with the typical cases and start complicating it
>> because somebody can come up with a rare case which cannot be described
>> easily. However, I think that this particular case can be described
>> quite well via 2 SDRFs
>>
>>
> I think Philippe's example is in fact just like what Alvis mentions,
> i.e. two or more normalisations are applied to all raw data files. In
> either case some raw data files are used more than once, from the
> graph representation point of view there is a split at the raw data
> column level. 2 SDRFs work well, and in general any proper MAGE-TAB
> parser should be able to deal with any number of SDRFs that define
> nodes and edges of the material/data manipulation graph.
>
> Ugis
>
>> Cheers,
>> - Alvis
>>
>>
>> On Sat, 27 Jan 2007, Philippe wrote:
>>
>>
>>
>>> Hello all,
>>>
>>> We just had a very productive meeting with folks from FDA and
>>> reviewed MAGE-TAB specifications.
>>> A use case put to us was how to represent the situation where one
>>> applies 2 different normalizations on overlapping subsets of raw
>>> data files.
>>> (raw datafiles 1,2,3 normalized using method A producing derived
>>> datafile dA, and raw datafiles 2,3,4 normalized using method B
>>> producing derived datafile dB)
>>> to report this case in MAGE tab, one would have to duplicate a
>>> set of columns related to normalization event.
>>> Source-Sample-Extract-LE-Hyb-Norm-Norm
>>> This seems not to be consistent with the basic mechanism of the sdrf
>>> whereby columns added on the right qualify all the columns on
>>> the left.
>>> The question here is the following: In such situation, is it the
>>> case of creating a second SDRF to report the second normalization
>>> event ?
>>> More generally, will MAGE-TAB support cover submissions with several
>>> SDRFs, possibly corresponding splitting on every 'Pivotal' element
>>> (Source, Sample, Extract, LE, Hyb...)
>>>
>>> Thanks
>>>
>>> Philippe
>>>
>>>
>
> _______________________________________________
> Mged-mage mailing list
> Mge...@li...
> https://lists.sourceforge.net/lists/listinfo/mged-mage
|