Re: [Mged-mage] [Mged-mage2] Collated comments on MAGETAB spec


hi tim,

thanks again for all your great work on this.

my only real problem point is 4).  2) can be lived with as long as it is
a recommended extension and 3) just will add complexity to the parsing
of the documents.

> 1. ... full support for external
> annotation files should be deferred to version 1.1. ...

this sounds fine.

> 2. Regarding adding a mandatory .idf extension for the IDF file, I think
> this is not necessary, since it's pretty easy to design a submission
> system which would track the IDF without an extension

this is not true of automated pipeline systems, which is a huge use case
(as much or more gene expression data and annotation are loaded via
these pipelines).  in windows, anyway, it is very easy to set this up to
open automatically in excel or whatever spreadsheet program is desired.

i would be happy with a recommendation that the file extension be 'idf'.
if this could be added to the 1st paragraph of section 3.1.1, that would
be great.
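
to illustrate the pipeline point, here is a minimal sketch of how an automated loader could pick out IDF files if the 'idf' extension were recommended. the function name `find_idf_files` is hypothetical and only for illustration; nothing here comes from the specification.

```python
from pathlib import Path


def find_idf_files(directory):
    """Return the files in `directory` whose extension is .idf (case-insensitive).

    Assumes the recommended 'idf' extension is actually used by submitters.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".idf"
    )
```

without an agreed extension, a pipeline would have to open every file and inspect its contents to decide which one is the IDF.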

> 3. Use of quotes in IDF and SDRF; while maybe they could be dispensed with
> in SDRF, they will be needed in IDF to allow e.g. newlines in protocol
> text (and believe me, users will want this!).

as the information on additional annotation files in the idf was
considered a late addition without time for comment, note that quoting is
also a rather late addition; it was never part of the specification proper.
quotes cannot be made optional: they must either be mandatory or not used.

i would like to see a handful of fields (like protocol descriptions)
where they are required and the rest where they must not be used but i
can live with them being mandatory in the IDF and not used in the SDRF.

there must also be a provision for escaping quotes within a quoted
field; believe me, they will definitely occur. the escape can be as
simple as '\"'. less likely, but possible, a quote will be the first
actual character of the field, which is why quoting must not be
optional for fields.
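
as a rough sketch of what this means for a parser, the snippet below reads tab-delimited text in which every field is quoted, a quoted field may contain a newline, and an embedded quote is escaped as '\"'. it uses python's standard csv module; the backslash escape is the suggestion made above, not something the specification defines, and the field names in the sample are only illustrative.

```python
import csv
import io


def read_quoted_tab_rows(text):
    """Parse tab-delimited text with quoted fields and backslash-escaped quotes."""
    reader = csv.reader(
        io.StringIO(text, newline=""),  # newline="" keeps embedded newlines intact
        delimiter="\t",
        quotechar='"',
        escapechar="\\",    # assumed escape convention: \" inside a quoted field
        doublequote=False,  # do not treat "" as an escaped quote
    )
    return list(reader)


sample = (
    '"Protocol Name"\t"P-1"\n'
    '"Protocol Description"\t"Add \\"buffer A\\".\nIncubate overnight."\n'
    '"Comment[note]"\t"\\"quoted\\" at the start"\n'
)
rows = read_quoted_tab_rows(sample)
```

the last row shows the case where the escaped quote is the first actual character of the field; because the field is always quoted, the parser never has to guess whether a leading quote is data or a delimiter.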

> 4. We agreed that the authority:namespace component of the resulting MAGE
> identifiers would be left to the implementation of any parser; while this
> precludes sharing e.g. Sources defined in MAGE-TAB documents, this is a
> relatively minor use case. Again, this could be revisited for a 1.1
> specification.

this is not a relatively minor use case for many who would want to use
MAGETAB--it may be for ArrayExpress right now but it is a common use
case for investigations into how to organize a microarray experiment
(MAQC!!!).  the sharing of sources also occurs for a variety of other
reasons.

this, i thought, would be a minor addition to the IDF file to define the
default naming authority.  this is not a parser issue--a parser is
developed in accordance with the specification.  it would be very bad to
leave this until 1.1; it will cause valuable biological information to be
lost.
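
a small sketch of why this matters: if the authority and namespace come from the document, two MAGE-TAB documents that name the same source resolve to the same identifier, whereas parser-chosen defaults give different identifiers and the link is lost. the identifier layout, the function name, and the authority and source names below are all assumptions made for illustration; they are not the MAGE identifier format or anything defined in the specification.

```python
def make_identifier(authority, namespace, class_name, name):
    """Build an illustrative identifier; the colon-separated layout is an assumption."""
    return f"{authority}:{namespace}:{class_name}:{name}"


# Two documents that both declare the same (hypothetical) default authority
# produce the same identifier for the same source name.
doc_a = make_identifier("example.org", "shared-study", "Source", "sample-1")
doc_b = make_identifier("example.org", "shared-study", "Source", "sample-1")

# If each parser invents its own default, the same source gets two identifiers.
parser_x = make_identifier("parser-x.local", "default", "Source", "sample-1")
parser_y = make_identifier("parser-y.local", "default", "Source", "sample-1")

same_when_declared = doc_a == doc_b
same_when_parser_chosen = parser_x == parser_y
```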

> 5. Regarding Junmin's comment about array design accessions, ...

> 6. As discussed, the specification does indeed allow SDRFs to ...

fine.

cheers,
michael

Michael Miller
Lead Software Developer
Rosetta Biosoftware Business Unit
www.rosettabio.com

> -----Original Message-----
> From: mge...@li...
> [mailto:mge...@li...] On Behalf Of Tim Rayner
> Sent: Wednesday, January 31, 2007 5:46 AM
> To: 'MGED-mage'; mged-mage2
> Subject: Re: [Mged-mage2] [Mged-mage] Collated comments on MAGETAB spec
>
> Hi,
>
> We've discussed the various comments on the previous version of the
> MAGE-TAB spec (Jan 8), and it appears that the consensus from ArrayExpress
> is as follows:
>
> 1. Since we need to get a version 1.0 specification finalised so that
> implementation deadlines are met, we feel that full support for external
> annotation files should be deferred to version 1.1. This will allow for
> more complete discussion of the requirements, in particular in light of
> any considerations from the FuGE crowd. In the meantime a minor note
> has been added to section 3.1.1 regarding suggested use of a Comment[] tag
> to support these.
>
> 2. Regarding adding a mandatory .idf extension for the IDF file, I think
> this is not necessary, since it's pretty easy to design a submission
> system which would track the IDF without an extension (e.g., the current
> Tab2MAGE submissions system already does this, in effect). Additionally, a
> new file extension would have to be mapped to Excel or OpenOffice by the
> end user for it to be any use to them (I believe this is true of both
> Windows and Mac). This may not be a huge deal, but it's another barrier to
> the casual user. Such mapping does not always guarantee that a document
> opens in the desired application either (OpenOffice, I'm looking at
> you...).
>
> 3. Use of quotes in IDF and SDRF; while maybe they could be dispensed with
> in SDRF, they will be needed in IDF to allow e.g. newlines in protocol
> text (and believe me, users will want this!). The original Tab2MAGE
> implementation didn't allow fields to be quoted like this to preserve
> special characters (i.e., newlines and tabs), and it was awful. An
> additional advantage to using quotes is that "text" date fields such as
> "2007-01-31" will be preserved by spreadsheet software, while 2007-01-31
> is often corrupted by such "helpful" applications. I have added a new
> section (3.1.6) which briefly discusses use of quotes to escape data
> fields. On the bright side, this format tends to be the default
> for "tab-delimited" export from Excel and OpenOffice in any case.
>
> 4. We agreed that the authority:namespace component of the resulting MAGE
> identifiers would be left to the implementation of any parser; while this
> precludes sharing e.g. Sources defined in MAGE-TAB documents, this is a
> relatively minor use case. Again, this could be revisited for a 1.1
> specification.
>
> 5. Regarding Junmin's comment about array design accessions, it is true
> that submissions to us will be using ArrayExpress accessions exclusively
> (at least for the foreseeable future) but this is not necessarily true of
> other users, e.g. those downloading and distributing MAGE-TAB documents
> from us.
>
> 6. As discussed, the specification does indeed allow SDRFs to be split (on
> any "Name" column) into as many sub-SDRF documents as necessary.
>
> I've made the other modifications suggested to the list and put up a new
> specification document (Jan 31) here:
>
> http://www.ebi.ac.uk/systems-srv/mp/file-exchange/MAGE-TABv1.0.tar.gz
>
> Unless there are serious arguments to the contrary, I personally will be
> treating this as a finalised version 1.0 specification. Thanks very much
> for all your comments,
>
> Tim
>
> --
> Tim Rayner, Ph.D.
> Scientific Database Curator
> Microarray Informatics Team
> European Bioinformatics Institute