Is this correct?
1. It was agreed at the NCI workshop to go for the 'vertical format',
for the reason that it is easier to view and scroll vertically than
horizontally;
Which format is easier to scroll is, perhaps, a matter of taste. What
is *not* open to debate is that the horizontal format will fail to
catch a major class of invalid entries because of its inability to do
type checking. The choice of a horizontal format guarantees that the
chief priority of MAGE-TAB -- as a way for researchers to enter and
supply information -- will be badly compromised.
I thought the horizontal format *will* do error and type checking. What
is meant by 'horizontal' format? Column headers along the top--right?
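
To make my question concrete, here is a minimal sketch of the two layouts as I understand them. The field names come from the messages below; the sample values, the validation rules, and the helper names (VALIDATORS, transpose, check_horizontal) are all made up for illustration. It only shows that a program can check types per field once it knows which way the headers run. It says nothing about what Excel itself can validate, which is the point under dispute.

```python
from datetime import date

def is_iso_date(s):
    """Assumed rule: a release date must look like YYYY-MM-DD."""
    try:
        date.fromisoformat(s)
        return True
    except ValueError:
        return False

# Hypothetical per-field validators, keyed by header text.
VALIDATORS = {
    "Public Release Date": is_iso_date,
    "PubMed ID": str.isdigit,
    "Author List": lambda s: all(part.strip() for part in s.split(";")),
}

def transpose(rows):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(col) for col in zip(*rows)]

def check_horizontal(rows):
    """Headers in the first row; each column holds one field's values."""
    headers, *records = rows
    errors = []
    for r, record in enumerate(records, start=1):
        for header, value in zip(headers, record):
            check = VALIDATORS.get(header)
            if check and not check(value):
                errors.append((r, header, value))
    return errors

# "Vertical" layout: headers down the left, one record per column.
# All values are invented sample data.
vertical = [
    ["Public Release Date", "2006-07-01", "next July"],
    ["PubMed ID", "12345678", "unknown"],
    ["Author List", "Smith J; Jones K", "Doe A"],
]

# "Horizontal" layout: headers along the top, one record per row.
horizontal = transpose(vertical)
errors = check_horizontal(horizontal)
```

With the sample data above, the second record is flagged for its release date and its PubMed ID. A parser would have to do the transpose step itself; it is not something a person entering data should be asked to do.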
Don Maier wrote:
> Hi Alvis,
> I fear that some of your points glide past some critical
> considerations (see below).
> But I do appreciate your willingness to keep this issue open!
> Don M.
> On May 20, 2006, at 12:39 AM, Alvis Brazma wrote:
>> 1. It was agreed at the NCI workshop to go for the 'vertical
>> format', for the reason that it is easier to view and scroll
>> vertically than horizontally;
> Which format is easier to scroll is, perhaps, a matter of taste.
> What is *not* open to debate is that the horizontal format will fail
> to catch a major class of invalid entries because of its inability
> to do type checking. The choice of a horizontal format guarantees
> that the chief priority of MAGE-TAB -- as a way for researchers to
> enter and supply information -- will be badly compromised.
>> 2. It is a trivial operation to transpose a matrix, so if anybody
>> wants it horizontally for his/her use, they can do it and then
>> transpose back;
> Of course any one of us can *program* a matrix transpose. But we
> ignore the prime intent of MAGE-TAB if we expect researchers trying
> to enter data to attempt this. And while transpose is an operation
> in Excel, it does not change the columns' role in defining data
> types. Nor does this change any spreadsheet user's expectations
> that the column headers will be at the top!
>> 3. We did agree to revisit and finalize some of the details in the
>> next jamboree in Hinxton in July;
>> 4. IDF is the very simplest of the MAGE-TAB formats, therefore I
>> think it is overkill to spend too much of the discussion time on
>> this particular issue.
> Unfortunately, small design errors can (and often do) undermine
> entire projects. The fact that IDF holds fewer data than ADF or EDF
> does not make it any less critical in a coherent design.
>> Personally I do not mind which format we use, but for now it's
>> 'vertical' as agreed, and if there are compelling reasons to
>> transpose it, let's do it in Hinxton.
>> On Fri, 19 May 2006, Catherine Ball wrote:
>>> When I mocked up the initial stab at the IDF, it was done
>>> horizontally for no particular reason -- maybe because it fit
>>> better on my laptop monitor during the single afternoon I was
>>> working on it. Unless there are compelling reasons to stick with
>>> the horizontal arrangement, I see no reason not to change it to be
>>> easier to use and in the same orientation as the other files (ADF,
>>> On May 19, 2006, at 2:41 PM, Don Maier wrote:
>>>> Hi Alvis,
>>>> In light of my previous problems getting my thoughts out, please
>>>> let me resubmit my comments on the Investigation Design Format
>>>> (IDF) in response to your note:
>>>> I'd like to resurrect the discussion of the so-called "vertical"
>>>> versus "horizontal" format for the IDF. I'm doing this
>>>> reluctantly -- only because this issue really is too important to
>>>> leave without making sure that we understand the implications of
>>>> adopting a "vertical" format.
>>>> By "vertical" we mean that the column headers are arranged
>>>> vertically -- on the left -- as opposed to the "horizontal" format
>>>> in which the column headers are on the top. I think that some of
>>>> us may have thought that this is a matter of taste in data display
>>>> -- especially for IDF information which has a lot of headers and
>>>> few (one, two or three) rows.
>>>> Unfortunately, this choice is *not* just a matter of taste, because:
>>>> 1) The vertical format does not allow a spreadsheet program to
>>>> impose or check data types. Data type checking is one of the
>>>> chief attributes that makes spreadsheets easy to use -- because it
>>>> facilitates easy correction of a significant class of user data
>>>> entry errors. Consider, for example, the "Public Release Date",
>>>> which we want to be a date, not an arbitrary string; or the PubMed
>>>> ID, which should be a number, or Author List, which should be a
>>>> semi-colon-separated list. At least half (maybe more) of the
>>>> entries have a format that Excel (or most other spreadsheet
>>>> programs) can validate at the point of entry -- provided that the
>>>> validation criteria are specified for a column.
>>>> In short, with the "vertical" format where no validation is
>>>> possible, the chances of getting correctly entered data are likely
>>>> near zero.
>>>> 2) (Corollary of 1) The vertical format requires that every
>>>> piece of information be of the same data type -- or actually
>>>> (worse) untyped. Even if all the data are strings now, you can't
>>>> later add a number, or an enumeration (fixed set of values). It
>>>> would be really confusing for users to make one data type
>>>> masquerade as another.
>>>> 3) The vertical format differs from the horizontal format used
>>>> for all the other (ADF, etc.) spreadsheets. This non-uniformity
>>>> is guaranteed to confuse users.
>>>> 4) The vertical format precludes automatic generation of
>>>> mapping code. Automatic code generators rely on the semantics of a
>>>> column being uniform -- because that's how spreadsheets (should)
>>>> work. Sure, you or I can write a parser and translator
>>>> manually. But it is highly likely that it won't be as thoroughly
>>>> tested or bug-free as the automatically generated translator. Nor
>>>> will it be as maintainable -- because a change will require
>>>> another, bug-prone manual effort.
>>>> In short, the vertical IDF format is a *form* of some kind. But
>>>> it's *not* a spreadsheet in any meaningful sense. And it does
>>>> not have the ease of use that we are trying to achieve in the
>>>> MAGE-TAB format.
>>>> I'd urge that we not make this serious design error.
>>>> I would suggest that we break the currently proposed format (which
>>>> is neither a table nor a spreadsheet) into two or three
>>>> spreadsheets within a single .xls file. I think that this would
>>>> very nicely accommodate the desire to collect all this
>>>> information together as "header" information in one "workbook",
>>>> while still segregating the different types of information into
>>>> different "worksheets", which permits each to be effectively
>>>> represented as a spreadsheet.
>>>> Don Maier
>>>> Sr. Software Designer/Research
>>>> Dept. of Biochemistry
>>>> Stanford University School of Medicine
>>>> On Apr 27, 2006, at 4:55 AM, Alvis Brazma wrote:
>>>>> Dear All,
>>>>> As some of you know, jointly with my colleagues including some of
>>>>> you, I've been working on a draft paper about MAGE-TAB. The
>>>>> latest draft is in the attachment for your comments. The current
>>>>> list of authors includes some of those who have
>>>>> contributed to the proposal substantially, either in the two MAGE
>>>>> workshops or otherwise. I may have missed somebody and, who
>>>>> knows, maybe some may not want to be authors. The authors list
>>>>> is open and will be finalised after the May workshop at NCI, if
>>>>> we decide to go ahead and submit this paper.
>>>>> The MAGE-TAB documentation has been recently updated and is
>>>>> available from http://www.mged.org/Workgroups/MAGE/
>>>>> All comments are most welcome either before or during the NCI
>>>>> - Alvis
> Don Maier
> Sr. Software Designer/Research
> Dept. of Biochemistry
> Stanford University School of Medicine