From: <Mar...@gs...> - 2004-11-23 15:12:51
VERY LONG AND DETAILED MESSAGE. In the original message, text color
distinguishes MB's comments from MM's.
I would like to introduce a new developer to the group, Mark Mullins (MM)
of SSI, who is working on an AnIML implementation. Mark has discovered
some difficulties with fitting his simple UV chromatography data into
AnIML, as have I (MB) for report LCMS data.
Here is my understanding of the AnIML to SSI SaMPL concordance (MM has
SaMPL XML in grey below):
- ExperimentStep and DataSource are equivalent - the application of a
  technique or data related to a single detector source
- PageSet and DataSet are containers only
- Page and RawData are equivalent to a chromatogram or spectrum (2 or more
  dimensional data)
- Axis and Vector are equivalent (AnIML has a Vector container called
  VectorSet; SaMPL allows an Axis to have multiple child Axis tags as below,
  which is a reasonable way to represent independent and dependent
  variables. Though less explicit than AnIML, it is more flexible.)
- EncodedDataSet and EncodedDataSet are identical
Case 1:
Scenario: Data acquisition for 10 minutes with 1 second intervals. You
have 30 seconds of missing data points, 3 minutes into the run.
In SaMPL, we represented this with 2 sets of X-Axis vectors as follows:
This is equivalent to multiple Time Vectors and multiple Intensity Vectors
in AnIML.
<Dataset datasetID="1" resultType="chromatogram">
  <RawData>
    <Axis axisID="1" name="Time" datapoints="180" datatype="float32" units="sec">
      <AutoIncrementDataSet startValue="0" increment="1.00"></AutoIncrementDataSet>
      <Axis axisID="2" name="Intensity" datapoints="180" datatype="float32" units="millivolts">
        <EncodedDataSet>89213074132847847213473...</EncodedDataSet>
      </Axis>
    </Axis>
    <Axis axisID="3" name="Time" datapoints="390" datatype="float32" units="sec">
      <AutoIncrementDataSet startValue="210.0" increment="1.00"></AutoIncrementDataSet>
      <Axis axisID="4" name="Intensity" datapoints="390" datatype="float32" units="millivolts">
        <EncodedDataSet>89213074132847847213473...</EncodedDataSet>
      </Axis>
    </Axis>
  </RawData>
</Dataset>
The problem I am having with AnIML is at the <VectorSet> node. This is
where you specify the 'length' attribute to describe how many datapoints
your vectors will contain. In this example, the vectors each need to
contain a different number of datapoints.
I've verified this and agree that there is a Length attribute in VectorSet.
However, there is a misunderstanding about what it means. I know this
because I have Dominic Poetz's documentation printout (but cannot email it
as I do not have it electronically - Gary Kramer may have it).
VectorSet.Length actually represents the Vector count, not the count for
the number of points in each contained vector. Thus it is ok to have
multiple Vectors with different lengths, just as in SaMPL. Nevertheless,
my inclination is to accommodate segmentation of data like this by allowing
multiple VectorSets. I think all Vectors in a VectorSet should have the
same length.
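To make the segmentation idea concrete, here is a minimal Python sketch (not AnIML or SaMPL code) that splits the Case 1 timeline into contiguous runs, so that each run could map to its own VectorSet whose Vectors all share one length. The function name `split_into_segments` and its parameters are hypothetical and chosen only for illustration.

```python
def split_into_segments(times, increment=1.0, tol=1e-9):
    """Split sorted sample times into contiguous runs.

    A new run starts wherever the step between two neighbouring
    samples differs from the expected increment (i.e. data is missing).
    """
    if not times:
        return []
    segments = []
    current = [times[0]]
    for prev, t in zip(times, times[1:]):
        if abs((t - prev) - increment) > tol:
            # Gap detected: close the current run and start a new one.
            segments.append(current)
            current = []
        current.append(t)
    segments.append(current)
    return segments


# Case 1 timeline: 1 s sampling for 10 minutes, with 30 s missing at 3 minutes.
times = [float(t) for t in range(0, 180)] + [float(t) for t in range(210, 600)]

segments = split_into_segments(times)
# (start time, number of points) for each contiguous run
summary = [(seg[0], len(seg)) for seg in segments]
```

Under these assumptions the two runs start at 0 s and 210 s and hold 180 and 390 points, matching the `startValue` and `datapoints` attributes in the SaMPL example.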
Case 2:
Scenario: 3D Data acquisition for 10 minutes with 1 second intervals and
255 wavelengths.
In SaMPL, we represented this with 1 set of vectors as follows:
<Dataset datasetID="1" resultType="chromatogram">
  <RawData>
    <Axis axisID="1" name="Time" datapoints="600" datatype="float32" units="sec">
      <AutoIncrementDataSet startValue="0" increment="1.00"></AutoIncrementDataSet>
      <Axis axisID="2" name="Wavelength" datapoints="255" datatype="int" units="nm">
        <EncodedDataSet>89213074132847847213473...</EncodedDataSet>
        <Axis axisID="3" name="Absorbance" datapoints="255" datatype="float32" units="absorb units">
          <EncodedDataSet>89213074132847847213473...</EncodedDataSet>
          <EncodedDataSet>89213074132847847213473...</EncodedDataSet>
          <!-- There would be 598 more EncodedDataSets here -->
        </Axis>
      </Axis>
    </Axis>
  </RawData>
</Dataset>
The problem I am having is the same as in Case 1 above, since each of the
contained Vectors will have a different length.
I believe the intention in AnIML would be to create a chromatogram Page
with a subordinate PageSet of UV spectra, each spectrum referencing its
superordinate chromatogram Page point.
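The Page / subordinate PageSet arrangement described above can be sketched as a small Python data model. This is only an illustration of the nesting, not the AnIML schema: the class names `ChromatogramPage` and `Spectrum`, the `parent_point` field, and the zero-filled absorbance values are all placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    parent_point: int   # index of the chromatogram point this spectrum belongs to
    absorbance: list    # one value per wavelength


@dataclass
class ChromatogramPage:
    times: list                                   # time axis, one entry per point
    spectra: list = field(default_factory=list)   # the subordinate "PageSet"


N_TIMES, N_WAVELENGTHS = 600, 255   # 10 minutes at 1 s, 255 wavelengths (Case 2)

page = ChromatogramPage(times=[float(i) for i in range(N_TIMES)])
for i in range(N_TIMES):
    # Placeholder zeros stand in for the real absorbance values.
    page.spectra.append(Spectrum(parent_point=i, absorbance=[0.0] * N_WAVELENGTHS))

# Within each level every vector has a single, consistent length.
lengths_ok = all(len(s.absorbance) == N_WAVELENGTHS for s in page.spectra)
```

In this layout the 600-point time vector and the 255-point spectra never share a container, so the differing lengths do not conflict.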
Case 3:
Scenario: LCMS Data with standard chromatogram but summed and
background-subtracted spectral reports (sum and subtract start and stop
points needed as timeline)
Again we have discontinuous timelines as in Case 1, but here each spectrum
is a separate Page and should be displayed on separate graphs as opposed
to Case 1 where the segments should be displayed on the same graph. We
can have a subordinate PageSet of MS spectra but what do we put in the
superordinate (top-level) Page? What I want is a series of (0-many)
summation (start and stop) and (0-many) subtraction selections describing
each spectrum. The closest thing in AnIML is Individual Value Set. I
would need one set for summation starts, another for summation stops, and
a third and fourth for subtraction start and stop. However, as there can
be 0-many summations or subtractions per spectrum, I believe we begin to
run into problems identical to Case 1 - a segmented timeline representing
the collection of start-stop segments used in creating each spectrum.
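
As a rough illustration of why four parallel value sets become awkward, the Python sketch below models one spectrum's selections as lists of (start, stop) pairs and then flattens them into four sets. The names `SpectrumSelection` and `to_value_sets` are hypothetical, and the time values are invented purely for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SpectrumSelection:
    summed: list = field(default_factory=list)      # (start, stop) pairs, seconds
    subtracted: list = field(default_factory=list)  # (start, stop) pairs, seconds


def to_value_sets(sel):
    """Flatten the selections into four parallel value sets."""
    return {
        "sum_start": [a for a, _ in sel.summed],
        "sum_stop": [b for _, b in sel.summed],
        "sub_start": [a for a, _ in sel.subtracted],
        "sub_stop": [b for _, b in sel.subtracted],
    }


# Illustrative values only: two summed regions, one background region.
sel = SpectrumSelection(
    summed=[(120.0, 130.0), (140.0, 150.0)],
    subtracted=[(100.0, 110.0)],
)
sets = to_value_sets(sel)
lengths = {name: len(values) for name, values in sets.items()}
```

Here the summation sets hold two entries and the subtraction sets one, so the four sets for a single spectrum already differ in length, which is the same ragged-length situation as the segmented timeline in Case 1.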