From: Burkhard S. <b_...@us...> - 2004-12-16 02:48:20

Hi everybody,
I looked again at the question of the various "length" attributes in and
around the Vector element. Let's look at the various elements and what
the length attribute would mean there.
VectorSet
---------
The length attribute in the VectorSet element describes the total number
of data points in the diagram. The values (components) that make up a
data point can be retrieved by looking at the same index in all vectors.
Here's a little drawing (please forgive my poor ASCII art ;-) )
Let's say we have a UV/VIS spectrum with two vectors: Wavelength and
Absorption. We want to store 100 data points, so VectorSet.length is 100.
                   +----+
Wavelength [ w1 w2 | w3 | w4 w5 ...... w100 ]
Absorption [ a1 a2 | a3 | a4 a5 ...... a100 ]
                   +----+
3rd data point: (w3, a3)
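The index rule can be sketched in a few lines of Python. This assumes each Vector is simply held as a list with one entry per data point; the names `wavelength`, `absorption` and `data_point` are hypothetical and not part of AnIML.

```python
# Hypothetical in-memory model: each Vector is a list with one slot per data point.
wavelength = [f"w{i}" for i in range(1, 101)]
absorption = [f"a{i}" for i in range(1, 101)]

vector_set_length = 100  # VectorSet.length: the number of data points


def data_point(index, *vectors):
    """Return the data point at a 0-based index: the same index taken from every vector."""
    return tuple(vector[index] for vector in vectors)


# The 3rd data point sits at index 2.
print(data_point(2, wavelength, absorption))  # ('w3', 'a3')
```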
This is pretty straightforward. Each Vector contains a single ValueSet
(no matter whether Individual, Encoded or AutoIncremented) with a
startOffset of 0 and an endOffset of 99.
Now, what happens if there are holes in the data? Let's assume we don't
have an absorption reading at w3 and w4. In our example we have only a
single dependent vector (Absorption), so we can simply leave out the
wavelength values w3 and w4 and we're set:
Wavelength [ w1 w2 w5 w6 ...... w100 ]
Absorption [ a1 a2 a5 a6 ...... a100 ]
In this case, VectorSet.length is only 98.
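A sketch of the single-dependent-vector case, assuming (hypothetically) that missing readings are marked with `None` in an in-memory list before the file is written. Dropping those data points from every vector shrinks VectorSet.length to 98.

```python
# Hypothetical paired readings; None marks the missing absorption values at w3 and w4.
wavelength = [f"w{i}" for i in range(1, 101)]
absorption = [None if i in (3, 4) else f"a{i}" for i in range(1, 101)]

# With only one dependent vector, drop every data point whose reading is missing.
kept = [(w, a) for w, a in zip(wavelength, absorption) if a is not None]
wavelength_kept = [w for w, _ in kept]
absorption_kept = [a for _, a in kept]

vector_set_length = len(kept)
print(vector_set_length)  # 98
```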
But let's assume we have multiple dependent vectors. I can't think of a
good second dependent vector for UV/VIS, so let's just call it Vector3.
In this case we can't leave out w3 and w4, because we might have a
Vector3 reading there. We could declare that like this:
Wavelength [ w1 w2 w3 w4 w5 w6 ...... w100 ]
Absorption [ a1 a2 ]   [ a5 a6 ...... a100 ]   <-- two ValueSets here
Vector3    [ v1 v2 v3 v4 v5 v6 ...... v100 ]
Again, we have 100 data points. There are no absorption values at the
3rd and 4th data points (a3 and a4), but that is perfectly legal and
valid. Absorption uses two ValueSets:
- startOffset 0, endOffset 1 and
- startOffset 4, endOffset 99
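A sketch of how a reader might expand Absorption's two ValueSets into a sparse vector aligned with the 100 data points. It assumes inclusive startOffset/endOffset (0 to 99 covers all 100 points, as in the single-ValueSet case) and uses `None` for a hole; the `ValueSet` class and `expand_vector` function are illustrative, not part of AnIML.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ValueSet:
    start_offset: int  # first data-point index covered (inclusive)
    end_offset: int    # last data-point index covered (inclusive)
    values: List[str]  # one value per covered index


def expand_vector(value_sets: List[ValueSet], vector_set_length: int) -> List[Optional[str]]:
    """Place each ValueSet's values at their offsets; uncovered indexes stay None."""
    slots: List[Optional[str]] = [None] * vector_set_length
    for vs in value_sets:
        for i, value in zip(range(vs.start_offset, vs.end_offset + 1), vs.values):
            slots[i] = value
    return slots


absorption = expand_vector(
    [
        ValueSet(0, 1, ["a1", "a2"]),
        ValueSet(4, 99, [f"a{i}" for i in range(5, 101)]),
    ],
    vector_set_length=100,
)
print(absorption[:6])  # ['a1', 'a2', None, None, 'a5', 'a6']
```

Note that `zip` silently stops at the shorter input, so a real reader would probably also check that each ValueSet holds exactly as many values as its offsets cover.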
All this can be stored without a Vector.length attribute. In fact, what
good would it do to explicitly store that Absorption has only 98 values?
If we actually need that number, we can easily calculate it as the sum
of (endOffset - startOffset + 1) over all ValueSets of the Vector, since
both offsets are inclusive: (1 - 0 + 1) + (99 - 4 + 1) = 2 + 96 = 98.
Adding a Vector.length attribute would not increase the expressive power,
and it would add another point where a file could become inconsistent,
making validation more difficult.
The same argument explains why a length attribute in the *ValueSet
elements would not be beneficial. There, the number of values is even
easier to calculate (endOffset - startOffset + 1).
Consequently, I would suggest keeping the VectorSet.length attribute
defined as the number of data points.
I look forward to seeing you all again ("virtually") tomorrow. :-)
Best wishes,
Burkhard

From: Mark F. B. <sa...@co...> - 2004-12-16 03:23:54

I agree with Burkhard. And by the way, the more I wrestle with AnIML in close detail for the documentation, the more impressed I am with Burkhard's brain! He solved a number of problems that I would not have known how to solve. The recent changes are small items compared with the tremendous job done by Burkhard, Dominik, and Maren. But let's not stop there - I want to consider the restructuring proposal seriously tomorrow.

Perhaps Burkhard missed the point here (probably because my emails have been confusing). The problem originated when Mark Mullins wanted to put segmented chromatogram vectors in AutoIncrementedValueSets - which just isn't going to work unless we allow more than one AutoIncrementedValueSet per Vector, which AnIML does. Having done that, the problem became knowing how long each Vector segment was, and so the changes started to snowball. I recommended that he simply encode the data as an EncodedDataSet and not worry about the space saving, and that we restrict AutoIncrementedValueSet and EncodedDataSet to zero or one per Vector. If we do NOT do that, the VectorSet length becomes problematic, as there is no guarantee that all ValueSets in a Vector are the same length.

This illustrates my point that AnIML's flexibility may need to be constrained more - there are too many ways to solve the same problem right now.

I hope you can join us tomorrow, Burkhard.

Best wishes,
Mark
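A minimal sketch of a segmented vector stored as several AutoIncrementedValueSets. It assumes, hypothetically, that each segment carries inclusive start/end offsets like those in Burkhard's example, plus a start value and an increment; the class and field names are illustrative and may not match the schema. Under that assumption, each segment's length follows from its offsets even when segments differ in length.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AutoIncrementedSegment:
    # Hypothetical fields; the real element's attributes may differ.
    start_offset: int   # first data-point index (inclusive)
    end_offset: int     # last data-point index (inclusive)
    start_value: float  # value at start_offset
    increment: float    # step between consecutive data points

    def length(self) -> int:
        return self.end_offset - self.start_offset + 1

    def values(self) -> List[float]:
        return [self.start_value + k * self.increment for k in range(self.length())]


# Two segments with different sampling steps (illustrative numbers only).
segments = [
    AutoIncrementedSegment(0, 3, start_value=0.0, increment=0.5),
    AutoIncrementedSegment(4, 9, start_value=2.0, increment=0.25),
]
print([s.length() for s in segments])  # [4, 6]
print(segments[1].values())            # [2.0, 2.25, 2.5, 2.75, 3.0, 3.25]
```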

From: Burkhard S. <b_...@us...> - 2004-12-16 03:39:14

Mark,

thanks for your quick reply. Yes, I actually did miss the point. :-) Thanks for helping me get my head around this.

I agree that we have a lot of options for encoding something with AnIML, and I think that's exactly where the Technique Definitions come in. Using those, we can control pretty closely how something is encoded. This relieves end users from the tedious modelling task. Had we had a Technique Definition for Mark Mullins' experiment, things would have been a lot easier. So this puts a lot of responsibility into the hands of the authors of Technique Definitions, but that's what we as a committee are here for. It also means that we should soon look into creating a document with "best practices for Technique Definition authors".

I've looked over your proposal and am looking forward to hearing more about it tomorrow. It looks to me like it's adding even more degrees of freedom, which is both good and bad.

As I've posted to the list (not through yet), I've recently written a full parser for AnIML and (if there's enough time tomorrow) could share some experiences. The recursion in particular is rather straightforward to handle if you follow a certain pattern.

Talk to you tomorrow. I'm looking forward to my long-distance bill... ;-)

Best wishes,
Burkhard

From: <Mar...@wa...> - 2004-12-17 09:49:41

Hi Burkhard,

> Had we had a Technique Definition for Mark Mullins' experiment, things
> would have been a lot easier.

We do have a (documented!) technique definition for chromatography, so this is not the problem. Mark Mullins' experiment would probably need a *technique extension*.

> It also means that we should soon look into creating a document with
> "best practices for Technique Definition authors".

Good idea. If you provide a first draft, I can help you review and finish it.

Mit freundlichen Grüßen / Best regards

Dr. Maren Fiege
Product Manager
--------------------------------------------------------------
Waters Informatics
Europaallee 27, D-50226 Frechen, Germany
Tel. +49 2234 9207 - 0
Fax. +49 2234 9207-99
Reply to: mar...@wa...
http://www.creonlabcontrol.com
http://www.watersinformatics.net
--------------------------------------------------------------

=================================================================
The information in this email is confidential, and is intended solely for the addressee(s). Access to this email by anyone else is unauthorized and therefore prohibited. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful.
=================================================================

From: Burkhard S. <b_...@us...> - 2004-12-16 03:47:25

Oops, one thing I overlooked:

> Perhaps Burkhard missed the point here (probably because my emails have been
> confusing). The problem originated when Mark Mullins wanted to put
> segmented chromatogram vectors in AutoIncrementedValueSets - which just
> isn't going to work unless we allow more than one AutoIncrementedValueSet -
> which AnIML does. Having done that, the problem became knowing how long
> each Vector segment was - and so the changes started to snowball. I
> recommended that he simply encode the data as an EncodedDataSet and not
> worry about the space saving,

Makes sense.

> and that we restrict AutoIncrementedValueSet
> and EncodedDataSet to 0 to 1 per Vector. If we do NOT do that, the
> VectorSet length becomes problematic as there is no guarantee that all
> ValueSets in a Vector are the same length.

I think my example illustrates that not all ValueSets need to share the same length. We currently allow an unlimited number of ValueSets per Vector in order to permit storage of data with "holes" (sparse data).

4:47 am... ;-)
Burkhard
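Since a Vector may hold any number of ValueSets of differing lengths, a validator can still check them against VectorSet.length alone. A sketch, assuming inclusive offsets and assuming that ValueSets within one Vector must not overlap (the thread does not state this rule explicitly; it is an assumption here). `check_value_sets` is a hypothetical helper.

```python
from typing import List, Tuple


def check_value_sets(offsets: List[Tuple[int, int]], vector_set_length: int) -> List[str]:
    """Return the problems found; an empty list means the ValueSets are consistent."""
    problems = []
    previous_end = -1
    for start, end in sorted(offsets):
        if start > end:
            problems.append(f"startOffset {start} > endOffset {end}")
        if start < 0 or end >= vector_set_length:
            problems.append(f"({start}, {end}) outside 0..{vector_set_length - 1}")
        if start <= previous_end:  # assumed rule: no two ValueSets cover the same index
            problems.append(f"({start}, {end}) overlaps a previous ValueSet")
        previous_end = max(previous_end, end)
    return problems


# Absorption from the example: two ValueSets of different lengths (2 and 96 values).
print(check_value_sets([(0, 1), (4, 99)], 100))  # []
print(check_value_sets([(0, 5), (4, 99)], 100))  # ['(4, 99) overlaps a previous ValueSet']
```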

From: <Mar...@wa...> - 2004-12-16 14:33:20

Hi Burkhard,

thanks for the clarification. This should have been in your format documentation.

Mit freundlichen Grüßen / Best regards
Dr. Maren Fiege

> From: "Burkhard Schaefer" <b_...@us...>
> Date: 16.12.2004 03:48
> To: "AnIML Developer List" <ani...@li...>
> cc: "Mark F. Bean" <mar...@gs...>, "Mark Mullins" <mar...@sc...>, "Maren Fiege" <Mar...@wa...>, "Mark F. Bean" <sa...@co...>
> Subject: Vector Length Issues