From: Mark F. B. <sa...@co...> - 2004-12-16 03:23:54
|
I agree with Burkhard. And by the way, the more I wrestle with AnIML in close detail for the documentation, the more impressed I am with Burkhard's brain! He solved a number of problems that I would not have known how to solve. The recent changes are small items compared with the tremendous job done by Burkhard, Dominik, and Maren. But let's not stop there: I want to consider the restructuring proposal seriously tomorrow.

Perhaps Burkhard missed the point here (probably because my emails have been confusing). The problem originated when Mark Mullins wanted to put segmented chromatogram vectors in AutoIncrementedValueSets, which only works if we allow more than one AutoIncrementedValueSet per Vector, as AnIML currently does. Having done that, the problem became knowing how long each Vector segment was, and so the changes started to snowball. I recommended that he simply encode the data as an EncodedValueSet and not worry about the space saving, and that we restrict AutoIncrementedValueSet and EncodedValueSet to 0 to 1 per Vector. If we do NOT do that, the VectorSet length becomes problematic, as there is no guarantee that all ValueSets in a Vector are the same length.

This illustrates my point that AnIML's flexibility may need to be constrained more: there are too many solutions to problems right now.

I hope you can join us tomorrow, Burkhard.
Best wishes,
Mark |
|
From: Burkhard S. <b_...@us...> - 2004-12-16 02:56:42
|
Hi everybody, I was wondering whether we should keep the animl-develop list moderated. Please don't get me wrong, I think Randy is doing an excellent job (thanks!). The thing is that I just saw hints in the SourceForge documentation that it is possible to set up a list so that only subscribers can post and only posts from non-members need to be reviewed by the moderator. This would reduce the burden on Randy and make sure that members' emails are delivered immediately. What are your thoughts on this? Best wishes, Burkhard |
|
From: Burkhard S. <b_...@us...> - 2004-12-16 02:50:35
|
Regarding Mark B.'s question about Base64 array lengths: I think most algorithms work the same way as the one you've written. You can infer the number of values from the length of the base64 string (if you know whether it contains float64 or float32 values). This way you can find out how much memory you need to allocate before actually decoding the base64 string. Best wishes, Burkhard |
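The calculation Burkhard describes can be sketched in Python as follows. This is a minimal sketch, not code from the AnIML project: it assumes standard base64 with `=` padding and no embedded line breaks, and it assumes little-endian IEEE 754 values when actually decoding, since the byte order is not stated here. The function names are hypothetical.

```python
import base64
import struct

# Bytes per value for the two encodings mentioned in the thread.
BYTES_PER_VALUE = {"float32": 4, "float64": 8}
# struct format codes; "<" below means little-endian (an assumption).
STRUCT_CODE = {"float32": "f", "float64": "d"}


def decoded_byte_length(b64: str) -> int:
    """Bytes the string will decode to, computed without decoding it."""
    if len(b64) % 4 != 0:
        raise ValueError("padded base64 length must be a multiple of 4")
    padding = len(b64) - len(b64.rstrip("="))  # 0, 1 or 2 '=' characters
    return len(b64) // 4 * 3 - padding


def value_count(b64: str, value_type: str) -> int:
    """Number of values, e.g. to allocate an array before decoding."""
    n_bytes = decoded_byte_length(b64)
    size = BYTES_PER_VALUE[value_type]
    if n_bytes % size != 0:
        raise ValueError("byte count is not a whole number of values")
    return n_bytes // size


def decode_values(b64: str, value_type: str) -> list:
    n = value_count(b64, value_type)  # known before decoding
    fmt = "<%d%s" % (n, STRUCT_CODE[value_type])
    return list(struct.unpack(fmt, base64.b64decode(b64)))
```

For example, three float64 values occupy 24 bytes, which base64 encodes as 32 characters, so `value_count` returns 3 without decoding anything.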
|
From: Burkhard S. <b_...@us...> - 2004-12-16 02:48:20
|
Hi everybody,
I looked again at the question of the various "length" attributes in and
around the Vector element. Let's look at the various elements and what
the length attribute would mean there.
VectorSet
--------
The length attribute in the VectorSet element describes the total number
of data points in the diagram. The values (components) that make up a
data point can be retrieved by looking at the same index in all vectors.
Here's a little drawing (please forgive my poor ASCII art ;-) )
Let's say we have a UV/VIS spectrum with two vectors: Wavelength and Absorption.
We want to store 100 data points, so VectorSet.length is 100.
                   +----+
Wavelength [ w1 w2 | w3 | w4 w5 ...... w100 ]
Absorption [ a1 a2 | a3 | a4 a5 ...... a100 ]
                   +----+
3rd data point: (w3, a3)
This is pretty straightforward. Each Vector contains a single ValueSet
(whether Individual, Encoded, or AutoIncremented) with a startOffset of 0
and an endOffset of 99.
Now what happens if we have holes in the data? Let's assume we don't
have an absorption reading for w3 and w4. In our example we only have a
single dependent vector (absorption), so we would just leave out the
wavelength values w3 and w4 and we'd be set:
Wavelength [ w1 w2 w5 w6 ...... w100 ]
Absorption [ a1 a2 a5 a6 ...... a100 ]
In this case, VectorSet.length would only be 98.
But let's assume we have multiple dependent vectors. I can't think of a
good second dependent vector for UV/VIS, so let's call it Vector3. In
this case we can't leave out w3 and w4, because we might have a reading
for Vector3 there. We could declare that like this:
Wavelength [ w1 w2 w3 w4 w5 w6 ...... w100 ]
Absorption [ a1 a2 ]   [ a5 a6 ...... a100 ]  <-- two ValueSets here
Vector3    [ v1 v2 v3 v4 v5 v6 ...... v100 ]
Again, we have 100 data points. We don't have an absorption value for
w3 and w4, but that is perfectly legal and valid. Absorption would use
two ValueSets:
- startOffset 0, endOffset 1, and
- startOffset 4, endOffset 99
All this can be stored without a Vector.length attribute. In
fact, what good would it do to explicitly store that Absorption only has 98
values? If we actually need that number, we can easily calculate it from
the offsets. Since both offsets are inclusive, it is the sum of
(endOffset - startOffset + 1) over all ValueSets of the Vector, i.e.
(1 - 0 + 1) + (99 - 4 + 1) = 98 for Absorption. Adding the
Vector.length attribute would not increase the expressive power and
would add another point where a file could become inconsistent, making
validation more difficult.
The same argument explains why a length attribute in the *ValueSet
elements would not be beneficial. Here, the number of values is even
easier to calculate (endOffset - startOffset + 1).
Consequently, I would suggest keeping the VectorSet.length attribute,
defined as the number of data points.
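The offset arithmetic above can be sketched in Python. The class and function names below are hypothetical, and treating overlapping ValueSets within one Vector as an error is an assumption, not something stated in the thread; the example reuses the Wavelength/Absorption vectors from the drawing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ValueSet:
    # Inclusive offsets into the VectorSet's data-point index range.
    start_offset: int
    end_offset: int

    def count(self) -> int:
        # Both offsets are inclusive, so 0..99 holds 100 values.
        return self.end_offset - self.start_offset + 1


@dataclass
class Vector:
    name: str
    value_sets: List[ValueSet]

    def count(self) -> int:
        # Derived on demand, so no Vector.length attribute is needed.
        return sum(vs.count() for vs in self.value_sets)


def check_vector_set(length: int, vectors: List[Vector]) -> List[str]:
    """Return a list of problems; an empty list means the VectorSet is consistent."""
    problems = []
    for v in vectors:
        covered = set()
        for vs in v.value_sets:
            if not 0 <= vs.start_offset <= vs.end_offset < length:
                problems.append(
                    f"{v.name}: offsets {vs.start_offset}-{vs.end_offset} "
                    f"outside 0-{length - 1}"
                )
                continue
            indices = range(vs.start_offset, vs.end_offset + 1)
            if covered.intersection(indices):  # assumed to be illegal
                problems.append(f"{v.name}: overlapping ValueSets")
            covered.update(indices)
    return problems
```

With the example above, Absorption counts 2 + 96 = 98 values while Wavelength counts 100, and both fit within a VectorSet.length of 100.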
I look forward to seeing you all again ("virtually") tomorrow. :-)
Best wishes,
Burkhard
|
|
From: Burkhard S. <b_...@us...> - 2004-12-15 22:13:49
|
Hi everybody, I know Gary already sent this out to some of you, but if you didn't receive it: Here's a link to a pre-print version of a paper that provides a summary of AnIML, along with its design goals and some usage scenarios. http://appserver.bubusoft.com/animl/JALA_Article.pdf I've received permission from the editor to share it with this group. Best regards, Burkhard |
|
From: Mark F. B. <sa...@co...> - 2004-12-14 16:01:12
|
I shouldn't have called it AnIML 2 but rather AnIML alt. Please excuse my forwardness in the naming. I thought I had been updating the models on CVS, but as there is a several-hour lag between updates, I only recently found out I was not doing it correctly. Anyway, there are now updates in the Schema section and a Documentation update for Maren. The AnIML 2.0.xsd does not correspond to the updated model and should only be used to get the gist of my proposed alternate structure. The best place to view the model is the graphic image AnIML2.gif, which should be viewed in a capable image viewer (if you use Internet Explorer, be sure to turn Full Screen on). I will email my conference presentation to attendees tonight and put the files in CVS as well so others can see. Mark |
|
From: <Mar...@wa...> - 2004-12-13 10:06:27
|
Hi Mark,
> as Karen points out
Maren.
> I intend to make the following revisions to AnIML 1.03 to accommodate Mark
> Mullins' needs and call it AnIML 1.05; the rationale follows the changes.
Please make the changes to 1.04. Otherwise, we will lose track of which
version came from what.
> (1) Moved the length attribute from VectorSet to EncodedValueSet - addresses
> Mark's point 2
We need the length attribute for AutoIncrementedValueSet as well (the
Increment is added (length) times to the StartValue).
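Assuming, as in Mark Mullins' point 2c, that length is the number of values in the AutoIncrementedValueSet, expanding one could look like the Python sketch below. The parameter names are paraphrased from the thread and the function is hypothetical.

```python
from typing import List


def expand_auto_incremented(start_value: float, increment: float, length: int) -> List[float]:
    """Expand an AutoIncrementedValueSet into explicit values.

    Produces `length` values: start, start + inc, ..., start + (length - 1) * inc.
    Multiplying the index instead of adding the increment repeatedly avoids
    accumulating floating-point rounding error over long vectors.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return [start_value + i * increment for i in range(length)]
```

Without the length attribute there would be no way to know where the expansion stops, which is Maren's point.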
> (3) Added an Operation type with values {add, subtract} and used that type as
> a new attribute of Reference called "operation" - addresses a prior point of
> my own
I couldn't find the rationale for this one. What is this intended for?
> (2) --------------------- AutoIncrementedValueSet and EncodedValueSet now
> bounded 0 to 1
> To resolve the above difficulties, I changed AutoIncrementedValueSet and
> EncodedValueSet from unbounded to 0 to 1 but left IndividualValueSets
> unbounded. This would force us to adopt a different, clearer, but less
> compact approach to segmented chromatograms - using EncodedValueSets for
> both the Time and the Intensity Vectors.
I don't get how this resolves the problem of orderliness (or lack thereof).
On the other hand, I would expect it to be very easy for software to
validate whether a file has matching numbers of vectors in a Page, so I
don't think this is a necessary change.
Maren.
Best regards
Dr. Maren Fiege
Product Manager
--------------------------------------------------------------
Waters Informatics
Europaallee 27, D-50226 Frechen, Germany
Tel. +49 2234 9207 - 0 Fax. +49 2234 9207-99
Reply to: mar...@wa...
http://www.creonlabcontrol.com
http://www.watersinformatics.net
--------------------------------------------------------------
The information in this email is confidential, and is intended solely for the addressee(s). Access to this email by anyone else is unauthorized and therefore prohibited. If you are not the intended recipient you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited and may be unlawful.
|
|
From: Mark F. B. <sa...@co...> - 2004-12-13 02:15:15
|
Hi again
I have uploaded to CVS Mark Mullins' 1.04 schema and his xml sample (which,
as Karen points out, did not implement the Chromatography technique so is
not yet valid AnIML; still, it is useful to illustrate his intent). I have
given enough thought to his proposed changes to suggest we find an alternate
solution to his points 1 and 2.
I intend to make the following revisions to AnIML 1.03 to accommodate Mark
Mullins' needs and call it AnIML 1.05; the rationale follows the changes.
(1) Moved the length attribute from VectorSet to EncodedValueSet - addresses
Mark's point 2
(2) Changed AutoIncrementedValueSet and EncodedValueSet from unbounded to 0
to 1
(3) Added an Operation type with values {add, subtract} and used that type as
a new attribute of Reference called "operation" - addresses a prior point of
my own
There is much more, but I will send it in separate emails...
regards, Mark Bean
(1) --------------------- VectorSet.length attribute (Mark Mullins' point 2)
Apparently there is chromatography data in existence that is discontinuous
in time, for example 0-5 min and 10-25 min. This was the source of some
of the concerns that motivated Mark Mullins (also of SSI) to make his changes.
I am assured by the president of SSI (makers of EZChrome) that this is a
rare but real situation, perhaps arising from a single vendor.
AnIML was originally written to permit multiple data containers:
ExperimentSteps, Pages, Vectors, and ValueSets. Enclosing these collections
are the non-data containers ExperimentStepSet ("MeasurementData"), PageSet,
and VectorSet, but there is no ValueSetSet. Only one of the non-data
collections has a "length" attribute: VectorSet. The schema itself does not
say what this means, but according to Dominik Poetz's documentation, it is
not the number of Vectors but rather "how many elements a vector is supposed
to have", which clearly assumes that all vectors in the VectorSet will have
the same length.
Before I discuss Mark Mullins' proposal, I should elaborate on why I think
VectorSet, of all four possible non-data collections, is the only one to
have a length attribute. Length (or Count) is often useful for
programmers in that it permits dimensioning of arrays prior to reading the
data into them (required in many languages). As Length is not an attribute
of all the collections, Burkhard must have thought that one could obtain the
length of any collection simply by parsing it. What makes Vector different
is the fact that the number of items in an EncodedValueSet base64Binary
array cannot be obtained directly by a parser, so maybe Burkhard added a
length attribute to handle this. Because he assumed that all Vectors would
have the same length, he moved it up into VectorSet (OK, that may be a bit
confusing). Perhaps it should have been named "vectorLength" rather than
"length" to clarify which thing's length it describes.
Now in discontinuous data (described in the first paragraph of this point
above), a VectorSet length attribute cannot describe the number of elements
in a Vector, because the number of elements in the Vector's ValueSets varies
(e.g. 0-5 and 10-25 min). As AnIML allows multiple ValueSets per Vector,
the concept of VectorSet.length is broken.
Mullins proposed adding a length to Vector and AutoIncrementedValueSet, but
I would prefer more consistent usage of length in the collections, and three
options come to mind:
(a) Add length to every collection (set) in AnIML and thus also have to add
a ValueSetSet (collection of ValueSets)
(b) Omit length from VectorSet (and thus from collections)
(c) Move length from VectorSet to EncodedValueSet, under the assumption that
Burkhard's intent was to indicate that EncodedValueSet length is a special
case
The Mullins proposal propagates inclusion of a length attribute
inconsistently in collections and might be said to share an additional
weakness pervasive in AnIML - assumption of orderliness between parallel
elements. In one section of the "AnIML 1.04 lc mockup with errors.animl"
there are two AutoIncrementedValueSets and then two EncodedValueSets
representing the two discontinuous chromatogram segments. That is
currently legal, but so is a situation where the order of the
AutoIncrementedValueSets is not the same as the order of the
EncodedValueSets. This assumption of orderliness also exists between
Vectors in a Page and between Templates and ExperimentSteps among other
places, so it can only be considered further propagation of an existing
weakness.
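If each ValueSet carries its own startOffset, as in Burkhard's 2004-12-16 message on length attributes, the document order of ValueSets stops mattering: segments can be placed by index and the two Vectors joined on that index. The Python sketch below assumes that model; the names are hypothetical, and each segment's endOffset is taken to be implied by its number of values.

```python
from typing import Dict, List, Sequence, Tuple

# A segment is (startOffset, values); endOffset = startOffset + len(values) - 1.
Segment = Tuple[int, Sequence[float]]


def scatter(segments: List[Segment]) -> Dict[int, float]:
    """Place each segment's values at their absolute data-point indices."""
    points: Dict[int, float] = {}
    for start, values in segments:
        for i, value in enumerate(values):
            index = start + i
            if index in points:
                raise ValueError(f"index {index} defined twice")
            points[index] = value
    return points


def pair_by_offset(x_segments: List[Segment], y_segments: List[Segment]) -> List[Tuple[int, float, float]]:
    """Join two Vectors on data-point index, ignoring document order.

    Indices present in only one Vector are dropped (a hole in the data).
    """
    xs, ys = scatter(x_segments), scatter(y_segments)
    return [(i, xs[i], ys[i]) for i in sorted(xs.keys() & ys.keys())]
```

Here the intensity segments can appear in the file in the opposite order from the time segments and the pairing is still correct.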
(2) --------------------- AutoIncrementedValueSet and EncodedValueSet now
bounded 0 to 1
To resolve the above difficulties, I changed AutoIncrementedValueSet and
EncodedValueSet from unbounded to 0 to 1 but left IndividualValueSets
unbounded. This would force us to adopt a different, clearer, but less
compact approach to segmented chromatograms - using EncodedValueSets for
both the Time and the Intensity Vectors. As discontinuous data of this sort
is rare, the impact on file size may not be important. I prefer
constraining the ways we fill AnIML wherever there is a suitable approach
like this. It also resolves the next point.
(3) ------------------------- References, key/keyrefs (including Mark
Mullins' point 3)
Reference
---------
One or more References may exist in a Page as a reference to a data point
or data point range in a superordinate Page with attributes signableItem,
name, VectorID, index, and refWidth. A PDA UV-vis spectrum on one page can
refer to a particular index in an associated (derived) UV summed-absorbance
chromatogram. A mass spectrum on one page can refer to a particular index
and the refWidth number of points (a data range) in an associated (derived)
total-ion-current chromatogram where the width represents a summation of
spectra.
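A Reference with index and refWidth can be resolved roughly as in the Python sketch below. The thread does not say whether the range starts at index or is centered on it; this sketch assumes it starts at index and covers refWidth points. The function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def referenced_range(items: Sequence[T], index: int, ref_width: int = 1) -> Sequence[T]:
    """Slice of the superordinate Vector that a Reference points at.

    Assumption: the range starts at `index` and spans `ref_width` points.
    """
    if ref_width < 1 or index < 0 or index + ref_width > len(items):
        raise IndexError("Reference falls outside the superordinate Vector")
    return items[index:index + ref_width]


def summed_spectrum(spectra: Sequence[Sequence[float]], index: int, ref_width: int) -> List[float]:
    """Sum, point by point, the spectra behind a TIC range.

    `spectra[i]` is assumed to be the mass spectrum recorded at
    chromatogram index i, with all spectra on the same m/z axis.
    """
    chosen = referenced_range(spectra, index, ref_width)
    return [sum(column) for column in zip(*chosen)]
```

A refWidth of 1 reduces to the single-point case, such as a PDA UV-vis spectrum referring to one index of its chromatogram.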
(4) ------------------------- ParameterCategorySet added under
MeasurementData (Mark Mullins' point 1)
This is a reasonable change; nevertheless, we need to explore Templates more
closely (next email) as their references are not well made.
.................................
.......from Mark's email........
1. Added a ParameterCategorySet node under the MeasurementData node.
This allows custom parameters to be added that describe the measurement data
itself.
2. Modification of "length" attribute information inside of the VectorSets.
a. Changed the definition of the "length" attribute on the VectorSet
node to describe the number of Vectors contained in the VectorSet.
b. Added "length" attribute to the Vector node. This will describe
the number of ValueSets in the Vector.
c. Added "length" attribute to the AutoIncrementedValueSet node. This
will describe the number of values in this AutoIncrementedValueSet.
These changes are necessary to specify the lengths of the individual items,
allowing for each of the individual items to contain any number of subitems.
This will give you the ability to have a VectorSet that contains multiple
vectors, each having a different length.
3. Added a References node under the IndividualValueSet, EncodedValueSet,
and AutoIncrementedValueSet nodes.
This gives the ability for an individual set of values to be related back to
Vector in a Superordinate page. Previously, only the entire VectorSet could
be related to a Superordinate Vector. To accommodate this change for the
EncodedValueSet node, the definition of this node had to be changed to
contain a sub-element node called "Values".
|
|
From: <Mar...@wa...> - 2004-12-10 10:10:15
|
Hi everybody,
I found out that the current AnIML technique schema does not validate in XMLSpy 2005 either. The following errors occur:
- Line 505: "Attribute 'name' is not allowed for element 'xs:simpleType' because it is prohibited by complexType 'localSimpleType'." I fixed this by removing the name attribute.
- Line 579: "Element Declaration 'ParameterCategoryBlueprint' repeatedly occurs in the same content model but does not refer to a top-level Type Definition." Fixed by removing the sequence and making the choice unbounded.
- Line 305: "<xs:element ref='ParameterBlueprint'> would make the content model non-deterministic." Fixed by removing the sequence and making the choice unbounded.
The good thing is that this didn't break the existing technique definitions. I uploaded the corrected file as "animl-technique 1.01.xsd".
Best regards
Dr. Maren Fiege |
|
From: <Mar...@wa...> - 2004-12-10 09:26:30
|
Hello Mark,
> I noticed that you saved this same information - just down 1 level
> under the ExperimentStep node. This is also where you placed the
> Method information. This is fine and will work, however the problem
> I have with this is just the overall size of the resulting AnIML
> file.
Any repeating information should be put into a Template and referenced. You will then only have to restate the items that changed.
Best regards
Dr. Maren Fiege |
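The Template approach Maren describes amounts to storing shared parameters once and letting each ExperimentStep restate only what differs. A minimal Python sketch, with hypothetical names and flat string-valued parameters standing in for real ParameterCategories:

```python
from typing import Dict, List

# Hypothetical flat model: parameter name -> value.
Params = Dict[str, str]


def effective_parameters(template: Params, restated: Params) -> Params:
    """Parameters a step actually uses: the template's values, with
    anything the step restates taking precedence."""
    merged = dict(template)
    merged.update(restated)
    return merged


def stored_size_kb(template_kb: float, restated_kb: List[float]) -> float:
    """Approximate storage when the template is written once and each
    step stores only what it restates."""
    return template_kb + sum(restated_kb)


def inline_size_kb(template_kb: float, steps: int) -> float:
    """Approximate storage when every step repeats the full parameter set."""
    return template_kb * steps
```

With the figures from Mark Mullins' message, about 250 KB of shared configuration repeated over a 100-line sequence is roughly 25,000 KB inline, versus 350 KB if the template is stored once and each line restates about 1 KB of changes (the 1 KB figure is illustrative only).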
|
From: Mark F. B. <sa...@co...> - 2004-12-10 06:22:22
|
I know this is a bit of a side track, but I have roughed out all of Burkhard's AnIML in the form of XML Schema data tables and relations. I added various tables to permit Techniques to restrict the available types. I added a Selection table, which replaces References and the need for hierarchical Pages. I added a many-to-many breaking table between Sample and Experiment as needed.

The process of doing this is worth discussing too:
(1) Create the visual diagram in Visio. Use Visio to validate this model.
(2) Create a Microsoft Access (or other) database directly from Visio.
(3) In Visual Studio, connect to the database and drag the tables into the XML Schema designer to create the XML Schema automagically.

The entire process is done graphically and creates the diagram, the database, and the XML Schema, as well as a strongly typed DataSet, which is really the Schema plus a ton of auto-generated code allowing one to transport information as XML into and out of the schema, and into and out of databases. It took less than 2 hours total.

On the CVS under Schema are:
- the Microsoft Visio database diagram (sorry, you need the Enterprise version to load this)
- a gif of the diagram for those who don't have Visio (!!AT LEAST LOOK AT THIS!!)
- a Microsoft Access AnIML database created directly from Visio
- the AnIML 2.xsd schema, which was created automatically within Microsoft Visual Studio .NET as a typed dataset.

It is rough but complete. It demonstrates by example that AnIML can be expressed MUCH more simply and still get nearly all the same features, plus some new ones that I considered worthwhile. There are 17 tables, 6 of which I added. In short, AnIML can be expressed succinctly in a handful of simple, relational tables. One possible advantage of this approach is that nearly all application developers understand database models, while a much smaller group understands XML Schema. Nevertheless, it is a completely valid Schema (as per XMLSpy and Visual Studio).
Again, please look at AnIML2.gif in CVS under Schema.
Mark |
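To illustrate the relational style Mark describes, here is a small sketch using Python's built-in sqlite3 module instead of Access. The three tables and their columns are hypothetical and are not the 17 tables in AnIML2.gif; the point is that keys and relationships are declared as constraints, and a Vector's length can be derived by a query rather than stored.

```python
import sqlite3

# Hypothetical tables, for illustration only.
SCHEMA = """
CREATE TABLE Page (
    page_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE Vector (
    vector_id INTEGER PRIMARY KEY,
    page_id   INTEGER NOT NULL REFERENCES Page(page_id),
    name      TEXT NOT NULL
);
CREATE TABLE ValueSet (
    valueset_id  INTEGER PRIMARY KEY,
    vector_id    INTEGER NOT NULL REFERENCES Vector(vector_id),
    start_offset INTEGER NOT NULL,
    end_offset   INTEGER NOT NULL CHECK (end_offset >= start_offset)
);
"""


def build_example() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Page VALUES (1, 'UV/VIS')")
    conn.executemany(
        "INSERT INTO Vector VALUES (?, 1, ?)", [(1, "Wavelength"), (2, "Absorption")]
    )
    conn.executemany(
        "INSERT INTO ValueSet VALUES (?, ?, ?, ?)",
        [(1, 1, 0, 99), (2, 2, 0, 1), (3, 2, 4, 99)],
    )
    return conn


def vector_lengths(conn: sqlite3.Connection) -> dict:
    # Length is computed from inclusive offsets, not stored as an attribute.
    rows = conn.execute(
        """SELECT v.name, SUM(s.end_offset - s.start_offset + 1)
           FROM Vector v JOIN ValueSet s ON s.vector_id = v.vector_id
           GROUP BY v.vector_id ORDER BY v.vector_id"""
    ).fetchall()
    return dict(rows)
```

An insert that points a ValueSet at a nonexistent Vector is rejected by the foreign-key constraint, which is the kind of relationship checking Mark wants spelled out in one place.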
|
From: Mark F. B. <sa...@co...> - 2004-12-08 04:19:28
|
I have put a very preliminary draft of my AnIML Schema Documentation on the SourceForge CVS for AnIML. It should appear here when the servers decide to publish it: http://cvs.sourceforge.net/viewcvs.py/animl/documents/ and is called "AnIML Core Schema Documentation.doc", even though it should also include the Technique Schema. What I would like you to consider is whether this draft is headed in the right direction (it is a LOT of work).

I took the opposite tack to Maren and decided that the term definitions were fundamental and that the hierarchical relationship between the elements was already described in the schemas themselves, so I included some XMLSpy snapshots in an Appendix (nonmandatory). Even those are incomplete, and I think they clearly show why newcomers find AnIML overwhelming.

As an illustration of what I have meant by a restructuring of AnIML: I have also sent up to the CVS an incomplete picture of another approach, in which the XML Schema perfectly mimics database tables and their relationships and constraints. I would attach it if I could, but go look here (AnIML 2.xsd, or AnIML2.gif if you prefer): http://cvs.sourceforge.net/viewcvs.py/animl/schema/

It looks really different, doesn't it!
a. It is really clear where your data goes.
b. There are no empty elements (at least not yet, as I have not wrapped my head around digital signatures, sorry!).
c. There is provision from the start for incorporation of database keys.
d. Relationships are all spelled out in the constraint section instead of by parsing endless Russian-doll (and even recursive!) hierarchies.
e. The data can slide into and out of databases amazingly easily. An AnIML database might look a lot like this. Some programmatic tools allow SQL queries of structures like this.
f. It is strongly typed.
g. Extensions are possible by adding tables, as long as they reference into the core and don't require any references out from the core.

I talked about this at our first ASTM meeting and am not sure all reference requirements are yet in place. I am sure other intriguing possibilities are out there.
regards, Mark |
|
From: Mark M. <Mar...@sc...> - 2004-12-07 17:42:46
|
Hello Maren,
I really appreciate the time you have taken to look through this. The following are my comments:
1. In the ParameterCategorySet node under the MeasurementData node, we would be placing general information about the sequence run that is being performed, for example the creator of the sequence, the date of creation of the sequence, etc.
I noticed that you saved this same information, just down one level, under the ExperimentStep node. This is also where you placed the Method information. This is fine and will work; however, the problem I have with this is the overall size of the resulting AnIML file. For example, our general sequence information, method configuration, instrument configuration, and injector configuration data could very easily account for 200K - 300K worth of data. This data will be the same for every line of the sequence. So if we have a 100-line sequence, we could easily end up with an AnIML file that is an extra 20MB in size. After we have generated our AnIML files, we need to store them in a database, so we would like to keep them as small as possible.
2. No comments.
3. I see what you have done with the storage of the vectors. Instead of using sub-PageSets, you have used sub-ExperimentSteps. I like this because you get the added advantage of additional ParameterCategorySets and the ability to specify technique files.
My only comment here is on the last channel in the file (Channel_4). Here you have an ExperimentStepSet with 2 ExperimentSteps (Channel_4_Chrom and Channel_4_UV). I believe that the 2nd ExperimentStep must be a sub-ExperimentStep of the Channel_4_Chrom ExperimentStep. This is because of the References: they must refer to a Vector in a superordinate page (not a sibling page). Otherwise, you will not be able to determine which Vector they are related to, as all of the siblings are allowed to have Vectors with the same ID.
Again, thank you so much for taking the time to look over this.
Best regards,
Mark Mullins
-----Original Message-----
From: Mar...@wa... [mailto:Mar...@wa...]
Sent: Tuesday, December 07, 2004 8:06 AM
To: Mark Mullins
Cc: ani...@li...; ani...@li...
Subject: Re: [Animl-develop] Proposal for changes to AnIML core schema |
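Mark Mullins' rule that a Reference must point into a superordinate page can be sketched in Python as a walk up the chain of parent pages. The Page class is a hypothetical in-memory model, not AnIML's; only ancestors are searched, which is what keeps the lookup unambiguous when siblings reuse Vector IDs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    # Hypothetical model: a page owns Vectors keyed by ID and may be
    # nested inside a superordinate (parent) page.
    name: str
    vectors: Dict[str, List[float]] = field(default_factory=dict)
    parent: Optional["Page"] = None


def resolve_reference(page: Page, vector_id: str) -> Page:
    """Find the nearest superordinate page that defines `vector_id`.

    Siblings are never searched: they may reuse the same Vector IDs,
    so a sibling lookup could not tell which Vector was meant.
    """
    current = page.parent
    while current is not None:
        if vector_id in current.vectors:
            return current
        current = current.parent
    raise KeyError(f"no superordinate page defines Vector {vector_id!r}")
```

In the Channel_4 example, nesting Channel_4_UV under Channel_4_Chrom gives the UV step a parent to search; as siblings, it would have none.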
|
From: <Mar...@wa...> - 2004-12-07 16:06:49
|
Hello Mark,
> 1. Added a ParameterCategorySet node under the MeasurementData node.
> This allows custom parameters to be added that describe the
> measurement data itself.
This would probably be incompatible with the current technique definitions. What kind of parameters would you put there?
> 2. Modification of "length" attribute information inside of the VectorSets.
These modifications are fine with me. It looks like a good way to accommodate discontinuous data.
> 3. Added a References node under the IndividualValueSet,
> EncodedValueSet, and AutoIncrementedValueSet nodes.
OK.
> I have also attached a sample XML file that illustrates the above
> hierarchy with some fictional data. Note that in the VectorSets I
> have been able to store the following types of multi-channel
> chromatographic data:
The sample file you sent is not using AnIML as intended (although I admit that getting the intention is hard with no documentation available...). I cannot see any use of the technique definition either. Please find attached a revised version of your sample file.
(See attached file: animl-core 1.04_MF.xml)
I urgently ask the other readers to participate in this discussion!
Best regards
Dr. Maren Fiege |
|
From: Mark F. B. <sa...@co...> - 2004-12-07 14:22:43
|
We have made good progress on AnIML, but I think it may be worth a short
pause for a major structural review before we push ahead to release AnIML
1.0. We have too little XML Schema expertise, and especially too few people
who combine expertise in analytical chemistry, XML Schema, and programming. It might be
worthwhile to ask Murray Rust for an opinion. Remember that XML instances
may never exist as documents, only as byte streams across a network as
information moves from its container to a viewer. I would not enjoy writing
a fully-functional AnIML viewer based on our current schema.
Suggestions for Improvement of the AnIML Schema
1. Improve database table to schema translation, perhaps using simple
data table schema and relations instead of hierarchical encoding. Add
database keys (long or int) to every table as primary key. I can provide an
example alternate schema constructed this way.
2. AnIML is too complex. It is nearly impossible to grasp it without
XMLSpy, and even then it is frustratingly difficult. Consider trimming down
to what we know we need, permitting core extension.
a. Remove recursive nesting of PageSet, ExperimentStepSet, and
ParameterCategorySet until it is proven necessary.
b. Remove empty containers that confuse and complicate an already
complicated schema. These are simply containers of 1 to many containers
which themselves hold no information. However, as many of these have
SignableItems, another approach to signing might be needed. Needs thought.
This would trim the number of elements from 45 to 33, but some of these
appear multiple times or recursively, so the effect would be greater than it
appears.
i. AnIML
ii. SampleSet: container of Sample
iii. ParameterCategorySet: container of ParameterCategory
iv. ParameterSet: container of Parameter
v. Template: simply add a flag, isTemplate, to ExperimentStep (either wrong or confused in the schema, as it derives by extension)
vi. SamplesUsed: container of Sample
vii. References: container of IndexRef
viii. ExperimentStepSet and possibly MeasurementData: container of ExperimentStep
ix. PageSet: container of Page
x. VectorSet: container of Vector
xi. AuditTrail (or LogEntry): container of LogEntry
xii. Signatures: container of Signature
c. Consider removing attributes and moving their information into complexTypes.
3. Make AnIML less flexible in the ways it can be filled, or there will
be 10 ways to insert chromatographic data. Use stronger typing.
4. Make the method of extension much clearer, providing examples in the
Specification.
5. Consider reworking referencing for so-called hierarchical data,
making it easier to get the point index and the point value, as well as
pointRanges, pointValueRanges, etc., for spectra from a chromatogram (for
example).
6. Add information on how to validate, certify and test AnIML
extensions.
7. Clarify the rationale for using Technique XML files instead of
Technique schema extensions of the core; it is currently not recorded
anywhere.
8. Trim JCAMP from Technique and create a JCAMP extension.
9. Add filename and file URL references if they are not already there
somewhere (I can't find them).
10. The data type of values encoded in Base64 is not specified; there is
currently no way to know what we should decode our Base64 into (float32,
float64, etc.).
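A short sketch of item 10 in Python, using only the standard library. It makes two assumptions that AnIML does not specify: the byte order is little-endian, and the helper name `decode_base64` is hypothetical. Once the element type is known, the number of values follows directly from the byte count. If the type is guessed wrongly, the same bytes decode into different numbers.

```python
import base64
import struct


def decode_base64(text: str, fmt: str) -> list[float]:
    """Decode Base64 text into numbers of one struct format character.

    fmt is "f" (float32) or "d" (float64); little-endian ("<") is an
    assumption here, since the schema does not say which byte order to use.
    """
    raw = base64.b64decode(text)
    size = struct.calcsize("<" + fmt)
    if len(raw) % size != 0:
        raise ValueError(f"{len(raw)} bytes is not a multiple of {size}")
    count = len(raw) // size  # the length falls out of the byte count
    return list(struct.unpack(f"<{count}{fmt}", raw))


# Four float64 values, packed little-endian and Base64-encoded.
payload = base64.b64encode(struct.pack("<4d", 1.0, 2.0, 3.0, 4.0)).decode("ascii")

as_float64 = decode_base64(payload, "d")  # the original four values
as_float32 = decode_base64(payload, "f")  # eight numbers, none of them the originals
```

Unless the type (and byte order) is recorded next to the encoded data, a reader cannot tell which of these two results is correct.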
I suspect with help we could amplify this list,
regards, Mark
|
|
From: Mark F. B. <sa...@co...> - 2004-11-27 20:27:24
|
I have placed a PDF of the scanned pages of the Poetz-Kramer documentation on the AnIML CVS under Documentation. Because it had to be scanned at 400 dpi, it is also huge (26 MB or so), so it may take a little while to appear. http://cvs.sourceforge.net/viewcvs.py/animl/documents/?only_with_tag=MAIN
I also note that I was incorrect about VectorSet.Length. The documentation states: "An important attribute in the "VectorSet" is the "length" attribute that determines how many elements a vector is supposed to have." My apologies; still, my rationale may hold for whichever length or count we adopt. |
|
From: Mark F. B. <sa...@co...> - 2004-11-27 16:56:13
|
Hi again (Mark Bean here from home)
I think it is time for a phone conference, ASAP please. This note documents in detail the changes Maren and I made (below). Please search for HELP if you are conversant with XML Schema. At this point we are lacking needed expertise, and the HELP tags mark areas where we have questions. Maren's 1.3 version validates in VisualStudio.Net and XMLSpy.
(1) There is a proposal on the board to allow multiple VectorSets. No answer yet from Mark Mullins or anyone else as to whether that is a good option.
(2) Before that, Vector.Length. There is none! I remember in our discussions that we felt we should omit lengths unless they were needed, and I felt pretty sure that any Base64 conversion algorithm would provide a length for array dimensioning. Here is the C# conversion code:
// Decode a GAML-type XData or YData Base64 string to array of doubles
// (ByteToDouble is a separate helper, not shown here)
public double[] B64ToDblArr(string crdString)
{
    return ByteToDouble(Convert.FromBase64String(crdString));
}
As you can see, the array is dimensioned and filled without needing to know the length. I don't know if other algorithms work the same way, but it is possible. What we wanted to avoid was having to parse the XML file to find the length and then reparse it to obtain the values. As a result, VectorSet has Length but Vector does not.
(3) LCMS reports have summed spectra from multiple regions. Maren suggested using a timeline with the first data point of each region. I had actually thought to use the "peak retention time", but am not sure that is generally applicable. What do you think?
(4) AnIML Schema Changes up to version 1.3
Line numbers in v. 1.0 in brown. 1.03 version in red.
lines 2 MF (unnecessary XMLSPY entries) <!-- edited with XMLSPY v5 rel.
4 U (http://www.xmlspy.com) by Maren Fiege (Creon·Lab·Control AG) --> <!-- edited with XMLSpy v2005 U (http://www.xmlspy.com) by Maren Fiege (Waters) -->
line 316 MF (HELP: no explanation yet - see letter of Nov 26) removed the following from v. 1.0: <xs:keyref name="vectorIDRef" refer="vectorID"> <xs:selector xpath="."/> <xs:field xpath="@vector"/> </xs:keyref>
line 606 (MFB: XMLSpy validation error: "Not a valid restriction since its "maxOccurs" value <unbounded> is greater than "1".) This is because the base complex type "ParameterValueType" has no such maxOccurs. The rule is that in order to define by restriction from a base complex type, one must only restrict, not extend. I fixed this by setting the base xs:choice to unbounded as well. <xs:choice> changed to <xs:choice maxOccurs="unbounded">
----------------------------------------------------------------------------
line 165 MB HELP. I don't think my fix is quite right yet. (XMLSpy: "The complex type "no name" has multiple Attribute Definitions (e.g. "templateid") whose type definitions is derived from "ID"). ID attributes are required to be unique. I believe the problem here is that the attributeGroup "SignableItem" has an attribute whose name is ID and type is xs:ID, and that this Template element has this group by reference as well as another attribute of the same type. "symbol space for unique IDs is the entire document, while for unique keys it is the target scope of the XPath. This is particularly useful if uniqueness is needed in two overlapping value spaces with different scopes in the same XML document. An example of this would be an XML document that contained room numbers and table numbers for a hotel."
added to element name Template: <xs:unique name="templateIdKey"> <xs:selector xpath="."
/> <xs:field xpath="@templateId" /> </xs:unique>
and also line 144 MB <xs:attribute name="templateUsed" type="xs:IDREF" use="optional"/> changed to <xs:attribute name="templateUsed" type="xs:string" use="optional" />
and also line 162 MB <xs:attribute name="templateId" type="xs:ID" use="required"/> changed to <xs:attribute name="templateId" type="xs:string" use="required" />
----------------------------------------------------------------------------
MAJOR CHANGES TO derivation by restriction (ExperimentStepType and TemplateType). The problem is that a type derived by restriction must repeat the original tags (it may omit only optional ones) and add no others; it can only restrict the original value ranges. Derivation by extension might be better, but I couldn't get it to work, so I opted simply to create a new <xs:complexType name="Technique">. HELP needed to derive by extension.
----------------------------------------------------------------------------
line 450...459 removed the following from v. 1.0 <xs:keyref name="sampleIDUsage" refer="sampleID"> <xs:selector xpath="."/> <xs:field xpath="@sampleID"/> </xs:keyref> and moved it outside the ComplexType but still inside the SamplesUsed element: <xs:keyref name="sampleIDUsage" refer="sampleID"> <xs:selector xpath=".//SamplesUsed" /> <xs:field xpath="SampleRef/@sampleID" /> </xs:keyref>
<xs:complexType name="ExperimentStepType"> line 491... MFB commented out derivation by extension (note that <!-- ....
--> comments out all enclosed XML) <!--xs:element name="Technique" minOccurs="0"> <xs:annotation> <xs:documentation>Reference to Technique used in this Experiment.</xs:documentation> </xs:annotation> <xs:complexType> <xs:sequence> <xs:element name="Extension" minOccurs="0" maxOccurs="unbounded"> <xs:annotation> <xs:documentation>Reference to Extension to amend Technique.</xs:documentation> </xs:annotation> <xs:complexType> <xs:attribute name="uri" type="xs:anyURI" use="required"> <xs:annotation> <xs:documentation>URI where Extension file can be fetched.</xs:documentation> </xs:annotation> </xs:attribute> <xs:attribute name="name" type="xs:token" use="required"> <xs:annotation> <xs:documentation>Name of Extension to be used. Must match Name given in Extension Definition file.</xs:documentation> </xs:annotation> </xs:attribute> </xs:complexType> </xs:element> </xs:sequence> <xs:attributeGroup ref="SignableItemWithName"/> <xs:attribute name="uri" type="xs:anyURI" use="required"> <xs:annotation> <xs:documentation>URI where Technique Definition file can be fetched.</xs:documentation> </xs:annotation> </xs:attribute> </xs:complexType> </xs:element--> <xs:complexType name="TemplateType"> line 496...MFB commented out derivation by extension same as above <!--xs:element name="Technique" minOccurs="0"> <xs:annotation> <xs:documentation>Reference to Technique used in this Experiment.</xs:documentation> </xs:annotation> <xs:complexType> <xs:sequence> <xs:element name="Extension" minOccurs="0" maxOccurs="unbounded"> <xs:annotation> <xs:documentation>Reference to Extension to amend Technique.</xs:documentation> </xs:annotation> <xs:complexType> <xs:attribute name="uri" type="xs:anyURI" use="required"> <xs:annotation> <xs:documentation>URI where Extension file can be fetched.</xs:documentation> </xs:annotation> </xs:attribute> <xs:attribute name="name" type="xs:token" use="required"> <xs:annotation> <xs:documentation>Name of Extension to be used. 
Must match Name given in Extension Definition file.</xs:documentation> </xs:annotation> </xs:attribute> </xs:complexType> </xs:element> </xs:sequence> <xs:attributeGroup ref="SignableItemWithName"/> <xs:attribute name="uri" type="xs:anyURI" use="required"> <xs:annotation> <xs:documentation>URI where Technique Definition file can be fetched.</xs:documentation> </xs:annotation> </xs:attribute> </xs:complexType> </xs:element--> line 544 (remaining close tags also commented out - some are extraneous) <!--xs:complexContent--> <!--xs:restriction base="ExperimentStepType"--> <!--/xs:restriction--> <!--/xs:complexContent--> line 893 MFB Added <xs:complexType name="Technique"> <xs:complexType name="Technique"> <xs:annotation> <xs:documentation>Reference to Technique used in this Experiment.</xs:documentation> </xs:annotation> <xs:sequence> <xs:element name="Extension" minOccurs="0" maxOccurs="unbounded"> <xs:annotation> <xs:documentation>Reference to Extension to amend Technique.</xs:documentation> </xs:annotation> <xs:complexType> <xs:attribute name="uri" type="xs:anyURI" use="required"> <xs:annotation> <xs:documentation>URI where Extension file can be fetched.</xs:documentation> </xs:annotation> </xs:attribute> <xs:attribute name="name" type="xs:token" use="required"> <xs:annotation> <xs:documentation>Name of Extension to be used. Must match Name given in Extension Definition file.</xs:documentation> </xs:annotation> </xs:attribute> </xs:complexType> </xs:element> </xs:sequence> <xs:attributeGroup ref="SignableItemWithName" /> <xs:attribute name="uri" type="xs:anyURI" use="required"> <xs:annotation> <xs:documentation>URI where Technique Definition file can be fetched.</xs:documentation> </xs:annotation> </xs:attribute> </xs:complexType> ---------------------------------------------------------------------------- ------ end of MAJOR CHANGES TO ... |
|
From: <Mar...@wa...> - 2004-11-26 16:47:08
|
Hi,
I managed to make the sample file validate by changing the core schema some
more (there seemed to be some problems with the keys/keyrefs). I strongly
suggest that someone knowing more about XML have a look at this, though.
I'm not sure I didn't break anything.
The new schema is on CVS ("animl-core 1.03.xsd"). I also updated the
bruker-quinine.animl file to reference this new schema and uploaded it
likewise.
Maren.
|
|
From: <Mar...@wa...> - 2004-11-26 16:32:46
|
Hi Mark,
> I would like to introduce a new developer to the group, Mark Mullins (MM) of
> SSI, who is working on an AnIML implementation. Mark has discovered
> some difficulties with fitting his simple UV chromatography data
> into AnIML, as have I (MB) for report LCMS data.
Welcome!
> Here is my understanding of the AnIML to SSI SaMPL concordance (MM
> has SaMPL xml in grey below)
Forgive my ignorance: What is SaMPL?
> I've verified this and agree that there is a Length attribute in
> VectorSet. However, there is a misunderstanding about what it
> means. I know this because I have Dominic Poetz's documentation
> printout (but cannot email it as I do not have it electronically -
> Gary Kramer may have it). VectorSet.Length actually represents the
> Vector count, not the count for the number of points in each
> contained vector.
In this case, how do we know how many elements a vector has? We need this information for the AutoIncrementedValueSet. Are we supposed to use the StartOffset and EndOffset attributes for this? Maybe you can scan Dominik's text to a PDF you can send. That would certainly help everybody understand things better (plus it would help me a lot in creating technique definitions).
> Case 3:
> Scenario: LCMS Data with standard chromatogram but summed and
> background-subtracted spectral reports (sum and subtract start and
> stop points needed as timeline)
>
> Again we have discontinuous timelines as in Case 1, but here each
> spectrum is a separate Page and should be displayed on separate
> graphs as opposed to Case 1 where the segments should be displayed
> on the same graph. We can have a subordinate PageSet of MS spectra
> but what do we put in the superordinate (top-level) Page? What I
> want is a series of (0-many) summation (start and stop) and (0-many)
> subtraction selections describing each spectrum. The closest thing
> in AnIML is Individual Value Set. I would need one set for
> summation starts, another for summation stops, and a third and
> fourth for subtraction start and stop. However, as there can be 0 -
> many summations or subtractions per spectrum, I believe we begin to
> run into problems identical to Case 1 - a segmented timeline
> representing the collection of start-stop segments used in creating
> each spectrum.
What we could do is create a "timeline" technique containing only one time vector, then "attach" the other pages to this vector. The timeline would only contain the starting points of each measurement. This would also allow us to accommodate different sampling rates.
Maren.
|
|
From: <Mar...@gs...> - 2004-11-23 15:12:51
|
VERY LONG AND DETAILED MESSAGE. This color is MB. This color is MM.
I would like to introduce a new developer to the group, Mark Mullins (MM)
of SSI, who is working on an AnIML implementation. Mark has discovered
some difficulties with fitting his simple UV chromatography data into
AnIML, as have I (MB) for report LCMS data.
Here is my understanding of the AnIML to SSI SaMPL concordance (MM has
SaMPL xml in grey below)
ExperimentStep and DataSource are equivalent - the application of a
technique or data related to a single detector source
PageSet and DataSet are containers only
Page and RawData are equivalent to a chromatogram or spectrum (2 or more
dimensional data)
Axis and Vector are equivalent (AnIML has a Vector container called
VectorSet; SaMPL allows an Axis to have multiple child Axis tags as below
which is a reasonable way to represent independent and dependent
variables. Though less explicit than AnIML, it is more flexible.)
EncodedValueSet (AnIML) and EncodedDataSet (SaMPL) are identical
Case 1:
Scenario: Data acquisition for 10 minutes with 1 second intervals. You
have 30 seconds of missing data points, 3 minutes into the run.
In SaMPL, we represented this with 2 sets of X-Axis vectors as follows:
This is equivalent to multiple Time Vectors and multiple Intensity Vectors
in AnIML.
<Dataset datasetID="1" resultType="chromatogram">
<RawData>
<Axis axisID="1" name="Time" datapoints="180" datatype="float32"
units="sec">
<AutoIncrementDataSet startValue="0"
increment="1.00"></AutoIncrementDataSet>
<Axis axisID="2" name="Intensity" datapoints="180"
datatype="float32" units="millivolts">
<EncodedDataSet>89213074132847847213473...</EncodedDataSet>
</Axis>
</Axis>
<Axis axisID="3" name="Time" datapoints="390" datatype="float32"
units="sec">
<AutoIncrementDataSet startValue="210.0"
increment="1.00"></AutoIncrementDataSet>
<Axis axisID="4" name="Intensity" datapoints="390"
datatype="float32" units="millivolts">
<EncodedDataSet>89213074132847847213473...</EncodedDataSet>
</Axis>
</Axis>
</RawData>
</Dataset>
The problem I am having with AnIML is at the <VectorSet> node. This is
where you specify the 'length' attribute to describe how many datapoints
your vectors will contain. In this example, the vectors each need to
contain a different number of datapoints.
I've verified this and agree that there is a Length attribute in VectorSet.
However, there is a misunderstanding about what it means. I know this
because I have Dominic Poetz's documentation printout (but cannot email it
as I do not have it electronically - Gary Kramer may have it).
VectorSet.Length actually represents the Vector count, not the count for
the number of points in each contained vector. Thus it is OK to have
multiple Vectors with different lengths, just as in SaMPL. Nevertheless,
my inclination is to accommodate segmentation of data like this by allowing
multiple VectorSets. I think all Vectors in a VectorSet should have the
same length.
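For Case 1, here is a minimal Python sketch of what the two SaMPL time axes encode, and of the proposal to split the segments into multiple VectorSets whose Vectors all share one length. It also shows the index-to-value mapping that an auto-incremented set implies (item 5 of the Dec 7 list). `AutoIncrementSegment` and `check_vector_set` are hypothetical names, not AnIML or SaMPL types.

```python
from dataclasses import dataclass


@dataclass
class AutoIncrementSegment:
    """Hypothetical model of one auto-incremented time segment."""

    start: float      # startValue (seconds)
    increment: float  # increment (seconds)
    count: int        # datapoints in this segment

    def value_at(self, index: int) -> float:
        """Time of the point at a zero-based index within this segment."""
        if not 0 <= index < self.count:
            raise IndexError(index)
        return self.start + index * self.increment

    def index_of(self, value: float) -> int:
        """Zero-based index of a time value lying on this segment's grid."""
        index = round((value - self.start) / self.increment)
        if not 0 <= index < self.count:
            raise ValueError(f"{value} is outside this segment")
        return index

    @property
    def end(self) -> float:
        """Time of the last point in the segment."""
        return self.value_at(self.count - 1)


def check_vector_set(vector_lengths: list[int]) -> int:
    """Proposed rule: every Vector in one VectorSet has the same length."""
    if not vector_lengths or len(set(vector_lengths)) != 1:
        raise ValueError(f"vector lengths differ: {vector_lengths}")
    return vector_lengths[0]


# Case 1: 0-179 s recorded, 180-209 s missing, 210-599 s recorded.
seg1 = AutoIncrementSegment(start=0.0, increment=1.0, count=180)
seg2 = AutoIncrementSegment(start=210.0, increment=1.0, count=390)

gap = seg2.start - (seg1.end + seg1.increment)  # seconds of missing data

# One VectorSet per segment: the time and intensity Vectors share a length.
length1 = check_vector_set([seg1.count, 180])
length2 = check_vector_set([seg2.count, 390])
```

Putting both segments in a single VectorSet would mean `check_vector_set([180, 390])`, which this rule rejects.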
Case 2:
Scenario: 3D Data acquisition for 10 minutes with 1 second intervals and
255 wavelengths.
In SaMPL, we represented this with 1 set of vectors as follows:
<Dataset datasetID="1" resultType="chromatogram">
<RawData>
<Axis axisID="1" name="Time" datapoints="600" datatype="float32"
units="sec">
<AutoIncrementDataSet startValue="0"
increment="1.00"></AutoIncrementDataSet>
<Axis axisID="2" name="Wavelength" datapoints="255" datatype="int"
units="nm">
<EncodedDataSet>89213074132847847213473...</EncodedDataSet>
<Axis axisID="3" name="Absorbance" datapoints="255"
datatype="float32" units="absorb units">
<EncodedDataSet>89213074132847847213473...</EncodedDataSet>
<EncodedDataSet>89213074132847847213473...</EncodedDataSet>
<!-- There would be 598 more EncodedDataSets here -->
</Axis>
</Axis>
</Axis>
</RawData>
</Dataset>
The problem I am having is the same as in Case 1 above, since each of the
contained Vectors will have a different length.
I believe the intention in AnIML would be to create a chromatogram Page
with a subordinate PageSet of UV spectra, each spectrum referencing its
superordinate chromatogram Page point.
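Here is a rough Python size estimate for Case 2. It assumes each UV spectrum is stored as one Base64 block of 255 little-endian float32 absorbance values. It also assumes spectrum i refers to chromatogram point i on the 1-second time axis. That plain-index referencing is an assumption and is not AnIML's IndexRef semantics. The helper names are hypothetical.

```python
import base64
import math
import struct

N_TIMES = 600        # 10 minutes at 1-second intervals
N_WAVELENGTHS = 255  # wavelengths per UV spectrum
FLOAT32_BYTES = struct.calcsize("<f")


def base64_length(n_bytes: int) -> int:
    """Characters needed to Base64-encode n_bytes (including '=' padding)."""
    return 4 * math.ceil(n_bytes / 3)


def encode_spectrum(values: list[float]) -> str:
    """Pack one spectrum as little-endian float32 and Base64-encode it."""
    return base64.b64encode(struct.pack(f"<{len(values)}f", *values)).decode("ascii")


def chromatogram_time(index: int, start: float = 0.0, increment: float = 1.0) -> float:
    """Time of the chromatogram point that spectrum `index` is attached to."""
    return start + index * increment


encoded = encode_spectrum([0.0] * N_WAVELENGTHS)

bytes_per_spectrum = N_WAVELENGTHS * FLOAT32_BYTES      # 1020
chars_per_spectrum = base64_length(bytes_per_spectrum)  # 1360
total_values = N_TIMES * N_WAVELENGTHS                  # 153000
```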
Case 3:
Scenario: LCMS Data with standard chromatogram but summed and
background-subtracted spectral reports (sum and subtract start and stop
points needed as timeline)
Again we have discontinuous timelines as in Case 1, but here each spectrum
is a separate Page and should be displayed on separate graphs as opposed
to Case 1 where the segments should be displayed on the same graph. We
can have a subordinate PageSet of MS spectra but what do we put in the
superordinate (top-level) Page? What I want is a series of (0-many)
summation (start and stop) and (0-many) subtraction selections describing
each spectrum. The closest thing in AnIML is Individual Value Set. I
would need one set for summation starts, another for summation stops, and
a third and fourth for subtraction start and stop. However, as there can
be 0 - many summations or subtractions per spectrum, I believe we begin to
run into problems identical to Case 1 - a segmented timeline representing
the collection of start-stop segments used in creating each spectrum.
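Here is a small Python sketch of why four IndividualValueSets in Case 3 run into the Case 1 problem. Once the windows of all spectra are flattened into four lists, they can only be regrouped per spectrum if the per-spectrum counts are stored somewhere. `SpectrumSelection`, `flatten` and `unflatten` are hypothetical names, and the times are illustrative.

```python
from typing import NamedTuple

Window = tuple[float, float]  # (start, stop) of one selection


class SpectrumSelection(NamedTuple):
    """Hypothetical record of the windows used to build one MS spectrum."""

    summed: list[Window]
    subtracted: list[Window]


def flatten(selections):
    """Split all windows into four flat lists, like four IndividualValueSets."""
    sum_starts, sum_stops, sub_starts, sub_stops = [], [], [], []
    for sel in selections:
        for start, stop in sel.summed:
            sum_starts.append(start)
            sum_stops.append(stop)
        for start, stop in sel.subtracted:
            sub_starts.append(start)
            sub_stops.append(stop)
    return sum_starts, sum_stops, sub_starts, sub_stops


def unflatten(flat, sum_counts, sub_counts):
    """Regroup the flat lists per spectrum; needs the per-spectrum counts."""
    sum_starts, sum_stops, sub_starts, sub_stops = flat
    result, i, j = [], 0, 0
    for n_sum, n_sub in zip(sum_counts, sub_counts):
        summed = list(zip(sum_starts[i:i + n_sum], sum_stops[i:i + n_sum]))
        subtracted = list(zip(sub_starts[j:j + n_sub], sub_stops[j:j + n_sub]))
        result.append(SpectrumSelection(summed, subtracted))
        i += n_sum
        j += n_sub
    return result


# Two spectra with 0-many windows each (illustrative times, in minutes).
selections = [
    SpectrumSelection(summed=[(1.20, 1.35)], subtracted=[(1.00, 1.10), (1.40, 1.50)]),
    SpectrumSelection(summed=[(2.10, 2.20), (2.40, 2.55)], subtracted=[]),
]
flat = flatten(selections)
sum_counts = [len(s.summed) for s in selections]      # [1, 2]
sub_counts = [len(s.subtracted) for s in selections]  # [2, 0]

# With the right counts the grouping is recovered; with other counts the
# same flat lists regroup into different spectra.
restored = unflatten(flat, sum_counts, sub_counts)
misread = unflatten(flat, [2, 1], [1, 1])
```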
|
|
From: Mark F. B. <sa...@co...> - 2004-11-18 13:54:27
|
Hi again (Mark Bean writing from home)
Validation of AnIML Schema. I believe that XMLSpy may not be catching all the Schema errors in validation. Our reference point is always http://www.w3.org/XML/Schema. By the way, Schema 1.1 is in the works and open for comment, and those changes may well affect this discussion. The point here is derivation from a complex type by restriction. Here are the W3C definitions:
"[Definition:] A type definition whose declarations or facets are in a one-to-one relation with those of another specified type definition, with each in turn restricting the possibilities of the one it corresponds to, is said to be a restriction. The specific restrictions might include narrowed ranges or reduced alternatives. Members of a type, A, whose definition is a ·restriction· of the definition of another type, B, are always members of type B as well.
[Definition:] A complex type definition which allows element or attribute content in addition to that allowed by another specified type definition is said to be an extension."
See also http://www.w3.org/TR/xmlschema-1/#Complex_Type_Definitions
An example of the types in question is TemplateType: <xs:complexType name="TemplateType"> <xs:complexContent> <xs:restriction base="ExperimentStepType">
If we look at TemplateType and ExperimentStepType, we will see a different number of elements. It is my contention that, as ExperimentStepType has elements not seen in TemplateType, it should be derived by extension from TemplateType, rather than TemplateType being derived from ExperimentStepType by restriction. This may also require changes in the complexContent tagging. I have NOT made these changes and request help in doing so (which means that I tried to do it but it wouldn't validate!)
regards, Mark |
|
From: Mark F. B. <sa...@co...> - 2004-11-18 12:57:01
|
Hi everyone (Mark Bean here writing from home)
Nice to see some of you again at the AnIML business meeting in Somerset. I finally got CVS working today and checked a revised version of the schema into CVS as animl-core 1.01.xsd. I also checked in a new folder under AnIML called base64_converters, and in it added a C#Converters.cs file with a bunch of conversion routines, a few of which are relevant to AnIML and C#.Net implementers.
If you don't have access to the CVS except by browsing, here's how I managed it. I followed a link provided by Maren Fiege, but used TortoiseCVS instead of WinCVS, and I have to recommend TortoiseCVS as easy to use (you check in and out from Windows Explorer). You can get TortoiseCVS from Sourceforge. To install the required client software for using AnIML CVS, follow the instructions here: http://sourceforge.net/docman/display_doc.php?docid=766&group_id=1
I also downloaded the latest enterprise version of XMLSpy and ran into a number of validation errors in the AnIML core schema. The errors I find at work and at home are not the same using Microsoft VisualStudio.Net, but perhaps I have different versions of MSXML. I am also behind a firewall at work, and that probably makes it impossible for the validator to access the schemaLocation="http://www.w3.org/TR/xmldsig-core/xmldsig-core-schema.xsd" for digital signatures. I suspect that if I changed that to a local reference, some of those errors would disappear.
Anyway, here are the errors I found and what I did to "fix" them. Fix is in quotes because I don't really know XML Schema or AnIML and may have created more problems than I fixed. I believe that, except for the first error, I commented out the old tags and added the new tags just before them, so you can see what I did.
1. XMLSpy: Not a valid restriction since its maxOccurs value <unbounded> is greater than 1.
<xs:complexType name="VectorValueType"> <xs:annotation> <xs:documentation>Elements for IndividualValueSets in Vectors.</xs:documentation> <xs:documentation>Elements for values in Vectors.</xs:documentation> </xs:annotation> <xs:complexContent> <xs:restriction base="ParameterValueType"> <xs:choice maxOccurs="unbounded">
This is because the base complex type "ParameterValueType" has no such maxOccurs. The rule is that in order to define by restriction from a base complex type, one must only restrict, not extend. I fixed this by setting the base xs:choice to unbounded as well.
2. XMLSpy: The complex type no name has multiple Attribute Definitions (e.g. templateid) whose type definitions is derived from ID
<xs:element name="Template"> <xs:annotation> <xs:documentation>ExperimentStep templates</xs:documentation> <xs:documentation>Represents a container for ExperementStep templates</xs:documentation> </xs:annotation> <xs:complexType> <xs:complexContent> <xs:extension base="TemplateType"> <xs:attributeGroup ref="SignableItem"/> <xs:attribute name="templateId" type="xs:ID" use="required"/> </xs:extension> </xs:complexContent> </xs:complexType> </xs:element>
ID attributes are required to be unique. I believe the problem here is that the attributeGroup "SignableItem" has an attribute whose name is ID and type is xs:ID, and that this Template element has this group by reference as well as another attribute of the same type. Perhaps we should pay attention to what MSDN has to say about the use of IDs and IDREFs and replace them with KEYs and KEYREFs.
Why you should favor key/keyref/unique over ID/IDREF for identity constraints
DTDs provide a mechanism for specifying that an attribute has a type ID, meaning that its value will be unique within the document and that it matches the Name production in XML 1.0. IDs in XML 1.0 can also be referenced by attributes of type IDREF or IDREFS. For compatibility with DTDs, W3C XML Schema has the xs:ID, xs:IDREF and xs:IDREFS types.
W3C XML Schema identity constraints are used for specifying unique values, keys or references to keys using XPath expressions defined within the scope of an element declaration. Comparing feature for feature, the identity constraint mechanisms offer more than their ID/IDREF counterparts do. For one, there is no limitation on the values or types that can be used as part of an identity constraint, whereas IDs can only be one of a specific range of values (for example, 7 is not a valid ID). A more important benefit of the schema identity constraints over ID/IDREF is that, while the latter have to be unique within the document, the former do not. In other words, the symbol space for unique IDs is the entire document, while for unique keys it is the target scope of the XPath. This is particularly useful if uniqueness is needed in two overlapping value spaces with different scopes in the same XML document. An example of this would be an XML document that contained room numbers and table numbers for a hotel. It is likely that some of the numbers overlap (i.e. there is a room 18 and a table 18), but they should not overlap within either value space.
Note: The W3C XML Schema family of ID types are not exactly compatible with the DTD ID types. For one, the xs:ID, xs:IDREF and xs:IDREFS types can be applied to both elements and attributes in the W3C XML Schema, although they can only apply to attributes in their DTD equivalents. Secondly, there is no restriction on how many attributes of type xs:ID can appear on an element, although such a restriction exists for ID attributes in the DTD equivalents.
I added the following key and commented out the old one. I feel sure I broke something somewhere. <xs:unique name="templateIdKey"> <xs:selector xpath=".//Template"/> <xs:field xpath="@templateId"/> </xs:unique>
3. XMLSpy: Type Definition Technique is no valid derivation of Type Definition Technique.
<xs:complexType name="TemplateType"> <xs:complexContent> <xs:restriction base="ExperimentStepType"> <xs:sequence> <xs:element name="Author" type="UserInformationType" minOccurs="0"> </xs:element> <xs:element ref="Timestamp" minOccurs="0"/> <xs:element name="Technique" minOccurs="0">
I notice that there are two definitions of the element named Technique: one in ExperimentStepType and one in TemplateType. They look identical to me. I created a new complexType called Technique and then referred to it in both ExperimentStepType and TemplateType. The resulting XML schema, animl-core 1.01.xsd, now validates correctly.
I then used the schema to generate a sample XML file for manual filling. I used the XMLSpy title bar menu item DTD/Schema: Generate Sample XML File and checked all boxes. However, I was unable to save it, as it was itself invalid! Let's go through those errors:
4. XMLSpy: The <keyref> Identity Constraint Definition VectorIDRef did not match any elements or attributes within the scope of element IndexRef
<MeasurementData id="ID000006"> <Template id="ID000007"> <PageSet id="ID000013"> <Page id="ID000014" name="token"> <References id="ID000015"> <IndexRef index="1" refWidth="1" id="ID000016" name="token" vectorID="String"/>
I gave up at this point but would appreciate some help in completing the task of obtaining a valid XML file. By the way, the bruker_quinine sample also had validation errors against the new or old schema using XMLSpy - and that is without taking the technique validation into consideration!
Microsoft has a good discussion of XML Schema design recommendations. It would be great if AnIML core were simpler. W3C XML Schema Design Patterns: Avoiding Complexity, Dare Obasanjo, Microsoft Corporation http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnxml/html/xmlschemacomplex.asp
regards, Mark |
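The hotel example quoted from MSDN can be sketched with Python's standard ElementTree. This is only a rough emulation of how `xs:unique` scopes uniqueness, and the `duplicates` helper is hypothetical; actual validation should be left to a schema processor. Room and table numbers only need to be unique within their own value space. A single document-wide ID space, as with `xs:ID`, reports a clash.

```python
import xml.etree.ElementTree as ET
from collections import Counter

HOTEL = """
<hotel>
  <rooms><room number="17"/><room number="18"/></rooms>
  <tables><table number="18"/><table number="19"/></tables>
</hotel>
"""


def duplicates(root: ET.Element, selector: str, attribute: str) -> list[str]:
    """Values of `attribute` that repeat among the elements `selector` matches.

    Like an xs:unique constraint, uniqueness is only checked within the node
    set picked out by the selector, not across the whole document.
    """
    counts = Counter(el.get(attribute) for el in root.findall(selector))
    return [value for value, n in counts.items() if n > 1]


root = ET.fromstring(HOTEL)

room_dups = duplicates(root, "./rooms/room", "number")     # unique among rooms
table_dups = duplicates(root, "./tables/table", "number")  # unique among tables
# One document-wide value space (as with xs:ID) reports room 18 vs table 18.
global_dups = duplicates(root, ".//*[@number]", "number")
```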
|
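A minimal sketch of the Technique refactoring described in the message above, assuming the Technique content really is identical in both places. The attributes on the Technique type are placeholders, not the actual AnIML definition, and the other members of ExperimentStepType and TemplateType are left out. In a derivation by restriction, each element in the derived content model must have a type that is validly derived from the type of the matching base element. Two separate anonymous types are unrelated even when they look identical, which is probably why XMLSpy reported "Type Definition Technique is no valid derivation of Type Definition Technique"; pointing both declarations at one named type avoids that.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- One named definition of the Technique content model.
       Attributes are placeholders, not the real AnIML ones. -->
  <xs:complexType name="Technique">
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute name="uri" type="xs:anyURI"/>
  </xs:complexType>

  <xs:complexType name="ExperimentStepType">
    <xs:sequence>
      <!-- Element and type may share a name: they live in separate symbol spaces -->
      <xs:element name="Technique" type="Technique" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="TemplateType">
    <xs:complexContent>
      <xs:restriction base="ExperimentStepType">
        <xs:sequence>
          <!-- Same named type as in the base, so the restriction is valid -->
          <xs:element name="Technique" type="Technique" minOccurs="0"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```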
From: SourceForge.net <no...@so...> - 2004-11-17 21:11:43
|
Read and respond to this message at: https://sourceforge.net/forum/message.php?msg_id=2856947 By: dmartinsen As I mentioned in the E13.15 Business Meeting yesterday in Somerset, XMLSpy 2005 also gives an error now when trying to validate the technique definition files (*.atid). This error did not occur with XMLSpy 2004 (I just went back and checked). The error does not occur when validating the animl-technique.xsd itself. However, any of the .atid files, when validated, flag the following error in animl-technique.xsd: This schema is not valid: Attribute 'name' is not allowed for element 'xs:simpleType' because it is prohibited by complexType 'xs:localSimpleType', at line 505 in the animl-technique.xsd. ______________________________________________________________________ You are receiving this email because you elected to monitor this forum. To stop monitoring this forum, login to SourceForge.net and visit: https://sourceforge.net/forum/unmonitor.php?forum_id=262129 |
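The XMLSpy message points at a named xs:simpleType nested inside another declaration. In W3C XML Schema, a simple type defined locally (inside xs:element, xs:attribute or a complex type) must be anonymous; only top-level simple types may carry a name attribute. The definition at line 505 of animl-technique.xsd is not quoted here, so the sketch below uses a hypothetical Unit element and UnitType type to show the two valid alternatives.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Invalid form (shown as a comment):
       <xs:element name="Unit">
         <xs:simpleType name="UnitType"> ... </xs:simpleType>
       </xs:element> -->

  <!-- Option 1: keep the type local, but drop the name attribute -->
  <xs:element name="Unit">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:enumeration value="nm"/>
        <xs:enumeration value="um"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>

  <!-- Option 2: make the named type global and reference it with type="..." -->
  <xs:simpleType name="UnitType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="nm"/>
      <xs:enumeration value="um"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="OtherUnit" type="UnitType"/>
</xs:schema>
```

Option 2 is the usual choice when the same value list is needed in more than one declaration.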
|
From: SourceForge.net <no...@so...> - 2004-11-15 18:20:52
|
Read and respond to this message at: https://sourceforge.net/forum/message.php?msg_id=2853237 By: beanmf I am getting "Invalid particle derivation by restriction" errors when using Microsoft Visual Studio .NET to create a typed dataset from the core schema. I believe that our AnIML core is not valid, and here is the evidence: "When generating code from a schema you may get the error 'Invalid particle derivation by restriction'. This may be due to the use of an XSD that does not conform to the W3C Schema Standard Section 4.4. Derivation by restriction does not allow you to add or omit elements (unless they are optional in the base type), it simply allows you to restrict their valid values e.g. set a default value or set type="string" where previously no type was specified. Some popular XML tools do not comply with the XSD standard as they do not mark such schema as invalid." The actual documentation that describes this requirement for derivation by restriction can be found here: http://www.w3.org/TR/2001/REC-xmlschema-0-20010502/#DerivByRestrict where it clearly states (with examples), "Notice that types derived by restriction must repeat all the components of the base type definition that are to be included in the derived type." The AnIML core fails this requirement in two places: the complex type TemplateType derives from ExperimentStepType by restriction, and the complex type VectorValueType derives from ParameterValueType by restriction. |
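The Primer rule quoted above can be illustrated with a small sketch. BaseType and RestrictedType are hypothetical stand-ins for pairs such as ExperimentStepType/TemplateType; they are not the AnIML definitions. The restricted type repeats every base element it keeps and omits only Comment, which is optional in the base type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="BaseType">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Comment" type="xs:string" minOccurs="0"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Valid restriction: the required Name is repeated,
       and the optional Comment is left out -->
  <xs:complexType name="RestrictedType">
    <xs:complexContent>
      <xs:restriction base="BaseType">
        <xs:sequence>
          <xs:element name="Name" type="xs:string"/>
        </xs:sequence>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
```

Leaving out the required Name, or adding an element that BaseType does not declare, would make this restriction invalid. Where a derived type genuinely needs extra elements, derivation by extension (xs:extension) is the mechanism intended for that.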
|
From: Randall K J. <JUL...@LI...> - 2004-09-02 15:06:24
|
The AnIML website has been updated with new material and a new layout. The most recent version of the schema is available for download directly from the front page, as are the technique-specific schemas. http://animl.sf.net Also, the minutes from the July working meeting are now available. Report any broken links or problems to rk...@li.... Thanks, Randy |