Message archive (messages per month):

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2003 |  |  |  |  | 3 |  |  |  | 4 |  |  |  |
| 2004 | 1 |  | 9 |  |  |  | 1 | 4 | 2 |  | 9 | 29 |
| 2005 | 3 | 1 |  | 2 | 1 | 2 |  | 2 |  |  | 1 | 1 |
| 2006 |  | 1 | 2 | 22 | 7 | 3 |  | 3 | 5 | 1 | 2 | 4 |
| 2007 |  | 1 |  |  |  |  |  |  |  |  |  |  |
|
From: David M. <d_m...@ac...> - 2005-11-15 15:53:46
|
Call for Papers: XML in Chemistry

Organized by the Chemical Information Division (CINF) at the 231st ACS National Meeting, Atlanta, GA, March 26-30, 2006.

This session will examine the current status of applications of XML in chemistry. Areas of interest include the use of XML in document markup, tagging of chemical structures and data, storage and exchange of chemical information, XML standards efforts, and the display of textual and experimental data based on XML markup.

Note that this is a venue to publicize work not only about AnIML, but also other XML applications in chemistry. The conference will also include a half-day session on ThermoML, organized by Michael Frenkel, which will look at the ways in which that standard was developed, deployed, and is now being used.

Please submit abstracts by November 23, 2005 to the ACS OASYS system. General instructions can be found at http://oasys.acs.org/oasys.htm. A direct link to the CINF submission area is at http://oasys.acs.org/acs/231nm/cinf/papers/index.cgi.

Thanks for your interest. If you have any questions, please contact me directly.

Dave Martinsen

*********************************
David Martinsen
American Chemical Society
1155 16th St. NW
Washington, DC 20036
d_m...@ac... |
|
From: David M. <d_m...@ac...> - 2005-08-22 23:18:14
|
The minutes for the Aug. 4 virtual meeting have been posted to the AnIML web site (http://animl.sf.net). Note that the next virtual meeting is scheduled for Thursday, August 25 at 10:30 EDT.

Dave

*********************************
David Martinsen
American Chemical Society
1155 16th St. NW
Washington, DC 20036
d_m...@ac... |
|
From: David M. <d_m...@ac...> - 2005-08-02 21:20:59
|
1) The new AnIML web site has been released: http://animl.sf.net.
2) Minutes from recent meetings have been posted. Click on the "Meetings" link on the right side of the web page.
3) The next virtual meeting is scheduled for Thursday, Aug 4 at 10:30 am - 12:30 pm EDT. See the website for more information.

Please let me know if you have any comments or questions.

Thanks,
Dave Martinsen

*********************************
David Martinsen
American Chemical Society
1155 16th St. NW
Washington, DC 20036
d_m...@ac... |
|
From: Peter J. L. <pet...@ni...> - 2005-06-17 17:25:00
|
Anh Dao,
Please find attached two JCAMP-DX files which contain integer values for absorbance and transmittance:

- epa-ir.dx: a gas phase spectrum for butanoic acid. It is part of the EPA's gas phase IR collection (which was later transferred to NIST).
- quant-ir.dx: a gas phase spectrum for methyl ethyl ketone. It comes from the NIST Quantitative Infrared Database (SRD 79), which is developed by the NIST Analytical Chemistry Division.
The data in both of these files could be represented
in AnIML by simply multiplying the integer Y values by
the JCAMP-DX parameter "YFACTOR" and storing the resulting
values as floating point numbers. Doing this may result
in a loss of information about reported digits and any
rounding due to the integer representation. Since compression
in JCAMP-DX requires data in integer form, compressed
JCAMP-DX data will consist of integers multiplied by a
floating point value.
In the case of the first file such a conversion will
hide the fact that some values are only reported to two
or three digits. In the second file the fact that the
multiplier is reported with five digits (versus up to
eight for the integer values) will be lost.
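As a rough illustration of the conversion described above, here is a minimal Python sketch. The function and variable names are illustrative only, not part of any JCAMP-DX or AnIML tool, and the YFACTOR value is hypothetical:

```python
# Illustrative sketch only -- not JCAMP-DX/AnIML tooling.
def scale_ordinates(raw_y, yfactor):
    """Multiply raw integer Y values by the JCAMP-DX YFACTOR parameter.

    The result no longer records how many digits the instrument actually
    reported, which is the information loss discussed above.
    """
    return [y * yfactor for y in raw_y]

# Hypothetical integer ordinates with a small YFACTOR, as in absorbance data:
raw = [12345, 12340, 12300]        # integers as stored in the .dx file
scaled = scale_ordinates(raw, 1.0e-4)
# scaled[2] is approximately 1.23; the trailing zeros of 12300 are gone,
# so a reader can no longer tell how many digits were significant.
```

The same loop works for compressed JCAMP-DX data, since decompression also yields integers that are then scaled by a floating-point factor.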
I hope these examples are of use to you.
Peter
================================================
Peter J. Linstrom
NIST, Physical and Chemical Properties Division
Phone: (301) 975-5422
================================================
|
|
From: David M. <d_m...@ac...> - 2005-06-15 15:00:36
|
The minutes of the recent meetings of the AnIML Working Group have been posted to the SourceForge site. They may be accessed via http://animl.sourceforge.net. Let me know if you have any questions or comments.

Thanks,
Dave Martinsen

*********************************
David Martinsen
American Chemical Society
1155 16th St. NW
Washington, DC 20036
d_m...@ac... |
|
From: Amanda C. <af...@ab...> - 2005-05-20 15:15:59
|
Dear AnIMLers,

I'm interested in the application of XML standards to plate reader data, because we have some data that we would like to make available in XML, and I was curious to see if AnIML would be the way to go for this. I notice that the minutes of your meeting on 1st April 2005 include the comment "Burkhard noted that if it is not already there, one requirement should be that AnIML does not become obsolete when new analytical techniques are developed. For example, 96 well plate readers". So I thought I'd put together an example of how it might look, to get a feel for using AnIML as opposed to creating my own XML format for my own purposes. I've created a few files:

http://users.aber.ac.uk/afc/work/platereaders/20041208-2572.victor.animl
http://users.aber.ac.uk/afc/work/platereaders/PlateReader_base_instance_document.atid

They are not by any means detailed or complete, just enough to get a feel for the pros and cons of using the AnIML format to represent my data. I now have a few questions about AnIML; would it be appropriate to ask these here?

1) Do you have plans to include microtitre plate reader data in future (and if so, would it look something like this; am I on the right lines)? The main issue I had was trying to record pairs of values in a vector. For each well within a 96 well plate we have many readings, and for each reading there is a timestamp I'd like to record. For this I used a Page for each well, and a VectorSet of 2 Vectors within each Page (one Vector for the readings and one Vector for the corresponding timestamps).

2) I don't think I yet fully understand the main purpose of AnIML. As I see it, there's a difference between automatically collected data created by the machine (plate reader or whatever, including the parameters used to set up that run of the machine) and human input data describing what the data is to be used for and how; that is, a difference between data that relates to the measurements and data that relates to the purpose of the experiment. I feel the former should be required and the latter should be optional. At the moment AnIML seems to require both. For example: if I describe the samples used in an experiment, I'm required to describe their role. And if I describe a Vector, I must say whether it contains a dependent or independent variable, and what its plot use is to be. The plate reader machine can hardly know this, though a human would. So I am surprised that these attributes are "required" rather than "optional".

3) Are there tools to allow validation against the atid document? At the moment this seems to be like a meta-schema, and I can't use it for validating my documents.

Then I have various questions about technical parts of AnIML that I couldn't find answers to on the website. I apologise if they have already been answered elsewhere and I haven't come across it yet.

1) float vs float32/64: in the *technique.atid documents "float" is used (VectorBlueprint name="Readings" type="float"). In the animl-technique.xsd docs, float is used for the VectorTypeNames/AllTypeNames. In animl-core.xsd, VectorValueType can be float32 or float64 (mapping to xs:float and xs:double respectively).

2) Namespaces: will they be used in future? AnIML is much less useful without them.

3) Filenames with spaces (such as "animl-core 1.04.xsd") sometimes cause problems (for example, if you have namespaces and want to specify multiple schemaLocations for the various namespaces), and I think it would be better to avoid spaces in filenames if possible.

4) I am partly confused about the distinction between modality and maxOccurs/minOccurs in the atid docs. Why does animl-technique allow both maxOccurs and modality? What happens if they conflict? And why is minOccurs not allowed?

5) What value should I use for values that weren't recorded (NaN? N/A?)? I have many of these.

6) Which versions of animl-core.xsd and animl-technique.xsd should I be using, and do they have any relationship to the compulsory version="1.0" attribute of animl-core or 1.1 of animl-technique?

Thank you in advance for any help and advice.

Amanda

--
Amanda Clare
http://users.aber.ac.uk/afc/
Tel: +44 (0)1970 622410 Fax: +44 (0)1970 628536
Dept. of Computer Science, University of Wales, Aberystwyth, SY23 3DB |
|
From: David M. <d_m...@ac...> - 2005-04-20 16:15:05
|
The minutes from recent meetings have been posted on the AnIML website (http://animl.sf.net). These include:

1. April 1 working group virtual meeting
2. March 2 working group meeting at Pittcon

Dave

*********************************
David Martinsen
American Chemical Society
1155 16th St. NW
Washington, DC 20036
d_m...@ac... |
|
From: David M. <d_m...@ac...> - 2005-04-11 12:29:01
|
The minutes from recent meetings have been posted on the AnIML website (http://animl.sf.net). These include:

1. December 16 working group virtual meeting
2. February 28 business meeting at Pittcon
3. March 18 working group virtual meeting

The March 2 working group meeting at Pittcon and the April 1 working group virtual meeting will be coming soon.

Dave

*********************************
David Martinsen
American Chemical Society
1155 16th St. NW
Washington, DC 20036
d_m...@ac... |
|
From: Burkhard S. <b_...@us...> - 2005-02-05 20:44:14
|
Dave,

thanks for the URL. I read through the specification. It looks like XOP can be applied to any existing XML document. It works by packaging the XML document into a multipart MIME container and extracting parts of the document. These extracted parts are moved to a new MIME part (which is 8-bit clean) and are then referenced from the location they previously held in the document.

All this generates pretty significant overhead, mainly due to the MIME container and MIME boundaries, but also due to the reference pointers. So it only makes sense if large chunks of data are used.

AnIML is compatible with this specification. Very large (!) Base64-encoded data sets could be treated that way. There is nothing to keep users from applying it to their AnIML files. It's just that the XML parser reading the file later needs to support it. Since tool support is extremely limited right now, we should wait and see if it catches on.

Best regards,
Burkhard

David Martinsen wrote:
> I just ran across this release of the "XML-binary Optimized Packaging" as a
> W3C recommendation: http://www.w3.org/TR/2005/REC-xop10-20050125. It strikes
> me that this has the potential to a) reduce the file size of AnIML files;
> and b) remove the need for encoding/decoding base64 strings, thereby
> improving performance of software processing AnIML files.
>
> I'm not proposing a switch to this specification at this point, but just
> wanted to ask my learned colleagues whether this spec might make sense at
> some point in the future?
>
> Regards,
> Dave
>
> -----Original Message-----
> From: Burkhard Schaefer [mailto:b_...@us...]
> Sent: Wednesday, December 15, 2004 9:51 PM
> To: AnIML Developer List
> Subject: [Animl-develop] Array Sizing for Base64 values
>
> Regarding Mark B.'s question about Base64 array lengths: I think most
> algorithms work the same way as the one you've written. You can infer
> the number of values by looking at the length of the base64 string
> (if you know whether it contains float64 or float32 values).
>
> This way you can find out how much memory you need to allocate before
> actually decoding the base64 string.
>
> Best wishes,
> Burkhard |
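The sizing trick quoted above (allocating before decoding) can be sketched in a few lines of Python. The function name and the AnIML-style type strings are illustrative assumptions, not an existing API:

```python
import base64
import struct

def value_count(b64_text, value_type="float64"):
    """Infer how many numbers a Base64 string holds without decoding it.

    As described in the thread: every 4 Base64 characters encode 3 bytes,
    '=' padding trims the tail, and each value occupies 8 bytes (float64)
    or 4 bytes (float32).
    """
    s = b64_text.strip()
    decoded_len = len(s) * 3 // 4 - s.count("=")
    return decoded_len // (8 if value_type == "float64" else 4)

# Round-trip check: encode three doubles, then size the string.
payload = base64.b64encode(struct.pack("<3d", 1.0, 2.5, -3.25)).decode()
assert value_count(payload) == 3
```

A reader can call this on the text content of a data element to pre-allocate the target array, then decode directly into it.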
|
From: David M. <d_m...@ac...> - 2005-01-27 20:00:17
|
I just ran across this release of the "XML-binary Optimized Packaging" as a W3C recommendation: http://www.w3.org/TR/2005/REC-xop10-20050125. It strikes me that this has the potential to a) reduce the file size of AnIML files; and b) remove the need for encoding/decoding base64 strings, thereby improving performance of software processing AnIML files.

I'm not proposing a switch to this specification at this point, but just wanted to ask my learned colleagues whether this spec might make sense at some point in the future?

Regards,
Dave

-----Original Message-----
From: Burkhard Schaefer [mailto:b_...@us...]
Sent: Wednesday, December 15, 2004 9:51 PM
To: AnIML Developer List
Subject: [Animl-develop] Array Sizing for Base64 values

Regarding Mark B.'s question about Base64 array lengths: I think most algorithms work the same way as the one you've written. You can infer the number of values by looking at the length of the base64 string (if you know whether it contains float64 or float32 values). This way you can find out how much memory you need to allocate before actually decoding the base64 string.

Best wishes,
Burkhard |
|
From: Burkhard S. <b_...@us...> - 2005-01-27 01:42:21
|
Hi Maren and Mark,

please see my earlier email about the XYZSet container tags. It may be beneficial to keep them.

> However, I'm wondering whether we really need the possibility to sign
> single elements in the AnIML file, or whether it would suffice to be able
> to sign the whole document. Any thoughts on that?

Signing only whole documents may be too strict a limitation. The philosophy in most Part 11-compliant implementations is that everybody signs the data she/he is responsible for. So one person (i.e. a chemist) could have created the samples and somebody else (i.e. a lab technician) could run the experiment. A third person is responsible for calibration, etc. So that's one thing.

Another point is that parts of a document could be generated at different points in time. Here it is a critical feature to preserve the original signatures, even when data is later added in other parts of the document (not covered by the initial signature).

A third, and very interesting point: in the future, we could consider instruments that directly sign their result data. I've talked to a few folks (instrument manufacturers) at the LIMS Conference in Barcelona last September, and this is something they get asked about in regulated environments. And right now, only AnIML provides a (non-proprietary) solution to this problem.

> - In the Technique Schema, we have an attribute "maxOccurs" and an
> attribute "modality". It would be more consistent to replace "modality"
> with "minOccurs" (=0 or 1).

Sounds good.

> - What is the exact meaning of the attributes "inheritable" and
> "upwardsInherited"? These seem rather technical to me; what do we need
> them for?

"inheritable" is used with nested techniques. It indicates whether a technique can inherit a sample from the surrounding experiment step. Example:

LC
+-- MS (inherits sample from LC)

Here we don't have to explicitly declare the sample consumed by the MS, because the MS ExperimentStep is attached to the chromatogram page and refers to a point (or range) of the LC time axis. So we know where the sample comes from without having to create an explicit entry for it in the SampleSet section. For this to work, the MS technique definition needs to set "inheritable=true" for the run sample.

> - What is the benefit of assigning "consumed" or "produced" to a sample?

It allows us to easily track the material flow in the experiment. We can see how a sample was created by looking at the ExperimentStep that "produced" it. We can also find out what happened to it by looking at all the steps that "consumed" the sample.

One important consequence: if a sample is "consumed" in a step, the result data pages of that step tell us something we measured about the sample. If a sample is "produced", that step did not measure the sample but merely produced the material; i.e. we will typically need to take additional steps to measure its characteristics. So we can chain together experiment steps using the produced/consumed concept. This very feature allows us to cover the lab workflow, so it's one of the most important attributes in the core schema. :-)

> - Parameters are usually not stored as binary data. Is it useful to have
> float and double data types for them?

Yes. Using float and double as a data type does not mean that the data is stored in binary/base64. The digits are stored in plain text. Some instruments deliver IEEE floating point values, so having these types is certainly a good idea.

> Storing non-binary data as floats or
> doubles incurs a loss of precision as the decimal numbers are converted
> internally to the closest binary number, which is not always exact. We
> might want to be able to use the XML datatype xs:decimal as well.

You are right on the rounding. Perhaps I got you wrong in the last paragraph. I've looked at xs:decimal and it seems interesting for us. From what I gather, this type supports arbitrary-precision numbers. This is somewhat problematic to handle in software, since there is no data type in a programming language that would directly map to it. I know that Java, C++, and .NET have classes available to encapsulate it. Nevertheless, implementation tends to be hairy. But my gut feeling would be that xs:decimal should be put in.

> Another issue (which we probably can't solve in XML) is that you cannot
> specify how high the precision of your data is. "1.200" is something
> different from "1.2" as the two zeroes tell you that the precision is
> three decimal digits. From what I've found in an XML book, however, it
> looks like XML views the trailing zeroes as "non-significant".

That's true. It's also in the Schema spec for the decimal type: http://www.w3.org/TR/xmlschema-2/#decimal

Best regards,
Burkhard |
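The rounding and trailing-zero points in this exchange can be demonstrated with Python's decimal module, used here purely as a stand-in for an arbitrary-precision decimal type (note that, unlike the xs:decimal value space discussed above, Python's Decimal does preserve trailing zeros):

```python
from decimal import Decimal

# Binary floats cannot represent most decimal fractions exactly:
assert 0.1 + 0.2 != 0.3                              # rounding artifact
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

# A decimal type can keep trailing zeros, so "1.200" stays
# distinguishable from "1.2" even though the values compare equal:
assert Decimal("1.200") == Decimal("1.2")            # numerically equal
assert str(Decimal("1.200")) != str(Decimal("1.2"))  # digits preserved
assert repr(1.200) == repr(1.2) == "1.2"             # float forgets them
```

This is exactly the "reported digits" information that is lost when values pass through IEEE floats.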
|
From: Burkhard S. <b_...@us...> - 2005-01-27 00:08:40
|
Hi Anh Dao and Maren,
the ds:Reference mechanism is the right way to go. That way you can
point to the item(s) you want signed.
I spent some more time to think whether we really need the container
tags (XYZSet). The following example came to my mind. Let's assume we
have a VectorSet vs1 containing two Vectors v1 and v2 -- like this:
Scenario 1 (with XYZSets):
<Page>
<VectorSet id="vs1">
<Vector id="v1"> ... </Vector>
<Vector id="v2"> ... </Vector>
</VectorSet>
...
</Page>
Scenario 2 (without XYZSets):
<Page>
<Vector id="v1"> ... </Vector>
<Vector id="v2"> ... </Vector>
...
</Page>
Now the question is: Is a signature over [vs1] in scenario 1 equivalent
to a signature over [v1, v2] in scenario 2? The surprising answer is "no".
In both scenarios, the signatures prevent the modification of the data
in v1 and v2. But in scenario 2, we could easily introduce another
Vector v3 without invalidating the signature - because v3 is outside the
scope of the signature:
<Page>
<Vector id="v1"> ... </Vector>
<Vector id="v2"> ... </Vector>
<Vector id="v3"> ... </Vector>
...
</Page>
In scenario 1, doing this would make the signature invalid (which is the
desired behavior).
So taking out the container tags (XYZSets) would significantly weaken
the security of our digital signature mechanism: It allows data to be
added without the ability to detect it (unless you sign the entire
surrounding element - which will not always be possible). This will make
the 21 CFR Part 11 folks very unhappy.
Therefore I would tend to leave the container tags in.
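The scope argument above can be sketched with a plain hash standing in for a real XML-DSIG signature. The serialization and names are deliberately simplified and illustrative only:

```python
import hashlib

def digest(xml_fragment):
    """Stand-in for signing: hash the bytes of a serialized fragment."""
    return hashlib.sha256(xml_fragment.encode()).hexdigest()

v1 = '<Vector id="v1">...</Vector>'
v2 = '<Vector id="v2">...</Vector>'
v3 = '<Vector id="v3">...</Vector>'  # added later, outside the signature

# Scenario 1: sign the whole VectorSet container.
container = '<VectorSet id="vs1">' + v1 + v2 + '</VectorSet>'
sig1 = digest(container)
tampered = '<VectorSet id="vs1">' + v1 + v2 + v3 + '</VectorSet>'
assert digest(tampered) != sig1   # adding v3 invalidates the signature

# Scenario 2: sign v1 and v2 individually (no container).
sig_v1, sig_v2 = digest(v1), digest(v2)
# After v3 is inserted next to them, both signatures still verify,
# so the addition goes undetected:
assert digest(v1) == sig_v1 and digest(v2) == sig_v2
```

The container element widens the signed byte range to cover the set as a whole, which is why removing the XYZSet tags weakens tamper detection.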
Best wishes,
Burkhard
Anh Dao Nguyen wrote:
>
> Hi Maren,
>
> I have developed a small digital-signature signing and verification
> program which allows signing numerous data objects (elements) all at
> once. If the purpose for creating the “XYZSet” containers was to be able
> to sign several elements of similar type at one time, then it would be
> unnecessary. You are right, we theoretically can remove all the
> Set-containers from the Core Schema.
>
> At the beginning stage of the application development, I used
> “Object”-element-concept (according to the paragraph 2.3 of the
> DSig-Specification) to sign multiple data objects. That means the data
> being signed must be accommodated within the “Object” element.
>
> For instance:
> ….
> <Sample derived="false" id="signitem1" sampleID="SRM936">
> …
> </Sample>
> …….
> <Signatures>
> <Signature Id="sig0">
> <ds:SignedInfo>
> <ds:Reference URI="#signObject1">
> …
> </ds:Reference>
> </ds:SignedInfo>
> <ds:Object Id="signObject1">
> <Sample derived="false" id="signitem1"
> sampleID="SRM936">
> …
> </Sample>
>
> </ds:Object>
> </Signature>
> </Signatures>
>
>
> As everyone already knows, each ID must be unique within an
> AnIML-document. Therefore I had to remove the original data of the
> signed element (outside of the signature element) from the AnIML file.
> Otherwise, we would have a duplicate ID in the AnIML file (id="signitem1").
> Finally, I decided to utilize “Reference” element within the “SignInfo”
> element to refer to the signed data using the "URI" attribute (see the
> attached file).
>
> ….
> <Sample derived="false" id="signitem1" sampleID="SRM936">
> …
> </Sample>
> …….
> <Signatures>
> <Signature Id="sig0">
> <ds:SignedInfo>
> <ds:Reference URI="#signitem1">
> …
> </ds:Reference>
> </ds:SignedInfo>
>
> </Signature>
> </Signatures>
>
>
> Anh Dao
>
>
>
>
> At 05:57 AM 1/26/2005, Mar...@wa... wrote:
>
>> Hi Mark,
>>
>> > But let's go on and see if we can make a better hierarchical model.
>> OK. I had a look at the XML-DSIG Specification
>> (http://www.w3.org/TR/xmldsig-core/) yesterday, and it looks like we can
>> get rid of the "XYZSet" containers in the Core Schema:
>> The idea of these was, as far as I remember, to be able to sign several
>> elements of the same kind at once.
>> XML-DSIG has an element called "Signature" with several children. The one
>> that tells you what you are signing is "SignedInfo/Reference", which has
>> an optional "URI" attribute where you state the URI of the object you're
>> signing. Now, paragraph 2.3 of the Spec says that you can include
>> multiple
>> "Reference" elements within "SignedInfo" to sign multiple data objects.
>> Judging from this, we don't need the containers any more. I made a new
>> core schema based on animl-core 1.05 where I took the container elements
>> out:
>>
>> By the way, I am not sure any more whether this is the version with the
>> corrected references or not. Maybe Anh Dao can help clarify this.
>>
>> However, I'm wondering whether we really need the possibility to sign
>> single elements in the AnIML file, or whether it would suffice to be able
>> to sign the whole document. Any thoughts on that?
>>
>> BTW, The technique schema remains untouched by the changes I made.
>>
>> I have some more questions/suggestions about schema details:
>>
>> - In the Technique Schema, we have an attribute "maxOccurs" and an
>> attribute "modality". It would be more consistent to replace "modality"
>> with "minOccurs" (=0 or 1).
>>
>> - What is the exact meaning of the attributes "inheritable" and
>> "upwardsInherited"? These seem rather technical to me; what do we need
>> them for?
>>
>> - What is the benefit of assigning "consumed" or "produced" to a sample?
>>
>> - Parameters are usually not stored as binary data. Is it useful to have
>> float and double data types for them? Storing non-binary data as
>> floats or
>> doubles incurs a loss of precision as the decimal numbers are converted
>> internally to the closest binary number, which is not always exact. We
>> might want to be able to use the XML datatype xs:decimal as well.
>> Another issue (which we probably can't solve in XML) is that you cannot
>> specify how high the precision of your data is. "1.200" is something
>> different from "1.2" as the two zeroes tell you that the precision is
>> three decimal digits. From what I've found in an XML book, however, it
>> looks like XML views the trailing zeroes as "non-significant".
>>
>>
>> > I don't think I had a response on the need for the concept of an
>> > "analysis" in AnIML
>> I'm going to look into this soon. Maybe Mark Mullins can tell us about
>> his
>> experiences implementing AnIML?
>>
>>
>> Maren.
>>
>>
>> Mit freundlichen Grüßen / Best regards
>>
>> Dr. Maren Fiege
>> Product Manager
>>
>> --------------------------------------------------------------
>> Waters Informatics
>> Europaallee 27, D-50226 Frechen, Germany
>> Tel. +49 2234 9207 - 0 Fax. +49 2234 9207-99
>> Reply to: mar...@wa...
>> http://www.creonlabcontrol.com <http://www.creonlabcontrol.com/>
>> http://www.watersinformatics.net <http://www.watersinformatics.net/>
>> --------------------------------------------------------------
>> ===========================================================
>>
>> The information in this email is confidential, and is intended solely
>> for the addressee(s). Access to this email by anyone else is
>> unauthorized and therefore prohibited. If you are not the intended
>> recipient you are notified that disclosing, copying, distributing or
>> taking any action in reliance on the contents of this information is
>> strictly prohibited and may be unlawful.
>>
>> ===========================================================
|
|
From: Burkhard S. <b_...@us...> - 2004-12-17 16:15:40
|
Hi Maren and Mark,

I think Maren put this very well. A few comments:

> > (1) Is LCMS its own technique or does it reuse LC and MS? (do hybrid
> > techniques need their own individual definitions?) Let's assume LCMS
> > reuses LC and MS.
> As far as I remember, we decided that we build hyphenated techniques out of
> base techniques, so your assumption is correct.

Yes.

> > If ExperimentStep is the application of a technique, then LC and MS Pages
> > go in different ExperimentSteps while UV214, UV254, ELS, NCLD all go in
> > the same ExperimentStep.
> This is the difficult point about chromatography: the multitude of
> different detectors. I think the cleanest approach would be to make each
> detector a different technique (or maybe technique extensions). This would
> require putting the UV, ELS, NCLD data into different ExperimentSteps.

I agree, using different techniques for each detector and putting the data into different ExperimentSteps would be the cleanest way.

> > If multi-detector data goes in multiple ExperimentSteps, then how do we
> > hold multiple instances of that hybrid analysis in one file?
> Each detector-specific ExperimentStep would have a Reference element
> pointing up the tree to the mother element, and would possibly sit in a
> subordinate ExperimentStepSet of the mother page.

Correct. So each MS spectrum would have a Reference pointing up to the time vector of the LC chromatogram page.

Have a good weekend!

Best wishes,
Burkhard |
|
From: Burkhard S. <b_...@us...> - 2004-12-17 16:10:00
|
Hi everybody,

> there is no logical reason why startOffset should be less than endOffset

We have to remember that startOffset and endOffset are actually indices. That's why we renamed them in yesterday's phone conference to startIndex and endIndex. They indicate the index of the first and the last data point the associated *ValueSet should map to. So the start index actually has to be smaller than the end index.

Building a ValueSet where the endIndex is smaller than the startIndex would be very confusing. What would it mean? Should the values be mapped in reverse order? So I believe that the startIndex <= endIndex constraint actually makes sense.

Best wishes,
Burkhard |
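The startIndex/endIndex semantics described here can be sketched as follows. The function name and the list-based representation are illustrative assumptions, not taken from any AnIML schema or tool:

```python
# Hedged sketch: a value set maps its values onto positions
# startIndex..endIndex (inclusive) of the full data vector.

def place_value_set(vector, values, start_index, end_index):
    """Copy `values` into vector[start_index..end_index] inclusive.

    Enforces the constraints from the discussion: startIndex must not
    exceed endIndex, and the value count must match the index range.
    """
    if start_index > end_index:
        raise ValueError("startIndex must not exceed endIndex")
    if end_index - start_index + 1 != len(values):
        raise ValueError("index range does not match number of values")
    vector[start_index:end_index + 1] = values
    return vector

vec = [0.0] * 6
place_value_set(vec, [1.5, 2.5, 3.5], 2, 4)
# vec is now [0.0, 0.0, 1.5, 2.5, 3.5, 0.0]
```

As Stuart notes below, XML Schema itself cannot express the startIndex <= endIndex constraint, so a check like this has to live in the software that reads and writes the files.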
|
From: Burkhard S. <b_...@us...> - 2004-12-17 16:03:54
|
On Friday, 17.12.2004, 07:10 -0500, Stuart Chalk wrote:

> I like this idea
>
> > Proposal:
> > - make startOffset and endOffset required for all valuesets
> > (they are optional right now)
> > - leave the number of *ValueSets at unbounded (as is)
>
> However, a limitation of XML is that there is no way of enforcing that
> startOffset is less than endOffset. This needs to be done in the software
> that generates and uses the XML files.

That's true. I think we could live with that.

Best regards,
Burkhard |
|
From: <Mar...@wa...> - 2004-12-17 14:12:58
|
Hi Mark,

> (1) Is LCMS its own technique or does it reuse LC and MS? (do hybrid
> techniques need their own individual definitions?) Let's assume LCMS reuses
> LC and MS.

As far as I remember, we decided that we build hyphenated techniques out of base techniques, so your assumption is correct.

> If ExperimentStep is the application of a technique, then LC and MS Pages go
> in different ExperimentSteps while UV214, UV254, ELS, NCLD all go in the same
> ExperimentStep.

This is the difficult point about chromatography: the multitude of different detectors. I think the cleanest approach would be to make each detector a different technique (or maybe technique extensions). This would require putting the UV, ELS, NCLD data into different ExperimentSteps.

> If multi-detector data goes in multiple ExperimentSteps, then how do we hold
> multiple instances of that hybrid analysis in one file?

Each detector-specific ExperimentStep would have a Reference element pointing up the tree to the mother element, and would possibly sit in a subordinate ExperimentStepSet of the mother page.

> Or perhaps they should all go in the same ExperimentStep. Now we
> have further possibilities - are these to be grouped together, entered
> hierarchically in PageSet-Page-PageSet nests? What if you have LCUV and a
> series of MS spectra with no parent chromatogram? etc. etc. Will each new
> hybrid technique have to define its own way of nesting data? Perhaps you
> begin to see why I suggest we do away with nesting and use clearly defined
> categories and pointers to link them together (keyrefs).

I see your point here. We need to put up guidelines for users on how to structure hyphenated data.

Maren.

Mit freundlichen Grüßen / Best regards

Dr. Maren Fiege
Product Manager

--------------------------------------------------------------
Waters Informatics
Europaallee 27, D-50226 Frechen, Germany
Tel. +49 2234 9207 - 0 Fax. +49 2234 9207-99
Reply to: mar...@wa...
http://www.creonlabcontrol.com
http://www.watersinformatics.net
-------------------------------------------------------------- |
|
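Maren's linking scheme above — each detector's data in its own ExperimentStep, pointing back to the mother LC step via a Reference element — can be sketched as a small XML tree. This is only an illustration of the proposal under discussion: the element names (ExperimentStep, ExperimentStepSet, Reference) come from the thread, but the `id`/`target` attributes are assumptions, not part of any published AnIML schema.

```python
# Hedged sketch of detector-specific ExperimentSteps linked to a mother
# LC step. Attribute names (id, target) are illustrative assumptions.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<ExperimentStepSet>
  <ExperimentStep id="lc-run-1" technique="LC">
    <ExperimentStepSet>
      <ExperimentStep id="uv214" technique="UV-Detection">
        <Reference target="lc-run-1"/>
      </ExperimentStep>
      <ExperimentStep id="els" technique="ELS-Detection">
        <Reference target="lc-run-1"/>
      </ExperimentStep>
    </ExperimentStepSet>
  </ExperimentStep>
</ExperimentStepSet>
""")

# Index every ExperimentStep by id, then resolve a detector step's
# Reference back up the tree to its mother step.
steps_by_id = {s.get("id"): s for s in doc.iter("ExperimentStep")}

def resolve_parent(step):
    ref = step.find("Reference")
    return steps_by_id[ref.get("target")] if ref is not None else None

print(resolve_parent(steps_by_id["uv214"]).get("technique"))  # LC
```

A consumer following this pattern never relies on nesting depth alone: the Reference makes the detector-to-run relationship explicit even if the steps are stored flat.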
From: <Ton...@wa...> - 2004-12-17 13:42:13
|
Hi,

there is no logical reason why startOffset should be less than endOffset.

Tony
--------------------------------------------------------------
Waters Informatics
Europaallee 27, D-50226 Frechen, Germany
Tel. +49 2234 9207 - 0
Fax. +49 2234 9207-99
Reply to: ton...@wa...
http://www.creonlabcontrol.com
http://www.watersinformatics.net
--------------------------------------------------------------

"Stuart Chalk" <sc...@un...> wrote on 17.12.2004 13:10
Subject: Re: [Animl-develop] Re: Vector Length Issues

> I like this idea
>
>> Proposal:
>> - make startOffset and endOffset required for all valuesets
>>   (they are optional right now)
>> - leave the number of *ValueSets at unbounded (as is)
>
> However, a limitation of XML is that there is no way to enforce that
> startOffset is less than endOffset. This needs to be done in the software
> that generates and uses the XML files.
|
|
From: Stuart C. <sc...@un...> - 2004-12-17 12:12:29
|
I like this idea

> Proposal:
> - make startOffset and endOffset required for all valuesets
>   (they are optional right now)
> - leave the number of *ValueSets at unbounded (as is)

However, a limitation of XML is that there is no way to enforce that
startOffset is less than endOffset. This needs to be done in the software
that generates and uses the XML files.

--
Stuart Chalk, Ph.D.                      Phone: 904-620-1938
Associate Professor of Chemistry         Fax: 904-620-1989
Department of Chemistry and Physics
University of North Florida
4567 St. Johns Bluff Road S.
Jacksonville FL 32224 USA
"The Flow Analysis Database" http://www.fia.unf.edu/
"The Analytical Sciences Digital Library" http://www.asdlib.org/
---------------------------------------------------------
This mail sent through UNF Webmail: https://horde.unf.edu
|
|
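Since XML Schema (as Stuart notes) cannot express a co-occurrence constraint like "startOffset must not exceed endOffset", writing and reading software has to check it itself. A minimal sketch of such a check; the element and attribute names mirror the discussion but the exact schema shape is an assumption here.

```python
# Sketch of the application-level offset check that XML Schema cannot
# express: flag any ValueSet whose startOffset exceeds its endOffset.
import xml.etree.ElementTree as ET

def check_offsets(vector_xml):
    """Return error strings for elements with inconsistent offsets."""
    errors = []
    root = ET.fromstring(vector_xml)
    for elem in root.iter():
        start, end = elem.get("startOffset"), elem.get("endOffset")
        if start is not None and end is not None and int(start) > int(end):
            errors.append(f"{elem.tag}: startOffset {start} > endOffset {end}")
    return errors

good = '<Vector><IndividualValueSet startOffset="0" endOffset="99"/></Vector>'
bad = '<Vector><IndividualValueSet startOffset="50" endOffset="10"/></Vector>'
print(check_offsets(good))  # []
print(len(check_offsets(bad)))  # 1
```

Generators would run the same check before writing a file, so both ends of the exchange enforce the invariant the schema cannot.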
From: <Mar...@wa...> - 2004-12-17 09:49:41
|
Hi Burkhard,

> Had we had a Technique Definition for Mark Mullins' experiment, things
> would have been a lot easier.

We do have a (documented!) technique definition for chromatography, so this
is not the problem. Mark Mullins' experiment would probably need a
*technique extension*.

> It also means that we should soon look into creating a document with
> "best practices for Technique Definition authors".

Good idea. If you provide a first draft, I can help you review and finish it.

Mit freundlichen Grüßen / Best regards

Dr. Maren Fiege
|
|
From: <Mar...@wa...> - 2004-12-16 14:33:20
|
Hi Burkhard,

thanks for the clarification. This should have been in your format
documentation.

Mit freundlichen Grüßen / Best regards

Dr. Maren Fiege

"Burkhard Schaefer" <b_...@us...> wrote on 16.12.2004 03:48
Subject: Vector Length Issues

> Hi everybody,
>
> I looked again at the question of the various "length" attributes in and
> around the Vector element. Let's look at the various elements and what the
> length attribute would mean there.
>
> VectorSet
> ---------
> The length attribute in the VectorSet element describes the total number
> of data points in the diagram. The values (components) that make up a
> data point can be retrieved by looking at the same index in all vectors.
>
> Here's a little drawing (please forgive my poor ASCII art ;-) )
>
> Let's say we have a UV/VIS with two vectors: Wavelength and Absorption.
> We want to store 100 data points, so VectorSet.length is 100.
>
>                     +----+
> Wavelength [ w1 w2 | w3 | w4 w5 ...... w100 ]
> Absorption [ a1 a2 | a3 | a4 a5 ...... a100 ]
>                     +----+
>               3rd data point: (w3, a3)
>
> This is pretty straightforward. Each Vector contains a single ValueSet
> (no matter if Individual/Encoded/AutoIncremented) with a startOffset of 0
> and an endOffset of 99.
>
> Now what happens if we have holes in the data? So let's assume we don't
> have an absorption reading for w3 and w4. In our example we only have a
> single dependent vector (absorption). So we would just leave out the
> wavelength values w3 and w4 and we'd be set:
>
> Wavelength [ w1 w2 w5 w6 ...... w100 ]
> Absorption [ a1 a2 a5 a6 ...... a100 ]
>
> In this case, VectorSet.length would only be 98.
>
> But let's assume we have multiple dependent vectors. I can't think of a
> good second dependent vector for UV/VIS, so let's call it Vector3. In
> this case we can't leave out w3 and w4 because we might have a reading
> for Vector3 there. We could declare that like this:
>
> Wavelength [ w1 w2 w3 w4 w5 w6 ...... w100 ]
> Absorption [ a1 a2 ]      [ a5 a6 ...... a100 ]  <-- two valuesets here
> Vector3    [ v1 v2 v3 v4 v5 v6 ...... v100 ]
>
> Again, we have 100 data points. We don't have a value for absorption at
> a3 and a4, but that is perfectly legal and valid. Absorption would use
> two valuesets:
>
> - startOffset 0 - endOffset 1
> and
> - startOffset 4 - endOffset 99
>
> All this can be stored without having a Vector.length attribute. In fact,
> what good would it do to explicitly store that the Absorption vector only
> has 98 values? If we actually need that number, we can easily calculate
> it by summing (endOffset - startOffset + 1) over the ValueSets. Adding
> the Vector.length attribute would not increase the expressive power and
> would add another point where a file could become inconsistent, making
> validation more difficult.
>
> This same argument explains why a length attribute in the *ValueSet
> elements would not be beneficial. Here, the number of values is even
> easier to calculate (endOffset - startOffset + 1).
>
> Consequently, I would suggest keeping the VectorSet.length attribute
> defined as the number of data points.
>
> I look forward to seeing you all again ("virtually") tomorrow. :-)
>
> Best wishes,
> Burkhard
|
|
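Burkhard's derivation above — a vector's value count follows from its ValueSet offsets, so an explicit length attribute is redundant — can be sketched in a few lines. With inclusive offsets (0..99 holding 100 values, as in his UV/VIS example), each ValueSet contributes endOffset - startOffset + 1 values. The tuple representation here is an illustrative stand-in for the XML attributes.

```python
# Sketch of deriving a vector's stored value count from its ValueSet
# offsets, per Burkhard's argument. Offsets are inclusive: (0, 99) spans
# 100 values. Tuples stand in for the startOffset/endOffset attributes.
def vector_length(value_sets):
    """Total number of stored values, given (startOffset, endOffset) pairs."""
    return sum(end - start + 1 for start, end in value_sets)

# The sparse Absorption vector from the example: two ValueSets covering
# indices 0-1 and 4-99 (no readings at indices 2 and 3).
absorption = [(0, 1), (4, 99)]
print(vector_length(absorption))   # 98
print(vector_length([(0, 99)]))    # 100 (the dense Wavelength vector)
```

Because the count is always derivable, a stored Vector.length could only ever agree with it or contradict it — exactly the "extra point of inconsistency" the message argues against.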
From: Burkhard S. <b_...@us...> - 2004-12-16 14:04:11
|
Mark,

>>> and EncodedDataSet to 0 to 1 per Vector. If we do NOT do that, the
>>> VectorSet length becomes problematic as there is no guarantee that all
>>> ValueSets in a Vector are the same length.
>>
>> I think my example illustrates that not all ValueSets need to share the
>> same length. We are currently allowing an unlimited number of ValueSets
>> per Vector to permit storage of data with "holes" / sparse data.
>
> I don't think that is completely true. The only ValueSet that absolutely
> requires a length is the AutoIncrementedValueSet, as you cannot
> auto-increment without knowing how many times to increment!!

I agree that we need to know how many times to increment. How can we find
out in the current structure? Calculate endOffset - startOffset + 1 in the
AutoIncrementedValueSet.

Proposal:
- make startOffset and endOffset required for all valuesets
  (they are optional right now)
- leave the number of *ValueSets at unbounded (as is)

Justification: We need the offsets anyway in the case of sparse /
non-continuous data. If we make them mandatory, we can use them not only to
"align the data points" but also to determine the number of values to
generate in the AutoIncrementedValueSet.

> The number of increments is being taken from the VectorSet @length.
> However, if each AutoIncrementedValueSet has a different number of
> increments, then we are stuck. There are options:

Taking the length from endOffset - startOffset + 1 will do it. One less
point of possible inconsistency in the file.

> take care, I look forward to your recursive parser in an infinitely
> flexible AnIML kingdom!

AnIML kingdom -- nice. ;-)

Best wishes,
Burkhard
|
|
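Under Burkhard's proposal, expanding an AutoIncrementedValueSet needs no explicit count: with mandatory inclusive offsets, the number of generated values is endOffset - startOffset + 1. A minimal sketch; the parameter names (start_value, increment) are illustrative, since the thread does not spell out the element's exact attribute set.

```python
# Sketch of expanding an AutoIncrementedValueSet from its offsets alone,
# as proposed: count = endOffset - startOffset + 1 (offsets inclusive).
# Parameter names are illustrative assumptions, not schema-defined.
def expand_auto_increment(start_offset, end_offset, start_value, increment):
    n = end_offset - start_offset + 1
    return [start_value + i * increment for i in range(n)]

# A retention-time axis: 5 points starting at 0.0 minutes, step 0.25.
print(expand_auto_increment(0, 4, 0.0, 0.25))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Note how a second, segmented AutoIncrementedValueSet (say offsets 10..14) would expand the same way with its own start value — exactly the Mark Mullins use case that a single VectorSet-level @length could not describe.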
From: Mark F. B. <sa...@co...> - 2004-12-16 12:16:42
|
While I have Burkhard's attention... (hopefully)

(1) Is LCMS its own technique or does it reuse LC and MS? (Do hybrid
techniques need their own individual definitions?) Let's assume LCMS reuses
LC and MS.

If ExperimentStep is the application of a technique, then LC and MS Pages go
in different ExperimentSteps while UV214, UV254, ELS, NCLD all go in the
same ExperimentStep.

If multi-detector data goes in multiple ExperimentSteps, then how do we hold
multiple instances of that hybrid analysis in one file?

Or perhaps they should all go in the same ExperimentStep. Now we have
further possibilities - are these to be grouped together, entered
hierarchically in PageSet-Page-PageSet nests? What if you have LCUV and a
series of MS spectra with no parent chromatogram? etc. etc. Will each new
hybrid technique have to define its own way of nesting data? Perhaps you
begin to see why I suggest we do away with nesting and use clearly defined
categories and pointers to link them together (keyrefs).

If they go in the same ExperimentStep, is there any requirement that they be
in a particular order? Could you order them like this: MS1, UV chromatogram,
MS5, ELS chromatogram, MS2, TIC chromatogram, MS25, XIC mz215-217
chromatogram, MS4...

Inserting data is one thing, parsing another, but attempting to hold large
datasets in memory quite another. As a result, AnIML viewers may face an
impossible challenge. I would recommend that we strictly standardize the
order of elements in AnIML and also require the creation of a timeline
technique to include:

- timeOffset index
- character offsets to each Page\Vector

The last item makes viewing large numbers of spectra possible. Without it,
AnIML is reduced to a data interchange format and cannot be viewed directly.
Which raises the question of character encoding - should we enforce Unicode
xxx... thus doubling the size of our base64binary EncodedValueSets?

I want to reduce the possible ways to fill AnIML to one.
|
|
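Mark's concern about base64binary EncodedValueSets and file size can be made concrete with a small round-trip sketch: packing numeric values into bytes and base64-encoding them. The little-endian float32 layout chosen here is an assumption for illustration — the actual binary layout would be fixed by the AnIML specification, not by this sketch.

```python
# Round-trip sketch for a base64binary EncodedValueSet. The "<f"
# (little-endian float32) layout is an illustrative assumption; the real
# encoding rules would come from the AnIML spec.
import base64
import struct

def encode_value_set(values, fmt="<f"):
    """Pack values into binary and base64-encode, as an EncodedValueSet would."""
    return base64.b64encode(b"".join(struct.pack(fmt, v) for v in values)).decode("ascii")

def decode_value_set(b64_text, fmt="<f"):
    """Decode base64 text back into a list of numbers."""
    raw = base64.b64decode(b64_text)
    size = struct.calcsize(fmt)
    return [struct.unpack_from(fmt, raw, i)[0] for i in range(0, len(raw), size)]

encoded = encode_value_set([1.0, 2.5, -3.0])
print(decode_value_set(encoded))  # [1.0, 2.5, -3.0]
```

The base64 text is roughly 4/3 the size of the raw binary regardless of document character encoding, since base64 output is pure ASCII — which bears on Mark's question about whether a mandated Unicode encoding would inflate EncodedValueSets.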
From: Burkhard S. <b_...@us...> - 2004-12-16 03:47:25
|
Oops, one thing I have overlooked:

> Perhaps Burkhard missed the point here (probably because my emails have
> been confusing). The problem originated when Mark Mullins wanted to put
> segmented chromatogram vectors in AutoIncrementedValueSets - which just
> isn't going to work unless we allow more than one AutoIncrementedValueSet
> - which AnIML does. Having done that, the problem became knowing how long
> each Vector segment was - and so the changes started to snowball. I
> recommended that he simply encode the data as an EncodedDataSet and not
> worry about the space saving,

Makes sense.

> and that we restrict AutoIncrementedValueSet and EncodedDataSet to 0 to 1
> per Vector. If we do NOT do that, the VectorSet length becomes
> problematic as there is no guarantee that all ValueSets in a Vector are
> the same length.

I think my example illustrates that not all ValueSets need to share the same
length. We are currently allowing an unlimited number of ValueSets per
Vector to permit storage of data with "holes" / sparse data.

4:47 am... ;-)

Burkhard
|
|
From: Burkhard S. <b_...@us...> - 2004-12-16 03:39:14
|
Mark,

thanks for your quick reply. Yes, I actually did miss the point. :-) Thanks
for helping me get my head around this.

I agree that we have a lot of options for encoding something with AnIML.
And I think that's exactly where the Technique Definitions come in. Using
those we can pretty closely control how to encode something. This relieves
end-users from the tedious modelling task. Had we had a Technique Definition
for Mark Mullins' experiment, things would have been a lot easier.

So this puts a lot of responsibility into the hands of the authors of
Technique Definitions. But that's what we as a committee are here for. It
also means that we should soon look into creating a document with "best
practices for Technique Definition authors".

I've looked over your proposal and am looking forward to hearing more about
it tomorrow. It looks to me like it's adding even more degrees of freedom --
which is both good and bad.

As I've posted to the list (not through yet), I've recently written a full
parser for AnIML and (if there's enough time tomorrow) could share some
experiences. Especially the recursion is rather straightforward to handle if
you follow a certain pattern.

Talk to you tomorrow. I'm looking forward to my long distance bill... ;-)

Best wishes,
Burkhard

Mark F. Bean wrote:
> I agree with Burkhard. And by the way, the more I wrestle with AnIML in
> close detail for the documentation, the more impressed I am with
> Burkhard's brain! He solved a number of problems that I would not have
> known how to do. The recent changes are small items in comparison with
> the tremendous job done by Burkhard, Dominik, and Maren. But let's not
> stop there - I want to consider the restructuring proposal seriously
> tomorrow.
>
> Perhaps Burkhard missed the point here (probably because my emails have
> been confusing). The problem originated when Mark Mullins wanted to put
> segmented chromatogram vectors in AutoIncrementedValueSets - which just
> isn't going to work unless we allow more than one AutoIncrementedValueSet
> - which AnIML does. Having done that, the problem became knowing how long
> each Vector segment was - and so the changes started to snowball. I
> recommended that he simply encode the data as an EncodedDataSet and not
> worry about the space saving, and that we restrict AutoIncrementedValueSet
> and EncodedDataSet to 0 to 1 per Vector. If we do NOT do that, the
> VectorSet length becomes problematic as there is no guarantee that all
> ValueSets in a Vector are the same length.
>
> This illustrates my point that AnIML flexibility may need to be
> constrained more - there are too many solutions to problems right now.
>
> I hope you can join us tomorrow Burkhard,
>
> best wishes, Mark
|
|
From: Burkhard S. <b_...@us...> - 2004-12-16 03:27:17
|
Hi everybody,

recently I had the "pleasure" to implement a fully functional AnIML parser.
I discovered a few areas that are fairly difficult to handle. Other things
were astonishingly natural to implement. I'd like to share some of my
experiences with you in this forum, probably spread out over several
messages in the next few days.

Best wishes,
Burkhard
|