Thread: [Animl-develop] Vector Length Issues


Hi everybody,

I looked again at the question of the various "length" attributes in and 
around the Vector element. Let's look at the various elements and what 
the length attribute would mean there.

VectorSet
--------
The length attribute in the VectorSet element describes the total number 
of data points in the diagram. The values (components) that make up a 
data point can be retrieved by looking at the same index in all vectors. 
Here's a little drawing (please forgive my poor ASCII art ;-) )

Let's say we have a UV/VIS spectrum with two vectors: Wavelength and 
Absorption. We want to store 100 data points, so VectorSet.length is 100.

                    +----+
Wavelength  [ w1 w2 | w3 | w4 w5 ...... w100 ]
Absorption  [ a1 a2 | a3 | a4 a5 ...... a100 ]
                    +----+
                  3rd data point: (w3, a3)

This is pretty straightforward. Each Vector contains a single ValueSet 
(whether Individual, Encoded or AutoIncremented) with a startOffset of 0 
and an endOffset of 99.
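
To make the "same index in all vectors" idea concrete, here is a minimal 
Python sketch. The dictionary layout, the function name data_point and the 
numeric values are made up for illustration; they are not part of the 
AnIML schema.

```python
# Hypothetical in-memory model of the UV/VIS example (not the AnIML schema).
wavelength = [200.0 + i for i in range(100)]  # w1..w100, made-up values
absorption = [i / 100 for i in range(100)]    # a1..a100, made-up values

vector_set = {
    "length": 100,  # total number of data points
    "vectors": {"Wavelength": wavelength, "Absorption": absorption},
}


def data_point(vs, index):
    """Collect one data point by reading the same index from every vector."""
    return tuple(values[index] for values in vs["vectors"].values())


# Index 2 (0-based) is the 3rd data point, i.e. (w3, a3).
third = data_point(vector_set, 2)
```

With a single ValueSet per Vector, every vector holds exactly 
VectorSet.length values, so any index from 0 to 99 yields a complete point.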

Now what happens if we have holes in the data? Let's assume we don't 
have an absorption reading for w3 and w4. In our example we only have a 
single dependent vector (Absorption), so we would just leave out the 
wavelength values w3 and w4 and we'd be set:

Wavelength  [ w1 w2 w5 w6 ...... w100 ]
Absorption  [ a1 a2 a5 a6 ...... a100 ]

In this case, VectorSet.length would only be 98.

But let's assume we have multiple dependent vectors. I can't think of a 
good second dependent vector for UV/VIS, so let's call it Vector3. In 
this case we can't leave out w3 and w4 because we might have a reading 
for Vector3 there. We could declare that like this:

Wavelength  [ w1 w2 w3 w4 w5 w6 ...... w100 ]
Absorption  [ a1 a2 ]   [ a5 a6 ...... a100 ]  <-- two ValueSets here
Vector3     [ v1 v2 v3 v4 v5 v6 ...... v100 ]

Again, we have 100 data points. We don't have a value for Absorption at 
w3 and w4, but that is perfectly legal and valid. Absorption would use 
two ValueSets:

   - startOffset 0 - endOffset 1  and
   - startOffset 4 - endOffset 99
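
As a rough sketch of how a reader could resolve values from such a split 
vector, consider the following Python. The ValueSet class and the value_at 
helper are hypothetical names for illustration only, the numbers are 
invented, and endOffset is assumed to be inclusive (as in the 0..99 case 
for 100 points).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ValueSet:
    start_offset: int    # index of the first value within the VectorSet
    end_offset: int      # index of the last value (assumed inclusive)
    values: List[float]


def value_at(value_sets: List[ValueSet], index: int) -> Optional[float]:
    """Return the value stored at a data point index, or None for a hole."""
    for vs in value_sets:
        if vs.start_offset <= index <= vs.end_offset:
            return vs.values[index - vs.start_offset]
    return None


# Absorption with no readings at w3 and w4 (indices 2 and 3).
absorption = [
    ValueSet(0, 1, [0.10, 0.20]),   # a1, a2
    ValueSet(4, 99, [0.5] * 96),    # a5 .. a100, made-up values
]
```

Here value_at(absorption, 2) and value_at(absorption, 3) come back as 
None, while every other index from 0 to 99 resolves to a stored value.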

All this can be stored without having a Vector.length attribute. In 
fact, what good would it do to explicitly store that Absorption only has 
98 values? If we actually need that number, we can easily calculate it 
by summing (endOffset - startOffset + 1) over the vector's ValueSets 
(the offsets are inclusive). Adding the Vector.length attribute would 
not increase the expressive power and would add another point where a 
file could become inconsistent, making validation more difficult.

The same argument explains why a length attribute in the *ValueSet 
elements would not be beneficial. Here, the number of values is even 
easier to calculate (endOffset - startOffset + 1).
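
To show how cheap that calculation is, here is a small Python sketch. 
The function name value_count and the tuple representation of the 
offsets are assumptions for illustration, and endOffset is again taken 
to be inclusive, so each ValueSet holds endOffset - startOffset + 1 values.

```python
def value_count(offsets):
    """Count the values in a vector from its (startOffset, endOffset) pairs.

    Offsets are assumed inclusive, so each pair contributes end - start + 1.
    """
    return sum(end - start + 1 for start, end in offsets)


# Absorption from the example: ValueSets 0..1 and 4..99.
absorption_offsets = [(0, 1), (4, 99)]
n_absorption = value_count(absorption_offsets)  # 2 + 96 values

# A vector without holes: one ValueSet covering 0..99.
n_vector3 = value_count([(0, 99)])
```

Since the count falls out of the offsets directly, a stored length would 
only be a second copy of the same information that a validator then has 
to check against the offsets.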

Consequently, I would suggest keeping the VectorSet.length attribute, 
defined as the number of data points.

I look forward to seeing you all again ("virtually") tomorrow. :-)

Best wishes,
Burkhard

