Hi Sara,

Actually, this format is really used to store the mz data of a single run. We have other schemas for lab experiments.
The instrument and run tables each hold a single record.
This is unconventional, but it is the way I found to attach annotations to the data.

The goal of this schema was very close to the problem you solved with the RTree.
In effect, I divide the mz acquisition range into slices (by default one run_slice per amu).
Scans are cut into "scan slices", and each scan slice is linked to its corresponding run_slice.
The mz data points (mz_list and intensity_list) are stored in the scan_slice as a binary structure (consecutive DOUBLE numbers).
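
To make the slicing concrete, here is a minimal Python sketch of the idea, not the actual implementation. The function names, the little-endian byte order and the 1.0-wide slices starting at mz 0 are assumptions made for illustration only.

```python
import struct
from collections import defaultdict

SLICE_WIDTH = 1.0  # assumption: one slice per unit of mz


def pack_doubles(values):
    """Pack a list of floats as consecutive little-endian 8-byte doubles."""
    return struct.pack("<%dd" % len(values), *values)


def unpack_doubles(blob):
    """Inverse of pack_doubles: decode a blob of consecutive doubles."""
    return list(struct.unpack("<%dd" % (len(blob) // 8), blob))


def slice_index(mz, min_mz=0.0, width=SLICE_WIDTH):
    """Index of the slice that contains this mz value (hypothetical helper)."""
    return int((mz - min_mz) // width)


def cut_scan(mz_list, intensity_list, min_mz=0.0, width=SLICE_WIDTH):
    """Cut one scan into scan slices.

    Returns a dict mapping slice index -> (mz_blob, intensity_blob),
    where each blob holds the points of the scan falling in that slice.
    """
    groups = defaultdict(lambda: ([], []))
    for mz, intensity in zip(mz_list, intensity_list):
        mzs, intensities = groups[slice_index(mz, min_mz, width)]
        mzs.append(mz)
        intensities.append(intensity)
    return {
        idx: (pack_doubles(mzs), pack_doubles(intensities))
        for idx, (mzs, intensities) in groups.items()
    }


# Usage: a three-point scan ends up in two slices (400 and 401).
slices = cut_scan([400.2, 400.9, 401.5], [10.0, 20.0, 30.0])
for idx in sorted(slices):
    mz_blob, intensity_blob = slices[idx]
    print(idx, unpack_doubles(mz_blob), unpack_doubles(intensity_blob))
```

Each (mz_blob, intensity_blob) pair would correspond to one scan_slice record, linked to the run_slice with the same index.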

Using SQLite's indexing mechanism on the run_slice and scan_slice tables, it is thus possible to run fast range queries on the mz data.
However, the data points in the retrieved slices still have to be filtered by mz afterwards, so that only the wanted points are kept.
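
As an illustration of such a range query, here is a small self-contained Python sketch using an in-memory SQLite database. Only the table names run_slice and scan_slice and the fields mz_list and intensity_list come from the schema; all other column names, the index names and the sample values are assumptions.

```python
import sqlite3
import struct


def pack(values):
    """Consecutive little-endian doubles (byte order is an assumption)."""
    return struct.pack("<%dd" % len(values), *values)


def unpack(blob):
    return list(struct.unpack("<%dd" % (len(blob) // 8), blob))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run_slice (
    id INTEGER PRIMARY KEY,
    begin_mz REAL NOT NULL,   -- hypothetical column names
    end_mz REAL NOT NULL
);
CREATE TABLE scan_slice (
    id INTEGER PRIMARY KEY,
    scan_id INTEGER NOT NULL,
    run_slice_id INTEGER NOT NULL REFERENCES run_slice(id),
    mz_list BLOB NOT NULL,
    intensity_list BLOB NOT NULL
);
CREATE INDEX run_slice_mz_idx ON run_slice (begin_mz, end_mz);
CREATE INDEX scan_slice_run_slice_idx ON scan_slice (run_slice_id);
""")

# Two slices of width 1 and made-up points from two scans.
conn.executemany("INSERT INTO run_slice VALUES (?, ?, ?)",
                 [(400, 400.0, 401.0), (401, 401.0, 402.0)])
conn.executemany(
    "INSERT INTO scan_slice (scan_id, run_slice_id, mz_list, intensity_list) "
    "VALUES (?, ?, ?, ?)",
    [(1, 400, pack([400.2, 400.9]), pack([10.0, 20.0])),
     (1, 401, pack([401.5]), pack([30.0])),
     (2, 400, pack([400.7]), pack([5.0]))])


def query_mz_range(conn, min_mz, max_mz):
    """Return (scan_id, mz, intensity) for points with min_mz <= mz <= max_mz."""
    # Step 1: the indexes select only the slices overlapping the range.
    rows = conn.execute(
        "SELECT s.scan_id, s.mz_list, s.intensity_list "
        "FROM scan_slice s JOIN run_slice r ON r.id = s.run_slice_id "
        "WHERE r.end_mz > ? AND r.begin_mz <= ? "
        "ORDER BY s.scan_id, r.begin_mz",
        (min_mz, max_mz))
    # Step 2: post-filter, since a slice may extend beyond the range.
    points = []
    for scan_id, mz_blob, intensity_blob in rows:
        for mz, intensity in zip(unpack(mz_blob), unpack(intensity_blob)):
            if min_mz <= mz <= max_mz:
                points.append((scan_id, mz, intensity))
    return points


print(query_mz_range(conn, 400.5, 401.2))
# prints [(1, 400.9, 20.0), (2, 400.7, 5.0)]
```

Both slices overlap the requested range, but only two of the four stored points survive the post-filter, which is why that second step is needed.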


On 03/11/2010 00:38, Sara Nasso wrote:
Hi David,

I had a look at your schema. As far as I can see, you conceived it to be used as a DB for all lab experiments: we could use it at the end of the work as an add-on to the format. But then, at the scale of a whole lab, I don't know whether SQLite would be efficient enough.
I'd like to ask you something about it: why are you dividing the mz acquisition range into slices?


From: David Bouyssié <david.bouyssie@ipbs.fr>
To: Sara Nasso <apeir0n@yahoo.it>
Cc: Matthew Chambers <matt.chambers42@gmail.com>; Francesco Silvestri <silvest1@dei.unipd.it>
Sent: Tue 2 November 2010, 14:47:03
Subject: Re: [Psidev-pi-dev]mzRTree -quite urgent


The simp file seems to be an XML file generated with an ER designer, but I don't know which one.
Personally I use Power Architect, which is a Java-based open source tool with many features.

I am sending you an ER diagram I started to develop before I knew about your work on RTree indexing.
I know it is a bit different from an RTree implementation, but I think we could pick up some ideas from it.


On 02/11/2010 13:38, Sara Nasso wrote:
Hi David,

Everything is attached. I've just modified the files according to Matt's email. I hope everything is OK; if not, please tell me.

@ Matt:

I have just one question about the experimental setup:

If A is a precursor scan of B (MS1) and C (MS2), then can I say that B is a precursor of C?

If so, how should I populate the precursorProduct linker table?

Is it like:


or like:




Sara Nasso, Ph.D. student
Department of Information Engineering (DEI)
University of Padova
Via Ognissanti 72 35129 PADOVA, ITALY
Voice: +39-049-8277834
Fax: +39-049-8277826

From: David Bouyssié <david.bouyssie@ipbs.fr>
To: Matthew Chambers <matt.chambers42@gmail.com>
Cc: Sara Nasso <apeir0n@yahoo.it>; Francesco Silvestri <silvest1@dei.unipd.it>
Sent: Tue 2 November 2010, 10:29:40
Subject: Re: [Psidev-pi-dev]mzRTree -quite urgent


Thank you, Matt, for your clear answers.

Sara, could you please send me the ER diagram?
Which diagram designer do you use?

Thank you,


On 01/11/2010 16:49, Matthew Chambers wrote:
> Hi Sara and David. I've written responses inline.
> On 10/29/2010 5:42 AM, Sara Nasso wrote:
>> *From:* Sara Nasso <apeir0n@yahoo.it>
>> *To:* Matt Chambers <matt.chambers42@gmail.com>
>> *Cc:* Francesco Silvestri <silvest1@dei.unipd.it>
>> *Sent:* Fri 29 October 2010, 12:35:51
>> *Subject:* Re: [Psidev-pi-dev]mzRTree
>> Hi Matt,
>> I've prepared the attached ER diagram (pdf and simp files) and the draft code for the tables, based on the discussion we had previously with you and Francesco. If you have any comments let me know, especially on the chromatogramList table. I am not sure about the meaning of the indexing attribute when the mzML reports chromatograms instead of spectra.
>> If you agree, I will send everything to David too.
> Let's first try this without bothering with the chromatogramList. The two way data membership is potentially quite confusing. A typical MSn experiment will have multiple products for a single precursor scan, so the tree relationship indicated by "product_spectrumList_scan" would have to be in a linker table. And anyway I think we can also exclude that for a first iteration (my idea is to test that it performs well in SQLite before spending a lot of time). Since there will be millions of points in the DATA table, it will be quite costly to have the spectrumList_scan foreign key. That cost isn't just in disk space either: a bigger table will have a performance impact due to taking longer to load.
> -------- Original Message --------
> Subject:    Re: mzrtree and sqlite
> Date:    Fri, 29 Oct 2010 09:36:32 +0200
> From:    David Bouyssié <david.bouyssie@ipbs.fr>
> To:    Sara Nasso <apeir0n@yahoo.it>
>> Hi,
>> I have some additional questions:
>> - have you written any specifications for this project?
> There's no formal specification. As I said above I want to make sure that SQLite can handle this before investing a lot of time in it.
>> - has a SQL schema been started?
> Yes, Sara has given me a draft of a SQLite schema which I made comments on above. I leave it up to her to either share it before or after responding to my comments.
>> - is there a Google group or something equivalent to manage this pwiz subproject?
> I just created a proteowizard-mzrtree group to serve this purpose. It will take a few hours to activate.
>> I'm also writing my PhD thesis on a quantitation topic. It seems that we have very similar concerns and deadlines ;-)
>> I compiled pwiz yesterday because I thought it was a more appropriate project than readw for that.
>> So I totally agree with Matt Chambers, and I'm happy to see that you agree too.
>> You can count on me to work on this project.
>> I have downloaded the source code from the pwiz SVN repository.
>> Have you opened a new code branch somewhere else?
> There's no new branch. After the concept and performance are proven by Sara, I will hammer out a pwiz implementation in trunk in a couple of days.
>> I also have some questions about compiling pwiz. I'm unable to build it with support for RAW data files.
>> Maybe you have some advice on this.
>> I compiled on WinXP + VS C++ 2010 and I have installed MSFileReader from Thermo.
> We don't yet test with VC10 and there are some compatibility problems that haven't been addressed. Do you have access to VC9?
> Thanks,
> -Matt