From: David C. <dc...@ma...> - 2008-08-07 14:04:11
|
Hi Martin, Angel,

The use of 15N (or other metabolic labelling) isn't really a modification.
We'd have to duplicate every nitrogen containing modification in PSI-MOD -
once with standard 14N and once with 15N. And then maybe the same for 13C...

David

Martin Eisenacher wrote:
>>> This seems to work pretty well (as expected!!)
>>> The interesting thing is that there are two sets of residue masses, two
>>> SpectrumIdentificationList (one for light, one for heavy) but just one
>>> ProteinDetectionList.
> At first glance I thought it would be better to have one residue mass list
> and a modification for N15 (see below). But I saw it was a bad solution.
>
> It indeed works like you modeled it, because we don't report any masses
> for DBSequences or Proteins.
>
>>> And (possibly confusing at first glance!) unmodified peptides with
>>> different masses like this:
>>>
>>> <Peptide id="peptide_48_1" sequenceMass="1025.481796" >
>>>   <peptideSequence>STNLDWYK</peptideSequence>
>>> </Peptide>
>>>
>>> <Peptide id="peptide_53_1" sequenceMass="1036.449188">
>>>   <peptideSequence>STNLDWYK</peptideSequence>
>>> </Peptide>
>> I think this is where I come in with the mod specifications for
>> peptide results. I'll try and add this as an example to issue 35
> It could be possible in the current schema with the CustomModification element.
> But that should be "extended" to report "elements" (N, C, O) instead
> of "locations".
> Or PSI-MOD has to include "N15 mod of amino acid X" with the respective mass
> or delta mass, which maybe there already. But that would be quite verbose.
>
> Bye
> Martin

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
From: Martin E. <mar...@ru...> - 2008-08-07 13:51:07
|
> > This seems to work pretty well (as expected!!)
> > The interesting thing is that there are two sets of residue masses, two
> > SpectrumIdentificationList (one for light, one for heavy) but just one
> > ProteinDetectionList.

At first glance I thought it would be better to have one residue mass list
and a modification for N15 (see below). But I saw it was a bad solution.

It indeed works like you modeled it, because we don't report any masses
for DBSequences or Proteins.

> > And (possibly confusing at first glance!) unmodified peptides with
> > different masses like this:
> >
> > <Peptide id="peptide_48_1" sequenceMass="1025.481796" >
> >   <peptideSequence>STNLDWYK</peptideSequence>
> > </Peptide>
> >
> > <Peptide id="peptide_53_1" sequenceMass="1036.449188">
> >   <peptideSequence>STNLDWYK</peptideSequence>
> > </Peptide>
>
> I think this is where I come in with the mod specifications for
> peptide results. I'll try and add this as an example to issue 35

It could be possible in the current schema with the CustomModification element.
But that should be "extended" to report "elements" (N, C, O) instead
of "locations".
Or PSI-MOD has to include "N15 mod of amino acid X" with the respective mass
or delta mass, which maybe there already. But that would be quite verbose.

Bye
Martin
|
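The two sequenceMass values quoted in this exchange for STNLDWYK can be checked with a little arithmetic. The following is only a sketch: it assumes full 15N incorporation, and the nitrogen counts per residue and the isotope masses are standard reference values typed in here, not taken from any schema or from the thread. The function names are made up for illustration.

```python
# Nitrogen atoms per amino acid residue (one backbone N plus side-chain N).
NITROGENS = {
    "G": 1, "A": 1, "S": 1, "P": 1, "V": 1, "T": 1, "C": 1, "L": 1, "I": 1,
    "D": 1, "E": 1, "M": 1, "F": 1, "Y": 1,
    "N": 2, "Q": 2, "K": 2, "W": 2,
    "H": 3,
    "R": 4,
}

# Approximate monoisotopic mass difference between 15N and 14N, in Da.
DELTA_15N = 15.0001088982 - 14.0030740048


def nitrogen_count(sequence):
    """Count nitrogen atoms in an unmodified peptide sequence."""
    return sum(NITROGENS[aa] for aa in sequence)


def heavy_mass(light_mass, sequence):
    """Mass of the fully 15N-labelled peptide, given the 14N mass."""
    return light_mass + nitrogen_count(sequence) * DELTA_15N


light = 1025.481796  # sequenceMass of peptide_48_1 in the quoted example
heavy = heavy_mass(light, "STNLDWYK")
print(nitrogen_count("STNLDWYK"), round(heavy, 4))  # prints: 11 1036.4492
```

The result agrees with the sequenceMass of peptide_53_1 (1036.449188) to within about 1e-5 Da. The difference between the "light" and "heavy" entries is therefore explained by the residue masses alone, with no modification attached to the peptide.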
From: Jones, A. <And...@li...> - 2008-08-06 09:25:12
|
> 4. Do we have final agreement on:
> http://code.google.com/p/psi-pi/issues/detail?id=28
> See email exchange on 1st August. (Andy, pls. could you add final
> proposal to the issue)

I've added the proposal to the issues list. It would be good if people could review and comment prior to the call if they have major objections to this type of structure. I can see this taking up the entire call if we start discussing the pros and cons from scratch again!

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of David Creasy
> Sent: 06 August 2008 10:00
> To: psi...@li...
> Subject: [Psidev-pi-dev] PSI-PI Working group conference call on Thursday 7th August at 4:00pm UK time
|
From: David C. <dc...@ma...> - 2008-08-06 09:00:07
|
Hi everyone,

There will be an AnalysisXML working group conference call on Thursday at:
http://www.timeanddate.com/worldclock/fixedtime.html?day=7&month=8&year=2008&hour=16&min=0&sec=0&p1=136

Minutes from last meeting at:
http://www.psidev.info/index.php?q=node/359

Agenda:
1. Telecon meetings for rest of August: Phil won't be available on the
   14th, 21st and 28th August. I'm away 21st August. Agree who/when/how...

2. Agree on what else required before submitting to steering group to
   review. Assign willing volunteers!
   http://code.google.com/p/psi-pi/issues/detail?id=41

3. Make final? decision on enzymes based on proposal due from Angel:
   http://code.google.com/p/psi-pi/issues/detail?id=30#c19

4. Do we have final agreement on:
   http://code.google.com/p/psi-pi/issues/detail?id=28
   See email exchange on 1st August. (Andy, pls. could you add final
   proposal to the issue)

5. Modifications proposal from Angel:
   http://code.google.com/p/psi-pi/issues/detail?id=35
   Additional feedback to the list or the issue before the meeting
   please.

Dial in details:

+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)

access code: 297427

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
From: Jones, A. <And...@li...> - 2008-08-01 15:30:50
|
> Well, notice I use "values" instead of "value" (but that wasn't
> intentional, hehe). We could either use the value attribute we get from
> subclassing, or add a new attribute that is more specific (like
> "seriesIndexList" or something).

I think either of these options would be fine. I would marginally favour adding, say, seriesIndex = "" then we can add a list datatype and specific documentation for the attribute in the XSD.

> > I don't even know what an immonium ion is so I don't want to have to sign off on
> > a perfect list of all ion types in the analysisXML documentation! By using ontology
> > terms we can leave flexibility in there so that implementers can report whatever ion
> > types they like. IMO being as compact as possible is not really a big deal?
>
> Yes, those ion types are troublesome to represent in a label. How will
> that be done in the CV? In other words, the same question applies to the
> cvParam approach. :) But your point about having to have universal
> agreement up front is a good one. Yet, if we DON'T have agreement up
> front about ion types, will that mean we start getting requests for
> obscure, vendor-specific ion types as CV terms? Or will we see
> userParams used there instead? It's not clear to me exactly how the CV
> approach solves the up front agreement issue either (at least, not
> without creating its own issues).

I think the PSI cv should contain the main ion types (b / y, neutral losses etc) with definitions. For the rest, I was thinking obscure vendor specific terms was the way to go i.e. not my problem :-) I just want to focus on getting the XSD over the line into version 1 state. CV terms can evolve as and when they are needed but once the spec doc is fixed at version 1 we don't want to have to touch it again!

Cheers
Andy

> -----Original Message-----
> From: Matthew Chambers [mailto:mat...@va...]
> Sent: 01 August 2008 16:21
> To: Jones, Andy
> Cc: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
|
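The two compact encodings compared in this thread, the formatted attribute (fragmentEvidence="y1 y2 y3 b4 b5 b6") and the one-element-per-series form (IonType with a list of series indices), carry the same basic annotation. As a rough sketch only, the conversion from one to the other might look like this. The function names are hypothetical, the accessions are simply copied from the quoted example, and only plain a/b/c/x/y/z labels without charge or neutral loss are handled.

```python
import re

# A plain ion label: series letter followed by a series number, e.g. "y3".
LABEL = re.compile(r"^([abcxyz])(\d+)$")


def group_by_series(fragment_evidence):
    """Group labels such as 'y1 y2 b4' into {'y': [1, 2], 'b': [4]}."""
    series = {}
    for label in fragment_evidence.split():
        match = LABEL.match(label)
        if match is None:
            raise ValueError("unrecognised ion label: %r" % label)
        series.setdefault(match.group(1), []).append(int(match.group(2)))
    return series


def to_ion_type_elements(series, accessions):
    """Render one IonType element (as text) per ion series."""
    template = '<IonType cvLabel="Waters" accession="%s" name="%s ion" values="%s"/>'
    return [
        template % (accessions[ion], ion, " ".join(str(i) for i in indices))
        for ion, indices in series.items()
    ]


# Accessions as they appear in the example quoted in this thread.
accessions = {"y": "PLGS:00035", "b": "PLGS:00032"}
grouped = group_by_series("y1 y2 y3 b4 b5 b6")
for line in to_ion_type_elements(grouped, accessions):
    print(line)
```

Going the other way is just as mechanical, which is the sense in which the choice between the two is about verbosity and about where the ion type definitions live (schema documentation versus CV), not about information content.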
From: Matthew C. <mat...@va...> - 2008-08-01 15:21:16
|
Hi Andy,

Jones, Andy wrote:
> Hi Matt,
>
>> I'm (still) not sure what the use case is for all the "extra"
>> measurements that seem to me to be redundant with the label, but if
>> reporting those is the decision of the implementor, I'm happy with
>> having that capability.
>
> On the call, it was discussed that some implementers may wish to report additional scores associated with particular peaks e.g. why a particular peak was identified as a particular ion (for example related to the abundance of an adjacent peak). I think this is a niche use case but this proposal would allow it to be done. Also see the other values in comment 5 of issue 28 e.g. product ion m/z error

I can see the use of being able to report additional scores on a per peak basis, although I don't personally know how to use that information. :)

>> My modified proposal (with some extra compactness possible by opting to
>> leave out the extra measurements):
>>
>>> <Fragmentation>
>>>   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion"
>>>     values="1 2 3"/>
>>>   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion"
>>>     values="4 5 6"/>
>>> </Fragmentation>
>
> We can definitely do it this way in the XSD. The only minor advantage of doing it the other way is that the format (xsd:list) can be verified by an XML parser rather than relying on the validator software, either is fine though.

Well, notice I use "values" instead of "value" (but that wasn't intentional, hehe). We could either use the value attribute we get from subclassing, or add a new attribute that is more specific (like "seriesIndexList" or something). Even if we reuse CVParam's value, I would expect there to be some way of overriding the type of the inherited attribute? In any case, xsd:list isn't enough to semantically validate the series, because a list like "100 99999 2939439" is a valid xsd:list but obviously semantically crazy. :)

>> Although even then, it's about 10 times less compact than the formatted
>> attribute proposal, but which would only work by explicitly denying the
>> storage of extra measurements. For reference, using the same conditions
>> for my approximate calculations above, MyriMatch's output by the
>> formatted attribute method would have about 1.5mb of fragmentation info.
>>
>>> fragmentEvidence="y1 y2 y3 b4 b5 b6"
>
> Although initially I was in favour of this approach, it suffers from the problem that we have to decide now (in the schema documentation) on all ion types and definitions. I'm not even sure if there is universal agreement on what constitutes each ion type, see comment from David:
>
>> "What about internal fragments, immonium ions, side chain cleavages?"
>
> I don't even know what an immonium ion is so I don't want to have to sign off on a perfect list of all ion types in the analysisXML documentation! By using ontology terms we can leave flexibility in there so that implementers can report whatever ion types they like. IMO being as compact as possible is not really a big deal?

Yes, those ion types are troublesome to represent in a label. How will that be done in the CV? In other words, the same question applies to the cvParam approach. :) But your point about having to have universal agreement up front is a good one. Yet, if we DON'T have agreement up front about ion types, will that mean we start getting requests for obscure, vendor-specific ion types as CV terms? Or will we see userParams used there instead? It's not clear to me exactly how the CV approach solves the up front agreement issue either (at least, not without creating its own issues).

-Matt
|
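The point about xsd:list can be made concrete: a schema-valid list of integers still needs a semantic check that a validator (not the XML parser) would have to perform. A minimal sketch of such a check follows, assuming the series index of a b or y ion must lie between 1 and the peptide length. The function name and the choice to flag duplicates are illustrative assumptions, not part of any proposal in this thread.

```python
def validate_series_indices(values, peptide_length):
    """Return a list of problems found in a space-separated index list.

    An empty list means the indices are plausible for a peptide of the
    given length. This is a semantic check on top of xsd:list syntax.
    """
    problems = []
    seen = set()
    for token in values.split():
        try:
            index = int(token)
        except ValueError:
            problems.append("not an integer: %s" % token)
            continue
        if index < 1 or index > peptide_length:
            problems.append("index %d outside 1..%d" % (index, peptide_length))
        elif index in seen:
            problems.append("duplicate index %d" % index)
        seen.add(index)
    return problems


# STNLDWYK has 8 residues, so "1 2 3" is fine and the "crazy" list is not.
print(validate_series_indices("1 2 3", 8))
print(validate_series_indices("100 99999 2939439", 8))
```

The first call reports no problems, while the second reports one out-of-range problem per entry, even though both strings are equally valid as an xsd:list of integers.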
From: Jones, A. <And...@li...> - 2008-08-01 13:57:38
|
Hi Matt,

> I'm (still) not sure what the use case is for all the "extra"
> measurements that seem to me to be redundant with the label, but if
> reporting those is the decision of the implementor, I'm happy with
> having that capability.

On the call, it was discussed that some implementers may wish to report additional scores associated with particular peaks e.g. why a particular peak was identified as a particular ion (for example related to the abundance of an adjacent peak). I think this is a niche use case but this proposal would allow it to be done. Also see the other values in comment 5 of issue 28 e.g. product ion m/z error

> My modified proposal (with some extra compactness possible by opting to
> leave out the extra measurements):
> > <Fragmentation>
> >   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion"
> >     values="1 2 3"/>
> >   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion"
> >     values="4 5 6"/>
> > </Fragmentation>

We can definitely do it this way in the XSD. The only minor advantage of doing it the other way is that the format (xsd:list) can be verified by an XML parser rather than relying on the validator software, either is fine though.

> My modified proposal (with some extra compactness possible by opting to
> leave out the extra measurements):
> > <Fragmentation>
> >   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion"
> >     values="1 2 3"/>
> >   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion"
> >     values="4 5 6"/>
> > </Fragmentation>

I agree this is fine for a viewer, in that it tells you the expected ion types that were detected, but it doesn't tell you which observed ion types were matched to them. If multiple peaks fall in the same range near an expected peak, you've lost information.

> Although even then, it's about 10 times less compact than the formatted
> attribute proposal, but which would only work by explicitly denying the
> storage of extra measurements. For reference, using the same conditions
> for my approximate calculations above, MyriMatch's output by the
> formatted attribute method would have about 1.5mb of fragmentation info.
> > fragmentEvidence="y1 y2 y3 b4 b5 b6"

Although initially I was in favour of this approach, it suffers from the problem that we have to decide now (in the schema documentation) on all ion types and definitions. I'm not even sure if there is universal agreement on what constitutes each ion type, see comment from David:

> "What about internal fragments, immonium ions, side chain cleavages?"

I don't even know what an immonium ion is so I don't want to have to sign off on a perfect list of all ion types in the analysisXML documentation! By using ontology terms we can leave flexibility in there so that implementers can report whatever ion types they like. IMO being as compact as possible is not really a big deal?

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Matt Chambers
> Sent: 01 August 2008 14:24
> To: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
>
> Hi all,
>
> I'm (still) not sure what the use case is for all the "extra"
> measurements that seem to me to be redundant with the label, but if
> reporting those is the decision of the implementor, I'm happy with
> having that capability. Some rough calculation tells me that if I was to
> write this format from MyriMatch with 10k spectra with 5 results each
> and an average of 2 y ions and 2 b ions matched, that would be about
> 16mb of fragmentation data (leaving out the "extra" measurements). That
> is a lot better than where we were before. But I think we can compact it
> some more. IIRC, other places in the schema have elements that
> essentially subclass cvParam, is that right? It would compact things to
> make IonType such a subclass with the intention that the accession
> attribute point to an ion CV term and an extra attribute would
> correspond with the FragArrayIndex.
>
> The current proposal:
> > <Fragmentation>
> >   <IonType>
> >     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion"/>
> >     <FragArrayIndex values="1 2 3"/>
> >   </IonType>
> >   <IonType>
> >     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
> >     <FragArrayIndex values="4 5 6"/>
> >   </IonType>
> > </Fragmentation>
>
> My modified proposal (with some extra compactness possible by opting to
> leave out the extra measurements):
> > <Fragmentation>
> >   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion"
> >     values="1 2 3"/>
> >   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion"
> >     values="4 5 6"/>
> > </Fragmentation>
>
> This method would not interfere with the capability of having extra
> measurements, and it provides roughly 30% more compact way of annotating
> an ion series.
>
> Although even then, it's about 10 times less compact than the formatted
> attribute proposal, but which would only work by explicitly denying the
> storage of extra measurements. For reference, using the same conditions
> for my approximate calculations above, MyriMatch's output by the
> formatted attribute method would have about 1.5mb of fragmentation info.
> > fragmentEvidence="y1 y2 y3 b4 b5 b6"
>
> -Matt
>
> Jones, Andy wrote:
> >> If this is describing three Y-H2O ions, 3, 8 and 10 (i.e. all of the
> >> Y-H2O ions for this peptide identification) then the attribute
> >> value="3" on the cvParam element should be removed - or have I
> >> misunderstood how this works?
> >
> > Correct, my mistake. The example says we have found y3-H2O y8-H2O and
> > y10-H2O, the cvParam should not have had the value attribute
> >
> > <Fragmentation>
> >   <IonType>
> >     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion - H2O"/>
> >     <FragArrayIndex values = "3 8 10"/>
> >     <FragArray Measure_ref = "m1" values = "379.2215 457.12345 540.234"/>
> >     <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
> >     <!-- and so on for other measures as defined in the FragmentationTable -->
> >   </IonType>
> >   <IonType>
> >     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
> >     <FragArrayIndex values = "2 12 14"/>
> >     <FragArray Measure_ref = "m1" values = "560.153 859.111 945.653"/>
> >     <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/>
> >     <!-- and so on for other measures as defined in the FragmentationTable -->
> >   </IonType>
> > </Fragmentation>
> >
> >> Please excuse me for stating the obvious, but... there is no reason
> >> why the pointers m1, m2, m3, m4 could not be more human readable, so
> >> changed in this example to mz, inten, mz_error, ret_error for example.
> >> (To help implementors understand the mechanism).
> >
> > Good suggestion.
> >
> > Cheers
> > Andy
> >
> >> -----Original Message-----
> >> From: phi...@go... [mailto:phi...@go...] On Behalf Of Phil Jones @ EBI
> >> Sent: 01 August 2008 11:23
> >> To: Jones, Andy; psi...@li...
> >> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
> >>
> >> Hi Andy,
> >>
> >> This looks really good - both flexible and compact.
> >>
> >> Just to clarify - in your example:
> >>
> >> <IonType>
> >>   <cvParam cvLabel="Waters" accession="PLGS:00035"
> >>            name="y ion -H2O" value="3"/>
> >>   <FragArrayIndex values = "3 8 10"/>
> >>   <FragArray Measure_ref = "m1" values = "379.2215 457.1234 540.234"/>
> >>   <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
> >>   <!-- and so on for other measures as defined in the FragmentationTable -->
> >> </IonType>
> >>
> >> If this is describing three Y-H2O ions, 3, 8 and 10 (i.e. all of the
> >> Y-H2O ions for this peptide identification) then the attribute
> >> value="3" on the cvParam element should be removed - or have I
> >> misunderstood how this works?
> >>
> >> Please excuse me for stating the obvious, but... there is no reason
> >> why the pointers m1, m2, m3, m4 could not be more human readable, so
> >> changed in this example to mz, inten, mz_error, ret_error for example.
> >> (To help implementors understand the mechanism).
> >>
> >> best regards,
> >>
> >> Phil.
> >>
> >> 2008/8/1 Jones, Andy <And...@li...>:
> >>> Hi all,
> >>>
> >>> Here's a proposal for fragmentation ions as discussed on the call that's halfway between using cvParams for all values and using an array based encoding. I think it's pretty flexible and concise.
> >>>
> >>> First up, setup a FragmentationTable for the entire list of the spectra, which says the kinds of measures you're going to report lower down:
> >>>
> >>> <SpectrumIdentificationList id="MASCOT_results">
> >>>   <FragmentationTable>
> >>>     <Measures>
> >>>       <Measure id = "m1">
> >>>         <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z"/>
> >>>       </Measure>
> >>>       <Measure id = "m2">
> >>>         <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity"/>
> >>>       </Measure>
> >>>       <Measure id = "m3">
> >>>         <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error"/>
> >>>       </Measure>
> >>>       <Measure id = "m4">
> >>>         <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error"/>
> >>>       </Measure>
> >>>     </Measures>
> >>>   </FragmentationTable>
> >>>
> >>> Then for each SpectrumIdentificationItem, you reference back to these Measures
> >>>
> >>> <SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" chargeState="1">
> >>>   <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" isDecoy="false" />
> >>>   ...
> >>>
> >>>   <Fragmentation>
> >>>     <IonType>
> >>>       <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion - H2O" value="3"/>
> >>>       <FragArrayIndex values = "3 8 10"/>
> >>>       <FragArray Measure_ref = "m1" values = "379.2215 457.1234 540.234"/>
> >>>       <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
> >>>       <!-- and so on for other measures as defined in the FragmentationTable -->
> >>>     </IonType>
> >>>     <IonType>
> >>>       <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
> >>>       <FragArrayIndex values = "2 12 14"/>
> >>>       <FragArray Measure_ref = "m1" values = "560.153 859.111 945.653"/>
> >>>       <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/>
> >>>       <!-- and so on for other measures as defined in the FragmentationTable -->
> >>>     </IonType>
> >>>   </Fragmentation>
> >>>
> >>> Each array contains space separated values (i.e. an xsd:list). The FragArrayIndex tells you which ions you've found i.e. for the second IonType we have b2 b12 and b14 which have the m/z and intensity values in the m1 and m2 arrays. This will save a lot of space if there are many ions of the same type in each array and I think it is fairly easy to read as well. Slightly more space could be saved by defining the ion types in the FragmentationTable but not much really once you've added a reference back up to it.
> >>>
> >>> Cheers
> >>> Andy
> >>>
> >>>> -----Original Message-----
> >>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Matthew Chambers
> >>>> Sent: 18 July 2008 16:00
> >>>> To: psi...@li...
> >>>> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)
> >>>>
> >>>> I also agree that anything beyond an array is far too verbose.
> >>>> To answer this question, I think we need to decide the scope of the problem. What
> >>>> do we want fragment ion information to represent? I think analysis
> >>>> software is too diverse to use it for anything more than basic
> >>>> annotation, but basic annotation is important. If there are ways people
> >>>> want it to be usable beyond that, speak up. :)
> >>>>
> >>>> For basic annotation, all I think is needed is the fragment type, series
> >>>> number, charge state, and possibly any modification like a neutral loss
> >>>> or radical. The array can be an attribute or text node. We can use a
> >>>> grammar for each term, where each term represents an ion and terms are
> >>>> space delimited. The grammar might look like:
> >>>> <a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,<+|-><charge>]
> >>>> We could make the charge part mandatory or if it was optional, assume a
> >>>> +1 charge (or possibly allow the charge to be based on the polarity of
> >>>> the source scan?). I assume there is a standard chemical formula format
> >>>> that can be represented compactly in ASCII text, but I don't know it.
> >>>> An example to show how compact it could be:
> >>>> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
> >>>>
> >>>> For basic annotation, the masses are not necessary I think. Expected
> >>>> mass can be recomputed if all the label metadata is complete and
> >>>> regular, and the observed mass is unimportant for annotation (IMO).
> >>>>
> >>>> -Matt
> >>>>
> >>>> David Creasy wrote:
> >>>>> Hi Phil,
> >>>>>
> >>>>> Just to be sure I've not misunderstood... from below, each fragment ion
> >>>>> takes approx 500 bytes. Lets assume a conservative average of 20
> >>>>> fragment matches per spectrum and a modest search with 100k spectra.
> >>>>> Assuming that we just report fragment matches for the top match for each
> >>>>> spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb.
> >>>>> If we reported fragment matches for the top 10 matches for each
> >>>>> spectrum, this would be 10Gb. Is this reasonable and acceptable?
> >>>>>
> >>>>> David
> >>>>>
> >>>>> Phil Jones @ EBI wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Regarding Issue 28
> >>>>>> <http://code.google.com/p/psi-pi/issues/detail?id=28> "support
> >>>>>> reporting of fragment ions"
> >>>>>>
> >>>>>> As a suggestion of how this might be tackled:
> >>>>>>
> >>>>>> The latest development version of the PRIDE database includes a very
> >>>>>> simple mechanism for recording fragment ion information, illustrated
> >>>>>> below. (Please note - made up data.)
> >>>>>>
> >>>>>> In this example, CV terms are used to define the type of ion and
> >>>>>> related information / annotation. Note that this is even more simple
> >>>>>> than the suggestion made by Andy above - no attempt is made here to
> >>>>>> indicate which residue has been called for each fragment ion - it is
> >>>>>> just listing the ions.
> >>>>>>
> >>>>>> Also note that while the PeptideItem is referencing the mass spectrum
> >>>>>> (which is reported in detail in the associated mzData file), the
> >>>>>> individual fragment ions are just reporting the m/z value and not
> >>>>>> attempting to make any kind of hard reference to the spectrum.
> >>>>>>
> >>>>>> As you can see, this has been developed in collaboration with Waters,
> >>>>>> with output from the ProteinLynx Global Server. (Actual values /
> >>>>>> sequence have been changed).
> >>>>>>
> >>>>>> One possible change would be to make the m/z value an attribute of the
> >>>>>> FragmentIon element, as this value will be mandatory and required to
> >>>>>> relate the fragment ion to the correct peak on the mass spectrum. The
> >>>>>> CV used for the annotation would also need to be part of the PI CV ??
> >>>>>>
> >>>>>> Note that in the existing model, there are other terms available, to
> >>>>>> allow any kind of fragment ion to be described (not just B and Y ions)
> >>>>>>
> >>>>>> In the context of analysisXML, the <FragmentIon/> elements would be
> >>>>>> children of a <SpectrumIdentificationResultItem/>
> >>>>>>
> >>>>>> best regards,
> >>>>>>
> >>>>>> Phil.
> >>>>>>
> >>>>>> <PeptideItem>
> >>>>>>   <Sequence>LFQQSQWTREVFSNSCK</Sequence>
> >>>>>>   <Start>435</Start>
> >>>>>>   <End>460</End>
> >>>>>>   <SpectrumReference>123</SpectrumReference>
> >>>>>>   <FragmentIon>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
> >>>>>>   </FragmentIon>
> >>>>>>   <FragmentIon>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
> >>>>>>   </FragmentIon>
> >>>>>>   <FragmentIon>
> >>>>>>     <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
> >>>>>>     <cvParam cvLabel="Waters" 
accession="PLGS:00024" name="product ion > >>>>>> m/z" value="394.1813"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >>>>>> intensity" value="1917.0"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion > >>>>>> > >> m/z > >> > >>>>>> error" value="-14.7098"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >>>>>> retention time error" value="-0.0013"/> > >>>>>> </FragmentIon> > >>>>>> <FragmentIon> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" > >>>>>> > >>>> value="3"/> > >>>> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion > >>>>>> m/z" value="367.1669"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion > >>>>>> intensity" value="345.0"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion > >>>>>> > >> m/z > >> > >>>>>> error" value="-18.767"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion > >>>>>> retention time error" value="0.0025"/> > >>>>>> </FragmentIon> > >>>>>> <additional> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor > >>>>>> > >> mass" > >> > >>>>>> value="1971.9194"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor > >>>>>> intensity" value="181349.0"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor > >>>>>> > >> error > >> > >>>>>> in ppm" value="0.8043"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor > >>>>>> retention time in minutes" value="57.3537"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion > >>>>>> mass RMS error" value="14.5969"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion > >>>>>> retention time RMS error" value="0.0093"/> > >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted > >>>>>> average charge state" value="2.2"/> > 
>>>>>> <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one > >>>>>> > >> match" > >> > >>>>>> value="" /> > >>>>>> </additional> > >>>>>> </PeptideItem> > >>>>>> > >>>>>> > >>>>>> -- > >>>>>> Phil Jones > >>>>>> Senior Software Engineer > >>>>>> PRIDE Project Team > >>>>>> PANDA Group, EMBL-EBI > >>>>>> Wellcome Trust Genome Campus > >>>>>> Hinxton, Cambridge, CB10 1SD > >>>>>> UK. > >>>>>> > >>>>>> Work phone: +44 1223 492662 (NEW NUMBER) > >>>>>> Skype: philip-jones > >>>>>> |
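As a rough illustration of the array layout Andy describes in the quoted proposal (a FragArrayIndex list whose positions line up with each FragArray list), here is a minimal Python sketch. The function name read_fragmentation and the dictionary layout are assumptions made for this example only; they are not part of any schema or tool discussed in the thread. The XML is the "b ion" block from the quoted example, without the value attribute on cvParam.

```python
import xml.etree.ElementTree as ET

# The "b ion" block from the quoted proposal, used here as test input.
EXAMPLE = """
<Fragmentation>
  <IonType>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
    <FragArrayIndex values="2 12 14"/>
    <FragArray Measure_ref="m1" values="560.153 859.111 945.653"/>
    <FragArray Measure_ref="m2" values="502.0 330.5 559.5"/>
  </IonType>
</Fragmentation>
"""


def read_fragmentation(xml_text):
    """Return one dict per matched ion (hypothetical helper, not a real API).

    Each dict holds the ion type name, the series index taken from
    FragArrayIndex, and one entry per FragArray keyed by its Measure_ref.
    """
    ions = []
    root = ET.fromstring(xml_text)
    for ion_type in root.findall("IonType"):
        name = ion_type.find("cvParam").get("name")
        # Space-separated lists (xsd:list style) are split on whitespace.
        indices = [int(v) for v in ion_type.find("FragArrayIndex").get("values").split()]
        measures = {
            arr.get("Measure_ref"): [float(v) for v in arr.get("values").split()]
            for arr in ion_type.findall("FragArray")
        }
        # Position i in every FragArray belongs to the ion at position i
        # in FragArrayIndex.
        for pos, index in enumerate(indices):
            ion = {"name": name, "index": index}
            for ref, values in measures.items():
                ion[ref] = values[pos]
            ions.append(ion)
    return ions
```

With the example above, the second entry would describe b12 with its m1 (m/z) and m2 (intensity) values, which is the pairing the proposal relies on.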
From: Matt C. <mat...@va...> - 2008-08-01 13:24:20
|
Hi all,

I'm (still) not sure what the use case is for all the "extra" measurements, which seem to me to be redundant with the label, but if reporting those is the decision of the implementor, I'm happy with having that capability. Some rough calculation tells me that if I were to write this format from MyriMatch with 10k spectra, 5 results each, and an average of 2 y ions and 2 b ions matched, that would be about 16 MB of fragmentation data (leaving out the "extra" measurements). That is a lot better than where we were before.

But I think we can compact it some more. IIRC, other places in the schema have elements that essentially subclass cvParam, is that right? It would compact things to make IonType such a subclass, with the intention that the accession attribute points to an ion CV term and an extra attribute corresponds to the FragArrayIndex.

The current proposal:
> <Fragmentation>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion"/>
>     <FragArrayIndex values="1 2 3"/>
>   </IonType>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
>     <FragArrayIndex values="4 5 6"/>
>   </IonType>
> </Fragmentation>

My modified proposal (with some extra compactness possible by opting to leave out the extra measurements):
> <Fragmentation>
>   <IonType cvLabel="Waters" accession="PLGS:00035" name="y ion" values="1 2 3"/>
>   <IonType cvLabel="Waters" accession="PLGS:00032" name="b ion" values="4 5 6"/>
> </Fragmentation>

This method would not interfere with the capability of having extra measurements, and it provides a roughly 30% more compact way of annotating an ion series. Even then, it's about 10 times less compact than the formatted attribute proposal, which would only work by explicitly denying the storage of extra measurements. For reference, using the same conditions as in my approximate calculations above, MyriMatch's output by the formatted attribute method would have about 1.5 MB of fragmentation info.
> fragmentEvidence="y1 y2 y3 b4 b5 b6"

-Matt

Jones, Andy wrote:
>> If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the
>> Y-H20 ions for this peptide identification) then the attribute
>> value="3" on the cvParam element should be removed - or have I
>> misunderstood how this works?
>>
>
> Correct, my mistake. The example says we have found y3-H2O y8-H2O and y10-H2O, the cvParam should not have had the value
>
> <Fragmentation>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O"/>
>     <FragArrayIndex values = "3 8 10"/>
>     <FragArray Measure_ref = "m1" values = "379.2215 457.12345 540.234"/>
>     <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
>     <!-- and so on for other measures as defined in the FragmentationTable -->
>   </IonType>
>   <IonType>
>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
>     <FragArrayIndex values = "2 12 14"/>
>     <FragArray Measure_ref = "m1" values = "560.153 859.111 945.653"/>
>     <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/>
>     <!-- and so on for other measures as defined in the FragmentationTable -->
>   </IonType>
> </Fragmentation>
>
>> Please excuse me for stating the obvious, but... there is no reason
>> why the pointers m1, m2, m3, m4 could not be more human readable, so
>> changed in this example to mz, inten, mz_error, ret_error for example.
>> (To help implementors understand the mechanism).
>>
>
> Good suggestion.
>
> Cheers
> Andy
>
>> -----Original Message-----
>> From: phi...@go... [mailto:phi...@go...] On
>> Behalf Of Phil Jones @ EBI
>> Sent: 01 August 2008 11:23
>> To: Jones, Andy; psi...@li...
>> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
>>
>> Hi Andy,
>>
>> This looks really good - both flexible and compact.
>> >> Just to clarify - in your example: >> >> <IonType> >> <cvParam cvLabel="Waters" accession="PLGS:00035" >> name="y ion -H2O" value="3"/> >> <FragArrayIndex values = "3 8 10"/> >> <FragArray Measure_ref = "m1" values = "379.2215 >> 457.1234 540.234"/> >> <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> >> <!-- and so on for other measures as defined in the >> FragmentationTable --> >> </IonType> >> >> If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the >> Y-H20 ions for this peptide identification) then the attribute >> value="3" on the cvParam element should be removed - or have I >> misunderstood how this works? >> >> Please excuse me for stating the obvious, but... there is no reason >> why the pointers m1, m2, m3, m4 could not be more human readable, so >> changed in this example to mz, inten, mz_error, ret_error for example. >> (To help implementors understand the mechanism). >> >> best regards, >> >> Phil. >> >> >> >> 2008/8/1 Jones, Andy <And...@li...>: >> >>> Hi all, >>> >>> Here's a proposal for fragmentation ions as discussed on the call that's halfway >>> >> between using cvParams for all values and using an array based encoding. I think >> it's pretty flexible and concise. 
>> >>> First up, setup a FragmentationTable for the entire list of the spectra, which says >>> >> the kinds of measures you're going to report lower down: >> >>> <SpectrumIdentificationList id="MASCOT_results"> >>> <FragmentationTable> >>> <Measures> >>> <Measure id = "m1"> >>> <cvParam cvLabel="Waters" accession="PLGS:00024" >>> >> name="product ion m/z"/> >> >>> </Measure> >>> <Measure id = "m2"> >>> <cvParam cvLabel="Waters" accession="PLGS:00025" >>> >> name="product ion intensity"/> >> >>> </Measure> >>> <Measure id = "m3"> >>> <cvParam cvLabel="Waters" accession="PLGS:00026" >>> >> name="product ion m/z error"/> >> >>> </Measure> >>> <Measure id = "m4"> >>> <cvParam cvLabel="Waters" accession="PLGS:00027" >>> >> name="product ion retention time error"/> >> >>> </Measure> >>> </Measures> >>> </FragmentationTable> >>> >>> Then for each SpectrumIdentificationItem, you reference back to these >>> >> Measures >> >>> <SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" >>> >> chargeState="1"> >> >>> <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" >>> >> isDecoy="false" /> >> >>> ... 
>>> >>> <Fragmentation> >>> <IonType> >>> <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion - >>> >> H2O" value="3"/> >> >>> <FragArrayIndex values = "3 8 10"/> >>> <FragArray Measure_ref = "m1" values = "379.2215 457.1234 >>> >> 540.234"/> >> >>> <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> >>> <!-- and so on for other measures as defined in the >>> >> FragmentationTable --> >> >>> </IonType> >>> <IonType> >>> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" >>> >> value="4"/> >> >>> <FragArrayIndex values = "2 12 14"/> >>> <FragArray Measure_ref = "m1" values = "560.153 859.111 >>> >> 945.653"/> >> >>> <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/> >>> <!-- and so on for other measures as defined in the >>> >> FragmentationTable --> >> >>> </IonType> >>> >>> </Fragmentation> >>> >>> >>> Each array contains space separated values (i.e. an xsd:list). The FragArrayIndex >>> >> tells you which ions you've found i.e. for the second IonType we have b2 b12 and >> b14 which have the m/z and intensity values in the m1 and m2 arrays. This will >> save a lot of space if there are many ions of the same type in each array and I >> think it is fairly easy to read as well. Slightly more space could be saved by >> defining the ion types in the FragmentationTable but not much really once you've >> added a reference back up to it. >> >>> Cheers >>> Andy >>> >>> >>> >>> >>> >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: psi...@li... [mailto:psidev-pi-dev- >>>> bo...@li...] On Behalf Of Matthew Chambers >>>> Sent: 18 July 2008 16:00 >>>> To: psi...@li... >>>> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently >>>> handled in PRIDE (Issue 28) >>>> >>>> I also agree that anything beyond an array is far too verbose. To answer >>>> this question, I think we need to decide the scope of the problem. What >>>> do we want fragment ion information to represent? 
I think analysis >>>> software is too diverse to use it for anything more than basic >>>> annotation, but basic annotation is important. If there are ways people >>>> want it to be usable beyond that, speak up. :) >>>> >>>> For basic annotation, all I think is needed is the fragment type, series >>>> number, charge state, and possibly any modification like a neutral loss >>>> or radical. The array can be an attribute or text node. We can use a >>>> grammar for each term, where each term represents an ion and terms are >>>> space delimited. The grammar might look like: <a|b|c|x|y|z><# between 1 >>>> and peptide_length>[<+|-><formula>][,(<+|-><charge>] >>>> We could make the charge part mandatory or if it was optional, assume a >>>> +1 charge (or possibly allow the charge to be based on the polarity of >>>> the source scan?). I assume there is a standard chemical formula format >>>> that can be represented compactly in ASCII text, but I don't know it. >>>> An example to show how compact it could be: >>>> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2" >>>> >>>> For basic annotation, the masses are not necessary I think. Expected >>>> mass can be recomputed if all the label metadata is complete and >>>> regular, and the observed mass is unimportant for annotation (IMO). >>>> >>>> -Matt >>>> >>>> >>>> David Creasy wrote: >>>> >>>>> Hi Phil, >>>>> >>>>> Just to be sure I've not misunderstood... from below, each fragment ion >>>>> takes approx 500 bytes. Lets assume a conservative average of 20 >>>>> fragment matches per spectrum and a modest search with 100k spectra. >>>>> Assuming that we just report fragment matches for the top match for each >>>>> spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb. >>>>> If we reported fragment matches for the the top 10 matches for each >>>>> spectrum, this would be 10Gb. Is this reasonable and acceptable? 
>>>>> >>>>> David >>>>> >>>>> >>>>> >>>>> Phil Jones @ EBI wrote: >>>>> >>>>> >>>>>> Hi, >>>>>> >>>>>> Regarding Issue 28 >>>>>> <http://code.google.com/p/psi-pi/issues/detail?id=28> "support >>>>>> reporting of fragment ions" >>>>>> >>>>>> As a suggestion of how this might be tackled: >>>>>> >>>>>> The latest development version of the PRIDE database includes a very >>>>>> simple mechanism >>>>>> for recording fragment ion information, illustrated below. (Please >>>>>> note - made up data.) >>>>>> >>>>>> In this example, CV terms are used to define the type of ion and >>>>>> related information >>>>>> / annotation. Note that this is even more simple that the suggestion >>>>>> made by Andy >>>>>> above - no attempt is made here to indicate which residue has been >>>>>> called for each >>>>>> fragment ion - it is just listing the ions. >>>>>> >>>>>> Also note that while the PeptideItem is referencing the mass spectrum >>>>>> >> (which is >> >>>>>> reported in detail in the associated mzData file), the individual >>>>>> fragment ions are >>>>>> just reporting the m/z value and not attempting to make any kind of >>>>>> hard reference to >>>>>> the spectrum. >>>>>> >>>>>> As you can see, this has been developed in collaboration with Waters, >>>>>> with output >>>>>> from the ProteinLynx Global Server. (Actual values / sequence have >>>>>> been changed). >>>>>> >>>>>> One possible change would be to make the m/z value an attribute of the >>>>>> FragmentIon element, as this value will be mandatory and required to >>>>>> relate the fragment ion to the correct peak on the mass spectrum. The >>>>>> CV used for the annotation would also need to be part of the PI CV ?? 
>>>>>> >>>>>> Note that in the existing model, there are other terms available, to >>>>>> allow any kind of fragment ion to be described (not just B and Y ions) >>>>>> >>>>>> In the context of analysisXML, the <FragmentIon/> elements would be >>>>>> children of a <SpectrumIdentificationResultItem/> >>>>>> >>>>>> best regards, >>>>>> >>>>>> Phil. >>>>>> >>>>>> <PeptideItem> >>>>>> <Sequence>LFQQSQWTREVFSNSCK</Sequence> >>>>>> <Start>435</Start> >>>>>> <End>460</End> >>>>>> <SpectrumReference>123</SpectrumReference> >>>>>> <FragmentIon> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" >>>>>> >>>> value="3"/> >>>> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion >>>>>> m/z" value="379.2215"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion >>>>>> intensity" value="1382.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion >>>>>> >> m/z >> >>>>>> error" value="-7.1543"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion >>>>>> retention time error" value="0.0207"/> >>>>>> </FragmentIon> >>>>>> <FragmentIon> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" >>>>>> >>>> value="4"/> >>>> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion >>>>>> m/z" value="534.2811"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion >>>>>> intensity" value="1242.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion >>>>>> >> m/z >> >>>>>> error" value="-8.2315"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion >>>>>> retention time error" value="0.0029"/> >>>>>> </FragmentIon> >>>>>> <FragmentIon> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" >>>>>> >>>> value="3"/> >>>> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion >>>>>> m/z" value="394.1813"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" 
name="product ion >>>>>> intensity" value="1917.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion >>>>>> >> m/z >> >>>>>> error" value="-14.7098"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion >>>>>> retention time error" value="-0.0013"/> >>>>>> </FragmentIon> >>>>>> <FragmentIon> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" >>>>>> >>>> value="3"/> >>>> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion >>>>>> m/z" value="367.1669"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion >>>>>> intensity" value="345.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion >>>>>> >> m/z >> >>>>>> error" value="-18.767"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion >>>>>> retention time error" value="0.0025"/> >>>>>> </FragmentIon> >>>>>> <additional> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor >>>>>> >> mass" >> >>>>>> value="1971.9194"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor >>>>>> intensity" value="181349.0"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor >>>>>> >> error >> >>>>>> in ppm" value="0.8043"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor >>>>>> retention time in minutes" value="57.3537"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion >>>>>> mass RMS error" value="14.5969"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion >>>>>> retention time RMS error" value="0.0093"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted >>>>>> average charge state" value="2.2"/> >>>>>> <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one >>>>>> >> match" >> >>>>>> value="" /> >>>>>> </additional> >>>>>> </PeptideItem> >>>>>> >>>>>> >>>>>> -- >>>>>> Phil Jones >>>>>> Senior Software Engineer 
>>>>>> PRIDE Project Team >>>>>> PANDA Group, EMBL-EBI >>>>>> Wellcome Trust Genome Campus >>>>>> Hinxton, Cambridge, CB10 1SD >>>>>> UK. >>>>>> >>>>>> Work phone: +44 1223 492662 (NEW NUMBER) >>>>>> Skype: philip-jones >>>>>> |
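To make the formatted attribute idea in Matt's message more concrete, here is a hedged Python sketch of a parser for space-delimited terms such as "b3", "y7,+2" or "b7-H2O,+2". It follows the grammar quoted from his 18 July message under two assumptions: the stray "(" in that grammar is treated as a typo, and a missing charge defaults to +1 (one of the options he mentions). The names FragmentIon and parse_fragment_ions are hypothetical and exist only for this example.

```python
import re
from typing import List, NamedTuple, Optional


class FragmentIon(NamedTuple):
    series: str                  # one of a, b, c, x, y, z
    number: int                  # position in the ion series
    modification: Optional[str]  # e.g. "-H2O", or None if absent
    charge: int                  # defaults to +1 when not given (assumption)


# <series><number>[<+|-><formula>][,<+|-><charge>]
# The formula must start with a capital letter so that it cannot be
# confused with a bare number.
TERM = re.compile(
    r"(?P<series>[abcxyz])"
    r"(?P<number>[1-9][0-9]*)"
    r"(?P<modification>[+-][A-Z][A-Za-z0-9]*)?"
    r"(?:,(?P<charge>[+-][0-9]+))?"
)


def parse_fragment_ions(text: str, peptide_length: Optional[int] = None) -> List[FragmentIon]:
    """Parse a space-delimited fragment ion attribute into FragmentIon tuples."""
    ions = []
    for term in text.split():
        match = TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"unrecognised fragment ion term: {term!r}")
        number = int(match.group("number"))
        if peptide_length is not None and number > peptide_length:
            raise ValueError(f"series number {number} exceeds peptide length {peptide_length}")
        charge = int(match.group("charge") or "+1")
        ions.append(FragmentIon(match.group("series"), number, match.group("modification"), charge))
    return ions
```

A call such as parse_fragment_ions("y1 y2 y3 b4 b5 b6") would yield six singly charged ions with no modification. The sketch makes no attempt to validate the chemical formula itself, since the thread leaves that format open.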
From: Angel P. <an...@ma...> - 2008-08-01 12:07:26
|
On Thu, Jul 31, 2008 at 8:45 AM, David Creasy <dc...@ma...> wrote:
> Hi,
>
> I've added an example N15 Mascot search to:
> http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working29July
>
> This seems to work pretty well (as expected!!)
> The interesting thing is that there are two sets of residue masses, two
> SpectrumIdentificationList (one for light, one for heavy) but just one
> ProteinDetectionList.
>
> And (possibly confusing at first glance!) unmodified peptides with
> different masses like this:
>
> <Peptide id="peptide_48_1" sequenceMass="1025.481796">
>   <peptideSequence>STNLDWYK</peptideSequence>
> </Peptide>
>
> <Peptide id="peptide_53_1" sequenceMass="1036.449188">
>   <peptideSequence>STNLDWYK</peptideSequence>
> </Peptide>

I think this is where I come in with the mod specifications for peptide results. I'll try and add this as an example to issue 35.

-angel
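The two sequenceMass values David quotes for STNLDWYK can be sanity-checked with a little arithmetic: if every nitrogen in the heavy peptide is 15N, the mass difference should be the nitrogen count times the 15N/14N mass difference. The Python sketch below is only an illustration under that assumption (full labelling, monoisotopic masses rounded to six decimals); the helper names are hypothetical and do not come from Mascot or the schema.

```python
# Nitrogen atoms per amino acid residue (backbone N plus any side-chain N).
NITROGENS_PER_RESIDUE = {
    "A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "E": 1, "Q": 2, "G": 1,
    "H": 3, "I": 1, "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1,
    "T": 1, "W": 2, "Y": 1, "V": 1,
}

# Monoisotopic mass difference between 15N and 14N in daltons (rounded).
N15_MINUS_N14 = 15.000109 - 14.003074


def count_nitrogens(sequence):
    """Count nitrogen atoms in an unmodified peptide sequence."""
    return sum(NITROGENS_PER_RESIDUE[residue] for residue in sequence)


def n15_mass_shift(sequence):
    """Expected mass increase when every nitrogen is replaced by 15N."""
    return count_nitrogens(sequence) * N15_MINUS_N14


# Values taken from the two <Peptide> elements quoted above.
light = 1025.481796
heavy = 1036.449188
observed_shift = heavy - light
expected_shift = n15_mass_shift("STNLDWYK")
```

For STNLDWYK the count is 11 nitrogens, and the expected shift agrees with the difference between the two quoted masses to within rounding, which is consistent with the light/heavy explanation in the message.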
From: Jones, A. <And...@li...> - 2008-08-01 10:26:05
|
> If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the
> Y-H20 ions for this peptide identification) then the attribute
> value="3" on the cvParam element should be removed - or have I
> misunderstood how this works?

Correct, my mistake. The example says we have found y3-H2O y8-H2O and y10-H2O, the cvParam should not have had the value

<Fragmentation>
  <IonType>
    <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O"/>
    <FragArrayIndex values = "3 8 10"/>
    <FragArray Measure_ref = "m1" values = "379.2215 457.12345 540.234"/>
    <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
    <!-- and so on for other measures as defined in the FragmentationTable -->
  </IonType>
  <IonType>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
    <FragArrayIndex values = "2 12 14"/>
    <FragArray Measure_ref = "m1" values = "560.153 859.111 945.653"/>
    <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/>
    <!-- and so on for other measures as defined in the FragmentationTable -->
  </IonType>
</Fragmentation>

> Please excuse me for stating the obvious, but... there is no reason
> why the pointers m1, m2, m3, m4 could not be more human readable, so
> changed in this example to mz, inten, mz_error, ret_error for example.
> (To help implementors understand the mechanism).

Good suggestion.

Cheers
Andy

> -----Original Message-----
> From: phi...@go... [mailto:phi...@go...] On
> Behalf Of Phil Jones @ EBI
> Sent: 01 August 2008 11:23
> To: Jones, Andy; psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragmentation Ions
>
> Hi Andy,
>
> This looks really good - both flexible and compact.
> > Just to clarify - in your example: > > <IonType> > <cvParam cvLabel="Waters" accession="PLGS:00035" > name="y ion -H2O" value="3"/> > <FragArrayIndex values = "3 8 10"/> > <FragArray Measure_ref = "m1" values = "379.2215 > 457.1234 540.234"/> > <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> > <!-- and so on for other measures as defined in the > FragmentationTable --> > </IonType> > > If this is describing three Y-H20 ions, 3, 8 and 10 (i.e. all of the > Y-H20 ions for this peptide identification) then the attribute > value="3" on the cvParam element should be removed - or have I > misunderstood how this works? > > Please excuse me for stating the obvious, but... there is no reason > why the pointers m1, m2, m3, m4 could not be more human readable, so > changed in this example to mz, inten, mz_error, ret_error for example. > (To help implementors understand the mechanism). > > best regards, > > Phil. > > > > 2008/8/1 Jones, Andy <And...@li...>: > > Hi all, > > > > Here's a proposal for fragmentation ions as discussed on the call that's halfway > between using cvParams for all values and using an array based encoding. I think > it's pretty flexible and concise. 
> > > > > > First up, setup a FragmentationTable for the entire list of the spectra, which says > the kinds of measures you're going to report lower down: > > > > > > <SpectrumIdentificationList id="MASCOT_results"> > > <FragmentationTable> > > <Measures> > > <Measure id = "m1"> > > <cvParam cvLabel="Waters" accession="PLGS:00024" > name="product ion m/z"/> > > </Measure> > > <Measure id = "m2"> > > <cvParam cvLabel="Waters" accession="PLGS:00025" > name="product ion intensity"/> > > </Measure> > > <Measure id = "m3"> > > <cvParam cvLabel="Waters" accession="PLGS:00026" > name="product ion m/z error"/> > > </Measure> > > <Measure id = "m4"> > > <cvParam cvLabel="Waters" accession="PLGS:00027" > name="product ion retention time error"/> > > </Measure> > > </Measures> > > </FragmentationTable> > > > > Then for each SpectrumIdentificationItem, you reference back to these > Measures > > > > <SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" > chargeState="1"> > > <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" > isDecoy="false" /> > > > > ... > > > > <Fragmentation> > > <IonType> > > <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion - > H2O" value="3"/> > > <FragArrayIndex values = "3 8 10"/> > > <FragArray Measure_ref = "m1" values = "379.2215 457.1234 > 540.234"/> > > <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/> > > <!-- and so on for other measures as defined in the > FragmentationTable --> > > </IonType> > > <IonType> > > <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" > value="4"/> > > <FragArrayIndex values = "2 12 14"/> > > <FragArray Measure_ref = "m1" values = "560.153 859.111 > 945.653"/> > > <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/> > > <!-- and so on for other measures as defined in the > FragmentationTable --> > > </IonType> > > > > </Fragmentation> > > > > > > Each array contains space separated values (i.e. an xsd:list). 
> > The FragArrayIndex tells you which ions you've found, i.e. for the second IonType we have b2, b12 and b14, which have the m/z and intensity values in the m1 and m2 arrays. This will save a lot of space if there are many ions of the same type in each array and I think it is fairly easy to read as well. Slightly more space could be saved by defining the ion types in the FragmentationTable but not much really once you've added a reference back up to it.
> >
> > Cheers
> > Andy
>
> --
> Phil Jones
> Senior Software Engineer
> PRIDE Project Team
> PANDA Group, EMBL-EBI
> Wellcome Trust Genome Campus
> Hinxton, Cambridge, CB10 1SD
> UK.
>
> Work phone: +44 1223 492662 (NEW NUMBER)
> Skype: philip-jones |
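The space argument in this thread (David Creasy's estimate of roughly 500 bytes per fragment ion, 20 matched ions per spectrum and 100k spectra) can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope illustration: the function names `estimated_bytes`, `cv_param` and `verbose_fragment_ion` are made up for this note, and the per-ion size is measured on one unindented `<FragmentIon>` element built from the made-up PRIDE example values, so real files will differ.

```python
def estimated_bytes(bytes_per_ion, ions_per_spectrum, spectra, matches_per_spectrum=1):
    """Back-of-envelope file size for reporting every matched fragment ion."""
    return bytes_per_ion * ions_per_spectrum * spectra * matches_per_spectrum


def cv_param(accession, name, value):
    """Serialise one cvParam in the style of the PRIDE example (hypothetical helper)."""
    return (
        f'<cvParam cvLabel="Waters" accession="{accession}" '
        f'name="{name}" value="{value}"/>'
    )


def verbose_fragment_ion():
    """One fragment ion in the cvParam-per-value encoding, without indentation."""
    params = [
        cv_param("PLGS:00032", "b ion", "3"),
        cv_param("PLGS:00024", "product ion m/z", "379.2215"),
        cv_param("PLGS:00025", "product ion intensity", "1382.0"),
        cv_param("PLGS:00026", "product ion m/z error", "-7.1543"),
        cv_param("PLGS:00027", "product ion retention time error", "0.0207"),
    ]
    return "<FragmentIon>" + "".join(params) + "</FragmentIon>"


if __name__ == "__main__":
    print(len(verbose_fragment_ion()), "bytes for one verbose fragment ion")
    print(estimated_bytes(500, 20, 100_000), "bytes, top match only")
    print(estimated_bytes(500, 20, 100_000, matches_per_spectrum=10), "bytes, top 10 matches")
```

With the figures from the thread, the top-match-only case comes to 10^9 bytes and the top-10 case to ten times that, which is the concern the array-based proposal is meant to address.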
From: Phil J. @ E. <pj...@eb...> - 2008-08-01 10:22:52
|
Hi Andy,

This looks really good - both flexible and compact.

Just to clarify - in your example:

<IonType>
  <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
  <FragArrayIndex values = "3 8 10"/>
  <FragArray Measure_ref = "m1" values = "379.2215 457.1234 540.234"/>
  <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
  <!-- and so on for other measures as defined in the FragmentationTable -->
</IonType>

If this is describing three Y-H2O ions, 3, 8 and 10 (i.e. all of the Y-H2O ions for this peptide identification), then the attribute value="3" on the cvParam element should be removed - or have I misunderstood how this works?

Please excuse me for stating the obvious, but there is no reason why the pointers m1, m2, m3, m4 could not be more human readable, for example mz, inten, mz_error, ret_error (to help implementors understand the mechanism).

best regards,

Phil.

2008/8/1 Jones, Andy <And...@li...>:
> Hi all,
>
> Here's a proposal for fragmentation ions as discussed on the call that's halfway between using cvParams for all values and using an array based encoding. I think it's pretty flexible and concise.
--
Phil Jones
Senior Software Engineer
PRIDE Project Team
PANDA Group, EMBL-EBI
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD
UK.

Work phone: +44 1223 492662 (NEW NUMBER)
Skype: philip-jones |
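To make the array-based encoding discussed in this exchange concrete, here is a minimal sketch of how a reader might unpack one `<IonType>` block into individual ions. It is an illustration under stated assumptions, not part of the proposed schema: the function name `decode_ion_types` is invented for this note, the Measure ids use the human-readable names suggested above (`mz`, `inten`), and the `value` attribute on the cvParam is left out, following the question raised above about whether it belongs there.

```python
import xml.etree.ElementTree as ET

# Example data adapted from the proposal; ids "mz" and "inten" are assumed names.
FRAGMENTATION_XML = """
<Fragmentation>
  <IonType>
    <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion"/>
    <FragArrayIndex values="2 12 14"/>
    <FragArray Measure_ref="mz" values="560.153 859.111 945.653"/>
    <FragArray Measure_ref="inten" values="502.0 330.5 559.5"/>
  </IonType>
</Fragmentation>
"""


def decode_ion_types(xml_text):
    """Return one dict per ion, pairing each series index with its measure values."""
    ions = []
    for ion_type in ET.fromstring(xml_text).iter("IonType"):
        name = ion_type.find("cvParam").get("name")
        index_text = ion_type.find("FragArrayIndex").get("values")
        indices = [int(token) for token in index_text.split()]

        # Each FragArray is a space-separated list aligned with FragArrayIndex.
        measures = {}
        for array in ion_type.findall("FragArray"):
            values = [float(token) for token in array.get("values").split()]
            if len(values) != len(indices):
                raise ValueError("FragArray length differs from FragArrayIndex")
            measures[array.get("Measure_ref")] = values

        for position, index in enumerate(indices):
            ion = {"type": name, "index": index}
            for measure_ref, values in measures.items():
                ion[measure_ref] = values[position]
            ions.append(ion)
    return ions


if __name__ == "__main__":
    for ion in decode_ion_types(FRAGMENTATION_XML):
        print(ion)
```

The length check reflects the one constraint the encoding relies on: every FragArray must have exactly as many entries as the FragArrayIndex it sits next to.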
From: Jones, A. <And...@li...> - 2008-08-01 10:03:49
|
Hi all,

Here's a proposal for fragmentation ions as discussed on the call that's halfway between using cvParams for all values and using an array based encoding. I think it's pretty flexible and concise.

First up, set up a FragmentationTable for the entire list of the spectra, which says the kinds of measures you're going to report lower down:

<SpectrumIdentificationList id="MASCOT_results">
  <FragmentationTable>
    <Measures>
      <Measure id = "m1">
        <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z"/>
      </Measure>
      <Measure id = "m2">
        <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity"/>
      </Measure>
      <Measure id = "m3">
        <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error"/>
      </Measure>
      <Measure id = "m4">
        <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error"/>
      </Measure>
    </Measures>
  </FragmentationTable>

Then for each SpectrumIdentificationItem, you reference back to these Measures:

  <SpectrumIdentificationItem id="SEQ_spec1_pep1" Peptide_ref="prot1_pep1" chargeState="1">
    <PeptideEvidence id="PE1_SEQ_spec1_pep1" start="67" pre="-" end="79" isDecoy="false" />

    ...

    <Fragmentation>
      <IonType>
        <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
        <FragArrayIndex values = "3 8 10"/>
        <FragArray Measure_ref = "m1" values = "379.2215 457.1234 540.234"/>
        <FragArray Measure_ref = "m2" values = "1382.0 2055.5 340.0"/>
        <!-- and so on for other measures as defined in the FragmentationTable -->
      </IonType>
      <IonType>
        <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
        <FragArrayIndex values = "2 12 14"/>
        <FragArray Measure_ref = "m1" values = "560.153 859.111 945.653"/>
        <FragArray Measure_ref = "m2" values = "502.0 330.5 559.5"/>
        <!-- and so on for other measures as defined in the FragmentationTable -->
      </IonType>
    </Fragmentation>

Each array contains space separated values (i.e. an xsd:list). The FragArrayIndex tells you which ions you've found, i.e. for the second IonType we have b2, b12 and b14, which have the m/z and intensity values in the m1 and m2 arrays. This will save a lot of space if there are many ions of the same type in each array and I think it is fairly easy to read as well. Slightly more space could be saved by defining the ion types in the FragmentationTable but not much really once you've added a reference back up to it.

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Matthew Chambers
> Sent: 18 July 2008 16:00
> To: psi...@li...
> Subject: Re: [Psidev-pi-dev] Fragment Ions in analysisXML - how it is currently handled in PRIDE (Issue 28)
>
> I also agree that anything beyond an array is far too verbose. To answer this question, I think we need to decide the scope of the problem. What do we want fragment ion information to represent? I think analysis software is too diverse to use it for anything more than basic annotation, but basic annotation is important. If there are ways people want it to be usable beyond that, speak up. :)
>
> For basic annotation, all I think is needed is the fragment type, series number, charge state, and possibly any modification like a neutral loss or radical. The array can be an attribute or text node. We can use a grammar for each term, where each term represents an ion and terms are space delimited. The grammar might look like:
> <a|b|c|x|y|z><# between 1 and peptide_length>[<+|-><formula>][,(<+|-><charge>]
> We could make the charge part mandatory or if it was optional, assume a +1 charge (or possibly allow the charge to be based on the polarity of the source scan?). I assume there is a standard chemical formula format that can be represented compactly in ASCII text, but I don't know it.
> An example to show how compact it could be:
> fragmentIons="b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
>
> For basic annotation, the masses are not necessary I think. Expected mass can be recomputed if all the label metadata is complete and regular, and the observed mass is unimportant for annotation (IMO).
>
> -Matt
>
>
> David Creasy wrote:
> > Hi Phil,
> >
> > Just to be sure I've not misunderstood... from below, each fragment ion takes approx 500 bytes. Let's assume a conservative average of 20 fragment matches per spectrum and a modest search with 100k spectra. Assuming that we just report fragment matches for the top match for each spectrum, this would result in a file that is 500 x 20 x 100,000 = 1Gb. If we reported fragment matches for the top 10 matches for each spectrum, this would be 10Gb. Is this reasonable and acceptable?
> >
> > David
> >
> > Phil Jones @ EBI wrote:
> >
> >> Hi,
> >>
> >> Regarding Issue 28 <http://code.google.com/p/psi-pi/issues/detail?id=28> "support reporting of fragment ions"
> >>
> >> As a suggestion of how this might be tackled:
> >>
> >> The latest development version of the PRIDE database includes a very simple mechanism for recording fragment ion information, illustrated below. (Please note - made up data.)
> >>
> >> In this example, CV terms are used to define the type of ion and related information / annotation. Note that this is even more simple than the suggestion made by Andy above - no attempt is made here to indicate which residue has been called for each fragment ion - it is just listing the ions.
> >>
> >> Also note that while the PeptideItem is referencing the mass spectrum (which is reported in detail in the associated mzData file), the individual fragment ions are just reporting the m/z value and not attempting to make any kind of hard reference to the spectrum.
> >>
> >> As you can see, this has been developed in collaboration with Waters, with output from the ProteinLynx Global Server. (Actual values / sequence have been changed).
> >>
> >> One possible change would be to make the m/z value an attribute of the FragmentIon element, as this value will be mandatory and required to relate the fragment ion to the correct peak on the mass spectrum. The CV used for the annotation would also need to be part of the PI CV ??
> >>
> >> Note that in the existing model, there are other terms available, to allow any kind of fragment ion to be described (not just B and Y ions)
> >>
> >> In the context of analysisXML, the <FragmentIon/> elements would be children of a <SpectrumIdentificationResultItem/>
> >>
> >> best regards,
> >>
> >> Phil.
> >>
> >> <PeptideItem>
> >>   <Sequence>LFQQSQWTREVFSNSCK</Sequence>
> >>   <Start>435</Start>
> >>   <End>460</End>
> >>   <SpectrumReference>123</SpectrumReference>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="379.2215"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1382.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-7.1543"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0207"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00032" name="b ion" value="4"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="534.2811"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1242.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-8.2315"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0029"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00031" name="y ion" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="394.1813"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="1917.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-14.7098"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="-0.0013"/>
> >>   </FragmentIon>
> >>   <FragmentIon>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00035" name="y ion -H2O" value="3"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00024" name="product ion m/z" value="367.1669"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00025" name="product ion intensity" value="345.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00026" name="product ion m/z error" value="-18.767"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00027" name="product ion retention time error" value="0.0025"/>
> >>   </FragmentIon>
> >>   <additional>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00014" name="precursor mass" value="1971.9194"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00015" name="precursor intensity" value="181349.0"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00016" name="precursor error in ppm" value="0.8043"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00017" name="precursor retention time in minutes" value="57.3537"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00019" name="product ion mass RMS error" value="14.5969"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00020" name="product ion retention time RMS error" value="0.0093"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00021" name="weighted average charge state" value="2.2"/>
> >>     <cvParam cvLabel="Waters" accession="PLGS:00039" name="pass one match" value="" />
> >>   </additional>
> >> </PeptideItem>
> >>
> >> --
> >> Phil Jones
> >> Senior Software Engineer
> >> PRIDE Project Team
> >> PANDA Group, EMBL-EBI
> >> Wellcome Trust Genome Campus
> >> Hinxton, Cambridge, CB10 1SD
> >> UK.
> >>
> >> Work phone: +44 1223 492662 (NEW NUMBER)
> >> Skype: philip-jones |
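The space-delimited term grammar that Matt proposes in the message quoted above can be read with a single regular expression. The following is only a rough sketch of that idea: the function name `parse_fragment_ions` and the tuple layout are invented here, the stray "(" in the quoted grammar is treated as a typo (so the charge part is read as `,<+|-><charge>`), the formula is assumed to start with a capital letter, and a missing charge defaults to +1 as one of the options floated in the message.

```python
import re

# <series><number>[<+|-><formula>][,<+|-><charge>]
ION_TERM = re.compile(
    r"^(?P<series>[abcxyz])"              # fragment series letter
    r"(?P<number>\d+)"                    # position in the series
    r"(?P<delta>[+-][A-Z][A-Za-z0-9]*)?"  # optional gain/loss, e.g. -H2O
    r"(?:,(?P<charge>[+-]\d+))?$"         # optional charge, e.g. ,+2
)


def parse_fragment_ions(text, peptide_length=None, default_charge=1):
    """Parse a fragmentIons string into (series, number, delta, charge) tuples."""
    ions = []
    for term in text.split():
        match = ION_TERM.match(term)
        if match is None:
            raise ValueError(f"not a valid ion term: {term!r}")
        number = int(match.group("number"))
        if number < 1 or (peptide_length is not None and number > peptide_length):
            raise ValueError(f"series number out of range in {term!r}")
        charge_text = match.group("charge")
        charge = int(charge_text) if charge_text else default_charge
        ions.append((match.group("series"), number, match.group("delta"), charge))
    return ions


if __name__ == "__main__":
    example = "b3 y7,+2 b4 y5 y4 b7-H2O y3 y2 b7-H2O,+2 y3 y2"
    for ion in parse_fragment_ions(example, peptide_length=17):
        print(ion)
```

The peptide length of 17 used in the usage example is simply the length of the made-up sequence LFQQSQWTREVFSNSCK from the PRIDE snippet; any real reader would take it from the identified peptide.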
From: David C. <dc...@ma...> - 2008-07-31 14:46:06
|
Martin Eisenacher wrote:
>>> But I might be wrong and we definitely have to wait for
>>> a top-down instance doc.
>> Sorry for the delay ;) I've put one here:
>> http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working31July
>> It's not so bad really. In the case of signal peptides, or leading
>> methionine (as in this example), the protein that was analysed may be
>> different from the sequence in the database, and there must be a way of
>> representing this.
> So you think it's okay like it is and no doubling. Or can I derive an issue from
> the "not so bad really" phrase ;-)

I think it has to be the way it is with potential for the sequence to be in
the document twice. Otherwise, how do we cope with the case of signal
peptides and/or a leading methionine?

>>>> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
>>>> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
>>> Yes, if <PeptideEvidence> stays optional.
>> What about denovo where there is no database...
> That is an argument to have PeptideEvidence optional, isn't it?
> But DBSequence_ref as attribute of it should be mandatory.

Doh, sorry, yes you are totally correct. It should be mandatory.

> (and it is missing from the instance docs).

I believe it's in all the Mascot ones?

>>>> It is a database search parameter:
>>>> <AdditionalSearchParams>
>>>> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
>>> Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
>>> http://code.google.com/p/psi-pi/issues/detail?id=37
>> I'm not sure I understand whether this is OK or not now? (And why use
>> Pride CV?)
> I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in another,
> so it is not well-defined for the masses in elements or attributes.
> We need a global attribute :-) or element. Or it can be done later in semantic validation :-( .

I think it's actually _required_ to be like this. For example, at least
one search engine allows you to specify mono for masses below x and
average for masses above x. So, in this case, the output should be
similar to the N15 example that I've supplied, with two separate mass
tables. Maybe you could look at the Mascot_N15_example.xml and see if
you think that this is OK.

Talk soon,
David

>
> Bye
> Martin
>
>
>>>>> -----Original Message-----
>>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
>>>>> Sent: 30 July 2008 13:05
>>>>> To: 'Pierre-Alain Binz'
>>>>> Cc: psi...@li...
>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>
>>>>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
>>>>>
>>>>>> 2nd July, 2008:
>>>>>> a couple of questions, just to make sure:
>>>>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection information?
>>>>> I hope not, by referencing the same identifier.
>>>>>
>>>>>> as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
>>>>>> (and not to a DBSequence), identification is obligatory a Peptide?
>>>>> At the moment I think it's possible to directly reference a DBSeq. At the time the
>>>>> foreign key definitions are implemented we can forbid that.
>>>>> But we should have in mind, that a peptide is a sequence plus modifications, so if top-down
>>>>> identifies only a sequence, we should allow that and if top-down identifies with mods,
>>>>> we should forbid that.
>>>>> It would be quite helpful to have a top-down instance doc. To check
>>>>> whether our thoughts are really deep enough...
>>>>>
>>>>>> 2) and what about spectral library searches, do we have to have Peptide
>>>>>> elements with possibly undefined explicit sequences to refer to
>>>>>> from the SpectrumIdentificationResult (because non peptidic, or because not identified
>>>>>> but good spectrum)
>>>>> At the moment the sequence element can be empty or even left out.
>>>>> User or CV params are allowed.
>>>>> How do they report results in spectral lib search if they identify non-peptidic or unidentified?
>>>>> We need CV terms for that...
>>>>>
>>>>>> 3) in the Peptide element, the Modifications are defined in a much more
>>>>>> detailed manner than in ModificationParams (PSI-MOD is there for
>>>>>> instance). Does this simply mean that the ModificationParams codes
>>>>>> the search engine settings and the Peptide includes the formal PSI
>>>>>> definition of the Mod? And the only reference is the ModName value?
>>>>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
>>>>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or
>>>>> they can define their own.
>>>>>
>>>>>> 4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge)
>>>>>> are not specified whether monoisotopic or averaged.
>>>>>> Do we assume that averaged does not exist anymore?
>>>>> No, we decided to have only one type of masses in the whole analysisXML.
>>>>> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>>>>>
>>>>>> 5) is sequenceMass the mass value with/without the mods? If with, the
>>>>>> name might be misleading (peptideMass would be more appropriate)
>>>>> It is indeed the mass of the sequence without mods.
>>>>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
>>>>>
>>>>>> 6) in case the DBSequence is nucleotide, is there a tag for saying
>>>>>> this? (NB: MS on nucleotide molecules can be performed and analysed,
>>>>>> not only MS on AA sequences that are interpreting nucleotide sequences).
>>>>>> Or do we neglect MS experiments done on nucleotide molecules (and by
>>>>>> the way on glycans...) and only represent the DBSequences as AA
>>>>>> sequences (frame translations)? (and what about glycans?)
>>>>>> Probably can be solved if one can replace SequenceCollection by
>>>>>> something else if needed (SmallMoleculeCollection, GlycanCollection,
>>>>>> MoleculeCollection)... but the validator might not like this.
>>>>> Mh, these can be extensions, I think they are not possible at the moment.
>>>>> But a tag for the type can indeed be useful, it could be a CV param.
>>>>> I will create an issue for that.
>>>>>
>>>>>> 7) in case that DBSequence is nucleotide, do we represent the
>>>>>> Peptide as AA sequence in case of MS done on proteins?
>>>>> I hope the following answers this:
>>>>>
>>>>> <DBSequence> is the nucleotide seq from the nucleotide DB,
>>>>> <Peptide> is the identified amino acid sequence plus mods (without any translation frame or something).
>>>>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute.
>>>>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
>>>>> If a protein detection is performed, there are <PeptideHypothesis> elements referencing
>>>>> PeptideEvidence elements from SpectrumIdentificationItem sections.
>>>>>
>>>>> Bye
>>>>> Martin
>>>>>
>>>>> David Creasy wrote:
>>>>> Thanks Andy,
>>>>>
>>>>> I've added an updated example document to SVN:
>>>>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>>>>>
>>>>> Problem is that we have now removed the main point of these recent changes which was to add the decoy flag...
>>>>> I think that we need to add isDecoy to SpectrumIdentificationItem.
>>>>>
>>>>> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
>>>>> Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?
>>>>> I think we need to discuss this again at the next telecon.
>>>>>
>>>>> David
>>>>>
>>>>> Jones, Andy wrote:
>>>>> Hi all,
>>>>>
>>>>> I've updated the schema in SVN with the following main changes:
>>>>>
>>>>> PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
>>>>> Added DBSequence that should be used instead of Sequence (following some of the discussion below)
>>>>> Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
>>>>> In fact, I'm not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
>>>>> In FuGE on cvParam, the value attribute is no longer mandatory
>>>>>
>>>>> I've added a simple example that validates under examples\schema_usecase_examples\working27June
>>>>>
>>>>> Feel free to mail me any changes to make on Monday,
>>>>> Cheers
>>>>> Andy
>>>>>
>>>>> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Jones, Andy
>>>>> Sent: 27 June 2008 16:24
>>>>> To: Angel Pizarro
>>>>> Cc: psi...@li...
>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>
>>>>> I think Angel's response below might not have made it round the list yet.
>>>>>
>>>>> I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic
>>>>> information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = "decoy_string"
>>>>> value = "Rev". This would be a more compact representation and we would not have to add what is quite a specific
>>>>> attribute type (isDecoy) to Sequence.
>>>>>
>>>>> From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
>>>>> Sent: 27 June 2008 15:59
>>>>> To: Jones, Andy
>>>>> Cc: psi...@li...
>>>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>>>
>>>>> my 2¢ :
>>>>> You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that
>>>>> this would be a subclass of the conceptual molecule element?
>>>>>
>>>>> Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort
>>>>> into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as
>>>>> part of our schema?
>>>>> On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to
>>>>> encode the same accession for two different sequences, effectively making the primary key of the sequence object
>>>>> (accession, isDecoy)
>>>>>
>>>>> Do we want to go there?
>>>>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
>>>>> So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)?
>>>>>
>>>>> From: Jones, Andy
>>>>> Sent: 27 June 2008 14:54
>>>>> To: 'David Creasy'
>>>>> Subject: RE: [Psidev-pi-dev] Representing Sequences
>>>>>
>>>>> id and name are standard for all elements that inherit from FuGE identifiable - this is perhaps a separate discussion as
>>>>> to whether the optional name attribute should be there.
>>>>>
>>>>> I agree that length may be useful - is this just an integer value with no unit?
>>>>> Yes, I think so.
>>>>> I'm less sure about pI and mass since mass at least can be calculated very simply
>>>>> Only if you have the sequence... (we have residue masses in the file).
>>>>>
>>>>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
>>>>> Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).
>>>>>
>>>>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
>>>>> Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene
>>>>> ID, and lots of wonderful other things?
>>>>>
>>>>> - unless someone can convince me otherwise?
>>>>> Cheers
>>>>> Andy
>>>>>
>>>>> From: David Creasy [mailto:dc...@ma...]
>>>>> Sent: 27 June 2008 14:51
>>>>> To: Jones, Andy
>>>>> Cc: psi...@li...
>>>>> Subject: Re: [Psidev-pi-dev] Representing Sequences
>>>>>
>>>>> Hi Andy,
>>>>>
>>>>> length may be useful, because some people won't want to output the actual sequence for space reasons. The other things
>>>>> we wanted to add before were pI and mass.
>>>>> Why do we want name? Is this for, say, a description line?
>>>>> (Also, identifier -> id?)
>>>>>
>>>>> David
>>>>>
>>>>> Jones, Andy wrote:
>>>>> Hi all,
>>>>>
>>>>> It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a
>>>>> Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't
>>>>> really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm
>>>>> wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the
>>>>> relevant attributes.
>>>>> At the moment, Sequence can have all of the following:
>>>>>
>>>>> <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true"
>>>>>     SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">
>>>>>
>>>>> Several of these attributes were created to represent concepts that probably will never be required or implemented in
>>>>> AnalysisXML. How about the following:
>>>>>
>>>>> <DBSequence identifier = "" name = "" isDecoy = "true">
>>>>>   <seq>MCTMG...</seq>
>>>>>   <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
>>>>> </DBSequence>
>>>>>
>>>>> Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes
>>>>> WRT to PeptideEvidence shortly,
>>>>> Cheers
>>>>> Andy
>>>>>
>>>>> --
>>>>> Angel Pizarro
>>>>> Director, ITMAT Bioinformatics Facility
>>>>> 806 Biological Research Building
>>>>> 421 Curie Blvd.
>>>>> Philadelphia, PA 19104-6160
>>>>> 215-573-3736
>>>>>
>>>>> _______________________________________________
>>>>> Psidev-pi-dev mailing list
>>>>> Psi...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
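The mixed case David describes above (monoisotopic below some cutoff x, average above it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two tables hold rounded, approximate residue masses for three residues, and `peptide_mass`, `reported_mass` and the `threshold` parameter are hypothetical names that do not come from analysisXML or from any search engine.

```python
# Approximate residue masses in Da for three residues only (illustrative values).
MONO = {"G": 57.02146, "A": 71.03711, "S": 87.03203}
AVG = {"G": 57.0513, "A": 71.0779, "S": 87.0773}

# Approximate mass of one water, added once per peptide for the free termini.
WATER_MONO = 18.01056
WATER_AVG = 18.01528


def peptide_mass(seq, table, water):
    """Sum the residue masses from one mass table and add one water."""
    return sum(table[residue] for residue in seq) + water


def reported_mass(seq, threshold):
    """Hypothetical rule: report monoisotopic mass below `threshold`, average otherwise.

    The comparison uses the monoisotopic mass; a real engine may decide differently.
    """
    mono = peptide_mass(seq, MONO, WATER_MONO)
    if mono < threshold:
        return "monoisotopic", mono
    return "average", peptide_mass(seq, AVG, WATER_AVG)


if __name__ == "__main__":
    print(reported_mass("GAS", threshold=1000.0))
    print(reported_mass("GAS", threshold=100.0))
```

Because both tables can be in use within a single search, a document describing such a search would need to carry both of them, which is the point of the two separate mass tables in the N15 example.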
From: Martin E. <mar...@ru...> - 2008-07-31 14:25:46
|
> >But I might be wrong and we definitely have to wait for
> > a top-down instance doc.
> Sorry for the delay ;) I've put one here:
> http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working31July
> It's not so bad really. In the case of signal peptides, or leading
> methionine (as in this example), the protein that was analysed may be
> different from the sequence in the database, and there must be a way of
> representing this.

So you think it's okay like it is and no doubling. Or can I derive an issue from
the "not so bad really" phrase ;-)

> >> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
> >> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
> > Yes, if <PeptideEvidence> stays optional.
> What about denovo where there is no database...

That is an argument to have PeptideEvidence optional, isn't it?
But DBSequence_ref as attribute of it should be mandatory.

> >> It is a database search parameter:
> >> <AdditionalSearchParams>
> >> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>
> > Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
> > http://code.google.com/p/psi-pi/issues/detail?id=37
>
> I'm not sure I understand whether this is OK or not now? (And why use
> Pride CV?)

I think the current schema is not okay, because it allows "average" in one SpecIdent and "mono" in another,
so it is not well-defined for the masses in elements or attributes.
We need a global attribute :-) or element. Or it can be done later in semantic validation :-( .

Bye
Martin
> >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev > > > > > > ------------------------------------------------------------------------- > > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > > Build the coolest Linux based applications with Moblin SDK & win great prizes > > Grand prize is a trip for two to an Open Source event anywhere in the world > > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > > _______________________________________________ > > Psidev-pi-dev mailing list > > Psi...@li... > > https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev > > -- > David Creasy > Matrix Science > 64 Baker Street > London W1U 7GB, UK > Tel: +44 (0)20 7486 1050 > Fax: +44 (0)20 7224 1344 > > dc...@ma... > http://www.matrixscience.com > > Matrix Science Ltd. is registered in England and Wales > Company number 3533898 |
From: David C. <dc...@ma...> - 2008-07-31 13:50:22
|
Hi,

Martin Eisenacher wrote:
> Hi Andy, hi all,
>
>> As I see it SpectrumIdentificationItem is intended only for identifying Peptides. I didn't fully understand
> Yes, I agree; but I understood Pierre-Alains question as a hint, that top-down
> identifies protein sequences, so we would have to double information, referencing a protein sequence
> as <Peptide> from <SpectrumIdentificationItem> and then the same sequence as <DBSequence> from
> <ProteinDetectionResult>. But I might be wrong and we definitely have to wait for
> a top-down instance doc.

Sorry for the delay ;) I've put one here:
http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working31July

It's not so bad really. In the case of signal peptides, or leading methionine (as in this example), the protein that was analysed may be different from the sequence in the database, and there must be a way of representing this.

>
>
>> Looking at it again, the model of SpectrumIdentificationItem is a little hard to understand and we could
>> probably improve it. This is because SpectrumIdentificationItem has both Peptide_ref (i.e. a reference to a
>> Peptide sequence and its mods) plus PeptideEvidence which is a reference to the part of the ProteinSequence
>> this Peptide was derived from. The PeptideEvidence lines could be shifted up to <Peptide> and renamed e.g.
>> SourceProtein - this would save some space and would appear to be a logically more sensible model...
> You mean shifting <PeptideEvidence> under <Peptide> in the SequenceCollection? But missedcleavages
> is only well-defined in relation to a search (using an enzyme)!
>
>
>> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
>> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?
> Yes, if <PeptideEvidence> stays optional. What about denovo where there is no database...
> > >>>> 4) all mass values (sequenceMass, calculatedMassToCharge, >>> experimentalMassToCharge, >>>> are not specified whether monoisotopic or averaged. >>>> Do we assume that averaged does not exist anymore? >>> No, we decided to have only one type of masses in the whole analysisXML. >>> But I cannot find a note for that or a schema attribute... I will add an issue for that. >> It is a database search parameter: >> <AdditionalSearchParams> >> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/> > > Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting. > http://code.google.com/p/psi-pi/issues/detail?id=37 I'm not sure I understand whether this is OK or not now? (And why use Pride CV?) David > > > bye > Martin > > >>> -----Original Message----- >>> From: psi...@li... [mailto:psidev-pi-dev- >>> bo...@li...] On Behalf Of Martin Eisenacher >>> Sent: 30 July 2008 13:05 >>> To: 'Pierre-Alain Binz' >>> Cc: psi...@li... >>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences >>> >>> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try: >>> >>>> 2nd July, 2008: >>>> a couple of questions, just to make sure: >>>> 1) in case of top-down approach, do we have to duplicate sequenceCollection >>> information? >>> I hope not, by referencing the same identifier. >>> >>>> as SpectrumIdentificationResult contains a PeptideEvidence refering to a Peptide >>> element >>>> (and not to a DBSequence), identification is obligatory a Peptide? >>> At the moment I think it's possible to directly reference a DBSeq. At the time the >>> foreign key definitions are implemented we can forbid that. >>> But we should have in mind, that a peptide is a sequence plus modifications, so if >>> top-down >>> identifies only a sequence, we should allow that and if top-down identifies with >>> mods, >>> we should forbid that. >>> It would be quite helpful to have a top-down instance doc. 
To check >>> whether our thoughts are really deep enough... >>> >>>> 2) and what about spectral library searches, do we have to have Peptide >>>> elements with possibly undefined explicit sequences to refer to >>> >from the SpectrumIdentificationResult (because non peptidic, or because not >>> identified >>>> but good spectrum) >>> At the moment the sequence element can be empty or even left out. >>> User or CV params are allowed. >>> How do they report results in spectral lib search if they identify non-peptidic or >>> unidentified? >>> We need CV terms for that... >>> >>>> 3) in the Peptide element, the Modifications are defined in a much more >>>> detailed manner than in ModificationParams (PSI-MOD is there for >>>> instance). Does this simply mean that The ModificationParams codes >>>> the search engine settings and the Peptide includes the formal PSI >>>> definition of the Mod? And the only reference is the ModName value? >>> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms >>> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV >>> or >>> they can define their own. >>> >>>> 4) all mass values (sequenceMass, calculatedMassToCharge, >>> experimentalMassToCharge, >>>> are not specified whether monoisotopic or averaged. >>>> Do we assume that averaged does not exist anymore? >>> No, we decided to have only one type of masses in the whole analysisXML. >>> But I cannot find a note for that or a schema attribute... I will add an issue for that. >>> >>> >>>> 5) is sequenceMass the mass value with/without the mods? If with, the >>>> name might be missleading (peptideMass would be more appropriate) >>> It is indeed the mass of the sequence without mods. >>> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation >>> >>>> 6) in case the DBSequence is nucleotide, is there a tag for saying >>>> this? 
(NB: MS on nucleotide molecules can be performed and analysed, >>>> not only MS on AA sequences that are interpreting nucleotide sequences). >>>> Or do we neglect MS experiments done on nucleotide molecules (and by >>>> the way on glycans...) and only represent the DBSequences as AA >>>> sequences (frame translations)? (and what about glycans?) >>>> Probaly can be solved if one can replace SequenceCollection by >>>> something else if needed (SmallMoleculeCollection, GlycanCollection, >>>> MoleculeCollection)... but the validator might not like this. >>> Mh, these can be extensions, I think they are not possible at the moment. >>> But a tag for the type can indeed be useful, it could be a CV param. >>> I will create an issue for that. >>> >>>> 7) in case that DBSequence is nucleotide, do we represent the >>>> Peptide as AA sequence in case of MS done on proteins? >>> I hope the following answers this: >>> >>> <DBSequence> is the nucleotide seq from the nucleotide DB, >>> <Peptide> is the identified amino acid sequence plus mods (without any translation >>> frame or something). >>> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a >>> TranslationTable_Ref attribute. >>> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB >>> case.) >>> If a protein detection is performed, there are <PeptideHypothesis> elements >>> referencing >>> PeptideEvidence elements from SpectrumIdentificationItem sections. >>> >>> >>> >>> Bye >>> Martin >>> >>> >>> >>> >>> David Creasy wrote: >>> Thanks Andy, >>> >>> I've added an updated example document to SVN: >>> http://code.google.com/p/psi- >>> pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F00 >>> 1350.xml >>> >>> Problem is that we have now removed the main point of these recent changes >>> which was to add the decoy flag... I think >>> that we need to add isDecoy to SpectrumIdentificationItem. 
>>> >>> And yes, I suspect that we should go back to using the >>> ConceptualMoleculeCollection >>> Um, and since we've not actually ended up adding anything to DBSequence... we >>> haven't actually achieved anything? >>> I think we need to discuss this again at the next telecon. >>> >>> David >>> >>> Jones, Andy wrote: >>> Hi all, >>> >>> I’ve updated the schema in SVN with the following main changes: >>> >>> PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the >>> call (simple mappings to proteins are done >>> at this level) >>> Added DBSequence that should be used instead of Sequence (following some of >>> the discussion below) >>> Created a new collection class SequenceCollection (rather than >>> ConceptualMoleculeCollection) so that only references can >>> be given to DBSequence and Peptide >>> In fact, I’m not sure if this is sensible since it prevents other types of >>> ConceptualMolecule being added later... to >>> discuss >>> In FuGE on cvParam, the value attribute is no longer mandatory >>> >>> I’ve added a simple example that validates under >>> examples\schema_usecase_examples\working27June >>> >>> Feel free to mail me any changes to make on Monday, >>> Cheers >>> Andy >>> >>> >>> >>> From: psi...@li... [mailto:psidev-pi-dev- >>> bo...@li...] On Behalf Of >>> Jones, Andy >>> Sent: 27 June 2008 16:24 >>> To: Angel Pizarro >>> Cc: psi...@li... >>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences >>> >>> I think Angel’s response below might not have made it round the list yet. >>> >>> I tend to agree that isDecoy is redundant information and perhaps this is not the >>> best place to encode semantic >>> information. An alternative would be to have a parameter, say on >>> SpectrumIdentification for cvParam = “decoy_string” >>> value = “Rev”. This would be a more compact representation and we would not >>> have to add what is quite a specific >>> attribute type (isDecoy) to Sequence. >>> >>> >>> >>> From: an...@it... 
[mailto:an...@it...] On Behalf Of Angel >>> Pizarro >>> Sent: 27 June 2008 15:59 >>> To: Jones, Andy >>> Cc: psi...@li... >>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences >>> >>> my 2¢ : >>> You need to be able to extend this to all molecule types, or am I missing the point >>> of this thread, and you mean that >>> this would be a suclass of the conceptual molecule element? >>> >>> Second, and this is is tangentially related, but are decoy sequences really a >>> problem we should be putting our effort >>> into? Is it in our domain to encode semantic information about a sequence, and >>> possibly relating reported sequences as >>> part of our schema? >>> On a personal level I could care less if "isDecoy" is an attribute or not, but the >>> temptation then would be for folks to >>> encode the same accession for two different sequences, effectively making the >>> primary key of the sequence object >>> (accession, isDecoy) >>> >>> >>> Do we want to go there? >>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> >>> wrote: >>> So how about include length as an attribute and then let all other things go in the >>> CV (pI, mass, etc.)? >>> >>> >>> >>> From: Jones, Andy >>> Sent: 27 June 2008 14:54 >>> To: 'David Creasy' >>> Subject: RE: [Psidev-pi-dev] Representing Sequences >>> >>> id and name are standard for all elements that inherit from FuGE identifiable – this >>> is perhaps a separate discussion as >>> to whether the optional name attribute should be there. >>> >>> I agree that length may be useful – is this just an integer value with no unit? >>> Yes, I think so. >>> I'm less sure about pI and mass since mass at least can be calculated very simply >>> Only if you have the sequence... (we have residue masses in the file). >>> >>> >>> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless >>> Scandalous! (I happen to agree, but now some people will never speak to either of >>> us ever again). 
>>> >>> The main problem with mass and pI is that these are 'irrelevant' if the sequence is >>> nuleic acid rather than residues. >>> Why not just allow CV there? We can share the same CV as the PEFF format, >>> which includes, taxonomy, sequence type, gene >>> ID, and lots of wonderful other things? >>> >>> >>> – unless someone can convince me otherwise? >>> Cheers >>> Andy >>> >>> >>> From: David Creasy [mailto:dc...@ma...] >>> Sent: 27 June 2008 14:51 >>> To: Jones, Andy >>> Cc: psi...@li... >>> Subject: Re: [Psidev-pi-dev] Representing Sequences >>> >>> Hi Andy, >>> >>> length may be useful, because some people won't want to output the actual >>> sequence for space reasons. The other things >>> we wanted to add before were pI and mass. >>> Why do we want name? Is this for, say, a description line? >>> (Also, identifier -> id?) >>> >>> David >>> >>> Jones, Andy wrote: >>> Hi all, >>> >>> It was decided on the call that we would like to flag that Sequences in the >>> ConceptualMoleculeCollection should have a >>> Boolean attribute to capture if they are decoy sequences. At the moment we are >>> using the FuGE:Sequence element. I don't >>> really want to add another attribute to this (it's less problematic cutting down FuGE >>> than adding new things), so I'm >>> wondering if we should define our own Sequence type in AnalysisXML. This >>> would also allow us to choose exactly the >>> relevant attributes. At the moment, Sequence can have all of the following: >>> >>> <pf:Sequence isCircular="true" sequence="String" length="0" >>> isApproximateLength="true" >>> SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" >>> name="String"> >>> >>> Several of these attributes were created to represent concepts that probably will >>> never be required or implemented in >>> AnalysisXML. 
How about the following: >>> >>> <DBSequence identifier = "" name = "" isDecoy = "true"> >>> <seq>MCTMG...</seq> >>> <pf:DatabaseReference Database_ref="" >>> accession="Rev_IPI00013808.1"/> >>> </DBSequence> >>> >>> Are any of the other attributes on Sequence actually required? I'll post a new >>> version of the schema with other changes >>> WRT to PeptideEvidence shortly, >>> Cheers >>> Andy >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ________________________________________ >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. >>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://sourceforge.net/services/buy/index.php >>> >>> >>> >>> >>> >>> >>> ________________________________________ >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev >>> >>> >>> -- >>> David Creasy >>> Matrix Science >>> 64 Baker Street >>> London W1U 7GB, UK >>> Tel: +44 (0)20 7486 1050 >>> Fax: +44 (0)20 7224 1344 >>> >>> dc...@ma... >>> http://www.matrixscience.com >>> >>> Matrix Science Ltd. is registered in England and Wales >>> Company number 3533898 >>> >>> >>> >>> ________________________________________ >>> >>> >>> >>> >>> >>> >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. >>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://sourceforge.net/services/buy/index.php >>> >>> >>> >>> >>> ________________________________________ >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... 
>>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://sourceforge.net/services/buy/index.php >>> >>> ________________________________________ >>> >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev >>> >>> >>> >>> ------------------------------------------------------------------------- >>> This SF.Net email is sponsored by the Moblin Your Move Developer's challenge >>> Build the coolest Linux based applications with Moblin SDK & win great prizes >>> Grand prize is a trip for two to an Open Source event anywhere in the world >>> http://moblin-contest.org/redirect.php?banner_id=100&url=/ >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev > > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Psidev-pi-dev mailing list > Psi...@li... > https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev -- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 dc...@ma... http://www.matrixscience.com Matrix Science Ltd. is registered in England and Wales Company number 3533898 |
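For readers following the decoy discussion quoted in this thread: the alternative to an isDecoy attribute was a single search-level parameter (the "decoy_string" cvParam with value "Rev"). A minimal sketch of how a consumer might apply such a parameter is below. The function names are hypothetical, and prefix matching is an assumption; the thread does not say whether the string is a prefix, a suffix or a general pattern.

```python
def is_decoy(accession: str, decoy_string: str = "Rev") -> bool:
    """Return True if the accession carries the decoy marker.

    Assumption: the marker is a prefix, as in "Rev_IPI00013808.1".
    """
    return accession.startswith(decoy_string)


def split_targets_and_decoys(accessions, decoy_string="Rev"):
    """Partition accessions into (targets, decoys) using the marker."""
    targets = [a for a in accessions if not is_decoy(a, decoy_string)]
    decoys = [a for a in accessions if is_decoy(a, decoy_string)]
    return targets, decoys


# The decoy accession is the one from the DBSequence example in the thread;
# the target accession is the same string with the marker removed.
targets, decoys = split_targets_and_decoys(
    ["IPI00013808.1", "Rev_IPI00013808.1"]
)
```

One caveat with this approach: a genuine accession that happens to start with the marker would be misclassified, which is arguably part of the case for an explicit flag.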
From: David C. <dc...@ma...> - 2008-07-31 12:45:49
|
Hi,

I've added an example N15 Mascot search to:
http://code.google.com/p/psi-pi/source/browse/#svn/trunk/examples/schema_usecase_examples/working29July

This seems to work pretty well (as expected!!)

The interesting thing is that there are two sets of residue masses, two SpectrumIdentificationList (one for light, one for heavy) but just one ProteinDetectionList.

And (possibly confusing at first glance!) unmodified peptides with different masses like this:

<Peptide id="peptide_48_1" sequenceMass="1025.481796" >
  <peptideSequence>STNLDWYK</peptideSequence>
</Peptide>
<Peptide id="peptide_53_1" sequenceMass="1036.449188">
  <peptideSequence>STNLDWYK</peptideSequence>
</Peptide>

Any comments welcomed...

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
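A small sketch of why the two "identical" peptides above differ in mass, assuming the heavy form is fully 15N-labelled. It groups the Peptide elements by sequence and compares the observed mass difference with the shift expected from the number of nitrogen atoms in STNLDWYK. The SequenceCollection wrapper, the function names and the per-residue nitrogen table (restricted to the residues in this peptide) are illustrative assumptions and are not taken from the example file.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The two Peptide elements from the message, wrapped in a root element so the
# snippet parses on its own (the wrapper is an assumption).
SNIPPET = """
<SequenceCollection>
  <Peptide id="peptide_48_1" sequenceMass="1025.481796">
    <peptideSequence>STNLDWYK</peptideSequence>
  </Peptide>
  <Peptide id="peptide_53_1" sequenceMass="1036.449188">
    <peptideSequence>STNLDWYK</peptideSequence>
  </Peptide>
</SequenceCollection>
"""

# Nitrogen atoms per residue, only for the residues that occur in STNLDWYK.
NITROGENS = {"S": 1, "T": 1, "N": 2, "L": 1, "D": 1, "W": 2, "Y": 1, "K": 2}

# Monoisotopic mass difference between 15N and 14N, in Da.
N15_SHIFT = 15.0001088982 - 14.0030740048


def masses_by_sequence(xml_text):
    """Map each peptide sequence to a list of (id, sequenceMass) pairs."""
    grouped = defaultdict(list)
    for pep in ET.fromstring(xml_text).iter("Peptide"):
        seq = pep.findtext("peptideSequence")
        grouped[seq].append((pep.get("id"), float(pep.get("sequenceMass"))))
    return dict(grouped)


def expected_n15_shift(sequence):
    """Mass shift if every nitrogen in the sequence is 15N."""
    return sum(NITROGENS[aa] for aa in sequence) * N15_SHIFT


grouped = masses_by_sequence(SNIPPET)
(light_id, light), (heavy_id, heavy) = sorted(
    grouped["STNLDWYK"], key=lambda pair: pair[1]
)
observed_shift = heavy - light
```

With 11 nitrogen atoms in the sequence, the expected shift agrees with the observed difference between the two sequenceMass values to well within 1e-4 Da, which is consistent with one entry coming from the light residue masses and the other from the heavy set.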
From: Jones, A. <And...@li...> - 2008-07-31 12:31:54
|
Hi all,

I only made SearchDatabase mandatory on DBSequence, so I think this is okay?

And while you are there, could you allow multiple TranslationTable entries under DatabaseTranslationFrames?
I've added a Mascot_NA_example.xml to examples\schema_usecase_examples\working29July.
I also think that we should change the name from DatabaseTranslationFrames to DatabaseTranslation:

Done.

I also think that we should change the name from DatabaseTranslationFrames to DatabaseTranslation

Done.

Since I've actually included the table, there is no need to reference the NCBI ontology (which doesn't exist), and this then also copes with the custom case?
Can we also allow multiple cv items in the translation table so the start codons can be specified?
If anyone sees a problem, let me know otherwise I'll change the example, add details to the wiki documentation and sign off issue #24

i.e. Make the reference to CVParam 0..*? Done.

Speak at 4,
cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of David Creasy
Sent: 31 July 2008 13:15
To: Martin Eisenacher
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Changes to schema on 29th July 2008

Hi Martin, Andy

It's in schema_usecase_examples\working29July - probably added by Andy actually.
And yes, of course you are right about it being optional. (Could you add a note to the documentation as to why it's optional)

Sorry about that Andy, could you undo those changes.

And while you are there, could you allow multiple TranslationTable entries under DatabaseTranslationFrames?
I've added a Mascot_NA_example.xml to examples\schema_usecase_examples\working29July.
I also think that we should change the name from DatabaseTranslationFrames to DatabaseTranslation:

<DatabaseTranslationFrames frames="1,2,3,4,5,6">
  <TranslationTable id="0" name="Unspecified">
    <pf:cvParam accession="PI:00025" name="translation table" cvRef="PSI-PI" value="" />
  </TranslationTable>
  <TranslationTable id="1" name="Standard">
    <pf:cvParam accession="PI:00025" name="translation table" cvRef="PSI-PI" value="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG " />
  </TranslationTable>
  <TranslationTable id="2" name="Vertebrate Mitochondrial">
    <pf:cvParam accession="PI:00025" name="translation table" cvRef="PSI-PI" value="FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG " />
  ...

Since I've actually included the table, there is no need to reference the NCBI ontology (which doesn't exist), and this then also copes with the custom case?
Can we also allow multiple cv items in the translation table so the start codons can be specified?
If anyone sees a problem, let me know otherwise I'll change the example, add details to the wiki documentation and sign off issue #24

Thanks,
David

Martin Eisenacher wrote:

Hi David, hi all!

Mh, I cannot find a MPC_use_case_working_temp.axml (where have you found it?); the latest is MPC_use_case_working27June.axml and it has no empty SearchDatabase_ref.
But I think SearchDatabase_ref should be optional in the <Inputs> because of non-database searches.
Andy added SearchDatabase_ref and accession into <DBSequence> and there it makes sense to make them mandatory (because it is “DB”sequence).

It would be more human-readable to have:

<SpectraData ref="SD_1"/>
<SearchDatabase ref="SDB_SwissProt"/>

but I agree with Andy it may be more FuGE and more validatable to have

<SpectraData SpectraData_ref="SD_1"/>
<SearchDatabase SearchDatabase_ref="SDB_SwissProt"/>

(For some seconds I thought to move them to attributes but we have potentially more than one.)

Bye
Martin

From: psi...@li... [mailto:psi...@li...] On Behalf Of David Creasy
Sent: Wednesday, July 30, 2008 6:00 AM
To: psi...@li...
Subject: [Psidev-pi-dev] Changes to schema on 29th July 2008

Hi Andy,

Thanks for the recent changes to the schema. You asked a question:

Added <xsd:attribute name="SearchDatabase_ref" type="xsd:string"/> to DBSequence and <xsd:attribute name="accession" type="xsd:string"/>
should either/both of these be set as required?

I think that the answer has to be yes. However, in the MPC_use_case_working_temp.axml file, there is: SearchDatabase_ref="", so maybe Martin could comment?

And under <AnalysisCollection> <SpectrumIdentification ... > you made the change from

<SpectraData_ref id="SD_1"/>
<SearchDatabase_ref id="SDB_SwissProt"/>

to

<SpectraData_ref ref="SD_1"/>
<SearchDatabase SearchDatabase_ref="SDB_SwissProt"/>

which maybe isn't as consistent as you intended - or maybe I have missed the point?
There is also still a: <SpectrumIdentificationList_ref ref=...

cvList is fine, except it can only be a list of exactly 1 item at the moment... (I've not fixed the schema).

I didn't notice it last time, but the schema doesn't validate with xerces:
Need to change (in 2 places):
<xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}"/>
to
<xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?\-]{1}"/>
(I've not updated the schema in svn - or checked to see if xerces is correct).

Thanks,
David

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
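The 64-character translation table strings quoted in the message above can be used directly for codon lookup. The sketch below assumes the strings follow the usual NCBI genetic-code layout (codons enumerated with bases in the order T, C, A, G, first base varying slowest); the thread itself does not state the ordering. Function names are hypothetical, and only forward frames 1 to 3 are handled; frames 4 to 6 would need the reverse complement, which is left out.

```python
from itertools import product

BASES = "TCAG"  # assumed NCBI ordering of bases within the table string

# Table strings as quoted in the message (trailing space removed).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def codon_map(table):
    """Build a codon -> amino acid dict from a 64-character table string."""
    table = table.strip()  # the XML values carry a trailing space
    if len(table) != 64:
        raise ValueError("translation table must contain 64 characters")
    codons = ("".join(triplet) for triplet in product(BASES, repeat=3))
    return dict(zip(codons, table))


def translate(dna, table, frame=1):
    """Translate a nucleotide string in forward frame 1, 2 or 3.

    Unknown codons (e.g. containing N) are reported as 'X'.
    """
    if frame not in (1, 2, 3):
        raise ValueError("this sketch only handles forward frames 1-3")
    lookup = codon_map(table)
    seq = dna.upper().replace("U", "T")
    start = frame - 1
    return "".join(
        lookup.get(seq[i:i + 3], "X") for i in range(start, len(seq) - 2, 3)
    )


standard_result = translate("ATGTGGTGA", STANDARD)
mito_result = translate("ATGTGGTGA", VERT_MITO)
```

Under the standard table TGA is a stop ('*'), while the vertebrate mitochondrial table reads it as W, so the same nine bases give different results. This is one reason for carrying the table inline in the file: a custom table then needs no external reference.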
From: David C. <dc...@ma...> - 2008-07-31 12:14:32
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=UTF-8" http-equiv="Content-Type"> <title></title> </head> <body bgcolor="#ffffff" text="#000000"> Hi Martin, Andy<br> <br> It's in schema_usecase_examples\working29July - probably added by Andy actually.<br> And yes, of course you are right about it being optional. (Could you add a note to the documentation as to why it's optional)<br> <br> Sorry about that Andy, could you undo those changes.<br> <br> And while you are there, could you allow multiple TranslationTable entries under DatabaseTranslationFrames?<br> I've added a Mascot_NA_example.xml to examples\schema_usecase_examples\working29July. <br> I also think that we should change the name from DatabaseTranslationFrames to DatabaseTranslation:<br> <br> <small><small><tt> <DatabaseTranslationFrames frames="1,2,3,4,5,6"><br> <TranslationTable id="0" name="Unspecified"><br> <pf:cvParam accession="PI:00025" name="translation table" cvRef="PSI-PI" value="" /><br> </TranslationTable><br> <TranslationTable id="1" name="Standard"><br> <pf:cvParam accession="PI:00025" name="translation table" cvRef="PSI-PI" value="FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG " /><br> </TranslationTable><br> <TranslationTable id="2" name="Vertebrate Mitochondrial"><br> <pf:cvParam accession="PI:00025" name="translation table" cvRef="PSI-PI" value="FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG " /><br> ...<br> </tt></small></small><br> Since I've actually included the table, there is no need to reference the NCBI ontology (which doesn't exist), and this then also copes with the custom case? 
Can we also allow multiple cv items in the translation table so the start codons can be specified?<br> If anyone sees a problem, let me know otherwise I'll change the example, add details to the wiki documentation and sign off issue #24<br> <br> Thanks,<br> David<br> <br> Martin Eisenacher wrote: <blockquote cite="mid:000901c8f2fb$12e32bd0$38a98370$@eis...@ru..." type="cite"> <div class="Section1"> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">Hi David, hi all!<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><o:p> </o:p></span></font></p> <p class="MsoNormal"><span class="SpellE"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">Mh</span></font></span><font face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">, I cannot find a <span class="SpellE">MPC_use_case_working_temp.axml</span> (where have you found it?); <o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">the latest is
MPC_use_case_working27June.axml and it has no empty <span class="SpellE">SearchDatabase_ref</span>.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">But I think <span class="SpellE">SearchDatabase_ref</span> should be optional in the <Inputs> because of non-database searches.<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">Andy added <span class="SpellE">SearchDatabase_ref</span> and accession into <<span class="SpellE">DBSequence</span>> and there it <o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">makes sense to make them mandatory (because it is “<span class="SpellE">DB”sequence</span>).<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">It would be more human-readable to have:<o:p></o:p></span></font></p> <p class="MsoNormal"><tt><font color="black" face="Courier New" size="4"><span style="font-size: 13.5pt;" lang="EN-US"><<span class="SpellE">SpectraData</span> ref="SD_1"/></span></font></tt><font face="Courier New" size="4"><span 
style="font-size: 13.5pt; font-family: "Courier New";" lang="EN-US"><br> <tt><font face="Courier New"><span style=""><<span class="SpellE">SearchDatabase</span> ref="<span class="SpellE">SDB_SwissProt</span>"/><o:p></o:p></span></font></tt></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">but I agree with Andy it maybe more <span class="SpellE">FuGe</span> and more <span class="SpellE">validatable</span> to have<br> <<span class="SpellE">SpectraData</span> <span class="SpellE">SpectraData_ref</span>="SD_1"/><br> <<span class="SpellE">SearchDatabase</span> <span class="SpellE">SearchDatabase_ref</span>="<span class="SpellE">SDB_SwissProt</span>"/><br style=""> <!--[if !supportLineBreakNewLine]--><br style=""> <!--[endif]--><o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">(For some seconds I thought to move them to attributes but we have potentially more than one.)<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><span style=""> </span>Bye<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><span style=""> </span>Martin<o:p></o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Calibri" size="2"><span style="font-size: 
11pt; font-family: "Calibri","sans-serif";" lang="EN-GB"><o:p> </o:p></span></font></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p class="MsoNormal"><b><font color="black" face="Tahoma" size="2"><span style="font-size: 10pt; font-family: "Tahoma","sans-serif"; color: windowtext; font-weight: bold;">From:</span></font></b><font color="black" face="Tahoma" size="2"><span style="font-size: 10pt; font-family: "Tahoma","sans-serif"; color: windowtext;"> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a> [<a class="moz-txt-link-freetext" href="mailto:psi...@li...">mailto:psi...@li...</a>] <b><span style="font-weight: bold;">On behalf of </span></b>David Creasy<br> <b><span style="font-weight: bold;">Sent:</span></b> Wednesday, July 30, 2008 6:00 AM<br> <b><span style="font-weight: bold;">To:</span></b> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a><br> <b><span style="font-weight: bold;">Subject:</span></b> [Psidev-pi-dev] Changes to schema on 29th July 2008<o:p></o:p></span></font></p> </div> </div> <p class="MsoNormal"><font color="black" face="Times New Roman" size="3"><span style="font-size: 12pt;"><o:p> </o:p></span></font></p> <p class="MsoNormal"><font color="black" face="Times New Roman" size="3"><span style="font-size: 12pt;">Hi Andy,<br> <br> Thanks for the recent changes to the schema.
You asked a question:<br> </span></font><font face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">Added</span></font><span style="" lang="EN-GB"> </span><font color="#000096"><span style="color: rgb(0, 0, 150);" lang="EN-GB"><xsd:attribute</span></font><font color="#f5844c"><span style="color: rgb(245, 132, 76);" lang="EN-GB"> name</span></font><font color="#ff8040"><span style="color: rgb(255, 128, 64);" lang="EN-GB">=</span></font><font color="#993300"><span style="color: rgb(153, 51, 0);" lang="EN-GB">"SearchDatabase_ref"</span></font><font color="#f5844c"><span style="color: rgb(245, 132, 76);" lang="EN-GB"> type</span></font><font color="#ff8040"><span style="color: rgb(255, 128, 64);" lang="EN-GB">=</span></font><font color="#993300"><span style="color: rgb(153, 51, 0);" lang="EN-GB">"xsd:string"</span></font><font color="#000096"><span style="color: rgb(0, 0, 150);" lang="EN-GB">/> </span></font><font face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">to DBSequence and </span></font><font color="#000096"><span style="color: rgb(0, 0, 150);" lang="EN-GB"><xsd:attribute</span></font><font color="#f5844c"><span style="color: rgb(245, 132, 76);" lang="EN-GB"> name</span></font><font color="#ff8040"><span style="color: rgb(255, 128, 64);" lang="EN-GB">=</span></font><font color="#993300"><span style="color: rgb(153, 51, 0);" lang="EN-GB">"accession"</span></font><font color="#f5844c"><span style="color: rgb(245, 132, 76);" lang="EN-GB"> type</span></font><font color="#ff8040"><span style="color: rgb(255, 128, 64);" lang="EN-GB">=</span></font><font color="#993300"><span style="color: rgb(153, 51, 0);" lang="EN-GB">"xsd:string"</span></font><font color="#000096"><span style="color: rgb(0, 0, 150);" lang="EN-GB">/> </span></font><font face="Calibri" size="2"><span style="font-size: 11pt; font-family: "Calibri","sans-serif";" lang="EN-GB">should either/both of 
these be set as required?<u><br> <br> </u>I think that the answer has to be yes. However, in the MPC_use_case_working_temp.axml file, there is: SearchDatabase_ref="", so maybe Martin could comment?<br> <br> </span></font><span style=""><br> And under<br> </span><tt><font face="Courier New" size="4"><span style="font-size: 13.5pt;"> <AnalysisCollection></span></font></tt><font face="Courier New" size="4"><span style="font-size: 13.5pt; font-family: "Courier New";"><br> <tt><font face="Courier New"><span style=""> <SpectrumIdentification ... ></span></font></tt><br> </span></font><font face="Courier New" size="2"><span style="font-size: 10pt; font-family: "Courier New";"><br> </span></font><span style="">you made the change from<br> </span><tt><font face="Courier New" size="4"><span style="font-size: 13.5pt;"> <SpectraData_ref id="SD_1"/></span></font></tt><font face="Courier New" size="4"><span style="font-size: 13.5pt; font-family: "Courier New";"><br> <tt><font face="Courier New"><span style=""> <SearchDatabase_ref id="SDB_SwissProt"/></span></font></tt><br> </span></font><span style="">to <br> </span><tt><font face="Courier New" size="4"><span style="font-size: 13.5pt;"> <SpectraData_ref ref="SD_1"/></span></font></tt><font face="Courier New" size="4"><span style="font-size: 13.5pt; font-family: "Courier New";"><br> <tt><font face="Courier New"><span style=""> <SearchDatabase SearchDatabase_ref="SDB_SwissProt"/></span></font></tt><br> </span></font><span style=""><br> which maybe isn't as consistent as you intended - or maybe I have missed the point?<br> There is also still a : <br> <SpectrumIdentificationList_ref ref=...<br> <br> cvList is fine, except it can only be a list of exactly 1 item at the moment... 
(I've not fixed the schema).<br> <br> I didn't notice it last time, but the schema doesn't validate with xerces:<br> Need to change (in 2 places):<br> <xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}"/><br> to<br> <xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?\-]{1}"/><br> (I've not updated the schema in svn - or checked to see if xerces is correct).<br> <br> Thanks,<br> <br> David<br> <br style=""> <!--[if !supportLineBreakNewLine]--><br style=""> <!--[endif]--><o:p></o:p></span></p> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">-- <o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">David Creasy<o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">Matrix Science<o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">64 Baker Street<o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">London W1U 7GB, UK<o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">Tel: +44 (0)20 7486 1050<o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">Fax: +44 (0)20 7224 1344<o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;"><a moz-do-not-send="true" href="mailto:dc...@ma...">dc...@ma...</a><o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;"><a moz-do-not-send="true" href="http://www.matrixscience.com">http://www.matrixscience.com</a><o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span 
style="font-size: 10pt;"><o:p> </o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">Matrix Science Ltd. is registered in England and Wales<o:p></o:p></span></font></pre> <pre><font color="black" face="Courier New" size="2"><span style="font-size: 10pt;">Company number 3533898<o:p></o:p></span></font></pre> </div> </div> </blockquote> <br> <pre class="moz-signature" cols="72">-- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 <a class="moz-txt-link-abbreviated" href="mailto:dc...@ma...">dc...@ma...</a> <a class="moz-txt-link-freetext" href="http://www.matrixscience.com">http://www.matrixscience.com</a> Matrix Science Ltd. is registered in England and Wales Company number 3533898</pre> </body> </html> |
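Editorial sketch for the TranslationTable proposal in the message above: the 64-character value strings can be used directly as a codon lookup. This is a minimal illustration, not part of the schema or of any tool mentioned in the thread. It assumes the strings follow the NCBI genetic-code layout (bases ordered T, C, A, G, with the first codon position varying slowest); the function names are hypothetical.

```python
# Sketch: decode codons with a 64-character translation-table string.
# Assumption: NCBI genetic-code layout, i.e. codons enumerated as
# TTT, TTC, TTA, TTG, TCT, ... with base order T, C, A, G.

BASES = "TCAG"

# Values copied from the example (trailing space removed).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERTEBRATE_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def translate_codon(codon: str, table: str) -> str:
    """Return the one-letter residue ('*' for stop) for a DNA codon."""
    table = table.strip()
    if len(table) != 64:
        raise ValueError("translation table must have 64 entries")
    if len(codon) != 3:
        raise ValueError("codon must have exactly 3 bases")
    # str.index raises ValueError for anything other than T, C, A, G.
    first, second, third = (BASES.index(base) for base in codon.upper())
    return table[16 * first + 4 * second + third]


def translate(sequence: str, table: str) -> str:
    """Translate a DNA sequence in frame 1, ignoring a trailing partial codon."""
    usable = len(sequence) - len(sequence) % 3
    return "".join(
        translate_codon(sequence[i:i + 3], table) for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    print(translate("ATGTGATAA", STANDARD))         # TGA read as stop
    print(translate("ATGTGATAA", VERTEBRATE_MITO))  # TGA read as W
```

Carrying the table inline, as proposed, means a reader of the file needs no external lookup to reproduce the translation, which is also what makes a custom table workable.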
From: Martin E. <mar...@ru...> - 2008-07-31 11:35:29
|
Hi Andy, hi all,

> As I see it SpectrumIdentificationItem is intended only for identifying Peptides. I didn't fully understand

Yes, I agree; but I understood Pierre-Alain's question as a hint that top-down identifies protein sequences, so we would have to duplicate information, referencing a protein sequence as <Peptide> from <SpectrumIdentificationItem> and then the same sequence as <DBSequence> from <ProteinDetectionResult>.
But I might be wrong and we definitely have to wait for a top-down instance doc.

> Looking at it again, the model of SpectrumIdentificationItem is a little hard to understand and we could
> probably improve it. This is because SpectrumIdentificationItem has both Peptide_ref (i.e. a reference to a
> Peptide sequence and its mods) plus PeptideEvidence which is a reference to the part of the ProteinSequence
> this Peptide was derived from. The PeptideEvidence lines could be shifted up to <Peptide> and renamed e.g.
> SourceProtein - this would save some space and would appear to be a logically more sensible model...

You mean shifting <PeptideEvidence> under <Peptide> in the SequenceCollection?
But missedcleavages is only well-defined in relation to a search (using an enzyme)!

> I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be
> mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?

Yes, if <PeptideEvidence> stays optional.

> > >4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge,
> > >are not specified whether monoisotopic or averaged.
> > >Do we assume that averaged does not exist anymore?
> > No, we decided to have only one type of masses in the whole analysisXML.
> > But I cannot find a note for that or a schema attribute... I will add an issue for that.
>
> It is a database search parameter:
> <AdditionalSearchParams>
> <pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>

Yes, it is, but in case we have more than one SpectrumIdentification, that could be conflicting.
http://code.google.com/p/psi-pi/issues/detail?id=37

bye
Martin |
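A minimal sketch of why the monoisotopic/average question in point 4 matters for a value like sequenceMass (described in the thread as the mass of the sequence without mods). It assumes standard residue masses and uses the five residues "MCTMG" from the DBSequence example purely as a toy peptide; the function and variable names are illustrative and are not part of analysisXML.

```python
# Standard residue masses in Da, listed only for the residues used below.
MONO = {"G": 57.02146, "T": 101.04768, "C": 103.00919, "M": 131.04049}
AVG = {"G": 57.0519, "T": 101.1051, "C": 103.1388, "M": 131.1926}
WATER_MONO = 18.01056
WATER_AVG = 18.01528


def sequence_mass(seq, residue_masses, water):
    """Mass of an unmodified peptide: sum of residue masses plus one water."""
    return sum(residue_masses[aa] for aa in seq) + water


# Toy peptide (hypothetical): first five residues of the example sequence.
mono = sequence_mass("MCTMG", MONO, WATER_MONO)
avg = sequence_mass("MCTMG", AVG, WATER_AVG)
print(f"monoisotopic: {mono:.4f}  average: {avg:.4f}  difference: {avg - mono:.4f}")
```

Even for five residues the two conventions differ by roughly half a dalton, so a file that mixes results from searches with different mass-type settings would need to say which convention each value uses.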
From: Martin E. <mar...@ru...> - 2008-07-31 10:49:15
|
Hi David, hi all!

Hm, I cannot find an MPC_use_case_working_temp.axml (where did you find it?); the latest is MPC_use_case_working27June.axml and it has no empty SearchDatabase_ref.

But I think SearchDatabase_ref should be optional in the <Inputs> because of non-database searches.
Andy added SearchDatabase_ref and accession into <DBSequence> and there it makes sense to make them mandatory (because it is “DB”sequence).

It would be more human-readable to have:

<SpectraData ref="SD_1"/>
<SearchDatabase ref="SDB_SwissProt"/>

but I agree with Andy that it may be more FuGE-like and easier to validate to have

<SpectraData SpectraData_ref="SD_1"/>
<SearchDatabase SearchDatabase_ref="SDB_SwissProt"/>

(For a few seconds I thought of moving them to attributes, but we potentially have more than one.)

Bye
Martin

From: psi...@li... [mailto:psi...@li...] On Behalf Of David Creasy
Sent: Wednesday, July 30, 2008 6:00 AM
To: psi...@li...
Subject: [Psidev-pi-dev] Changes to schema on 29th July 2008

Hi Andy,

Thanks for the recent changes to the schema.

You asked a question:
Added <xsd:attribute name="SearchDatabase_ref" type="xsd:string"/> to DBSequence
and <xsd:attribute name="accession" type="xsd:string"/>
should either/both of these be set as required?

I think that the answer has to be yes. However, in the MPC_use_case_working_temp.axml file, there is: SearchDatabase_ref="", so maybe Martin could comment?

And under <AnalysisCollection> <SpectrumIdentification ... > you made the change from
<SpectraData_ref id="SD_1"/>
<SearchDatabase_ref id="SDB_SwissProt"/>
to
<SpectraData_ref ref="SD_1"/>
<SearchDatabase SearchDatabase_ref="SDB_SwissProt"/>
which maybe isn't as consistent as you intended - or maybe I have missed the point?
There is also still a: <SpectrumIdentificationList_ref ref=...

cvList is fine, except it can only be a list of exactly 1 item at the moment... (I've not fixed the schema).

I didn't notice it last time, but the schema doesn't validate with xerces:
Need to change (in 2 places):
<xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}"/>
to
<xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?\-]{1}"/>
(I've not updated the schema in svn - or checked to see if xerces is correct).

Thanks,
David

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898 |
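A small sketch of what the corrected pattern from David's message is meant to accept: exactly one uppercase letter, "?" or "-". It uses Python's `re` module, which is not the XML Schema regular-expression dialect, so it says nothing about whether xerces is right to reject the unescaped form; it only illustrates the intended set of values. The helper name `matches_pattern` is made up for this example.

```python
import re

# The escaped-hyphen form proposed in the message. Python's `re` is NOT the
# XSD regex dialect; this only shows which strings the pattern should accept.
PATTERN = re.compile(r"[ABCDEFGHIJKLMNOPQRSTUVWXYZ?\-]{1}")


def matches_pattern(value: str) -> bool:
    """True if value is exactly one uppercase letter, '?' or '-'."""
    return PATTERN.fullmatch(value) is not None


for candidate in ["A", "?", "-", "a", "AB", ""]:
    print(repr(candidate), matches_pattern(candidate))
```

Escaping the hyphen removes any doubt about whether it is read as a literal character or as the start of a range, which is presumably why the escaped form is the safer one to put in the schema.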
From: Jones, A. <And...@li...> - 2008-07-30 13:36:09
|
Hi all,

> >as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
> >(and not to a DBSequence), identification is obligatory a Peptide?
> At the moment I think it's possible to directly reference a DBSeq. At the time the
> foreign key definitions are implemented we can forbid that.
> But we should have in mind, that a peptide is a sequence plus modifications, so if top-down
> identifies only a sequence, we should allow that and if top-down identifies with mods,
> we should forbid that.
> It would be quite helpful to have a top-down instance doc. To check
> whether our thoughts are really deep enough...

As I see it SpectrumIdentificationItem is intended only for identifying Peptides. I didn't fully understand Martin's response about Mods. We have to focus on what use cases we state we are supporting...

Looking at it again, the model of SpectrumIdentificationItem is a little hard to understand and we could probably improve it. This is because SpectrumIdentificationItem has both Peptide_ref (i.e. a reference to a Peptide sequence and its mods) plus PeptideEvidence which is a reference to the part of the ProteinSequence this Peptide was derived from. The PeptideEvidence lines could be shifted up to <Peptide> and renamed e.g. SourceProtein - this would save some space and would appear to be a logically more sensible model...

I notice also that there is a small error in the schema in that on PeptideEvidence DBSequence_ref should be mandatory (and it is missing from the instance docs). I can fix this if there is agreement on this?

> >2) and what about spectral library searches, do we have to have Peptide
> >elements with possibly undefined explicit sequences to refer to
> >from the SpectrumIdentificationResult (because non peptidic, or because not identified
> >but good spectrum)
> At the moment the sequence element can be empty or even left out.
> User or CV params are allowed.
> How do they report results in spectral lib search if they identify non-peptidic or unidentified?
> We need CV terms for that...

I don't quite get this point. What is reported from a spectral library search if it is unidentified - how does this differ from no result? In terms of non-peptidic, are we talking about identifying small molecules? This is analysisXML version 2 :-)

> >3) in the Peptide element, the Modifications are defined in a much more
> >detailed manner than in ModificationParams (PSI-MOD is there for
> >instance). Does this simply mean that The ModificationParams codes
> >the search engine settings and the Peptide includes the formal PSI
> >definition of the Mod? And the only reference is the ModName value?
> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or
> they can define their own.

Mods proposal coming from Angel.

> >4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge,
> >are not specified whether monoisotopic or averaged.
> >Do we assume that averaged does not exist anymore?
> No, we decided to have only one type of masses in the whole analysisXML.
> But I cannot find a note for that or a schema attribute... I will add an issue for that.

It is a database search parameter:
<AdditionalSearchParams>
<pf:cvParam accession="PRIDE:0000162" name="Mass value type setting monoisotopic" cvRef="PRIDE"/>

> >6) in case the DBSequence is nucleotide, is there a tag for saying
> >this?

DBSequence can have cvParams, so we could easily add a sequenceType = Nucleic acid CV term.

Cheers
Andy

> -----Original Message-----
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Martin Eisenacher
> Sent: 30 July 2008 13:05
> To: 'Pierre-Alain Binz'
> Cc: psi...@li...
> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>
> Hi Pierre-Alain, quite old posting, but I saw no answer yet, so I will try:
>
> >2nd July, 2008:
> >a couple of questions, just to make sure:
>
> >1) in case of top-down approach, do we have to duplicate sequenceCollection information?
> I hope not, by referencing the same identifier.
>
> >as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
> >(and not to a DBSequence), identification is obligatory a Peptide?
> At the moment I think it's possible to directly reference a DBSeq. At the time the
> foreign key definitions are implemented we can forbid that.
> But we should have in mind, that a peptide is a sequence plus modifications, so if top-down
> identifies only a sequence, we should allow that and if top-down identifies with mods,
> we should forbid that.
> It would be quite helpful to have a top-down instance doc. To check
> whether our thoughts are really deep enough...
>
> >2) and what about spectral library searches, do we have to have Peptide
> >elements with possibly undefined explicit sequences to refer to
> >from the SpectrumIdentificationResult (because non peptidic, or because not identified
> >but good spectrum)
> At the moment the sequence element can be empty or even left out.
> User or CV params are allowed.
> How do they report results in spectral lib search if they identify non-peptidic or unidentified?
> We need CV terms for that...
>
> >3) in the Peptide element, the Modifications are defined in a much more
> >detailed manner than in ModificationParams (PSI-MOD is there for
> >instance). Does this simply mean that The ModificationParams codes
> >the search engine settings and the Peptide includes the formal PSI
> >definition of the Mod? And the only reference is the ModName value?
> I think that has changed meanwhile, in the MPC use case I used PSI-MOD terms
> for both. If a search engine has its "own" mods, we need CV for that in PSI-PI CV or
> they can define their own.
>
> >4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge,
> >are not specified whether monoisotopic or averaged.
> >Do we assume that averaged does not exist anymore?
> No, we decided to have only one type of masses in the whole analysisXML.
> But I cannot find a note for that or a schema attribute... I will add an issue for that.
>
> >5) is sequenceMass the mass value with/without the mods? If with, the
> >name might be misleading (peptideMass would be more appropriate)
> It is indeed the mass of the sequence without mods.
> THAT is described in http://code.google.com/p/psi-pi/wiki/NotesForFocumentation
>
> >6) in case the DBSequence is nucleotide, is there a tag for saying
> >this? (NB: MS on nucleotide molecules can be performed and analysed,
> >not only MS on AA sequences that are interpreting nucleotide sequences).
> >Or do we neglect MS experiments done on nucleotide molecules (and by
> >the way on glycans...) and only represent the DBSequences as AA
> >sequences (frame translations)? (and what about glycans?)
> >Probably can be solved if one can replace SequenceCollection by
> >something else if needed (SmallMoleculeCollection, GlycanCollection,
> >MoleculeCollection)... but the validator might not like this.
> Mh, these can be extensions, I think they are not possible at the moment.
> But a tag for the type can indeed be useful, it could be a CV param.
> I will create an issue for that.
>
> >7) in case that DBSequence is nucleotide, do we represent the
> >Peptide as AA sequence in case of MS done on proteins?
> I hope the following answers this:
>
> <DBSequence> is the nucleotide seq from the nucleotide DB,
> <Peptide> is the identified amino acid sequence plus mods (without any translation frame or something).
> <PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute.
> (The Peptide_Ref is done in SpectrumIdentificationItem as in the amino acid DB case.)
> If a protein detection is performed, there are <PeptideHypothesis> elements referencing
> PeptideEvidence elements from SpectrumIdentificationItem sections.
>
> Bye
> Martin
>
>
> David Creasy wrote:
> Thanks Andy,
>
> I've added an updated example document to SVN:
> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>
> Problem is that we have now removed the main point of these recent changes which was to add the decoy flag... I think
> that we need to add isDecoy to SpectrumIdentificationItem.
>
> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection
> Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?
> I think we need to discuss this again at the next telecon.
>
> David
>
> Jones, Andy wrote:
> Hi all,
>
> I’ve updated the schema in SVN with the following main changes:
>
> PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
> Added DBSequence that should be used instead of Sequence (following some of the discussion below)
> Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide
> In fact, I’m not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
> In FuGE on cvParam, the value attribute is no longer mandatory
>
> I’ve added a simple example that validates under examples\schema_usecase_examples\working27June
>
> Feel free to mail me any changes to make on Monday,
> Cheers
> Andy
>
>
> From: psi...@li... [mailto:psidev-pi-dev-bo...@li...] On Behalf Of Jones, Andy
> Sent: 27 June 2008 16:24
> To: Angel Pizarro
> Cc: psi...@li...
> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>
> I think Angel’s response below might not have made it round the list yet.
>
> I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic
> information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = “decoy_string”
> value = “Rev”. This would be a more compact representation and we would not have to add what is quite a specific
> attribute type (isDecoy) to Sequence.
>
>
> From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
> Sent: 27 June 2008 15:59
> To: Jones, Andy
> Cc: psi...@li...
> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>
> my 2¢ :
> You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that
> this would be a subclass of the conceptual molecule element?
>
> Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort
> into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as
> part of our schema?
> On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to
> encode the same accession for two different sequences, effectively making the primary key of the sequence object
> (accession, isDecoy)
>
> Do we want to go there?
>
> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:
> So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)?
>
>
> From: Jones, Andy
> Sent: 27 June 2008 14:54
> To: 'David Creasy'
> Subject: RE: [Psidev-pi-dev] Representing Sequences
>
> id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as
> to whether the optional name attribute should be there.
>
> I agree that length may be useful – is this just an integer value with no unit?
> Yes, I think so.
> I'm less sure about pI and mass since mass at least can be calculated very simply
> Only if you have the sequence... (we have residue masses in the file).
>
> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless
> Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).
>
> The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
> Why not just allow CV there? We can share the same CV as the PEFF format, which includes, taxonomy, sequence type, gene
> ID, and lots of wonderful other things?
>
> – unless someone can convince me otherwise?
> Cheers
> Andy
>
>
> From: David Creasy [mailto:dc...@ma...]
> Sent: 27 June 2008 14:51
> To: Jones, Andy
> Cc: psi...@li...
> Subject: Re: [Psidev-pi-dev] Representing Sequences
>
> Hi Andy,
>
> length may be useful, because some people won't want to output the actual sequence for space reasons. The other things
> we wanted to add before were pI and mass.
> Why do we want name? Is this for, say, a description line?
> (Also, identifier -> id?)
>
> David
>
> Jones, Andy wrote:
> Hi all,
>
> It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a
> Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't
> really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm
> wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the
> relevant attributes. At the moment, Sequence can have all of the following:
>
> <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true"
> SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">
>
> Several of these attributes were created to represent concepts that probably will never be required or implemented in
> AnalysisXML. How about the following:
>
> <DBSequence identifier = "" name = "" isDecoy = "true">
> <seq>MCTMG...</seq>
> <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
> </DBSequence>
>
> Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes
> WRT to PeptideEvidence shortly,
> Cheers
> Andy
>
> --
> Angel Pizarro
> Director, ITMAT Bioinformatics Facility
> 806 Biological Research Building
> 421 Curie Blvd.
> Philadelphia, PA 19104-6160
> 215-573-3736
>
> -------------------------------------------------------------------------
> Check out the new SourceForge.net Marketplace.
> It's the best place to buy or sell services for
> just about anything Open Source.
> http://sourceforge.net/services/buy/index.php
> _______________________________________________
> Psidev-pi-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev > > > > -- > David Creasy > Matrix Science > 64 Baker Street > London W1U 7GB, UK > Tel: +44 (0)20 7486 1050 > Fax: +44 (0)20 7224 1344 > > dc...@ma... > http://www.matrixscience.com > > Matrix Science Ltd. is registered in England and Wales > Company number 3533898 > > ________________________________________ > > ------------------------------------------------------------------------- > Check out the new SourceForge.net Marketplace. > It's the best place to buy or sell services for > just about anything Open Source. > http://sourceforge.net/services/buy/index.php > > ________________________________________ > > _______________________________________________ > Psidev-pi-dev mailing list > Psi...@li... > https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev > > > > ------------------------------------------------------------------------- > This SF.Net email is sponsored by the Moblin Your Move Developer's challenge > Build the coolest Linux based applications with Moblin SDK & win great prizes > Grand prize is a trip for two to an Open Source event anywhere in the world > http://moblin-contest.org/redirect.php?banner_id=100&url=/ > _______________________________________________ > Psidev-pi-dev mailing list > Psi...@li... > https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev |
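The trade-off discussed in the thread above (a per-sequence isDecoy attribute versus a single search-level "decoy_string" parameter) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `is_decoy` is hypothetical, and treating the parameter value "Rev" as an accession prefix is a guess based on the example accession "Rev_IPI00013808.1", not something the schema defines.

```python
# Sketch: derive decoy status from one search-level parameter instead of
# storing an isDecoy flag on every sequence.
# ASSUMPTION: the decoy_string value ("Rev") is used as an accession prefix.

def is_decoy(accession: str, decoy_string: str = "Rev") -> bool:
    """Return True if the accession carries the decoy prefix (hypothetical helper)."""
    return accession.startswith(decoy_string)

accessions = ["IPI00013808.1", "Rev_IPI00013808.1"]
flags = {acc: is_decoy(acc) for acc in accessions}

for acc, flag in flags.items():
    print(acc, "decoy" if flag else "target")
```

Because the decoy information lives in the accession itself, the accession alone stays unique, which sidesteps the (accession, isDecoy) composite-key concern raised in the thread.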
From: Martin E. <mar...@ru...> - 2008-07-30 12:05:21
|
Hi Pierre-Alain,

quite an old posting, but I saw no answer yet, so I will try:

>2nd July, 2008:
>a couple of questions, just to make sure:
>1) in case of top-down approach, do we have to duplicate sequenceCollection information?

I hope not, by referencing the same identifier.

>as SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element
>(and not to a DBSequence), identification is obligatory a Peptide?

At the moment I think it's possible to reference a DBSequence directly. Once the foreign key definitions are implemented we can forbid that. But we should keep in mind that a peptide is a sequence plus modifications, so if top-down identifies only a sequence, we should allow the direct reference, and if top-down identifies a sequence with mods, we should forbid it. It would be quite helpful to have a top-down instance doc, to check whether we have really thought this through...

>2) and what about spectral library searches, do we have to have Peptide
>elements with possibly undefined explicit sequences to refer to
>from the SpectrumIdentificationResult (because non peptidic, or because not identified
>but good spectrum)

At the moment the sequence element can be empty or even left out. User or CV params are allowed. How do they report results in a spectral library search if they identify something non-peptidic or unidentified? We need CV terms for that...

>3) in the Peptide element, the Modifications are defined in a much more
>detailed manner than in ModificationParams (PSI-MOD is there for
>instance). Does this simply mean that The ModificationParams codes
>the search engine settings and the Peptide includes the formal PSI
>definition of the Mod? And the only reference is the ModName value?

I think that has changed in the meantime; in the MPC use case I used PSI-MOD terms for both. If a search engine has its "own" mods, we need CV terms for them in the PSI-PI CV, or they can define their own.

>4) all mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge,
>are not specified whether monoisotopic or averaged.
>Do we assume that averaged does not exist anymore?

No, we decided to have only one type of mass in the whole analysisXML. But I cannot find a note for that or a schema attribute... I will add an issue for that.

>5) is sequenceMass the mass value with/without the mods? If with, the
>name might be missleading (peptideMass would be more appropriate)

It is indeed the mass of the sequence without mods. THAT is described in
http://code.google.com/p/psi-pi/wiki/NotesForFocumentation

>6) in case the DBSequence is nucleotide, is there a tag for saying
>this? (NB: MS on nucleotide molecules can be performed and analysed,
>not only MS on AA sequences that are interpreting nucleotide sequences).
>Or do we neglect MS experiments done on nucleotide molecules (and by
>the way on glycans...) and only represent the DBSequences as AA
>sequences (frame translations)? (and what about glycans?)
>Probaly can be solved if one can replace SequenceCollection by
>something else if needed (SmallMoleculeCollection, GlycanCollection,
>MoleculeCollection)... but the validator might not like this.

Mh, these can be extensions; I think they are not possible at the moment. But a tag for the type can indeed be useful, it could be a CV param. I will create an issue for that.

>7) in case that DBSequence is nucleotide, do we represent the
>Peptide as AA sequence in case of MS done on proteins?

I hope the following answers this:
<DBSequence> is the nucleotide sequence from the nucleotide DB.
<Peptide> is the identified amino acid sequence plus mods (without any translation frame or similar).
<PeptideEvidence> contains the DBSequence_Ref together with a frame and a TranslationTable_Ref attribute. (The Peptide_Ref is done in SpectrumIdentificationItem, as in the amino acid DB case.)
If a protein detection is performed, there are <PeptideHypothesis> elements referencing PeptideEvidence elements from SpectrumIdentificationItem sections.

Bye
Martin

David Creasy wrote:
Thanks Andy,

I've added an updated example document to SVN:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml

Problem is that we have now removed the main point of these recent changes, which was to add the decoy flag... I think that we need to add isDecoy to SpectrumIdentificationItem.

And yes, I suspect that we should go back to using the ConceptualMoleculeCollection.

Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?

I think we need to discuss this again at the next telecon.

David

Jones, Andy wrote:
Hi all,

I've updated the schema in SVN with the following main changes:

- PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence that should be used instead of Sequence (following some of the earlier discussion)
- Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide. In fact, I'm not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE on cvParam, the value attribute is no longer mandatory
- I've added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday,
Cheers
Andy
|
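The answers to questions 4 and 5 above (one mass type throughout, and sequenceMass meaning the sequence without mods) can be made concrete with a small calculation. What follows is a sketch, not the schema's definition: the function names are hypothetical, the choice of monoisotopic masses is an assumption (the thread does not say which single mass type was chosen), and including one water for the peptide termini is likewise an assumption.

```python
# Sketch: unmodified sequence mass versus peptide mass including modifications.
# ASSUMPTIONS: monoisotopic masses; one water added for the N- and C-termini.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O

def sequence_mass(sequence: str) -> float:
    """Mass of the bare amino acid sequence, without modifications."""
    return sum(MONO_RESIDUE[aa] for aa in sequence) + WATER

def peptide_mass(sequence: str, mod_deltas: list[float]) -> float:
    """Sequence mass plus the mass deltas of all modifications."""
    return sequence_mass(sequence) + sum(mod_deltas)

bare = sequence_mass("MCTMG")
oxidised = peptide_mass("MCTMG", [15.99491])  # one oxidation (+15.99491 Da)
print(f"sequence mass: {bare:.5f}")
print(f"with one oxidation: {oxidised:.5f}")
```

Keeping the two functions separate mirrors the naming point in question 5: a value that includes the mods describes the peptide, not the sequence.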
From: Jones, A. <And...@li...> - 2008-07-30 08:24:04
|
Hi David,

> There is also still a :
> <SpectrumIdentificationList_ref ref=...

Okay, we should have a quick discussion about this on Thursday. The rule used in FuGE is that the element name is whatever you want, but the attribute that does the reference is called objectBeingReferenced_ref. This way it is simple to write a parser to check that the correct element is referenced. I’m fairly sure we agreed to do this previously in a call (although it’s less important if we have Key/KeyRefs), but I would have a slight preference to follow the rule.

I’ll make the other fixes later today,
Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of David Creasy
Sent: 30 July 2008 05:00
To: psi...@li...
Subject: [Psidev-pi-dev] Changes to schema on 29th July 2008

Hi Andy,

Thanks for the recent changes to the schema. You asked a question:
Added <xsd:attribute name="SearchDatabase_ref" type="xsd:string"/> to DBSequence and <xsd:attribute name="accession" type="xsd:string"/> should either/both of these be set as required?

I think that the answer has to be yes. However, in the MPC_use_case_working_temp.axml file, there is: SearchDatabase_ref="", so maybe Martin could comment?

And under
    <AnalysisCollection>
        <SpectrumIdentification ... >
you made the change from
    <SpectraData_ref id="SD_1"/>
    <SearchDatabase_ref id="SDB_SwissProt"/>
to
    <SpectraData_ref ref="SD_1"/>
    <SearchDatabase SearchDatabase_ref="SDB_SwissProt"/>
which maybe isn't as consistent as you intended - or maybe I have missed the point?
There is also still a :
    <SpectrumIdentificationList_ref ref=...

cvList is fine, except it can only be a list of exactly 1 item at the moment... (I've not fixed the schema).

I didn't notice it last time, but the schema doesn't validate with xerces:
Need to change (in 2 places):
    <xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}"/>
to
    <xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?\-]{1}"/>
(I've not updated the schema in svn - or checked to see if xerces is correct).

Thanks,

David

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
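The FuGE naming rule described above (the referencing attribute is named after the element it points to, plus "_ref") is what makes a generic reference check easy to write. The sketch below shows one way such a check might look, using only the Python standard library. The sample document, the `check_refs` function, and the element names InputSpectra and Broken are all hypothetical and are not taken from the AnalysisXML schema.

```python
# Sketch: verify that every X_ref attribute points at an element named X.
# ASSUMPTION: referenced elements carry an "id" attribute; the sample
# document and its element names are made up for illustration.
import xml.etree.ElementTree as ET

SAMPLE = """
<AnalysisXML>
  <SearchDatabase id="SDB_SwissProt"/>
  <SpectraData id="SD_1"/>
  <SpectrumIdentification id="SI_1">
    <InputSpectra SpectraData_ref="SD_1"/>
    <SearchDatabase SearchDatabase_ref="SDB_SwissProt"/>
    <Broken SpectraData_ref="SDB_SwissProt"/>
  </SpectrumIdentification>
</AnalysisXML>
"""

def check_refs(root: ET.Element) -> list[str]:
    """Return one message per *_ref attribute that violates the naming rule."""
    tag_by_id = {el.get("id"): el.tag for el in root.iter() if el.get("id")}
    problems = []
    for el in root.iter():
        for name, value in el.attrib.items():
            if not name.endswith("_ref"):
                continue
            expected = name[: -len("_ref")]  # element type the name promises
            actual = tag_by_id.get(value)
            if actual is None:
                problems.append(f"{el.tag}/@{name}: no element with id {value!r}")
            elif actual != expected:
                problems.append(
                    f"{el.tag}/@{name}: id {value!r} is a {actual}, expected {expected}"
                )
    return problems

problems = check_refs(ET.fromstring(SAMPLE))
for message in problems:
    print(message)
```

With a generic attribute name such as ref="..." this check would need a hand-maintained table of which element may point where, which is the practical argument for following the rule when Key/KeyRef constraints are not in place.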
From: David C. <dc...@ma...> - 2008-07-30 03:59:55
|
Hi Andy,

Thanks for the recent changes to the schema. You asked a question:
Added <xsd:attribute name="SearchDatabase_ref" type="xsd:string"/> to DBSequence and <xsd:attribute name="accession" type="xsd:string"/> should either/both of these be set as required?

I think that the answer has to be yes. However, in the MPC_use_case_working_temp.axml file, there is: SearchDatabase_ref="", so maybe Martin could comment?

And under
    <AnalysisCollection>
        <SpectrumIdentification ... >
you made the change from
    <SpectraData_ref id="SD_1"/>
    <SearchDatabase_ref id="SDB_SwissProt"/>
to
    <SpectraData_ref ref="SD_1"/>
    <SearchDatabase SearchDatabase_ref="SDB_SwissProt"/>
which maybe isn't as consistent as you intended - or maybe I have missed the point?
There is also still a :
    <SpectrumIdentificationList_ref ref=...

cvList is fine, except it can only be a list of exactly 1 item at the moment... (I've not fixed the schema).

I didn't notice it last time, but the schema doesn't validate with xerces:
Need to change (in 2 places):
    <xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}"/>
to
    <xsd:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?\-]{1}"/>
(I've not updated the schema in svn - or checked to see if xerces is correct).

Thanks,

David

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
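On the pattern change above: the two character classes are meant to accept the same single characters, and the escaped hyphen is simply the more portable spelling. The quick check below uses Python's `re` module as a stand-in. Python's regex dialect is not the XML Schema one, so this only illustrates the intent of the change; it says nothing about whether Xerces is right to reject the unescaped form.

```python
# Sketch: both spellings of the character class accept the same inputs
# in Python's regex dialect (NOT the XML Schema dialect used by Xerces).
import re

UNESCAPED = re.compile(r"[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}")
ESCAPED = re.compile(r"[ABCDEFGHIJKLMNOPQRSTUVWXYZ?\-]{1}")

def accepts(pattern: re.Pattern, text: str) -> bool:
    """XSD patterns are implicitly anchored, so use fullmatch here."""
    return pattern.fullmatch(text) is not None

samples = ["A", "Z", "?", "-", "a", "AB", ""]
same = all(accepts(UNESCAPED, s) == accepts(ESCAPED, s) for s in samples)
accepted = [s for s in samples if accepts(ESCAPED, s)]

print("patterns agree on all samples:", same)
print("accepted:", accepted)
```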
From: David C. <dc...@ma...> - 2008-07-29 21:21:54
|
We've still not settled on how to define an enzyme, but there have been some further comments on the google issue list.
http://code.google.com/p/psi-pi/issues/detail?id=30

The proposals have been fleshed out a little and the current choices are:

#6
#7 - Simon
     Expanded example after comments is #14
#2 - Pierre-Alain
     My description of perl regex at #9
#15 - David

I believe that we can discount #6 as not providing sufficient flexibility, which leaves the 3 different proposals from Simon, Pierre-Alain and me.

Any further questions/comments, please add to the issue in google or reply to the list, because we want to decide which one to go for at the teleconference on Thursday. (If you don't intend to be at the telecon, but want to express an opinion, please let me know by email in advance of the telecon).

I'm still very undecided as to which is best, as all have their merits.

Thanks,
David

David Creasy wrote:
> Hello,
>
> I've added some comments/suggestions for specifying an enzyme to:
>
> http://code.google.com/p/psi-pi/issues/detail?id=30
>
> If anything needs clarification, please add further comments to the
> issue. Otherwise, we'll probably need to have a vote at the next telecon
> on whether to use
> #6, #7, #9 or #10
>
> (or yet another suggestion)
>
> Thanks,
> David
>

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344

dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
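For readers weighing the regex-based option, here is a rough idea of what describing an enzyme by a regular expression could look like in practice. It is a sketch only: the `cleave` helper is hypothetical, and the pattern encodes the commonly quoted trypsin rule (cut after K or R unless the next residue is P) as a zero-width match. It is not a transcription of any of the proposals in the issue.

```python
# Sketch: an enzyme described by a zero-width regex marking cleavage sites.
# ASSUMPTION: the pattern matches the empty string at each cut position;
# this is an illustration, not one of the proposals from issue 30.
import re

TRYPSIN = r"(?<=[KR])(?!P)"  # after K or R, but not before P

def cleave(sequence: str, site_pattern: str) -> list[str]:
    """Split a sequence at every position where site_pattern matches."""
    cuts = [m.start() for m in re.finditer(site_pattern, sequence)]
    # Ignore "cuts" at the very start or end, which would give empty peptides.
    bounds = [0] + [c for c in cuts if 0 < c < len(sequence)] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]

peptides = cleave("MKRPAKG", TRYPSIN)
print(peptides)
```

One attraction of this style is that a single string captures both the cleavage residues and the exceptions; the cost is that every consumer of the file needs a compatible regex engine, which is part of what makes the choice between the proposals non-obvious.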