From: Jones, A. <And...@li...> - 2008-07-09 16:05:27
|
Hi Marc,

Thanks for this. I've put an updated schema in SVN. I also updated the KeyRefs to use id rather than identifier (although I only did a global replace of identifier --> id, so I haven't checked that this works properly).

Cheers
Andy

> -----Original Message-----
> From: Marc Sturm [mailto:st...@in...]
> Sent: 08 July 2008 13:53
> To: Jones, Andy
> Subject: Re: [Psidev-pi-dev] FW: pre and post
>
> Hi Andy,
>
> this should do:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
>     elementFormDefault="qualified" attributeFormDefault="unqualified">
>   <xs:element name="Dummy">
>     <xs:complexType>
>       <xs:attribute name="pre" use="required">
>         <xs:simpleType>
>           <xs:restriction base="xs:string">
>             <xs:pattern value="[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]{1}"/>
>           </xs:restriction>
>         </xs:simpleType>
>       </xs:attribute>
>     </xs:complexType>
>   </xs:element>
> </xs:schema>
>
> Best,
> Marc
From: Andreas B. <be...@in...> - 2008-07-08 12:52:27
|
> Any XSD experts out there? We need to restrict the value of the attribute
> "pre" to be a single character from the following alphabet:
>
> "ABCDEFGHIJKLMNOPQRSTUVWXYZ-?"
>
> Anyone know how to do this in XML Schema? If not, I'll investigate further
> but it might take a little while for me to get round to it.

Something like

<xs:restriction base="xs:string">
  <xs:pattern value="[A-Z-?]"/>
</xs:restriction>

with some quoting for '-' and '?' should be sufficient.

Best,
A.

--
Div. for Simulation of Biological Systems, WSI, University of Tuebingen
Room C322, Sand 14, 72076 Tuebingen, Germany
phone: +49-7071-29-70461 fax: +49-7071-29-5152
http://www-bs.informatik.uni-tuebingen.de
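The restriction discussed above (exactly one character from A-Z, '-' or '?') can be sanity-checked outside the schema with a small Python sketch. This is only an illustration: Python's `re` syntax is not identical to XML Schema regular expressions, and the names `PRE_PATTERN` and `is_valid_pre` are hypothetical, not part of any schema or tool mentioned in this thread.

```python
import re

# One character from A-Z, '?' or '-'. The hyphen is placed last in the
# character class so it is read literally rather than as a range operator.
PRE_PATTERN = re.compile(r"[A-Z?-]")


def is_valid_pre(value: str) -> bool:
    """Return True if value is exactly one allowed character."""
    # fullmatch anchors both ends, mirroring the way an XSD pattern facet
    # has to match the whole attribute value.
    return PRE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["K", "-", "?", "AB", "a", ""]:
        print(repr(candidate), is_valid_pre(candidate))
```

Placing the hyphen at the end of the class avoids having to escape it, which is also what the `[ABCDEFGHIJKLMNOPQRSTUVWXYZ?-]` form in the reply does.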
From: Jones, A. <And...@li...> - 2008-07-08 12:45:33
|
Hi all,

Any XSD experts out there? We need to restrict the value of the attribute "pre" to be a single character from the following alphabet:

"ABCDEFGHIJKLMNOPQRSTUVWXYZ-?"

Anyone know how to do this in XML Schema? If not, I'll investigate further but it might take a little while for me to get round to it.

Cheers
Andy

> -----Original Message-----
> From: David Creasy [mailto:dc...@ma...]
> Sent: 03 July 2008 17:40
> To: Jones, Andy
> Cc: Angel Pizarro
> Subject: pre and post
>
> Hi Andy,
>
> We discussed this last week. If you can do it, we can cross another one
> off the list...
>
> http://code.google.com/p/psi-pi/issues/detail?id=34#c2
>
> Thanks,
> David
> --
> David Creasy
> Matrix Science
> 64 Baker Street
> London W1U 7GB, UK
> Tel: +44 (0)20 7486 1050
> Fax: +44 (0)20 7224 1344
>
> dc...@ma...
> http://www.matrixscience.com
>
> Matrix Science Ltd. is registered in England and Wales
> Company number 3533898
From: Pierre-Alain B. <pie...@is...> - 2008-07-05 13:05:04
|
For your info, here is the format we use in Phenyx to explicitely annotate a spectrum (the most verbose but selfconsistent mode): <ionicSeries> <oneIonicSeries fragType='a'> <expTheoSpectraMatch len='14'> <expMoz>120.0761:221.1184:-1:-1:-1:664.287:777.3536:905.5523:-1:-1:-1:-1:-1:-1</expMoz> <expIntensity>3.1905:13.0476:-1:-1:-1:3.1905:4.2857:7.7619:-1:-1:-1:-1:-1:-1</expIntensity> <expIntensitySlice>1:4:-1:-1:-1:1:2:3:-1:-1:-1:-1:-1:-1</expIntensitySlice> <theoMass>120.0478:221.0954:336.1224:464.181:593.2236:664.2607:777.3447:905.4033:1020.43:1133.514:1319.594:1447.652:1633.731:1789.833</theoMass> <deltaMass>-0.02834:-0.02296:-9999:-9999:-9999:-0.02634:-0.00888:-0.149:-9999:-9999:-9999:-9999:-9999:-9999</deltaMass> </expTheoSpectraMatch> </oneIonicSeries> <oneIonicSeries fragType='a-NH3'> <expTheoSpectraMatch len='14'> <expMoz>-1:-1:-1:-1:576.2927:647.3032:-1:-1:-1:-1:-1:-1:-1:-1</expMoz> <expIntensity>-1:-1:-1:-1:4.8571:4.2857:-1:-1:-1:-1:-1:-1:-1:-1</expIntensity> <expIntensitySlice>-1:-1:-1:-1:2:2:-1:-1:-1:-1:-1:-1:-1:-1</expIntensitySlice> <theoMass>-1:-1:-1:447.1438:576.1864:647.2235:760.3076:888.3662:1003.393:1116.477:1302.556:1430.615:1616.694:1772.795</theoMass> <deltaMass>-9999:-9999:-9999:-9999:-0.10627:-0.07966:-9999:-9999:-9999:-9999:-9999:-9999:-9999:-9999</deltaMass> </expTheoSpectraMatch> </oneIonicSeries> <oneIonicSeries fragType='b'> <expTheoSpectraMatch len='14'> <expMoz>-1:249.1126:-1:492.2127:621.29:692.3133:805.4106:-1:1048.536:1161.645:1347.596:-1:-1:-1</expMoz> <expIntensity>-1:39.9048:-1:13.6667:34.4762:27.0952:15.5238:-1:4.2857:4.2857:1.6667:-1:-1:-1</expIntensity> <expIntensitySlice>-1:4:-1:4:4:4:4:-1:2:2:0:-1:-1:-1</expIntensitySlice> <theoMass>148.0427:249.0904:364.1173:492.1759:621.2185:692.2556:805.3396:933.3982:1048.425:1161.509:1347.589:1475.647:1661.726:1817.828</theoMass> <deltaMass>-9999:-0.02225:-9999:-0.03683:-0.07154:-0.05773:-0.07097:-9999:-0.11075:-0.13559:-0.00708:-9999:-9999:-9999</deltaMass> </expTheoSpectraMatch> 
</oneIonicSeries> <oneIonicSeries fragType='b-H2O'> <expTheoSpectraMatch len='14'> <expMoz>-1:231.1091:346.1367:474.213:603.2585:674.29:787.4133:915.475:1030.526:1143.635:-1:-1:-1:-1</expMoz> <expIntensity>-1:7.5238:5.7619:11.5714:8.5714:17.2857:15.7619:3.9524:5.7143:8.2381:-1:-1:-1:-1</expIntensity> <expIntensitySlice>-1:3:3:4:3:4:4:1:3:3:-1:-1:-1:-1</expIntensitySlice> <theoMass>-1:231.0798:346.1067:474.1653:603.2079:674.245:787.3291:915.3877:1030.415:1143.499:1329.578:1457.637:1643.716:1799.817</theoMass> <deltaMass>-9999:-0.02931:-0.02997:-0.04769:-0.0506:-0.04499:-0.08423:-0.08735:-0.11141:-0.13615:-9999:-9999:-9999:-9999</deltaMass> </expTheoSpectraMatch> </oneIonicSeries> <oneIonicSeries fragType='b-NH3'> <expTheoSpectraMatch len='14'> <expMoz>-1:-1:-1:475.2086:604.242:675.3962:788.4813:-1:1031.562:1144.459:-1:-1:-1:-1</expMoz> <expIntensity>-1:-1:-1:5.2381:7.7619:61.2857:43.0476:-1:99.0952:5.7619:-1:-1:-1:-1</expIntensity> <expIntensitySlice>-1:-1:-1:2:3:4:4:-1:4:3:-1:-1:-1:-1</expIntensitySlice> <theoMass>-1:-1:-1:475.1388:604.1813:675.2184:788.3025:916.3611:1031.388:1144.472:1330.551:1458.61:1644.689:1800.79</theoMass> <deltaMass>-9999:-9999:-9999:-0.06985:-0.06066:-0.17775:-0.17879:-9999:-0.17447:0.01319:-9999:-9999:-9999:-9999</deltaMass> </expTheoSpectraMatch> </oneIonicSeries> <oneIonicSeries fragType='y'> <expTheoSpectraMatch len='14'> <expMoz>-1:-1:1587.914:1472.891:1344.755:1215.739:1144.677:1031.562:903.5342:788.4813:675.3962:489.3123:361.2253:175.147</expMoz> <expIntensity>-1:-1:5.7619:8.6667:20.4286:48.0952:44.1429:99.0952:39.8095:43.0476:61.2857:53.9524:55.6191:26.2381</expIntensity> <expIntensitySlice>-1:-1:3:3:4:4:4:4:4:4:4:4:4:4</expIntensitySlice> <theoMass>1835.838:1688.803:1587.755:1472.728:1344.67:1215.627:1144.59:1031.506:903.4472:788.4202:675.3362:489.2569:361.1983:175.119</theoMass> 
<deltaMass>-9999:-9999:-0.1593754:-0.1626154:-0.0849954:-0.1118854:-0.0871954:-0.0567554:-0.0870354:-0.0610754:-0.0600354:-0.0554454:-0.0270254:-0.0280354</deltaMass> </expTheoSpectraMatch> </oneIonicSeries> <oneIonicSeries fragType='y-H2O'> <expTheoSpectraMatch len='14'> <expMoz>-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1</expMoz> <expIntensity>-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1</expIntensity> <expIntensitySlice>-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1</expIntensitySlice> <theoMass>1817.828:1670.792:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1:-1</theoMass> <deltaMass>-9999:-9999:-9999:-9999:-9999:-9999:-9999:-9999:-9999:-9999:-9999:-9999:-9999:-9999</deltaMass> </expTheoSpectraMatch> </oneIonicSeries> <oneIonicSeries fragType='y-NH3'> <expTheoSpectraMatch len='14'> <expMoz>-1:-1:-1:1455.827:-1:1198.736:-1:1014.542:-1:771.4312:658.3353:472.3802:344.1792:-1</expMoz> <expIntensity>-1:-1:-1:6.5238:-1:5.2857:-1:6.619:-1:3.0952:10.2857:1.3333:4.1905:-1</expIntensity> <expIntensitySlice>-1:-1:-1:3:-1:2:-1:3:-1:1:3:0:2:-1</expIntensitySlice> <theoMass>1818.801:1671.766:1570.718:1455.691:1327.632:1198.59:1127.553:1014.469:886.41:771.3831:658.299:472.2197:344.1612:158.0818</theoMass> <deltaMass>-9999:-9999:-9999:-0.1364354:-9999:-0.1462054:-9999:-0.0735754:-9999:-0.0480954:-0.0362554:-0.1604654:-0.0180454:-9999</deltaMass> </expTheoSpectraMatch> </oneIonicSeries> <extraFragSerie fragType='immonium ions'> <expMoz>-1:-1:-1:-1:-1:101.0882:-1:-1:-1:-1</expMoz> <expIntensity>-1:-1:-1:-1:-1:3.1905:-1:-1:-1:-1</expIntensity> <expIntensitySlice>-1:-1:-1:-1:-1:1:-1:-1:-1:-1</expIntensitySlice> <theoMass>44.02621:74.03678:86.07316:86.07316:88.01604:101.0477:102.0317:104.0296:129.0902:159.0684</theoMass> <deltaMass>-9999:-9999:-9999:-9999:-9999:-0.04052:-9999:-9999:-9999:-9999</deltaMass> <label>A:T:I:L:D:Q:E:M:R:W</label> <sequencePosition>-1:-1:-1:-1:-1:-1:-1:-1:-1:-1</sequencePosition> </extraFragSerie> <extraFragSerie fragType='precursor'> <expMoz>918.475:-1</expMoz> 
<expIntensity>17.7619:-1</expIntensity>
<expIntensitySlice>4:-1</expIntensitySlice>
<theoMass>918.4227:1835.838</theoMass>
<deltaMass>-0.0523127:-9999</deltaMass>
<label>2+:1+</label>
<sequencePosition>-1:-1</sequencePosition>
</extraFragSerie>
</ionicSeries>

Pierre-Alain

Lennart Martens wrote:
> (ooops, I originally only sent this to David rather than the whole list.
> Corrected here).
>
> Hi David,
>
> > >>> Let the argument (re-)commence!
> > Good to hear from you - maybe ;)
>
> Always glad to help :) .
>
> > Just to clarify, without this, there is no information haemorrhaging.
> > It's just rather hard to reconstruct some of the information... which
> > you may say is the same thing.
>
> Yes, I'd say it's the same thing. If something is difficult, nobody (or
> practically nobody) will expend the effort to do it, and many (like
> myself) will simply sit in a corner and complain about how difficult it
> is to do this. Finally, some of those who do decide to do something will
> do it badly.
>
> > What's the problem with a separate tool to generate the information
> > from the (less huge) analysisXML file and the mzML file?
>
> I can come up with three reasons right now:
> i) that the tool may not exist (anymore);
> ii) that the tool may not be up-to-date (or may be too up-to-date,
> pick and choose which one you like); and
> iii) that the information is no longer atomic (I'd need access to both
> the file and the tool to get all relevant information).
>
> Cheers,
>
> lnnrt.
>
> -------------------------------------------------------------------------
> Sponsored by: SourceForge.net Community Choice Awards: VOTE NOW!
> Studies have shown that voting for your favorite open source project,
> along with a healthy diet, reduces your potential for chronic lameness
> and boredom. Vote Now at http://www.sourceforge.net/community/cca08
> _______________________________________________
> Psidev-pi-dev mailing list
> Psi...@li...
> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
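The colon-delimited arrays in the Phenyx example above can be read with a few lines of Python. The sketch below is not Phenyx code; it assumes, from the example values only, that -1 marks a missing peak, that -9999 marks a missing delta, and that deltaMass is theoMass minus expMoz. The names `parse_series` and `delta_masses` are hypothetical.

```python
from typing import List, Optional

MISSING_PEAK = -1.0      # sentinel seen in expMoz / expIntensity / theoMass
MISSING_DELTA = -9999.0  # sentinel seen in deltaMass


def parse_series(text: str, missing: float) -> List[Optional[float]]:
    """Split a colon-delimited array and map the sentinel value to None."""
    values = [float(token) for token in text.split(":")]
    return [None if value == missing else value for value in values]


def delta_masses(exp_moz: List[Optional[float]],
                 theo_mass: List[Optional[float]]) -> List[Optional[float]]:
    """Recompute theoretical minus experimental m/z where both are present."""
    return [
        None if exp is None or theo is None else theo - exp
        for exp, theo in zip(exp_moz, theo_mass)
    ]


if __name__ == "__main__":
    # First three positions of the 'a' series from the message above.
    exp = parse_series("120.0761:221.1184:-1", MISSING_PEAK)
    theo = parse_series("120.0478:221.0954:336.1224", MISSING_PEAK)
    print(delta_masses(exp, theo))
```

The recomputed deltas agree with the listed deltaMass values only to about four decimal places, presumably because the stored masses were rounded before being written out.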
From: Pierre-Alain B. <pie...@is...> - 2008-07-05 12:19:32
|
Yes, and the answer is: a library is a result. mzML does not annotate spectra, since that is the task of an interpretation analysis. Therefore, a library should be stored in an AnalysisXML format, not mzML...

Pierre-Alain

Angel Pizarro wrote:
> Matt, have the mzML group thought about annotated spectra libs, like
> those provided by NIST, yet?
> -angel
>
> On Thu, Jul 3, 2008 at 9:21 AM, Matt Chambers
> <mat...@va... <mailto:mat...@va...>> wrote:
>
> Hi Lennart,
>
> If you think about storing fragments in as verbose a format as Andy
> Jones's suggestion, for every result in every spectrum (keeping in mind
> that more than the top result/s is/are written out per spectrum), it
> would represent an intolerable (to me) bloat to the file. As I
> understand it, we want to mention the algorithm, its parameters, and its
> version in the CV. We would then recommend to search engine developers
> that when they implement support for analysisXML they provide an online
> script for generating fragments based on the controlled parameters and
> the algorithm version. I do not think this reconstruction of the
> fragment information is "next to impossible" as long as such a script is
> provided.
>
> Alternatively, I think we could come up with a much briefer format to
> store the fragments in, something like:
> <FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>
>
> It's ugly as sin, but we can come up with a controlled pattern to store
> the ion types in. The numbers are mostly redundant: the expected m/z
> values can be recalculated from the ion type and the sequence, and the
> observed m/z values can be looked up in the spectrum according to some
> rules regarding mass/m/z tolerances and whatever data processing was
> applied to the original spectrum by the search engine (which again, is a
> good reason to have search engines write out the results of their
> preprocessing to an mzML file).
>
> -Matt
>
> Lennart Martens wrote:
> > Dear PSI-PI'ers,
> >
> > I recently came across a discussion related to the inclusion of fragment
> > ions (as called by the search engine during identification) in the
> > analysisXML format (see issue 28 on the Google tracker, direct link:
> > http://code.google.com/p/psi-pi/issues/detail?id=28).
> >
> > It somehow seems that popular opinion is against inclusion of this vital
> > piece of information, and that makes me very worried. One of the
> > comments on the issue page in fact is that fragment ion calling is
> > algorithm specific (which is true), and therefore should not be a part
> > of analysisXML.
> > I'd actually like to use this same datum to strongly argue the other
> > way: since the calling is algorithm specific, it is next to impossible
> > to reconstruct the original calling after analysisXML export. So
> > essentially, a vital piece of information (the ability of the spectrum
> > to support the peptide identification as judged by the algorithm) is
> > thrown away during analysisXML conversion or output.
> >
> > I also believe that the difficulty in annotating which fragments are
> > called from the spectrum is definitely not insurmountable. The link with
> > mzML should be there anyway (otherwise you would not even be able to
> > retrieve the spectrum the identification was made from, an unthinkable
> > scenario), so inclusion of this is trivial (as in: already there).
> > Additionally, the unambiguous reference to the exact peak called in the
> > spectrum is also trivial: simply copy in the actual mass - or more
> > likely: m/z - in the analysisXML tag. Ion type should be easy enough to
> > annotate (there are only so many ion types, and these can be modelled in
> > CV), while charge state is a call made by the algorithm anyway, and can
> > therefore also be included easily. So this essentially fully backs up
> > Andy Jones' suggested tag format on the issue 28 page. And Andy has
> > included some other information, such as 'subsequence' and 'theoretical
> > mass' which people are free to discuss the usefulness of (as it probably
> > constitutes redundant information).
> >
> > So my conclusion is: it's relatively easy to do, will capture vital
> > information about the identification and how it was established, and
> > conserves irreplaceable data.
> > So consider any weight I might have to be formally thrown behind
> > including this in version 1.0!
> >
> > Let the argument (re-)commence!
> >
> > Cheers,
> >
> > lnnrt.
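The compact label format Matt proposes in the quoted message ("b2 y2 y6-NH3 y6-NH3(+2)") was never specified as a grammar in this thread, so the following Python sketch makes its own assumptions: one lowercase letter for the ion series, a position number, an optional "-LOSS" neutral loss, an optional "(+N)" charge, and a default charge of 1 when none is given. `FragmentLabel` and `parse_fragment_labels` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional


class FragmentLabel(NamedTuple):
    series: str                  # ion series letter, e.g. 'b' or 'y'
    position: int                # fragment index within the series
    neutral_loss: Optional[str]  # e.g. 'NH3', or None if absent
    charge: int                  # assumed to be 1 when not written out


# Assumed pattern: series letter, index, optional "-LOSS", optional "(+N)".
_LABEL = re.compile(r"([a-z])(\d+)(?:-([A-Za-z0-9]+))?(?:\(\+(\d+)\))?")


def parse_fragment_labels(text: str) -> List[FragmentLabel]:
    """Parse a whitespace-separated list of compact fragment ion labels."""
    labels = []
    for token in text.split():
        match = _LABEL.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognised fragment label: {token!r}")
        series, position, loss, charge = match.groups()
        labels.append(
            FragmentLabel(series, int(position), loss,
                          int(charge) if charge else 1)
        )
    return labels


if __name__ == "__main__":
    for label in parse_fragment_labels("b2 y2 y6-NH3 y6-NH3(+2)"):
        print(label)
```

A parser like this recovers only the labels. As the thread points out, it does not say which observed peak the search engine actually matched.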
From: Lennart M. <len...@eb...> - 2008-07-03 17:28:46
|
(ooops, I originally only sent this to David rather than the whole list. Corrected here).

Hi David,

> >>> Let the argument (re-)commence!
> Good to hear from you - maybe ;)

Always glad to help :) .

> Just to clarify, without this, there is no information haemorrhaging.
> It's just rather hard to reconstruct some of the information... which
> you may say is the same thing.

Yes, I'd say it's the same thing. If something is difficult, nobody (or practically nobody) will expend the effort to do it, and many (like myself) will simply sit in a corner and complain about how difficult it is to do this. Finally, some of those who do decide to do something will do it badly.

> What's the problem with a separate tool to generate the information
> from the (less huge) analysisXML file and the mzML file?

I can come up with three reasons right now:
i) that the tool may not exist (anymore);
ii) that the tool may not be up-to-date (or may be too up-to-date, pick and choose which one you like); and
iii) that the information is no longer atomic (I'd need access to both the file and the tool to get all relevant information).

Cheers,

lnnrt.
From: David C. <dc...@ma...> - 2008-07-03 14:22:01
|
Hi Pierre-Alain,

Pierre-Alain Binz wrote:
> Thanks David.
> a couple of questions, just to make sure:

In England, "couple" means 2. In the US, I believe it can mean any number. So, being English, I'll just answer the first two questions ;) - actually, the rest of the questions are all good too!

> 1) in case of a top-down approach, do we have to duplicate
> sequenceCollection information? As SpectrumIdentificationResult contains
> a PeptideEvidence referring to a Peptide element (and not to a
> DBSequence), is the identification necessarily a Peptide?

I guess so, yes.

> 2) and what about spectral library searches, do we have to have Peptide
> elements with possibly undefined explicit sequences to refer to from the
> SpectrumIdentificationResult (because non peptidic, or because not
> identified but good spectrum)

PeptideEvidence is not a required element. However, we don't have an example instance document for spectral library searches. Would you like to volunteer?

> 3) in the Peptide element, the Modifications are defined in a much more
> detailed manner than in ModificationParams (PSI-MOD is there for
> instance). Does this simply mean that the ModificationParams codes the
> search engine settings and the Peptide includes the formal PSI
> definition of the Mod? And the only reference is the ModName value?

The example document is not yet complete here... and yes, it needs a little more thought. However, we expect to provide a PSI mod definition.

> 4) the mass values (sequenceMass, calculatedMassToCharge,
> experimentalMassToCharge) are not specified as monoisotopic or
> average. Do we assume that average does not exist anymore?

No, average is still allowed. I've added this to http://code.google.com/p/psi-pi/issues/detail?id=13

> 5) is sequenceMass the mass value with/without the mods? If with, the
> name might be misleading (peptideMass would be more appropriate)

Yes, it is without mods. I've added this to http://code.google.com/p/psi-pi/wiki/NotesForFocumentation

> 6) in case the DBSequence is nucleotide, is there a tag for saying this?

Up for discussion at the telecon today I hope.

> (NB: MS on nucleotide molecules can be performed and analysed, not only
> MS on AA sequences that are interpreting nucleotide sequences). Or do we
> neglect MS experiments done on nucleotide molecules (and by the way on
> glycans...) and only represent the DBSequences as AA sequences (frame
> translations)? (and what about glycans?) Probably can be solved if one
> can replace SequenceCollection by something else if needed
> (SmallMoleculeCollection, GlycanCollection, MoleculeCollection)... but
> the validator might not like this.
>
> 7) in case that DBSequence is nucleotide, do we represent the Peptide as
> AA sequence in case of MS done on proteins?

Yes - see the first item in: http://code.google.com/p/psi-pi/wiki/NotesForFocumentation

Thanks very much for the questions.

David

> That's all for the sequence representation so far
>
> Cheers,
> Pierre-Alain
>
> David Creasy wrote:
>> Thanks Andy,
>>
>> I've added an updated example document to SVN:
>> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>>
>> Problem is that we have now removed the main point of these recent
>> changes which was to add the decoy flag... I think that we need to add
>> isDecoy to SpectrumIdentificationItem.
>>
>> And yes, I suspect that we should go back to using the
>> ConceptualMoleculeCollection
>> Um, and since we've not actually ended up adding anything to
>> DBSequence... we haven't actually achieved anything?
>> I think we need to discuss this again at the next telecon.
>>
>> David
>>
>> Jones, Andy wrote:
>>> Hi all,
>>>
>>> I’ve updated the schema in SVN with the following main changes:
>>>
>>> - PeptideEvidence is now part of SpectrumIdentificationItem
>>>   as discussed on the call (simple mappings to proteins are done at
>>>   this level)
>>> - Added DBSequence that should be used instead of Sequence
>>>   (following some of the discussion below)
>>> - Created a new collection class SequenceCollection (rather
>>>   than ConceptualMoleculeCollection) so that only references can be
>>>   given to DBSequence and Peptide
>>>   o In fact, I’m not sure if this is sensible since it prevents other
>>>     types of ConceptualMolecule being added later... to discuss
>>> - In FuGE on cvParam, the value attribute is no longer mandatory
>>>
>>> I’ve added a simple example that validates under
>>> examples\schema_usecase_examples\working27June
>>>
>>> Feel free to mail me any changes to make on Monday,
>>>
>>> Cheers
>>> Andy
>>>
>>> From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
>>> Sent: 27 June 2008 16:24
>>> To: Angel Pizarro
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>
>>> I think Angel’s response below might not have made it round the list yet.
>>>
>>> I tend to agree that isDecoy is redundant information and perhaps
>>> this is not the best place to encode semantic information. An
>>> alternative would be to have a parameter, say on
>>> SpectrumIdentification for cvParam = “decoy_string” value = “Rev”.
>>> This would be a more compact representation and we would not have to
>>> add what is quite a specific attribute type (isDecoy) to Sequence.
>>>
>>> From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
>>> Sent: 27 June 2008 15:59
>>> To: Jones, Andy
>>> Cc: psi...@li...
>>> Subject: Re: [Psidev-pi-dev] FW: Representing Sequences
>>>
>>> my 2¢ :
>>> You need to be able to extend this to all molecule types, or am I
>>> missing the point of this thread, and you mean that this would be a
>>> subclass of the conceptual molecule element?
>>>
>>> Second, and this is tangentially related, but are decoy sequences
>>> really a problem we should be putting our effort into? Is it in our
>>> domain to encode semantic information about a sequence, and possibly
>>> relating reported sequences as part of our schema?
>>> On a personal level I could care less if "isDecoy" is an attribute or
>>> not, but the temptation then would be for folks to encode the same
>>> accession for two different sequences, effectively making the primary
>>> key of the sequence object (accession, isDecoy)
>>>
>>> Do we want to go there?
>>>
>>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy
>>> <And...@li... <mailto:And...@li...>> wrote:
>>>
>>> So how about include length as an attribute and then let all other
>>> things go in the CV (pI, mass, etc.)?
>>>
>>> From: Jones, Andy
>>> Sent: 27 June 2008 14:54
>>> To: 'David Creasy'
>>> Subject: RE: [Psidev-pi-dev] Representing Sequences
>>>
>>> id and name are standard for all elements that inherit from FuGE
>>> identifiable – this is perhaps a separate discussion as to whether
>>> the optional name attribute should be there.
>>>
>>> I agree that length may be useful – is this just an integer value
>>> with no unit?
>>>
>>> Yes, I think so.
>>>
>>> I'm less sure about pI and mass since mass at least can be calculated
>>> very simply
>>>
>>> Only if you have the sequence... (we have residue masses in the file).
>>>
>>> , and pI values (in my opinion) are pretty inaccurate and fairly
>>> meaningless
>>>
>>> Scandalous! (I happen to agree, but now some people will never speak
>>> to either of us ever again).
>>>
>>> The main problem with mass and pI is that these are 'irrelevant' if
>>> the sequence is nucleic acid rather than residues.
>>> Why not just allow CV there? We can share the same CV as the PEFF
>>> format, which includes taxonomy, sequence type, gene ID, and lots of
>>> wonderful other things?
>>>
>>> – unless someone can convince me otherwise?
>>>
>>> Cheers
>>> Andy
>>>
>>> From: David Creasy [mailto:dc...@ma...]
>>> Sent: 27 June 2008 14:51
>>> To: Jones, Andy
>>> Cc: psi...@li... <mailto:psi...@li...>
>>> Subject: Re: [Psidev-pi-dev] Representing Sequences
>>>
>>> Hi Andy,
>>>
>>> length may be useful, because some people won't want to output the
>>> actual sequence for space reasons. The other things we wanted to add
>>> before were pI and mass.
>>> Why do we want name? Is this for, say, a description line?
>>> (Also, identifier -> id?)
>>>
>>> David
>>>
>>> Jones, Andy wrote:
>>>
>>> Hi all,
>>>
>>> It was decided on the call that we would like to flag that Sequences
>>> in the ConceptualMoleculeCollection should have a Boolean attribute
>>> to capture if they are decoy sequences. At the moment we are using
>>> the FuGE:Sequence element. I don't really want to add another
>>> attribute to this (it's less problematic cutting down FuGE than
>>> adding new things), so I'm wondering if we should define our own
>>> Sequence type in AnalysisXML. This would also allow us to choose
>>> exactly the relevant attributes. At the moment, Sequence can have all
>>> of the following:
>>>
>>> <pf:Sequence isCircular="true"
>>>     sequence="String" length="0" isApproximateLength="true"
>>>     SequenceAnnotationSet_ref="String" start="0" end="0"
>>>     identifier="String" name="String">
>>>
>>> Several of these attributes were created to represent concepts that
>>> probably will never be required or implemented in AnalysisXML.
How >>> about the following: >>> >>> >>> >>> <DBSequence identifier = "" name = "" isDecoy = "true"> >>> >>> <seq>MCTMG...</seq> >>> >>> <pf:DatabaseReference Database_ref="" >>> accession="Rev_IPI00013808.1"/> >>> >>> </DBSequence> >>> >>> >>> >>> Are any of the other attributes on Sequence actually required? I'll >>> post a new version of the schema with other changes WRT to >>> PeptideEvidence shortly, >>> >>> Cheers >>> >>> Andy >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. >>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://sourceforge.net/services/buy/index.php >>> >>> >>> >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... <mailto:Psi...@li...> >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev >>> >>> >>> >>> >>> -- >>> David Creasy >>> Matrix Science >>> 64 Baker Street >>> London W1U 7GB, UK >>> Tel: +44 (0)20 7486 1050 >>> Fax: +44 (0)20 7224 1344 >>> >>> dc...@ma... <mailto:dc...@ma...> >>> http://www.matrixscience.com >>> >>> Matrix Science Ltd. is registered in England and Wales >>> Company number 3533898 >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>> >>> >>> >>> >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. >>> It's the best place to buy or sell services for >>> just about anything Open Source. 
>>> http://sourceforge.net/services/buy/index.php >>> >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... <mailto:Psi...@li...> >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev >>> >>> >>> >>> >>> -- >>> David Creasy >>> Matrix Science >>> 64 Baker Street >>> London W1U 7GB, UK >>> Tel: +44 (0)20 7486 1050 >>> Fax: +44 (0)20 7224 1344 >>> >>> dc...@ma... <mailto:dc...@ma...> >>> http://www.matrixscience.com >>> >>> Matrix Science Ltd. is registered in England and Wales >>> Company number 3533898 >>> >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. >>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://sourceforge.net/services/buy/index.php >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... >>> <mailto:Psi...@li...> >>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev >>> >>> >>> >>> >>> -- >>> Angel Pizarro >>> Director, ITMAT Bioinformatics Facility >>> 806 Biological Research Building >>> 421 Curie Blvd. >>> Philadelphia, PA 19104-6160 >>> 215-573-3736 >>> >>> ------------------------------------------------------------------------ >>> >>> ------------------------------------------------------------------------- >>> Check out the new SourceForge.net Marketplace. >>> It's the best place to buy or sell services for >>> just about anything Open Source. >>> http://sourceforge.net/services/buy/index.php >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> Psidev-pi-dev mailing list >>> Psi...@li... 
--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
From: Matthew C. <mat...@va...> - 2008-07-03 14:14:27
|
It's not as simple as just storing a single observed (matched) peak because the search engine might look at multiple peaks in that window. Or it might be working from profile data. I think it's much safer to just list the labels.

What reason is there to know exactly which data points the search engine decided were related to the ion? Unless you're trying to double-check the algorithm's calculations I can't see the point.

-Matt
|
From: David C. <dc...@ma...> - 2008-07-03 13:43:34
|
Hi Lennart,

>>> Let the argument (re-)commence!

Good to hear from you - maybe ;)

Just to clarify, without this, there is no information haemorrhaging. It's just rather hard to reconstruct some of the information... which you may say is the same thing.

What's the problem with a separate tool to generate the information from the (less huge) analysisXML file and the mzML file?

David
|
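One piece of such a separate tool would be recomputing the expected fragment m/z values from the peptide sequence and an ion label. The following is only a minimal Python sketch under stated assumptions: unmodified peptides, monoisotopic residue masses, and b and y series only. The function name `fragment_mz` is hypothetical and is not part of analysisXML, mzML, or any search engine.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mz(sequence, series, position, charge=1):
    """Theoretical m/z of a b or y ion (hypothetical helper).

    Assumption: no modifications and no neutral losses are considered.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position must lie within the peptide")
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    if series == "b":
        # b ions contain the N-terminal residues.
        neutral = sum(RESIDUE_MASS[r] for r in sequence[:position])
    elif series == "y":
        # y ions contain the C-terminal residues plus one water.
        neutral = sum(RESIDUE_MASS[r] for r in sequence[-position:]) + WATER
    else:
        raise ValueError("only b and y series are handled in this sketch")
    return (neutral + charge * PROTON) / charge
```

This covers only the expected values. Which observed peak the search engine actually matched still depends on its own tolerance and selection rules, which is the part under discussion in this thread.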
From: Angel P. <an...@ma...> - 2008-07-03 13:36:30
|
Matt, have the mzML group thought about annotated spectra libs, like those provided by NIST, yet?

-angel
|
From: Lennart M. <len...@eb...> - 2008-07-03 13:34:05
|
Hi Matt,

Just to clarify, I do not care about the actual formatting, I only care about preventing information loss.

In that respect, your less verbose format would be fine, with the one caveat that it doesn't explicitly point to a peak in the spectrum. As a result, one would have to calculate the theoretical m/z for the fragment ion, then apply the fragment ion mass threshold used, and then somehow select a single peak from all candidates in this m/z window in the spectrum, introducing an arbitrary component in the process (e.g., I might simply choose the largest peak in the interval, you might pick the one with the best fitting isotopic envelope, while the actual search engine originally chose the peak with the smallest mass delta -- so we'd all end up with a different opinion once again).

But anyway, the format is definitely open to any suggestions. I simply want to stop the information haemorrhaging that results from the exclusion of these data.

Cheers,

lnnrt.
|
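The ambiguity described above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, peaks are plain (m/z, intensity) tuples, and only two of the selection rules mentioned (largest peak, smallest mass delta) are shown.

```python
def candidates_in_window(peaks, theoretical_mz, tolerance):
    """Return all (m/z, intensity) peaks within +/- tolerance of the theoretical m/z."""
    return [p for p in peaks if abs(p[0] - theoretical_mz) <= tolerance]


def pick_most_intense(candidates):
    """Selection rule A: take the largest peak in the window."""
    return max(candidates, key=lambda p: p[1])


def pick_smallest_delta(candidates, theoretical_mz):
    """Selection rule B: take the peak closest to the theoretical m/z."""
    return min(candidates, key=lambda p: abs(p[0] - theoretical_mz))


# Made-up peak list, for illustration only (not taken from any real spectrum).
peaks = [(500.10, 120.0), (500.28, 45.0), (500.45, 300.0), (501.20, 80.0)]
theoretical_mz = 500.30
window = candidates_in_window(peaks, theoretical_mz, tolerance=0.5)

by_intensity = pick_most_intense(window)
by_delta = pick_smallest_delta(window, theoretical_mz)
```

With these invented values the two rules select different peaks from the same window, so a reader who only knows the ion label and the tolerance cannot tell which peak the search engine used.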
From: David C. <dc...@ma...> - 2008-07-03 13:27:04
|
Julian Selley has made some comments on one of the example instance documents at:
http://code.google.com/p/psi-pi/issues/detail?id=13#c4

David Creasy wrote:
> Hi everyone,
>
> There will be an AnalysisXML working group conference call tomorrow
> (Thursday) at:
> http://www.timeanddate.com/worldclock/fixedtime.html?day=3&month=7&year=2008&hour=16&min=0&sec=0&p1=136
>
> Minutes from the last meeting:
> http://psidev.info/index.php?q=node/351
> (Which seem to have restricted access - can you fix this Phil?)
>
> Latest instance documents for those that agreed to review them at:
> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/MPC_use_case_working27June.axml
>
> Agenda:
> 1. Decoy database search - recent discussions on the list.
> 2. Feedback from review of example instance documents
>
> Dial in details:
> + Germany: 08001012079
> + Switzerland: 0800000860
> + UK: 08081095644
> + USA: 1-866-314-3683
> + Generic international: +44 2083222500 (UK number)
>
> access code: 297427

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
From: Angel P. <an...@ma...> - 2008-07-03 13:21:44
|
I'm fine with inclusion of calls.

-a

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
|
From: Matt C. <mat...@va...> - 2008-07-03 13:21:42
|
Hi Lennart,

If you think about storing fragments in as verbose a format as Andy Jones's suggestion, for every result in every spectrum (keeping in mind that more than the top result/s is/are written out per spectrum), it would represent an intolerable (to me) bloat to the file. As I understand it, we want to mention the algorithm, its parameters, and its version in the CV. We would then recommend to search engine developers that when they implement support for analysisXML they provide an online script for generating fragments based on the controlled parameters and the algorithm version. I do not think this reconstruction of the fragment information is "next to impossible" as long as such a script is provided.

Alternatively, I think we could come up with a much briefer format to store the fragments in, something like:
<FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>

It's ugly as sin, but we can come up with a controlled pattern to store the ion types in. The numbers are mostly redundant: the expected m/z values can be recalculated from the ion type and the sequence, and the observed m/z values can be looked up in the spectrum according to some rules regarding mass/m/z tolerances and whatever data processing was applied to the original spectrum by the search engine (which again, is a good reason to have search engines write out the results of their preprocessing to an mzML file).

-Matt
|
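To show that a "controlled pattern" for the brief label format above would be straightforward to read back, here is a minimal Python sketch. The regular expression is an assumption inferred only from the four example labels (b2, y2, y6-NH3, y6-NH3(+2)), the names `FragmentLabel` and `parse_fragment_labels` are hypothetical, and treating a missing charge suffix as charge 1 is likewise an assumption, not something the proposal specifies.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional


class FragmentLabel(NamedTuple):
    series: str                  # ion series letter, e.g. "b" or "y"
    position: int                # fragment index within the peptide
    neutral_loss: Optional[str]  # e.g. "NH3", or None if no loss is given
    charge: int                  # assumed to be 1 when no "(+n)" suffix is present


# Hypothetical pattern: series letter, index, optional "-LOSS", optional "(+n)".
LABEL_RE = re.compile(r"^([abcxyz])(\d+)(?:-([A-Za-z0-9]+))?(?:\(\+(\d+)\))?$")


def parse_fragment_labels(xml_text: str) -> List[FragmentLabel]:
    """Parse the whitespace-separated labels inside a <FragmentIonMatches> element."""
    element = ET.fromstring(xml_text)
    labels = []
    for token in (element.text or "").split():
        match = LABEL_RE.match(token)
        if match is None:
            raise ValueError("unrecognised fragment label: %r" % token)
        series, position, loss, charge = match.groups()
        labels.append(
            FragmentLabel(series, int(position), loss, int(charge) if charge else 1)
        )
    return labels


example = "<FragmentIonMatches>b2 y2 y6-NH3 y6-NH3(+2)</FragmentIonMatches>"
parsed = parse_fragment_labels(example)
```

A real controlled pattern would need to be agreed in the CV (ion series names, neutral-loss notation, charge notation); the sketch only shows that the label list stays machine-readable while remaining compact.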
From: Lennart M. <len...@eb...> - 2008-07-03 12:57:11
|
Dear PSI-PI'ers,

I recently came across a discussion related to the inclusion of fragment ions (as called by the search engine during identification) in the analysisXML format (see issue 28 on the Google tracker, direct link: http://code.google.com/p/psi-pi/issues/detail?id=28).

It somehow seems that popular opinion is against inclusion of this vital piece of information, and that makes me very worried. One of the comments on the issue page in fact is that fragment ion calling is algorithm specific (which is true), and therefore should not be a part of analysisXML.
I'd actually like to use this same datum to strongly argue the other way: since the calling is algorithm specific, it is next to impossible to reconstruct the original calling after analysisXML export. So essentially, a vital piece of information (the ability of the spectrum to support the peptide identification as judged by the algorithm) is thrown away during analysisXML conversion or output.

I also believe that the difficulty in annotating which fragments are called from the spectrum is definitely not insurmountable. The link with mzML should be there anyway (otherwise you would not even be able to retrieve the spectrum the identification was made from, an unthinkable scenario), so inclusion of this is trivial (as in: already there). Additionally, the unambiguous reference to the exact peak called in the spectrum is also trivial: simply copy in the actual mass - or more likely: m/z - in the analysisXML tag. Ion type should be easy enough to annotate (there are only so many ion types, and these can be modelled in CV), while charge state is a call made by the algorithm anyway, and can therefore also be included easily. So this essentially fully backs up Andy Jones' suggested tag format on the issue 28 page. And Andy has included some other information, such as 'subsequence' and 'theoretical mass' which people are free to discuss the usefulness of (as it probably constitutes redundant information).

So my conclusion is: it's relatively easy to do, will capture vital information about the identification and how it was established, and conserves irreplaceable data.
So consider any weight I might have to be formally thrown behind including this in version 1.0!

Let the argument (re-)commence!

Cheers,

lnnrt.
|
From: Martin E. <mar...@ru...> - 2008-07-03 09:16:43
|
Forget my following comments:

> In Inputs we have SearchDatabase. Then follows DatabaseName.
> We could add DatabaseProperties...
> That would be quite useful to describe the type of decoy DB.

That is discussed in issue 31; I would be happy with Phil's suggestion.

Bye
Martin
|
From: Martin E. <mar...@ru...> - 2008-07-03 09:12:36
|
Hi!

> > I am in the process of trying to put together an example instance
> > document for OMSSA and have a few questions. To make things more
> > complicated I have gone for an example where I run the search on a
> > concatenated forward/reverse database.

Great! We need real use cases / example docs, otherwise our discussions are quite academic.

> > At the moment I have all the
> > results in the analysisXML file i.e. in the ConceptualMoleculeCollection
> > I am listing all proteins and peptides identified including the reverse
> > sequences. I am unsure if (a) I am supposed to be listing all results
> > and (b) if all results are supposed to be listed how I mark the reverse
> > ones as decoy or does it not matter?
> In some ways it doesn't matter, because they are just lists of
> proteins/peptides.

I agree, what you list is your decision, but it would be helpful to report this decision. So e.g. a CVParam stating that it was a reverse search, and an FDR threshold if you list only the forward proteins below this threshold.

> However, you might like to look at Martin's example which contains

Originally I argued for a separate Analysis type "Quality Assurance", but I was convinced that it is not necessary. In our (MPC) use case I decided to list ALL identified proteins, forward and decoy; to mark the decoys, I reported a "decoy pattern" CVParam. I have no FDR threshold, or I could have set it to "1.0". The decoy pattern belongs to the ProteinDetermination, because it has no meaning when doing a SpectrumIdentification.

> > I am also listing all results (forward and reverse) in DataCollection.
> I'd recommend two sets of results:
> <SpectrumIdentificationList id="OMSSA_forward"> ...
> <SpectrumIdentificationList id="OMSSA_reverse">

But you cannot specify two result sets for ONE SpectrumIdentification. So with this suggestion you would need one SpectrumIdentification for the forward and one for the reverse search. I used one and reported a decoy pattern.

> AnalysisXML is (currently) expected to report for just one 'cutoff' -
> i.e. a consumer of the analysisXML document couldn't recalculate the
> value.

Yes, we agreed to have another AnalysisXML for another cut-off. I should put that into the wiki page ;-)

> > N-terminal peptide what would pre be? pre="" or pre="-"?
> We just need to decide and document - maybe at the conference call
> later today.

New issue 34; in SEQUEST it is "-". Oh, I see that David finished this issue just in time, because we decided on that in the TeleCon on 26th June. ;-) It is in the wiki now...

> > Finally the database searched was a custom database, is there anywhere
> > to report how a database was generated?
> Possibly outside the scope of AnalysisXML.

In Inputs we have SearchDatabase. Then follows DatabaseName. We could add DatabaseProperties... That would be quite useful to describe the type of decoy DB.

Bye
Martin
|
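Editorial sketch: the "decoy pattern" approach Martin describes lets a consumer separate forward and decoy hits itself. The Python below assumes the pattern is an accession prefix ("Rev_", as in the Rev_IPI00013808.1 accession used elsewhere in this thread) and uses decoys divided by targets as the FDR estimate; both are assumptions for illustration, not something the thread or the schema prescribes, and the function names are hypothetical.

```python
def is_decoy(accession, decoy_pattern="Rev_"):
    """Treat an accession as a decoy if it starts with the reported pattern."""
    return accession.startswith(decoy_pattern)


def split_hits(accessions, decoy_pattern="Rev_"):
    """Return (targets, decoys), preserving the input order in each list."""
    targets = [a for a in accessions if not is_decoy(a, decoy_pattern)]
    decoys = [a for a in accessions if is_decoy(a, decoy_pattern)]
    return targets, decoys


def estimate_fdr(accessions, decoy_pattern="Rev_"):
    """Simple decoys/targets estimate for a concatenated search (assumed formula)."""
    targets, decoys = split_hits(accessions, decoy_pattern)
    if not targets:
        return 1.0 if decoys else 0.0
    return min(1.0, len(decoys) / len(targets))
```

With a single pattern reported once, the file needs no per-sequence flag, which is the trade-off against an explicit isDecoy attribute.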
From: David C. <dc...@ma...> - 2008-07-02 18:07:39
|
Hi everyone,

There will be an AnalysisXML working group conference call tomorrow (Thursday) at:
http://www.timeanddate.com/worldclock/fixedtime.html?day=3&month=7&year=2008&hour=16&min=0&sec=0&p1=136

Minutes from the last meeting:
http://psidev.info/index.php?q=node/351
(Which seem to have restricted access - can you fix this Phil?)

Latest instance documents for those that agreed to review them at:
http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/MPC_use_case_working27June.axml

Agenda:
1. Decoy database search - recent discussions on the list.
2. Feedback from review of example instance documents

Dial in details:
+ Germany: 08001012079
+ Switzerland: 0800000860
+ UK: 08081095644
+ USA: 1-866-314-3683
+ Generic international: +44 2083222500 (UK number)
access code: 297427

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
|
From: Martin E. <mar...@ru...> - 2008-07-02 14:12:36
|
I added an issue called "minor schema changes we agreed to" and added the three minor changes discussed below... (I hope that all agree. ;-) )

Bye
Martin

> Maybe you could add this to
> http://code.google.com/p/psi-pi/issues/detail?id=27
> Or make a separate issue.
>
> >> Question:
> >> In ProteinDetermination, there is:
> >> <SpectrumIdentificationList_ref identifier="SIL_1"/>
> >> The SpectrumIdentificationList_ref isn't an attribute of
> >> ProteinDetermination because there could be more than one
> >> SpectrumIdentificationList?
> >> Assuming this is the case, then the identifier (SIL_1 in my example) is
> >> the reference? So somewhere else there will be:
> >> <SpectrumIdentificationList identifier="SIL_1" ...
> >>
> >> or have I misunderstood?
> > No, that is completely what was intended.
> > That was, because a protein detection may use the peptide pool of more
> > than one SpectrumIdentification, like:
> >
> > <ProteinDetermination
> >   identifier="ProteinExtractor_analysis"
> >   ProteinDeterminationProtocol_ref="ProteinExtractor_proto"
> >   ProteinDetectionResultList_ref="ProteinExtractor_results">
> >   <SpectrumIdentificationList_ref identifier="SEQUEST_results"/>
> >   <SpectrumIdentificationList_ref identifier="Mascot_results"/>
> > </ProteinDetermination>
> >
> > We could discuss to change it to
> > <SpectrumIdentificationList_ref ref="SIL1"/>
> Yes, I think that would be clearer.
|
From: David C. <dc...@ma...> - 2008-07-02 12:24:19
|
Hi Martin,

Martin Eisenacher wrote:
> Hi,
>
>> (and agree that Sequence_ref should be mandatory)

Probably anything like this should go in the issues list.

> Agree, too. Shall we add that as a comment to remind us?
> At the moment, the psidev.info website seems to be down, so I
> don't know, whether you agreed to use the google doc to collect such infos;
> otherwise I would have put that there...

Actually, Julian suggested we just use a Wiki page, so I started one here:
http://code.google.com/p/psi-pi/wiki/NotesForFocumentation

>
>> 1 minor issue:
>> <ProteinDetermination identifier="xxx"
>> ProteinDetectionResultList_ref="results_from_Mascot" >
>> The name of the _ref attribute should now be: ProteinDetectionList_ref
> Agree, too. Is there a schema-TODO or will you, Andy, change that
> by reading this mail? ;-)

Maybe you could add this to
http://code.google.com/p/psi-pi/issues/detail?id=27
Or make a separate issue.

>
>> Question:
>> In ProteinDetermination, there is:
>> <SpectrumIdentificationList_ref identifier="SIL_1"/>
>> The SpectrumIdentificationList_ref isn't an attribute of
>> ProteinDetermination because there could be more than one
>> SpectrumIdentificationList?
>> Assuming this is the case, then the identifier (SIL_1 in my example) is
>> the reference? So somewhere else there will be:
>> <SpectrumIdentificationList identifier="SIL_1" ...
>>
>> or have I misunderstood?
> No, that is completely what was intended.
> That was, because a protein detection may use the peptide pool of more
> than one SpectrumIdentification, like:
>
> <ProteinDetermination
>   identifier="ProteinExtractor_analysis"
>   ProteinDeterminationProtocol_ref="ProteinExtractor_proto"
>   ProteinDetectionResultList_ref="ProteinExtractor_results">
>   <SpectrumIdentificationList_ref identifier="SEQUEST_results"/>
>   <SpectrumIdentificationList_ref identifier="Mascot_results"/>
> </ProteinDetermination>
>
> We could discuss to change it to
> <SpectrumIdentificationList_ref ref="SIL1"/>

Yes, I think that would be clearer.

>
>> And, (losing my memory again), I've forgotten what SpectraData is
>> supposed to be for.
> You mean, in the Input element? It is a description (or link)
> to the spectra data set, i.e. normally a mzML file
> or a mgf. It is referenced from the SpectrumIdentification element and the
> SpectrumResult elements.

I have added to the wiki

David

>
> Bye
> Martin
>

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB
Tel +44 (0)20 7486 1050
Fax +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
|
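Editorial sketch: the reference mechanism discussed here (one ProteinDetermination pointing at several SpectrumIdentificationList elements) can be resolved by a consumer roughly as follows. The snippet uses the proposed ref attribute on SpectrumIdentificationList_ref; the wrapping AnalysisXML root element, the trimmed-down attributes and the function name `resolve_input_lists` are assumptions made to keep the example small, not the working schema.

```python
import xml.etree.ElementTree as ET

# Reduced, hypothetical instance fragment based on Martin's example.
DOC = """
<AnalysisXML>
  <SpectrumIdentificationList identifier="SEQUEST_results"/>
  <SpectrumIdentificationList identifier="Mascot_results"/>
  <ProteinDetermination identifier="ProteinExtractor_analysis">
    <SpectrumIdentificationList_ref ref="SEQUEST_results"/>
    <SpectrumIdentificationList_ref ref="Mascot_results"/>
  </ProteinDetermination>
</AnalysisXML>
"""


def resolve_input_lists(root):
    """Map each ProteinDetermination identifier to the lists it references."""
    # Index every SpectrumIdentificationList by its identifier.
    known = {el.get("identifier") for el in root.iter("SpectrumIdentificationList")}
    resolved = {}
    for determination in root.iter("ProteinDetermination"):
        refs = [r.get("ref")
                for r in determination.findall("SpectrumIdentificationList_ref")]
        dangling = [r for r in refs if r not in known]
        if dangling:
            raise KeyError(f"unresolved references: {dangling}")
        resolved[determination.get("identifier")] = refs
    return resolved


if __name__ == "__main__":
    print(resolve_input_lists(ET.fromstring(DOC)))
```

Using a distinct attribute name (ref) for the pointer keeps it visually separate from the identifier of the element being defined, which is the clarity David agrees with above.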
From: Martin E. <mar...@ru...> - 2008-07-02 12:00:56
|
Hi!

> > <!-- Then a new element (subclass of PeptideEvidence) under ProteinDetectionHypothesis
> > <ProteinAmbiguityGroup identifier="hit_1" >
> > <ProteinDetectionHypothesis identifier="prot_1" Sequence_ref = "EST_1">
> > <TranslatedPeptideEvidence start="160" end="171"
> > SpectrumIdentificationItem_ref="1_1" post="K" pre="I" frame = "3"
> > TranslationTable_ref="Table_1" >
> >
> > Does this make sense?
> Yes. But it makes it harder for a parser to have to look for TranslatedPeptideEvidence and
> PeptideEvidence?
> Why not just add optional attributes to PeptideEvidence?

With optional attributes it would be possible to encode peptide results that reference a nucleotide sequence without the frame and translation table attributes (the frame can possibly be reconstructed). With mandatory attributes (which can be enforced in the schema if we have TranslatedPeptideEvidence!) this can be avoided. So I vote for Andy's proposal.

Bye
Martin
|
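Editorial sketch: the difference between the two designs can be made concrete with a small consumer-side check. With a separate TranslatedPeptideEvidence element, frame and TranslationTable_ref can be required per element name; with one PeptideEvidence element and optional attributes, nothing could flag their absence. The required-attribute sets below are assumptions for illustration, and `find_incomplete` is a hypothetical helper, not part of any schema tooling.

```python
import xml.etree.ElementTree as ET

# Assumed mandatory attributes per element name (illustrative only).
REQUIRED = {
    "PeptideEvidence": {"start", "end", "SpectrumIdentificationItem_ref"},
    "TranslatedPeptideEvidence": {
        "start", "end", "SpectrumIdentificationItem_ref",
        "frame", "TranslationTable_ref",
    },
}


def find_incomplete(hypothesis_xml):
    """Return (element name, sorted missing attributes) for each bad child."""
    root = ET.fromstring(hypothesis_xml)
    problems = []
    for child in root:
        required = REQUIRED.get(child.tag)
        if required is None:
            continue  # not an evidence element; ignore it
        missing = sorted(required - set(child.attrib))
        if missing:
            problems.append((child.tag, missing))
    return problems
```

The parser cost David raises is one extra dictionary entry here; in exchange, a nucleotide-derived peptide without a frame is rejected rather than silently accepted.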
From: Pierre-Alain B. <pie...@is...> - 2008-07-02 09:20:27
|
Thanks David.

A couple of questions, just to make sure:

1) In the case of a top-down approach, do we have to duplicate the SequenceCollection information? As SpectrumIdentificationResult contains a PeptideEvidence referring to a Peptide element (and not to a DBSequence), is an identification necessarily a Peptide?

2) And what about spectral library searches: do we need Peptide elements, possibly without explicitly defined sequences, to refer to from the SpectrumIdentificationResult (because non-peptidic, or because not identified but a good spectrum)?

3) In the Peptide element, the Modifications are defined in a much more detailed manner than in ModificationParams (PSI-MOD is there, for instance). Does this simply mean that the ModificationParams encodes the search engine settings and the Peptide includes the formal PSI definition of the Mod? And the only reference is the ModName value?

4) None of the mass values (sequenceMass, calculatedMassToCharge, experimentalMassToCharge) are specified as monoisotopic or average. Do we assume that average masses do not exist anymore?

5) Is sequenceMass the mass value with or without the mods? If with, the name might be misleading (peptideMass would be more appropriate).

6) In case the DBSequence is nucleotide, is there a tag for saying this? (NB: MS on nucleotide molecules can be performed and analysed, not only MS on AA sequences that are interpreting nucleotide sequences.) Or do we neglect MS experiments done on nucleotide molecules (and by the way on glycans...) and only represent the DBSequences as AA sequences (frame translations)? (And what about glycans?) This can probably be solved if one can replace SequenceCollection by something else if needed (SmallMoleculeCollection, GlycanCollection, MoleculeCollection)... but the validator might not like this.

7) In case the DBSequence is nucleotide, do we represent the Peptide as an AA sequence in the case of MS done on proteins?
That's all for the sequence representation so far.

Cheers,
Pierre-Alain

David Creasy wrote:
> Thanks Andy,
>
> I've added an updated example document to SVN:
> http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml
>
> Problem is that we have now removed the main point of these recent
> changes which was to add the decoy flag... I think that we need to add
> isDecoy to SpectrumIdentificationItem.
>
> And yes, I suspect that we should go back to using the
> ConceptualMoleculeCollection
> Um, and since we've not actually ended up adding anything to
> DBSequence... we haven't actually achieved anything?
> I think we need to discuss this again at the next telecon.
>
> David
>
> Jones, Andy wrote:
>>
>> Hi all,
>>
>> I’ve updated the schema in SVN with the following main changes:
>>
>> - PeptideEvidence is now part of SpectrumIdentificationItem
>>   as discussed on the call (simple mappings to proteins are done at
>>   this level)
>> - Added DBSequence that should be used instead of Sequence
>>   (following some of the discussion below)
>> - Created a new collection class SequenceCollection (rather
>>   than ConceptualMoleculeCollection) so that only references can be
>>   given to DBSequence and Peptide
>>     o In fact, I’m not sure if this is sensible since it prevents other
>>       types of ConceptualMolecule being added later... to discuss
>> - In FuGE on cvParam, the value attribute is no longer mandatory
>>
>> I’ve added a simple example that validates under
>> examples\schema_usecase_examples\working27June
>>
>> Feel free to mail me any changes to make on Monday,
>>
>> Cheers
>> Andy
>>
>> *From:* psi...@li...
>> [mailto:psi...@li...] *On Behalf Of *Jones, Andy
>> *Sent:* 27 June 2008 16:24
>> *To:* Angel Pizarro
>> *Cc:* psi...@li...
>> *Subject:* Re: [Psidev-pi-dev] FW: Representing Sequences
>>
>> I think Angel’s response below might not have made it round the list yet.
>>
>> I tend to agree that isDecoy is redundant information and perhaps
>> this is not the best place to encode semantic information. An
>> alternative would be to have a parameter, say on
>> SpectrumIdentification for cvParam = “decoy_string” value = “Rev”.
>> This would be a more compact representation and we would not have to
>> add what is quite a specific attribute type (isDecoy) to Sequence.
>>
>> *From:* an...@it... [mailto:an...@it...] *On Behalf Of *Angel Pizarro
>> *Sent:* 27 June 2008 15:59
>> *To:* Jones, Andy
>> *Cc:* psi...@li...
>> *Subject:* Re: [Psidev-pi-dev] FW: Representing Sequences
>>
>> my 2¢ :
>> You need to be able to extend this to all molecule types, or am I
>> missing the point of this thread, and you mean that this would be a
>> subclass of the conceptual molecule element?
>>
>> Second, and this is tangentially related, but are decoy sequences
>> really a problem we should be putting our effort into? Is it in our
>> domain to encode semantic information about a sequence, and possibly
>> relating reported sequences as part of our schema?
>> On a personal level I could care less if "isDecoy" is an attribute or
>> not, but the temptation then would be for folks to encode the same
>> accession for two different sequences, effectively making the primary
>> key of the sequence object (accession, isDecoy)
>>
>> Do we want to go there?
>>
>> On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy
>> <And...@li... <mailto:And...@li...>>
>> wrote:
>>
>> So how about include length as an attribute and then let all other
>> things go in the CV (pI, mass, etc.)?
>>
>> *From:* Jones, Andy
>> *Sent:* 27 June 2008 14:54
>> *To:* 'David Creasy'
>> *Subject:* RE: [Psidev-pi-dev] Representing Sequences
>>
>> id and name are standard for all elements that inherit from FuGE
>> identifiable – this is perhaps a separate discussion as to whether
>> the optional name attribute should be there.
>>
>> I agree that length may be useful – is this just an integer value
>> with no unit?
>>
>> Yes, I think so.
>>
>> I'm less sure about pI and mass since mass at least can be calculated
>> very simply
>>
>> Only if you have the sequence... (we have residue masses in the file).
>>
>> , and pI values (in my opinion) are pretty inaccurate and fairly
>> meaningless
>>
>> Scandalous! (I happen to agree, but now some people will never speak
>> to either of us ever again).
>>
>> The main problem with mass and pI is that these are 'irrelevant' if
>> the sequence is nucleic acid rather than residues.
>> Why not just allow CV there? We can share the same CV as the PEFF
>> format, which includes, taxonomy, sequence type, gene ID, and lots of
>> wonderful other things?
>>
>> – unless someone can convince me otherwise?
>>
>> Cheers
>> Andy
>>
>> *From:* David Creasy [mailto:dc...@ma...]
>> *Sent:* 27 June 2008 14:51
>> *To:* Jones, Andy
>> *Cc:* psi...@li...
>> <mailto:psi...@li...>
>> *Subject:* Re: [Psidev-pi-dev] Representing Sequences
>>
>> Hi Andy,
>>
>> length may be useful, because some people won't want to output the
>> actual sequence for space reasons. The other things we wanted to add
>> before were pI and mass.
>> Why do we want name? Is this for, say, a description line?
>> (Also, identifier -> id?)
>>
>> David
>>
>> Jones, Andy wrote:
>>
>> Hi all,
>>
>> It was decided on the call that we would like to flag that Sequences
>> in the ConceptualMoleculeCollection should have a Boolean attribute
>> to capture if they are decoy sequences. At the moment we are using
>> the FuGE:Sequence element. I don't really want to add another
>> attribute to this (it's less problematic cutting down FuGE than
>> adding new things), so I'm wondering if we should define our own
>> Sequence type in AnalysisXML. This would also allow us to choose
>> exactly the relevant attributes. At the moment, Sequence can have all
>> of the following:
>>
>> <pf:Sequence isCircular="true"
>> sequence="String" length="0" isApproximateLength="true"
>> SequenceAnnotationSet_ref="String" start="0" end="0"
>> identifier="String" name="String">
>>
>> Several of these attributes were created to represent concepts that
>> probably will never be required or implemented in AnalysisXML. How
>> about the following:
>>
>> <DBSequence identifier = "" name = "" isDecoy = "true">
>>   <seq>MCTMG...</seq>
>>   <pf:DatabaseReference Database_ref=""
>>   accession="Rev_IPI00013808.1"/>
>> </DBSequence>
>>
>> Are any of the other attributes on Sequence actually required? I'll
>> post a new version of the schema with other changes WRT to
>> PeptideEvidence shortly,
>>
>> Cheers
>> Andy
>>
>> -------------------------------------------------------------------------
>> Check out the new SourceForge.net Marketplace.
>> It's the best place to buy or sell services for
>> just about anything Open Source.
>> http://sourceforge.net/services/buy/index.php
>> _______________________________________________
>> Psidev-pi-dev mailing list
>> Psi...@li... <mailto:Psi...@li...>
>> https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev
>>
>> --
>> David Creasy
>> Matrix Science
>> 64 Baker Street
>> London W1U 7GB, UK
>> Tel: +44 (0)20 7486 1050
>> Fax: +44 (0)20 7224 1344
>>
>> dc...@ma... <mailto:dc...@ma...>
>> http://www.matrixscience.com
>>
>> Matrix Science Ltd. is registered in England and Wales
>> Company number 3533898
>>
>> --
>> Angel Pizarro
>> Director, ITMAT Bioinformatics Facility
>> 806 Biological Research Building
>> 421 Curie Blvd.
>> Philadelphia, PA 19104-6160
>> 215-573-3736
>
|
From: David C. <dc...@ma...> - 2008-06-29 21:04:23
|
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <meta content="text/html;charset=windows-1252" http-equiv="Content-Type"> </head> <body bgcolor="#ffffff" text="#000000"> Thanks Andy,<br> <br> I've added an updated example document to SVN:<br> <a class="moz-txt-link-freetext" href="http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml">http://code.google.com/p/psi-pi/source/browse/trunk/examples/schema_usecase_examples/working27June/F001350.xml</a><br> <br> Problem is that we have now removed the main point of these recent changes which was to add the decoy flag... I think that we need to add isDecoy to SpectrumIdentificationItem.<br> <br> And yes, I suspect that we should go back to using the ConceptualMoleculeCollection<br> Um, and since we've not actually ended up adding anything to DBSequence... we haven't actually achieved anything?<br> I think we need to discuss this again at the next telecon.<br> <br> David<br> <br> Jones, Andy wrote: <blockquote cite="mid:08D...@EV..." 
type="cite"> <meta http-equiv="Content-Type" content="text/html; "> <meta name="Generator" content="Microsoft Word 12 (filtered medium)"> <!--[if !mso]> <style> v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} </style> <![endif]--> <style> <!-- /* Font Definitions */ @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0;} @font-face {font-family:Wingdings; panose-1:5 0 0 0 0 0 0 0 0 0;} @font-face {font-family:Calibri; panose-1:2 15 5 2 2 2 4 3 2 4;} @font-face {font-family:Tahoma; panose-1:2 11 6 4 3 5 4 4 2 4;} @font-face {font-family:Consolas; panose-1:2 11 6 9 2 2 4 3 2 4;} /* Style Definitions */ p.MsoNormal, li.MsoNormal, div.MsoNormal {margin:0cm; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman","serif";} a:link, span.MsoHyperlink {mso-style-priority:99; color:blue; text-decoration:underline;} a:visited, span.MsoHyperlinkFollowed {mso-style-priority:99; color:purple; text-decoration:underline;} p {mso-style-priority:99; mso-margin-top-alt:auto; margin-right:0cm; mso-margin-bottom-alt:auto; margin-left:0cm; font-size:12.0pt; font-family:"Times New Roman","serif";} pre {mso-style-priority:99; mso-style-link:"HTML Preformatted Char"; margin:0cm; margin-bottom:.0001pt; font-size:10.0pt; font-family:"Courier New";} p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph {mso-style-priority:34; margin-top:0cm; margin-right:0cm; margin-bottom:0cm; margin-left:36.0pt; margin-bottom:.0001pt; font-size:12.0pt; font-family:"Times New Roman","serif";} span.HTMLPreformattedChar {mso-style-name:"HTML Preformatted Char"; mso-style-priority:99; mso-style-link:"HTML Preformatted"; font-family:Consolas;} span.EmailStyle20 {mso-style-type:personal; font-family:"Calibri","sans-serif"; color:#1F497D;} span.EmailStyle21 {mso-style-type:personal-reply; font-family:"Calibri","sans-serif"; color:#1F497D;} .MsoChpDefault {mso-style-type:export-only; 
font-size:10.0pt;} @page Section1 {size:612.0pt 792.0pt; margin:72.0pt 72.0pt 72.0pt 72.0pt;} div.Section1 {page:Section1;} /* List Definitions */ @list l0 {mso-list-id:723259958; mso-list-type:hybrid; mso-list-template-ids:100932132 2015425440 134807555 134807557 134807553 134807555 134807557 134807553 134807555 134807557;} @list l0:level1 {mso-level-start-at:0; mso-level-number-format:bullet; mso-level-text:-; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-18.0pt; font-family:"Calibri","sans-serif"; mso-fareast-font-family:Calibri; mso-bidi-font-family:"Times New Roman";} @list l0:level2 {mso-level-number-format:bullet; mso-level-text:o; mso-level-tab-stop:none; mso-level-number-position:left; text-indent:-18.0pt; font-family:"Courier New";} ol {margin-bottom:0cm;} ul {margin-bottom:0cm;} --> </style><!--[if gte mso 9]><xml> <o:shapedefaults v:ext="edit" spidmax="1026" /> </xml><![endif]--><!--[if gte mso 9]><xml> <o:shapelayout v:ext="edit"> <o:idmap v:ext="edit" data="1" /> </o:shapelayout></xml><![endif]--> <div class="Section1"> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Hi all,<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">I’ve updated the schema in SVN with the following main changes:<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoListParagraph" style="text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><span style="">-<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; 
font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)<o:p></o:p></span></p> <p class="MsoListParagraph" style="text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><span style="">-<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Added DBSequence that should be used instead of Sequence (following some of the discussion below)<o:p></o:p></span></p> <p class="MsoListParagraph" style="text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><span style="">-<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that only references can be given to DBSequence and Peptide<o:p></o:p></span></p> <p class="MsoListParagraph" style="margin-left: 72pt; text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Courier New"; color: rgb(31, 73, 125);"><span style="">o<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; 
font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">In fact, I’m not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss<o:p></o:p></span></p> <p class="MsoListParagraph" style="text-indent: -18pt;"><!--[if !supportLists]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><span style="">-<span style="font-family: "Times New Roman"; font-style: normal; font-variant: normal; font-weight: normal; font-size: 7pt; line-height: normal; font-size-adjust: none; font-stretch: normal;"> </span></span></span><!--[endif]--><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">In FuGE on cvParam, the value attribute is no longer mandatory<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">I’ve added a simple example that validates under examples\schema_usecase_examples\working27June<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Feel free to mail me any changes to make on Monday,<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Cheers<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">Andy<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; 
font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p class="MsoNormal"><b><span style="font-size: 10pt; font-family: "Tahoma","sans-serif";" lang="EN-US">From:</span></b><span style="font-size: 10pt; font-family: "Tahoma","sans-serif";" lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a> [<a class="moz-txt-link-freetext" href="mailto:psi...@li...">mailto:psi...@li...</a>] <b>On Behalf Of </b>Jones, Andy<br> <b>Sent:</b> 27 June 2008 16:24<br> <b>To:</b> Angel Pizarro<br> <b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a><br> <b>Subject:</b> Re: [Psidev-pi-dev] FW: Representing Sequences<o:p></o:p></span></p> </div> </div> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">I think Angel’s response below might not have made it round the list yet.<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);">I tend to agree that isDecoy is redundant information and perhaps this is not the best place to 
encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = “decoy_string” value = “Rev”. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.<o:p></o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <p class="MsoNormal"><span style="font-size: 11pt; font-family: "Calibri","sans-serif"; color: rgb(31, 73, 125);"><o:p> </o:p></span></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p class="MsoNormal"><b><span style="font-size: 10pt; font-family: "Tahoma","sans-serif";" lang="EN-US">From:</span></b><span style="font-size: 10pt; font-family: "Tahoma","sans-serif";" lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:an...@it...">an...@it...</a> [<a class="moz-txt-link-freetext" href="mailto:an...@it...">mailto:an...@it...</a>] <b>On Behalf Of </b>Angel Pizarro<br> <b>Sent:</b> 27 June 2008 15:59<br> <b>To:</b> Jones, Andy<br> <b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:psi...@li...">psi...@li...</a><br> <b>Subject:</b> Re: [Psidev-pi-dev] FW: Representing Sequences<o:p></o:p></span></p> </div> </div> <p class="MsoNormal"><o:p> </o:p></p> <p class="MsoNormal" style="margin-bottom: 12pt;">my 2¢ : <br> You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a suclass 
of the conceptual molecule element? <br> <br> Second, and this is is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema? <br> On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy) <br> <br> Do we want to go there? <o:p></o:p></p> <div> <p class="MsoNormal">On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <<a moz-do-not-send="true" href="mailto:And...@li..." target="_blank">And...@li...</a>> wrote:<o:p></o:p></p> <div> <div> <p><span style="color: rgb(31, 73, 125);">So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)?</span><o:p></o:p></p> <div> <div> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p><b><span style="font-size: 10pt;" lang="EN-US">From:</span></b><span style="font-size: 10pt;" lang="EN-US"> Jones, Andy <br> <b>Sent:</b> 27 June 2008 14:54<br> <b>To:</b> 'David Creasy'<br> <b>Subject:</b> RE: [Psidev-pi-dev] Representing Sequences</span><o:p></o:p></p> 
</div> </div> <p> <o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);">id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.</span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);">I agree that length may be useful – is this just an integer value with no unit? </span><o:p></o:p></p> </div> <p style="margin-bottom: 12pt;">Yes, I think so.<o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <p><span style="color: rgb(31, 73, 125);">I'm less sure about pI and mass since mass at least can be calculated very simply</span><o:p></o:p></p> </div> <p>Only if you have the sequence... (we have residue masses in the file).<o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <p> <o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <p><span style="color: rgb(31, 73, 125);">, and pI values (in my opinion) are pretty inaccurate and fairly meaningless </span><o:p></o:p></p> </div> <p style="margin-bottom: 12pt;">Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).<br> <br> The main problem with mass and pI is that these are 'irrelevant' if the sequence is nuleic acid rather than residues.<br> Why not just allow CV there? 
We can share the same CV as the PEFF format, which includes, taxonomy, sequence type, gene ID, and lots of wonderful other things?<br> <br> <o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <p><span style="color: rgb(31, 73, 125);">– unless someone can convince me otherwise?</span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);">Cheers</span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);">Andy</span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <p><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p> <div style="border-style: none none none solid; border-color: -moz-use-text-color -moz-use-text-color -moz-use-text-color blue; border-width: medium medium medium 1.5pt; padding: 0cm 0cm 0cm 4pt;"> <div> <div style="border-style: solid none none; border-color: -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0cm 0cm;"> <p><b><span style="font-size: 10pt;" lang="EN-US">From:</span></b><span style="font-size: 10pt;" lang="EN-US"> David Creasy [<a moz-do-not-send="true" href="mailto:dc...@ma..." target="_blank">mailto:dc...@ma...</a>] <br> <b>Sent:</b> 27 June 2008 14:51<br> <b>To:</b> Jones, Andy<br> <b>Cc:</b> <a moz-do-not-send="true" href="mailto:psi...@li..." target="_blank">psi...@li...</a><br> <b>Subject:</b> Re: [Psidev-pi-dev] Representing Sequences</span><o:p></o:p></p> </div> </div> <p> <o:p></o:p></p> <p>Hi Andy,<br> <br> length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass. <br> Why do we want name? 
Is this for, say, a description line?<br> (Also, identifier -> id?)<br> <br> David<br> <br> Jones, Andy wrote: <o:p></o:p></p> <p>Hi all,<o:p></o:p></p> <p> <o:p></o:p></p> <p>It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:<o:p></o:p></p> <p> <o:p></o:p></p> <p><span style="background: white none repeat scroll 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> <span style="color: blue;"><</span><span style="color: maroon;">pf:Sequence</span><span style="color: red;"> isCircular</span><span style="color: blue;">="</span>true<span style="color: blue;">"</span><span style="color: red;"> sequence</span><span style="color: blue;">="</span>String<span style="color: blue;">"</span><span style="color: red;"> length</span><span style="color: blue;">="</span>0<span style="color: blue;">"</span><span style="color: red;"> isApproximateLength</span><span style="color: blue;">="</span>true<span style="color: blue;">"</span><span style="color: red;"> SequenceAnnotationSet_ref</span><span style="color: blue;">="</span>String<span style="color: blue;">"</span><span style="color: red;"> start</span><span style="color: blue;">="</span>0<span style="color: blue;">"</span><span style="color: red;"> end</span><span style="color: blue;">="</span>0<span style="color: blue;">"</span><span style="color: red;"> identifier</span><span style="color: blue;">="</span>String<span style="color: blue;">"</span><span style="color: 
red;"> name</span><span style="color: blue;">="</span>String<span style="color: blue;">"></span></span><o:p></o:p></p> <p><span style="color: blue;"> </span><o:p></o:p></p> <p>Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:<o:p></o:p></p> <p> <o:p></o:p></p> <p><DBSequence identifier = "" name = "" isDecoy = "true"><o:p></o:p></p> <p> <seq>MCTMG...</seq><o:p></o:p></p> <p> <span style="background: white none repeat scroll 0%; color: blue; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"><</span><span style="background: white none repeat scroll 0%; color: maroon; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">pf:DatabaseReference</span><span style="background: white none repeat scroll 0%; color: red; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> Database_ref</span><span style="background: white none repeat scroll 0%; color: blue; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">=""</span><span style="background: white none repeat scroll 0%; color: red; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;"> accession</span><span style="background: white none repeat scroll 0%; color: blue; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">="Rev_</span><span style="background: white none repeat scroll 0%; -moz-background-clip: -moz-initial; -moz-background-origin: -moz-initial; -moz-background-inline-policy: -moz-initial;">IPI00013808.1<span style="color: blue;">"/></span></span><o:p></o:p></p> <p><span style="color: 
blue;"></DBSequence></span><o:p></o:p></p> <p><span style="color: blue;"> </span><o:p></o:p></p> <p>Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes WRT to PeptideEvidence shortly,<o:p></o:p></p> <p>Cheers<o:p></o:p></p> <p>Andy<o:p></o:p></p> <p> <o:p></o:p></p> <p> <o:p></o:p></p> <p> <o:p></o:p></p> <p> <o:p></o:p></p> <p> <o:p></o:p></p> <pre> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"> <hr align="center" size="4" width="90%"> </pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="margin-bottom: 12pt; text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre> <o:p></o:p></pre> <pre>-------------------------------------------------------------------------<o:p></o:p></pre> <pre>Check out the new SourceForge.net Marketplace.<o:p></o:p></pre> <pre>It's the best place to buy or sell services for<o:p></o:p></pre> <pre>just about anything Open Source.<o:p></o:p></pre> <pre><a moz-do-not-send="true" href="http://sourceforge.net/services/buy/index.php" target="_blank">http://sourceforge.net/services/buy/index.php</a><o:p></o:p></pre> <pre> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre 
style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"> <hr align="center" size="4" width="90%"> </pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="margin-bottom: 12pt; text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre style="text-align: center;"> <o:p></o:p></pre> <pre> <o:p></o:p></pre> <pre>_______________________________________________<o:p></o:p></pre> <pre>Psidev-pi-dev mailing list<o:p></o:p></pre> <pre><a moz-do-not-send="true" href="mailto:Psi...@li..." target="_blank">Psi...@li...</a><o:p></o:p></pre> <pre><a moz-do-not-send="true" href="https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev" target="_blank">https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev</a><o:p></o:p></pre> <pre> <o:p></o:p></pre> <p style="margin-bottom: 12pt;"> <o:p></o:p></p> <pre>-- <o:p></o:p></pre> <pre>David Creasy<o:p></o:p></pre> <pre>Matrix Science<o:p></o:p></pre> <pre>64 Baker Street<o:p></o:p></pre> <pre>London W1U 7GB, UK<o:p></o:p></pre> <pre>Tel: +44 (0)20 7486 1050<o:p></o:p></pre> <pre>Fax: +44 (0)20 7224 1344<o:p></o:p></pre> <pre> <o:p></o:p></pre> <pre><a moz-do-not-send="true" href="mailto:dc...@ma..." target="_blank">dc...@ma...</a><o:p></o:p></pre> <pre><a moz-do-not-send="true" href="http://www.matrixscience.com" target="_blank">http://www.matrixscience.com</a><o:p></o:p></pre> <pre> <o:p></o:p></pre> <pre>Matrix Science Ltd. 
is registered in England and Wales<o:p></o:p></pre> <pre>Company number 3533898<o:p></o:p></pre> </div> </div> <pre> <o:p></o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"> <hr align="center" size="4" width="90%"> </pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="margin-bottom: 12pt; text-align: center;"><o:p> </o:p></pre> <pre> <o:p></o:p></pre> <pre>-------------------------------------------------------------------------<o:p></o:p></pre> <pre>Check out the new SourceForge.net Marketplace.<o:p></o:p></pre> <pre>It's the best place to buy or sell services for<o:p></o:p></pre> <pre>just about anything Open Source.<o:p></o:p></pre> <pre><a moz-do-not-send="true" href="http://sourceforge.net/services/buy/index.php" target="_blank">http://sourceforge.net/services/buy/index.php</a><o:p></o:p></pre> <pre> <o:p></o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"> <hr align="center" size="4" width="90%"> </pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="text-align: center;"><o:p> </o:p></pre> <pre style="margin-bottom: 12pt; text-align: center;"> <o:p></o:p></pre> <pre style="margin-bottom: 12pt; text-align: center;"> <o:p></o:p></pre> <pre style="margin-bottom: 12pt; text-align: center;"><o:p> </o:p></pre> <pre> <o:p></o:p></pre> <pre>_______________________________________________<o:p></o:p></pre> <pre>Psidev-pi-dev mailing list<o:p></o:p></pre> <pre><a moz-do-not-send="true" href="mailto:Psi...@li..." 
target="_blank">Psi...@li...</a><o:p></o:p></pre> <pre><a moz-do-not-send="true" href="https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev" target="_blank">https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev</a><o:p></o:p></pre> <pre> <o:p></o:p></pre> <p style="margin-bottom: 12pt;"><o:p> </o:p></p> <pre>-- <o:p></o:p></pre> <pre>David Creasy<o:p></o:p></pre> <pre>Matrix Science<o:p></o:p></pre> <pre>64 Baker Street<o:p></o:p></pre> <pre>London W1U 7GB, UK<o:p></o:p></pre> <pre>Tel: +44 (0)20 7486 1050<o:p></o:p></pre> <pre>Fax: +44 (0)20 7224 1344<o:p></o:p></pre> <pre> <o:p></o:p></pre> <pre><a moz-do-not-send="true" href="mailto:dc...@ma..." target="_blank">dc...@ma...</a><o:p></o:p></pre> <pre><a moz-do-not-send="true" href="http://www.matrixscience.com" target="_blank">http://www.matrixscience.com</a><o:p></o:p></pre> <pre> <o:p></o:p></pre> <pre>Matrix Science Ltd. is registered in England and Wales<o:p></o:p></pre> <pre>Company number 3533898<o:p></o:p></pre> </div> </div> </div> </div> </div> <p class="MsoNormal" style="margin-bottom: 12pt;"><br> -------------------------------------------------------------------------<br> Check out the new SourceForge.net Marketplace.<br> It's the best place to buy or sell services for<br> just about anything Open Source.<br> <a moz-do-not-send="true" href="http://sourceforge.net/services/buy/index.php" target="_blank">http://sourceforge.net/services/buy/index.php</a><br> _______________________________________________<br> Psidev-pi-dev mailing list<br> <a moz-do-not-send="true" href="mailto:Psi...@li..." 
target="_blank">Psi...@li...</a><br> <a moz-do-not-send="true" href="https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev" target="_blank">https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev</a><o:p></o:p></p> </div> <p class="MsoNormal"><br> <br clear="all"> <br> -- <br> Angel Pizarro<br> Director, ITMAT Bioinformatics Facility<br> 806 Biological Research Building<br> 421 Curie Blvd.<br> Philadelphia, PA 19104-6160<br> 215-573-3736 <o:p></o:p></p> </div> </div> </div> <pre wrap=""> <hr size="4" width="90%"> ------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. <a class="moz-txt-link-freetext" href="http://sourceforge.net/services/buy/index.php">http://sourceforge.net/services/buy/index.php</a></pre> <pre wrap=""> <hr size="4" width="90%"> _______________________________________________ Psidev-pi-dev mailing list <a class="moz-txt-link-abbreviated" href="mailto:Psi...@li...">Psi...@li...</a> <a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev">https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev</a> </pre> </blockquote> <br> <pre class="moz-signature" cols="72">-- David Creasy Matrix Science 64 Baker Street London W1U 7GB, UK Tel: +44 (0)20 7486 1050 Fax: +44 (0)20 7224 1344 <a class="moz-txt-link-abbreviated" href="mailto:dc...@ma...">dc...@ma...</a> <a class="moz-txt-link-freetext" href="http://www.matrixscience.com">http://www.matrixscience.com</a> Matrix Science Ltd. is registered in England and Wales Company number 3533898</pre> </body> </html> |
From: Jones, A. <And...@li...> - 2008-06-27 15:58:21
|
Hi all,

I’ve updated the schema in SVN with the following main changes:

- PeptideEvidence is now part of SpectrumIdentificationItem as discussed on the call (simple mappings to proteins are done at this level)
- Added DBSequence that should be used instead of Sequence (following some of the discussion below)
- Created a new collection class SequenceCollection (rather than ConceptualMoleculeCollection) so that references can only be given to DBSequence and Peptide
    o In fact, I’m not sure if this is sensible since it prevents other types of ConceptualMolecule being added later... to discuss
- In FuGE on cvParam, the value attribute is no longer mandatory

I’ve added a simple example that validates under examples\schema_usecase_examples\working27June

Feel free to mail me any changes to make on Monday,
Cheers
Andy

From: psi...@li... [mailto:psi...@li...] On Behalf Of Jones, Andy
Sent: 27 June 2008 16:24
To: Angel Pizarro
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

I think Angel’s response below might not have made it round the list yet.

I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification, such as cvParam = “decoy_string” value = “Rev”. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence.

From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro
Sent: 27 June 2008 15:59
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] FW: Representing Sequences

my 2¢ :
You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a subclass of the conceptual molecule element?

Second, and this is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema?
On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy).

Do we want to go there?

On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote:

So how about including length as an attribute and then letting all other things go in the CV (pI, mass, etc.)?

From: Jones, Andy
Sent: 27 June 2008 14:54
To: 'David Creasy'
Subject: RE: [Psidev-pi-dev] Representing Sequences

> id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there.
>
> I agree that length may be useful – is this just an integer value with no unit?

Yes, I think so.

> I'm less sure about pI and mass since mass at least can be calculated very simply

Only if you have the sequence... (we have residue masses in the file).

> , and pI values (in my opinion) are pretty inaccurate and fairly meaningless

Scandalous! (I happen to agree, but now some people will never speak to either of us ever again).

The main problem with mass and pI is that these are 'irrelevant' if the sequence is nucleic acid rather than residues.
Why not just allow CV there? We can share the same CV as the PEFF format, which includes taxonomy, sequence type, gene ID, and lots of wonderful other things?

> – unless someone can convince me otherwise?
> Cheers
> Andy

From: David Creasy [mailto:dc...@ma...]
Sent: 27 June 2008 14:51
To: Jones, Andy
Cc: psi...@li...
Subject: Re: [Psidev-pi-dev] Representing Sequences

Hi Andy,

length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass.
Why do we want name? Is this for, say, a description line?
(Also, identifier -> id?)

David

Jones, Andy wrote:

Hi all,

It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following:

    <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String">

Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. How about the following:

    <DBSequence identifier = "" name = "" isDecoy = "true">
        <seq>MCTMG...</seq>
        <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
    </DBSequence>

Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes WRT PeptideEvidence shortly,
Cheers
Andy

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736 |
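A rough sketch of the reference-only layout described in the 15:58 message above may help when reading the change list: DBSequence and Peptide live once in a SequenceCollection, and a PeptideEvidence inside a SpectrumIdentificationItem only points at a DBSequence. The element names come from the message; every field name (`db_sequence_ref`, `peptide_ref`, `peptide_evidence`), the ids, and the peptide string are assumptions made for illustration and are not taken from the AnalysisXML schema in SVN.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DBSequence:
    id: str
    accession: str
    length: Optional[int] = None  # optional, so the sequence itself may be omitted
    seq: Optional[str] = None


@dataclass
class Peptide:
    id: str
    sequence: str


@dataclass
class PeptideEvidence:
    # Hypothetical field name: refers to a DBSequence by id rather than embedding it.
    db_sequence_ref: str


@dataclass
class SpectrumIdentificationItem:
    id: str
    peptide_ref: str  # hypothetical: id of a Peptide held in the collection
    peptide_evidence: List[PeptideEvidence] = field(default_factory=list)


@dataclass
class SequenceCollection:
    # Only DBSequence and Peptide can be stored, mirroring the restriction
    # that the message flags as "to discuss".
    db_sequences: Dict[str, DBSequence] = field(default_factory=dict)
    peptides: Dict[str, Peptide] = field(default_factory=dict)

    def add_db_sequence(self, db_sequence: DBSequence) -> None:
        self.db_sequences[db_sequence.id] = db_sequence

    def add_peptide(self, peptide: Peptide) -> None:
        self.peptides[peptide.id] = peptide


def protein_accessions(item: SpectrumIdentificationItem,
                       collection: SequenceCollection) -> List[str]:
    """Resolve each PeptideEvidence reference to its DBSequence accession."""
    return [collection.db_sequences[ev.db_sequence_ref].accession
            for ev in item.peptide_evidence]


collection = SequenceCollection()
collection.add_db_sequence(DBSequence(id="dbseq_1", accession="IPI00013808.1"))
collection.add_peptide(Peptide(id="pep_1", sequence="PEPTIDE"))  # illustrative value
item = SpectrumIdentificationItem(
    id="sii_1",
    peptide_ref="pep_1",
    peptide_evidence=[PeptideEvidence(db_sequence_ref="dbseq_1")],
)
print(protein_accessions(item, collection))
```

Because the collection has one typed slot per molecule kind, adding another kind of ConceptualMolecule later would mean changing the collection itself, which is the trade-off raised in the sub-bullet of the message.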
From: Jones, A. <And...@li...> - 2008-06-27 15:23:40
|
I think Angel’s response below might not have made it round the list yet. I tend to agree that isDecoy is redundant information and perhaps this is not the best place to encode semantic information. An alternative would be to have a parameter, say on SpectrumIdentification for cvParam = “decoy_string” value = “Rev”. This would be a more compact representation and we would not have to add what is quite a specific attribute type (isDecoy) to Sequence. From: an...@it... [mailto:an...@it...] On Behalf Of Angel Pizarro Sent: 27 June 2008 15:59 To: Jones, Andy Cc: psi...@li... Subject: Re: [Psidev-pi-dev] FW: Representing Sequences my 2¢ : You need to be able to extend this to all molecule types, or am I missing the point of this thread, and you mean that this would be a suclass of the conceptual molecule element? Second, and this is is tangentially related, but are decoy sequences really a problem we should be putting our effort into? Is it in our domain to encode semantic information about a sequence, and possibly relating reported sequences as part of our schema? On a personal level I could care less if "isDecoy" is an attribute or not, but the temptation then would be for folks to encode the same accession for two different sequences, effectively making the primary key of the sequence object (accession, isDecoy) Do we want to go there? On Fri, Jun 27, 2008 at 10:21 AM, Jones, Andy <And...@li...> wrote: So how about include length as an attribute and then let all other things go in the CV (pI, mass, etc.)? From: Jones, Andy Sent: 27 June 2008 14:54 To: 'David Creasy' Subject: RE: [Psidev-pi-dev] Representing Sequences id and name are standard for all elements that inherit from FuGE identifiable – this is perhaps a separate discussion as to whether the optional name attribute should be there. I agree that length may be useful – is this just an integer value with no unit? Yes, I think so. 
I'm less sure about pI and mass since mass at least can be calculated very simply Only if you have the sequence... (we have residue masses in the file). , and pI values (in my opinion) are pretty inaccurate and fairly meaningless Scandalous! (I happen to agree, but now some people will never speak to either of us ever again). The main problem with mass and pI is that these are 'irrelevant' if the sequence is nuleic acid rather than residues. Why not just allow CV there? We can share the same CV as the PEFF format, which includes, taxonomy, sequence type, gene ID, and lots of wonderful other things? – unless someone can convince me otherwise? Cheers Andy From: David Creasy [mailto:dc...@ma...] Sent: 27 June 2008 14:51 To: Jones, Andy Cc: psi...@li... Subject: Re: [Psidev-pi-dev] Representing Sequences Hi Andy, length may be useful, because some people won't want to output the actual sequence for space reasons. The other things we wanted to add before were pI and mass. Why do we want name? Is this for, say, a description line? (Also, identifier -> id?) David Jones, Andy wrote: Hi all, It was decided on the call that we would like to flag that Sequences in the ConceptualMoleculeCollection should have a Boolean attribute to capture if they are decoy sequences. At the moment we are using the FuGE:Sequence element. I don't really want to add another attribute to this (it's less problematic cutting down FuGE than adding new things), so I'm wondering if we should define our own Sequence type in AnalysisXML. This would also allow us to choose exactly the relevant attributes. At the moment, Sequence can have all of the following: <pf:Sequence isCircular="true" sequence="String" length="0" isApproximateLength="true" SequenceAnnotationSet_ref="String" start="0" end="0" identifier="String" name="String"> Several of these attributes were created to represent concepts that probably will never be required or implemented in AnalysisXML. 
How about the following:

<DBSequence identifier = "" name = "" isDecoy = "true">
  <seq>MCTMG...</seq>
  <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>

Are any of the other attributes on Sequence actually required? I'll post a new version of the schema with other changes with respect to PeptideEvidence shortly,

Cheers
Andy

_______________________________________________
Psidev-pi-dev mailing list
Psi...@li...
https://lists.sourceforge.net/lists/listinfo/psidev-pi-dev

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com
Matrix Science Ltd. is registered in England and Wales
Company number 3533898

--
Angel Pizarro
Director, ITMAT Bioinformatics Facility
806 Biological Research Building
421 Curie Blvd.
Philadelphia, PA 19104-6160
215-573-3736
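
The two encodings discussed in this thread (an explicit isDecoy attribute on each sequence versus a single decoy_string parameter such as "Rev") can be compared with a minimal Python sketch. It is only an illustration under stated assumptions, not the schema's agreed behaviour: the namespace URI bound to the pf: prefix is a placeholder (the thread never gives it), the helper function names are hypothetical, and treating decoy_string as an accession prefix is one reading of the "Rev" example.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI (assumption): the thread only shows the "pf:" prefix.
PF_NS = "urn:example:pf"

# The proposed DBSequence element from the thread, with the pf: prefix declared
# so that the fragment parses on its own.
SAMPLE = f"""
<DBSequence xmlns:pf="{PF_NS}" identifier="" name="" isDecoy="true">
  <seq>MCTMG...</seq>
  <pf:DatabaseReference Database_ref="" accession="Rev_IPI00013808.1"/>
</DBSequence>
"""


def accession_of(db_sequence):
    """Return the accession held by the nested DatabaseReference element."""
    ref = db_sequence.find(f"{{{PF_NS}}}DatabaseReference")
    return ref.get("accession")


def is_decoy_by_attribute(db_sequence):
    """Option 1: read the explicit Boolean attribute on the sequence."""
    return db_sequence.get("isDecoy", "false") == "true"


def is_decoy_by_prefix(db_sequence, decoy_string):
    """Option 2: infer decoy status from one file-level decoy_string value.

    Assumes the decoy_string is a prefix of the accession.
    """
    return accession_of(db_sequence).startswith(decoy_string)


element = ET.fromstring(SAMPLE)
by_attribute = is_decoy_by_attribute(element)
by_prefix = is_decoy_by_prefix(element, "Rev")
```

Under the second option the accession alone identifies a sequence, which sidesteps the (accession, isDecoy) composite-key concern raised above, at the cost of relying on a naming convention in the search database.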