From: David C. <dc...@ma...> - 2015-05-22 16:07:40
Hi Everyone,

Sorry I couldn't make the call, and good work on moving this forward.

Yes, protein databases from NCBI have multiple title lines for identical sequences. Concerning the idea of just repeating the sequences, some numbers from the current complete nr file to be aware of:

File size              : 40,231,392,221
Number of sequences    : 66,926,000
Number of title lines  : 170,713,645
Average sequence length: 360 residues

So, duplicating the sequences would change the file size by roughly

(170,713,645 - 66,926,000) * 360 = 37,363,552,200

So, nearly doubling the size of the 38Gb file. This will almost certainly make search engines, blast etc. slower and may have an adverse effect on the stats for some tools.

Hope this helps somewhat...

David

On 21/05/2015 19:26, Eric Deutsch wrote:
> Hi everyone, here are my notes from the PEFF call today. Please let me know if I captured anything incorrectly.
>
> Note that we agreed to continue to the second part of the posted agenda, which we did not finish today, next Thursday at the same timeslot.
>
> Present: Eric, Pierre-Alain, Jim, Robert, Xiaojing, Lydie, Harald, Gerben, Yasset, Karl, Juan Antonio
>
> Did I miss anyone?
>
> Agenda:
>
> - Review what remains to be done
>   + We need to finish spec doc, the extant single hand-crafted example (~dozen entries), and the OBO file
>   + neXtProt has been generating PEFF files regularly (not quite compliant with the spec as previously written, but they are now actively fixing the neXtProt exporter, so this discussion is timely)
>   + Writer from Alain Gateau? This is the neXtProt exporter?
>   + Reader from Harald
>   + Settle the open issues below and then update all these elements
>   + Full manuscript?
>   + General tool to work with PEFF files? Add/subtract elements, adjust annotations? Would be great but not required.
>
> - Discuss (CTRL+A) header delimiter
>   + Not new perhaps, this already exists in FASTA!
>   + Or maybe not in the FASTA specification, if there is one, but seen in the wild
>   + Original NCBI nr already has these (CTRL+A)
>   + This PEFF feature was mainly motivated by the fact that this feature already exists out there in the field, in NCBI nr at least
>   + For software implementations, there are several things that could happen when faced with this, which is not good.
>   + Some on the call feel this is a problematic feature that should be avoided
>   + General consensus is that this should be an explicitly disallowed feature. Keep it simple. Repeat the sequence.
>   + Could add a header keyvalue that could point to other redundant entries?
>
> - Discuss degree of sequence variant complexity to support
>   + Explicit examples of usage should be written into the specification. This is not there yet.
>   + Discussion at PSI workshop last month moved strongly toward excluding advanced variations completely
>   + Karl proposed having two different tags, one for simple substitutions, and one for indels and other complex things
>   + Gerben in favor of including indels in separate tags
>   + Pierre-Alain okay with splitting into two tags as Karl suggested
>   + Karl stands by his suggestion. And also suggests deprecating the existing term to avoid confusion
>   + Explicitly require that SAAV changes be in the simple SAAV term, and NOT in the more complex term
>   + Xiaojing asks: what about nonsense mutations?
>   + In the simple SAAV, could allow a change to an asterisk: (223:*) General agreement on this.
>   + (223|225|KPA) goes in the more complex term
>   + There were rumblings that regular expressions should not be supported anywhere? i.e. no (223|[KR]) for simple, no (223|225|[KR]PA) in complex. Eric is not sure if there was a consensus on this.
>
> Ran out of time here. Skipped to last agenda item. Will continue here next week.
>
> - Discuss degree of PTM complexity to support
> - Support for UniMod vs. PSI-MOD
> - Review examples of PEFF
> - Review PEFF-supporting software
> - Central location for all supporting documentation
> - Time slot for future calls
>   + Broad agreement that Thursday at this time is good. We will meet again at this time next week and continue with the agenda.
>
> *From:* Eric Deutsch [mailto:ede...@sy... <mailto:ede...@sy...>]
> *Sent:* Wednesday, May 20, 2015 4:18 PM
> *To:* Pierre-Alain Binz; jim...@th... <mailto:jim...@th...>; Patrick Pedrioli; en...@uw... <mailto:en...@uw...>; cha...@cg... <mailto:cha...@cg...>; xia...@va... <mailto:xia...@va...>; Lydie Lane; Matt Chambers; Har...@bi... <mailto:Har...@bi...>; David Creasy; Eugene Kapp; ju...@eb... <mailto:ju...@eb...>; Ger...@gm... <mailto:Ger...@gm...>; Jones, Andy; Yasset Perez-Riverol
> *Cc:* Mass spectrometry standard development; psi...@li... <mailto:psi...@li...>; Eric Deutsch
> *Subject:* RE: PEFF call Thursday 8am Seattle, 4pm UK time
>
> Hi everyone, just a reminder about the PEFF call tomorrow, see below for details. Dial-in information:
>
> Dial in numbers:
>   + Germany: 08001012079
>   + Switzerland: 0800000860
>   + UK: 08081095644
>   + Generic international: +44 2083222500 (UK number)
>   + US: 877-420-0272
>
> *access code: 297427 #*
>
> *From:* Eric Deutsch [mailto:ede...@sy... <mailto:ede...@sy...>]
> *Sent:* Monday, May 18, 2015 2:36 PM
> *To:* Pierre-Alain Binz; jim...@th... <mailto:jim...@th...>; Patrick Pedrioli; en...@uw... <mailto:en...@uw...>; cha...@cg... <mailto:cha...@cg...>; xia...@va... <mailto:xia...@va...>; Lydie Lane; Matt Chambers; Har...@bi... <mailto:Har...@bi...>; David Creasy; Eugene Kapp; ju...@eb... <mailto:ju...@eb...>; Ger...@gm... <mailto:Ger...@gm...>; Jones, Andy; Yasset Perez-Riverol
> *Cc:* Mass spectrometry standard development; psi...@li...
> <mailto:psi...@li...>; Eric Deutsch
> *Subject:* PEFF call Thursday 8am Seattle, 4pm UK time
>
> Hi everyone, it appears that we have near consensus for Thursday at the earlier timeslot 8am PDT, 4pm UK time, so let’s plan on that.
>
> Agenda:
>
> - Review what remains to be done
> - Discuss (CTRL+A) header delimiter
> - Discuss degree of sequence variant complexity to support
> - Discuss degree of PTM complexity to support
> - Support for UniMod vs. PSI-MOD
> - Review examples of PEFF
> - Review PEFF-supporting software
> - Central location for all supporting documentation
> - Time slot for future calls
>
> We may not get through this all, but let’s tackle as much as we can and continue at the next call. I’m presenting at the top of the next hour, so I’d like to limit this call to 55 mins.
>
> I will send out dial-in information and a reminder as we get closer.
>
> Thanks,
>
> Eric
>
> *From:* Eric Deutsch [mailto:ede...@sy... <mailto:ede...@sy...>]
> *Sent:* Friday, May 15, 2015 10:05 AM
> *To:* Pierre-Alain Binz; jim...@th... <mailto:jim...@th...>; Patrick Pedrioli; en...@uw... <mailto:en...@uw...>; cha...@cg... <mailto:cha...@cg...>; xia...@va... <mailto:xia...@va...>; Lydie Lane; Matt Chambers; Har...@bi... <mailto:Har...@bi...>; David Creasy; Eugene Kapp; Eric Deutsch; ju...@eb... <mailto:ju...@eb...>; Ger...@gm... <mailto:Ger...@gm...>
> *Cc:* Mass spectrometry standard development; psi...@li... <mailto:psi...@li...>
> *Subject:* PEFF progress and call next week and beyond
>
> Hi everyone, we would like to set up some regular calls to carry PEFF over the finish line. We had a good discussion at the PSI meeting, so we just need to keep some momentum to get it finished.
>
> I would like to set up a time next week to have a conference call to discuss some of the open issues. If you are interested in working to help complete and resubmit PEFF, please complete the Doodle poll below (even if you cannot attend at the suggested times next week) and I will keep you in the loop for continuing development on this. I have sent this to the list of coauthors on the specification and other vocal participants at the workshop that I recall. Anyone else is welcome to join the effort even if I forgot to add you to the list above.
>
> http://doodle.com/but4z7ihv75byyp7
>
> Attached are the latest document and the notes I took during the PSI workshop.
>
> Please respond to the doodle poll and I will email some specific agenda points in advance of the meeting.
>
> We should probably paste some more information here:
>
> http://www.psidev.info/peff
>
> Please email me if you have specific agenda items or other questions or comments.
>
> thanks,
>
> Eric

--
David Creasy
Matrix Science
64 Baker Street
London W1U 7GB, UK
Tel: +44 (0)20 7486 1050
Fax: +44 (0)20 7224 1344
dc...@ma...
http://www.matrixscience.com

Matrix Science Ltd. is registered in England and Wales
Company number 3533898
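
For anyone who wants to redo the size estimate from the top of this message with newer nr figures, it can be sketched as below. This is only a rough illustration. It assumes one byte per residue and ignores line breaks and the header bytes themselves, and the constant and function names are made up for this sketch rather than taken from any tool.

```python
# Figures quoted in the message for the complete nr file.
FILE_SIZE_BYTES = 40_231_392_221
NUM_SEQUENCES = 66_926_000
NUM_TITLE_LINES = 170_713_645
AVG_SEQ_LENGTH = 360  # residues; assumed to cost one byte each


def extra_bytes_if_duplicated(num_titles, num_sequences, avg_length):
    """Bytes added if every title line gets its own copy of the sequence.

    Assumption: one byte per residue; newlines and header text are ignored.
    """
    return (num_titles - num_sequences) * avg_length


extra = extra_bytes_if_duplicated(NUM_TITLE_LINES, NUM_SEQUENCES, AVG_SEQ_LENGTH)
growth = extra / FILE_SIZE_BYTES

print(f"extra bytes: {extra:,}")            # extra bytes: 37,363,552,200
print(f"growth vs. current file: {growth:.0%}")  # growth vs. current file: 93%
```

The 93% growth is what "nearly doubling" refers to. A per-entry calculation using real sequence lengths would be more accurate than the average, but would need the file itself.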