From: Kent L. <Ken...@ge...> - 2006-02-03 22:56:13
The responses to the PSI-MS requirements survey sent out some time ago are aggregated below.

Regards,
Kent Laursen
GenoLogics

PSI-MS Requirements Questionnaire

Requirements

At the most technical level of detail, the deliverables of PSI-MS will include a codified rendering of the information interchange structure in the form of UML models and/or XML Schema that can be used as a basis for a variety of computerized implementations. However, PSI-MS aims to publish its work at three levels of abstraction in order to clearly represent its value to a broader audience, ranging from scientists and decision makers to technologists. These three levels of abstraction are the conceptual, the logical and the physical. This questionnaire asks you to categorize your ideas for requirements according to these levels. When collated, your responses will be a valued input in describing mzData in both specification and codified form. Some characteristics of the three levels of abstraction are listed below:

Conceptual: visionary, science driven, computationally independent, expressing an aim, justifying and/or explaining the basis for the effort

Logical: expressed clearly in human-understandable terms, logically complete without implementation dependence, providing a satisfactory rendering of what the solution is, delineating the contract(s) that the solution must fulfill

Physical: having a direct relationship to a codified or machine-processable form, describing detail sufficient for implementation

We use the notion of requirements simply to state a set of aims, qualities, rules, constraints, shapes and functionality to which the published work of PSI-MS must adhere. In addition, PSI-MS will provide a response or mapping describing how its publications address the requirements and concerns represented by MCP and MIAPE.

Questions

1. What is your aim in participating in the PSI-MS working group?

* Get an in-depth understanding of, and early information about, how the proposed standards are to be used.
* Contribute to the standards by sharing our expertise in proteomics experimentation and data analysis.
* Contribute our experience in using the proposed standards in software we have developed.
* We have adopted mzData as our internal mass spec data format and have developed tools to convert other MS data formats (e.g. mzXML) to it. I participate primarily as an interested consumer of the standards developed by the working group.
* We have interests in MS data that go beyond proteomics. I would like to add a voice to the discussions that might allow the standard to encompass a broader discipline, e.g. glycoproteomics.
* The creation of data interchange formats and utilities that enhance access to raw data for the development and use of informatics tools. My research depends on high-throughput, automated mass spectrometry, and easy access to a full representation of the data from instruments is critical.
* To participate in establishing guidelines, defining and evaluating object models and derived schemas, and elaborating terminological resources in the context of existing initiatives, including PSI.
* I am responsible for the standardization of the proteomics platform in our research center. Therefore, I am interested in the advancement of PSI-MS work concerning mzData and analysisXML.

2. What would you like to see PSI-MS accomplish, or, stated another way, what should the goals of PSI-MS be in your view?

* To create data transport standards: for publication repositories, and between software and database applications.
* Create guidelines on what data should be retrieved and stored for easy access (call it an object model if you will) in proteomics experimentation.
* Make vendors understand that proprietary data formats are not a key to success.
* Define a standard that can capture MS data acquired by most models of MS instruments. Define a standard that can capture the semantics of the MS data within any of numerous discipline contexts. The first standard provides for the raw data, devoid of contextual semantics; the second adds the contextual semantics.
* The goal should be to create a community standard that addresses the needs of as many stakeholders as possible. To me this means recognizing several key facts: a) instrument vendors own their data formats, and an interchange format is not the same as an operational or native format; b) there is no single, correct proteomics or mass spectrometry ‘workflow’, and as many of these uses as possible need to be considered for an interchange specification to become a standard.
* From the data producer’s point of view: to ensure common practices by sharing experimental data and results through publications (i.e. supplementary materials) and repositories, and finally to be able to understand proteomics experiments unambiguously and reproduce them.
* From the software engineer’s point of view: to deliver a standard format (and associated handling tools) that encompasses the following types of data: raw, processed and identification. For it to be used daily, the resulting standards should be open and well documented, and moreover widely accepted and supported (especially by vendors and manufacturers).
* In order to make people use PSI-MS standards, PSI-MS should not only define them but also provide users with free, open source software tools such as MS spectra viewers, MS analysis viewers, parsers, converters, …

3. If you have any general suggestions as to how to accomplish these goals, please state them here.

* What I think is lacking in PSI-MS is communication of progress and changes in the development. The PSI-dev web pages should be a natural place to look for the current status of PSI-MS development. As it is now, I get the feeling that only a small group knows where PSI-MS is going. Since the bi-weekly teleconferences do not scale to a large audience, information must be spread in some other way. I think this is a key issue if we want the community to take our work seriously and, in the long run, adopt the standards we create.
* Incremental revisions of standards, i.e. do not try to solve everything in one shot. Also, be more pragmatic: test with real data during specification development. As it is now, we have problems knowing what should be changed in the proposed standards.
* Get funding for key people so they can focus on PSI-MS.
* Concentrate first on the raw data standard. You should be able to achieve widespread agreement on it rather quickly. The bun-fight starts when the contextual semantics standard is attempted, because each research group will have its own context and will want it represented. If the contextual semantics standard can be made abstract enough, individual contextual subclasses will be straightforward to develop and most groups will be happy.
* Define the process by which decisions are made. Formalize the structure of the organization and give everyone a chance to contribute. Provide better public interaction, communication and documentation.
* It seems fruitful to allow wide participation and to create a transparent process for establishing standards. The main issue remains the language barrier, which can lead to misunderstandings and sometimes makes it difficult to express semantic concepts explicitly. For these reasons, frequent exchanges via the mailing list and conference call reports should ensure a good level of communication between participants. Document revision tracking would also benefit from making revisions available frequently. Two meetings per year seem to be enough.

4. What conceptual requirements should be fulfilled by PSI-MS/mzData?

* I think the most important goal for PSI-MS specifications is that they should support and enable the sharing of data.
* It should be possible to answer the question “I’ve done an experiment with 67 samples, MS and MSMS. I need to send these to a publication repository; how do I use the PSI-MS stuff?”
* The PSI formats should allow experimental quality to be evaluated.
* They should allow scientists to share information across instrument types, time and space.
* The PSI formats should allow the accumulation of community wisdom, represented by the ability to reanalyze data indefinitely.

5. What logical requirements should be fulfilled by PSI-MS/mzData?

* The standards should be implementable, i.e. don’t overdo them.
* Examples of use of the standard should be created, i.e. real data in files following the specifications created by PSI-MS, and use cases describing real-world scenarios.
* The aim of any standard is to promote interchange. It is also to promote computability. By creating, promoting and enforcing a standard, we will enable researchers to share data, build on previous findings, and construct useful databases and archives.
* Something akin to the W3C (www.w3.org).
* It should not break when new instruments or experiments are developed.
* mzData should perform the function of being a full representation of data from mass spectrometry-based experiments.
* The formats should be able to be generated directly by the instrument data systems, and used by people without programming expertise to submit data to repositories or to interchange results between groups.
* The solution should be the single reference format, produced by raw data processing software as well as by identification (search) engines.
* The solution must be pragmatic. For instance, XSL transformation from one format to another is time- and space-consuming, and therefore unrealistic in a “true” proteomics pipeline. Another example is the amount of data produced daily by a proteomics center (in our case around 30 GB/day of raw data): a human-readable format seems inappropriate for such volumes.

6. What physical requirements should be fulfilled by PSI-MS/mzData?

* Use LSIDs to reference data residing at known data providers; e.g. when a Swissprot entry is referenced, do something like urn:lsid:expasy.ch:swissprot:p12345:41, and this should be possible to validate with a parser. This would also remove the need to store information such as sequences, since these would be automatically retrievable. Bottom line: embrace LSID.
* Development and maintenance of a schema (e.g. .xsd) or collection thereof.
* Development of tools that can be used to move data from an earlier version of the schema to a later version.
* Development of documentation that helps people build datasets that conform to the schema.
* Maintenance of a repository of open source tools, developed by PSI or third parties, that operate on datasets conforming to the schema.
* It should be compact enough to allow reasonable computer systems to move and operate on files in the format.
* The format should be open and self-describing.
* Files in the format should be verifiable, support concepts like digital security, and allow for the management of data integrity.
* To answer properly, we wonder whether one (and only one) implementation (such as XML) is expected, or several (XML, SQL, an object-oriented language) derived from a single model.
* As said in 2., for mzData PSI-MS should do work similar to that done by Sashimi for the mzXML standard, cf. http://sashimi.sourceforge.net/software_glossolalia.html.
* That is, provide free converters from native binary formats (RAW, wiff, …) into mzData, and parsers to get spectrum objects from an mzData file in Perl, Java, …
* Same for analysisXML: provide converters from common formats (Sequest, Mascot, Tandem, …) into analysisXML.
* It should also provide a free portable viewer (cf. msInspect http://proteomics.fhcrc.org/CPAS/Wiki/home/help/page.view?name=viewer) to display MS spectra. InSilicos does the job but is available on Windows only.

The PSI-MS working group thanks you for your participation in this survey. Your entries will be examined, collated and included in the shaping of the published specifications of mzData and analysisXML.
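The LSID suggestion under question 6 says that a reference such as urn:lsid:expasy.ch:swissprot:p12345:41 should be checkable by a parser. A minimal sketch of such a syntax check in Python follows, assuming the general LSID layout urn:lsid:<authority>:<namespace>:<object>[:<revision>]. The names parse_lsid and Lsid, and the field names, are hypothetical, and the check is purely syntactic: it does not resolve the identifier against the data provider.

```python
import re
from typing import NamedTuple, Optional

# Assumed LSID layout: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
# The "urn:lsid" prefix is matched case-insensitively; each component must be
# non-empty and must not contain a colon. This is a syntax check only.
_LSID_RE = re.compile(
    r"urn:lsid:([^:]+):([^:]+):([^:]+)(?::([^:]+))?",
    re.IGNORECASE,
)


class Lsid(NamedTuple):
    """Components of a parsed LSID (hypothetical field names)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> Optional[Lsid]:
    """Return the LSID components, or None if `text` is not LSID-shaped."""
    match = _LSID_RE.fullmatch(text.strip())
    if match is None:
        return None
    authority, namespace, object_id, revision = match.groups()
    return Lsid(authority, namespace, object_id, revision)


if __name__ == "__main__":
    for candidate in (
        "urn:lsid:expasy.ch:swissprot:p12345:41",  # example from the survey
        "urn:lsid:expasy.ch:swissprot:p12345",     # revision omitted
        "urn:lsid:expasy.ch:swissprot",            # object id missing
    ):
        print(candidate, "->", parse_lsid(candidate))
```

Actually retrieving the referenced record (and so avoiding storing sequences inline, as the respondent proposes) would additionally require an LSID resolution service, which this sketch does not attempt.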