From: Thaman C. <th...@ut...> - 2011-09-03 20:33:14
Hi Experts,

I am somewhat lost at the start of my first project in the Proteomics department.

My main task is designing a database for the various file formats used in proteomics research (RAW, WIFF, MGF, pepXML, DTA). So far the main formats are RAW, WIFF, MGF and DTA, which are not XML-based, and pepXML, which is.

There is obviously overlap in the information held in the different file formats and experiments. Analysis can be done on a single file (one set of experimental measurements) or by comparing sets of files. What is missing is a way to understand recurring patterns within and between experiments. A database is needed to query experiments (files) with varied parameters, because file-based searching is not efficient enough to answer our questions.

What kind of search term do we use across experiments? The mono-isotopic mass. The main question in our research is: "Have we seen this mono-isotopic mass before in other experiments?"

I have started with the RAW -> mzML conversion. I went through the mzML documentation available from HUPO and am trying to design a relational schema from the mzML schema by hand. Although mzML is quite well documented, I have to confess that some things are still not clear to me.

Questions
--------------
1) Where is information like the "mono-isotopic mass" recorded in the mzML file? Is it the precursor charge attribute in the spectrum element, or am I missing something?
2) Can I be sure that every RAW file, and the mzML produced from it, contains the "mono-isotopic mass" information regardless of vendor?
3) Do the other files mentioned (WIFF, pepXML, DTA, MGF) also contain "mono-isotopic mass" information?

Please guide me!

Regards,
RawProt
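
To make the query described above concrete ("have we seen this mono-isotopic mass before in other experiments?"), here is a minimal sketch of how such a lookup might work once the masses are in a relational table. It uses Python's built-in sqlite3 module. The table name `precursor`, its columns, the helper function names and the 10 ppm default tolerance are all hypothetical choices made for illustration; none of them come from mzML or from any existing schema.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a small store of observed mono-isotopic masses.

    The table layout is a hypothetical example, not derived from mzML.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS precursor (
               id          INTEGER PRIMARY KEY,
               experiment  TEXT NOT NULL,
               source_file TEXT NOT NULL,
               mono_mass   REAL NOT NULL
           )"""
    )
    # An index on the mass column keeps range lookups fast as the table grows.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_precursor_mass ON precursor (mono_mass)"
    )
    return conn


def add_precursor(conn, experiment, source_file, mono_mass):
    """Record one observed mono-isotopic mass for a given experiment and file."""
    conn.execute(
        "INSERT INTO precursor (experiment, source_file, mono_mass) VALUES (?, ?, ?)",
        (experiment, source_file, mono_mass),
    )


def seen_before(conn, mono_mass, tol_ppm=10.0):
    """Return (experiment, source_file, mono_mass) rows within +/- tol_ppm."""
    delta = mono_mass * tol_ppm / 1e6  # ppm tolerance converted to absolute mass
    cursor = conn.execute(
        "SELECT experiment, source_file, mono_mass FROM precursor "
        "WHERE mono_mass BETWEEN ? AND ? ORDER BY mono_mass",
        (mono_mass - delta, mono_mass + delta),
    )
    return cursor.fetchall()
```

For example, after calling `add_precursor(conn, "exp1", "a.RAW", 1000.0)`, a call to `seen_before(conn, 1000.0)` would return that row, while a mass well outside the tolerance window would return an empty list. Matching within a tolerance rather than on exact equality seems necessary here, since measured masses will rarely agree to the last digit between experiments.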