From: Matthew C. <mat...@va...> - 2009-06-30 19:21:36
Hi all,

We need terms for Agilent MassHunter sources in the CV. In the MassHunter API there are two ways to uniquely address a spectrum: by "row number" or "scan id". Row number is essentially a 0-based index that refers to the spectra after the acquisition software has done something... perhaps internal merging? Scan id represents the ordinal number of acquisitions as they come off the instrument. So, at least on their (Q)TOF instruments, the rowNumber is very disparate from the scanId, but both of them are unique identifiers that can technically be used to refer to a native spectrum.

The kink is that the MassHunter API only refers to the parent scan by its scan id and doesn't provide a way to directly translate a scan id to a row number - translation must be done indirectly by enumerating all the row numbers and building a mapping of scan id to row number. For this reason I would recommend that the nativeID format be defined as "scanId=xsd:nonNegativeInteger" but I'm open to comment on this!

The source type brings another issue to a head. We actually have more vendor formats that use directories to store their raw data than those that use files.

Directories: Agilent MassHunter (read with MHDAC API), Bruker/Agilent YEP, Bruker BAF, Bruker FID, Bruker U2 (previous 4 formats read with CompassXtract API), Waters MassLynx (read with DACServer API)

Files: ABI WIFF (read with either Analyst or WiffFileDataReader APIs), Thermo RAW (read with XRawfile API)

But we don't clearly define how to deal with the directory-based formats. I'm tempted to recommend that we include and checksum all files within the directories, but it's entirely possible that some people store alternative encodings of the data inside these directories, e.g. mzXML and MGF (I've seen this). So it would be silly to include mzXML and MGF as source files for the native data. There can also be analyses of the data stored there, like Bruker and Agilent's *.m subdirectories, or even pepXML files.
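To make the "include and checksum everything, minus the obvious non-native files" idea concrete, here is a rough Python sketch. It is only an illustration under assumptions: the list of excluded file endings and the rule of skipping anything under a *.m subdirectory are hypothetical choices for this example, not something taken from a vendor API or from the CV, and the function name `source_files` is made up.

```python
import hashlib
from pathlib import Path

# Hypothetical exclusion list: alternative encodings and analysis results
# that may sit inside a vendor data directory but are not native data.
DERIVED_ENDINGS = (".mzxml", ".mgf", ".pepxml", ".pep.xml")


def source_files(source_dir):
    """Yield (relative path, SHA-1 hex digest) for candidate source files.

    Walks the directory recursively, skipping files with a derived-format
    ending and anything stored under a '*.m' subdirectory (assumed here to
    hold methods/analyses rather than raw data).
    """
    root = Path(source_dir)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        # Skip files located inside any '*.m' subdirectory.
        if any(part.lower().endswith(".m") for part in rel.parts[:-1]):
            continue
        # Skip alternative encodings and downstream analysis files.
        if path.name.lower().endswith(DERIVED_ENDINGS):
            continue
        digest = hashlib.sha1(path.read_bytes()).hexdigest()
        yield rel.as_posix(), digest
```

Each pair yielded by `source_files("some_run.d")` would correspond to one sourceFile entry carrying the directory-level source type. A fixed exclusion list like this is fragile, which is part of why a per-format list of files actually read by the API seems preferable.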

Is it reasonable to determine which files in these sources are used by the APIs and put that information in the CV definition for the source types - possibly in a machine-readable way?

Also, if we're not going to (and I wouldn't want to) define a separate source type for each subfile (see attached thread ending on 2009-19-03), we would have to document somewhere that every file that should be included in these directory-based formats should be given the directory-level CV term as its source type.

-Matt

> Matthew Chambers wrote:
>> Yes, I made that change, but I forgot that every sourceFile has to have
>> a type. That does make it ugly. I was trying to make things consistent
>> between Waters and Bruker formats because they both use directories, but
>> perhaps I should have gone the other direction and made the source type
>> for Bruker directories more applicable to the format as a whole - the
>> problem is I'm not knowledgeable about those formats to know what each
>> one corresponds to that is analogous to MassLynx. In any case, I don't
>> think the meaning of the term changed. The important part is that it's
>> the MassLynx format, not whether it's called DAT or RAW.
>>
>> -Matt
>>
>> Fredrik Levander wrote:
>>> Just noticed that the name and definition of MS:1000526 MassLynx raw
>>> format has changed to Waters DAT format. Is this really wanted? I guess
>>> that one would like to have all files in a MassLynx raw folder as source
>>> files, since they will all contain some information that is used in the
>>> mzML file, and then they are all part of the same source file format
>>> (in my opinion). Or otherwise there will be need to add an _FUNCTNS.INF
>>> file format and a header.txt format, etc.
>>> If there is need for separate file formats for these sub-files, I think
>>> those terms (including the DAT one) should have new accession numbers,
>>> since the meaning of the term has changed, or am I interpreting this in
>>> the wrong way?
>>>
>>> Fredrik