
Number format in process summary

2022-04-20
2022-08-04
  • Henriette K

    Henriette K - 2022-04-20

    Hi,
    what is the format of the Peak IDs in the process summary xml? For some ions I get 7 digits, for others 8 (there can be 3 or 4 digits after the decimal). I checked my ref file with the peak IDs I pass on to SpecProc, and there are far more digits (as I compute them). I came across this question when I tried to match the SpecProc results against my ion list in a later step (comparing strings). I tried rounding or truncating my peak list (derived from my ref file), but I never get all ions.

     
  • clochardM33

    clochardM33 - 2022-04-20

    Not sure. By any chance, are the ones with 4 digits after the decimal place below mass 1000?
    Please attach an example of what you are looking at, as it could be any number of things.

    From memory, all numbers are carried through at full precision, and the only rounding ever happens in the styling of the xml files.
    For every xml file there is a corresponding text file, which is probably easier to read data from, as it drops straight into Excel.

     
    • Henriette K

      Henriette K - 2022-05-12

      Hi,
      This makes sense. I checked the txt files and there the number of decimal places is consistent, so it really is only the xml formatting. Here is an extract of my masses from the xml sheet (in the txt file they all have 4 decimal places):
      <mass>1098.525</mass>
      <mass>530.788</mass>
      <mass>539.3013</mass>

      Per sample batch I have around 100 samples or more. My workflow so far was to open the processing summary (xml) in Internet Explorer, copy it into Excel and save it as csv. From there it is straightforward to import in python (samples in rows, m/z in columns). This is probably way more complicated than it needs to be, but my coding skills are very basic. When I read the txt files directly, the ; masks the header, which I could obviously add back manually afterwards.
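
      For what it's worth, the xml itself can also be read directly in python rather than via the IE/Excel detour. This is only a sketch: the wrapper element name ("summary") is made up for illustration, and only the `<mass>` tags come from the extract above - the real process summary will have more structure around them.

      ```python
      # Minimal sketch: pull <mass> values out of a process-summary xml.
      # The "summary" wrapper is a placeholder; only the <mass> tags
      # come from the extract quoted above.
      import xml.etree.ElementTree as ET

      xml_text = """<summary>
        <mass>1098.525</mass>
        <mass>530.788</mass>
        <mass>539.3013</mass>
      </summary>"""

      root = ET.fromstring(xml_text)
      # iter() finds every <mass> element regardless of nesting depth
      masses = [float(m.text) for m in root.iter("mass")]
      print(masses)  # [1098.525, 530.788, 539.3013]
      ```

      For a real file you would use `ET.parse(path).getroot()` instead of `fromstring`.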

       
  • clochardM33

    clochardM33 - 2022-05-13

    Hi,
    My favourite explanation is that the 3 dp numbers simply end in a zero, which the xml styling drops (e.g. 530.7880 becomes 530.788).

    I don't know what exactly you are doing with the numbers, but 100 samples per batch is very impressive. It's good to see someone using it.

    The "2 types of output" exist for exactly what it looks like you are doing, except you have picked the wrong one!
    If any of this had been written down it might have helped, but sadly that never happened. So here is a belated attempt at an explanation.
    SpecProc was originally written as a mechanism to get peak lists out of multiple MLynx files. Anyone can process one file, but there isn't a general-purpose app in MLynx to get lots of data out of lots of files in one step. SpecProc was an attempt to fill that gap.
    As long as you acquire in a reproducible way, you can process in a reproducible way. If you make ad-hoc acquisitions then who knows the structure of your data, and all bets are off. But if you do the same thing many times, and you process the data the same way each time, you end up with one big list where the only difference between each (notional) entry is the data itself (not the formatting, or the column order, or mixing MS and MSMS mid-run, or all the other things that make it ad-hoc).
    Which is where I wanted to get to. In my case I wanted to be able to drop the "output" into Excel, so there was no sort/filter/manipulate code to write; just use Excel.

    So step 1. Text files that contain the output.
    Snag 1. Lots of Mass Spec text contains commas (such as chem names), which makes csv unreliable, as csv is split by comma.
    Workaround 1. Use tab as the delimiter rather than comma (it is unlikely that a tab character appears in any MLynx text). I should have saved the files as .tsv (tab-separated values), but they came out as .txt.
    Note 1. If you programmatically read in these files, use tab as the split character. Then you get the data into an array.
    Note 2. Rows that do not contain "data" (meaning results of the processing) start with a semicolon. So in the file-reader code you can check whether the first character of the first entry in the array is ";"; if it is, skip that line.
    Additionally the two lines before "data" are always

    ;Status
    ;Data start

    So if you see "Data start" after the ";", then the row just before tells you what each of the columns is.
    Note 3. Column headers are dynamic and a function of what kind of processing you did. By reading the header-info row and finding the column that contains the text string you are interested in, you guarantee you are reading the column you want. It is not safe to presume that columns are always in the same place (although almost all of them report the same first few things).
    Note 4. The data rows contain a complete description of the processing and the file. This means that when you put it in Excel you can sort on the first column (which puts all the text and headers in one group), delete that group (so you don't have text everywhere), and then re-sort on one of the columns (filename, maybe) so the rows are back in the original order. Then if you want to, for example, plot a graph of two columns, you don't get text in the way ruining the graph.
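
    Notes 1-3 as a minimal python sketch. The sample content below is invented for illustration (real files will have different headers and more columns), and the exact position of the header row relative to ";Status"/";Data start" is my best guess from the description above:

    ```python
    # Sketch of notes 1-3: tab-delimited rows, ';'-prefixed rows are not data,
    # and (assumption) the header row sits just before ";Status"/";Data start".
    # The sample content is invented for illustration.
    sample = "\n".join([
        ";SpecProc results",
        ";File Name\tMass\tIntensity",   # header row (also starts with ';')
        ";Status",
        ";Data start",
        "run001.raw\t530.7880\t12345",
        "run002.raw\t539.3013\t6789",
    ])

    rows = [line.split("\t") for line in sample.splitlines()]
    headers, data = None, []
    for i, row in enumerate(rows):
        if row[0] == ";Data start":
            # header row assumed two lines up (just before ";Status")
            headers = [h.lstrip(";") for h in rows[i - 2]]
        elif headers is not None and not row[0].startswith(";"):
            data.append(row)

    mass_col = headers.index("Mass")     # find the column by name, not position
    masses = [float(r[mass_col]) for r in data]
    print(headers, masses)
    ```

    Looking the column up by name (`headers.index(...)`) is the point of note 3: it keeps working even if the column order changes between processing methods.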

    In your case you are (probably unknowingly) doing the same thing in python, except that getting the numbers out of the report xml is the least efficient way of doing it.

    Step 2. Now that there are lists, what more can we do with them?
    The idea of applying some prettification to the results was secondary. At the time it seemed reasonable that if you had gone to the trouble of setting up ordered acquisitions and ordered processing, you might well want an ordered overview of the results. It is a fairly easy (albeit tedious) job to take selected values from the overall data and squirt them out into an xml version. XML only so that it could be viewed in a web browser, because you can add style to that and suddenly you get a pretty report.
    However, SpecProc is not psychic; it doesn't know what you want the data for. So the data in the text file is always produced, because that is the exhaustive set. The xml is a subset.
    So that is reporting.
    One more note: the only two queries you can run are peak width and peak position. It again seemed reasonable to offer a "pass/fail" concept, so you can use it as a QC check.

    But there can be lots of data files; how do I put them all together?
    You don't; SpecProc does. Although all it actually does is take all the text files and merge them into other files (which is all you would really do in Excel).
    Results go into multiple places:
    The raw folder only contains the result of the last process; it overwrites anything that is already there.
    A folder called SpecProcResults is created under the current MLynx project.
    Below that are two folders named after the current sample list (in the example below it's "Default"): one that contains all the individual elements of processing as single files (Default_Files), and one that contains all these individual files merged. The merged folder also contains a merge of all the results of the same query type, which is why you might see a file simply called Resolution.txt or Peak ID.txt.
    I think it also sorts things out if you run multiple olp files in the same batch.

    So after all that (going back to what I think is your problem): there is a text file with all the data you need lurking under "sample_list_name_Merged".
    Read it row by row.
    If it starts with ; then reject it.
    Split all remaining rows by tab (not comma).
    And you will have all the data in memory.
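
    Those four steps really are all there is to it. A minimal sketch (reading from an in-memory string here; in practice you would `open(...)` the merged .txt file, and the two data rows are invented for illustration):

    ```python
    import io

    # Stand-in for open("...Merged/...txt") - the content is made up.
    merged_txt = io.StringIO(
        ";Status\n;Data start\nrun001.raw\t530.7880\nrun002.raw\t539.3013\n"
    )

    data = []
    for line in merged_txt:                          # read it row by row
        if line.startswith(";"):                     # starts with ';' -> reject
            continue
        data.append(line.rstrip("\n").split("\t"))   # split by tab, not comma
    print(data)                                      # all the data in memory
    ```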

    It sounds so simple.
    If you want any help then we can follow up on it.
    And don't forget this place (if you don't already know it). Careers have been made just from following their tutorials:

    https://www.w3schools.com/python/

    C:\MassLynx\Default.pro\SpecProcResults\Default_Merged\2022 May 13 @ 12-15-24
    
     
  • Henriette K

    Henriette K - 2022-08-04

    Thanks for all the information. I finally got back to this, sorry for the delay. The layout of the txt file stopped me from using my standard py package for reading csv (or similar), which is why I was wary of trying to read it. But after your encouragement I tried again and found a way to ignore the whitespace and the lines starting with ; while still reading the column headers (which are in the ; lines).
    In case somebody else is in a similar situation, I attached the py file. It's not well coded, but it works.

     

