
Number format in process summary

2022-04-20
2022-08-04
  • Henriette K

    Henriette K - 2022-04-20

    Hi,
    what is the format of the Peak IDs in the process summary xml? For some ions I get 7 digits, for others 8 (there can be 3 or 4 digits after the decimal). I checked my ref file with the peak IDs I pass on to SpecProc, and there are far more digits (as I compute them). I came across this question when I tried to match the SpecProc results against my ion list in a later step (comparing strings). I tried rounding or truncating my peak list (derived from my ref file), but I never get all ions.

     
  • clochardM33

    clochardM33 - 2022-04-20

    Not sure. By any chance, are the ones with 4 digits after the decimal place below mass 1000?
    Please attach an example of what you are looking at, as it could be any number of things.

    From memory, all numbers are carried through at full precision, and the only rounding ever happens in the styling of the xml files.
    For every xml file there is a corresponding text file, which is probably easier to read data from, as it drops straight into Excel.

     
    • Henriette K

      Henriette K - 2022-05-12

      Hi,
      This makes sense. I checked the txt files and there the number of decimal places is consistent, so it really is only the xml formatting. Here is an extract of my masses from the xml sheet (in the txt file they all have 4 decimal places):
      <mass>1098.525</mass>
      <mass>530.788</mass>
      <mass>539.3013</mass>

      Per sample batch I have around 100 samples or more. My workflow so far was to open the processing summary (xml) in Internet Explorer, copy it into Excel and save it as csv. From there it is straightforward to import in python (samples in rows, m/z in columns). This is probably way more complicated than it needs to be, but my coding skills are very basic. When I read the txt files directly, the ; masks the header, which I could obviously add back manually afterwards.
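
      For what it's worth, the xml itself can also be read directly in python rather than via the IE/Excel detour. This is only a sketch: the wrapper element name ("summary") is made up for illustration, and only the `<mass>` tags come from the extract above - the real process summary will have more structure around them.

      ```python
      # Minimal sketch: pull <mass> values out of a process-summary xml.
      # The "summary" wrapper is a placeholder; only the <mass> tags
      # come from the extract quoted above.
      import xml.etree.ElementTree as ET

      xml_text = """<summary>
        <mass>1098.525</mass>
        <mass>530.788</mass>
        <mass>539.3013</mass>
      </summary>"""

      root = ET.fromstring(xml_text)
      # iter() finds every <mass> element regardless of nesting depth
      masses = [float(m.text) for m in root.iter("mass")]
      print(masses)  # [1098.525, 530.788, 539.3013]
      ```

      For a real file you would use `ET.parse(path).getroot()` instead of `fromstring`.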

       
  • clochardM33

    clochardM33 - 2022-05-13

    Hi,
    My favourite explanation is that the 3 dp numbers simply end in a zero, which the xml styling drops (e.g. 530.7880 becomes 530.788).

    I don't know what exactly you are doing with the numbers, but 100 samples per batch is very impressive. It's good to see someone using it.

    The "2 types of output" exist for exactly what it looks like you are doing, except you have picked the wrong one!
    If any of this had been written down it might have helped, but sadly that never happened. So here is a belated attempt at an explanation.
    SpecProc was originally written as a mechanism to get peak lists out of multiple MLynx files. Anyone can process one file, but there isn't a general-purpose app in MLynx to get lots of data out of lots of files in one step. SpecProc was an attempt to fill that gap.
    As long as you acquire in a reproducible way, you can process in a reproducible way. If you make ad-hoc acquisitions then who knows the structure of your data, and all bets are off. But if you do the same thing many times, and you process the data the same way each time, you end up with one big list where the only difference between each (notional) entry is the data itself (not the formatting, or the column order, or mixing MS and MSMS mid-run, or all the other things that make it ad-hoc).
    Which is where I wanted to get to. In my case I wanted to be able to drop the "output" into Excel, so there was no sort/filter/manipulate code to write; just use Excel.

    So step 1. Text files that contain the output.
    Snag 1. Lots of Mass Spec text contains commas (such as chem names), which makes csv unreliable, as csv is split by comma.
    Workaround 1. Use tab as the delimiter rather than comma (it is unlikely that a tab character appears in any MLynx text). I should have saved the files as .tsv (tab-separated values), but they came out as .txt.
    Note 1. If you programmatically read in these files, use tab as the split character. Then you get the data into an array.
    Note 2. Rows that do not contain "data" (meaning results of the processing) start with a semicolon. So in the file-reader code you can check whether the first character of the first entry in the array is ";"; if it is, skip that line.
    Additionally the two lines before "data" are always

    ;Status
    ;Data start

    So if you see "Data start" after the ";", then the row just before tells you what each of the columns is.
    Note 3. Column headers are dynamic and a function of what kind of processing you did. By reading the header-info row and finding the column that contains the text string you are interested in, you guarantee you are reading the column you want. It is not safe to presume that columns are always in the same place (although almost all of them report the same first few things).
    Note 4. The data rows contain a complete description of the processing and the file. This means that when you put it in Excel you can sort on the first column (which puts all the text and headers in one group), delete that group (so you don't have text everywhere), and then re-sort on one of the columns (filename, maybe) so the rows are back in the original order. Then if you want to, for example, plot a graph of two columns, you don't get text in the way ruining the graph.
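
    Notes 1-3 as a minimal python sketch. The sample content below is invented for illustration (real files will have different headers and more columns), and the exact position of the header row relative to ";Status"/";Data start" is my best guess from the description above:

    ```python
    # Sketch of notes 1-3: tab-delimited rows, ';'-prefixed rows are not data,
    # and (assumption) the header row sits just before ";Status"/";Data start".
    # The sample content is invented for illustration.
    sample = "\n".join([
        ";SpecProc results",
        ";File Name\tMass\tIntensity",   # header row (also starts with ';')
        ";Status",
        ";Data start",
        "run001.raw\t530.7880\t12345",
        "run002.raw\t539.3013\t6789",
    ])

    rows = [line.split("\t") for line in sample.splitlines()]
    headers, data = None, []
    for i, row in enumerate(rows):
        if row[0] == ";Data start":
            # header row assumed two lines up (just before ";Status")
            headers = [h.lstrip(";") for h in rows[i - 2]]
        elif headers is not None and not row[0].startswith(";"):
            data.append(row)

    mass_col = headers.index("Mass")     # find the column by name, not position
    masses = [float(r[mass_col]) for r in data]
    print(headers, masses)
    ```

    Looking the column up by name (`headers.index(...)`) is the point of note 3: it keeps working even if the column order changes between processing methods.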

    In your case you are (probably unknowingly) doing the same thing in python, except that getting the numbers out of the report xml is the least efficient way of doing it.

    Step 2. Now that there are lists, what more can we do with them?
    The idea of applying some prettification to the results was secondary. At the time it seemed reasonable that if you had gone to the trouble of setting up ordered acquisitions and ordered processing, you might well want an ordered overview of the results. It is a fairly easy (albeit tedious) job to take selected values from the overall data and squirt them out into an xml version. XML only so that it could be viewed in a web browser, because you can add style to that and suddenly you get a pretty report.
    However, SpecProc is not psychic; it doesn't know what you want the data for. So the data in the text file is always produced, because that is the exhaustive set. The xml is a subset.
    So that is reporting.
    One more note: the only two queries you can run are peak width and peak position. It again seemed reasonable to offer a "pass/fail" concept, so you can use it as a QC check.

    But there can be lots of data files; how do I put them all together?
    You don't; SpecProc does. Although all it actually does is take all the text files and merge them into other files (which is all you would really do in Excel).
    Results go into multiple places:
    The raw folder only contains the result of the last process; it overwrites anything that is already there.
    A folder called SpecProcResults is created under the current MLynx project.
    Below that are two folders named after the current sample list (in the example below it's "Default"): one that contains all the individual elements of processing as single files (Default_Files), and one that contains all these individual files merged. The merged folder also contains a merge of all the results of the same query type, which is why you might see a file simply called Resolution.txt or Peak ID.txt.
    I think it also sorts things out if you run multiple olp files in the same batch.

    So after all that (going back to what I think is your problem): there is a text file with all the data you need lurking under "sample_list_name_Merged".
    Read it row by row.
    If it starts with ; then reject it.
    Split all remaining rows by tab (not comma).
    And you will have all the data in memory.
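
    Those four steps really are all there is to it. A minimal sketch (reading from an in-memory string here; in practice you would `open(...)` the merged .txt file, and the two data rows are invented for illustration):

    ```python
    import io

    # Stand-in for open("...Merged/...txt") - the content is made up.
    merged_txt = io.StringIO(
        ";Status\n;Data start\nrun001.raw\t530.7880\nrun002.raw\t539.3013\n"
    )

    data = []
    for line in merged_txt:                          # read it row by row
        if line.startswith(";"):                     # starts with ';' -> reject
            continue
        data.append(line.rstrip("\n").split("\t"))   # split by tab, not comma
    print(data)                                      # all the data in memory
    ```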

    It sounds so simple.
    If you want any help then we can follow up on it.
    And don't forget this place (if you don't already know it). Careers have been made just from following their tutorials:

    https://www.w3schools.com/python/

    C:\MassLynx\Default.pro\SpecProcResults\Default_Merged\2022 May 13 @ 12-15-24
    
     
  • Henriette K

    Henriette K - 2022-08-04

    Thanks for all the information. I finally got back to this, sorry for the delay. The layout of the txt file stopped me from using my standard py package for reading csv (or similar), which is why I was wary of trying to read it. But after your encouragement I tried again and found a way to ignore the whitespace and the lines starting with ; while still reading the column headers (which are in the ; lines).
    In case somebody else is in a similar situation, I attached the py file. It's not well coded, but it works.

     

