Hi James,

For a more detailed explanation of the filters, run msconvert --help. But yes, mzWindow and scanTime will do what you need, with one caveat: they still output the raw spectra that fall within those bounds, not an intensity vs. retention time chromatogram. If you want a chromatogram, try msaccess's "slice" command. None of the tools will process the data unless you tell them to. Your confusion stems from the fact that your file was acquired with some centroid-only spectra (I'm guessing the MS2s), which is why the mzML lists both profile and centroid spectra. You can't get profile data back for those spectra, but if you're making a chromatogram, you don't really need profile data anyway.
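
To make concrete what an intensity vs. retention time chromatogram over an m/z window is, here is a minimal Python sketch. It is not how msaccess implements "slice"; it simply sums, for each spectrum inside the retention-time window, the intensities of all points whose m/z falls inside the m/z window. The function name and the (retention time, m/z array, intensity array) tuple layout are hypothetical, and reading the spectra out of the mzML file is left out.

```python
from typing import List, Sequence, Tuple

# Hypothetical in-memory spectrum: (retention time, m/z array, intensity array).
# Reading these values from an mzML file is outside the scope of this sketch.
Spectrum = Tuple[float, Sequence[float], Sequence[float]]


def extracted_ion_chromatogram(
    spectra: List[Spectrum],
    rt_range: Tuple[float, float],
    mz_range: Tuple[float, float],
) -> List[Tuple[float, float]]:
    """Return (retention time, summed intensity) points for an RT and m/z window."""
    rt_lo, rt_hi = rt_range
    mz_lo, mz_hi = mz_range
    points = []
    for rt, mzs, intensities in sorted(spectra, key=lambda s: s[0]):
        if not (rt_lo <= rt <= rt_hi):
            continue  # spectrum is outside the retention-time window
        # Sum every data point (profile or centroid) whose m/z lies in the window.
        total = sum((i for mz, i in zip(mzs, intensities) if mz_lo <= mz <= mz_hi), 0.0)
        points.append((rt, total))
    return points


if __name__ == "__main__":
    demo = [
        (60.0, [500.1, 500.2, 600.0], [10.0, 20.0, 99.0]),
        (61.0, [500.15, 700.0], [5.0, 1.0]),
        (300.0, [500.1], [50.0]),
    ]
    print(extracted_ion_chromatogram(demo, (0.0, 120.0), (500.0, 501.0)))
    # → [(60.0, 30.0), (61.0, 5.0)]
```

Summing every point in the window works the same way for profile and centroid spectra, which is one reason profile data isn't essential for this kind of chromatogram.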

The SHA-1 is part of the mzML standard and is required for a valid file. It's useful for checking that the source file that was converted has not been corrupted or changed, or indeed whether it is a different file that's been evilly renamed. Make sure you're using the very latest msconvert: I recently fixed a severe performance issue that had been present since I made our SHA-1 tool work with UTF-8 filenames.
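
As a sketch of that check: the mzML typically records each source file's checksum as a cvParam named "SHA-1" (accession MS:1000569) under its sourceFile element. The Python below, using only the standard hashlib module, recomputes a file's SHA-1 and compares it with a value copied from the mzML. The script and function names are hypothetical, and which files inside an Agilent .d directory get a sourceFile entry is an assumption you should confirm in your own output.

```python
import hashlib
import sys


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-1 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Reading in chunks keeps memory flat even for multi-GB .bin files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_sha1(path: str, recorded: str) -> bool:
    """Compare a file's SHA-1 with a value copied from the mzML sourceFile entry."""
    return sha1_of_file(path) == recorded.strip().lower()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical script name): python check_sha1.py <source file> <SHA-1 from mzML>
    source_path, recorded_value = sys.argv[1], sys.argv[2]
    print("match" if matches_recorded_sha1(source_path, recorded_value) else "MISMATCH")
```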

Hope this helps,

On 7/2/2014 6:27 PM, James Yuan wrote:

I have been tasked with extracting specific windows of data from raw profile data collected on an Agilent instrument. I need to extract data within a specific retention time window and a specific m/z window. The output should be intensity vs. retention time for that window.

I was originally going to use the Reader API to manipulate the files directly, but I noticed there were two filter options in msconvert called "scanTime" and "mzWindow".

I just need to confirm: does calling
msconvert example_data.d --filter "scanTime [a,b]" --filter "mzWindow [x,z]"

do what I want? When I run this command, does msconvert use the raw profile data in the directory or the centroid data? And does msconvert automatically do any extra processing (e.g., centroiding) on the output, or does it simply convert the raw profile data from the Agilent format into mzML? I want to extract the pure, untouched profile data.

I did a test run and in the mzML file I noticed the following lines:
        <cvParam cvRef="MS" accession="MS:1000579" name="MS1 spectrum" value=""/>
        <cvParam cvRef="MS" accession="MS:1000127" name="centroid spectrum" value=""/>
        <cvParam cvRef="MS" accession="MS:1000128" name="profile spectrum" value=""/>
        <cvParam cvRef="MS" accession="MS:1000235" name="total ion current chromatogram" value=""/>

<run id="_x0032_0120928_Scer_isotope_1" defaultInstrumentConfigurationRef="IC1" startTimeStamp="2013-04-02T18:26:19Z" defaultSourceFileRef="MSScan.bin">
      <chromatogramList count="1" defaultDataProcessingRef="pwiz_Reader_Agilent_conversion">
        <chromatogram index="0" id="TIC" defaultArrayLength="2760">
          <cvParam cvRef="MS" accession="MS:1000235" name="total ion current chromatogram" value=""/>

Does this mean the data that I have in the mzML file is not from the profile spectrum? The file that contains profile data is called “MSProfile.bin” and not “MSScan.bin”. Do I need to change the command I’m using to extract data from the correct file? I have attached the full mzML file for you to look at. It seems the file is a little small considering the original binary file was 1.7GB although I did choose a relatively small window to use for this test.

Also what is the purpose of performing the SHA-1 hash and is there any way to skip it?

Thanks for your help,



proteowizard-support mailing list