From: Hendrik W. <hw...@sa...> - 2014-07-17 08:48:58
Hi!

I've encountered problems with ProteoWizard's "msconvert" (version 3.0.6485) that I think may be due to a bug.

I'm working with Orbitrap Elite HCD data. Starting from the RAW files (e.g. http://www.ebi.ac.uk/pride/archive/files/28960128), I want to prepare the data for a search pipeline: filter out everything but MS2 spectra, perform centroiding (peak picking), and store the result in mzML format. I'm using these msconvert parameters:

  --32 --filter "peakPicking true 2" --filter "msLevel 2"

This runs through without complaint, but the results are bad: the file is much bigger than expected (750 MB for the example), which indicates that the centroiding didn't work. Indeed, the result is the same as without the "peakPicking" filter.

If the filters are given in the wrong order, there's a helpful warning message:

  "[SpectrumList_PeakPicker] Warning: vendor peakPicking requested, but peakPicking is not the first filter. Since the vendor DLLs can only operate directly on raw data, this filter will likely not have any effect.
  Warning: vendor peakPicking was requested, but is unavailable for this input data. Using ProteoWizard centroiding algorithm instead. High-quality peak-picking can be enabled using the cwt flag."

The result is smaller than before (460 MB), which suggests some centroiding took place, but it is still suspiciously large.

Interestingly, when centroiding and MS2 filtering are applied as separate steps (two calls to msconvert), I do get the expected result (a 160 MB file). So each filter seems to work on its own, but there may be a problem with how they are applied in conjunction.

Cheers
Hendrik

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
From: Brian P. <bs...@pr...> - 2014-07-17 16:47:41
Hi Hendrik,

mzML files do tend to be large, often larger than the raw files they come from, so I wouldn't base any assessment of correctness on size alone. Have you looked at the mzML file in a text editor to see whether the MS1 data has in fact been removed? Have you opened the mzML file in something like SeeMS to check whether the spectra have been centroided?

Hope this helps,
Brian Pratt
From: Hendrik W. <hw...@sa...> - 2014-07-17 20:15:58
Hi Brian!

Thanks for your response. Maybe I put too much information in my e-mail, so let me reduce it to this: the following two calls produced the exact same output file in my test case (http://www.ebi.ac.uk/pride/archive/files/28960128):

  msconvert --32 --filter "peakPicking true 2" --filter "msLevel 2"
  msconvert --32 --filter "msLevel 2"

I hope you can agree that there's something fishy there.

Cheers
Hendrik
From: Brian P. <bs...@pr...> - 2014-07-17 20:42:28
Agreed, but are you sure the data isn't centroided already? How do the outputs differ with just the peak picking filter and no filter at all?
From: Hendrik W. <hw...@sa...> - 2014-07-17 20:53:39
Hi!

> Agreed - but are you sure the data isn't centroided already? How do the
> outputs differ with just the peak picking filter and no filter at all?

Yes, I'm sure the data isn't centroided: the search engine complains that it isn't centroided, and the OpenMS centroiding algorithm doesn't complain (it would if the data were already centroided). Just the peak picking filter reduces the size significantly compared to no filter, as you would expect. I can give you the exact file sizes when I'm back at work tomorrow.

Cheers
Hendrik
From: Hendrik W. <hw...@sa...> - 2014-07-18 09:45:47
Hi!

> How do the outputs differ with just the peak picking filter and no
> filter at all?

To follow up on this: the mzML without any filtering is 1.6 GB; with peak picking it's about 1.0 GB.

Cheers
Hendrik
From: Brian P. <bs...@pr...> - 2014-07-18 23:54:31
I agree, it sounds like the two filters aren't working in sequence as they should. Unfortunately I'm about to go away for a week; hopefully another ProteoWizard developer can look into it before I return.

Brian