From: Brian O'C. <boc...@uc...> - 2008-01-10 00:12:56
Hi Klaus,

I'm trying to add image compression to the SolexaTools project and I'm running into some problems. I was hoping to add a step that compresses the images with LZW instead of bzip2. bzip2 saves about 1/3 of the space, whereas LZW would reduce the image files to about 1/5 of their original size. Since the algorithm is lossless, I was expecting to be able to do a compression round trip and get the same answer from the pipeline. When I tried this, the generated sequences were substantially different, which suggests the image conversion is actually changing the images.

I'm using ImageMagick to do the conversion. When I look at the image before and after the round trip, I get the following (using the ImageMagick "identify" command):

Before:
  Channel statistics:
    Gray:
      Min: 3 (0.0117647)
      Max: 24 (0.0941176)
      Mean: 4.4793 (0.0175659)
      Standard deviation: 1.08192 (0.00424284)
  Colors: 21
  Histogram:
       995: ( 771, 771, 771) grey1
    711385: ( 1028, 1028, 1028) #040404040404
    210559: ( 1285, 1285, 1285) grey2
     41109: ( 1542, 1542, 1542) #060606060606
     17605: ( 1799, 1799, 1799) #070707070707
      9223: ( 2056, 2056, 2056) grey3
      5497: ( 2313, 2313, 2313) #090909090909
      2335: ( 2827, 2827, 2827) #0B0B0B0B0B0B
      3446: ( 2570, 2570, 2570) grey4
      1530: ( 3084, 3084, 3084) #0C0C0C0C0C0C
       965: ( 3341, 3341, 3341) grey5
       595: ( 3598, 3598, 3598) #0E0E0E0E0E0E
       341: ( 3855, 3855, 3855) grey6
       201: ( 4112, 4112, 4112) #101010101010
        95: ( 4369, 4369, 4369) #111111111111
        59: ( 4626, 4626, 4626) grey7
        29: ( 4883, 4883, 4883) #131313131313
        22: ( 5140, 5140, 5140) grey8
        10: ( 5397, 5397, 5397) #151515151515
         6: ( 5654, 5654, 5654) #161616161616
         1: ( 6168, 6168, 6168) #181818181818
  Rendering intent: Undefined

After:
  Channel statistics:
    Gray:
      Min: 3 (0.0117647)
      Max: 24 (0.0941176)
      Mean: 4.4793 (0.0175659)
      Standard deviation: 1.08192 (0.00424284)
  Colors: 65536
  Histogram:
       995: ( 771, 771, 771) grey1
    711385: ( 1028, 1028, 1028) #040404040404
    210559: ( 1285, 1285, 1285) grey2
     41109: ( 1542, 1542, 1542) #060606060606
     17605: ( 1799, 1799, 1799) #070707070707
      9223: ( 2056, 2056, 2056) grey3
      5497: ( 2313, 2313, 2313) #090909090909
      2335: ( 2827, 2827, 2827) #0B0B0B0B0B0B
      3446: ( 2570, 2570, 2570) grey4
      1530: ( 3084, 3084, 3084) #0C0C0C0C0C0C
       965: ( 3341, 3341, 3341) grey5
       595: ( 3598, 3598, 3598) #0E0E0E0E0E0E
       341: ( 3855, 3855, 3855) grey6
       201: ( 4112, 4112, 4112) #101010101010
        95: ( 4369, 4369, 4369) #111111111111
        59: ( 4626, 4626, 4626) grey7
        29: ( 4883, 4883, 4883) #131313131313
        22: ( 5140, 5140, 5140) grey8
        10: ( 5397, 5397, 5397) #151515151515
         6: ( 5654, 5654, 5654) #161616161616
         1: ( 6168, 6168, 6168) #181818181818

The channel statistics and histograms reported by ImageMagick look identical to me (only the reported Colors count changes, from 21 to 65536), so it looks like the information content is unaffected(?).

Do you have any ideas why this isn't working? Did you use a standard library for TIFF file operations? Are you using a custom TIFF format that the ImageMagick conversion might disrupt? Does the TIFF library you're using support LZW-compressed TIFF images directly? Is there a reason why the images aren't LZW-compressed on the instrument machine? Are 16-bit images needed, or does 8-bit provide sufficient dynamic range?

Sorry for all the questions. I would really love to get this added, because it would seriously reduce the size of our raw data (which we're currently archiving).

Thanks very much for your help!!

--Brian O'Connor
SolexaTools @ the Nelson Lab, University of California, Los Angeles

PS: I can send some sample images and more verbose output if it's helpful.

Maisinger, Klaus wrote:
>Hello Brian
>
>I just had a look at your Sourceforge project "solexatools". If you have
>any feature requests / issues how the Genome Analyzer pipeline can make
>your life easier, please don't hesitate to let me know (I can't
>guarantee we will act on suggestions but we will do our best. :-)) We've
>also got an email address Pip...@il... for
>development/code related questions/feedback (for user questions please
>refer to tec...@il...).
>
>Thanks for your input
>Klaus
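The whole plan rests on LZW being lossless: decoding must reproduce the input byte for byte. Below is a minimal sketch, in Python, of a textbook dictionary-based LZW round trip on raw bytes. It is an illustration only. It is not the TIFF flavour of LZW (which adds Clear/EOI codes and packs variable-width codes MSB-first), it skips bit-packing entirely by returning a plain list of integer codes, and it is not how libtiff or ImageMagick implement it. The sample pixel list in the `__main__` part is hypothetical, built from grey levels that appear in the histogram above.

```python
"""Textbook LZW round trip on raw bytes (illustrative, not the TIFF variant)."""

import struct


def lzw_compress(data: bytes) -> list[int]:
    # The dictionary starts with all 256 single-byte strings.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes: list[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            # Keep extending the longest match already in the dictionary.
            current = candidate
        else:
            # Emit the code for the longest match, then learn the new string.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Special case: the code refers to the entry being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        # Rebuild the same dictionary entry the compressor added.
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    # Hypothetical 16-bit little-endian pixels using levels from the histogram.
    pixels = [1028, 1028, 1285, 1028, 771, 1542, 1028, 1028, 6168, 1028] * 100
    raw = struct.pack(f"<{len(pixels)}H", *pixels)
    codes = lzw_compress(raw)
    assert lzw_decompress(codes) == raw  # lossless: identical bytes back
    print(len(raw), "bytes ->", len(codes), "codes")
```

Note that matching statistics and histograms, as in the identify output, do not by themselves prove the pixels are unchanged, because a histogram ignores where each value sits in the image. Comparing the decoded pixel data directly, as the assertion above does for bytes, is the stricter test.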