From: Brian O'C. <boc...@uc...> - 2008-01-10 00:12:56
Hi Klaus,

I'm trying to add image compression to the SolexaTools project and I'm
running into some problems. I was hoping to add a step that compresses
the images with LZW rather than bzip2. bzip2 saves about 1/3 of the
space, whereas LZW would reduce the image files to about 1/5 of their
original size. Since the algorithm is lossless, I was expecting to be
able to do a compression round trip and get the same answer from the
pipeline. When I tried this, the sequences generated were substantially
different, suggesting that the image conversion was actually causing changes.

I'm using ImageMagick to do the conversion. When I look at the image
before and after the round trip I get the following (using the
ImageMagick "identify" command):
Before:
  Channel statistics:
    Gray:
      Min: 3 (0.0117647)
      Max: 24 (0.0941176)
      Mean: 4.4793 (0.0175659)
      Standard deviation: 1.08192 (0.00424284)
  Colors: 21
  Histogram:
       995: ( 771, 771, 771) grey1
    711385: ( 1028, 1028, 1028) #040404040404
    210559: ( 1285, 1285, 1285) grey2
     41109: ( 1542, 1542, 1542) #060606060606
     17605: ( 1799, 1799, 1799) #070707070707
      9223: ( 2056, 2056, 2056) grey3
      5497: ( 2313, 2313, 2313) #090909090909
      2335: ( 2827, 2827, 2827) #0B0B0B0B0B0B
      3446: ( 2570, 2570, 2570) grey4
      1530: ( 3084, 3084, 3084) #0C0C0C0C0C0C
       965: ( 3341, 3341, 3341) grey5
       595: ( 3598, 3598, 3598) #0E0E0E0E0E0E
       341: ( 3855, 3855, 3855) grey6
       201: ( 4112, 4112, 4112) #101010101010
        95: ( 4369, 4369, 4369) #111111111111
        59: ( 4626, 4626, 4626) grey7
        29: ( 4883, 4883, 4883) #131313131313
        22: ( 5140, 5140, 5140) grey8
        10: ( 5397, 5397, 5397) #151515151515
         6: ( 5654, 5654, 5654) #161616161616
         1: ( 6168, 6168, 6168) #181818181818
  Rendering intent: Undefined
After:
  Channel statistics:
    Gray:
      Min: 3 (0.0117647)
      Max: 24 (0.0941176)
      Mean: 4.4793 (0.0175659)
      Standard deviation: 1.08192 (0.00424284)
  Colors: 65536
  Histogram:
       995: ( 771, 771, 771) grey1
    711385: ( 1028, 1028, 1028) #040404040404
    210559: ( 1285, 1285, 1285) grey2
     41109: ( 1542, 1542, 1542) #060606060606
     17605: ( 1799, 1799, 1799) #070707070707
      9223: ( 2056, 2056, 2056) grey3
      5497: ( 2313, 2313, 2313) #090909090909
      2335: ( 2827, 2827, 2827) #0B0B0B0B0B0B
      3446: ( 2570, 2570, 2570) grey4
      1530: ( 3084, 3084, 3084) #0C0C0C0C0C0C
       965: ( 3341, 3341, 3341) grey5
       595: ( 3598, 3598, 3598) #0E0E0E0E0E0E
       341: ( 3855, 3855, 3855) grey6
       201: ( 4112, 4112, 4112) #101010101010
        95: ( 4369, 4369, 4369) #111111111111
        59: ( 4626, 4626, 4626) grey7
        29: ( 4883, 4883, 4883) #131313131313
        22: ( 5140, 5140, 5140) grey8
        10: ( 5397, 5397, 5397) #151515151515
         6: ( 5654, 5654, 5654) #161616161616
         1: ( 6168, 6168, 6168) #181818181818
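
A note on reading the histogram: each triple looks like an 8-bit gray level scaled up to 16 bits by a factor of 257 (771 = 3 x 257, 6168 = 24 x 257), which lines up with the Min and Max reported above. The following is only a minimal Python sketch of that arithmetic, assuming this x257 scaling is what "identify" is reporting. The counts are copied from the output above, and `to_8bit` is a hypothetical helper, not part of ImageMagick or the pipeline.

```python
# Histogram rows copied from the "identify" output: 16-bit value -> pixel count.
HISTOGRAM = {
    771: 995, 1028: 711385, 1285: 210559, 1542: 41109, 1799: 17605,
    2056: 9223, 2313: 5497, 2570: 3446, 2827: 2335, 3084: 1530,
    3341: 965, 3598: 595, 3855: 341, 4112: 201, 4369: 95,
    4626: 59, 4883: 29, 5140: 22, 5397: 10, 5654: 6, 6168: 1,
}


def to_8bit(value16):
    """Undo the assumed x257 scaling; fail loudly if it does not divide evenly."""
    level, remainder = divmod(value16, 257)
    if remainder != 0:
        raise ValueError(f"{value16} is not an 8-bit level scaled by 257")
    return level


# Re-key the histogram by the recovered 8-bit level.
levels = {to_8bit(value): count for value, count in HISTOGRAM.items()}
total_pixels = sum(levels.values())
mean = sum(level * count for level, count in levels.items()) / total_pixels

print("distinct levels:", len(levels))
print("min/max level:", min(levels), max(levels))
print("mean level:", round(mean, 4))
```

Under that assumption the histogram reproduces the 21 distinct levels, the minimum of 3, the maximum of 24 and a mean of about 4.4793, so every listed value fits in 8 bits for this particular image.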
Apart from the "Colors" count (21 before, 65536 after), the statistics
reported by ImageMagick look identical to me, so it looks like the
information content is unaffected(?). Do you have any ideas
why this isn't working? Did you use a standard library for TIFF file
operations? Are you using a custom TIFF file format that may be
disrupted by the ImageMagick conversion? Does the TIFF library you're
using support LZW-compressed TIFF images directly? Is there a reason why
the images aren't compressed using LZW on the instrument machine? Are
16-bit images needed, or does 8-bit provide sufficient dynamic range?
Sorry for all the questions. I would really love to get this added in
because it would seriously reduce the size of our raw data (which we're
currently archiving).
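
One caveat about the comparison above: matching min, max, mean and histogram do not by themselves show that two images are pixel-for-pixel identical, because those statistics ignore where each pixel sits. The sketch below illustrates this with made-up pixel values. `summary` and `digest` are hypothetical helpers written for this example, not ImageMagick or pipeline functions, and nothing here is a diagnosis of the actual problem.

```python
import hashlib
import struct
from collections import Counter


def summary(pixels):
    """Order-independent statistics: min, max, mean and a value histogram."""
    return (min(pixels), max(pixels), sum(pixels) / len(pixels), Counter(pixels))


def digest(pixels):
    """SHA-256 over the pixels packed in order as little-endian 16-bit values."""
    packed = struct.pack(f"<{len(pixels)}H", *pixels)
    return hashlib.sha256(packed).hexdigest()


# Made-up pixel data: the same values, with the first two pixels swapped.
before = [771, 1028, 1285, 1028, 1542, 1028]
after = [1028, 771, 1285, 1028, 1542, 1028]

print("same summary statistics:", summary(before) == summary(after))
print("same pixel data:", digest(before) == digest(after))
```

The two lists share every summary statistic but hash differently, so a round trip is better checked with a pixel-level comparison of the decoded images (for example with ImageMagick's compare program) than with the statistics alone.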
Thanks very much for your help!!
--Brian O'Connor
SolexaTools @ the Nelson Lab, University of California, Los Angeles
PS: I can send some sample images and more verbose output if it's helpful.
Maisinger, Klaus wrote:
>Hello Brian
>
>I just had a look at your Sourceforge project "solexatools". If you have
>any feature requests / issues how the Genome Analyzer pipeline can make
>your life easier, please don't hesitate to let me know (I can't
>guarantee we will act on suggestions but we will do our best. :-)) We've
>also got an email address Pip...@il... for
>development/code related questions/feedback (for user questions please
>refer to tec...@il...).
>
>Thanks for your input
>Klaus
>
>
>