large image size/memory management/PIL

2005-08-15
2013-05-02
  • Gawain Lavers
    2005-08-15

    This is probably a PIL or Python issue more than a ZoomifyImage issue, but I thought there might be some interest here.  I'm having difficulty processing a large TIFF file (17888x45408, 3.0 GB).  I get the following traceback:

    Traceback (most recent call last):
      File "/insitu1/lavers2/ZoomifyImage/ZoomifyFileProcessor.py", line 135, in ?
        processor.ZoomifyProcess(sys.argv[1:])
      File "/insitu1/lavers2/ZoomifyImage/ZoomifyFileProcessor.py", line 127, in ZoomifyProcess
        self.processImage()
      File "/home/lavers2/ZoomifyImage/ZoomifyBase.py", line 199, in processImage
        imageRow = image.crop([0, ul_y, self.originalWidth, lr_y])
      File "/home/lavers2/python/lib/python2.4/site-packages/PIL/Image.py", line 673, in crop
        self.load()
      File "/home/lavers2/python/lib/python2.4/site-packages/PIL/ImageFile.py", line 155, in load
        self.load_prepare()
      File "/home/lavers2/python/lib/python2.4/site-packages/PIL/ImageFile.py", line 221, in load_prepare
        self.im = Image.core.new(self.mode, self.size)
    MemoryError

    I've found some references to this being an issue with PIL needing enough memory to store the entire image, but I have 2GB of RAM and 3GB of swap, so I think I ought to be covered.  Does either Python or PIL limit its own memory usage?

     
    • adam smith
      2005-09-13

      For those who might stumble upon this message....

      I have been discussing this issue with Gawain offline, and with some guidance from the PIL folks, I am working on an update to the software that will allow it to load less image data into memory as each image is processed, which should allow much larger images to be processed.

      adam

       
    • Gawain Lavers
      2005-09-13

      So -- I have a rough script which uses ImageMagick to perform the task, and it does so successfully, as far as I can tell.  I've cut a 3.1 GB image (17888x45408 -- 16679 tiles, 66 TileGroup folders, 9 tiers, 137MB of JPEGs).  It's not efficient in terms of disk usage or file creation: it creates a separate file for each row before cutting tiles from the rows.

      The thing about ImageMagick (I've already gone over this with Adam) is that it uses temporary files when it runs out of memory, and you can use environment variables (TMPDIR or MAGICK_TMPDIR) to direct where these files are placed.  The temporary files had been swamping my root drive (the default is /tmp), as they are often in excess of 10GB.

      Although there is a feature to cut a set of rows, columns, or tiles from a single image (in theory we could do one whole tier with a single convert command), this appears to be so inefficient that, even using temporary files, the operation crashed on the full-sized image with a memory error.  Cutting rows individually was too slow: loading the image to cut a single row had about a 12 minute overhead.  Instead I recursively cut the file into halves (generating two halves has the same loading overhead as cutting a single 256px row).
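
      A rough Python 3 sketch of how one such halving call might be assembled. This is an illustration under assumptions, not the script described above: the helper names and file names are hypothetical, and the single-load form relies on ImageMagick's parenthesised `+clone ... -write ... +delete` idiom, which has not been checked against any particular ImageMagick version.

```python
import os
import subprocess


def split_in_half_command(src, width, height, offset, top_dst, bottom_dst):
    """Build one `convert` call that writes both halves of src.

    The image is read once; the top half is cropped from a clone and
    written out, then the original is cropped to the bottom half.
    """
    top_geometry = f"{width}x{offset}+0+0"
    bottom_geometry = f"{width}x{height - offset}+0+{offset}"
    return [
        "convert", src,
        # +repage resets the canvas offset left behind by -crop.
        "(", "+clone", "-crop", top_geometry, "+repage",
        "-write", top_dst, "+delete", ")",
        "-crop", bottom_geometry, "+repage", bottom_dst,
    ]


def magick_env(tmpdir):
    """Copy the current environment with MAGICK_TMPDIR pointing at tmpdir."""
    return dict(os.environ, MAGICK_TMPDIR=tmpdir)


def run(command, tmpdir):
    """Run the command with temporary files redirected away from /tmp."""
    subprocess.run(command, env=magick_env(tmpdir), check=True)
```

      Passing the command as a list avoids shell quoting of the parentheses. Something like `run(split_in_half_command("big.tif", 17888, 45408, 22784, "top.tif", "bottom.tif"), "/scratch/tmp")` would then perform the split, where the paths are placeholders.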

      I suspect this, in and of itself, makes it easy to separate the algorithm from the particular image processing software.  I'll re-examine the file (and my algorithm documentation) and send that to you shortly.

      Takes about 2 hours on a P4 3.2 GHz with 2GB RAM.

       
      • Gawain Lavers
        2005-09-13

        So, here is the algorithm I've used to generate a tileset:

        First, I have to break the main image down into rows, and because of issues I've alluded to before, this is most efficiently done by recursively breaking the main image into halves:

        stack = []
        stack.push(image)
        rowCounter = 0
        rowList = []

        while(stack is not empty):
            currImage = stack.pop()
            if currImage.height <= tile height:
                rowFile = rename currImage using rowCounter
                rowList.push(rowFile)
                rowCounter++
            else:
                offset = half of currImage.height, in tile height increments
                topFile, bottomFile = splitInHalf(currImage, offset)
                # push the bottom half first so the top half is popped (and numbered) first
                stack.push(bottomFile)
                stack.push(topFile)
                if(currImage != image):
                    delete currImage
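
        A minimal Python 3 sketch of the same halving order, working only on row offsets rather than on image files. The function name and the (top, bottom) band representation are assumptions made for illustration; no image library is involved.

```python
def split_into_rows(height, tile_height=256):
    """Return (top, bottom) pixel bands, top to bottom, by repeated halving."""
    stack = [(0, height)]
    rows = []
    while stack:
        top, bottom = stack.pop()
        band = bottom - top
        if band <= tile_height:
            # Small enough to be a single row of tiles.
            rows.append((top, bottom))
            continue
        # Split at roughly half the band, rounded down to a whole number of tiles.
        tiles_in_band = -(-band // tile_height)  # ceiling division
        offset = (tiles_in_band // 2) * tile_height
        # Push the bottom half first so the top half is processed first.
        stack.append((top + offset, bottom))
        stack.append((top, top + offset))
    return rows
```

        Because every split point is a multiple of the tile height, every band except possibly the last is exactly one tile tall. For a 45408px-tall image this yields 178 bands, the last one 96px tall.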

        The rowList now has all the files I need, and any other temporary files have been deleted.  Now I cycle over the rows generating tiles for each tier, and at the same time splicing and shrinking rows when I'm done with them, so I can start the next tier.  To do this, I have to compute ahead of time the number of tiers, and the number of files in each tier -- actually the number of files in all previous tiers.  With that I know the offset into the total number of files when starting to generate tiles for a tier, which allows me to easily compute which tilegroup folder the tile will go in.  For convenience I also compute the number of columns in each tier.
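
        The per-tier bookkeeping might look roughly like the following Python 3 sketch. It assumes that each tier halves the dimensions with integer division until the image fits in one tile, that tier 0 is the smallest tier, and that a TileGroup folder holds 256 tiles; `tier_info` and `tile_group` are hypothetical names.

```python
import math

TILES_PER_GROUP = 256  # assumed TileGroup folder capacity


def tier_info(width, height, tile_size=256):
    """Return one dict per tier, smallest tier first."""
    sizes = [(width, height)]
    while width > tile_size or height > tile_size:
        width, height = max(1, width // 2), max(1, height // 2)
        sizes.append((width, height))
    sizes.reverse()  # tier 0 is the tier that fits in a single tile

    tiers = []
    previous_tiles = 0
    for w, h in sizes:
        columns = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        tiers.append({"width": w, "height": h, "columns": columns,
                      "rows": rows, "previous_tiles": previous_tiles})
        previous_tiles += columns * rows
    return tiers


def tile_group(tier, column, row):
    """Index of the TileGroup folder for a tile within the given tier."""
    index = tier["previous_tiles"] + row * tier["columns"] + column
    return index // TILES_PER_GROUP
```

        Under these assumptions a 17888x45408 image gives 9 tiers and 16679 tiles, and the last tile falls in folder index 65, i.e. 66 folders, which agrees with the figures quoted earlier in the thread.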

        for tier from the highest tier down to 0:
            newRowList = []
            tilePtr = tierInfo[tier].previousTiles
            rowCounter = 0
            while rowList not empty:
                firstRow = rowList.pop(0)
                tilePtr = CutTilesFromRow(firstRow, tier, rowCounter, tierInfo[tier].columns, tilePtr)
                rowCounter++
                if(rowList not empty):
                    secondRow = rowList.pop(0)
                    tilePtr = CutTilesFromRow(secondRow, tier, rowCounter, tierInfo[tier].columns, tilePtr)
                    rowCounter++
                    firstRow = appendRows(firstRow, secondRow)
                    delete secondRow
                firstRow = reduceByHalf(firstRow)
                newRowList.push(firstRow)
            rowList = newRowList
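
        To sanity-check the loop without touching any image data, it can be simulated on (width, height) pairs. This Python 3 sketch is an assumption-laden stand-in: rows are plain tuples, `reduceByHalf` is modelled as integer halving of both dimensions, and the loop stops once a tier consists of a single tile.

```python
import math


def simulate_tiers(width, height, tile_size=256):
    """Return the tile count of each tier, largest tier first."""
    # Full-resolution rows: each is tile_size tall except possibly the last.
    rows = [(width, min(tile_size, height - top))
            for top in range(0, height, tile_size)]
    tiles_per_tier = []
    while True:
        columns = math.ceil(rows[0][0] / tile_size)
        tiles_per_tier.append(columns * len(rows))  # cut tiles from every row
        if len(rows) == 1 and columns == 1:
            break  # the whole image fits in one tile: this was tier 0
        next_rows = []
        while rows:
            w, h = rows.pop(0)
            if rows:
                _, second_height = rows.pop(0)
                h += second_height  # append the second row below the first
            # Shrink the combined row for the next tier.
            next_rows.append((max(1, w // 2), max(1, h // 2)))
        rows = next_rows
    return tiles_per_tier
```

        For 17888x45408 the simulated counts are 12460, 3115, 810, 207, 60, 18, 6, 2 and 1, which sum to the 16679 tiles reported earlier in the thread.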

        As I see it, the advantage of the Zoomify algorithm is that you limit the number of temporary files (such as rows) that need to be in existence at any one time.  Since this isn't practical (at least as far as I can see) with ImageMagick, a simpler algorithm for operating over the row files can be used.

         
        • Gawain Lavers
          2005-09-13

          Hmm.  My tabs died.

           
        • adam smith
          2005-09-14

          This is really great. It looks like your basic approach is similar to mine, and this gives people one more option. If you want to write this up more formally, I could add it to the ZoomifyImage documentation. Also, you may want to contact David at Zoomify Corp; he's always looking for things like this to give to people who come to them with support questions.

          I'm sorry that my software wasn't able to meet your needs right now, but I'm glad you found a workaround. I guess then that I won't feel *too* pressured to update my software (you know, life is crazy...). Also, since I don't know IM very well, if I add this support to the product, I will probably refer back to your code, and credit you of course.