Memory management when writing large files

2013-07-10
2013-09-07
  • David R Thompson

    In general, Spectral Python has been working really well for me. Just one question: is there a preferred way to write portions of a large file without loading the whole thing into memory?

    open() enables partial reads using a read-only memmap object. My expedient hack was to modify this code so that it opens the memory map in "r+" mode. This works fine, but it would be great if there were a more portable solution that didn't require custom modifications to the Spectral Python code.
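
The underlying NumPy mechanism behind that hack can be sketched without touching Spectral Python at all. This is only a minimal sketch, assuming a raw BIL-ordered file whose shape and data type are already known; the file name, the dimensions, and the helper write_line are made up for illustration and are not part of the spectral API.

```python
import os
import tempfile

import numpy as np

# Hypothetical dimensions of a small raw BIL file: (rows, bands, cols).
ROWS, BANDS, COLS = 8, 4, 6
DTYPE = np.float32


def write_line(path, row, values):
    """Overwrite one image line in place without loading the whole file."""
    # "r+" maps an existing file for both reading and writing.
    mm = np.memmap(path, dtype=DTYPE, mode="r+", shape=(ROWS, BANDS, COLS))
    mm[row] = values  # only the part of the file backing this line is written
    mm.flush()        # push the change to disk
    del mm            # drop the reference to the mapping


# Create a zero-filled stand-in for a large image file.
path = os.path.join(tempfile.mkdtemp(), "cube.bil")
np.zeros((ROWS, BANDS, COLS), dtype=DTYPE).tofile(path)

# Replace line 3 with ones, then read the file back to check the result.
write_line(path, 3, np.ones((BANDS, COLS), dtype=DTYPE))
check = np.fromfile(path, dtype=DTYPE).reshape(ROWS, BANDS, COLS)
```

The indexing here follows the BIL layout of the file itself, so it would need to change for BIP or BSQ data.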

     
  • Thomas Boggs

    Thomas Boggs - 2013-07-10

    I intentionally made the object returned by open_image (or similarly by envi.open) a read-only object to avoid the possibility of someone accidentally (and possibly unknowingly) overwriting portions of a huge HSI file. The ability to save and create files was a recent addition to the module but I realize that it can also be useful to just write part of a file (particularly if you are memory-limited).

    Was the change you made simply to open the memmap member of the SpyFile object in "r+" mode? If that is all you need, I suppose I could add an optional "writable" keyword to the open functions/methods that would create the memmap in that mode.

    The one concern I have is whether that might cause confusion for users when opening files with non-BIP interleave. The SpyFile object returned by the open_image function provides an ndarray-like interface that exposes every image as if it were a BIP data cube. For example, if I do this:

    from spectral import open_image

    image = open_image('myfile.hdr')
    pixel = image[30, 30]
    

    The value of pixel will be the image pixel data at row/col (30, 30), regardless of the interleave of the image file. But if I did this:

    pixel = image.memmap[30, 30]
    

    the data in pixel would be mangled if the image file was interleaved BIL or BSQ. So users would need to be explicitly aware of the file's interleave when using the memmap in that way. Maybe I can provide an alternate memmap that is transposed to provide a consistent interface (but first I'll need to find out if transposing a memmap will read all the data from the file).
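
The hazard can be illustrated with plain NumPy arrays standing in for the memmap. This is a sketch under that assumption: no spectral calls are used, the array sizes are arbitrary, and each value simply encodes its own (row, col, band) position so that a mismatch is easy to see.

```python
import numpy as np

ROWS, BANDS, COLS = 5, 3, 4

# Build a cube in BIP order (row, col, band); value = row*100 + col*10 + band.
r, c, b = np.meshgrid(np.arange(ROWS), np.arange(COLS), np.arange(BANDS),
                      indexing="ij")
bip = r * 100 + c * 10 + b

# The same data laid out as BIL (row, band, col), as it would sit in a BIL file.
bil = np.ascontiguousarray(bip.transpose(0, 2, 1))

pixel_bip = bip[2, 1]     # all bands of the pixel at row 2, col 1
naive = bil[2, 1]         # NOT that pixel: band 1 of every column in row 2
pixel_bil = bil[2, :, 1]  # interleave-aware indexing of the same pixel

# A BIP-shaped view of the BIL data; transpose() creates a view, not a copy.
as_bip = bil.transpose(0, 2, 1)
```

With the BIP-shaped view, as_bip[2, 1] returns the same values as pixel_bip, which is the consistent interface described above.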

     
  • David R Thompson

    You're right, the only change was a single character, to open the memmap member in 'r+' mode (I'm working with BIL files so I actually modified bilfile.py, and accessed the memmap using non-standard BIL indexing after that).

    I definitely see the potential for interleave confusion in accessing the memmap directly, so it makes sense to leave writeable memmaps out of the main code tree as you've done. And I'm happy with my current solution (mostly I just wanted to make sure I wasn't missing a more obvious workaround).

    Thanks for writing this library; it's been really helpful for my research.

     
    • Thomas Boggs

      Thomas Boggs - 2013-09-07

      Spectral Python 0.12 was just released and provides a cleaner way to access data via memmap objects. There is now an open_memmap method for open images. By default this method returns a read-only memmap, but you can request a writable memmap as well:

      from spectral import open_image

      image = open_image('myfile.hdr')
      mm = image.open_memmap(writable=True)
      
      # Write 1 to all band values for a pixel
      mm[30, 30] = 1
      

      To address the issue I mentioned in my previous post regarding differing interleaves, the open_memmap method returns a memmap with BIP interleave by default, but you can specify an alternate interleave if you prefer (changing the interleave does not force data to be loaded into memory). Note that you should no longer access the memmap member of open images directly: doing so should no longer be necessary, and that member has been abstracted. For additional details, see the web site.

       
  • Thomas Boggs

    Thomas Boggs - 2013-07-11

    I explored this a bit and it appears that when I create a transpose of a memmap, it produces a new memmap object which uses a reference to the original. More importantly, it does not force the original memmap to read all data from the file.

    So I could add a new method to the SpyFile class that returns a writable memmap with BIP interleave by default. Actually, it would probably be even better to have a general get_memmap method in the SpyFile class that accepts an optional interleave (BIP would be the default) and an optional access mode, which would be "r" by default so the user would explicitly have to request a writable memmap.
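
A rough sketch of what such a helper might look like, written as a standalone function over a raw data file rather than as a SpyFile method. Everything here is hypothetical: the function signature, the _AXES table, and the test file are assumptions made for illustration, and this is not the code that ships in Spectral Python.

```python
import os
import tempfile

import numpy as np

# Axis order of each interleave (hypothetical lookup table for this sketch).
_AXES = {"bip": ("row", "col", "band"),
         "bil": ("row", "band", "col"),
         "bsq": ("band", "row", "col")}


def get_memmap(path, dtype, rows, cols, bands, file_interleave,
               interleave="bip", mode="r"):
    """Return a memmap view of a raw image file in the requested interleave."""
    sizes = {"row": rows, "col": cols, "band": bands}
    src = _AXES[file_interleave]
    mm = np.memmap(path, dtype=dtype, mode=mode,
                   shape=tuple(sizes[a] for a in src))
    # transpose() only reorders axes on a view; no pixel data is copied.
    return mm.transpose([src.index(a) for a in _AXES[interleave]])


# Small BIL test file: 2 rows, 3 bands, 4 columns.
path = os.path.join(tempfile.mkdtemp(), "cube.bil")
np.arange(2 * 3 * 4, dtype=np.int16).reshape(2, 3, 4).tofile(path)

# Read-only by default; a writable map must be requested explicitly.
ro = get_memmap(path, np.int16, rows=2, cols=4, bands=3, file_interleave="bil")
rw = get_memmap(path, np.int16, rows=2, cols=4, bands=3,
                file_interleave="bil", mode="r+")

rw[1, 2] = -1  # BIP-style indexing: all bands of the pixel at row 1, col 2
rw.flush()

on_disk = np.fromfile(path, dtype=np.int16).reshape(2, 3, 4)
```

Defaulting mode to "r" keeps accidental overwrites impossible unless the caller opts in, which matches the intent described above.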

     
