Re: [Geotools-devel] GridsPackage

Hello Everybody

Some further comments...

> Using RenderedImage and JAI, we can abstract away the underlying
> data type. You can work as if values were of a double type, no matter
> what the type actually is.

1) The grids package simply stores double precision primitives for cell values.
I had the idea of having a Cell object but it didn't evolve that way.  Also I 
thought it would be simple to have AbstractGrid2DSquareCellFloat, 
AbstractGrid2DSquareCellInt and AbstractGrid2DSquareCellBinary etc.
I fixed the primitive type for all cells in a grid for ease of finding them in a
RandomAccessFile.  There is probably a better way to do this... I am not 
an expert, I just got something working and tried to make some things 
efficient as I needed them.

2) JAI is better as each cell optimally stores itself which (as you explained) 
enables it to cope with large images in memory.

I now agree to use JAI and I think we both agree that it would be a good 
thing to have a custom TileCache implementation.

> JAI has a nice "deferred execution" engine: tiles are computed only
> when first required. Big images with a lot of tiles will take less
> memory with JAI if not all tiles are required.

I think I understand:  Process what is needed and so only have in 
memory what is needed for that.  (There may be an argument also for
processing more if idling just in case it is needed?)
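
To check my understanding, here is a minimal sketch of "compute a tile only when it is first asked for, and keep only a few in memory", in plain Java. It is not JAI's TileCache interface; LazyTiles, its loader function and the loads counter are hypothetical names made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical lazy tile store with least-recently-used eviction (not the JAI API). */
class LazyTiles {
    private final IntFunction<double[]> loader;   // computes or reads one tile
    private final Map<Integer, double[]> cache;
    int loads = 0;                                // how many times the loader ran

    LazyTiles(IntFunction<double[]> loader, final int maxTiles) {
        this.loader = loader;
        // Access-ordered map: the eldest entry is the least recently used tile.
        this.cache = new LinkedHashMap<Integer, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, double[]> eldest) {
                return size() > maxTiles;
            }
        };
    }

    /** Returns the tile, loading it only if it is not already in memory. */
    double[] getTile(int index) {
        double[] tile = cache.get(index);
        if (tile == null) {
            tile = loader.apply(index);
            loads++;
            cache.put(index, tile);
        }
        return tile;
    }
}
```

Asking for the same tile twice runs the loader once; tiles that are never asked for are never loaded at all.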

> The 'getCell' method in Grid2DSquareCellDoubleFile seems to fetch
> the value from the file every time 'getCell' is invoked, which may be a
> performance issue. I think that a TileCache implementation may be a
> better approach here.

You are probably right, at least I had hoped to not need to use 
Grid2DSquareCellDoubleFile directly once Grid2DSquareCellDoubleChunk
was implemented.

Repeatedly reading values from Grid2DSquareCellDoubleFile using getCell
is slow.  There is a getCells() method for getting an array for a circular 
region (although it first grabs a rectangular region).  Perhaps more 
impressively a Grid2DSquareCellDouble can be constructed by defining a
region of an AbstractGrid2DSquareCellDouble.  This is relatively fast in 
the case of the AbstractGrid2DSquareCellDouble being an instanceof
Grid2DSquareCellDoubleFile as the constructor only calls a seek() on the 
file at the end of each line of the rectangular region.  I was planning to do 
some similar optimisation for Grid2DSquareCellDoubleChunk but...
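
The one-seek-per-row idea can be sketched as follows. This is an illustration only, not the grids package code: it assumes the file holds nothing but doubles in row-major order as written by RandomAccessFile.writeDouble(), and RowMajorDoubleFile and readRegion are hypothetical names.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

/** Hypothetical helper for a file of nrows x ncols doubles stored row by row. */
class RowMajorDoubleFile {

    /**
     * Reads rows [row0, row0 + height) and columns [col0, col0 + width)
     * using one seek and one bulk read per row of the region.
     */
    static double[][] readRegion(RandomAccessFile file, int ncols,
                                 int row0, int col0, int height, int width)
            throws IOException {
        double[][] region = new double[height][width];
        byte[] rowBytes = new byte[width * Double.BYTES];
        for (int r = 0; r < height; r++) {
            // Byte offset of the first wanted cell in this row.
            long offset = ((long) (row0 + r) * ncols + col0) * Double.BYTES;
            file.seek(offset);
            file.readFully(rowBytes);
            // ByteBuffer is big-endian by default, matching writeDouble().
            ByteBuffer.wrap(rowBytes).asDoubleBuffer().get(region[r]);
        }
        return region;
    }
}
```

Compared with calling getCell once per cell, this touches the file once per row of the region, which is where the speed-up comes from.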

> Square cells are not a requirement for org.geotools.gc.GridCoverage.
> However, they may be a requirement for some spatial analysis algorithms.
> But that is the algorithm's problem; GridCoverage doesn't have to require
> square cells.

Agreed, at least you are aware that the methods in the grids package are 
based on a 2D Euclidean geometry.  Some work is clearly needed to adapt 
methods for non-square celled grids.

> The 2D vs 3D support is more problematic. Theoretically, a GridCoverage
> can very well be 3D, 4D, 5D, etc. In practice, a 3D GridCoverage is
> allowed in the current Geotools implementation but its support is somewhat
> limited. We may have to come back to this issue later.

Yes, later...

> Actually, there are two kinds of "immutability" in the GCS specification:
> 
> - Immutability of objects (e.g. we can't change the size of a GridRange,
>    nor the GridGeometry's 'gridToCoordinateSystem' transform). This
>    requirement really makes the programming *much* easier. The reason is
>    that when a property changes (e.g. the geographic location of a grid
>    coverage), then some code *outside* GridCoverage may be perturbed. For
>    example a renderer may no longer draw the GridCoverage at the right
>    screen location because it was not notified that the GridCoverage has
>    moved. We would need to register PropertyChangeListeners, which adds a
>    lot of complexity for little gain.
> 
> - The other immutability is the state of pixel values. Here, grid
>    coverages are not completely immutable. According to the OpenGIS spec,
>    some GridCoverages may be writable. But pixel values are the only thing
>    we can edit. Complex features like GridGeometry, CoordinateSystem, etc.
>    are still immutable.
> 
> 
> Note that while GridGeometry is a big object, creating a GridGeometry
> clone with just a few differences doesn't consume that much memory. This
> is because two instances of GridGeometry can share a lot of references
> to the same object (e.g. use the same CoordinateSystem, the same
> GridRange, etc.). A reference consumes only 4 bytes... Sharing the same
> instances is possible because... those instances are immutable :)!

Thanks, that helped...
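
As I understand it, the pattern looks roughly like the sketch below. SketchGridRange and SketchGridGeometry are hypothetical stand-ins written for this mail, not the real Geotools classes, and withOrigin is an invented method name.

```java
/** Hypothetical immutable grid extent (stand-in, not the Geotools GridRange). */
final class SketchGridRange {
    final int width;
    final int height;

    SketchGridRange(int width, int height) {
        this.width = width;
        this.height = height;
    }
}

/** Hypothetical immutable geometry that shares its range between copies. */
final class SketchGridGeometry {
    final SketchGridRange range;   // shared by reference, never copied
    final double originX;
    final double originY;

    SketchGridGeometry(SketchGridRange range, double originX, double originY) {
        this.range = range;
        this.originX = originX;
        this.originY = originY;
    }

    /** "Moving" the geometry returns a new object; the original is untouched. */
    SketchGridGeometry withOrigin(double x, double y) {
        return new SketchGridGeometry(range, x, y);
    }
}
```

The moved copy costs only a small new object, since it reuses the same range instance, and anything still holding the old geometry (a renderer, say) keeps seeing consistent values.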

> >>4) Grid2DSquareCellDouble seems to store pixel values in a Hashtable
> >>    as well as in a double[] array. Does that mean that you intend to
> >>    support sparse matrices? A Hashtable is not really an efficient
> >>    storage mechanism for a dense image.
> > 
> > Yes it stores in both, but only one at a given time.  The optimisation is 
> > handled in a hard coded way based on what seemed about right ( see 
> > optimiseCollection() ).  This is one of several key things that can be done
> > much better.  It relates to the problems of getting available memory and 
> > calculating a priori what is best storage-wise given what is planned 
> > computationally.
> 
> Well, in any case the only case where a Hashtable would be more efficient
> would be with an image with a lot of holes. Let's compute:
> 
> - Using an array of type 'double[]', each pixel value consumes 8 bytes.
> 
> - Using a Hashtable, each pixel value consumes 10 bytes for the Double
>    value (I add 2 bytes for each object instantiated with the HotSpot
>    Client virtual machine; it would be 3 bytes for the server), plus 6
>    bytes for the Integer key, plus 18 bytes for the internal Map.Entry
>    object used internally by Hashtable, plus approximately 6 bytes for
>    the internal Hashtable array. TOTAL = 40 bytes per pixel!!!
> 
> A Hashtable would be more efficient only if less than 20% of the image
> area is filled with data. If the user uses the 'byte' data type rather
> than 'double', then a Hashtable would be more efficient only if less
> than 3% of the whole image area is filled with data!!!!!!!!

(Not a bad guess my optimisation then...)
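
Redoing the sums as a small program, taking the quoted per-object figures at face value (they are estimates from the mail above, not measurements). The 33 bytes for the 'byte' case is my own reading: a 1-byte value plus the same 2-byte object overhead, with the key, entry and table costs unchanged. StorageBreakEven and breakEvenFill are names invented for this sketch.

```java
/** Break-even fill fraction between a dense array and a Hashtable (sketch). */
class StorageBreakEven {

    /**
     * A dense array costs denseBytesPerCell for every cell; a table costs
     * sparseBytesPerEntry only for filled cells. The table wins when the
     * filled fraction is below the ratio returned here.
     */
    static double breakEvenFill(int denseBytesPerCell, int sparseBytesPerEntry) {
        return (double) denseBytesPerCell / sparseBytesPerEntry;
    }

    public static void main(String[] args) {
        int objectOverhead = 2;   // quoted figure for the HotSpot client VM
        int key = 4 + objectOverhead;     // Integer key: 6 bytes
        int entry = 18;                   // internal Map.Entry
        int tableSlot = 6;                // share of the internal array

        int sparseDouble = (8 + objectOverhead) + key + entry + tableSlot; // 40
        int sparseByte = (1 + objectOverhead) + key + entry + tableSlot;   // 33 (assumed)

        System.out.println("double: table wins below " + breakEvenFill(8, sparseDouble));
        System.out.println("byte:   table wins below " + breakEvenFill(1, sparseByte));
    }
}
```

That gives 8/40 = 20% for doubles and 1/33, about 3%, for bytes, which agrees with the figures quoted.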

> Furthermore, fetching data in a Hashtable is much slower than a plain
> array: Hashtable is synchronized, requires "new Integer(...)" for every
> pixel fetch (which puts a lot of overhead on the garbage collector),
> etc. I don't think that the Hashtable solution should be pushed any further.

OK...

> If we want to follow the OpenGIS spirit, then here are my suggestions:
> 
> - Current Grid2D...Processor contains many methods performing different
>    tasks. I suggest splitting them: One task == one class. Those classes
>    will be Operations, which are later used by GridCoverageProcessor.

I'll do that...

> - Those operations should work on RenderedImage, not on
>    Grid2DSquareCellDouble. I suggest that you start with a very
>    simple operation in order to get used to the JAI API. Try the
>    following:
> 
>    - Get a RenderedImage as input.
> 
>    - Use javax.media.jai.iterator.RectIter in order to iterate through
>      all pixel values in this RenderedImage.
> 
>    - Write the result in whatever structure you want (it may be
>      Grid2DSquareCellDouble if you want). Later, I will give you
>      some tips for writing directly into another RenderedImage instead.

Next week hopefully. Have a good weekend.
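
As a first rough cut at that exercise, something like the sketch below. Note that it uses only the standard java.awt.image classes (RenderedImage.getData() and Raster.getSampleDouble()) rather than JAI's RectIter, so it runs without JAI installed but pulls the whole image into memory in one go. RenderedImageToArray and toDoubles are names made up here, and it reads band 0 only.

```java
import java.awt.image.Raster;
import java.awt.image.RenderedImage;

/** Copies one band of a RenderedImage into a plain array (sketch, no JAI). */
class RenderedImageToArray {

    /** Returns band 0 as a [row][column] array of doubles, whatever the pixel type. */
    static double[][] toDoubles(RenderedImage image) {
        // getData() returns a copy of the whole image as a single Raster.
        Raster raster = image.getData();
        int minX = raster.getMinX();
        int minY = raster.getMinY();
        int width = raster.getWidth();
        int height = raster.getHeight();

        double[][] values = new double[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // Samples are addressed in image coordinates, hence the offsets.
                values[y][x] = raster.getSampleDouble(minX + x, minY + y, 0);
            }
        }
        return values;
    }
}
```

The values come back as doubles no matter what the underlying data type is, which is the abstraction discussed at the top of this mail.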

:)

---
Andy Turner
---
CCG, School of Geography, University of Leeds, Leeds, United Kingdom
http://www.geog.leeds.ac.uk/people/a.turner/
+44 (0)113 3433309
