Andy Turner wrote:
> I thought I would answer some of your questions directly and inline with
> an edit of your reply. Please edit and forward this to the list if you [...]
> it right.
If you don't mind, I suggest we continue posting on geotools-devel.
Maybe other people are interested in following the discussion?
>>1) This package seems to be built on top of Geotools 1, or did I miss
>> something? (there is an import uk.ac.leeds.ccg.raster.Raster).
> The import is only so that the grids can be constructed from a
> uk.ac.leeds.ccg.raster.Raster and that the results can be output
> as one. The import can easily be removed and is only there for
> my convenience as to display in colour sometimes I used GeoTools1
> (more often I would output to ESRI ASCII and import into ArcGIS).
Ah! Now I understand. Maybe the next version of the package could be
modified to do the same with java.awt.image.RenderedImage
rather than uk.ac.leeds.ccg.raster.Raster? It would be a smaller step
toward the J2SE imaging classes (smaller than building on top of JAI
right now).
>>2) Is there any reason why this package is not built on top of JAI?
> The package was developed for the computational methods mainly in
> Grid2DSquareCellDoubleProcessor but also now for processing DEMs in
> Grid2DSquareCellDoubleProcessorDEM. There is no considered reason why
> this package does not use JAI. It probably should, but I think there are
> issues with JAI data storage of very large images.
I don't think so... Grid2DSquareCellDouble stores everything as double
numbers. JAI lets you choose between 'byte', 'short', 'int',
'float' and 'double'. It is truly exceptional to have measurements so
accurate that double precision is needed. A JAI user can use 'float'
instead if he wants, and consequently use only half the memory required
by Grid2DSquareCellDouble. Furthermore, JAI has a nice "deferred
execution" engine: tiles are computed only when first required. Big
images with a lot of tiles will take less memory with JAI if not all
tiles are required.
As for the memory storage requirement (i.e. swapping to disk), I think that
we can address this need with a custom TileCache implementation if
needed. The 'getCell' method in Grid2DSquareCellDoubleFile seems to fetch
the value from the file every time 'getCell' is invoked, which may be a
performance issue. I think that a TileCache implementation may be a
better approach here.
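
To illustrate the caching idea, here is a minimal sketch of a
least-recently-used tile cache in plain Java. It is not the JAI TileCache
interface; the class name LruTileCache, the 'loader' function (standing in
for whatever reads one tile from the file) and the 'loads' counter are
assumptions made for this example only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical LRU cache: keeps at most 'capacity' tiles in memory. */
final class LruTileCache {
    private final IntFunction<double[]> loader; // assumed: reads one tile from disk
    private final LinkedHashMap<Integer, double[]> tiles;
    int loads; // number of times the loader was called (for illustration)

    LruTileCache(final int capacity, IntFunction<double[]> loader) {
        this.loader = loader;
        // accessOrder = true: iteration order goes from least to most recently used
        this.tiles = new LinkedHashMap<Integer, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, double[]> eldest) {
                return size() > capacity; // evict the least recently used tile
            }
        };
    }

    /** Returns the tile, loading it only if it is not already cached. */
    double[] getTile(int index) {
        double[] tile = tiles.get(index);
        if (tile == null) {
            tile = loader.apply(index);
            loads++;
            tiles.put(index, tile);
        }
        return tile;
    }
}
```

With such a cache, repeated 'getCell' calls falling in the same tile would
hit memory rather than the file.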
>>3) Is it related to the OpenGIS GridCoverage specification? This package's
>> API seems to be a custom one.
> I read the specification only after I had gone some way with this. The
> OpenGIS GridCoverage specification seems reasonable in most ways.
> The reason the package does not implement the specification is that it
> is a lot of work and I have spent time on spatial analysis method
> development and data structures hoping that the rest can be fitted in
> around this at a later stage.
You are right, implementing OGC specs is a lot of work :). However, it may
be pretty hard to fit in at a later stage if the package is not designed in
an "OpenGIS way" at an early stage. For example, transforming geographic
coordinates to grid coordinates is a fundamental operation, and
the OpenGIS way may be different enough to make the change non-trivial.
I suggest relying on java.awt.image.RenderedImage for the data structure and
on the current org.geotools.gc.GridCoverage for "close to" OpenGIS compliance,
which already has a lot of work in it :). We are left with the spatial
analysis methods, where there is clearly a lot of room for nice [...]
> An added difficulty is that this package is
> based on every value being of a double type. Incidentally that is why
> "Double" is in class names. Additionally the width and height of each
> pixel (cell) of a grid is also assumed the same hence "SquareCell". Also
> "2D" is there as I had planned to also develop "3D"...
Using RenderedImage and JAI, we can abstract away the underlying
data type. You can work as if values were of a double type, no matter
what the type actually is.
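
As a small illustration using only the standard java.awt.image classes
(no JAI needed), the method below reads samples as doubles whatever the
storage type of the image is. The class name SampleAsDouble is made up for
this sketch, and 'getData()' copies the whole image, which is acceptable
only for a small demonstration.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.RenderedImage;

final class SampleAsDouble {
    /** Sums band 0 of any RenderedImage, reading each sample as a double. */
    static double sum(RenderedImage image) {
        Raster data = image.getData(); // copy of the whole image (demo only)
        double total = 0;
        for (int y = data.getMinY(); y < data.getMinY() + data.getHeight(); y++) {
            for (int x = data.getMinX(); x < data.getMinX() + data.getWidth(); x++) {
                total += data.getSampleDouble(x, y, 0);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Same algorithm, two different storage types.
        BufferedImage bytes  = new BufferedImage(2, 2, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage shorts = new BufferedImage(2, 2, BufferedImage.TYPE_USHORT_GRAY);
        bytes.getRaster().setSample(1, 1, 0, 200);
        shorts.getRaster().setSample(1, 1, 0, 40000);
        System.out.println(sum(bytes));
        System.out.println(sum(shorts));
    }
}
```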
Square cells are not a requirement for org.geotools.gc.GridCoverage.
However, they may be a requirement for some spatial analysis algorithms.
But that is the algorithm's problem; GridCoverage doesn't have to require it.
The 2D vs 3D support is more problematic. In theory, a GridCoverage
can very well be 3D, 4D, 5D, etc. In practice, a 3D GridCoverage is
allowed in the current Geotools implementation but its support is somewhat
limited. We may have to come back to this issue later.
> There are parts of the OpenGIS GridCoverage specification that I find
> hard to understand. For example, immutability requirements, one way
> I interpreted them makes it very hard to see how they can be efficient.
Actually, there are two kinds of "immutability" in the GCS specification:
- Immutability of objects (e.g. we can't change the size of a GridRange,
  nor the GridGeometry's 'gridToCoordinateSystem' transform). This
  requirement really makes the programming *much* easier. The reason is
  that when a property changes (e.g. the geographic location of a grid
  coverage), some code *outside* GridCoverage may be disturbed. For
  example, a renderer may no longer draw the GridCoverage at the right
  screen location because it was not notified that the GridCoverage has
  moved. We would need to register PropertyChangeListeners, which adds a
  lot of complexity for little gain.
- The other immutability is the state of pixel values. Here, grid
  coverages are not completely immutable. According to the OpenGIS spec,
  some GridCoverages may be writable. But pixel values are the only thing
  we can edit. Complex features like GridGeometry, CoordinateSystem, etc.
  remain immutable.
Note that while GridGeometry is a big object, creating a GridGeometry
clone with just a few differences doesn't consume that much memory. This
is because two instances of GridGeometry can share a lot of references
to the same objects (e.g. use the same CoordinateSystem, the same
GridRange, etc.). A reference consumes only 4 bytes... Sharing the same
instances is possible because... those instances are immutable :)!
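
A minimal sketch of this sharing, using made-up stand-in classes
(GridRangeStub, CoordinateSystemStub and GridGeometryStub are hypothetical
and are not the Geotools classes): the "modified clone" allocates one small
object and reuses the references to the immutable parts.

```java
/** Hypothetical immutable stand-ins, for illustration only. */
final class GridRangeStub {
    final int width, height;
    GridRangeStub(int width, int height) { this.width = width; this.height = height; }
}

final class CoordinateSystemStub {
    final String name;
    CoordinateSystemStub(String name) { this.name = name; }
}

final class GridGeometryStub {
    final GridRangeStub range;
    final CoordinateSystemStub cs;
    final double originX, originY; // geographic location of the grid origin

    GridGeometryStub(GridRangeStub range, CoordinateSystemStub cs,
                     double originX, double originY) {
        this.range = range;
        this.cs = cs;
        this.originX = originX;
        this.originY = originY;
    }

    /** "Moves" the grid by creating a new geometry that shares range and cs. */
    GridGeometryStub withOrigin(double x, double y) {
        return new GridGeometryStub(range, cs, x, y);
    }
}
```

Because nothing can modify 'range' or 'cs' after construction, the original
geometry and the moved one can safely point to the same instances.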
>>4) Grid2DSquareCellDouble seems to store pixel values in a Hashtable
>> as well as in a double array. Doesn't it mean that you intend to
>> support sparse matrices? A Hashtable is not really an efficient storage
>> mechanism for a dense image.
> Yes it stores in both, but only one at a given time. The optimisation is
> handled in a hard coded way based on what seemed about right ( see
> optimiseCollection() ). This is one of several key things that can be
> much better. It relates to the problems of getting available memory and
> calculating what is best a priori storage wise given what is planned
Well, in any case the only situation where a Hashtable would be more
efficient is an image with a lot of holes. Let's compute:
- Using an array of type 'double', each pixel value consumes 8 bytes.
- Using a Hashtable, each pixel value consumes 10 bytes for the Double
  value (I add 2 bytes for each object instantiated with the HotSpot
  Client virtual machine; it would be 3 bytes for the server), plus 6
  bytes for the Integer key, plus 18 bytes for the internal Map.Entry
  object used internally by Hashtable, plus approximately 6 bytes for
  the internal Hashtable array. TOTAL = 40 bytes per pixel!!!
A Hashtable would be more efficient only if less than 20% of the image
area is filled with data. If the user uses the 'byte' data type rather
than 'double', then a Hashtable would be more efficient only if less
than 3% of the whole image area is filled with data!
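
The break-even fractions follow directly from the byte counts quoted above.
The sketch below only redoes that arithmetic; the class and constant names
are invented for the example, and the per-object figures are the ones given
in this message, not measurements.

```java
final class StorageBreakEven {
    // Figures quoted in the text above (assumptions, not measured values).
    static final int OBJECT_OVERHEAD_BYTES = 2;        // per boxed value
    static final int HASH_OVERHEAD_BYTES = 6 + 18 + 6; // Integer key + Map.Entry + table slot

    /**
     * Fraction of filled cells below which a hash table uses less memory
     * than a dense array, for samples of the given size in bytes.
     */
    static double breakEvenFill(int bytesPerSample) {
        int bytesPerHashedCell = bytesPerSample + OBJECT_OVERHEAD_BYTES + HASH_OVERHEAD_BYTES;
        return (double) bytesPerSample / bytesPerHashedCell;
    }

    public static void main(String[] args) {
        System.out.println(breakEvenFill(8)); // 'double' samples: 8 / 40
        System.out.println(breakEvenFill(1)); // 'byte' samples:   1 / 33
    }
}
```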
Furthermore, fetching data from a Hashtable is much slower than from a plain
array: Hashtable is synchronized, requires "new Integer(...)" for every
pixel fetch (which puts a lot of overhead on the garbage collector),
etc. I don't think that the Hashtable solution should be pushed any further.
>>5) How do we construct grids with more than one band?
> I can see that this could be done in two ways but neither are implemented.
> One option is to have an array of grids indexed by band. Another would
> be to change the internal storage of each grid cell as an array.
The first solution is what java.awt.image.DataBufferDouble does. The
second solution would consume too much memory and put too much overhead
on the garbage collector.
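
For illustration, here is one way to get a "one array per band" layout with
the standard classes: a DataBufferDouble with one bank per band, addressed
through a BandedSampleModel. The class name BandedGrid is made up for this
sketch.

```java
import java.awt.image.BandedSampleModel;
import java.awt.image.DataBuffer;
import java.awt.image.DataBufferDouble;
import java.awt.image.Raster;
import java.awt.image.WritableRaster;

final class BandedGrid {
    /** Creates a raster whose bands are stored in separate double[] banks. */
    static WritableRaster create(int width, int height, int bands) {
        // One bank (a plain double[] of width*height elements) per band.
        DataBufferDouble buffer = new DataBufferDouble(width * height, bands);
        BandedSampleModel model =
                new BandedSampleModel(DataBuffer.TYPE_DOUBLE, width, height, bands);
        return Raster.createWritableRaster(model, buffer, null);
    }

    public static void main(String[] args) {
        WritableRaster grid = create(4, 3, 2);
        grid.setSample(1, 2, 1, 3.5); // x=1, y=2, band 1
        double[] band1 = ((DataBufferDouble) grid.getDataBuffer()).getData(1);
        System.out.println(band1[2 * 4 + 1]); // same value, read from the bank
    }
}
```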
>>7) The Grid2DSquareCellDouble constructor expects the minimum and maximum
>> latitude and longitude as arguments. How do I tell, for example,
>> that I want to invert the Y axis direction (e.g. make it point
>> toward North rather than toward South as in the default Java2D
>> coordinate system)?
> There are a number of constructors. There is no method that flips a grid
> upside down. I have always treated the axes as fundamental orthogonal
> unchangeable and although we are restricted (and this is not handled well)
> theoretically infinite.
Rather than a method for flipping axes, a more general solution is to
accept a MathTransform in the constructor rather than a bounding box.
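
To show why a transform is more general than a bounding box, here is a
sketch that uses java.awt.geom.AffineTransform as a stand-in for a
MathTransform (an assumption made so the example needs nothing beyond the
standard library). The class and method names are hypothetical. The axis
direction is just the sign of one coefficient.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

final class GridToGeographic {
    /** Row 0 at the northern edge: Y decreases as the row index grows. */
    static AffineTransform northUp(double minX, double maxY, double cellSize) {
        // x' = cellSize * col + minX ;  y' = -cellSize * row + maxY
        return new AffineTransform(cellSize, 0, 0, -cellSize, minX, maxY);
    }

    /** Row 0 at the southern edge: Y increases with the row index. */
    static AffineTransform southUp(double minX, double minY, double cellSize) {
        // x' = cellSize * col + minX ;  y' = cellSize * row + minY
        return new AffineTransform(cellSize, 0, 0, cellSize, minX, minY);
    }

    public static void main(String[] args) {
        AffineTransform gridToGeo = northUp(-10, 50, 0.5);
        Point2D geo = gridToGeo.transform(new Point2D.Double(2, 3), null);
        System.out.println(geo.getX() + ", " + geo.getY());
    }
}
```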
>>"noDataValues" are part of the OpenGIS GCS specification and are already
>>supported by gcs-gridcoverage. Actually, I found only one noDataValue
>>field in AbstractGrid2DSquareCellDouble while, according to OpenGIS GCS, a
>>grid coverage can contain many different "noDataValues". The
>>gcs-coverage implementation supports multi-noDataValues.
> I have yet to need more than one noDataValue for a grid. I cannot imagine a
> circumstance when another "masking" grid could not be used more efficiently
> instead. However the grids I am using tend to only have one band and there
> are probably other reasons which I do not understand...
In my laboratory, we have images of Sea Surface Temperature built like that:
- Sample value 0 means "nothing known about this pixel"
- Sample value 9 means "clouds"
- Sample value 240 means "land"
- Sample values 50 to 230 are Sea Surface Temperatures, to be converted
  into °C using the formula T(°C) = 15 + sampleValue/10
In this example, sample values 0, 9 and 240 are all "no data values". We
have different noDataValues because the data are missing for different
reasons. I think that another masking grid would be less efficient
here. It would force me to check three images before knowing whether a
pixel is valid, and three images consume more memory than one.
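
The encoding above can be written down as a small decoder. This is only a
sketch: the class name SstDecoder and the category labels are invented, the
division is assumed to be a floating-point one, and sample values not
listed above (e.g. 10 to 49) are treated as "unspecified" since their
meaning was not given.

```java
final class SstDecoder {
    /** Tells why a sample carries no temperature, or that it carries one. */
    static String category(int sample) {
        if (sample == 0)   return "unknown";
        if (sample == 9)   return "clouds";
        if (sample == 240) return "land";
        if (sample >= 50 && sample <= 230) return "temperature";
        return "unspecified"; // assumption: not described in the text above
    }

    /** Temperature in degrees Celsius, or NaN for any "no data" sample. */
    static double toCelsius(int sample) {
        if (sample >= 50 && sample <= 230) {
            return 15 + sample / 10.0; // T = 15 + sampleValue/10
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        System.out.println(category(9) + " " + toCelsius(9));
        System.out.println(category(100) + " " + toCelsius(100));
    }
}
```

A single image is enough here: one lookup gives both the validity of the
pixel and the reason why it is missing.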
>>- AbstractGrid2DSquareCellDouble seems to be basically a mix of the
>> following classes:
>> java.awt.image.DataBufferDouble for holding the data
>> java.awt.image.Raster for accessing pixel data
>> org.opengis.gc.GC_GridCoverage for the geographic information
>> I think that the above separation is good. It makes it possible to
>> use the same algorithm in a top-level class (e.g. GC_GridCoverage)
>> no matter how the data are stored in the low-level class (e.g.
>> DataBufferFloat, DataBufferDouble, etc.).
> I agree with you for "small" grids. But my problem has been processing
> very large grids. The grids package support the low-level not being in
> "memory" memory but in "filespace" memory.
What do we mean by "large grids" here? (i.e. which size?)
Anyway, even a huge grid is still workable with JAI. We don't have to keep
all tiles in memory at the same time. A huge grid could very well have
some of its tiles in memory, and some of them swapped to disk. This is
an implementation issue and can be managed well with JAI.
> More thought is needed to work out the best way to proceed in my mind.
> At the moment the org.geotools.gc.GridCoverage needs to hold all the
> data in memory. The uk.ac.leeds.ccg.grids.Grid2DSquareCellDoubleChunk
> does not and offers something like JAI tile caching.
GridCoverage doesn't need to hold all data in memory. Actually,
GridCoverage has no business saying where the data should live. This
is up to RenderedImage. If we have a custom RenderedImage implementation
that stores data in a file, then GridCoverage has no problem with that.
Actually, JAI already has a RenderedImage implementation that stores the
pixel data... on the network (which is quite exciting too!). The pixels
can be on a distant machine and tiles are sent one at a time through
the network when required.
> One way to integrate the methods in Grid2DSquareCellDoubleProcessor
> and Grid2D....ProcessorDEM is to write into the factories and constructors
> of the uk.ac.leeds.ccg.grids package methods for taking in the data from
> an org.geotools.gc.GridCoverage.
> Re writing the Grid2D...Processor* methods inside
> org.geotools.gc.GridCoverage is also a good plan, but is that lots of
> unnecessary work?
> To fit in the GT2 structure the uk.ac.leeds.ccg.grids package can also be
> renamed to live somewhere under org.geotools.gc.
> Thanks for your reply. I look forward to having an IRC and working together.
The Grid2D...Processor methods shouldn't be inside GridCoverage. It is
not GridCoverage's job. It is GridCoverageProcessor's business instead.
If we want to follow the OpenGIS spirit, here are my suggestions:
- The current Grid2D...Processor contains many methods performing different
  tasks. I suggest splitting them: one task == one class. Those classes
  will be Operations, which are later used by GridCoverageProcessor.
- Those operations should work on RenderedImage, not on
  Grid2DSquareCellDouble. I suggest that you start with a very
  simple operation in order to get used to the JAI API. Try the
  following:
  - Get a RenderedImage as input.
  - Use javax.media.jai.iterator.RectIter to iterate through
    all pixel values in this RenderedImage.
  - Write the result in whatever structure you want (it may be
    Grid2DSquareCellDouble if you want). Later, I will give you
    some tips for writing directly into another RenderedImage instead.
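
As a starting point, the three steps above might look like the sketch
below. To keep it runnable without JAI it walks the tiles with the standard
RenderedImage methods instead of javax.media.jai.iterator.RectIter; with
JAI installed, a RectIter would replace the explicit loops. The class name
ScaleOperation and the "multiply by a factor" task are only examples.

```java
import java.awt.image.BufferedImage;
import java.awt.image.Raster;
import java.awt.image.RenderedImage;

final class ScaleOperation {
    /**
     * Multiplies every band-0 sample by 'factor'. The result is indexed as
     * result[row][column], relative to the upper-left corner of the image.
     */
    static double[][] apply(RenderedImage image, double factor) {
        int minX = image.getMinX(), minY = image.getMinY();
        int maxX = minX + image.getWidth(), maxY = minY + image.getHeight();
        double[][] result = new double[image.getHeight()][image.getWidth()];
        int lastTileY = image.getMinTileY() + image.getNumYTiles();
        int lastTileX = image.getMinTileX() + image.getNumXTiles();
        for (int ty = image.getMinTileY(); ty < lastTileY; ty++) {
            for (int tx = image.getMinTileX(); tx < lastTileX; tx++) {
                Raster tile = image.getTile(tx, ty); // only one tile at a time
                // Tiles may extend past the image bounds, so clip them.
                int x0 = Math.max(tile.getMinX(), minX);
                int y0 = Math.max(tile.getMinY(), minY);
                int x1 = Math.min(tile.getMinX() + tile.getWidth(), maxX);
                int y1 = Math.min(tile.getMinY() + tile.getHeight(), maxY);
                for (int y = y0; y < y1; y++) {
                    for (int x = x0; x < x1; x++) {
                        result[y - minY][x - minX] = factor * tile.getSampleDouble(x, y, 0);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BufferedImage image = new BufferedImage(3, 2, BufferedImage.TYPE_BYTE_GRAY);
        image.getRaster().setSample(2, 1, 0, 10);
        System.out.println(apply(image, 0.5)[1][2]);
    }
}
```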