wdb-development Mailing List for wdb - weather/water database system

wdb-development — The development mailing list for WDB.



[WDB-Development] Closing this list

From: Michael A. <mic...@me...> - 2014-10-16 11:36:06

Hi,

This is just to let you know that we are closing down this mailing list.

A new mailing list has been established at wdb...@li...

You can sign up to it here:
http://lists.met.no/mailman/listinfo/wdb-users

Regards,

Michael Akinde

[WDB-Development] Fwd: WDB parameter names and the cf metadata standard

From: Michael A. <mic...@me...> - 2010-09-28 18:01:23

Resending this mail (it bounced the first time).

----- Forwarded Message -----
From: "Michael Akinde" <mic...@me...>
To: wdb...@li...
Cc: "Heiko Klein" <he...@me...>, "Lisbeth Bergholt" <lis...@me...>, "Aleksandar Babic" <ale...@me...>, "Juergen Schulze" <jue...@me...>, "Audun D. Christoffersen" <au...@me...>, "Håvard Futsæter" <ha...@me...>, "Harald Skoglund" <har...@me...>
Sent: Friday, September 24, 2010 3:01:11 PM
Subject: WDB parameter names and the cf metadata standard

Hi all,

There is a lot of interest in making the WDB parameter names work better together with the CF metadata standard. This would have many benefits, not least the ability to interface much more easily with other open source projects such as FIMEX, et al. CF metadata is also increasingly being adopted by many projects as the parameter name standard, so making it easy to use CF metadata would improve the general usability of WDB.

There are issues with CF metadata with respect to WDB's use of parameters, however. I have summarized the problems, and a proposed solution, here:

https://sourceforge.net/apps/trac/wdb/wiki/DevCfParameters

The solution requires some significant changes in the WDB data model, which means that it would be best to get these changes done before v1.0.0 if they are to be done at all. At the moment, I think this would probably be worthwhile (it is much easier to discuss metadata based on a standard than having to explain and document the current design).

For users of WDB, the main differences will be:
- users should shift to a different parameter namespace.
- the names of some parameters will change (again).

Thoughts and comments?

Regards,

Michael A.

Re: [WDB-Development] WDB parameter names and the cf metadata standard

From: Michael A. <mic...@me...> - 2010-09-27 12:02:42

Heiko wrote:
> that looks good. I would suggest an additional function:
> 
> wci.findParametersByCFStandardName(input text)
> 
> which might return several WDB parameters.

Yes.
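For illustration, a function along the lines Heiko suggests could work roughly as below. This is a minimal Python sketch, not the WCI API: the name `find_parameters_by_cf_standard_name` and the in-memory `CANONICAL_PARAMETERS` tuple are hypothetical stand-ins for a database lookup, and canonical names are assumed to take the "standard name [cell method]" form discussed in this thread.

```python
import re
from typing import List, Sequence

# Hypothetical canonical (namespace 0) parameter names of the form
# "standard name [cell method]"; a real lookup would query the WDB tables.
CANONICAL_PARAMETERS = (
    "air temperature",
    "air temperature [time: maximum]",
    "air temperature [time: minimum]",
    "air pressure",
)

# Matches an optional trailing "[...]" cell-method qualifier.
_CELL_METHOD = re.compile(r"\s*\[[^\]]*\]\s*$")


def find_parameters_by_cf_standard_name(
    standard_name: str, parameters: Sequence[str] = CANONICAL_PARAMETERS
) -> List[str]:
    """Return every parameter whose standard-name part equals standard_name.

    CF spells standard names with underscores (air_temperature), while the
    canonical WDB names assumed here use spaces, so normalise first.
    """
    wanted = standard_name.strip().replace("_", " ").lower()
    return [p for p in parameters if _CELL_METHOD.sub("", p).lower() == wanted]


print(find_parameters_by_cf_standard_name("air_temperature"))
# ['air temperature', 'air temperature [time: maximum]', 'air temperature [time: minimum]']
```

One CF standard name can thus map to several WDB parameters, one per cell method.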

> It seems like you use and extend
> http://cf-pcmdi.llnl.gov/documents/cf-standard-names/guidelines to
> construct new standard names (+ cell method). CF standard_names are
> not intended to be unique. So maybe you need an additional ID.

No, the idea is not to produce new standard names (though I have seen one institution/project that argued for adding functions like max/min to the CF standard). I think the CF standard is fine as is; it is well suited to its purpose. The proposal here is to ensure that the WDB parameter names (in use by met.no) stay close to, or "compatible" with, the CF standard names.

Standard name + cell method would seem to be unique enough for our purposes.

> standard_names will never be complete. Adding new items to CF takes
> weeks to months (much faster than WMO Volume C (grib)). You should allow
> for names that are not yet in the standard.

Not a problem. The only issue would be if the standard is extended by additional qualifications, but that is not likely to occur very often.

> CF specifies clearly how to distinguish all parts, i.e. 'at',
> 'assuming', 'in', 'due to' + restricted vocabulary for [component] and
> [surface]. How do you distinguish your additions [cell method] and
> [function]? (Do you mean that double brackets mean you have to write at
> least one, while single brackets mean 'this is optional'?)

Sorry, I was not very clear about that. Yes, the double brackets are supposed to indicate that one of them is written.

So: 
air temperature
air temperature [time: maximum within days]

And so on.
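As a sketch of how such names could be converted to and from CF attributes programmatically, the following Python splits a canonical name into a CF standard name and an optional cell method. It assumes at most one bracketed qualifier and that spaces in WDB names correspond to underscores in CF standard names; `parse_canonical` and `to_canonical` are hypothetical helpers, not part of WDB.

```python
import re
from typing import NamedTuple, Optional


class CanonicalName(NamedTuple):
    standard_name: str          # CF form, e.g. "air_temperature"
    cell_method: Optional[str]  # e.g. "time: maximum within days", or None


# "<standard name>" optionally followed by one "[<cell method>]" qualifier.
_CANONICAL = re.compile(r"^\s*(?P<name>[^\[\]]+?)\s*(?:\[(?P<method>[^\[\]]+)\])?\s*$")


def parse_canonical(name: str) -> CanonicalName:
    """Split a canonical WDB name into a CF standard name and a cell method."""
    m = _CANONICAL.match(name)
    if m is None:
        raise ValueError(f"not a canonical parameter name: {name!r}")
    method = m.group("method")
    return CanonicalName(m.group("name").replace(" ", "_"),
                         method.strip() if method else None)


def to_canonical(standard_name: str, cell_method: Optional[str] = None) -> str:
    """Inverse of parse_canonical: build the canonical WDB name."""
    base = standard_name.replace("_", " ")
    return f"{base} [{cell_method}]" if cell_method else base


print(parse_canonical("air temperature [time: maximum within days]"))
```

The two parts map directly onto the `standard_name` and `cell_methods` attributes of a CF variable, which is what makes the round trip mechanical.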

> I don't like the [function]: maybe everybody at met.no understands
> 'max air temperature' is the same as 'air temperature [time: maximum]', but
> don't show this to a sensor developer, who will think [xy: maximum]

Note that WDB has multiple namespaces (namespaces are distinct domains with different names for the same parameters).

The canonical namespace (namespace 0) would contain the "standard name [cell methods]" form of parameter names. This will be ideally suited for applications that translate to/from NetCDF/cf-metadata names, since it is possible to programmatically generate the cf-metadata standard name from the canonical WDB parameter names and vice versa.

The met.no namespace (namespace 88) has parameters built on the form "function standard name"; e.g.,

air temperature
max air temperature (could be "maximum air temperature", but since the function designations are non-standard, we might as well stick to those we already use in WDB).

The met.no namespace will be the default usage namespace (previously the canonical namespace was the default for data provider and parameter, but not for places, where we use a similar system of canonical = machine usage, met.no namespace = human readable).

The key to making this work from a WDB point of view is that the approach requires minimal maintenance: both the canonical and the met.no namespace can be maintained easily based on the CF metadata standard (the met.no namespace will require a smallish table mapping the function names we use to cell methods), and the majority of the maintenance can be handled programmatically.

From the database point of view, the main weaknesses of the approach (other than having to rename parameters again) are that people may use the names inconsistently (e.g., using "specific gravitational potential energy" in one case and "specific potential energy" in another in the same database), and that some CF metadata parameters seem to mix the level into the parameter name (which is frowned upon in the WDB context). The CF metadata standard name aliases can be handled if we wish, though.
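The "smallish table" mapping met.no function names to cell methods might work roughly like this minimal Python sketch. It assumes a hypothetical two-entry mapping (`max`, `min`) and treats the first word of a met.no name as the function; a real table would need to guard against standard names that happen to begin with a function word.

```python
# Hypothetical table mapping met.no function words to CF cell methods.
FUNCTION_TO_CELL_METHOD = {
    "max": "time: maximum",
    "min": "time: minimum",
}
CELL_METHOD_TO_FUNCTION = {v: k for k, v in FUNCTION_TO_CELL_METHOD.items()}


def metno_to_canonical(name: str) -> str:
    """'max air temperature' -> 'air temperature [time: maximum]'."""
    function, _, rest = name.partition(" ")
    method = FUNCTION_TO_CELL_METHOD.get(function)
    if method is None or not rest:
        return name  # no known function prefix: already a bare standard name
    return f"{rest} [{method}]"


def canonical_to_metno(name: str) -> str:
    """'air temperature [time: maximum]' -> 'max air temperature'."""
    if not (name.endswith("]") and " [" in name):
        return name
    base, _, method = name[:-1].rpartition(" [")
    function = CELL_METHOD_TO_FUNCTION.get(method)
    if function is None:
        raise ValueError(f"no met.no function for cell method {method!r}")
    return f"{function} {base}"


print(metno_to_canonical("max air temperature"))
```

Because everything else is derived from the CF name itself, only this table has to be maintained by hand.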

> I understand your approach ("put all metadata into one
> string"), but it conflicts with "make that string easily human
> readable". Question to the developers of graphical applications: How
> many characters will a user accept for a string?
> 
> A standard name used at met.no (LF), which might still be extended by
> due_to_..._assuming_... (excluding your extensions), is:
> 
> atmosphere_mass_content_of_secondary_particulate_organic_matter_dry_aerosol
> 
> We give it the long_name (= display name according to netcdf-user
> guide): SOA

IMO, descriptive names are always better - then I can at least pretend to understand what I'm reading, rather than being completely clueless. ;-)

I think I've mentioned this before in the context of Diana; from WDB's point of view, it is not a problem to implement a separate Diana namespace containing all the abbreviations one wishes. We (as in the WDB group) just do not want any part in maintaining it, which is another way of saying that I think it would be a bad idea.

Long names should not be a problem from an application developer's point of view; after all, it is just a string. For a user of WDB, longer names should also not be too problematic, since WDB supports extensive use of wild cards (e.g., "%atmosphere%" would retrieve all parameters with "atmosphere" in their names).
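Assuming the wild cards follow SQL LIKE semantics (`%` matches any run of characters, `_` a single character), the matching can be sketched in Python as follows. The helper names are hypothetical, escape characters are not handled, and matching is case-sensitive, as with PostgreSQL's LIKE (ILIKE is the case-insensitive variant).

```python
import re
from typing import Iterable, List


def like_to_regex(pattern: str):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")        # any run of characters, possibly empty
        elif ch == "_":
            parts.append(".")         # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def match_parameters(pattern: str, names: Iterable[str]) -> List[str]:
    """Return the names matched by the whole LIKE pattern."""
    rx = like_to_regex(pattern)
    return [n for n in names if rx.fullmatch(n)]


names = [
    "atmosphere mass content of secondary particulate organic matter dry aerosol",
    "air temperature",
    "atmosphere boundary layer thickness",
]
print(match_parameters("%atmosphere%", names))
```

A short pattern is enough to reach a long descriptive name, which is the point made above.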

Hope that cleared up some of the issues.

Regards,

Michael A.

Re: [WDB-Development] wci.read on wdb2ts

From: Michael O. A. <mic...@me...> - 2008-11-06 22:58:31

> The reasoning seems sound to me, especially since we know that the
> boundary check is inaccurate. However, I would like to know more about
> the results of the other optimizations: last time I looked, the difference
> in performance between fetching a point and fetching a reference to an
> entire field was fairly small. If this is still true, I believe the
> benefits of keeping the check may outweigh the benefits of removing
> it.

This will not be the case; dropping the bounds check from the join
generally reduces query execution time by an order of magnitude: under
20 ms instead of 100-200 ms.

> Is it possible to run this in two steps? First use point specification
> to find the correct placeid(s), and then perform the select on
> wci_xxx.oidvalue? I guess this should make the planner happy.

We could run a function to filter the results of the SELECT, but that
would not make much sense, since the query extraction function already
does that. It can't be done before the SELECT without adding a function
to the SELECT query, with the attendant complications for the optimizer.

Regards,

Michael A.

Re: [WDB-Development] wci.read on wdb2ts

From: V. B. <veg...@me...> - 2008-11-06 12:56:50

> 1. Delayed geographic check
>
> The wci.read query currently generated checks that a point being
> requested is within the geographic boundaries of the field being
> returned (using within and transform). This prevents us from returning
> 100+ rows to the point extraction function when perhaps only 1 row is
> of interest.
>
> The problem: the function is costly (although we need not be too
> concerned about the 3 ms in the context of 100); more importantly, the
> function is impossible to estimate, and as a result it confuses the
> optimizer.
>
> Proposed solution:
> - We assume that the majority of the geographic searches we get will be
> on appropriate data sources (i.e., applications will not often search
> for Australian locations on Hirlam data).
> - The weeding out of data by geography will then instead occur when we
> (try to) extract the grid points from the data (if the grid point that
> we calculate falls outside the grid, we obviously don't return data for
> that point). The extra check at this point should not, under normal
> circumstances, be particularly costly.
>
> The solution may need to be different for Polygons; we'll just have to
> deal with that when it comes up.

> Comments...? Particularly on #1 Vegard?

The reasoning seems sound to me, especially since we know that the
boundary check is inaccurate. However, I would like to know more about the
results of the other optimizations: last time I looked, the difference in
performance between fetching a point and fetching a reference to an entire
field was fairly small. If this is still true, I believe the benefits of
keeping the check may outweigh the benefits of removing it.

A user may not normally request data for a wrong location when working
with a Norwegian Hirlam model. However, the check also guards against
typos and misunderstandings. For example, the wkt specification of point
data makes it fairly easy to swap latitude and longitude parameters.

Is it possible to run this in two steps? First use point specification to
find the correct placeid(s), and then perform the select on
wci_xxx.oidvalue? I guess this should make the planner happy.

VG

[WDB-Development] wci.read on wdb2ts

From: Michael O. A. <mic...@me...> - 2008-11-06 12:12:43

I have been looking into the issues surrounding the low wci.read
performance for grids that we have been getting in wdb2ts queries.

Part of the problem is due to the lack of indexes, a result of the lack of
an appropriate load to test with. Adding some appropriate indexes improves
performance considerably, taking it down from 400-600 ms/query to around
200 ms (100 ms to retrieve data + 100 ms query execution). Even so, the
query still uses nested loops extensively, resulting in repeated index
scans of the oidvalue table, for instance.

I believe we should be able to bring the query execution down to about 20
ms (probably close to the optimal performance on PrologDev1), if we can
get the query to hash index properly. To achieve this, we will need to
assist the Postgres query optimizer quite a bit.

Following are a couple of optimizations we might implement on the wci.read
side, to make the job easier for the Postgres optimizer:

1. Delayed geographic check

The wci.read query currently generated checks that a point being requested
is within the geographic boundaries of the field being returned (using
within and transform). This prevents us from returning 100+ rows to the
point extraction function when perhaps only 1 row is of interest.

The problem: the function is costly (although we need not be too concerned
about the 3 ms in the context of 100); more importantly, the function is
impossible to estimate, and as a result it confuses the optimizer.

Proposed solution:
- We assume that the majority of the geographic searches we get will be on
appropriate data sources (i.e., applications will not often search for
Australian locations on Hirlam data).
- The weeding out of data by geography will then instead occur when we
(try to) extract the grid points from the data (if the grid point that we
calculate falls outside the grid, we obviously don't return data for that
point). The extra check at this point should not, under normal
circumstances, be particularly costly.

The solution may need to be different for Polygons; we'll just have to
deal with that when it comes up.

2. Increased use of materialized views

The wci.oidvalue view still uses one non-materialized view (placename).
Eliminating that non-materialized view will reduce the number of joins
required by the wci.read query from 6 to 5.

3. Use of IN (x, y, ...) instead of x OR y OR ...

99% of the time, this will probably make no difference to the optimizer,
as it tends to treat both forms the same way in complex queries, but the
Postgres query optimizer is supposedly capable of a few smart tricks with
IN. Making it possible for the Postgres optimizer to consider better
options would probably be worth it.
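The delayed geographic check in item 1 could look roughly like the following Python sketch: the query returns candidate fields without a bounds check, and point extraction discards any field whose computed grid index falls outside the grid. It assumes a hypothetical regular lon/lat grid with row-major storage and nearest-neighbour extraction; Hirlam fields are on a rotated grid, so a real implementation would transform the point first.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class RegularGrid:
    """Hypothetical regular lon/lat grid description (not a WDB type)."""
    start_lon: float
    start_lat: float
    inc_lon: float
    inc_lat: float
    nx: int
    ny: int

    def nearest_index(self, lon: float, lat: float) -> Optional[Tuple[int, int]]:
        i = round((lon - self.start_lon) / self.inc_lon)
        j = round((lat - self.start_lat) / self.inc_lat)
        if 0 <= i < self.nx and 0 <= j < self.ny:
            return i, j
        return None  # the point lies outside this field


def extract_point(rows: Sequence[Tuple[RegularGrid, Sequence[float]]],
                  lon: float, lat: float) -> List[float]:
    """Extract the value nearest (lon, lat) from every candidate field.

    The SQL query no longer filters by geography, so rows may include
    fields that do not cover the point; those are simply skipped here.
    """
    values = []
    for grid, data in rows:
        index = grid.nearest_index(lon, lat)
        if index is None:
            continue
        i, j = index
        values.append(data[j * grid.nx + i])  # row-major storage assumed
    return values


grid = RegularGrid(0.0, 50.0, 1.0, 1.0, 3, 2)
print(extract_point([(grid, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])], 2.0, 51.0))  # [6.0]
```

The out-of-grid test is a couple of integer comparisons per row, which is why it should be cheap compared to the within/transform check it replaces. Note that a point with latitude and longitude swapped (the risk Vegard raises) would simply return no data rather than an error.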

Comments...? Particularly on #1 Vegard?

Regards,

Michael A.

Re: [WDB-Development] WCI Interface

From: Jan I. P. <ja...@me...> - 2008-10-15 05:15:17

Then I think you should go ahead and implement the changes as suggested.

-JI


On Tue, 14.10.2008 at 14:59 +0200, Michael Omotayo Akinde wrote:
> The change won't add any new vulnerabilities that do not already
> exist. Strings are already being used for most of the other dimensions
> in WCI.
> 
> Regards,
> 
> Michael A.
> 
> Jan Ivar Pladsen wrote: 
> > On Mon, 13.10.2008 at 10:42 +0200, Michael Akinde wrote:
> > 
> >   
> > > The only risks that I can see are the usual: adding new bugs, being 
> > > unable to complete the changes on time, etc. In general, using a string 
> > > specification makes for a much more flexible solution to the 
> > > specification problem which reduces the likelihood of needing to do 
> > > future costly changes to the interface.
> > >     
> > 
> > Is there any reason to worry about introducing security-related bugs
> > through the required string-processing routines (buffer overflows, indexing, etc.)?

Re: [WDB-Development] WCI Interface

From: Jan I. P. <ja...@me...> - 2008-10-14 07:48:51

On Mon, 13.10.2008 at 10:42 +0200, Michael Akinde wrote:

> The only risks that I can see are the usual: adding new bugs, being 
> unable to complete the changes on time, etc. In general, using a string 
> specification makes for a much more flexible solution to the 
> specification problem which reduces the likelihood of needing to do 
> future costly changes to the interface.

Is there any reason to worry about introducing security-related bugs
through the required string-processing routines (buffer overflows, indexing, etc.)?

Re: [WDB-Development] WCI Interface

From: Michael A. <mic...@me...> - 2008-10-13 08:43:54

Attachments: michael.akinde.vcf

Jan Ivar Pladsen wrote:
> But what are the immediate and long-run costs and risks?
>   
The costs are that the applications using the WCI need to be changed.
This is better done now (while the number of applications is small) than
later. In the long run, costs should be reduced, since this change should
reduce the amount of code we need to maintain.

The only risks that I can see are the usual: adding new bugs, being 
unable to complete the changes on time, etc. In general, using a string 
specification makes for a much more flexible solution to the 
specification problem which reduces the likelihood of needing to do 
future costly changes to the interface.

Re: [WDB-Development] WCI Interface

From: Jan I. P. <ja...@me...> - 2008-10-13 08:36:30

On Fri, 10.10.2008 at 16:12 +0200, Michael Akinde wrote:
> 
> My suggestion is that we replace wci.timespec and wci.levelspec in the
> function interface with text strings.
> 
> This has several benefits, related to the goals for WDB:
> - Simplicity: it allows us to easily implement a number of
> improvements to the time and level retrieval interface, which should
> make it easier to utilize these dimensions. For example, to make it
> unnecessary to specify two timestamps when we are specifying a time
> point.
> - Flexibility: Every data type defined in the wci schema poses a
> problem in terms of flexibility. Because the wci schema is exposed to
> the users, it is problematical to implement changes to WDB that
> involve changes in any such data types (such as, for instance,
> implementing new interpolation types).

But what are the immediate and long-run costs and risks?

-JI

[WDB-Development] WCI Interface

From: Michael A. <mic...@me...> - 2008-10-10 14:13:02

Attachments: michael.akinde.vcf

I propose that the WCI interface for the next version of WDB (0.8.0) be 
upgraded.

My suggestion is that we replace wci.timespec and wci.levelspec in the 
function interface with text strings.

This has several benefits, related to the goals for WDB:
- Simplicity: it allows us to easily implement a number of improvements 
to the time and level retrieval interface, which should make it easier 
to utilize these dimensions. For example, to make it unnecessary to 
specify two timestamps when we are specifying a time point.
- Flexibility: Every data type defined in the wci schema poses a problem 
in terms of flexibility. Because the wci schema is exposed to the users, 
it is problematical to implement changes to WDB that involve changes in 
any such data types (such as, for instance, implementing new 
interpolation types).

I suggest replacing the specifications as follows.

wci.timespec (time specification)

'[ exact | before | after | inside ] (DATE | TIMESTAMP w/wo tz | TIME w/wo tz | special date) [ TO (DATE | TIMESTAMP w/wo tz | TIME w/wo tz | special date | INTERVAL) ]'

wci.levelspec (level specification)

'[ exact | below | above | inside ] (FLOAT | INT) [ TO (FLOAT | INT) ] LEVELPARAM'
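A minimal Python sketch of parsing this level specification grammar with a single regular expression. The `LevelSpec` tuple and `parse_levelspec` are hypothetical, defaulting to `exact` when no keyword is given is an assumption, and "height above ground" is only an illustrative level parameter name.

```python
import re
from typing import NamedTuple


class LevelSpec(NamedTuple):
    indeterminate: str    # exact | below | above | inside
    level_from: float
    level_to: float
    level_parameter: str


_NUMBER = r"[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?"
_LEVELSPEC = re.compile(
    r"^\s*(?:(?P<ind>exact|below|above|inside)\s+)?"   # optional keyword
    rf"(?P<from>{_NUMBER})"                            # FLOAT | INT
    rf"(?:\s+TO\s+(?P<to>{_NUMBER}))?"                 # optional TO bound
    r"\s+(?P<param>\S.*?)\s*$",                        # LEVELPARAM
    re.IGNORECASE,
)


def parse_levelspec(spec: str) -> LevelSpec:
    m = _LEVELSPEC.match(spec)
    if m is None:
        raise ValueError(f"invalid level specification: {spec!r}")
    level_from = float(m.group("from"))
    level_to = float(m.group("to")) if m.group("to") else level_from
    indeterminate = (m.group("ind") or "exact").lower()  # assumed default
    return LevelSpec(indeterminate, level_from, level_to, m.group("param"))


print(parse_levelspec("inside 0 TO 100 height above ground"))
```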

I believe both of these variants should be relatively easy to implement
(for the time specification, we would simply NOT verify the format of the
timestamp itself, but only use regular expressions to extract the
interpolation type and the "TO", leaving the verification of the
timestamp/time/special datetime to Postgres).
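The approach for the time specification (regular expressions only for the keyword and the "TO", with everything else left to Postgres) might be sketched in Python as follows. `TimeSpec` and `parse_timespec` are hypothetical names, `exact` as the default keyword is an assumption, and the time texts are passed through unvalidated, on the assumption that the database later casts them (for example with `::timestamptz` or `::interval`).

```python
import re
from typing import NamedTuple, Optional


class TimeSpec(NamedTuple):
    indeterminate: str      # exact | before | after | inside
    time_from: str          # unvalidated: left for PostgreSQL to cast
    time_to: Optional[str]  # timestamp, special date or INTERVAL text, or None


_TIMESPEC = re.compile(
    r"^\s*(?:(?P<ind>exact|before|after|inside)\s+)?"  # optional keyword
    r"(?P<from>\S.*?)"                                 # first time, as text
    r"(?:\s+TO\s+(?P<to>\S.*?))?\s*$",                 # optional TO part
    re.IGNORECASE,
)


def parse_timespec(spec: str) -> TimeSpec:
    m = _TIMESPEC.match(spec)
    if m is None:
        raise ValueError(f"invalid time specification: {spec!r}")
    indeterminate = (m.group("ind") or "exact").lower()  # assumed default
    return TimeSpec(indeterminate, m.group("from"), m.group("to"))


print(parse_timespec("inside 2008-10-10 TO 2008-10-11"))
```

A single time point needs only one timestamp, which is the simplification the proposal mentions.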

The change would require some updates in WDB2TS and other applications 
that use 0.7.x of WDB, of course, although such changes should be 
relatively limited.

Comments?

Regards,

Michael A.

Re: [WDB-Development] 'alt' parameter

From: V. B. <veg...@me...> - 2008-07-02 08:36:29

You are right, thank you.

I have added it in SVN, and it should appear on the website shortly.

VG


> On
>   http://wdb.met.no/wdb2ts/0.1/html/ch01s03.html
>
> it says
>   "The level specification allows the user to specify the level
>   (i.e., factors such as altitude or depth) for which data should
>   be returned. The level specification is given as follows:
>
>      levelspec=levelfrom,levelto,levelparameter,indeterminatetype
>    "
>
> but nothing about an "alt" parameter.
>
> But in src/mod_wdb2ts/wciWebQuery.cc I see an "alt" parameter:
>
>     * Decodes a query on the form:
>     *
>     * http://server/path/?lat=10;lon=10;alt=10;
>     *   reftime=2007-12-10T10:00,2007-12-10T10:00,exact;
>     *   dataprovider=1096;
>     *   dataversion=-1;
>     *   parameter=instantaneous pressure of air,instantaneous temperature of air,instantaneous velocity of air (u-component);
>     *   levelspec=2,2,above ground,exact;
>     *   validtime=2007-12-10T00:00,2007-12-10T10:00,intersect
>     *   format=CSV
>
> shouldn't the "alt" parameter be documented on
>   http://wdb.met.no/wdb2ts/0.1/html/ch01s03.html
>
> ?
>
> -JI

Re: [WDB-Development] Documentation: 'Height Correction' file formats

From: V. B. <veg...@me...> - 2008-07-02 08:21:13

I do not know the formal spec of the file format that is used. However,
reading topography from files will be removed in wdb2ts v0.3.0, due in
about a week, making both the bug and the documentation you request
obsolete.

VG


> There is a bug on "Height Correction"
>   https://wdb.bugs.met.no/show_bug.cgi?id=60
>
> but I find no documentation on which height correction file formats are
> supported by wdb2ts on the wiki or in the docs.
>
> -JI

[WDB-Development] Documentation: "Height Correction" file formats

From: Jan I. P. <ja...@me...> - 2008-07-01 21:12:12

There is a bug on "Height Correction"
  https://wdb.bugs.met.no/show_bug.cgi?id=60 

but I find no documentation on which height correction file formats are
supported by wdb2ts on the wiki or in the docs.

-JI

[WDB-Development] Documentation: "versioned interface"

From: Jan I. P. <ja...@me...> - 2008-07-01 21:06:26

On 
  https://wdb.wiki.met.no/doku.php?id=utilities:wdb2ts

you say 
  "The request interface ... is versioned ...", 

but on 
  http://wdb.met.no/wdb2ts/0.1/html/ch01s02.html 

the text (and the examples) say nothing about a version number other than
the part about "dataversion=dataversionval" in the "parameterspec".

Should I file a bug or have I misunderstood something?

-JI

[WDB-Development] "alt" parameter

From: Jan I. P. <ja...@me...> - 2008-07-01 21:06:12

On
  http://wdb.met.no/wdb2ts/0.1/html/ch01s03.html

it says
  "The level specification allows the user to specify the level
  (i.e., factors such as altitude or depth) for which data should
  be returned. The level specification is given as follows:

     levelspec=levelfrom,levelto,levelparameter,indeterminatetype
   "

but nothing about an "alt" parameter.

But in src/mod_wdb2ts/wciWebQuery.cc I see an "alt" parameter:
  
    * Decodes a query on the form:
    * 
    * http://server/path/?lat=10;lon=10;alt=10;
    *   reftime=2007-12-10T10:00,2007-12-10T10:00,exact;
    *   dataprovider=1096;
    *   dataversion=-1;
    *   parameter=instantaneous pressure of air,instantaneous temperature of air,instantaneous velocity of air (u-component);
    *   levelspec=2,2,above ground,exact;
    *   validtime=2007-12-10T00:00,2007-12-10T10:00,intersect
    *   format=CSV
 
shouldn't the "alt" parameter be documented on 
  http://wdb.met.no/wdb2ts/0.1/html/ch01s03.html

?

-JI
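To make the question concrete, a query of the form shown in the source comment could be decoded roughly as below. This is a minimal Python sketch, not the wdb2ts implementation: `decode_wdb2ts_query` and `split_levelspec` are hypothetical helpers, percent-decoding of values is an assumption, and "alt" is treated as an optional key alongside the documented four-field levelspec.

```python
from typing import Dict
from urllib.parse import unquote


def decode_wdb2ts_query(query: str) -> Dict[str, str]:
    """Split a ';'-separated key=value query string into a dict."""
    params: Dict[str, str] = {}
    for item in query.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate trailing or doubled separators
        key, sep, value = item.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {item!r}")
        params[key.strip()] = unquote(value.strip())
    return params


def split_levelspec(levelspec: str) -> Dict[str, str]:
    """levelspec=levelfrom,levelto,levelparameter,indeterminatetype"""
    fields = levelspec.split(",")
    if len(fields) != 4:
        raise ValueError(f"levelspec needs 4 fields, got {len(fields)}")
    names = ("levelfrom", "levelto", "levelparameter", "indeterminatetype")
    return dict(zip(names, fields))


query = ("lat=10;lon=10;alt=10;dataprovider=1096;"
         "levelspec=2,2,above ground,exact;format=CSV")
params = decode_wdb2ts_query(query)
alt = int(params["alt"]) if "alt" in params else None  # optional "alt" key
print(alt, split_levelspec(params["levelspec"]))
```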

[Wdb-development] Introduction

From: Michael O. A. <mic...@me...> - 2008-05-22 14:01:06

The development mailing list for WDB. Non-development questions should be
posted to 'wdb-users' instead; that includes build problems, configuration
issues, etc, as well as usage questions.

5 messages have been excluded from this view by a project administrator.
