I'm relatively new to DiGIR and its functionality, and I'm also relatively new to the Darwin Core standard, but thought this would be a good place to post a fairly basic question for my better understanding of its capabilities.
Because DiGIR utilizes Darwin Core to extract data from a dataset, does that limit DiGIR applications to just specimen data? If the contributing database includes data that is not addressed in Darwin Core, is it possible to make it available using DiGIR?
Does Darwin Core just focus on data that are needed to establish a specimen collection? It appears that Darwin Core addresses things like species, location, collector, etc., but many measurements obtained during the conduct of a specific project may be excluded.
Are projects and data that do not produce a specimen not compatible with DiGIR? For example, would it or would it not be possible to use DiGIR to pull together a bunch of Water Quality data?
I greatly appreciate any info on this as I am part of a Data Management Working Group tasked with assessing existing data access and repository options for a wide variety of data (vegetation, water quality, air quality/climate change, and avian/bird data).
Thomas E. Burley
Geospatial Applications - NBII Southern Appalachian Information Node
Research Associate - The Institute for a Secure and Sustainable Environment
The University of Tennessee
Knoxville, TN 37996-4134
DiGIR is not bound to any particular conceptual schema. It is being used with DarwinCore because it was conceived within the biodiversity informatics community and the first networks needed to query biological collection data.
You're free to define your own conceptual schema provided it follows the same rules you can find in DarwinCore: an XML Schema with root elements and specific substitutionGroups pointing to some abstract DiGIR elements.
It is also possible to extend a particular conceptual schema, and then map multiple schemas when configuring a data provider.
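To make the schema rules above a bit more concrete, here is a rough sketch of what a custom water quality conceptual schema might look like, with a few lines of Python that list its concepts. The water quality namespace and element names are made up for illustration. The DiGIR namespace URI and the abstract element name `digir:searchableReturnableData` are my assumption of what the DarwinCore schema for DiGIR uses, so please check them against the schema files in your own DiGIR distribution.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical conceptual schema for water quality data.
# Assumptions: the DiGIR namespace URI and the abstract element
# "digir:searchableReturnableData" mirror the DarwinCore schema for DiGIR;
# the example.org namespace and the element names are invented.
SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:digir="http://digir.net/schema/protocol/2003/1.0"
            xmlns="http://example.org/schema/waterquality"
            targetNamespace="http://example.org/schema/waterquality"
            elementFormDefault="qualified">
  <xsd:element name="StationName" type="xsd:string"
               substitutionGroup="digir:searchableReturnableData"/>
  <xsd:element name="SampleDate" type="xsd:dateTime"
               substitutionGroup="digir:searchableReturnableData"/>
  <xsd:element name="pH" type="xsd:decimal"
               substitutionGroup="digir:searchableReturnableData"/>
  <xsd:element name="DissolvedOxygen" type="xsd:decimal"
               substitutionGroup="digir:searchableReturnableData"/>
</xsd:schema>
"""


def list_concepts(xsd_text):
    """Map each root element name to the substitutionGroup it declares."""
    root = ET.fromstring(xsd_text)
    concepts = {}
    # Only direct children of <xsd:schema> are considered, i.e. root elements.
    for element in root.findall(f"{{{XSD_NS}}}element"):
        concepts[element.get("name")] = element.get("substitutionGroup")
    return concepts


if __name__ == "__main__":
    for name, group in list_concepts(SCHEMA).items():
        print(f"{name}: {group}")
```

Each root element here plays the same role that a DarwinCore concept plays in a specimen network: a provider would map it to a column in the local database.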
Please note that there's a new protocol which unifies DiGIR and BioCASe. It is called TAPIR and the first specification should be available within the next few weeks. You can find more information here:
DarwinCore will also have a new version, this time not bound to DiGIR:
Hope this helps,
DiGIR was designed to be a general purpose solution for the problem of distributed search over federated databases. As such, it is not tied specifically to the Darwin Core, though it seems most of the current installations use some variant of that schema.
The primary limitation of DiGIR is that although it can support federation schemas (like the Darwin Core) with a variety of structural characteristics (such as nested elements), it only allows searching on top-level elements in the schema.
Since DiGIR was conceived and developed, there has been considerable activity in the area of federated databases and how it can be approached as a more constrained case of a semantic web. To that end the W3C has recently endorsed a query language and a protocol called SPARQL that would be suitable for replacing DiGIR as a distributed data retrieval protocol. There is a lot of development activity by a number of different groups on SPARQL, so one big advantage of following that activity is that we can utilize the work and resources already invested by a larger community than ours.
As Renato points out, there has also been a lot of work on a protocol called "TAPIR" by GBIF and TDWG that presents one option for replacing DiGIR and BioCASe data providers.
There are a number of implementations of systems that support the SPARQL protocol, and we have been working on a system (formerly labeled "DiGIR2", but now called WASABI - Web Applications for the Semantic Architecture of Biodiversity Informatics) that is meant to replace DiGIR. It offers a data service architecture that permits supporting a number of different protocols including SPARQL, OAI (the Open Archives Initiative), and others (e.g. Tapir).
So, the short answer is yes, DiGIR is perfectly suitable for connecting together a "bunch of water quality data", and a fully functional system could be implemented in relatively short order. In the big picture though, I would suggest taking a careful look at the SPARQL protocol and considering the levels of data integration that can be obtained with systems built upon widely endorsed and implemented standards of the internet community. A quick google of "SPARQL" should provide a lot of reference material for you.
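For a feel of what that might look like in practice, here is a minimal sketch of a SPARQL query over water quality data, sent the way the SPARQL protocol allows: as a `query` parameter on an HTTP GET request. The endpoint URL and the `wq:` vocabulary are hypothetical placeholders, not an existing service or schema, and the script only builds the request URL without contacting anything.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

# Hypothetical vocabulary (wq:) for water quality samples.
QUERY = """PREFIX wq: <http://example.org/schema/waterquality#>
SELECT ?station ?date ?ph
WHERE {
  ?sample wq:stationName ?station ;
          wq:sampleDate ?date ;
          wq:pH ?ph .
  FILTER (?ph < 6.5)
}
ORDER BY ?date
"""


def build_request_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
```

The same query text could be sent to any number of providers that expose such an endpoint, which is roughly the role a DiGIR portal plays today.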