This might be a good idea, but there are several design trade-offs here.
First of all, perhaps we should define a little more precisely what we mean
by metadata. Each person seems to have a slightly different definition.
Next comes the question of what metadata to add by default, where we put it
(in the XML files or outside them), and whether we should allow users to
change this on a collection-by-collection basis. For example, the current
system keeps track of the following for each resource and each collection:
last updated date-time
Note that collections also have these items.
Since we frequently need to sync much of our data to Subversion, we also add
the user-id that created the document and the user-id that last modified the
document. We put all of this metadata at the end of each XML file for
"administered items", as the ISO-11179 metadata registry spec calls them.
There are many trade-offs between storing system metadata inside the XML
documents and storing it outside them.
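
As a rough illustration of the in-document approach, here is a minimal Python
sketch that appends or refreshes a trailing metadata block on a document. The
element names (metadata, created-by, last-modified-by, last-updated) and the
function name are assumptions made up for this example, not the layout our
system or ISO-11179 actually prescribes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def _set_child(parent, name, text):
    """Find or create a child element and set its text."""
    child = parent.find(name)
    if child is None:
        child = ET.SubElement(parent, name)
    child.text = text


def stamp_metadata(xml_text, user_id, now=None):
    """Append or refresh a trailing <metadata> element on the document root.

    All element names here are hypothetical.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    root = ET.fromstring(xml_text)
    meta = root.find("metadata")
    if meta is None:
        # First save: remember who created the document.
        meta = ET.SubElement(root, "metadata")
        _set_child(meta, "created-by", user_id)
    else:
        # Re-append so the block stays the last child of the root.
        root.remove(meta)
        root.append(meta)
    _set_child(meta, "last-modified-by", user_id)
    _set_child(meta, "last-updated", now.isoformat(timespec="seconds"))
    return ET.tostring(root, encoding="unicode")
```

Calling stamp_metadata(doc, "alice") on a new document and then
stamp_metadata(result, "bob") later keeps created-by as alice while the
last-modified-by and last-updated values move forward. Because the values live
in the document itself, they survive a backup and restore unchanged.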
We do try to centralize some of these functions using a common XQuery module,
but we have more work to do here. We cannot use eXist's built-in user and
timestamp metadata, since it gets changed when we restore from a backup and
so does not reflect the actual user and time of the change to the data.
We would also like to do what Subversion does and have a collection's
timestamp reflect the most recent update of any resource inside that
collection. This would be very useful for sync operations between eXist
systems and systems like Subversion.
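
The roll-up described above could look roughly like the following Python
sketch. The nested dictionary layout ("updated", "resources", "collections")
and both function names are hypothetical stand-ins for however the database
would expose a collection tree.

```python
from datetime import datetime


def latest_update(collection):
    """Return the newest timestamp found anywhere under a collection.

    `collection` is a hypothetical dict of the form
    {"updated": datetime,
     "resources": {name: datetime},
     "collections": {name: <same shape>}}.
    """
    newest = collection["updated"]
    for stamp in collection.get("resources", {}).values():
        newest = max(newest, stamp)
    for child in collection.get("collections", {}).values():
        # Recurse so a change deep in the tree bubbles up to every ancestor.
        newest = max(newest, latest_update(child))
    return newest


def needs_sync(collection, last_sync):
    """True if anything under the collection changed after the last sync."""
    return latest_update(collection) > last_sync
```

With a rolled-up timestamp like this, a sync job can skip an entire collection
whenever needs_sync returns False instead of visiting every resource in it.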
One option might be to create a collection configuration standard that would
automatically add a <metadata> tag to the end of each element that needs
this metadata and keep it up to date.
I could also see a lot of other useful data that might be unimportant to
other people. Things like "validated-by" metadata that holds the XML Schema
name and version and the time-stamp at which a document was checked against
a specific version of an XML Schema. Or a "published" date-time that shows
when the document was published to an external public web server and who
authorized the document to be published.
I have also tried to use eXist triggers to keep this metadata up to date, but
my work on triggers has not been very successful and I don't have the
background to debug why they sometimes do not fire.
I hope that gives us some ideas of where this can go. My only real
suggestion is that we use the existing collection configuration files to
control what metadata is tracked and where it is stored. It might be
interesting to try to do this with just triggers and in-document XML data as
a starting point.
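
To make the per-collection idea concrete, here is a small Python sketch of how
a metadata policy might be resolved for a resource. It assumes the nearest
configured ancestor collection wins. The policy keys ("track", "store"), the
example paths and the function name are all invented for illustration and do
not describe any existing eXist configuration syntax.

```python
# Hypothetical fallback used when no ancestor collection is configured.
DEFAULT_POLICY = {"track": ["last-updated"], "store": "in-document"}


def resolve_policy(path, policies, default=DEFAULT_POLICY):
    """Return the policy of the nearest configured ancestor collection.

    `policies` maps collection paths such as "/db/registry" to policy dicts.
    """
    parts = path.strip("/").split("/")
    # Walk from the deepest collection up towards the root.
    for depth in range(len(parts), 0, -1):
        candidate = "/" + "/".join(parts[:depth])
        if candidate in policies:
            return policies[candidate]
    return default
```

A trigger or a shared module could call resolve_policy with the path of the
resource being stored and then add only the fields listed under "track".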
On Tue, Jul 27, 2010 at 7:52 PM, Patrick Bosek <patrick.bosek@...:
> Hello eXist Developers!
> I need the ability to set up some pretty advanced metadata for documents
> and for binary objects (perhaps even collections?). I spoke briefly with
> Adam, he seemed to think a module was the wrong path, and eXist needed
> metadata functionality built into the core (Adam, please feel free to
> correct me if you don't feel I've properly represented you). This is a job
> I'm more than willing to undertake, but I think it would be best if I was
> directed by the core developers, since I want to make sure anything I write
> benefits everyone and not just my needs. Also, I think I could probably
> build it much faster if given a few pointers.
> My needs are basically just the ability to efficiently store and index a
> schema validated piece of XML associated with other objects in the system. I
> was also thinking this might be a good time to consider how this could be
> extended into native XLink functionality.
> I've done a little work in the code base, I built a small addition to the
> unix permissions to allow more granular permissions (which I still intend to
> contribute, but I haven't had time to fully test, and now I'm thinking of
> changing it as I work on the metadata stuff). But a quick "here's where I
> would start" would be very helpful. Also, any tips to ensure efficiency
> would be appreciated. Lastly, links to reading material are always great.
> Let me know what y'all think!
> Patrick Bosek
> Jorsek Software
> Cell (585) 820 9634
> Office (585) 239 6060
Semantic Solutions Architect
office: (952) 931-9198
cell: (612) 986-1552