From: Mark Diggory <mdiggory@mi...> - 2008-10-16 23:42
On Oct 16, 2008, at 5:27 AM, Dan Sheppard wrote:
> Dear all,
>
> I hope you don't mind me butting in on this thread with perhaps an
> outsider's perspective?
Certainly... you're welcome to comment.
> There seem to be a lot of things being discussed, all mixed together
> (which is always the way), but it might help to consider them
> separately?
>
> common base classes and interfaces
> ----------------------------------
>
> I think the case has probably been made that it makes sense for a range
> of objects in the data-model to have certain uniform functionality (such
> as identifiability), if that would be useful for the project.
>
> Graham's point in [1] that it could be an interface rather than a class
> is a good one, ...
Yes, it is certainly my intention to see this remain an interface
driven prototype.
>
> underlying representation
> -------------------------
>
> As I understand it, you've also been talking about mapping all data onto
> an underlying uniform datastructure [3]. This is something which has
> come up again and again in projects I've been involved with (and, I get
> the impression, all projects, everywhere!). And if it's got an easy
> answer, I'm sure I don't know it!
We're architecting an API and Service for storing content and metadata
about it. This API architecture evolved from a need in the community
for greater flexibility in attaching metadata properties to any
portion of the older DSpace 1.5 Data Model (Collection, Item,
Bitstream, etc).
>
> Representing everything in some uniform format can have some advantages
> such as ease of representation in serial formats, improved code-reuse,
> uniformity, etc but on the other hand there's a penalty for type safety,
> and often clarity of code. It's something that's even reasonably
> entrenched depending on your programming language.
>
> But as Graham points out in [4] having interfaces available shouldn't
> tie the implementation: what we're after is being able to "see" these
> things as either specific, typed, and well-named methods, or else as
> abstract maps of properties.
We have 3 primary types in the model; they have corresponding
mappings to the current implementations we hope to use as possible
storage mechanisms (JCR, RDF, Fedora, etc).
See:
https://dspace.svn.sf.net/svnroot/dspace/dspace2/trunk/api/src/main/java/org/dspace/model
org.dspace.model.Literal
Any "literal" value (as opposed to a reference to a Resource). All
Literals have a datatype (meaning RDF and XSD datatypes).
org.dspace.model.Resource
A namespace qualified local name (i.e. a URI) that can be used as a
reference or a Resource description.
org.dspace.model.Asset
An Asset "is" a java.util.Set<org.dspace.model.Property> and extends
org.dspace.model.Resource. It is "the type" returned from the
Repository Service and maps explicitly to a JCR Node or RDF Named
Graph, i.e. it is a Set of Property statements with a URI identifier.
org.dspace.model.Property
Is basically an RDF triple representing any statement asserted in the
above "Named Graph".
With this set of Interfaces it is possible to express every
association and attribute expressed within the original DSpace 1.5.x
Data Model, but with the flexibility of not hardcoding that original
Data Model (which was basically an ontology, like Fedora's, EPrints/
SWAP, FRBR, BIBO, etc). Thus it opens the door to allowing the
codebase (and conversely its developers) to exit the debate about who
has got the best representational ontology... DSpace becomes
agnostic, no matter if we are talking FRBR, SWAP, MARC/MODS, DCMI,
BIBO... The common need being to store your content and describe it
as you see fit.
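To make the shape of these types a little more concrete, here is a minimal Java sketch. It is illustrative only and is not the code in the repository: only the four type names and their relationships come from the description above. The marker interface Value, the accessor names (getNamespace, getLocalName, getURI), the SimpleAsset class, the resource() and example() helpers and the example.org URIs are all assumptions made for this example.

```java
import java.util.AbstractSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative sketch only; names are assumptions, not the real org.dspace.model source. */
final class ModelSketch {

    /** Anything a Property can have as its value (assumed marker type). */
    interface Value {}

    /** A namespace-qualified local name, i.e. a URI. */
    interface Resource extends Value {
        String getNamespace();
        String getLocalName();
        default String getURI() { return getNamespace() + getLocalName(); }
    }

    /** A literal value together with its datatype (an RDF/XSD datatype URI). */
    static final class Literal implements Value {
        final String lexicalForm;
        final String datatype;
        Literal(String lexicalForm, String datatype) {
            this.lexicalForm = lexicalForm;
            this.datatype = datatype;
        }
    }

    /** One statement about a subject: roughly an RDF triple. */
    static final class Property {
        final Resource subject;
        final Resource predicate;
        final Value value;
        Property(Resource subject, Resource predicate, Value value) {
            this.subject = subject;
            this.predicate = predicate;
            this.value = value;
        }
    }

    /** An Asset "is" a Set of Property statements and is also a Resource. */
    interface Asset extends Resource, Set<Property> {}

    /** In-memory stand-in for what a JCR- or RDF-backed store might return. */
    static final class SimpleAsset extends AbstractSet<Property> implements Asset {
        private final String namespace;
        private final String localName;
        private final Set<Property> statements = new LinkedHashSet<>();
        SimpleAsset(String namespace, String localName) {
            this.namespace = namespace;
            this.localName = localName;
        }
        @Override public String getNamespace() { return namespace; }
        @Override public String getLocalName() { return localName; }
        @Override public boolean add(Property p) { return statements.add(p); }
        @Override public Iterator<Property> iterator() { return statements.iterator(); }
        @Override public int size() { return statements.size(); }
    }

    /** Convenience factory for a plain Resource. */
    static Resource resource(final String namespace, final String localName) {
        return new Resource() {
            @Override public String getNamespace() { return namespace; }
            @Override public String getLocalName() { return localName; }
        };
    }

    /** Builds an "item" with a title and a collection link, with no Item class involved. */
    static Asset example() {
        Asset item = new SimpleAsset("http://example.org/item/", "42");
        Resource title = resource("http://purl.org/dc/terms/", "title");
        Resource isPartOf = resource("http://purl.org/dc/terms/", "isPartOf");
        item.add(new Property(item, title,
                new Literal("A thesis", "http://www.w3.org/2001/XMLSchema#string")));
        item.add(new Property(item, isPartOf,
                resource("http://example.org/collection/", "7")));
        return item;
    }

    public static void main(String[] args) {
        Asset item = example();
        System.out.println(item.getURI() + " has " + item.size() + " statements");
    }
}
```

The point of the sketch is that "item belongs to collection" is just another Property statement, so no particular ontology is baked into the types.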
> Of course, there has to be one underlying implementation, and I'd be
> marginally inclined to agree with Aaron in [5] that the language and
> IDEs give you neat ways of working quickly and safely with objects which
> are represented explicitly and separately and which are more painful to
> use if merely "layered-on" at a higher level.
>
> It would make sense to me, though, that if you went down this route,
> you'd probably need some kind of interface implemented on all
> first-class objects (or, perhaps better, a
> reflection-and-annotation-based equivalent) to be able to represent and
> manipulate these disparate objects uniformly (eg for RDF).
I think this is a case of trying to make the store "hyper generic",
in which case it's sounding more and more like an Object store. There
are certainly enough of those out there that we don't need to be
competing with. That stretches a bit beyond both our current
architectural requirements and needs IMO.
> strings as identifiers
> ----------------------
>
> Bradley's message [6] matches my experience, too, when handling the
> objects vs strings question. I always think it's really important to
> remember that an "object" can be a simple string container, or else you
> could have a complex, very constrained "string". Overall, I agree with
> Bradley that it's important to work on defining the components (if any)
> of an identifier before thinking about representation.
As I've stated previously, my only current need for identifiers is as
references for looking up Assets and referencing their Resources.
>
> An object representation has advantages if the identifier syntax is
> highly structured, and painful to parse because you only do it once, and
> you get it right each time (eg URI), but a string has the advantage of
> being simple to pass around, across and through environments.
>
> Given the thoughts in this thread on "underlying representation"
> (above), it seems to make sense to me that a string representation would
> be a sensible default, along with some uniform means of DSpace
> resurrecting an object on the basis of its identifier.
>
> This is because I think the case has been made that you need both a rich
> and also a string representation, but the balance here is shifted for me
> for identifiers from the earlier discussion of the objects themselves.
> Identifiers are used as surrogates for objects, often because storing or
> transmitting the object itself is impractical [7] (or nonsensical), and
> so an "identifier object" is itself an object you will probably have to
> stringify at some point, anyway. Even to use the object
> representation of an identifier in reasonably internal service
> interfaces stresses the SOA type approach of modularity by forcing
> replacement services to "buy into" the data-model. And so it becomes
> more difficult to plug in things (like caches) from third parties.
>
> It probably makes sense, though, to also have some parsed object (by
> analogy with java's URI) where a service can "very" internally parse and
> serialise identifiers with the minimum of hassle and risk. (This object
> could also contain a means of creating the object it identifies).
You argue both points here... but the final statement is that the
implementation validates and parses String identifiers.
https://dspace.svn.sf.net/svnroot/dspace/dspace2/trunk/api/src/main/java/org/dspace/model/ValueFactory
https://dspace.svn.sf.net/svnroot/dspace/dspace2/trunk/api/src/main/java/org/dspace/service/RepositoryService
These both allow the underlying implementation to manage both the
validation of the "string" representing the identifier and the
optimal creation of its parsed form (Asset and/or Resource).
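As a rough illustration of that division of labour, the sketch below keeps plain strings at the service boundary and does the validation and parsing once, inside the implementation. It is not the ValueFactory or RepositoryService API: the class names IdentifierSketch, ParsedId and Repository, the parse() method and the rule of splitting after the last '#' or '/' are assumptions made for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Illustrative sketch only; not the real ValueFactory or RepositoryService. */
final class IdentifierSketch {

    /** Parsed form of an identifier: a namespace plus a local name. */
    static final class ParsedId {
        final String namespace;
        final String localName;
        ParsedId(String namespace, String localName) {
            this.namespace = namespace;
            this.localName = localName;
        }
        /** Serialises back to the string form passed between services. */
        @Override public String toString() { return namespace + localName; }
    }

    /** Validates the string, then splits it after the last '#' or '/' (an assumed rule). */
    static ParsedId parse(String identifier) {
        final URI uri;
        try {
            uri = new URI(identifier);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + identifier, e);
        }
        if (!uri.isAbsolute()) {
            throw new IllegalArgumentException("Identifier must be absolute: " + identifier);
        }
        int cut = Math.max(identifier.lastIndexOf('#'), identifier.lastIndexOf('/')) + 1;
        if (cut == 0 || cut == identifier.length()) {
            throw new IllegalArgumentException("No local name in: " + identifier);
        }
        return new ParsedId(identifier.substring(0, cut), identifier.substring(cut));
    }

    /** Stand-in repository: callers only ever hand over strings. */
    static final class Repository {
        private final Map<String, String> store = new HashMap<>();

        void put(String identifier, String content) {
            store.put(parse(identifier).toString(), content);
        }

        Optional<String> get(String identifier) {
            return Optional.ofNullable(store.get(parse(identifier).toString()));
        }
    }

    public static void main(String[] args) {
        Repository repository = new Repository();
        repository.put("http://example.org/item/42", "some content");
        ParsedId id = parse("http://example.org/item/42");
        System.out.println(id.namespace + " | " + id.localName);
        System.out.println(repository.get("http://example.org/item/42").isPresent());
    }
}
```

A caller (or a third-party cache) never needs to see ParsedId; it can stay entirely inside whichever implementation is plugged in.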
>
> The point at which you go from passing the "identifier object" around to
> its string representation is a judgement call, I think. But in the
> inter-service layer, it makes sense to me to keep to strings.
>
> [6] dspace-architecture, Bradley McLean, 2008-10-15 12:29
> [7] dspace-architecture, Aaron Zeckowski, 2008-10-15 15:01
>
> URIs as identifiers
> -------------------
>
> I'd be very nervous of using URIs as identifiers, as it seems to me that
> Mark is making very good points in [8], that URIs are, these days, a
> complex and subtle form of identifier. It makes sense to me that the
> identifiers you have should have a URL representation, but that this
> shouldn't be the base form of these identifiers.
The base form of the identifier is up to the underlying storage
mechanism. But, based on the constraint I'm imposing, it needs to be
representable as an Asset and/or Resource described above, and in
that case, it needs a "namespace" and a "local name". When exposing
to applications... that will be rendered as a URI (the syntax, not
the java.net.URI object).
> My main worry here is data orthogonality. URIs have many nooks and
> strange places in which data can be shoved which could cause services to
> behave erratically if misused. This is exactly the kind of issue that a
> tight identifier specification is trying to avoid. For example, if you
> use "http" in one identifier and someone writes "https" instead, what
> will happen? We can come to a decision on what that means, or regulate
> the protocol down to one instance, but how do we ensure that this is
> correctly used/ignored and checked where appropriate across services?
I feel it's still up to the implementation and the consuming application.
> Also, if we start using them in XSLT, etc, that could easily mess our
> URIs up for similar reasons.
Yes, if they are in content serialized to XML and transformed by XSL
tooling (i.e. a consuming application) they need to be treated
appropriately according to the syntax and rules of the XML
serialization and XSL transformation pipeline.
>
> It seems to make sense that an identifier object which you might create
> for /intra-/service use (a deserialised representation of the
> /inter-/service string identifier) should include a means of generating
> a corresponding URI, or else something such as a URI service being able
> to take a DSpace identifier string and return a URI and vice versa, but
> not /be/ a URI. To make this a service would allow URI/identifier
> mapping in challenging deployment situations.
The DSpace trunk's ObjectIdentifier/ExternalIdentifier implementation
currently provides this capability.
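For readers unfamiliar with the idea, a mapping service of the kind described in the quoted paragraph could look roughly like the sketch below. It is not the ObjectIdentifier/ExternalIdentifier code: the UriService class, its toUri and toIdentifier methods, the "hdl:123456789/42" identifier string and the base URL are all hypothetical, and the Charset overloads of URLEncoder and URLDecoder require Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Illustrative sketch only; not the ObjectIdentifier/ExternalIdentifier implementation. */
final class UriMappingSketch {

    /** Maps identifier strings to deployment-specific URIs and back again. */
    static final class UriService {
        private final String baseUrl;

        UriService(String baseUrl) {
            // Normalise so the base always ends in exactly one trailing slash.
            this.baseUrl = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
        }

        /** Identifier string to URI: the identifier is percent-encoded under the base URL. */
        String toUri(String identifier) {
            return baseUrl + URLEncoder.encode(identifier, StandardCharsets.UTF_8);
        }

        /** URI back to identifier string; empty if the URI is not under this base URL. */
        Optional<String> toIdentifier(String uri) {
            if (!uri.startsWith(baseUrl) || uri.length() == baseUrl.length()) {
                return Optional.empty();
            }
            String encoded = uri.substring(baseUrl.length());
            return Optional.of(URLDecoder.decode(encoded, StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) {
        UriService service = new UriService("https://repo.example.org/resource");
        String uri = service.toUri("hdl:123456789/42");
        System.out.println(uri);
        System.out.println(service.toIdentifier(uri).orElse("(no match)"));
    }
}
```

Because the base URL lives in the service, a deployment can change its public address without touching the identifier strings passed between services.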
>
> [8] dspace-architecture, Mark Diggory, 2008-10-15 22:34
>
> I hope this is of some use, and I've not just made the thread
> pointlessly longer!
Not at all, well, others may have differing opinions ;-)
Cheers,
Mark
~~~~~~~~~~~~~
Mark R. Diggory
Home Page: http://purl.org/net/mdiggory/homepage
Blog: http://purl.org/net/mdiggory/blog