From: Stefano E. C. <cam...@ya...> - 2004-09-17 16:31:41
Laurian Gridinoc wrote:

>On Fri, 17 Sep 2004 09:18:24 +0200, Stefano Campanini
><cam...@ya...> wrote:
>
>>>On Wed, 15 Sep 2004 07:16:15 +0200, Paolo Castagna
>>>
>>>>>>... we need a unique identifier for a statement
>>>>>>in the index and for reification.
>>>>>>   <subject uri> <predicate uri> <object uri>
>>>>>>Is there any problem if we use SHA1?
>>>>>>   sha1(<subject uri> <predicate uri> <object uri>)?
>>>>>>
>>Ok, I have added the following method to the MetaManager class:
>>/*
>> *@return the Triple ID
>> *
>> */
>>  public static String getStmtId(final URI subject, final URI predicate, final URI object) throws NoSuchAlgorithmException {
>>    final String tbDigest = subject.toString()+predicate.toString()+object.toString();
>>    byte[] digest = getDigest(tbDigest.getBytes());
>>    Base64 base = new Base64();
>>    byte[] encoded = base.encode(digest);
>>    final String result = new String(encoded);
>>    return result;
>>  }
>>
>>So, we can calculate the ID of triples (tbd: literals).
>>
>
>Instead of sha1 of the string concatenation:
>subject.toString()+predicate.toString()+object.toString()
>I would propose sha1 of the n-triple.toString() representing the
>triple. I think this will include the optional xsd datatype and
>language too, and would be easier to describe/interoperate.
>
>Consider:
>
><http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> "Grapefruit" .
>
>sha1("http://www.grapefruit.rohttp://purl.org/dc/terms/1.0/titleGrapefruit")
>doesn't look nice but:
>sha1("<http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title>
>\"Grapefruit\" .") does. I would include the final . too, for ease of
>script processing if ever needed by us or by third parties.
>
>Also, using n-triples would allow us to identify these statements by
>different URIs:
><http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> "Grapefruit" .
><http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title>
>"Grapefruit"@en .
><http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title>
>"Grapefruit"^^xsd:string .
>

Ok, it is better to use n-triple.toString().

>>I'd like to think of the ID as a URI of the triple. In fact, this is
>>what happens during reification.
>>
>
>I always thought of this sha1 result as a URI. I would use the
>format already used in P2P software to identify resources by the sha1
>sum of their content:
>
>urn:sha1:XSQCZ3UK3PPW6Y6HOVLTIX2QFMZ3TFFB
>----------------^ base32 of sha1 sum of the content - in our case of
>the n-triple
>
>I am using com.bitzi.util.Base32 for base32:
>URI urn = new URI("urn", "sha1:" + Base32.encode(sha1digest), null);
>
>You proposed base64, I would go for base32; theoretically we would be
>able to identify (in the future) individual RDF statements over P2P
>networks :)
>

Ok, I like it.

>>In Platypus, when we create a reification we put the new resource (the
>>reified triple) in the namespace "reifications", so the URI is
>>something like this: reifications:_086yskjf (reification+_CRC(s,p,o)).
>>I consider this "reifications:_086yskjf" as the ID/URI of the triple,
>>isn't it?
>>
>
>I wouldn't add semantics to the URI, in this case using a schema for
>identifying reified statements. If your sha1(statement) is in the
>system, then it is reified.
>
>>If a triple isn't reified, what is its ID/URI?
>>
>
>The same, but if you don't have its sha1 ID in the index, you may assume
>it is not reified, at least not in the system :)
>
>It would be dangerous to change the ID of a statement just to mark
>that it is reified, and it wouldn't be interoperable.
>

Yes... Following the Platypus Way, we have to allow the creation of a page (XHTML) that describes the statement, as for other resources. So we need a namespace, or a better way (we have to invent it), to save the "index.rdf" and the "index.html" describing it, as for other Wiki resources. Do you think it is useful for the user to reify a triple and describe it using a wiki page?
I propose the following:

* All Wiki triples are implicitly reified (only in the index), and the URI of each follows your proposal:
  new URI("urn", "sha1:" + Base32.encode(sha1digest), null)
* If a user wants to create a wiki page that describes a triple, we have to save the RDF reification (s, p, o, etc.) and the page.

Where can we save the page? For the moment they are saved under the "reification" namespace/dir. We add some semantics to the triple... yes... but the user creates it intentionally, using one particular Wiki installation. I find adding these semantics not ideal, but not so bad. So I suggest letting the user save it in any other namespace they select.

What do you think, Laurian?

>>Maybe:
>>namespace:_SHA1 (subjURI, predicateURI, objectURI/Literal)
>>Or directly
>>SHA1 (subjURI, predicateURI, objectURI/Literal)
>>In my opinion we are creating a new RDF resource by naming it, so it
>>could be in a fixed "installation dependent" namespace.
>>I need this ID because I use it to "add" or "remove" triples from the
>>Lucene index.
>>
>
>Cheers,
>

In the coming days I will start working on unique identifiers for triples, in the ways you suggested above, to use them in the Lucene index.

Bye Bye
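
For reference, here is a minimal sketch of the identifier scheme discussed above: SHA-1 over the N-Triples line, base32-encoded, wrapped in a urn:sha1 URI. It is only an illustration, not the actual MetaManager code. The class name StatementId and both method names are hypothetical, the base32 encoder is a small stand-in for com.bitzi.util.Base32 so the example needs nothing outside the JDK, and hashing the UTF-8 bytes of the line is an assumption (the thread does not fix an encoding).

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper illustrating the urn:sha1 statement ID; not Platypus code.
final class StatementId {
    // RFC 4648 base32 alphabet (stand-in for com.bitzi.util.Base32).
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    private StatementId() {}

    /** Encodes bytes as unpadded base32; a 20-byte SHA-1 digest yields 32 characters. */
    static String base32(byte[] data) {
        StringBuilder out = new StringBuilder((data.length * 8 + 4) / 5);
        int buffer = 0; // pending bits, newest in the low-order positions
        int bits = 0;   // number of pending bits not yet emitted
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                out.append(ALPHABET.charAt((buffer >> (bits - 5)) & 0x1F));
                bits -= 5;
            }
        }
        if (bits > 0) {
            // Left-align the remaining bits in a final 5-bit group.
            out.append(ALPHABET.charAt((buffer << (5 - bits)) & 0x1F));
        }
        return out.toString();
    }

    /** Returns urn:sha1:<base32(sha1(line))> for one N-Triples line, final " ." included. */
    static URI statementUrn(String nTripleLine) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            // Assumption: UTF-8, so the ID does not depend on the platform charset.
            byte[] digest = sha1.digest(nTripleLine.getBytes(StandardCharsets.UTF_8));
            return URI.create("urn:sha1:" + base32(digest));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sp = "<http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> ";
        // The plain and language-tagged literals hash to different statement IDs.
        System.out.println(statementUrn(sp + "\"Grapefruit\" ."));
        System.out.println(statementUrn(sp + "\"Grapefruit\"@en ."));
    }
}
```

Because the whole N-Triples line is hashed, the language tag and datatype are part of the ID, which is what lets the three "Grapefruit" statements get distinct URIs. The same string could then serve as the key when adding or removing a triple in the Lucene index.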