Re: [Platypuswiki-users] Re: [platypus] Unique identifier for a statement...

Laurian Gridinoc wrote:

>On Fri, 17 Sep 2004 09:18:24 +0200, Stefano Campanini
><cam...@ya...> wrote:
>  
>
>>>On Wed, 15 Sep 2004 07:16:15 +0200, Paolo Castagna
>>>      
>>>
>>>>>>... we need a unique identifier for a statement
>>>>>>in the index and for reification.
>>>>>> <subject uri> <predicate uri> <object uri>
>>>>>>Is there any problem if we use SHA1?
>>>>>> sha1(<subject uri> <predicate uri> <object uri>)?
>>>>>>            
>>>>>>
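>>Ok, I have added the following method to the MetaManager class:
>>    /**
>>     * @return the triple ID: Base64 of the SHA-1 digest of
>>     *         subject + predicate + object
>>     */
>>    public static String getStmtId(final URI subject, final URI predicate,
>>            final URI object) throws NoSuchAlgorithmException {
>>        final String tbDigest =
>>                subject.toString() + predicate.toString() + object.toString();
>>        // getDigest is a MetaManager helper (not shown) computing the digest.
>>        // An explicit charset (java.nio.charset.StandardCharsets) keeps the
>>        // ID independent of the platform's default encoding.
>>        byte[] digest = getDigest(tbDigest.getBytes(StandardCharsets.UTF_8));
>>        Base64 base = new Base64();
>>        byte[] encoded = base.encode(digest);
>>        // Base64 output is pure ASCII
>>        return new String(encoded, StandardCharsets.US_ASCII);
>>    }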
>>
>>So we can now calculate the ID of a triple (TBD: literals).
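Since `getDigest` is not shown in the message, here is a minimal self-contained sketch of the same idea. It assumes `getDigest` is a plain SHA-1 via `java.security.MessageDigest`, uses `java.util.Base64` (Java 8+) in place of the `Base64` class above (its `encode(byte[])` looks like Apache Commons Codec, but that is an assumption), and hashes UTF-8 bytes. The class name and the object URI in `main` are hypothetical.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical stand-alone version of MetaManager.getStmtId (a sketch, not the Platypus code).
class StmtIdSketch {

    // Assumed behaviour of the unshown getDigest helper: the SHA-1 digest of the bytes.
    static byte[] getDigest(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Base64 of SHA-1(subject + predicate + object), as in the message above.
    static String getStmtId(URI subject, URI predicate, URI object) {
        String tbDigest = subject.toString() + predicate.toString() + object.toString();
        byte[] digest = getDigest(tbDigest.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(digest);
    }

    public static void main(String[] args) {
        String id = getStmtId(URI.create("http://www.grapefruit.ro"),
                URI.create("http://purl.org/dc/terms/1.0/title"),
                URI.create("http://example.org/grapefruit")); // hypothetical object URI
        // A 20-byte SHA-1 digest always encodes to 28 Base64 characters (one '=' of padding).
        System.out.println(id + " (" + id.length() + " chars)");
    }
}
```

Unlike the original, this sketch turns `NoSuchAlgorithmException` into an unchecked exception, since SHA-1 is a mandatory algorithm on every Java platform.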
>>    
>>
>
>Instead of the SHA-1 of the string concatenation
>subject.toString()+predicate.toString()+object.toString()
>I would propose the SHA-1 of the N-Triples line (n-triple.toString())
>representing the triple. I think this would include the optional xsd
>datatype and the language tag too, and it would be easier to describe
>and to interoperate with.
>
>Consider:
>
><http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> "Grapefruit" .
>
>sha1("http://www.grapefruit.rohttp://purl.org/dc/terms/1.0/titleGrapefruit")
>doesn't look nice, but
>sha1("<http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> \"Grapefruit\" .")
>does. I would include the final . too, for ease of script processing
>if it is ever needed by us or by third parties.
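The bare concatenation has a further problem besides looks: it has no separators, so two different statements can produce the same string and therefore the same SHA-1. A small sketch of this, with hypothetical `ex.org` URIs and helper names chosen for illustration, and only plain literals that need no escaping:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: hashing the bare concatenation is ambiguous; hashing the N-Triples line is not.
// The ex.org URIs and the helper names are hypothetical.
class ConcatVsNTriples {

    static String concat(String s, String p, String literal) {
        return s + p + literal;
    }

    // N-Triples line for a plain literal object (these examples need no escaping).
    static String nTriple(String s, String p, String literal) {
        return "<" + s + "> <" + p + "> \"" + literal + "\" .";
    }

    static String sha1Hex(String text) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is mandatory on every Java platform
        }
    }

    public static void main(String[] args) {
        String s = "http://ex.org/s";
        // Two different statements...
        String[] t1 = {s, "http://ex.org/year", "2004"};
        String[] t2 = {s, "http://ex.org/year2", "004"};
        // ...get the same ID when hashed as a bare concatenation:
        System.out.println(sha1Hex(concat(t1[0], t1[1], t1[2]))
                .equals(sha1Hex(concat(t2[0], t2[1], t2[2]))));   // true
        // ...but different IDs when hashed as N-Triples lines:
        System.out.println(sha1Hex(nTriple(t1[0], t1[1], t1[2]))
                .equals(sha1Hex(nTriple(t2[0], t2[1], t2[2]))));  // false
    }
}
```

The angle brackets, quotes and spaces of N-Triples act as delimiters, which is what makes the hashed string unambiguous.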
>
>Using N-Triples would also give each of these statements a different URI:
><http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> "Grapefruit" .
><http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> "Grapefruit"@en .
><http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> "Grapefruit"^^<http://www.w3.org/2001/XMLSchema#string> .
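To hash literals this way, the N-Triples line has to be produced consistently: the basic characters escaped, the language tag after `@`, and the datatype as a full URI after `^^` (N-Triples has no `xsd:` prefixes). A sketch under those assumptions; the class and method names are hypothetical, and non-ASCII characters are passed through unchanged, which is a choice the system would need to fix once for stable IDs. Note that under the 2004 RDF model these are three distinct statements; RDF 1.1 later made a plain literal and an `xsd:string` literal the same term.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: canonical N-Triples line for a triple with a literal object.
class LiteralIds {
    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

    // Escape the characters N-Triples requires escaping inside a literal.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // lang and datatype are optional (null); a literal carries at most one of them.
    static String nTriple(String s, String p, String lexical, String lang, String datatype) {
        StringBuilder sb = new StringBuilder();
        sb.append('<').append(s).append("> <").append(p).append("> \"")
          .append(escape(lexical)).append('"');
        if (lang != null) sb.append('@').append(lang);
        else if (datatype != null) sb.append("^^<").append(datatype).append('>');
        return sb.append(" .").toString();
    }

    static String sha1Hex(String text) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : d) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-1 is mandatory on every Java platform
        }
    }

    public static void main(String[] args) {
        String s = "http://www.grapefruit.ro";
        String p = "http://purl.org/dc/terms/1.0/title";
        Set<String> ids = new LinkedHashSet<>();
        ids.add(sha1Hex(nTriple(s, p, "Grapefruit", null, null)));
        ids.add(sha1Hex(nTriple(s, p, "Grapefruit", "en", null)));
        ids.add(sha1Hex(nTriple(s, p, "Grapefruit", null, XSD_STRING)));
        System.out.println(ids.size()); // 3: one ID per statement
    }
}
```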
>  
>
Ok, it is better to use n-triple.toString().

>  
>
>>I'd like to think of the ID as the URI of the triple. In fact, this is
>>what happens during reification.
>>    
>>
>
>I have always thought of this SHA-1 result as a URI. I would use the
>format that P2P software already uses to identify resources by the
>SHA-1 sum of their content:
>
>urn:sha1:XSQCZ3UK3PPW6Y6HOVLTIX2QFMZ3TFFB
>---------^ Base32 of the SHA-1 sum of the content; in our case, of the
>N-Triples line
>
>For Base32 I am using com.bitzi.util.Base32:
>URI urn = new URI("urn", "sha1:" + Base32.encode(sha1digest), null);
>
>You proposed Base64; I would go for Base32. In theory, we would then be
>able to identify individual RDF statements over P2P networks in the
>future :)
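If `com.bitzi.util.Base32` is not at hand, the `urn:sha1:` URI can be sketched with a hand-written RFC 4648 Base32 encoder (alphabet A-Z and 2-7, no padding; a 20-byte SHA-1 needs none, since 160 bits is exactly 32 five-bit groups). We have not checked that the Bitzi class produces byte-for-byte the same output, so treat that as an assumption before mixing the two. A practical side benefit over Base64: Base64 output can contain `/`, which is reserved in URNs, and it is case-sensitive. The class name `Sha1Urn` is hypothetical; the `URI(scheme, ssp, fragment)` call is the same one used in the message.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: urn:sha1:<Base32(SHA-1(N-Triples line))>.
class Sha1Urn {
    private static final char[] ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567".toCharArray();

    // RFC 4648 Base32, unpadded: emit 5 bits at a time from a bit buffer.
    static String base32(byte[] data) {
        StringBuilder sb = new StringBuilder();
        int buffer = 0, bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xFF);
            bits += 8;
            while (bits >= 5) {
                sb.append(ALPHABET[(buffer >> (bits - 5)) & 0x1F]);
                bits -= 5;
            }
        }
        // Left-align any leftover bits into a final 5-bit group.
        if (bits > 0) sb.append(ALPHABET[(buffer << (5 - bits)) & 0x1F]);
        return sb.toString();
    }

    static URI sha1Urn(String nTripleLine) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(nTripleLine.getBytes(StandardCharsets.UTF_8));
            // Same construction as in the message: URI(scheme, ssp, fragment).
            return new URI("urn", "sha1:" + base32(digest), null);
        } catch (NoSuchAlgorithmException | URISyntaxException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        URI urn = sha1Urn("<http://www.grapefruit.ro> <http://purl.org/dc/terms/1.0/title> \"Grapefruit\" .");
        System.out.println(urn); // urn:sha1: followed by 32 Base32 characters
    }
}
```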
>  
>
Ok, I like it.

>  
>
>>In Platypus, when we create a reification we put the new resource (the
>>reified triple) in the namespace "reifications", so the URI is
>>something like this: reifications:_086yskjf (reification + _CRC(s,p,o)).
>>I consider this "reifications:_086yskjf" as the ID/URI of the triple,
>>right?
>>    
>>
>
>I wouldn't add semantics to the URI (in this case, a naming scheme for
>identifying reified statements). If your sha1(statement) is in the
>system, then the statement is reified.
>
>  
>
>>If a triple isn't reified, what is its ID/URI?
>>    
>>
>
>The same, but if you don't have its SHA-1 ID in the index, you may
>assume it is not reified, at least not in this system :)
>
>It would be dangerous to change the ID of a statement just to mark
>that it is reified, and it would not be interoperable.
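The point that a statement keeps one ID whether or not it is reified, and that "reified" is only a question of whether the system knows that ID, can be sketched as follows. The class and method names are hypothetical, and the in-memory set merely stands in for the real index.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the statement ID never changes; being reified just means
// the ID is registered. The HashSet stands in for the real index.
class ReificationRegistry {
    private final Set<String> reifiedIds = new HashSet<>();

    void markReified(String stmtId) {
        reifiedIds.add(stmtId);
    }

    boolean isReified(String stmtId) {
        return reifiedIds.contains(stmtId);
    }

    public static void main(String[] args) {
        ReificationRegistry registry = new ReificationRegistry();
        String id = "urn:sha1:XSQCZ3UK3PPW6Y6HOVLTIX2QFMZ3TFFB"; // example ID from this thread
        System.out.println(registry.isReified(id)); // false: not known to this system
        registry.markReified(id);
        System.out.println(registry.isReified(id)); // true: same ID, now reified
    }
}
```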
>  
>
Yes...
Following the Platypus way, we have to allow creating a page (XHTML)
that describes the statement, as for other resources. So we need a
namespace, or a better way (we have to invent it), to save the "index.rdf"
and the "index.html" describing it, like other wiki resources.
Do you think it is useful for the user to reify a triple and describe
it using a wiki page?
I propose this:
* All wiki triples are implicitly reified (only in the index), and
their URI is as in your proposal (new URI("urn", "sha1:" +
Base32.encode(sha1digest), null)).
* If a user wants to create a wiki page that describes a triple, we have
to save the RDF reification (s, p, o, etc.) and the page. Where can we
save the page? For the moment, such pages are saved under the
"reification" namespace/dir.
This does add some semantics to the triple, yes, but the user creates
it intentionally, within one particular wiki installation. I don't find
adding these semantics ideal, but it is not so bad either.
So I suggest letting the user choose any other namespace to save it in.
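Saving "the RDF reification (s, p, o)" with the `urn:sha1:` ID as the statement's URI would mean writing the four standard RDF reification triples (`rdf:type rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). A sketch, assuming the object is passed already in N-Triples syntax; the class name is hypothetical, and where the page itself is stored is left open, as in the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the four standard RDF reification triples for one statement,
// using its urn:sha1 ID as subject. objectNt is already in N-Triples syntax,
// e.g. "<http://...>" or "\"Grapefruit\"@en".
class ReificationTriples {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    static List<String> reify(String stmtUrn, String s, String p, String objectNt) {
        String id = "<" + stmtUrn + ">";
        List<String> out = new ArrayList<>();
        out.add(id + " <" + RDF + "type> <" + RDF + "Statement> .");
        out.add(id + " <" + RDF + "subject> <" + s + "> .");
        out.add(id + " <" + RDF + "predicate> <" + p + "> .");
        out.add(id + " <" + RDF + "object> " + objectNt + " .");
        return out;
    }

    public static void main(String[] args) {
        // The URN is the example value from this thread, not the real hash of this triple.
        for (String line : reify("urn:sha1:XSQCZ3UK3PPW6Y6HOVLTIX2QFMZ3TFFB",
                "http://www.grapefruit.ro", "http://purl.org/dc/terms/1.0/title",
                "\"Grapefruit\"")) {
            System.out.println(line);
        }
    }
}
```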

What do you think, Laurian?

>>Maybe:
>>namespace:_SHA1(subjURI, predicateURI, objectURI/Literal)
>>or directly
>>SHA1(subjURI, predicateURI, objectURI/Literal)
>>In my opinion we are creating a new RDF resource by naming it, so it could
>>be in a fixed, installation-dependent namespace.
>>I need this ID because I use it to add or remove triples from the
>>Lucene index.
>>    
>>
>
>Cheers,
>  
>
In the next few days I will start working on unique identifiers for
triples, in the ways you suggested above, to use them in the Lucene index.
Bye Bye