[Introspector-developers] RDF_Stats.pl
From: James M. D. <mdu...@ya...> - 2002-12-22 23:09:50
Dear fellow hackers,

The first stone is laid. I now have a working build of the redland/raptor cvs version; there were some minor issues with perl 5.8 that needed addressing. Dajobe has provided excellent support and helped find and fix all the problems.

I have committed RDF_Stats.pl, a program that extracts statistics from an ntriples file and tells you what types of nodes are used.

It reads an n3 file, which you can create from an rdf file with rdfdump, e.g.:

    rdfdump -o ntriples file:test.rdf

It then produces statistics on how often a uri is used as a predicate, object or subject in the rdf file. The output is in rdf, of all things, just as a test.

This is the first step in creating a daml outputter for the introspector, where we will be outputting the meta-model in daml.

Here is a sample of the output from running the opencyc data base through this program:

    <rdf:Description rdf:about="http://www.daml.org/2001/03/daml+oil#Restriction">
      <ns0:count-objects xmlns:ns0="http://introspector.sourceforge.net/2002/12/22/stats.daml#">11</ns0:count-objects>
    </rdf:Description>
    <rdf:Description rdf:about="http://www.daml.org/2001/03/daml+oil#Restriction">
      <ns0:count-subject xmlns:ns0="http://introspector.sourceforge.net/2002/12/22/stats.daml#">4</ns0:count-subject>
    </rdf:Description>

Here you see that daml+oil#Restriction is used 11 times as an object and 4 times as a subject.

You run the program like this:

    perl ./RDF_Stats.pl stats.n3 stats.rdf

It reads the n3 input file named by the first parameter and writes an rdf output file named by the second parameter.

The counts for object, subject and predicate are each stored under their own predicate:

    http://introspector.sourceforge.net/2002/12/22/stats.daml#count-objects
    http://introspector.sourceforge.net/2002/12/22/stats.daml#count-subject
    http://introspector.sourceforge.net/2002/12/22/stats.daml#count-predicates

So each statement has the uri as the subject, one of the count predicates as the predicate, and the count as the object.

Later on, I will be adding more features like:

1. Extraction of the commonly used number of properties that a subject has.
2. Determining the statistics of a URI: does a predicate always have a different object, the same object, or is there a fixed number of objects that have a certain frequency?
3. Is there a field that is a key, that all objects have in common, or is there a mutually exclusive set of fields (predicates) such that, if we look for all the objects that have them, we will find each subject only once? Put simply, is there a primary key or a set of primary key fields for accessing all the objects?
4. Can we treat a given object as being dependent on another, or is it used/referenced by multiple objects? For each predicate we will try to determine the cardinality of the relationships defined by that predicate: are they associations, aggregation or composition? Are they 1-1, (0:1)-(1), m-n, 1-m etc.?

Anyway, I hope to create a set of debs soon that contain all the binaries needed to run this, with a redland cvs snapshot and the gcc snapshot. We need to package what we have now, because it is good and working.

mike

=====
James Michael DuPont
http://introspector.sourceforge.net/
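
The counting step described in the mail can be sketched roughly as follows. This is an illustration only, not the RDF_Stats.pl source: it is written in Python rather than Perl, it does not use Redland, and it assumes a simplified ntriples input with one triple per line. The stats namespace and the three count predicates come from the mail; the function names and the example.org sample data are hypothetical.

```python
from collections import Counter

# Namespace and predicate names as given in the mail.
STATS_NS = "http://introspector.sourceforge.net/2002/12/22/stats.daml#"
ROLE_PREDICATES = {
    "subject": STATS_NS + "count-subject",
    "predicate": STATS_NS + "count-predicates",
    "object": STATS_NS + "count-objects",
}


def parse_line(line):
    """Split one simplified ntriples line into (subject, predicate, object).

    Assumption: one triple per line, terminated by '.'. Returns None for
    blank lines, comments and anything that does not look like a triple.
    """
    line = line.strip()
    if not line or line.startswith("#") or not line.endswith("."):
        return None
    parts = line[:-1].rstrip().split(None, 2)
    return tuple(parts) if len(parts) == 3 else None


def uri_of(term):
    """Return the bare uri for a <...> term, or None for literals and blank nodes."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    return None


def count_roles(lines):
    """Count how often each uri appears as subject, predicate or object."""
    counts = {"subject": Counter(), "predicate": Counter(), "object": Counter()}
    for line in lines:
        triple = parse_line(line)
        if triple is None:
            continue
        for role, term in zip(("subject", "predicate", "object"), triple):
            uri = uri_of(term)
            if uri is not None:
                counts[role][uri] += 1
    return counts


def stats_triples(counts):
    """Yield (uri, count predicate, count) statements, one per uri and role."""
    for role, counter in counts.items():
        for uri, n in sorted(counter.items()):
            yield (uri, ROLE_PREDICATES[role], n)


# Hypothetical sample input; the example.org uris are made up for illustration.
SAMPLE = """\
<http://example.org/a> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.daml.org/2001/03/daml+oil#Restriction> .
<http://example.org/b> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.daml.org/2001/03/daml+oil#Restriction> .
<http://example.org/a> <http://www.w3.org/2000/01/rdf-schema#label> "a" .
"""

if __name__ == "__main__":
    # Print the statistics as ntriples-style statements instead of RDF/XML.
    for s, p, o in stats_triples(count_roles(SAMPLE.splitlines())):
        print(f'<{s}> <{p}> "{o}" .')
```

Each emitted statement has the counted uri as its subject, one of the three count predicates as its predicate, and the count as its object, which mirrors the layout of the RDF/XML sample above. Literals and blank nodes are skipped because only uris are being counted.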