Re: [Introspector-developers] Alpha version of rewrite of introspector perl scripts
From: James M. D. <mdu...@ya...> - 2004-11-28 20:10:32
--- James Michael DuPont <mdu...@ya...> wrote:
> I will be packaging this all up in one system real soon now.

So, I have now been able to get the perl scripts to run on the output of rdfproc, a part of Redland. All you need to use this now is Redland, and there are Debian packages for it. You can use many tools on this RDF; take a look at http://librdf.org/ for more information.

You are going to want these packages for Debian:

  librdf-perl    - Perl language bindings for the Redland RDF library
  librdf0        - Redland RDF Application Framework
  librdf0-dev    - Redland RDF library development libraries and headers
  libraptor1     - Raptor RDF Parser library
  libraptor1-dev - Raptor RDF parser and serializer development libraries and headers

Here are some good example data files:

  http://introspector.sourceforge.net/2004/11/c-dump.ntriples.gz
  http://introspector.sourceforge.net/2004/11/c-dump.rdf.gz

These are two forms of RDF, N-Triples and RDF/XML. You can use them with the introspector like this (example given with the ntriples):

1. gunzip the file

     gunzip c-dump.rdf.gz

2. make a Redland repository

     rdfproc Global parse ntriples file:/

   "Global" is the name of the repository.
   "file:/" is the base address; it can be whatever URI you want.

That will create a repository in the current directory using Berkeley DB:

  6.2M  Global-po2s.db  -- predicate-object index (used to find by field)
  9.0M  Global-so2p.db  -- subject-object index (not used)
  9.5M  Global-sp2o.db  -- subject-predicate index (graph traversal)
  25M   total

So you have about 9 MB of indexes for a 500k zipped ntriples file.

The unpacked sizes are here:

  13M   Nov 28 15:34  c-dump.rdf
  4.7M  Nov 28 15:34  c-dump.ntriples

wc (word count) on c-dump.ntriples gives:

  lines 96,818   words 387,292   chars 4,846,776

The original source file (expanded with headers), c-dump.i:

  lines 13,270   words 27,221   chars 260,051 (254K from ls)

So we are talking about a 10x increase in size for indexing.
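To make the roles of the index files listed above more concrete, here is a minimal Python sketch of a predicate-object index ("find by field") and a subject-keyed index ("graph traversal") built over a handful of triples. It only illustrates the idea and is not how Redland or the introspector perl scripts are implemented. The node names, the field:* predicates, and the helper functions build_indexes, find_by_field, and walk are all hypothetical; only the node_types:function_decl name is taken from this mail.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, predicate, object). Real data from
# c-dump.ntriples uses full URIs; these short names are made up.
TRIPLES = [
    ("node:1", "rdf:type", "node_types:function_decl"),
    ("node:1", "field:name", "node:2"),
    ("node:1", "field:type", "node:3"),
    ("node:2", "rdf:type", "node_types:identifier_node"),
    ("node:3", "rdf:type", "node_types:function_type"),
    ("node:3", "field:retn", "node:4"),
    ("node:4", "rdf:type", "node_types:integer_type"),
    ("node:4", "field:used_by", "node:1"),  # back edge, creates a cycle
]


def build_indexes(triples):
    """Build two in-memory indexes loosely modelled on po2s and sp2o."""
    po2s = defaultdict(set)   # (predicate, object) -> subjects
    sp2o = defaultdict(list)  # subject -> outgoing (predicate, object) pairs
    for s, p, o in triples:
        po2s[(p, o)].add(s)
        sp2o[s].append((p, o))
    return po2s, sp2o


def find_by_field(po2s, predicate, obj):
    """Return all subjects that have the given predicate/object pair."""
    return sorted(po2s.get((predicate, obj), set()))


def walk(start, sp2o, seen=None):
    """Depth-first traversal from start; each node is expanded only once."""
    if seen is None:
        seen = set()
    if start in seen:
        return []
    seen.add(start)
    edges = []
    for p, o in sp2o.get(start, []):
        edges.append((start, p, o))
        if o in sp2o:  # the object is itself a node with outgoing edges
            edges.extend(walk(o, sp2o, seen))
    return edges


if __name__ == "__main__":
    po2s, sp2o = build_indexes(TRIPLES)
    for subject in find_by_field(po2s, "rdf:type", "node_types:function_decl"):
        for edge in walk(subject, sp2o):
            print(edge)
```

The seen set keeps the walk from looping forever when the graph contains cycles, which the back edge in the toy data is there to exercise.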
For example, I have installed the introspector into my home dir:

  /home/mdupont/EXPERIMENTS/introspector/introspector-0.7

The CVS version is up to date, and I will be releasing the SF file. So, to use it, go to the directory containing the RDF database files and run:

  perl -I/home/mdupont/EXPERIMENTS/introspector/introspector-0.7 ~/EXPERIMENTS/introspector/introspector-0.7/recurse5.pl node_types:function_decl file:/

node_types:function_decl is the node type that I am looking for; other interesting ones can be found in the Introspector/GCCTypes.pm file.

I hope that you take some time and play around with the introspector. It is not running perfectly, but it is fast!

mike

=====
James Michael DuPont
http://introspector.sourceforge.net/