From: <tho...@us...> - 2012-07-23 20:18:42
Revision: 6388
          http://bigdata.svn.sourceforge.net/bigdata/?rev=6388&view=rev
Author:   thompsonbry
Date:     2012-07-23 20:18:34 +0000 (Mon, 23 Jul 2012)

Log Message:
-----------
The ServiceDescription now also reports the #of triples in the default graph (note that the default graph is the RDF merge of the named graphs, so it is a summary of the entire KB instance). I've found some bugs in the current (1.2.1) ServiceDescription, notably having to do with not using blank nodes to model the service and the nested dataset, defaultGraph, etc.

I've added a VoidVocabularyDecl which is now used by RDFSVocabularyV2. I've extended the ServiceDescription to report the vocabularies which are in use in the default graph as well as the predicate partition and class partition statistics. Those statistics are computed efficiently using a distinct term scan / advancer pattern. There was a bug in the distinct multi-term scan code (distinctMultiTermScan) where it failed to specify the elementClass for the chunked iterator.

I have removed the old predicateUsage() method on AbstractTripleStore. You can now use the ServiceDescription instead and get more information in a standards-oriented manner. index.html now contains a link to the ServiceDescription.
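To make that concrete, a GET against the SPARQL end point now returns a description shaped roughly like the Turtle below. This is a sketch, not output from the commit: it is abbreviated, and the end point URI and all counts are invented for illustration.

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix sd:   <http://www.w3.org/ns/sparql-service-description#> .
    @prefix void: <http://rdfs.org/ns/void#> .

    _:service a sd:Service ;
        sd:endpoint <http://localhost:8080/bigdata/sparql> ;
        sd:supportedLanguage sd:SPARQL10Query , sd:SPARQL11Query , sd:SPARQL11Update ;
        sd:defaultDataset _:defaultDataset .

    _:defaultDataset a sd:Dataset ;
        void:uriRegexPattern "^.*" ;
        void:vocabulary <http://www.w3.org/1999/02/22-rdf-syntax-ns> ;
        sd:defaultGraph _:defaultGraph .

    _:defaultGraph void:triples 1000000 ;
        void:entities 250000 ;
        void:properties 42 ;
        void:classes 17 ;
        void:propertyPartition [ void:property rdf:type ; void:triples 50000 ] ;
        void:classPartition [ void:class <http://xmlns.com/foaf/0.1/Person> ; void:triples 10000 ] .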
Modified Paths:
--------------
    branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/spo/SPORelation.java
    branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java
    branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/RDFSVocabulary.java
    branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/test/com/bigdata/rdf/rio/AbstractRIOTestCase.java
    branches/BIGDATA_RELEASE_1_2_0/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java
    branches/BIGDATA_RELEASE_1_2_0/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/SD.java
    branches/BIGDATA_RELEASE_1_2_0/bigdata-war/src/html/index.html

Added Paths:
-----------
    branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/RDFSVocabularyV2.java
    branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/decls/VoidVocabularyDecl.java

Modified: branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/spo/SPORelation.java
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/spo/SPORelation.java	2012-07-21 18:37:51 UTC (rev 6387)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/spo/SPORelation.java	2012-07-23 20:18:34 UTC (rev 6388)
@@ -1733,7 +1733,10 @@
 
         });
 
-        return new ChunkedWrappedIterator<IV>(itr);
+        return new ChunkedWrappedIterator<IV>(itr,//
+                ChunkedWrappedIterator.DEFAULT_CHUNK_SIZE,// chunkSize
+                IV.class// elementClass
+                );
 
     }
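On the elementClass fix above: as I read it, IChunkedIterator#nextChunk() hands back a typed IV[] chunk, and the wrapper needs the elementClass to allocate that array reliably. A minimal consumer sketch of mine (not from the commit; it assumes the bigdata 1.2.x striterator API exactly as it appears in the diff):

    import com.bigdata.rdf.internal.IV;
    import com.bigdata.rdf.spo.SPOKeyOrder;
    import com.bigdata.rdf.spo.SPORelation;
    import com.bigdata.striterator.IChunkedIterator;

    // Sketch only: consume the distinct-term scan chunk-at-a-time. [r] is an
    // SPORelation obtained elsewhere (e.g., AbstractTripleStore#getSPORelation()).
    static void dumpDistinctPredicates(final SPORelation r) {
        final IChunkedIterator<IV> itr = r.distinctTermScan(SPOKeyOrder.POS);
        try {
            while (itr.hasNext()) {
                // nextChunk() returns a typed IV[]; allocating that array is
                // exactly what the new elementClass ctor argument enables.
                final IV[] chunk = itr.nextChunk();
                for (IV iv : chunk)
                    System.out.println(iv);
            }
        } finally {
            itr.close();
        }
    }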
Modified: branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java	2012-07-21 18:37:51 UTC (rev 6387)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/store/AbstractTripleStore.java	2012-07-23 20:18:34 UTC (rev 6388)
@@ -3165,72 +3165,75 @@
 
     }
 
-    final public StringBuilder predicateUsage() {
+//    final public StringBuilder predicateUsage() {
+//
+//        return predicateUsage(this);
+//
+//    }
+//
+//    /**
+//     * Dumps the #of statements using each predicate in the kb (tab delimited,
+//     * unordered).
+//     *
+//     * @param resolveTerms
+//     *            Used to resolve term identifiers to terms (you can use this to
+//     *            dump a {@link TempTripleStore} that is using the term
+//     *            dictionary of the main database).
+//     *
+//     * @see SD, which can now compute and report both the predicate partition
+//     *      usage and the class partition usage.
+//     */
+//    final public StringBuilder predicateUsage(
+//            final AbstractTripleStore resolveTerms) {
+//
+//        if (getSPORelation().oneAccessPath) {
+//
+//            // The necessary index (POS or POCS) does not exist.
+//            throw new UnsupportedOperationException();
+//
+//        }
+//
+//        // visit distinct term identifiers for the predicate position.
+//        final IChunkedIterator<IV> itr = getSPORelation().distinctTermScan(
+//                quads ? SPOKeyOrder.POCS : SPOKeyOrder.POS);
+//
+//        // resolve term identifiers to terms efficiently during iteration.
+//        final BigdataValueIterator itr2 = new BigdataValueIteratorImpl(
+//                resolveTerms, itr);
+//
+//        try {
+//
+//            final StringBuilder sb = new StringBuilder();
+//
+//            while (itr2.hasNext()) {
+//
+//                final BigdataValue term = itr2.next();
+//
+//                final IV p = term.getIV();
+//
+//                final long n = getSPORelation().getAccessPath(null, p, null,
+//                        null).rangeCount(false/* exact */);
+//
+//                /*
+//                 * FIXME do efficient term resolution for scale-out. This will
+//                 * require an expander pattern where we feed one iterator into
+//                 * another and both are chunked.
+//                 */
+//                sb.append(n + "\t" + resolveTerms.toString(p) + "\n");
+//
+//            }
+//
+//            return sb;
+//
+//        } finally {
+//
+//            itr2.close();
+//
+//        }
+//
+//    }
 
-        return predicateUsage(this);
-
-    }
-
-    /**
-     * Dumps the #of statements using each predicate in the kb (tab delimited,
-     * unordered).
-     *
-     * @param resolveTerms
-     *            Used to resolve term identifiers to terms (you can use this to
-     *            dump a {@link TempTripleStore} that is using the term
-     *            dictionary of the main database).
-     */
-    final public StringBuilder predicateUsage(
-            final AbstractTripleStore resolveTerms) {
-
-        if (getSPORelation().oneAccessPath) {
-
-            // The necessary index (POS or POCS) does not exist.
-            throw new UnsupportedOperationException();
-
-        }
-
-        // visit distinct term identifiers for the predicate position.
-        final IChunkedIterator<IV> itr = getSPORelation().distinctTermScan(
-                quads ? SPOKeyOrder.POCS : SPOKeyOrder.POS);
-
-        // resolve term identifiers to terms efficiently during iteration.
-        final BigdataValueIterator itr2 = new BigdataValueIteratorImpl(
-                resolveTerms, itr);
-
-        try {
-
-            final StringBuilder sb = new StringBuilder();
-
-            while (itr2.hasNext()) {
-
-                final BigdataValue term = itr2.next();
-
-                final IV p = term.getIV();
-
-                final long n = getSPORelation().getAccessPath(null, p, null,
-                        null).rangeCount(false/* exact */);
-
-                /*
-                 * FIXME do efficient term resolution for scale-out. This will
-                 * require an expander pattern where we feed one iterator into
-                 * another and both are chunked.
-                 */
-                sb.append(n + "\t" + resolveTerms.toString(p) + "\n");
-
-            }
-
-            return sb;
-
-        } finally {
-
-            itr2.close();
-
-        }
-
-    }
-
     /**
      * Utility method dumps the statements in the store using the SPO index
      * (subject order).
      */

Modified: branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/RDFSVocabulary.java
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/RDFSVocabulary.java	2012-07-21 18:37:51 UTC (rev 6387)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/RDFSVocabulary.java	2012-07-23 20:18:34 UTC (rev 6388)
@@ -27,6 +27,7 @@
 
 package com.bigdata.rdf.vocab;
 
+import org.openrdf.Sesame;
 import org.openrdf.model.Value;
 import org.openrdf.model.vocabulary.OWL;
 import org.openrdf.model.vocabulary.RDF;
@@ -41,6 +42,7 @@
 import com.bigdata.rdf.vocab.decls.RDFVocabularyDecl;
 import com.bigdata.rdf.vocab.decls.SKOSVocabularyDecl;
 import com.bigdata.rdf.vocab.decls.SesameVocabularyDecl;
+import com.bigdata.rdf.vocab.decls.VoidVocabularyDecl;
 import com.bigdata.rdf.vocab.decls.XMLSchemaVocabularyDecl;
 
 /**

Added: branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/RDFSVocabularyV2.java
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/RDFSVocabularyV2.java	                        (rev 0)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/RDFSVocabularyV2.java	2012-07-23 20:18:34 UTC (rev 6388)
@@ -0,0 +1,72 @@
+/**
+
+Copyright (C) SYSTAP, LLC 2006-2007.  All rights reserved.
+
+Contact:
+     SYSTAP, LLC
+     4501 Tower Road
+     Greensboro, NC 27410
+     lic...@bi...
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; version 2 of the License.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+*/
+/*
+ * Created on Oct 28, 2007
+ */
+
+package com.bigdata.rdf.vocab;
+
+import com.bigdata.rdf.store.AbstractTripleStore;
+import com.bigdata.rdf.vocab.decls.VoidVocabularyDecl;
+
+/**
+ * Extended vocabulary to include {@link VoidVocabularyDecl}.
+ *
+ * @author <a href="mailto:tho...@us...">Bryan Thompson</a>
+ * @version $Id: RDFSVocabulary.java 4632 2011-06-06 15:11:53Z thompsonbry $
+ */
+public class RDFSVocabularyV2 extends RDFSVocabulary {
+
+    /**
+     * De-serialization ctor.
+     */
+    public RDFSVocabularyV2() {
+
+        super();
+
+    }
+
+    /**
+     * Used by {@link AbstractTripleStore#create()}.
+     *
+     * @param namespace
+     *            The namespace of the KB instance.
+     */
+    public RDFSVocabularyV2(final String namespace) {
+
+        super(namespace);
+
+    }
+
+    @Override
+    protected void addValues() {
+
+        super.addValues();
+
+        addDecl(new VoidVocabularyDecl());
+
+    }
+
+}

Added: branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/decls/VoidVocabularyDecl.java
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/decls/VoidVocabularyDecl.java	                        (rev 0)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/java/com/bigdata/rdf/vocab/decls/VoidVocabularyDecl.java	2012-07-23 20:18:34 UTC (rev 6388)
@@ -0,0 +1,118 @@
+/**
+
+Copyright (C) SYSTAP, LLC 2006-2011.  All rights reserved.
+
+Contact:
+     SYSTAP, LLC
+     4501 Tower Road
+     Greensboro, NC 27410
+     lic...@bi...
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; version 2 of the License.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+*/
+/*
+ * Created on Jul 23, 2012
+ */
+
+package com.bigdata.rdf.vocab.decls;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Iterator;
+
+import org.openrdf.model.URI;
+import org.openrdf.model.impl.URIImpl;
+
+import com.bigdata.rdf.vocab.VocabularyDecl;
+
+/**
+ * Vocabulary and namespace for VoID.
+ *
+ * @see <a href="http://www.w3.org/TR/void/"> Describing Linked Datasets with
+ *      the VoID Vocabulary </a>
+ * @see <a href="http://vocab.deri.ie/void/"> Vocabulary of Interlinked Datasets
+ *      (VoID) </a>
+ *
+ * @author <a href="mailto:tho...@us...">Bryan Thompson</a>
+ * @version $Id: RDFVocabularyDecl.java 4631 2011-06-06 15:06:48Z thompsonbry $
+ */
+public class VoidVocabularyDecl implements VocabularyDecl {
+
+    public static final String NAMESPACE = "http://rdfs.org/ns/void#";
+
+    // Classes.
+    public static final URI //
+            Dataset = new URIImpl(NAMESPACE + "Dataset"),//
+            DatasetDescription = new URIImpl(NAMESPACE + "DatasetDescription"),//
+            Linkset = new URIImpl(NAMESPACE + "Linkset"),//
+            TechnicalFeature = new URIImpl(NAMESPACE + "TechnicalFeature")//
+    ;
+
+    // Properties.
+    public static final URI //
+            class_ = new URIImpl(NAMESPACE + "class"),//
+            classPartition = new URIImpl(NAMESPACE + "classPartition"),//
+            classes = new URIImpl(NAMESPACE + "classes"),//
+            dataDump = new URIImpl(NAMESPACE + "dataDump"),//
+            distinctObjects = new URIImpl(NAMESPACE + "distinctObjects"),//
+            distinctSubjects = new URIImpl(NAMESPACE + "distinctSubjects"),//
+            documents = new URIImpl(NAMESPACE + "documents"),//
+            entities = new URIImpl(NAMESPACE + "entities"),//
+            exampleResource = new URIImpl(NAMESPACE + "exampleResource"),//
+            feature = new URIImpl(NAMESPACE + "feature"),//
+            inDataset = new URIImpl(NAMESPACE + "inDataset"),//
+            linkPredicate = new URIImpl(NAMESPACE + "linkPredicate"),//
+            objectsTarget = new URIImpl(NAMESPACE + "objectsTarget"),//
+            openSearchDescription = new URIImpl(NAMESPACE + "openSearchDescription"),//
+            properties = new URIImpl(NAMESPACE + "properties"),//
+            property = new URIImpl(NAMESPACE + "property"),//
+            propertyPartition = new URIImpl(NAMESPACE + "propertyPartition"),//
+            rootResource = new URIImpl(NAMESPACE + "rootResource"),//
+            sparqlEndpoint = new URIImpl(NAMESPACE + "sparqlEndpoint"),//
+            subjectsTarget = new URIImpl(NAMESPACE + "subjectsTarget"),//
+            subset = new URIImpl(NAMESPACE + "subset"),//
+            target = new URIImpl(NAMESPACE + "target"),//
+            triples = new URIImpl(NAMESPACE + "triples"),//
+            uriLookupEndpoint = new URIImpl(NAMESPACE + "uriLookupEndpoint"),//
+            uriRegexPattern = new URIImpl(NAMESPACE + "uriRegexPattern"),//
+            uriSpace = new URIImpl(NAMESPACE + "uriSpace"),//
+            vocabulary = new URIImpl(NAMESPACE + "vocabulary")//
+    ;
+
+    static private final URI[] uris = new URI[] {//
+            new URIImpl(NAMESPACE),//
+            // classes
+            Dataset, DatasetDescription, Linkset, TechnicalFeature,
+            // properties
+            class_, classPartition, classes, dataDump, distinctObjects,
+            distinctSubjects, documents, entities, exampleResource, feature,
+            inDataset, linkPredicate, objectsTarget, openSearchDescription,
+            properties, property, propertyPartition, rootResource,
+            sparqlEndpoint, subjectsTarget, subset, target, triples,
+            uriLookupEndpoint, uriRegexPattern, uriSpace, vocabulary//
+    };
+
+    public VoidVocabularyDecl() {
+    }
+
+    public Iterator<URI> values() {
+
+        return Collections.unmodifiableList(Arrays.asList(uris)).iterator();
+
+    }
+
+}
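For reference, a VocabularyDecl is consumed by a Vocabulary implementation (as RDFSVocabularyV2 does above via addDecl(...)), which drains values(). A standalone sketch that just dumps what the new declaration contributes:

    import java.util.Iterator;

    import org.openrdf.model.URI;

    import com.bigdata.rdf.vocab.VocabularyDecl;
    import com.bigdata.rdf.vocab.decls.VoidVocabularyDecl;

    // Sketch: enumerate the URIs declared by VoidVocabularyDecl (the
    // namespace URI itself, then the VoID classes and properties).
    public class DumpVoidVocabulary {

        public static void main(final String[] args) {

            final VocabularyDecl decl = new VoidVocabularyDecl();

            final Iterator<URI> itr = decl.values();

            while (itr.hasNext())
                System.out.println(itr.next());

        }

    }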
Modified: branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/test/com/bigdata/rdf/rio/AbstractRIOTestCase.java
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/test/com/bigdata/rdf/rio/AbstractRIOTestCase.java	2012-07-21 18:37:51 UTC (rev 6387)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-rdf/src/test/com/bigdata/rdf/rio/AbstractRIOTestCase.java	2012-07-23 20:18:34 UTC (rev 6388)
@@ -641,7 +641,7 @@
         if (log.isInfoEnabled()) {
 
             log.info("computing predicate usage...");
 
-            log.info("\n" + store.predicateUsage());
+//            log.info("\n" + store.predicateUsage());
 
         }
 
         /*

Modified: branches/BIGDATA_RELEASE_1_2_0/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java	2012-07-21 18:37:51 UTC (rev 6387)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/QueryServlet.java	2012-07-23 20:18:34 UTC (rev 6388)
@@ -175,31 +175,26 @@
             buildResponse(resp, HTTP_NOTFOUND, MIME_TEXT_PLAIN);
             return;
         }
-
-        final Graph g = SD.describeService(tripleStore);
-
+
         /*
-         * Add the service end point.
-         *
-         * TODO Report alternative end points?
+         * Figure out the service end point.
         */
+        final String serviceURI;
         {
 
             final StringBuffer sb = req.getRequestURL();
 
             final int indexOf = sb.indexOf("?");
 
-            final String serviceURI;
             if (indexOf == -1) {
                 serviceURI = sb.toString();
             } else {
                 serviceURI = sb.substring(0, indexOf);
             }
 
-            g.add(SD.Service, SD.endpoint,
-                    g.getValueFactory().createURI(serviceURI));
-
         }
 
+        final Graph g = SD.describeService(tripleStore, serviceURI);
+
         /*
          * CONNEG for the MIME type.
          *
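Given the QueryServlet change above, a GET against the SPARQL end point without a query parameter returns the service description, subject to CONNEG. A minimal standalone client sketch (the host and port are illustrative, not from the commit):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch: fetch the service description from a (hypothetical) local server.
    public class FetchServiceDescription {

        public static void main(final String[] args) throws Exception {

            final URL url = new URL("http://localhost:8080/bigdata/sparql");

            final HttpURLConnection conn = (HttpURLConnection) url.openConnection();

            // CONNEG: ask for RDF/XML (Turtle, etc. should also be negotiable).
            conn.setRequestProperty("Accept", "application/rdf+xml");

            final BufferedReader r = new BufferedReader(new InputStreamReader(
                    conn.getInputStream(), "UTF-8"));
            try {
                String line;
                while ((line = r.readLine()) != null)
                    System.out.println(line);
            } finally {
                r.close();
            }

        }

    }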
Modified: branches/BIGDATA_RELEASE_1_2_0/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/SD.java
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/SD.java	2012-07-21 18:37:51 UTC (rev 6387)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-sails/src/java/com/bigdata/rdf/sail/webapp/SD.java	2012-07-23 20:18:34 UTC (rev 6388)
@@ -27,16 +27,34 @@
 
 package com.bigdata.rdf.sail.webapp;
 
+import java.util.Arrays;
+import java.util.LinkedHashMap;
+import java.util.LinkedHashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.openrdf.model.BNode;
 import org.openrdf.model.Graph;
 import org.openrdf.model.URI;
+import org.openrdf.model.ValueFactory;
 import org.openrdf.model.impl.GraphImpl;
 import org.openrdf.model.impl.URIImpl;
+import org.openrdf.model.vocabulary.RDF;
 
 import com.bigdata.rdf.axioms.Axioms;
 import com.bigdata.rdf.axioms.NoAxioms;
 import com.bigdata.rdf.axioms.OwlAxioms;
 import com.bigdata.rdf.axioms.RdfsAxioms;
+import com.bigdata.rdf.internal.IV;
+import com.bigdata.rdf.internal.NotMaterializedException;
+import com.bigdata.rdf.model.BigdataURI;
+import com.bigdata.rdf.model.BigdataValue;
+import com.bigdata.rdf.spo.SPOKeyOrder;
 import com.bigdata.rdf.store.AbstractTripleStore;
+import com.bigdata.rdf.store.BigdataValueIterator;
+import com.bigdata.rdf.store.BigdataValueIteratorImpl;
+import com.bigdata.rdf.vocab.decls.VoidVocabularyDecl;
+import com.bigdata.striterator.IChunkedIterator;
 
 /**
  * SPARQL 1.1 Service Description vocabulary class.
@@ -321,20 +339,35 @@
 
     /**
      * Collect various information, building up a service description graph.
      *
-     * TODO I am disinclined to enumerate all named and default graphs when
-     * there is a GET against the SPARQL end point. That can be WAY too much
-     * information.
+     * @param tripleStore
+     *            The KB instance to be described.
+     * @param serviceURI
+     *            The service end point for that KB instance.
     */
-    public static Graph describeService(final AbstractTripleStore tripleStore) {
+    public static Graph describeService(final AbstractTripleStore tripleStore,
+            final String serviceURI) {
 
         final Graph g = new GraphImpl();
 
+        final ValueFactory f = g.getValueFactory();
+
+        final BNode service = f.createBNode("service");
+
+        final BNode defaultDataset = f.createBNode("defaultDataset");
+
+        final BNode defaultGraph = f.createBNode("defaultGraph");
+
+        g.add(service, RDF.TYPE, SD.Service);
+
+        // Service end point.
+        g.add(service, SD.endpoint, g.getValueFactory().createURI(serviceURI));
+
         /*
          * Supported Query Languages
         */
 
-        g.add(SD.Service, SD.supportedLanguage, SD.SPARQL10Query);
-        g.add(SD.Service, SD.supportedLanguage, SD.SPARQL11Query);
-        g.add(SD.Service, SD.supportedLanguage, SD.SPARQL11Update);
+        g.add(service, SD.supportedLanguage, SD.SPARQL10Query);
+        g.add(service, SD.supportedLanguage, SD.SPARQL11Query);
+        g.add(service, SD.supportedLanguage, SD.SPARQL11Update);
 
         /*
          * RDF and SPARQL Formats.
 
@@ -342,46 +375,46 @@
          * @see http://www.openrdf.org/issues/browse/RIO-79 (Request for unique
          * URIs)
          *
-         * TODO Add an explict declation for SIDS mode data interchange?
+         * TODO Add an explicit declaration for SIDS mode data interchange?
         */
 
         // InputFormats
         {
 
-            g.add(SD.Service, SD.inputFormat, SD.RDFXML);
-            g.add(SD.Service, SD.inputFormat, SD.NTRIPLES);
-            g.add(SD.Service, SD.inputFormat, SD.TURTLE);
-            g.add(SD.Service, SD.inputFormat, SD.N3);
-            // g.add(SD.Service, SD.inputFormat, SD.TRIX); // TODO TRIX
-            g.add(SD.Service, SD.inputFormat, SD.TRIG);
-            // g.add(SD.Service, SD.inputFormat, SD.BINARY); // TODO BINARY
-            g.add(SD.Service, SD.inputFormat, SD.NQUADS);
+            g.add(service, SD.inputFormat, SD.RDFXML);
+            g.add(service, SD.inputFormat, SD.NTRIPLES);
+            g.add(service, SD.inputFormat, SD.TURTLE);
+            g.add(service, SD.inputFormat, SD.N3);
+            // g.add(service, SD.inputFormat, SD.TRIX); // TODO TRIX
+            g.add(service, SD.inputFormat, SD.TRIG);
+            // g.add(service, SD.inputFormat, SD.BINARY); // TODO BINARY
+            g.add(service, SD.inputFormat, SD.NQUADS);
 
-            g.add(SD.Service, SD.inputFormat, SD.SPARQL_RESULTS_XML);
-            g.add(SD.Service, SD.inputFormat, SD.SPARQL_RESULTS_JSON);
-            g.add(SD.Service, SD.inputFormat, SD.SPARQL_RESULTS_CSV);
-            g.add(SD.Service, SD.inputFormat, SD.SPARQL_RESULTS_TSV);
-            // g.add(SD.Service, SD.inputFormat,
+            g.add(service, SD.inputFormat, SD.SPARQL_RESULTS_XML);
+            g.add(service, SD.inputFormat, SD.SPARQL_RESULTS_JSON);
+            g.add(service, SD.inputFormat, SD.SPARQL_RESULTS_CSV);
+            g.add(service, SD.inputFormat, SD.SPARQL_RESULTS_TSV);
+            // g.add(service, SD.inputFormat,
             // SD.SPARQL_RESULTS_OPENRDF_BINARY);
 
         }
 
         // ResultFormats
         {
 
-            g.add(SD.Service, SD.resultFormat, SD.RDFXML);
-            g.add(SD.Service, SD.resultFormat, SD.NTRIPLES);
-            g.add(SD.Service, SD.resultFormat, SD.TURTLE);
-            g.add(SD.Service, SD.resultFormat, SD.N3);
-            // g.add(SD.Service, SD.resultFormat, SD.TRIX); // TODO TRIX
-            g.add(SD.Service, SD.resultFormat, SD.TRIG);
-            // g.add(SD.Service, SD.resultFormat, SD.BINARY); // TODO BINARY
-            // g.add(SD.Service, SD.resultFormat, SD.NQUADS); // TODO NQuads
+            g.add(service, SD.resultFormat, SD.RDFXML);
+            g.add(service, SD.resultFormat, SD.NTRIPLES);
+            g.add(service, SD.resultFormat, SD.TURTLE);
+            g.add(service, SD.resultFormat, SD.N3);
+            // g.add(service, SD.resultFormat, SD.TRIX); // TODO TRIX
+            g.add(service, SD.resultFormat, SD.TRIG);
+            // g.add(service, SD.resultFormat, SD.BINARY); // TODO BINARY
+            // g.add(service, SD.resultFormat, SD.NQUADS); // TODO NQuads
             // writer
 
-            g.add(SD.Service, SD.resultFormat, SD.SPARQL_RESULTS_XML);
-            g.add(SD.Service, SD.resultFormat, SD.SPARQL_RESULTS_JSON);
-            g.add(SD.Service, SD.resultFormat, SD.SPARQL_RESULTS_CSV);
-            g.add(SD.Service, SD.resultFormat, SD.SPARQL_RESULTS_TSV);
-            // g.add(SD.Service, SD.resultFormat,
+            g.add(service, SD.resultFormat, SD.SPARQL_RESULTS_XML);
+            g.add(service, SD.resultFormat, SD.SPARQL_RESULTS_JSON);
+            g.add(service, SD.resultFormat, SD.SPARQL_RESULTS_CSV);
+            g.add(service, SD.resultFormat, SD.SPARQL_RESULTS_TSV);
+            // g.add(service, SD.resultFormat,
             // SD.SPARQL_RESULTS_OPENRDF_BINARY);
 
         }
 
@@ -397,7 +430,7 @@
 
         if (tripleStore.isQuads()) {
 
-            g.add(SD.Service, SD.feature, SD.UnionDefaultGraph);
+            g.add(service, SD.feature, SD.UnionDefaultGraph);
 
         } else {
 
@@ -418,13 +451,421 @@
                 entailmentRegime = null;
             }
 
             if (entailmentRegime != null)
-                g.add(SD.Service, SD.entailmentRegime, entailmentRegime);
+                g.add(service, SD.entailmentRegime, entailmentRegime);
 
         }
 
-        g.add(SD.Service, SD.feature, SD.BasicFederatedQuery);
+        // Other features.
+        g.add(service, SD.feature, SD.BasicFederatedQuery);
 
+        /*
+         * Information about the defaultGraph.
+         *
+         * TODO This could all be generalized and then run for each known named
+         * graph as well.
+         */
+        {
+
+            // Default data set
+            g.add(service, SD.defaultDataset, defaultDataset);
+
+            g.add(defaultDataset, RDF.TYPE, SD.Dataset);
+
+            // any URI is considered to be an entity.
+            g.add(defaultDataset, VoidVocabularyDecl.uriRegexPattern,
+                    f.createLiteral("^.*"));
+
+            // Default graph in the default data set.
+            g.add(defaultDataset, SD.defaultGraph, defaultGraph);
+
+            // defaultGraph description.
+            {
+
+                // #of triples in the default graph
+                g.add(defaultGraph, VoidVocabularyDecl.triples,
+                        f.createLiteral(tripleStore.getStatementCount()));
+
+                // #of entities in the default graph.
+                g.add(defaultGraph, VoidVocabularyDecl.entities,
+                        f.createLiteral(tripleStore.getURICount()));
+
+                // The distinct vocabularies in use.
+                final Set<String> namespaces = new LinkedHashSet<String>();
+
+                /*
+                 * property partition statistics & used vocabularies.
+                 *
+                 * Note: A temporary graph is used to hold these data so we can
+                 * first output the vocabulary summary. This gives the output a
+                 * neater (and more human consumable) appearance.
+                 */
+                final Graph propertyPartitions = new GraphImpl();
+
+                // Frequency count of the predicates in the default graph.
+                final IVCount[] predicatePartitionCounts = predicateUsage(tripleStore);
+
+                // Frequency count of the classes in the default graph.
+                final IVCount[] classPartitionCounts = classUsage(tripleStore);
+
+                {
+
+                    // property partitions.
+                    for (IVCount tmp : predicatePartitionCounts) {
+
+                        final BNode propertyPartition = f.createBNode();
+
+                        final URI p = (URI) tmp.getValue();
+
+                        propertyPartitions.add(defaultGraph,
+                                VoidVocabularyDecl.propertyPartition,
+                                propertyPartition);
+
+                        propertyPartitions.add(propertyPartition,
+                                VoidVocabularyDecl.property, p);
+
+                        propertyPartitions.add(propertyPartition,
+                                VoidVocabularyDecl.triples,
+                                f.createLiteral(tmp.count));
+
+                        String namespace = p.getNamespace();
+
+                        if (namespace.endsWith("#")) {
+
+                            // Strip trailing '#' per VoID specification.
+                            namespace = namespace.substring(0,
+                                    namespace.length() - 1);
+
+                        }
+
+                        namespaces.add(namespace);
+
+                    }
+
+                }
+
+                // emit the in use vocabularies.
+                for (String namespace : namespaces) {
+
+                    g.add(defaultDataset, VoidVocabularyDecl.vocabulary,
+                            f.createURI(namespace));
+
+                }
+
+                // #of distinct predicates in the default graph.
+                g.add(defaultGraph, VoidVocabularyDecl.properties,
+                        f.createLiteral(predicatePartitionCounts.length));
+
+                // #of distinct classes in the default graph.
+                g.add(defaultGraph, VoidVocabularyDecl.classes,
+                        f.createLiteral(classPartitionCounts.length));
+
+                // now emit the property partition statistics.
+                g.addAll(propertyPartitions);
+
+                // class partition statistics.
+                {
+
+                    // per class partition statistics.
+                    for (IVCount tmp : classPartitionCounts) {
+
+                        final BNode classPartition = f.createBNode();
+
+                        final BigdataValue cls = tmp.getValue();
+
+                        g.add(defaultGraph, VoidVocabularyDecl.classPartition,
+                                classPartition);
+
+                        g.add(classPartition, VoidVocabularyDecl.class_, cls);
+
+                        g.add(classPartition, VoidVocabularyDecl.triples,
+                                f.createLiteral(tmp.count));
+
+                    }
+
+                } // end class partition statistics.
+
+            } // end defaultGraph
+
+//        sb.append("termCount\t = " + tripleStore.getTermCount() + "\n");
+//
+//        sb.append("uriCount\t = " + tripleStore.getURICount() + "\n");
+//
+//        sb.append("literalCount\t = " + tripleStore.getLiteralCount() + "\n");
+//
+//        /*
+//         * Note: The blank node count is only available when using the told
+//         * bnodes mode.
+//         */
+//        sb
+//                .append("bnodeCount\t = "
+//                        + (tripleStore.getLexiconRelation()
+//                                .isStoreBlankNodes() ? ""
+//                                + tripleStore.getBNodeCount() : "N/A")
+//                        + "\n");
+
+        }
+
         return g;
 
     }
 
+    /**
+     * An {@link IV} and a counter for that {@link IV}.
+     */
+    private static class IVCount implements Comparable<IVCount> {
+
+        public final IV<?, ?> iv;
+
+        public final long count;
+
+        private BigdataValue val;
+
+        /**
+         * Return the associated {@link BigdataValue}.
+         * <p>
+         * Note: A resolution set is necessary if you want to attach the
+         * {@link BigdataValue} to the {@link IV}.
+         *
+         * @throws NotMaterializedException
+         */
+        public BigdataValue getValue() {
+
+            if (val == null)
+                throw new NotMaterializedException(iv.toString());
+
+            return val;
+
+        }
+
+        public void setValue(final BigdataValue val) {
+
+            if (val == null)
+                throw new IllegalArgumentException();
+
+            if (this.val != null && !this.val.equals(val))
+                throw new IllegalArgumentException();
+
+            this.val = val;
+
+        }
+
+        public IVCount(final IV<?, ?> iv, final long count) {
+
+            if (iv == null)
+                throw new IllegalArgumentException();
+
+            this.iv = iv;
+
+            this.count = count;
+
+        }
+
+        /**
+         * Place into order by descending count.
+         */
+        @Override
+        public int compareTo(final IVCount arg0) {
+
+            if (count < arg0.count)
+                return 1;
+
+            if (count > arg0.count)
+                return -1;
+
+            return 0;
+
+        }
+
+    }
+
+    /**
+     * Return an array of the distinct predicates in the KB ordered by their
+     * descending frequency of use. The {@link IV}s in the returned array will
+     * have been resolved to the corresponding {@link BigdataURI}s which can be
+     * accessed using {@link IV#getValue()}.
+     */
+    private static IVCount[] predicateUsage(final AbstractTripleStore kb) {
+
+        if (kb.getSPORelation().oneAccessPath) {
+
+            // The necessary index (POS or POCS) does not exist.
+            throw new UnsupportedOperationException();
+
+        }
+
+        final boolean quads = kb.isQuads();
+
+        // visit distinct term identifiers for the predicate position.
+        @SuppressWarnings("rawtypes")
+        final IChunkedIterator<IV> itr = kb.getSPORelation().distinctTermScan(
+                quads ? SPOKeyOrder.POCS : SPOKeyOrder.POS);
+
+        // resolve term identifiers to terms efficiently during iteration.
+        final BigdataValueIterator itr2 = new BigdataValueIteratorImpl(
+                kb/* resolveTerms */, itr);
+
+        try {
+
+            final Set<IV<?, ?>> ivs = new LinkedHashSet<IV<?, ?>>();
+
+            final Map<IV<?, ?>, IVCount> counts = new LinkedHashMap<IV<?, ?>, IVCount>();
+
+            while (itr2.hasNext()) {
+
+                final BigdataValue term = itr2.next();
+
+                final IV<?, ?> iv = term.getIV();
+
+                final long n = kb.getSPORelation()
+                        .getAccessPath(null, iv, null, null)
+                        .rangeCount(false/* exact */);
+
+                ivs.add(iv);
+
+                counts.put(iv, new IVCount(iv, n));
+
+            }
+
+            // Batch resolve IVs to Values.
+            final Map<IV<?, ?>, BigdataValue> x = kb.getLexiconRelation()
+                    .getTerms(ivs);
+
+            for (Map.Entry<IV<?, ?>, BigdataValue> e : x.entrySet()) {
+
+                final IVCount count = counts.get(e.getKey());
+
+                count.setValue(e.getValue());
+
+            }
+
+            final IVCount[] a = counts.values().toArray(
+                    new IVCount[counts.size()]);
+
+            // Order by descending count.
+            Arrays.sort(a);
+
+            return a;
+
+        } finally {
+
+            itr2.close();
+
+        }
+
+    }
+
+    /**
+     * Return an efficient statistical summary for the class partitions. The
+     * SPARQL query for this is
+     *
+     * <pre>
+     * SELECT ?class (COUNT(?s) AS ?count) { ?s a ?class } GROUP BY ?class ORDER BY ?count
+     * </pre>
+     *
+     * However, it is much more efficient to scan POS for
+     *
+     * <pre>
+     * rdf:type ?o ?s
+     * </pre>
+     *
+     * and report the range count of
+     *
+     * <pre>
+     * rdf:type ?o ?s
+     * </pre>
+     *
+     * for each distinct value of <code>?o</code>.
+     *
+     * @param kb
+     *            The KB instance.
+     *
+     * @return The class usage statistics.
+     */
+    private static IVCount[] classUsage(final AbstractTripleStore kb) {
+
+        if (kb.getSPORelation().oneAccessPath) {
+
+            // The necessary index (POS or POCS) does not exist.
+            throw new UnsupportedOperationException();
+
+        }
+
+        final boolean quads = kb.isQuads();
+
+        final SPOKeyOrder keyOrder = quads ? SPOKeyOrder.POCS : SPOKeyOrder.POS;
+
+        // Resolve IV for rdf:type.
+        final BigdataURI rdfType = kb.getValueFactory().asValue(RDF.TYPE);
+
+        kb.getLexiconRelation().addTerms(new BigdataValue[] { rdfType },
+                1/* numTerms */, true/* readOnly */);
+
+        if (rdfType.getIV() == null) {
+
+            // No rdf:type assertions since rdf:type is unknown term.
+            return new IVCount[0];
+
+        }
+
+        // visit distinct term identifiers for the rdf:type predicate.
+        @SuppressWarnings("rawtypes")
+        final IChunkedIterator<IV> itr = kb
+                .getSPORelation()
+                .distinctMultiTermScan(keyOrder, new IV[] { rdfType.getIV() }/* knownTerms */);
+
+        // resolve term identifiers to terms efficiently during iteration.
+        final BigdataValueIterator itr2 = new BigdataValueIteratorImpl(
+                kb/* resolveTerms */, itr);
+
+        try {
+
+            final Set<IV<?, ?>> ivs = new LinkedHashSet<IV<?, ?>>();
+
+            final Map<IV<?, ?>, IVCount> counts = new LinkedHashMap<IV<?, ?>, IVCount>();
+
+            while (itr2.hasNext()) {
+
+                final BigdataValue term = itr2.next();
+
+                final IV<?, ?> iv = term.getIV();
+
+                final long n = kb
+                        .getSPORelation()
+                        .getAccessPath(null, rdfType.getIV()/* p */, iv/* o */,
+                                null).rangeCount(false/* exact */);
+
+                ivs.add(iv);
+
+                counts.put(iv, new IVCount(iv, n));
+
+            }
+
+            // Batch resolve IVs to Values.
+            final Map<IV<?, ?>, BigdataValue> x = kb.getLexiconRelation()
+                    .getTerms(ivs);
+
+            for (Map.Entry<IV<?, ?>, BigdataValue> e : x.entrySet()) {
+
+                final IVCount count = counts.get(e.getKey());
+
+                count.setValue(e.getValue());
+
+            }
+
+            final IVCount[] a = counts.values().toArray(
+                    new IVCount[counts.size()]);
+
+            // Order by descending count.
+            Arrays.sort(a);
+
+            return a;
+
+        } finally {
+
+            itr2.close();
+
+        }
+
+    }
+
 }

Modified: branches/BIGDATA_RELEASE_1_2_0/bigdata-war/src/html/index.html
===================================================================
--- branches/BIGDATA_RELEASE_1_2_0/bigdata-war/src/html/index.html	2012-07-21 18:37:51 UTC (rev 6387)
+++ branches/BIGDATA_RELEASE_1_2_0/bigdata-war/src/html/index.html	2012-07-23 20:18:34 UTC (rev 6388)
@@ -21,7 +21,7 @@
 <dt>http://hostname:port/bigdata</dt>
 <dd>This page.</dd>
 <dt>http://hostname:port/bigdata/sparql</dt>
-<dd>The SPARQL REST API.</dd>
+<dd>The SPARQL REST API (<a href="sparql">Service Description</a>).</dd>
 <dt>http://hostname:port/bigdata/status</dt>
 <dd>A <a href="status">status</a> page.</dd>
 <dt>http://hostname:port/bigdata/counters</dt>
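As a cross-check on the new statistics: the property partitions which predicateUsage() computes via the distinct term scan plus one range count per predicate should agree (modulo inference and the fast, inexact range counts) with this straight SPARQL 1.1 aggregation run against the end point:

    SELECT ?p (COUNT(*) AS ?count) { ?s ?p ?o } GROUP BY ?p ORDER BY DESC(?count)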