Bigdata runs into OutOfMemoryError instead of Timeout for SPARQL Property Paths

Help
2014-03-20
2014-07-23
  • Andreas Kahl

    Andreas Kahl - 2014-03-20

    Bigdata crashes on this query:

    SELECT * WHERE {
        ?s rdf:type* ?o
    } LIMIT 1
    

    I've set a query timeout of 300000ms in web.xml. This timeout is being ignored and the JVM (1.7) runs into an OutOfMemoryError after roughly 25 mins. Using Java 1.6 this happens much quicker (~3-5 mins).
    Also, while running this query, Bigdata does not respond to any other requests. After the OutOfMemoryError, no queries are served until Tomcat (6.0.18) is restarted.

    With other queries (like COUNTs) the query timeout works perfectly, and other parallel queries work, too.

    What can I do to make either the query timeout effective for all kinds of SPARQL queries or avoid the OutOfMemoryError?
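
    For reference, a minimal sketch of how such a timeout is typically configured; the parameter name and millisecond semantics are assumed here from the description above, so check the web.xml shipped with your bigdata WAR for the exact name:

        <!-- Sketch only: parameter name and units assumed, not verified. -->
        <context-param>
          <param-name>queryTimeout</param-name>
          <!-- 300000 ms = 5 minutes; 0 would mean no timeout. -->
          <param-value>300000</param-value>
        </context-param>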

     
    • Bryan Thompson

      Bryan Thompson - 2014-03-20

      What bigdata version and what data set? The most likely explanation for ignoring the timeout is that the property path operator is memory bound, not doing IO, and not noticing the interrupt by the query engine. I need to check with Mike about the property path implementation, but it might be bringing in all values for ?s and then running them all to a fixed point. That would definitely stress memory, and the rdf:type hierarchy would wind up in memory as well. If this is the explanation, it could be fixed by vectoring the input into the property path operator. We might also want to develop a variant of the property path operator that uses native memory.

      How much RAM are you giving the JVM?

      Adding if (Thread.isInterrupted()) throw new InterruptedException() into the property path operator will probably fix this.

      Bryan

      • Bryan Thompson

        Bryan Thompson - 2014-03-20

        I think that this is due to GC pauses rather than a failure to honor interrupts. Use jstat -gc or a visual JVM inspector to look at the GC time. It is probably going through the roof. Also, what is the underlying OOM message? "GC overhead limit exceeded" or out of managed object space (send the stack trace).

        Can you go to the /status page and copy a snapshot of this running query (click through on query details) early on during the run and then another snapshot after at least 10 seconds?

        There are a number of query engine parameters available on a per PipelineOp basis that might help out here. These allow you to limit the parallelism, the input size for an operator, and the data queued up in the query engine for an operator. This should let you bound the CPU and memory resources.

        /**
         * This option may be used to place an optional limit on the #of
         * concurrent tasks which may run for the same (bopId,shardId) for a
         * given query (default {@value #DEFAULT_MAX_PARALLEL}). The query is
         * guaranteed to make progress as long as this is some positive integer.
         * While limiting this value can limit the concurrency with which
         * certain operators are evaluated and that can have a negative effect
         * on the throughput, it controls both the demand on the JVM heap and
         * the #of threads consumed.
         *
         * Note: {@link #MAX_PARALLEL} is the annotation for pipelined joins
         * which has the strongest effect on performance. Changes to both
         * {@link #MAX_MESSAGES_PER_TASK} and {@link #PIPELINE_QUEUE_CAPACITY}
         * have less effect and performance tends to be best around a modest
         * value (10) for those annotations.
         */
        String MAX_PARALLEL = PipelineOp.class.getName() + ".maxParallel";

        /**
         * @see #MAX_PARALLEL
         */
        int DEFAULT_MAX_PARALLEL = 5;

        /**
         * For a pipelined operator, this is the maximum number of messages that
         * will be assigned to a single invocation of the evaluation task for
         * that operator (default {@value #DEFAULT_MAX_MESSAGES_PER_TASK}). By
         * default the {@link QueryEngine} MAY (and generally does) combine
         * multiple {@link IChunkMessage}s from the work queue of an operator
         * for each evaluation pass made for that operator. When ONE (1), each
         * {@link IChunkMessage} will be assigned to a new evaluation task for
         * the operator. The value of this annotation must be a positive
         * integer. If the operator is not-pipelined, then the maximum amount of
         * data to be assigned to an evaluation task is governed by
         * {@link #MAX_MEMORY} instead.
         */
        String MAX_MESSAGES_PER_TASK = PipelineOp.class.getName()
                + ".maxMessagesPerTask";

        /**
         * @see #MAX_MESSAGES_PER_TASK
         */
        int DEFAULT_MAX_MESSAGES_PER_TASK = 10;

        /**
         * For pipelined operators, this is the capacity of the input queue for
         * that operator. Producers will block if the input queue for the target
         * operator is at its capacity. This provides an important limit on the
         * amount of data which can be buffered on the JVM heap during pipelined
         * query evaluation.
         */
        String PIPELINE_QUEUE_CAPACITY = PipelineOp.class.getName()
                + ".pipelineQueueCapacity";

        /**
         * @see #PIPELINE_QUEUE_CAPACITY
         */
        int DEFAULT_PIPELINE_QUEUE_CAPACITY = 10;

        Another thing is to use the G1 (Java 7) or incremental GC modes.

        But I need to see the query plan and how it is executing to understand where the memory demand is. Or take a snapshot of the JVM heap through a profiler and provide an analysis of the heap.

        Thanks,
        Bryan

         
        • Bryan Thompson

          Bryan Thompson - 2014-03-20

          Refactoring the arbitrary length path op somewhat might reveal more about where it is getting into trouble, e.g. by pushing down some loops into methods to give the code a little more structure. That would let us understand more of what is happening from thread dumps.
          Bryan


           
  • Andreas Kahl

    Andreas Kahl - 2014-03-20

    Bryan, thank you very much for the analysis. Please find attached:
    - catalina.out containing a stack trace from the crash
    - the query plan, and the query plan after 28 secs of runtime
    - a screenshot from VisualVM showing some strange GC activity

    Some information about the dataset & JVM settings:
    Size: ~800,000,000 Statements, Bigdata 1.3.0
    JVM-settings: -server -Xms7g -Xmx7g -XX:+UseG1GC (+JMX-Parameters)
    Machine: 4 CPU, 12 GB RAM

     
    Last edit: Andreas Kahl 2014-03-20
  • Andreas Kahl

    Andreas Kahl - 2014-03-22

    The ticket was created: http://trac.bigdata.com/ticket/865
    I will check whether the error is reproducible on a smaller dataset (less than 100,000,000 statements) and report the results here.

     
  • Andreas Kahl

    Andreas Kahl - 2014-03-24

    With a smaller dataset, the problem does not occur. I ran the query on 25,177,648 triples and got a result. Java's -Xmx was at 4GB.

     
  • Andreas Kahl

    Andreas Kahl - 2014-03-25

    Now I've tried

    -server -Xms7g -Xmx7g -XX:+UseConcMarkSweepGC -XX:-UseParNewGC -XX:+CMSIncrementalMode
    

    That does not help avoid the OOME, but interestingly the Tomcat running Bigdata did not crash completely as it did before - Bigdata still answered queries after the OOME without a restart.

    Also, I reduced queryThreadPoolSize in web.xml from 16 to 4; the only difference I could see from this was a slower increase in heap usage, but it still ran into an OOME.

     
  • Bryan Thompson

    Bryan Thompson - 2014-07-11

    Were you able to work around this problem?

     
  • Andreas Kahl

    Andreas Kahl - 2014-07-12

    We have a workaround, but it is rather ugly: I wrote a cron script that sends an ASK query every 5 mins; if no response is received within 15 secs, it restarts Tomcat.
    That sometimes happens every 15 min, especially for the various queries submitted by the Sparqles service: http://sparqles.okfn.org/endpoint/http%3A%2F%2Flod.b3kat.de%2Fsparql (the red dots are queries that killed one of the Tomcats).
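
    (For illustration, a minimal sketch of such a watchdog; the endpoint URL, the ASK query, and the systemctl restart command are assumptions to adapt to the actual setup.)

        #!/bin/sh
        # Hypothetical health check, run from cron every 5 minutes:
        # send a trivial ASK query and restart Tomcat if no answer arrives within 15s.
        ENDPOINT="http://localhost:8080/bigdata/sparql"   # assumed endpoint URL
        if ! curl -s --max-time 15 \
            --data-urlencode "query=ASK { ?s ?p ?o }" \
            "$ENDPOINT" > /dev/null
        then
            systemctl restart tomcat   # assumed service name / init system
        fi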

    As we have read-only mode activated and two load-balanced servers, this still gives quite good availability. (Updates are indexed on an internal machine, and the journal file is then copied (scp) to the query servers.)

    We will go for daily updates in the future, and those should be run via SPARQL Update - so there will be a point when frequent restarts are a problem.

    Just recently we updated to Bigdata 1.3.1 (Thanks for the great new Web-UI), running on Java 1.7 and Tomcat8. Current JAVA_OPTS are:
    -server -Xms7g -Xmx7g -XX:+UseG1GC -Djava.awt.headless=true

    Also we started to use custom VocabularyClasses for our most frequent Vocabularies as you recommended in an earlier post (the one about IOPerformance).

    So, the test set of queries from Sparqles seems very helpful to me, and it would be very desirable to get Bigdata to run stably through all those queries (running into a configured query timeout would be no problem - that's what it's for), not only for small datasets, but for large(r) ones, too.

     
    • Mike Personick

      Mike Personick - 2014-07-12

      Does it always die on a property path query?



       
  • Andreas Kahl

    Andreas Kahl - 2014-07-13

    At the moment, the property path query is the only one I know of, and I reproduced the OOME on the http://lod.b3kat.de dataset with Bigdata 1.3.1, Java 7, Tomcat 8. I tried again using the incremental GC, without success (normally we use the G1GC).

    With Bigdata 1.3.0 on Java/Tomcat 6, for example, this query also ran into the error:

    CONSTRUCT {?x rdf:type ?v}
    WHERE {
        ?x rdf:type ?o.
        ?o rdf:type ?x
    }
    LIMIT 3

    But with the current setup this query correctly ran into a timeout, leaving Tomcat responsive.

    I tried some of the other queries from Sparqles, but they seem to correctly run into the timeout (we have set 300,000ms).

    There is only one more thing: there are some DESCRIBEs that produce very large result sets (several hundred MBs). I would love to limit the maximum number of triples/solutions returned for a SPARQL query, so that DESCRIBE queries issued from outside cannot pull such large results.

     
    Last edit: Andreas Kahl 2014-07-13
    • Bryan Thompson

      Bryan Thompson - 2014-07-13

      The default query hint parameters (see below) that restrict the work performed by DESCRIBE are in QueryHints.java and documented here: http://wiki.bigdata.com/wiki/index.php/QueryHints

      Are these settings failing to limit the #of DESCRIBE iterations or the #of statements accumulated by DESCRIBE?

      Are these settings being explicitly overridden by user queries using query hints?

      Or do you have some very large literals in the result?

      Thanks,
      Bryan

          /**
           * Query hint controls the manner in which a DESCRIBE query is
           * evaluated.
           *
           * @see DescribeModeEnum
           * @see #DEFAULT_DESCRIBE_MODE
           * @see Concise Bounded Description
           */
          String DESCRIBE_MODE = "describeMode";

          DescribeModeEnum DEFAULT_DESCRIBE_MODE = DescribeModeEnum.SymmetricOneStep;

          /**
           * For iterative {@link DescribeModeEnum}s, this property places a
           * limit on the number of iterative expansions that will be performed
           * before the DESCRIBE query is cut off, providing that the limit on
           * the maximum #of statements in the description is also satisfied
           * (the cut off requires that both limits are reached). May be ZERO
           * (0) for NO limit.
           *
           * @see #DESCRIBE_MODE
           * @see #DESCRIBE_STATEMENT_LIMIT
           */
          String DESCRIBE_ITERATION_LIMIT = "describeIterationLimit";

          int DEFAULT_DESCRIBE_ITERATION_LIMIT = 5;

          /**
           * For iterative {@link DescribeModeEnum}s, this property places a
           * limit on the number of statements that will be accumulated before
           * the DESCRIBE query is cut off, providing that the limit on the
           * maximum #of iterations in the description is also satisfied (the
           * cut off requires that both limits are reached). May be ZERO (0)
           * for NO limit.
           *
           * @see #DESCRIBE_MODE
           * @see #DESCRIBE_ITERATION_LIMIT
           */
          String DESCRIBE_STATEMENT_LIMIT = "describeStatementLimit";

          int DEFAULT_DESCRIBE_STATEMENT_LIMIT = 5000;
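
      For an individual query, these limits can also be tightened via bigdata's query-hint triples. The following is an untested sketch: it assumes the hint:Query mechanism documented on the wiki page above accepts the describeIterationLimit and describeStatementLimit hint names shown in the code, and that the hints can be attached to a DESCRIBE through a WHERE clause:

          PREFIX hint: <http://www.bigdata.com/queryHints#>
          DESCRIBE <http://lod.b3kat.de/bib/DE-12>
          WHERE {
              # Cut the expansion off after one iteration and at most 5000 statements.
              hint:Query hint:describeIterationLimit "1" .
              hint:Query hint:describeStatementLimit "5000" .
          }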
      


       
    • Bryan Thompson

      Bryan Thompson - 2014-07-13

      Looking back at the thread dump attached to the ticket, I see very little
      in the way of stack traces through bigdata. The only one in the property
      path code is this.

      "com.bigdata.journal.Journal.executorService5" - Thread t@43
         java.lang.Thread.State: RUNNABLE
      at com.bigdata.bop.bindingSet.ListBindingSet.copy(ListBindingSet.java:290)
      at com.bigdata.bop.bindingSet.ListBindingSet.<init>(ListBindingSet.java:267)
      at com.bigdata.bop.bindingSet.ListBindingSet.clone(ListBindingSet.java:325)
      at com.bigdata.bop.bindingSet.ListBindingSet.clone(ListBindingSet.java:43)
      at
      com.bigdata.bop.paths.ArbitraryLengthPathOp$ArbitraryLengthPathTask.processChunk(ArbitraryLengthPathOp.java:511)
      at
      com.bigdata.bop.paths.ArbitraryLengthPathOp$ArbitraryLengthPathTask.call(ArbitraryLengthPathOp.java:270)
      at
      com.bigdata.bop.paths.ArbitraryLengthPathOp$ArbitraryLengthPathTask.call(ArbitraryLengthPathOp.java:196)
      at java.util.concurrent.FutureTask.run(FutureTask.java:273)
      at
      com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTask.call(ChunkedRunningQuery.java:1281)
      at
      com.bigdata.bop.engine.ChunkedRunningQuery$ChunkTaskWrapper.run(ChunkedRunningQuery.java:836)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)
      at java.util.concurrent.FutureTask.run(FutureTask.java:273)
      at com.bigdata.concurrent.FutureTaskMon.run(FutureTaskMon.java:63)
      at
      com.bigdata.bop.engine.ChunkedRunningQuery$ChunkFutureTask.run(ChunkedRunningQuery.java:731)
      at
      java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1156)
      at
      java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:626)
      at java.lang.Thread.run(Thread.java:804)
      

      I've added tests for interrupts to two locations in the processChunk()
      code. One corresponds to the point where this stack trace passes through
      processChunk(). The other corresponds to the point where the initial
      solutions are flowing into the property path operator. Both check for an
      interrupt every 10 solutions.
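
      To illustrate the pattern (this is not the actual ArbitraryLengthPathOp code, just a self-contained sketch): a CPU- and memory-bound loop has to poll its thread's interrupt status so that the query engine's cancellation - e.g. the configured query timeout - can take effect before the heap fills up.

          import java.util.Iterator;

          // Illustrative only: a loop that checks for an interrupt every 10
          // solutions, mirroring the fix described above.
          public final class InterruptAwareDrain {

              public static <T> long drain(final Iterator<T> solutions)
                      throws InterruptedException {
                  long n = 0L;
                  while (solutions.hasNext()) {
                      solutions.next(); // stand-in for "process one solution"
                      if (++n % 10 == 0 && Thread.currentThread().isInterrupted()) {
                          // Propagate the cancellation rather than continuing to
                          // accumulate solutions on the JVM heap.
                          throw new InterruptedException();
                      }
                  }
                  return n;
              }
          }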

      Committed revision r8540.

      Please try that out and see if it fixes the problem.

      Thanks,
      Bryan


       
  • Andreas Kahl

    Andreas Kahl - 2014-07-13

    Thanks for the hints :-) I wasn't aware of that. So if I want to set my own defaults I will just have to override this class - perfect.
    I am a bit surprised by the default of 5000 statements per DESCRIBE, which would be OK. I started thinking about large DESCRIBEs because I saw very long-running threads with such DESCRIBEs in the Tomcat Manager - and normally Bigdata answers those in milliseconds; so I started to think about data volume and a possibly slow connection of the user issuing those queries.

    Then I sent a test query with curl to my internal machine:
    curl -H"Content-Type:application/x-www-form-urlencoded; charset=utf-8" -d "query=DESCRIBE http://lod.b3kat.de/bib/DE-12" -d"namespace=b3kat" -d"monitor=true" http://localhost:8080/bigdata/sparql >DESCRIBEDE12.xml

    The resulting RDF/XML file is more than 1.5GB (uncompressed) - after that I cancelled the request. So the default of 5000 is not effective on its own; as your documentation says, I will have to tune the iteration limit, too. But with your class and the documentation I am confident that this will work out on Monday.

     
  • Andreas Kahl

    Andreas Kahl - 2014-07-21

    We've updated to the patched revision. The error does not occur any more on our servers. Thank you very much, Bryan, for your support.

     
