Multithreaded directory scanning-performance

Help
Andre
2009-06-10
2012-10-08
  • Andre
    2009-06-10

    Hello Michael,

    A year ago we had performance problems with huge documents and a large number of mathematical aggregations. At that time I contacted you directly by mail about solutions such as multithreading options. setMutithreading(true), a function in your API, was no solution, and you answered that you would be working on that this year.

    My idea is that xsl:for-each instructions should run in parallel when the body of the for-each does not depend on any xsl:variable from the preceding XSLT execution context.
    In theory this is manageable; in practice it could be done with thread pool options ...
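    To illustrate the idea above outside of XSLT, here is a minimal Java sketch of a "parallel for-each" on a fixed thread pool. It is not Saxon code: the class ParallelForEach and the method forEachParallel are hypothetical names, and the sketch assumes that each body invocation is independent of the others (no shared mutable state), which is the condition described above. Results are collected in input order, as a sequential for-each would deliver them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of any Saxon API.
class ParallelForEach {

    // Applies 'body' to every item on a fixed thread pool and returns the
    // results in input order. Assumes the body has no side effects that
    // other invocations depend on.
    static <T, R> List<R> forEachParallel(List<T> items, Function<T, R> body, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T item : items) {
                Callable<R> task = () -> body.apply(item);
                futures.add(pool.submit(task));
            }
            // Reading the futures in submission order preserves item order.
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for results", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a for-each body failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> squares = forEachParallel(Arrays.asList(1, 2, 3, 4), x -> x * x, 4);
        System.out.println(squares); // prints [1, 4, 9, 16]
    }
}
```

    Whether this pays off depends on how expensive each body is; for cheap bodies the cost of submitting tasks can outweigh the gain.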

    Is that feature on the roadmap?

    Regards, Andre, Berlin

     
    • Michael Kay
      2009-06-10

      No, I'm afraid I haven't made any progress on multithreading in Saxon yet. I've been concentrating on streaming of large documents (that is, processing them without building a tree in memory), which may also give you benefits. One of the reasons for this is that the implementation is closely tracking the work we're doing in the W3C.

       
    • Andre
      2009-06-10

      < merci , The world is Nice />

      Regards, Andre

       
    • Andre
      2009-06-10

      The // axis could work in parallel too. Actually, parallel // processing is much like multithreaded directory scanning, but for-each parallelism belongs to the same topic.
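      For readers unfamiliar with the comparison, this is roughly what multithreaded directory scanning looks like in Java: each subdirectory becomes its own task, much as each subtree could when evaluating //. It is a sketch only. The class name DirectoryScan and the method countFiles are made up for illustration, and the code assumes the tree contains no symbolic-link cycles.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical example: counts regular entries in a directory tree,
// scanning each subdirectory as a separate fork/join task.
class DirectoryScan extends RecursiveTask<Long> {
    private final File dir;

    DirectoryScan(File dir) {
        this.dir = dir;
    }

    @Override
    protected Long compute() {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0L; // not a directory, or not readable
        }
        long files = 0;
        List<DirectoryScan> subtasks = new ArrayList<>();
        for (File entry : entries) {
            if (entry.isDirectory()) {
                DirectoryScan task = new DirectoryScan(entry);
                task.fork(); // scan the subdirectory asynchronously
                subtasks.add(task);
            } else {
                files++;
            }
        }
        // Combine the partial counts from all subdirectories.
        for (DirectoryScan task : subtasks) {
            files += task.join();
        }
        return files;
    }

    // Assumes no symbolic-link cycles below 'root'.
    static long countFiles(File root) {
        return ForkJoinPool.commonPool().invoke(new DirectoryScan(root));
    }
}
```

      A call such as DirectoryScan.countFiles(new File(".")) returns the number of non-directory entries below the current directory.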

       
    • Andre
      2009-06-10

      We wrote our own thread pools before ExecutorService was available and are still using them. for-each parallelism is sorely missing and would give the biggest boost, especially with underlying aggregations.

      Please think about it. Parallelizing //* makes no sense, but for-each ... hm ...

      6-core CPUs are available today; 8, 12, 16 and 24 cores will follow. I think Microsoft will publish its XPath 2.0 and XSLT 2.0 implementation with a parallel mode too.

      At 15, thirteen years ago, I developed 3D demos in Turbo Pascal, and the time of recompiling will come.
      I am a big DB2 fan, because it does everything in parallel, since enterprise use is its foundation.

      It's your child.

      Best Regards, Andre

       
      • Michael Kay
        2009-06-10

        I'm sure you're right that there are use cases that would benefit greatly from multithreading. I'm not confident that the optimizer can make good decisions on this; my first idea is to do it under user control (e.g. an extension attribute on xsl:for-each). But I'm afraid it hasn't made it into the release which I'm trying to get finished at the moment.
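        As a rough illustration of "under user control" in plain Java terms (this is not Saxon's API, and the names OptInParallel and forEach are invented for the example): the caller, not an optimizer, decides per loop whether it runs in parallel. The sketch assumes the body is free of side effects, so both modes give the same result in the same order.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example of an opt-in switch, analogous in spirit to an
// extension attribute on a single xsl:for-each.
class OptInParallel {

    // 'parallel' is the user-supplied switch; the default choice stays
    // with the caller rather than with an optimizer.
    static <T, R> List<R> forEach(List<T> items, Function<T, R> body, boolean parallel) {
        Stream<T> stream = parallel ? items.parallelStream() : items.stream();
        // Collecting to a list keeps the encounter order of the source
        // list, in sequential and in parallel mode alike.
        return stream.map(body).collect(Collectors.toList());
    }
}
```

        The design choice here is that parallelism is an explicit, local request: a loop whose body is cheap or order-sensitive in some other way simply leaves the switch off.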