
Using CEL

2009-08-15
2013-04-26
  • Edward Hoffman

    Edward Hoffman - 2009-08-15

    Mingsheng,

    In an effort to gain some level of fluency with CEL, I have:
    1. read the articles "Cayuga: A General Purpose Event Monitoring System" and "Cayuga: A High-Performance Event Processing Engine";
    2. studied all the CEL examples, and dumped the ANTLR *.g files and the P2TokenTypes.

    Even so, I still do not have a solid basis for building the following types of queries:
    1. Average StockPrice over a fixed number of ticks
    2. Exponential Moving Average over a fixed number of ticks
    3. Volume-Weighted Average Price over a fixed number of ticks
    4. Standard deviation of the stock price over a fixed number of ticks
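
    For reference, the four statistics can be written as follows for the last N ticks, with prices p_1, ..., p_N (p_N newest) and volumes v_1, ..., v_N. The population form of the standard deviation, the seed EMA_1 = p_1, and the smoothing factor α = 2/(N+1) are common conventions assumed here; the post itself does not specify them.

```latex
\begin{aligned}
\bar{p} &= \frac{1}{N}\sum_{i=1}^{N} p_i \\
\mathrm{VWAP} &= \frac{\sum_{i=1}^{N} p_i v_i}{\sum_{i=1}^{N} v_i} \\
\sigma &= \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left(p_i - \bar{p}\right)^2} \\
\mathrm{EMA}_1 &= p_1, \qquad
\mathrm{EMA}_i = \alpha\, p_i + (1 - \alpha)\,\mathrm{EMA}_{i-1}, \qquad
\alpha = \frac{2}{N + 1}
\end{aligned}
```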

    Questions
    1. I noticed that CEL has mathematical operators. Are there any additional examples that use them?
    2. Is there a CEL manual?
    3. Has anyone extended CEL with additional functions (e.g., SUM, MAX, MIN, AVG, STDDEV)?
    4. In the first article, I read that the Cornell Center for Investment Research is involved in using Cayuga for real-time analysis and correlation of external streaming stock data. Since I am evaluating Cayuga for real-time stock streaming data, is there someone there I can contact to find out whether they have more relevant CEL examples, or perhaps have extended the CEL language?

    RSVP.

    Best Regards,

    Ed

    Edward Hoffman
    cadence@optonline.net

     
    • Mingsheng Hong

      Mingsheng Hong - 2009-08-17

      Ed,

      The materials you have read are currently the best references to CEL. All the major features are documented in these articles, and there aren't many additional details left over. If you have additional questions, feel free to post them here.

      Regarding the "Average StockPrice" query, let N be the number of ticks you want to average over. We can define a sequence query of length N (that is, use N-1 sequence operators to chain together N copies of the same ticker stream), and then compute the average value at the end of the sequence. Exponential Moving Average and other mathematical formulas can be handled in a similar way. However, if N is not fixed, this scheme will not work, and we have to use an iteration operator instead. Also, even when the above scheme works, its performance will be less than optimal (i.e., there is a more efficient way to compute the result than the way the Cayuga engine executes this query). Recall that Cayuga is optimized for PATTERN DETECTION, where each step of building the pattern is expected to involve some predicate evaluation. The "sliding window average query", by comparison, does not build a selective pattern: it takes any consecutive sequence of N events (presumably on the same stock symbol) and computes the average (or some variant of it). Of course, whether the performance of this query in Cayuga is acceptable depends on the actual workload.
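
      A minimal C++ sketch of the computation done "at the end of the sequence": once the chain of N-1 sequence operators has collected N ticks, all four statistics from Ed's question can be computed from that window. The Tick struct, its field names, and computeWindowStats are hypothetical illustrations, not Cayuga code; seeding the EMA with the oldest price and using α = 2/(N+1) are assumed conventions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One tick of a hypothetical stock stream (field names are illustrative).
struct Tick {
    double price;
    double volume;
};

// Statistics computed over the N ticks matched by an N-step sequence.
struct WindowStats {
    double average;
    double ema;
    double vwap;
    double stddev;  // population standard deviation
};

// Computes all four statistics over a window ordered oldest -> newest.
WindowStats computeWindowStats(const std::vector<Tick>& window) {
    assert(!window.empty());
    const double n = static_cast<double>(window.size());
    // Assumed smoothing convention: alpha = 2 / (N + 1).
    const double alpha = 2.0 / (n + 1.0);

    double sumPrice = 0.0, sumPV = 0.0, sumVolume = 0.0;
    double ema = window.front().price;  // assumed seed: the oldest price
    for (std::size_t i = 0; i < window.size(); ++i) {
        const Tick& t = window[i];
        sumPrice += t.price;
        sumPV += t.price * t.volume;
        sumVolume += t.volume;
        if (i > 0) ema = alpha * t.price + (1.0 - alpha) * ema;
    }
    const double mean = sumPrice / n;

    // Second pass for the squared deviations from the mean.
    double sumSq = 0.0;
    for (const Tick& t : window) sumSq += (t.price - mean) * (t.price - mean);

    const double vwap = sumVolume > 0.0 ? sumPV / sumVolume : 0.0;
    return {mean, ema, vwap, std::sqrt(sumSq / n)};
}
```

      Recomputing from scratch like this costs O(N) work for every result, which is one way to see why the sequence-based formulation is not the most efficient way to evaluate a sliding-window aggregate.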

      An alternative is to use a User-Defined Function (UDF) to do the computation. I will say more about UDFs later in this post, but for now, one thing I want to mention is that to get the best performance out of the Cayuga engine, I encourage you to study the AIR query representation, a lower-level alternative to CEL. A non-trivial example involving arithmetic computation in AIR is cayuga-system\Cayuga\test\examples\query\StockQueryMShapeMu.xml. The relationship between AIR and CEL is analogous to that between assembly language and C. AIR is not as well documented, and is perhaps more technically involved than CEL, but again, if you are interested, let us know where you get stuck and we will help as we can.
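
      One way a hand-written operator or UDF-style helper could avoid rescanning the window is to keep running sums and update them as each tick arrives. The SlidingWindowStats class below is a hypothetical, standalone C++ sketch of that idea; it does not use Cayuga's UDF interface, which this thread does not show.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <deque>

// Maintains the mean and population standard deviation of the last N prices
// with running sums, so each new tick costs O(1) instead of a pass over N ticks.
class SlidingWindowStats {
public:
    explicit SlidingWindowStats(std::size_t n) : n_(n) { assert(n_ > 0); }

    // Adds a price; evicts the oldest price once the window exceeds N values.
    void push(double price) {
        prices_.push_back(price);
        sum_ += price;
        sumSq_ += price * price;
        if (prices_.size() > n_) {
            const double old = prices_.front();
            prices_.pop_front();
            sum_ -= old;
            sumSq_ -= old * old;
        }
    }

    // True once N prices have been seen.
    bool full() const { return prices_.size() == n_; }

    double mean() const {
        assert(!prices_.empty());
        return sum_ / static_cast<double>(prices_.size());
    }

    double stddev() const {
        const double m = mean();
        const double var = sumSq_ / static_cast<double>(prices_.size()) - m * m;
        return std::sqrt(var > 0.0 ? var : 0.0);  // clamp tiny negative rounding error
    }

private:
    std::size_t n_;
    std::deque<double> prices_;
    double sum_ = 0.0;
    double sumSq_ = 0.0;
};
```

      The sum-of-squares variance formula is simple but can lose precision when prices are large relative to their spread; a numerically stabler update (for example, a sliding-window variant of Welford's method) may be preferable in practice.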

      Extending Cayuga with user-defined functions/operators is relatively simple. For example, the following query, which is Q2 from our SIGMOD 07 blogosphere demo, uses a user-defined predicate CONTAINS, which is defined in UDFs.cpp.

      SELECT date, 'Keyword Query' as subject, 'This is a key word query' as body, 'Keyword Query' as folder,
          2 as numRcvrs, rtype as rtype1, senderFN as rcvrFN1, senderLN as rcvrLN1, senderEM as rcvrEM1,
          rtype as rtype2, rcvrFN as rcvrFN2, rcvrLN as rcvrLN2, rcvrEM as rcvrEM2,
          1 as numRmids, mid as rmid1, rcvrEM as rmidEM1
      FROM FILTER{CONTAINS(body, 'Bomb') = 1} (enron)
      PUBLISH QL2

      A new UDF can be added programmatically via the function insertUDFs, defined in UDFs.cpp. A better mechanism for adding UDFs could, of course, be developed.
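
      A minimal sketch of what a CONTAINS-style predicate and a registration step might look like, written in C++ since UDFs.cpp is a C++ file. The function signature, containsUDF, udfRegistry, and registerUDFs are all hypothetical: they illustrate mapping a query-level name to an implementation and are not the actual interface used by insertUDFs.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical UDF signature: two string arguments in, integer result out.
// Cayuga's real UDF interface in UDFs.cpp is not shown in this thread.
using StringPredicateUDF = std::function<int(const std::string&, const std::string&)>;

// Returns 1 if `text` contains `keyword`, else 0, so a filter such as
// CONTAINS(body, 'Bomb') = 1 selects matching events.
int containsUDF(const std::string& text, const std::string& keyword) {
    return text.find(keyword) != std::string::npos ? 1 : 0;
}

// Hypothetical registry mapping the name used in queries to its implementation.
std::map<std::string, StringPredicateUDF>& udfRegistry() {
    static std::map<std::string, StringPredicateUDF> registry;
    return registry;
}

// Hypothetical stand-in for a routine like insertUDFs.
void registerUDFs() {
    const bool inserted = udfRegistry().emplace("CONTAINS", &containsUDF).second;
    assert(inserted && "CONTAINS registered twice");
    (void)inserted;  // avoid an unused-variable warning when NDEBUG is set
}
```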

       
