Optimize <script> blocks

Help
2011-11-06
2012-09-04
  • Steven P. Goldsmith

    Is there any way to improve the performance of <script> blocks? I noticed
    that they doubled the run time when I used them to build a List of Maps that
    is returned for further processing in a Java program.

        <!-- List of Maps will be returned -->
    
        <script>
            <![CDATA[
            List list = new ArrayList();
            // Pattern originally read "MM/dd/yyyy hh:ss" (hours:seconds); minutes are assumed here.
            java.text.SimpleDateFormat simpleDateFormat = new java.text.SimpleDateFormat("MM/dd/yyyy hh:mm");
            ]]>
        </script>
    
         <!-- Parse field elements into variable and store in Map -->
    
        <loop item="item" index="i" empty="true">
            <list>
                <var name="callXml"/>
            </list>
            <body>
                <script>
                    <![CDATA[
                    Map map = new HashMap();
                    ]]>
                </script>
    
                <var-def name="incident">
                    <xpath expression="//span[@name='INCIDENT']/text()">
                        <var name="item"/>
                    </xpath>
                </var-def>
    
                <script>
                    <![CDATA[
                    map.put("event_num", incident.toString());
                    ]]>
                </script>
    
                <var-def name="date">
                    <xpath expression="//span[@name='DATE_TIME']/text()">
                        <var name="item"/>
                    </xpath>
                </var-def>
    
                <var-def name="time">
                    <xpath expression="//span[@name='eTIME']/text()">
                        <var name="item"/>
                    </xpath>
                </var-def>
    
                <!-- Convert date and time strings to Java Date -->
                <script>
                    <![CDATA[
                    Date timestamp = simpleDateFormat.parse(date.toString()+" "+time.toString());
                    map.put("event_time", timestamp);
                    ]]>
                </script>
    
                <var-def name="description">
                    <xpath expression="//span[@name='DESCRIPTION']/text()">
                        <var name="item"/>
                    </xpath>
                </var-def>
    
                <script>
                    <![CDATA[
                    map.put("description", description.toString());
                    ]]>
                </script>
    
                <var-def name="street">
                    <xpath expression="//span[@name='STREET']/text()">
                        <var name="item"/>
                    </xpath>
                </var-def>
    
                <script>
                    <![CDATA[
                    map.put("location_main", street.toString());
                    ]]>
                </script>
    
                <var-def name="subdivision">
                    <xpath expression="//span[@name='SUBDIVISION']/text()">
                        <var name="item"/>
                    </xpath>
                </var-def>
    
                <script>
                    <![CDATA[
                    if (subdivision.toString().equals("")) {
                        map.put("location_alt1", null);
                    } else {
                        map.put("location_alt1", subdivision.toString());
                    }
                    list.add(map);
                    ]]>
                </script>            
            </body>
        </loop>
    
        <!-- Put list into WebHarvest variable context -->
        <script>
            <![CDATA[
            sys.defineVariable("callList", list);
            ]]>
        </script>
    
     
  • Alex Wajda

    Alex Wajda - 2011-11-17

    I think the problem lies in how WH handles collection-type variables. By the
    original design, when a collection is put into the context, a shallow copy
    is created and each item of the copy is wrapped in a Variable instance.
    That is what happens when you call sys.defineVariable(..).

    This issue was partially addressed in WH 2.1 by introducing ScriptingVariable
    and integrating the native dynamic-language context with the WH context, so
    there is no need to explicitly pass variables to and from <script> blocks.
    All variables created at the top level of the <script> scope are passed by
    reference to the corresponding WH scope. That not only gives you flexibility
    and removes the clutter of passing data around <script> blocks, it also
    performs better, because no shallow copies are created for collection-type
    variables passed to or from scripts.
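
    To make the cost difference concrete, here is a minimal Java sketch of the
    two hand-off strategies described above. The Wrapped class and both method
    names are hypothetical stand-ins, not Web-Harvest's actual classes; the
    sketch only illustrates that copy-and-wrap allocates one object per element
    on every hand-off, while passing the reference allocates nothing.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class HandOffSketch {
        /** Hypothetical per-item wrapper; NOT the real Web-Harvest Variable class. */
        static final class Wrapped {
            final Object value;
            Wrapped(Object value) { this.value = value; }
            @Override public String toString() { return String.valueOf(value); }
        }

        /** Copy-and-wrap: a new list plus one wrapper per element, O(n) per hand-off. */
        static List<Wrapped> copyAndWrap(List<?> source) {
            List<Wrapped> copy = new ArrayList<>(source.size());
            for (Object item : source) {
                copy.add(new Wrapped(item));
            }
            return copy;
        }

        /** Pass by reference: the same list object is shared, nothing is allocated. */
        static <T> List<T> passByReference(List<T> source) {
            return source;
        }

        public static void main(String[] args) {
            List<String> list = new ArrayList<>(List.of("foo", "bar"));
            List<Wrapped> wrapped = copyAndWrap(list);
            List<String> shared = passByReference(list);
            list.add("baz");
            System.out.println(wrapped.size()); // 2: the copy does not see the later add
            System.out.println(shared.size());  // 3: the shared reference does
        }
    }
    ```

    With a list of a few thousand Maps, the copy-and-wrap path repeats that
    per-element work each time the collection crosses the script boundary, which
    would be consistent with the slowdown reported in the first post.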

    Consider the example:

            <script>
                list = ["foo", "bar"]
            </script>
    
            <script>
                list += "baz"   // the 'list' variable is passed by reference
            </script>
    
            <get var="list"/> <!-- the 'list' variable is passed by reference -->
    

    Also note that a <script> block returns the value of its last statement, and
    that value is passed to the WH context in the usual way, i.e. a shallow copy
    of it is created. Currently there is no way to tell <script> to ignore the
    execution result (that's another item for my todo list), so if 'list' is a
    long collection you need to make sure it is not implicitly returned from the
    <script> block. In the example above, simply adding a dummy empty-value
    statement (such as '' or null) at the end of the script block does the
    trick.

            <script>
                list = ["foo", "bar"]
                ''
            </script>
    
            <script>
                list += "baz"
                null
            </script>
    
            <get var="list"/>
    
     
  • Alex Wajda

    Alex Wajda - 2011-11-17

    I'm thinking of deprecating the idea of wrapping every variable in a
    Variable object. The author says it was done that way so that users need not
    care about NPEs or variable types when massaging the data. But in practice it
    doesn't work as expected. You get a "variable not defined" error when you
    try to access a non-existent variable, and in <script> as well as ${}
    blocks you have to explicitly unwrap each variable with one of the .toXXX()
    methods to get the actual value before you can use it, which implies that you
    have to be aware of the variable type. In other words, to me, exposing the
    Variable object to the user creates more problems than it solves, and it
    should be either removed or hidden from the user, so that instead of writing
    ${x.toInt() < y.toInt()} I would simply write ${x < y} and so forth.
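
    The ergonomic point can be sketched in plain Java. The Var class below is a
    hypothetical wrapper loosely modelled on the idea discussed here, not the
    real Web-Harvest Variable API; it only shows that an exposed wrapper forces
    the caller to choose a conversion, and therefore to know the type.

    ```java
    public class WrapperSketch {
        /** Hypothetical wrapper; NOT the actual Web-Harvest Variable class. */
        static final class Var {
            private final Object value;
            Var(Object value) { this.value = value; }
            int toInt() { return Integer.parseInt(String.valueOf(value).trim()); }
            @Override public String toString() { return String.valueOf(value); }
        }

        /** Wrapper exposed: the caller must pick toInt(), so it must know the type. */
        static boolean lessThanWrapped(Var x, Var y) {
            return x.toInt() < y.toInt();
        }

        /** Wrapper hidden: the comparison reads like the ${x < y} form. */
        static boolean lessThanPlain(int x, int y) {
            return x < y;
        }

        public static void main(String[] args) {
            System.out.println(lessThanWrapped(new Var("2"), new Var("10"))); // true
            System.out.println(lessThanPlain(2, 10));                         // true
            // Picking the wrong conversion changes the answer: as strings, "2" sorts after "10".
            System.out.println("2".compareTo("10") < 0);                      // false
        }
    }
    ```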

     
