Help with script?

cjake2299
2010-12-02
2012-09-17
  • cjake2299

    cjake2299 - 2010-12-02

    Hi all, I'm very green and would like some input from the community.

    I am using the Canon camera example to extract data from a webpage.

    The website does not use a traditional hyperlink for the "Next" page; however,
    the pages are encoded numerically in the URL, i.e.
    http://www.contoso.net/SearchResults.aspx?id=2&page=1

    I have been able to get the URL to function correctly on my own, but the issue
    I am having now is incrementing the page number.

    I tried the following, but it stops on the first page: (this is my modified
    functions.xml)

    <?xml version="1.0" encoding="UTF-8"?>
    
    <config>
        <!-- 
            Download multi-page list of items.
    
            @param pageUrl       - URL of starting page
            @param itemXPath     - XPath expression to obtain single item in the list
            @param nextXPath     - XPath expression to URL for the next page
            @param maxloops      - maximum number of pages downloaded
    
            @return list of all downloaded items
         -->
        <function name="download-multipage-list">
            <return>
                <while condition="${pageUrl.toString().length() != 0}" maxloops="${maxloops}" index="i">
                    <empty>
                        <var-def name="content">
                            <html-to-xml>
                                <http url="${pageUrl}"/>
                            </html-to-xml>
                        </var-def>
    
                        <var-def name="pageUrl">
                            <template>${org.apache.commons.httpclient.util.URIUtil.encodeQuery(pageUrl.toString())} + ${i}</template>
                        </var-def>
                    </empty>
    
                    <xpath expression="${itemXPath}">
                        <var name="content"/>
                    </xpath>
                </while>
            </return>
        </function>
    </config>
    
     
  • Alex Wajda

    Alex Wajda - 2010-12-03

    You need to use the same URL template for every loop iteration, but in your
    example you reassign 'pageUrl' every time and each following iteration uses
    the result of the previous iteration as a template. That's why you have 1, 12,
    123 instead of 1, 2, 3.

    Following your example, you need to repeat the loop body an exact number of
    times (maxloops) unconditionally, and 'pageUrl' (better to name it
    pageUrlPrefix or something similar) should stay unchanged during the loop and
    only have a different suffix appended each time. Try this:

    <while condition="${true}" maxloops="${maxloops}" index="i">
        <empty>
            <var-def name="content">
                <html-to-xml>
                    <http url="${org.apache.commons.httpclient.util.URIUtil.encodeQuery(pageUrl.toString())}${i}"/>
                </html-to-xml>
            </var-def>
        </empty>
        <xpath expression="${itemXPath}">
            <var name="content"/>
        </xpath>
    </while>
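
    For completeness, here is a rough sketch of how the calling side might look
    with this change, modelled on the call / call-param pattern of the Canon
    example. The itemXPath value and the maxloops count are placeholders made up
    for illustration, and the sketch assumes the loop index starts at 1, so adjust
    it for the real site. Note that pageUrl now holds only the prefix ending in
    "page=", and the "&" in the query string has to be written as "&amp;" inside
    the XML config.

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>

    <config charset="UTF-8">
        <!-- functions.xml is assumed to define download-multipage-list with the loop above -->
        <include path="functions.xml"/>

        <var-def name="items">
            <call name="download-multipage-list">
                <!-- URL prefix only: the loop appends the page number from the index i -->
                <call-param name="pageUrl">http://www.contoso.net/SearchResults.aspx?id=2&amp;page=</call-param>
                <!-- hypothetical XPath, replace with one that matches the real result rows -->
                <call-param name="itemXPath">//div[@class="result"]</call-param>
                <!-- placeholder page count -->
                <call-param name="maxloops">5</call-param>
            </call>
        </var-def>
    </config>
    ```

    Since the condition is always true, maxloops alone decides how many pages are
    fetched, so it should be set to the number of result pages you actually want.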
    
     
  • cjake2299

    cjake2299 - 2010-12-03

    SWEET! It works now, and even better than before. Thank you for your
    assistance!

     
