
oG Search pagination using nextXPath &...

2012-07-30
2012-09-04
  • Montgomery Webster

    Could not find a relevant post that addressed my problem.

    I'm trying to use the Yahoo Shopping example for the oG Search Engine, but I keep getting only a single page of results. Has anyone seen this problem?

    The XPath for the "Next" link is virtually the same on both sites, and the
    Yahoo example retrieves 10 pages of results. When I first ran into trouble, I
    used the Bookmaker odds example to grab just the first page of results; now
    I'm trying to convert to a more robust solution. I'd appreciate any help.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- XPath: //div[2]/div/ol/li/div/div/div/cite -->
    <config>
    
        <include path="C:\Program Files (Downloads)\webharvest2b1-exe\examples\functions.xml"/>
    
        <!-- collects all domains -->
        <var-def name="domains">
            <call name="download-multipage-list">
                <call-param name="pageUrl">http://www.google.com/search?cr=countryUS&amp;q=restaurant+australia</call-param>
                <call-param name="nextXPath">//a[@id="pnnext"]/@href</call-param>
                <call-param name="itemXPath">//cite</call-param>
                <call-param name="maxloops">10</call-param>
            </call>
        </var-def>
    
        <file action="write" path="GoogleES-Results.xml" charset="UTF-8">
            <![CDATA[ <links> ]]>
            <loop item="item" index="i">
                <list><var name="domains"/></list>
                <body>
                    <xquery>
                        <xq-param name="item" type="node()"><var name="item"/></xq-param>
                        <xq-expression><![CDATA[
                            declare variable $item as node() external;
    
                            return
                                <link>{replace(data($item), '[ \n\t]', '')}</link>
                        ]]></xq-expression>
                    </xquery>       
                </body>
            </loop>
            <![CDATA[ </links> ]]>
        </file>
    
    </config>
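For context, `download-multipage-list` (pulled in from the examples' `functions.xml`) takes a start URL, an XPath for the items, an XPath for the "Next" link, and a page limit. The Python below is a minimal sketch of that general next-link pagination technique, not Web-Harvest's actual implementation: `fetch`, `find_next`, and `find_items` are hypothetical hooks standing in for the HTTP download and the two XPath queries, and the toy pages, regexes, and `example.com` URLs are made up so the loop runs offline. It shows why only one page of results comes back whenever the next-link expression matches nothing on the first page: the loop has no URL to follow and stops.

```python
import re
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


def download_multipage_list(
    page_url: str,
    fetch: Callable[[str], str],
    find_next: Callable[[str], Optional[str]],
    find_items: Callable[[str], List[str]],
    max_loops: int = 10,
) -> List[str]:
    """Collect items from at most max_loops pages by following "next" links."""
    items: List[str] = []
    url: Optional[str] = page_url
    for _ in range(max_loops):
        if url is None:
            break  # previous page had no "next" link, so stop early
        html = fetch(url)
        items.extend(find_items(html))
        next_href = find_next(html)
        # Resolve relative hrefs such as "/search?q=x&start=10" against the current page URL.
        url = urljoin(url, next_href) if next_href else None
    return items


# Toy, made-up pages keyed by URL so the loop can run without network access.
START = "http://example.com/search?q=x"
PAGES: Dict[str, str] = {
    START: '<cite>a.com</cite><a id="pnnext" href="/search?q=x&start=10">Next</a>',
    "http://example.com/search?q=x&start=10": "<cite>b.com</cite>",
}


def find_next(html: str) -> Optional[str]:
    """Stand-in for the nextXPath //a[@id="pnnext"]/@href."""
    match = re.search(r'<a id="pnnext" href="([^"]+)"', html)
    return match.group(1) if match else None


def find_items(html: str) -> List[str]:
    """Stand-in for the itemXPath //cite."""
    return re.findall(r"<cite>(.*?)</cite>", html)


print(download_multipage_list(START, PAGES.__getitem__, find_next, find_items))
# prints ['a.com', 'b.com']
```

A real version would use an HTML parser with XPath support instead of the regexes, which are only adequate for these toy pages.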
    
     
  • Enissay

    Enissay - 2012-07-30

    Before testing your code, you should know that Google doesn't allow scraping,
    so you need to use their API.

     
  • Enissay

    Enissay - 2012-07-31

    Also, there's a stray semicolon after the first </call-param>.

    The code seems fine, but the URL provided isn't; even after changing it, it
    still doesn't work for me, so I don't know how it works for you.
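On the escaping point: inside the config, the `&` in the query string has to be written as `&amp;` (whose trailing semicolon is part of the entity), while a stray `;` after a closing tag is not an XML error at all; the parser simply keeps it as character data. A small check with Python's standard `xml.etree.ElementTree` on a cut-down copy of the `<call>` element from the question illustrates both cases. `is_well_formed` is a hypothetical helper written for this sketch, and whether Web-Harvest itself tolerates the extra text is not tested here.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the <call> element from the question; the ";" after the
# first </call-param> reproduces the stray semicolon mentioned above.
FRAGMENT = """<call name="download-multipage-list">
    <call-param name="pageUrl">http://www.google.com/search?cr=countryUS&amp;q=restaurant+australia</call-param>;
    <call-param name="maxloops">10</call-param>
</call>"""


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


params = ET.fromstring(FRAGMENT).findall("call-param")

# The parser decodes "&amp;" back to a plain "&" in the URL text.
print(params[0].text)
# The stray ";" is kept as character data (the element's tail), not rejected.
print(repr(params[0].tail.strip()))
# A bare "&" in the query string, by contrast, makes the document ill-formed.
print(is_well_formed('<call-param name="pageUrl">http://x/?a=1&b=2</call-param>'))
```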

    I'm pretty sure Google's scripts block scraping on the google.com search page
    in order to push people toward their API, which is limited to 100 free queries. xD

     
  • Selvin Fehric

    Selvin Fehric - 2012-07-31

    Change the code as follows:

        let $variable := replace(data($item), '[ \n\t]', '')
        return
            <link>{$variable}</link>

    instead of yours:

        return
            <link>{replace(data($item), '[ \n\t]', '')}</link>

     
  • Montgomery Webster

    Thanks for the feedback, guys.

    The Google Custom Search API will work for me as I just need one exhaustive
    search per day.
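Since the thread settles on the Google Custom Search API, here is a hedged sketch of how the same country-restricted, ten-page search could be paged through the Custom Search JSON API rather than scraped. It assumes the API's `https://www.googleapis.com/customsearch/v1` endpoint with its `key`, `cx`, `q`, `cr`, `num`, and `start` query parameters and the `items[].link` field of its JSON response; `API_KEY` and `ENGINE_ID` are placeholders, and only URL building and response parsing are shown so the code runs without credentials or network access.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode

ENDPOINT = "https://www.googleapis.com/customsearch/v1"
API_KEY = "YOUR_API_KEY"      # placeholder: replace with your own API key
ENGINE_ID = "YOUR_ENGINE_ID"  # placeholder: replace with your search engine ID (cx)


def page_urls(query: str, pages: int = 10, country: str = "countryUS") -> List[str]:
    """Build one request URL per page of 10 results (start = 1, 11, 21, ...).

    Stops at 10 pages, since the API does not go past the first 100 results.
    """
    urls: List[str] = []
    for page in range(min(pages, 10)):
        params: Dict[str, str] = {
            "key": API_KEY,
            "cx": ENGINE_ID,
            "q": query,
            "cr": country,  # same country restriction as cr=countryUS in the scraped URL
            "num": "10",
            "start": str(1 + 10 * page),  # 1-based index of the first result on this page
        }
        urls.append(f"{ENDPOINT}?{urlencode(params)}")
    return urls


def extract_links(response_body: str) -> List[str]:
    """Return the result URLs from one JSON response ("items" is absent when nothing matched)."""
    data = json.loads(response_body)
    return [item["link"] for item in data.get("items", [])]


for url in page_urls("restaurant australia")[:2]:
    print(url)
```

Each URL would then be fetched (for example with `urllib.request.urlopen(url).read().decode("utf-8")`) and its body passed to `extract_links`; every page requested counts as one query against the daily quota.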

    Also, I apologize for not posting the correct copy of my code; I always seem to
    post when I am most frustrated and have totaled the code. The xq-expression
    should be as follows:

    <xq-expression><![CDATA[
        declare variable $doc as node() external;
        for $URL in $doc//cite
        return
            <link>{replace(data($URL), '[ \n\t]', '')}</link>
    ]]></xq-expression>
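The corrected expression visits every `cite` element in the page, takes its string value, and strips spaces, newlines, and tabs. The Python below is a rough equivalent using the standard `xml.etree.ElementTree`, offered only as an illustration: the page is a small made-up, already well-formed document (real search pages would first need converting from HTML to XML), and `cite_links` and `links_xml` are hypothetical helper names. `itertext()` is used because, like `data()`, it includes text from nested tags such as `<b>` inside a `<cite>`.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Same character class as the XQuery replace(): space, newline, tab.
WHITESPACE = re.compile(r"[ \n\t]")


def cite_links(doc: ET.Element) -> List[str]:
    """Mirror for $URL in $doc//cite return replace(data($URL), '[ \\n\\t]', '')."""
    return [WHITESPACE.sub("", "".join(cite.itertext())) for cite in doc.iter("cite")]


def links_xml(urls: List[str]) -> str:
    """Wrap each cleaned URL in <link> inside a <links> root, like the <file> step."""
    root = ET.Element("links")
    for url in urls:
        ET.SubElement(root, "link").text = url
    return ET.tostring(root, encoding="unicode")


# Made-up page: one <cite> with a nested <b>, one padded with a newline and a tab.
page = ET.fromstring(
    "<html><body><ol>"
    "<li><cite>www.<b>example</b>.com/ restaurants</cite></li>"
    "<li><cite>\n\twww.example.org/au\n</cite></li>"
    "</ol></body></html>"
)
print(links_xml(cite_links(page)))
# prints <links><link>www.example.com/restaurants</link><link>www.example.org/au</link></links>
```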
    
     
  • Selvin Fehric

    Selvin Fehric - 2012-08-01

    No problem. The code I posted also works, so you can check it.

     
