ItemXpath iterations

Help
good
2006-11-20
2012-09-04
  • good

    good - 2006-11-20

    I am extracting data from a website, and the extraction mostly works. The problem is that each record has three fields, and the last field cannot be extracted. I debugged it, and the debug output is below. Why is the span class repeated (nested) like this, and how can I solve it?
    <p/><p>
    <span class="Result10">S.s.air Travels(<b>chennai</b> )(p)ltd </span>
    <br/>
    <span class="Result3">6 Aziz Mulk 7Th Street Thousand Lights <b>Chennai</b> 600006 <br/>
    </span>
    <span class="Result2">Tel:-<i>044-28292077 </i>
    </span>
    </p><p>
    <span class="Result2">
    <i/>
    </span>
    </p><p>
    <span class="Result2">
    <i>
    <span class="Result10">
    <b>Chennai</b> Arun <b>Travel</b> Service </span>
    <br/>
    <span class="Result3">434 Ramaswamy Salai 600078 <b>Chennai</b>
    <br/>
    </span>
    <span class="Result2">Tel:-<i>044-24845327 </i>
    </span>
    </i>
    </span>
    </p><p>
    <span class="Result2">
    <i>
    <span class="Result2">
    <i/>
    </span>
    </i>
    </span>
    </p><p>
    <span class="Result2">
    <i>
    <span class="Result2">
    <i>
    <span class="Result10">S P Air Travels <b>Chennai</b> Pvt Ltd </span>
    <br/>
    <span class="Result3">43 44 Wellington Plaza Mount Road <b>Chennai</b> 600002 <br/>
    </span>
    <span class="Result2">Tel:-<i>044-28551821 </i>
    </span>
    </i>
    </span>
    </i>
    </span>

     
    • Cal

      Cal - 2006-11-20

      what's your xpath/query expression?

       
    • good

      good - 2006-11-20

      My config file is:

              <call-param name="pageUrl">http://www.guruji.com/local?q=travel+chennai</call-param>
              <call-param name="nextXPath">//a[.='Next']/@href</call-param>
              <call-param name="itemXPath">//table[@class="LocalPageBG"]/tr/td/p</call-param>
              <call-param name="maxloops">3</call-param>
          </call>
      </var-def>

      <file action="write" path="travel/catalog.xml" charset="UTF-8">
          <![CDATA[ <travel> ]]>
          <loop item="item">
              <list><var name="products"/></list>
              <body>
                  <xquery>
                      <xq-param name="products"><var name="item"/></xq-param>
                      <xq-expression><![CDATA[
                          for $item in $products//p return
                              <product>
                                  <name>{normalize-space(data($item//span[@class='Result10']))}</name>
                                  <address>{normalize-space(data($item//span[@class='Result3']))}</address>
                                  <phone>{normalize-space(data($item/span[@class='Result2']))}</phone>
                              </product>
                      ]]></xq-expression>
                  </xquery>
              </body>
          </loop>
          <![CDATA[ </travel> ]]>
      </file>

      </config>

       
    • Cal

      Cal - 2006-11-20

      dunno ... maybe try
      for $item in $products//p/span[@class='Result2']/span[@class='Result2']
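To see why the phone field misbehaves, here is a rough Python sketch (not Web-Harvest code) run against the third `<p>` from the debug output above. It assumes the cleaned HTML really is nested as shown there. The helper names `text_of`, `spans`, and `extract` are made up for illustration. The idea is to take only the `Result2` spans that contain no further `<span>`, which in XPath terms would presumably be something like `$item//span[@class='Result2'][not(.//span)]`; that expression has not been tested in Web-Harvest.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the third <p> from the debug output in the first post.
SAMPLE = """<p>
<span class="Result2">
<i>
<span class="Result10">
<b>Chennai</b> Arun <b>Travel</b> Service </span>
<br/>
<span class="Result3">434 Ramaswamy Salai 600078 <b>Chennai</b>
<br/>
</span>
<span class="Result2">Tel:-<i>044-24845327 </i>
</span>
</i>
</span>
</p>"""


def text_of(elem):
    """Whitespace-normalised text of an element, similar to normalize-space()."""
    return " ".join("".join(elem.itertext()).split())


def spans(p, css_class):
    """All descendant <span> elements of p with the given class."""
    return [s for s in p.iter("span") if s.get("class") == css_class]


def extract(p):
    """Return name/address/phone for one <p>, or None for an empty wrapper."""
    name = spans(p, "Result10")
    address = spans(p, "Result3")
    # Keep only Result2 spans with no nested <span>: the wrappers added by
    # the HTML cleaner always contain other spans, the real phone field does not.
    phone = [s for s in spans(p, "Result2") if s.find(".//span") is None]
    if not (name and address and phone):
        return None
    return {
        "name": text_of(name[0]),
        "address": text_of(address[0]),
        "phone": text_of(phone[0]),
    }


if __name__ == "__main__":
    p = ET.fromstring(SAMPLE)
    # Child-axis lookup, like $item/span[@class='Result2'] in the config:
    # it hits the outer wrapper, so the "phone" is the text of the whole record.
    print(text_of(p.find("span[@class='Result2']")))
    # Leaf-span lookup: only the actual phone field.
    print(extract(p))
```

Paragraphs that hold nothing but an empty `Result2` wrapper come back as `None` here, so they can be skipped rather than written out as empty products.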

       
