I'm trying to figure out how to list a few specific URLs and have them all scraped into one file. For instance, rather than having this at the beginning of the script:
<var-def name="products">
<call name="download-multipage-list">
<call-param name="pageUrl">http://shopping.yahoo.com/s:Digital%20Cameras:4168-Brand=Canon:browsename=Canon%20Digital%20Cameras:refspaceid=96303108;_ylt=AnHw0Qy0K6smBU.hHvYhlUO8cDMB;_ylu=X3oDMTBrcDE0a28wBF9zAzk2MzAzMTA4BHNlYwNibmF2</call-param>
<call-param name="nextXPath">//a/@href</call-param>
<call-param name="itemXPath">//li</call-param>
<call-param name="maxloops">10</call-param>
</call>
</var-def>
I'd like to be able to scrape specific URLs of the Yahoo page, like these (these are not the actual URLs):
http://shopping.yahoo.com/s:Digital Cameras:page 1
http://shopping.yahoo.com/s:Digital Cameras:page 5
http://shopping.yahoo.com/s:Digital Cameras:page 10
and then continue on to export the data in the same format as the rest of the script, which I pasted below.
Any help would be greatly appreciated.
<config charset="ISO-8859-1">
<include path="functions.xml"/>
<file action="write" path="canon/catalog.xml" charset="UTF-8">
<loop item="item" index="i">
<list><var name="products"/></list>
<body>
<xquery>
<xq-param name="item" type="node()"><var name="item"/></xq-param>
<xq-expression><![CDATA[
declare variable $item as node() external;
let $name := data($item//*)
let $desc := data($item//*)
let $price := data($item//*)
return
<product>
<name>{normalize-space($name)}</name>
<desc>{normalize-space($desc)}</desc>
<price>{normalize-space($price)}</price>
</product>
]]></xq-expression>
</xquery>
</body>
</loop>
</file>
</config>
My Canon project is not working. It's not creating the catalog.xml. Please help me, I need this urgently.