I'm scraping a table that spans more than one page. I'm successfully finding the "Next" link, but when I try to load it, the first table is retrieved again.
Please help!
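~~~~
<style>
table,th,td
{
border:1px solid black;
}
</style>
<?php
// file_get_html() comes from the Simple HTML DOM library
include 'simple_html_dom.php';

$url = '...'; // URL of the first page (not shown in the original post)
$count = 0;
while ($url != '') {
    // safety stop while testing
    if ($count > 1) die('done');
    echo "getting $url<br>";
    if (isset($html))
        $html->clear();
    $html = file_get_html($url);
    // look for the "Next" link so the following pass loads the next page
    $url = '';
    foreach($html->find('a') as $element){
        if ($element->plaintext=='Next'){
            $url = 'http://finance.yahoo.com' . $element->href;
            break;
        }
    }
    $count++;
    echo "<br>$count<br>";
    // the data table is nested inside an outer layout table
    $tableouter = $html->find('table',4);
    if (!$tableouter) continue;
    $table = $tableouter->find('table',9);
    if (!$table) continue;
    // print the scraped rows as an HTML table
    echo '<table>';
    foreach($table->find('tr') as $tr) {
        echo '<tr>';
        foreach($tr->find('td') as $td) {
            echo '<td>';
            echo $td->plaintext;
            echo '</td>';
        }
        echo '</tr>';
    }
    echo '</table>';
}
?>
~~~~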
test case
As is so often the case, as soon as I asked the question, the answer presented itself.
The href of the found element had "&amp;amp;" instead of "&amp;" in the URL, so the query parameters were not being passed correctly and the first page was served again. str_replace fixed it.
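
For anyone hitting the same thing, here is a minimal sketch of the kind of fix described above. The helper name `next_page_url` and the sample href are made up for illustration and are not from my actual script; the only real change is the `str_replace` call on the href before it is appended to the host.

~~~~php
<?php
// Hypothetical helper (not in the original script): builds the absolute URL
// of the "Next" page from the raw href attribute value.
function next_page_url(string $base, string $href): string
{
    // The raw attribute value still contains the HTML entity "&amp;".
    // Turn it back into a plain "&" so the query string is parsed correctly.
    return $base . str_replace('&amp;', '&', $href);
}

// Sample href, invented for illustration only.
$href = '/page?a=1&amp;b=2';
echo next_page_url('http://finance.yahoo.com', $href), "\n";
// prints "http://finance.yahoo.com/page?a=1&b=2"
~~~~

In the loop from the question this would replace the line that builds `$url` from `$element->href`. PHP's built-in `html_entity_decode()` should work as well and also handles entities other than `&amp;`.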