here is what im doing
im using
now the process of what i (will) do is:
now, this is my code for my thread class.
class ReadLinks extends Thread {
private $conn;
private $links;
private $fileObj;
public function __construct($conn, $links, $fileObj) {
//.. well just asign this to the global variables
}
public function run() {
try {
$this->logMsg("Start Reading Reviews");
$this->readLinks();
} catch (Exception $ex) {
$this->logMsg($ex);
}
$this->closeLog();
}
private function readLinks() {
$this->logMsg("Links");
foreach ($this->links as $link) {
$link = trim(preg_replace('/\s\s+/', ' ', $link));
$this->logMsg("Link: " . $link);
$html = html_readLink($link);
break;
}
}
private function logMsg($msg) {//something to write on the text file
}
private function closeLog() {//closes the textfile
}}
$conn - is my mysqli link to have db actions in the future
$links - is an array of links to be read.
$fileObj- is a resource return from fopen(). ( well to write into a file)
now who is that html_readlink,
its an outer function that is like this:
function html_readLink($link) {
return file_get_html($link);}
basically it is the resource returned by a function from simple html dom parser
now, i have as well a function that reads a link alone to do the other (different business requirement) and im using the simple html dom parser with ease.
with the pthreads, i tried writing the file(logs first) so that i can ensure that everything as well works fine.
about contacting db is not yet sure., well ill try to figure it out if it works fine, but logically it should work.
now when i called this class: its like this:
try {
$thread = new readLinks($conn, $Links, createlog());
if ($thread->start()) {
$thread->join();
} else {
echo "something i need to research if this happens";
}
} catch (Exception $err) {
echo $err; //something i need to research as well if this happens
}
i got this error
Warning: Invalid argument supplied for foreach() in C:\my\path\to\simplehtmldom_1_5\simple_html_dom.php on line 1119
that simplehtmldom code is:
function clear()
{
foreach ($this->nodes as $n) {$n->clear(); $n = null;}
// This add next line is documented in the sourceforge repository. 2977248 as a fix for ongoing memory leaks that occur even with the use of clear.
if (isset($this->children)) foreach ($this->children as $n) {$n->clear(); $n = null;}
if (isset($this->parent)) {$this->parent->clear(); unset($this->parent);}
if (isset($this->root)) {$this->root->clear(); unset($this->root);}
unset($this->doc);
unset($this->noise);
}
now that is the source code coming from simple html dom. that foreach is the one that is returning the error. now my other code using simple html dom doesn't have a problem with simple html dom. but with pthreads i got this error.
also, when i change my codes and didn't use pthreads, (had some revisions like this:
on pthreads class:
class ReadLinks {// extends Thread {
//insert other codes
public function readLinks() {
$this->logMsg("Links");
foreach ($this->links as $link) {
$link = trim(preg_replace('/\s\s+/', ' ', $link));
$this->logMsg("Link: " . $link);
$html = html_readLink($link);
$this->logMsg(getTitle($html));
//
break;
}
}
and change the way this is called like this:
try {
$thread = new ReadLinks($conn, $revLinks, createlog());
$thread->readLinks();
// if ($thread->start()) {
// $thread->join();
// } else {
// echo "something i need to research if this happens";
// }
} catch (Exception $err) {
echo $err; //something i need to debug and research if this happens
}
everything works fine, i get the desired results.
pthreads is something i need to use since loading bulk links and reading each of them is quite a time consuming process. and i need it to be on a separate thread. now i dont know whats wrong with these pthreads, or simple html dom parser. have i done something unnecessary/wrong? is there other way to do this?
anyone??
Thanks for including examples and explanations. You probably found an alternative solution by now, but I would still like to answer your question the best I can.
Unfortunately, I'm not familiar with pthreads. It might cause side-effects for the parser when it comes to memory management. These things get very complicated once code executes in parallel.
That said, I'm certain that the for loop breaks because your DOM doesn't have any nodes, which breaks the loop.
So, this line
should be
Here is an example to test the different behavior:
Note that the nodes list must be unset explicitly. This doesn't happen in your code provided above and I can only make assumptions to what could cause it. Adding a guard clause will prevent the issue from happening, so I'll add it to master anyway.
Fixed in master [edc3ac]
Related
Commit: [edc3ac]