Problem email crawling and parsing all urls

Help
Anonymous
2013-03-27
2013-04-09
  • Anonymous

    Anonymous - 2013-03-27

    Hello,
    I have searched a lot but couldn't find the answer I was looking for.
    I am trying to take the URLs that phpcrawl crawls and search each page for emails and other content.
    I think there are two ways to do this. The first is to store the URLs in a DB and then parse them afterwards, but when I tried that it took many hours, so it's not a good option.
    The second is to parse each page at the same time it is crawled, search it for emails, and, if any are found, put them in a DB table or an array.
    I tried it like this, but it's wrong:

    <?php
    // Fetch a page and return the unique email addresses found in it.
    function email_crawl($url) {
        if (empty($url)) {
            return array();
        }
        // fetch data from specified url (file_get_contents returns false on failure)
        $text = @file_get_contents($url);
        if ($text === false || $text === '') {
            return array();
        }
        // parse emails (same pattern as before, without the redundant nested "+")
        $res = preg_match_all(
                '/[a-z0-9]+([_.-][a-z0-9]+)*@[a-z0-9]+([.-][a-z0-9]+)*\.[a-z]{2,}/i', $text, $matches
        );
        return $res ? array_values(array_unique($matches[0])) : array();
    }
    $database = "Crawl";
    $mysql_user = "root";
    $mysql_password = "root";
    $mysql_host = "localhost";
    // mysqli replaces the mysql_* functions, which were removed in PHP 7.
    // Turn off mysqli exceptions so the explicit checks below are reached.
    mysqli_report(MYSQLI_REPORT_OFF);
    $db = @mysqli_connect($mysql_host, $mysql_user, $mysql_password);
    if (!$db)
        die("<b>Cannot connect to database, check if username, password and host are correct.</b>");
    if (!mysqli_select_db($db, $database))
        die("<b>Cannot choose database, check if database name is correct.</b>");
    $result = mysqli_query($db, "SELECT url FROM links");
    if (!$result)
        die("<b>Query failed: " . htmlspecialchars(mysqli_error($db)) . "</b>");
    $emails = array(); // url => list of emails found on that page
    while ($row = mysqli_fetch_assoc($result)) {
        // $row is an associative array, so the url is in $row['url']
        $emails[$row['url']] = email_crawl($row['url']);
    }
    foreach ($emails as $url => $found) {
        $list = $found ? htmlspecialchars(implode(", ", $found)) : "No emails found.";
        echo htmlspecialchars($url) . ": " . $list . "<br />";
    }
    ?>
    

    I hope you can help me out. Thanks.

     
  • Uwe Hunfeld

    Uwe Hunfeld - 2013-04-08

    Hi!

    This shouldn't be a problem with phpcrawl: just parse the content of each page phpcrawl finds and put your extracted data
    into a DB table.
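
    A minimal sketch of that approach, assuming phpcrawl's usual pattern of extending PHPCrawler and overriding handleDocumentInfo(), with the page URL and content read from $DocInfo->url and $DocInfo->source. The names extract_emails, EmailCrawler and $found, the include path, and the start URL are made up for illustration, and the DB insert is only marked by a comment:

```php
<?php
// Hypothetical helper: return the unique email addresses found in a string.
function extract_emails(string $text): array
{
    $pattern = '/[a-z0-9]+(?:[_.-][a-z0-9]+)*@[a-z0-9]+(?:[.-][a-z0-9]+)*\.[a-z]{2,}/i';
    if (!preg_match_all($pattern, $text, $matches)) {
        return [];
    }
    // Lower-case before de-duplicating so "A@x.org" and "a@x.org" count once.
    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

// The crawler part only runs when phpcrawl is actually loaded, e.g. after
// require 'libs/PHPCrawler.class.php'; (that path is an assumption).
if (class_exists('PHPCrawler')) {
    class EmailCrawler extends PHPCrawler
    {
        public $found = []; // url => list of emails found on that page

        // phpcrawl calls this once for every document it has fetched.
        public function handleDocumentInfo($DocInfo)
        {
            $emails = extract_emails((string) $DocInfo->source);
            if ($emails) {
                $this->found[$DocInfo->url] = $emails;
                // An INSERT into your own emails table could go here instead.
            }
        }
    }

    $crawler = new EmailCrawler();
    $crawler->setURL('http://www.example.com/'); // placeholder start URL
    $crawler->go();
    print_r($crawler->found);
}
```

    Because the extraction happens inside handleDocumentInfo(), each page is searched while the crawl is running, so no second pass over a stored list of URLs is needed.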

    But the code you posted doesn't have anything to do with phpcrawl as far as I can see.

    Please understand that I can't provide support or help for general PHP questions or problems that don't relate to phpcrawl.

    But I will try to help if you have a concrete phpcrawl question.

    Thanks

     
  • Anonymous

    Anonymous - 2013-04-08

    Hello,
    Thanks for the response. You are absolutely right; maybe you or the admins can delete the post.
    It is not useful anymore. I have written a completely new implementation that has nothing to do with that code.
    I was wrong. Of course I also have problems with my new attempt, but that's another matter.
    Thank you again!

     
  • Uwe Hunfeld

    Uwe Hunfeld - 2013-04-08

    OK, just let me know if you have questions or problems regarding phpcrawl in your new implementation.

    Best Regards!

     
  • Anonymous

    Anonymous - 2013-04-08

    OK, thank you very much!

    Best Regards

     
