Activity for PHPCrawl

  • Anonymous posted a comment on discussion Help

    you the best

  • James Shaver posted a comment on discussion Help

    I have a github repo for phpcrawl that I'm using successfully in PHP8. It's installable by composer, or you can just download and install the package where you want. https://github.com/JamesShaver/phpcrawl
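
    For anyone unfamiliar with composer, the install would look roughly like the line below (the package name is guessed from the repository URL; check Packagist for the real one):

    composer require jamesshaver/phpcrawl   # hypothetical package name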

  • Anonymous posted a comment on discussion Help

    Maybe, but that's only your first error. Be prepared for other PHP 8 differences that won't work.

  • Anonymous posted a comment on discussion Help

    So is it as simple as finding each() and changing it to another function? If so, what would the new function be?
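
    For reference, a minimal sketch of the usual rewrite (assuming the common while/list pattern; each() was removed in PHP 8.0):

    <?php
    $arr = array("a" => 1, "b" => 2);

    // PHP 5/7 style, a fatal error in PHP 8:
    // while (list($key, $val) = each($arr)) { ... }

    // Straight replacement when the whole array is iterated:
    foreach ($arr as $key => $val) {
        echo "$key => $val\n";
    }

    // If the surrounding code depends on the array's internal pointer
    // (as the URL-cache classes may), emulate each() instead:
    reset($arr);
    while (($key = key($arr)) !== null) {
        $val = current($arr);
        next($arr);
        echo "$key => $val\n";
    }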

  • Anonymous posted a comment on discussion Help

    There's your problem.

  • Anonymous posted a comment on discussion Help

    PHP 8.1.7-1ubuntu3.1 (cli) (built: Nov 2 2022 13:39:03) (NTS)
    Copyright (c) The PHP Group
    Zend Engine v4.1.7, Copyright (c) Zend Technologies
        with Zend OPcache v8.1.7-1ubuntu3.1, Copyright (c), by Zend Technologies

  • Anonymous posted a comment on discussion Help

    What version of PHP are you using? https://www.php.net/manual/en/function.each.php

  • Anonymous posted a comment on discussion Help

    Hi All, I have downloaded and extracted PHPCrawl, and when I run "php example.php" I get the following output:

    stu@stu:~/Downloads/PHPCrawl/PHPCrawl_083$ php example.php
    PHP Fatal error: Uncaught Error: Call to undefined function each() in /home/stu/Downloads/PHPCrawl/PHPCrawl_083/libs/UrlCache/PHPCrawlerMemoryURLCache.class.php:25
    Stack trace:
    #0 /home/stu/Downloads/PHPCrawl/PHPCrawl_083/libs/PHPCrawler.class.php(588): PHPCrawlerMemoryURLCache->getNextUrl()
    #1 /home/stu/Downloads/PHPCrawl/PHPCrawl_083/libs/PHPCrawler.class.php(363):...

  • Anonymous posted a comment on discussion Help

    Ohhh my god!!!! More than 10 hours searching and more than a dozen parameters trying to find a workaround, and the most simple way was the answer. Thank you!!!

  • Anonymous posted a comment on discussion Help

    PHP7 support here: https://github.com/crispy-computing-machine/phpcrawl

  • Anonymous posted a comment on discussion Help

    PHP 7 is supported in this version: https://github.com/crispy-computing-machine/phpcrawl

  • Anonymous posted a comment on discussion Help

    Hi, due to your hint I was able to get this command working: $crawler->addURLFilterRule("/\/Category\/.+\/Search/"); Thanks for your help!

  • Anonymous posted a comment on discussion Help

    Thanks for your reply! I'll give it a try and let you know!

  • Anonymous posted a comment on discussion Help

    By the way, I didn't actually try it because I don't have those set up, but this is my example: https://www.regexpal.com/?fam=107703

  • Anonymous posted a comment on discussion Help

    You're close. The dot matches any (single) character except line breaks, but you need to match ALL of the characters. Try this: (\/Category\/.+\/Search)
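
    A quick way to sanity-check the pattern locally before wiring it into addURLFilterRule (plain PCRE; sample URLs taken from the question):

    <?php
    $pattern = "/\/Category\/.+\/Search/";
    $urls = array(
        "http://mysite.com/Category/ProductA/Search", // should be filtered out
        "http://mysite.com/Search",                   // should still be crawled
    );
    foreach ($urls as $url) {
        echo $url . ": " . (preg_match($pattern, $url) ? "filtered" : "crawled") . "\n";
    }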

  • Anonymous posted a comment on discussion Help

    I'm having an issue creating the correct regex for this situation. The crawler should ignore URLs like:

    http://mysite.com/Category/ProductA/Search
    http://mysite.com/Category/ProductB/Search

    But it should crawl this URL: http://mysite.com/Search

    I tried this: $crawler->addURLFilterRule("/Category\/.\/Search/"); and $crawler->addURLFilterRule("#/Category\/.\/Search/#"); But still the first 2 URLs are also crawled. What would be the correct regex to prevent...

  • Anonymous posted a comment on discussion Help

    PHPCrawl doesn't natively use a MySQL database. It's up to you to define the schema and implement it for your needs.
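
    A minimal sketch of what a hand-rolled schema might look like (credentials, table and column names are invented for illustration; PHPCrawl prescribes none of this):

    <?php
    // Hypothetical connection and table; adjust to your own setup.
    $pdo = new PDO("mysql:host=localhost;dbname=crawler", "user", "pass");
    $pdo->exec("CREATE TABLE IF NOT EXISTS pages (
        id INT AUTO_INCREMENT PRIMARY KEY,
        url VARCHAR(2048) NOT NULL,
        http_status SMALLINT,
        content MEDIUMTEXT,
        crawled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )");
    // Rows would then be inserted from handleDocumentInfo().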

  • Anonymous posted a comment on discussion Help

    I'm a new user and want to try PHPCrawl. I need to create the database and the tables needed for the program to function. I searched all directories and files, and nowhere did I find the necessary function, or at least the MySQL query, to create the blank tables (search and the like). Can you please advise me where I could find the necessary information? Thank you.

  • Anonymous posted a comment on discussion Help

    Okay, thanks. Let's work with the given data. But Google has a crawler that can handle that! Otherwise that would be cloaking, which is not allowed by Google.

  • Anonymous posted a comment on discussion Help

    If the page is loaded via JavaScript, you will NOT be able to do anything with it in PHP, other than fetch the URL it's loaded from. cURL is not a browser and only fetches/pushes things for you. It's up to YOU to interpret the data that comes back, and that includes anything dynamic like JS.

  • Anonymous posted a comment on discussion Help

    Some pages load content dynamically via AJAX for performance reasons. So when I open a page like https://www.zalando.at/damenbekleidung-hosen-shorts/ it shows a huge number of products, but the crawler sees only 6 products! When you view the source code there are also only 6 products; the others are loaded dynamically via JS.

  • Anonymous posted a comment on discussion Help

    Did you ever find a solution?

  • Anonymous posted a comment on discussion Help

    Go to Packagist, upload it there and install via Composer.

  • Anonymous posted a comment on discussion Help

    Thanks. It works.

  • Anonymous posted a comment on ticket #24

    In PHPCrawlerHTTPRequest.class.php, after

    if ($this->proxy != null) {

    I changed it to:

    $context_proxy = stream_context_create(array('ssl' => array(
        'verify_peer_name' => false,
        'SNI_enabled' => false)));
    $this->socket = stream_socket_client('tcp://'.$this->proxy["proxy_host"].":".$this->proxy["proxy_port"],
        $error_code, $error_str, $this->socketConnectTimeout, STREAM_CLIENT_CONNECT, $context_proxy);

    Worked for me.

  • Anonymous posted a comment on discussion Help

    thanks!

  • Anonymous created ticket #99

    SSL requests return 'HTTP/1.1 400 Bad request' in PHP 7

  • Anonymous posted a comment on discussion Help

    The 301 redirect is server side, so there won't be any content to receive.
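
    If the goal is just the HTML that the redirect chain ends at, a sketch along these lines should work (setFollowRedirectsTillContent and $DocInfo->source are taken from the 0.8x class reference, as I recall it):

    <?php
    include("libs/PHPCrawler.class.php");

    class RedirectCrawler extends PHPCrawler
    {
        function handleDocumentInfo($DocInfo)
        {
            // $DocInfo->source holds the HTML of the page finally received
            echo $DocInfo->url . " (" . $DocInfo->http_status_code . ")\n";
        }
    }

    $crawler = new RedirectCrawler();
    $crawler->setURL("http://www.test.com");
    $crawler->setFollowRedirectsTillContent(true); // keep following 301/302 until content arrives
    $crawler->go();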

  • Anonymous posted a comment on discussion Help

    Yes, actually the requested page has a status of 301 and redirects to a subdomain URL with HTTPS, and I want to get the HTML of the page.

  • Anonymous posted a comment on discussion Help

    What have you tried? Getting the HTML from the 301 page would be nearly identical to a 200 page.

  • Anonymous posted a comment on discussion Help

    Hi, I need the HTML of a page where the HTTP status returned is 301. For example, http://www.test.com returns a 301 to https://www.test.com (I want the HTML of this page). How do I achieve that? Please help regarding the same.

  • Anonymous modified a comment on discussion Help

    I have enabled the PDO-extension in my cPanel but didn't find the PCNTL-extension or SEMAPHORE-extension. Please see this image.

  • Anonymous modified a comment on discussion Help

    I have found the PDO-extension in my cPanel, but couldn't find the PCNTL-extension or SEMAPHORE-extension. Please see my screenshot... https://drive.google.com/file/d/1ZJGWEz8_X_0FI6wfTM6Lk3yOBXZi6HnP/view?usp=sharing

  • Anonymous posted a comment on discussion Help

    Yes, but use the CLI if you can.

  • Anonymous posted a comment on discussion Help

    Does PHPCrawl run on Unix-based (Linux) shared hosting?

  • Anonymous posted a comment on discussion Help

    Do the multiprocess docs apply to you?

    Some PHP-extensions are required to successfully run phpcrawl in multi-process mode (PCNTL-extension, SEMAPHORE-extension, PDO-extension). For more details see the requirements page.
    The multi-process mode only works on unix/linux-based systems.
    Scripts using phpcrawl with multiple processes have to be run from the commandline (php CLI).
    Increasing the number of processes to very high values doesn't automatically mean that the crawling-process will go off faster!...

  • Anonymous posted a comment on discussion Help

    Yes, but it takes a long time.

  • Anonymous posted a comment on discussion Help

    Have you tried it without multiprocess?

  • Anonymous modified a comment on discussion Help

    This is my code. It doesn't work: when I run it, it returns nothing, no results and no errors. I don't know where my mistake is. Please help me.

    <?php
    include("libs/PHPCrawler.class.php");

    class MyCrawler extends PHPCrawler
    {
        function handleDocumentInfo($DocInfo)
        {
            if (PHP_SAPI == "cli") $lb = "\n";
            else $lb = "<br />";

            // Print the URL and the HTTP-status-Code
            echo "Page requested: ".$DocInfo->url." (".$DocInfo->http_status_code.")".$lb;

            // Print the refering URL
            echo "Referer-page: ".$DocInfo->referer_url.$lb;...

  • Anonymous posted a comment on discussion Help

    Can you please give demo code?

  • Anonymous posted a comment on discussion Help

    Change /libs/UrlCache/PHPCrawlerSQLiteURLCache.class.php line 273 to

    //if ($this->PreparedInsertStatement == null) {

    and line 285 to

    //}

    Then it works.

  • Anonymous posted a comment on discussion Help

    Use the CLI version with multiprocessing.
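
    A minimal sketch of that, assuming the 0.8x multi-process API (goMultiProcessed and the SQLite URL cache are in the class reference; this needs the CLI plus the PCNTL, SEMAPHORE and PDO extensions):

    <?php
    include("libs/PHPCrawler.class.php");

    class FastCrawler extends PHPCrawler
    {
        function handleDocumentInfo($DocInfo)
        {
            echo $DocInfo->url . "\n";
        }
    }

    $crawler = new FastCrawler();
    $crawler->setURL("https://tangailpratidin.com/");
    // Multi-process mode requires the SQLite URL cache instead of the in-memory one:
    $crawler->setUrlCacheType(PHPCrawlerUrlCacheTypes::URLCACHE_SQLITE);
    $crawler->goMultiProcessed(5); // 5 worker processes instead of go()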

  • Anonymous posted a comment on discussion Help

    I tried to grab all links from https://tangailpratidin.com/, but it takes a lot of time. Can anyone help improve the runtime of the crawl, or give any ideas? Thanks.

  • Anonymous modified a comment on discussion Help

    echo "<pre>" . print_r(get_headers($PageInfo->url), TRUE) . "</pre>";

  • Anonymous posted a comment on ticket #80

    A couple of years later, but it works for me too. :-)

  • Anonymous posted a comment on discussion Help

    Does the above code still work? How do I get data from it?

  • Anonymous modified a comment on discussion Help

    Mostly copy/pasted from the example given....

    <?php
    // It may take a while to crawl a site ...
    set_time_limit(10000);

    // Include the phpcrawl-mainclass
    include("libs/PHPCrawler.class.php");

    // Extend the class and override the handleDocumentInfo()-method
    class MyCrawler extends PHPCrawler
    {
        function handleDocumentInfo($DocInfo)
        {
            // Note: use the $DocInfo parameter here, not $PageInfo
            echo "<pre>" . print_r(get_headers($DocInfo->url), TRUE) . "</pre>";
            flush();
        }
    }

    // Now, create an instance of your class, define the behaviour
    // of the crawler (see class-reference...

  • Anonymous posted a comment on discussion Help

    I am using the example project. How can I crawl an entire domain and return the content headers, e.g. text/xml?

  • Cipher posted a comment on ticket #39

    *mind

  • Cipher created ticket #39

    GitHub Commit

  • Anonymous posted a comment on ticket #97

    Same problem with PHP 7.

  • Anonymous posted a comment on discussion Help

    This is a good feature that I'm also looking for.

  • Anonymous posted a comment on discussion Help

    All I'm trying to say is that you need to first scrape the token value and then post the token in your post_data array. There's a bunch of ways to do that...
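
    A rough sketch of that idea (the field name csrf_token and the regex are guesses; inspect the actual login form, and note the token is usually tied to the session cookie, so the same session has to be reused):

    <?php
    // Fetch the login page and pull the hidden token out of the form
    // (assumes an <input type="hidden" name="csrf_token" value="..."> field):
    $html = file_get_contents("https://www.interpals.net/app/auth/login");
    if (preg_match('/name="csrf_token"\s+value="([^"]+)"/', $html, $m)) {
        $post_data = array(
            "username"   => "myusername",
            "password"   => "mypassword",
            "csrf_token" => $m[1], // post the scraped token along with the credentials
        );
        $crawler->addPostData("#https://www.interpals.net/app/auth/login#", $post_data);
    }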

  • Anonymous posted a comment on discussion Help

    Thanks. I have found this, which seems more related to my question: https://stackoverflow.com/questions/36198970/does-using-csrf-form-tokens-help-spam-prevention Curiously, the website I am targeting has no CAPTCHA on the form.

  • Anonymous posted a comment on discussion Help

    Maybe a good read.... https://stackoverflow.com/questions/6412813/do-login-forms-need-tokens-against-csrf-attacks

  • Anonymous posted a comment on discussion Help

    However, I know that other bots are connecting and interacting with the website.

  • Anonymous posted a comment on discussion Help

    That's actually the point of csrf, to prevent you from doing this. Good luck...

  • Anonymous posted a comment on discussion Help

    Well, I do not know how to do it, as the CSRF token seems to be embedded in the cookie itself. I simply do not understand how this all fits together...

    General
    Request URL: https://www.interpals.net/app/auth/login
    Request Method: POST
    Status Code: 302
    Remote Address: 104.20.197.2:443
    Referrer Policy: no-referrer-when-downgrade

    Response Headers
    cache-control: no-cache
    cache-control: no-store, no-cache, must-revalidate
    cf-ray: 3a974ce47ead2f4d-MAD
    content-type: text/html; charset=UTF-8
    date: Fri, 06 Oct 2017 08:37:44...

  • Anonymous posted a comment on discussion Help

    Yes, it does matter. You need to post the csrf_token too.

  • David Díez posted a comment on discussion Help

    I want to use PHPCrawl to trigger artificial hits on user profiles on https://www.interpals.net, but I have been unsuccessful so far. I added this code to the example.php file, among some other small modifications:

    // Login
    $post_data = array(
        "username" => "myusername",
        "password" => "mypassword",
        "submit"   => "Sign in");
    $crawler->addPostData("#https://www.interpals.net/app/auth/login#", $post_data);

    The crawler runs, but the network log is not showing any POST activity and the website pages are...

  • Anonymous posted a comment on discussion Help

    thank you boss

  • Anonymous posted a comment on ticket #98

    ...bug has to do with Apache sometimes setting Transfer-Encoding: chunked. See: http://www.trigon-film.org/en/movies/Centaur/photos/large/Centaur_02.jpg

    I was able to get around the problem by forcing the crawler to HTTP 1.0 by setting:

    $crawler->setHTTPProtocolVersion(PHPCrawlerHTTPProtocols::HTTP_1_0);

  • Anonymous posted a comment on discussion Help

    help

  • Anonymous posted a comment on discussion Help

    http://phpcrawl.cuab.de/classreferences/PHPCrawler/method_detail_tpl_method_excludeLinkSearchDocumentSections.htm
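
    For what it's worth, that method controls where the crawler searches for links. To get only the body markup out of the returned document, a regex over the page source in the handler is enough (a sketch, assuming $DocInfo->source holds the full HTML per the PHPCrawlerDocumentInfo reference):

    <?php
    class BodyCrawler extends PHPCrawler
    {
        function handleDocumentInfo($DocInfo)
        {
            // Keep only the markup between the <body> tags:
            if (preg_match("#<body[^>]*>(.*?)</body>#is", $DocInfo->source, $m)) {
                echo $m[1];
            }
        }
    }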

  • Anonymous posted a comment on discussion Help

    Is there a way to configure the crawler to return only what is between the <body> tags as the document info? (I don't need anything in the head, the CSS, etc.)

  • Anonymous posted a comment on discussion Help

    The crawler doesn't use cURL; the setProxy function ends up calling this:

    stream_socket_client($this->proxy["proxy_host"].":".$this->proxy["proxy_port"],
        $error_code, $error_str, $this->socketConnectTimeout, STREAM_CLIENT_CONNECT);

    I tried to exec this with SOCKS5, but the error is: "unable to connect to socks5://127.0.0.1:1080 (Unable to find the socket transport "socks5" - did you forget to enable it when you configured PHP?)"

  • Anonymous posted a comment on discussion Help

    What have you tried?

  • Anonymous posted a comment on discussion Help

    Searching for only PDFs does not work.

  • Anonymous posted a comment on discussion Help

    Yes, this way I can hit the proxy, but how can I pass it to the crawler?

  • James Shaver posted a comment on discussion Help

    Have you tried hitting the proxy outside of phpcrawl?

    $url = "https://google.com";
    $proxy = "127.0.0.1:1080";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_PROXY, $proxy);
    curl_setopt($ch, CURLOPT_PROXYTYPE, CURLPROXY_SOCKS5);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_exec($ch);
    curl_close($ch);

  • Anonymous modified a comment on discussion Help

    This doesn't run for me. I have curl 7.47, but if I try this in my code it raises this error: Unable to connect to proxy 'socks5://127.0.0.1' on port '1080'.

  • James Shaver posted a comment on discussion Help

    Hi,

    Sure you can. Use phpcrawl to generate the URLs, then write a script of your own to parse the pages for the data you want. I would suggest using the HTML DOM:

    $html = str_get_html($file_contents);
    $elem = $html->find('div[id=content]', 0);
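
    Combined with the CSV requirement from the question, roughly (the div IDs and file name are invented; str_get_html() comes from the simple_html_dom library):

    <?php
    // Inside handleDocumentInfo($DocInfo), with simple_html_dom loaded:
    $html = str_get_html($DocInfo->source);
    if ($html) {
        $val1 = $html->find('div[id=div1]', 0);
        $val2 = $html->find('div[id=div2]', 0);
        $fp = fopen("results.csv", "a");
        fputcsv($fp, array(
            $DocInfo->url,
            $val1 ? trim($val1->plaintext) : "",
            $val2 ? trim($val2->plaintext) : "",
        ));
        fclose($fp);
    }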

  • Anonymous posted a comment on discussion Help

    Hello, I would like to crawl a website and retrieve two values (for example the contents of div 1 and div 2) for each URL, and export these 3 fields into a CSV, each line corresponding to a URL and those two values. How can this be achieved? Thank you in advance for your help.

  • Anonymous posted a comment on discussion Help

    Sometimes I should read the manual before asking. It's $DocInfo->host that I was looking for. http://phpcrawl.cuab.de/classreferences/PHPCrawlerDocumentInfo/overview.html

  • Anonymous posted a comment on discussion Help

    Within handleDocumentInfo, how can I get the URL-to-crawl value again? I'm saving all entries to my DB but can't figure out how I can get this value again within this loop... thx

  • James Shaver posted a comment on discussion Help

    Remember grade school math? It would help us see the issue if you show your work.

  • Anonymous posted a comment on ticket #24

    Same problem as Vinay. Does anyone have a solution?

  • Anonymous created ticket #38

    How to print the average of cookies

  • Anonymous posted a comment on discussion Help

    Hi All, I have an issue using PHP crawler. I tried it first, but it said "Content...

  • Anonymous posted a comment on discussion Help

    And I'm looking for a way to download all torrents from a site... but I don't find...

  • Anonymous modified a comment on discussion Help

    I'm trying to download all .mp3 files from a website. This website can have its mp3...
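
    One possible approach, sketched against the 0.8x content-type receive rules (the URL and target directory are placeholders):

    <?php
    include("libs/PHPCrawler.class.php");

    class Mp3Crawler extends PHPCrawler
    {
        function handleDocumentInfo($DocInfo)
        {
            // Save anything that came back as audio/mpeg:
            if ($DocInfo->received && strpos($DocInfo->content_type, "audio/mpeg") !== false) {
                file_put_contents("/tmp/" . basename($DocInfo->url), $DocInfo->source);
            }
        }
    }

    $crawler = new Mp3Crawler();
    $crawler->setURL("http://www.example.com/");
    // Receive HTML (to find links) plus the mp3 files themselves:
    $crawler->addContentTypeReceiveRule("#text/html#");
    $crawler->addContentTypeReceiveRule("#audio/mpeg#");
    $crawler->go();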

  • Anonymous posted a comment on ticket #36

    FWIW, that's a copy "for using with composer" and may not be from the owner

  • Anonymous created ticket #37

    Put link priorities on linktext as well

  • Anonymous posted a comment on discussion Help

    E.g.: I want to set addLinkPriority on contact pages. contact Contact is not included...

  • Anonymous posted a comment on discussion Help

    Thank you. Very good.

  • Anonymous posted a comment on discussion Help

    great!

  • Anonymous posted a comment on discussion Help

    Hello, after upgrading PHP to 7.0 it finds no links. The content of the index page is...

  • Anonymous posted a comment on discussion Help

    Hello, sorry for this question, it's probably stupid. I am a newbie. I want the crawler...
