Showing 30 open source projects for "crawler page"

  • 1
    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ACHE is a focused web crawler. It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in the sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., that matches every page that contains a specific word) or a machine-learning-based classification model...
    Downloads: 8 This Week
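The regular-expression style of page classifier described for ACHE can be illustrated with a minimal Python sketch. This is not ACHE's API or configuration format; the function name `make_regex_classifier` and the sample word are hypothetical, chosen only to show the idea of marking a page relevant when a pattern matches its text.

```python
import re


def make_regex_classifier(pattern):
    """Build a classifier that marks a page relevant when the pattern matches its text.

    Hypothetical helper for illustration only; it does not reflect ACHE's own API.
    """
    compiled = re.compile(pattern, re.IGNORECASE)

    def is_relevant(page_text):
        # A page is relevant if the pattern occurs anywhere in its text.
        return compiled.search(page_text) is not None

    return is_relevant


# Assumed example: treat every page that contains the word "crawler" as relevant.
is_relevant = make_regex_classifier(r"\bcrawler\b")
```

A crawler built this way would call `is_relevant` on each downloaded page and keep only the pages for which it returns `True`.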
  • 2
    SiteOne Crawler (desktop app)

    A free, feature-rich web analyzer and exporter/cloner you will love!

    A free in-depth website analyzer providing audits of security, performance, SEO, accessibility and other technical aspects. Available as a desktop application for Windows/macOS/Linux and as a CLI tool for advanced users and CI/CD processes. It also includes an offline web page exporter (website clone, mirror).
    Downloads: 5 This Week
  • 3
    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Anyone who has written crawlers in Python has probably used Scrapy. Scrapy is indeed a very powerful crawler framework, with high crawling efficiency and good scalability. It is basically an essential tool for developing crawlers in Python. If you use Scrapy, you can of course crawl from your own host, but when the crawl is very large, we can’t run...
    Downloads: 1 This Week
  • 4
    TorBot

    Dark Web OSINT Tool

    Contributions to this project are always welcome. To add a new feature, fork the dev branch and open a pull request once your new feature is tested and complete. If it's a new module, it should be put inside the modules directory. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. On Linux platforms, you can make an executable for TorBot by using the install.sh script. You will need to give the script the correct permissions using chmod +x...
    Downloads: 5 This Week
  • 5
    Laravel Sitemap

    Create and generate sitemaps with ease

    ... it in the callable you pass to hasCrawled. You can also instruct the underlying crawler to not crawl some pages by passing a callable to shouldCrawl. You can configure the crawler used by the sitemap generator. The sitemap generator can execute JavaScript on each page so it will discover links that are generated by your JS scripts. You can enable this feature by setting execute_javascript in the config file to true.
    Downloads: 1 This Week
  • 6
    LambdaHack

    Haskell game engine library for roguelike dungeon crawlers

    Haskell game engine library for roguelike dungeon crawlers. LambdaHack is a Haskell game engine library for ASCII roguelike games of arbitrary theme, size and complexity, with optional tactical squad combat. It's packaged together with a sample dungeon crawler in a quirky fantasy setting. To use the engine, you need to specify the content to be procedurally generated. You declare what the game world is made of (entities, their relations, physics and lore) and the engine builds the world...
    Downloads: 0 This Week
  • 7
    ScrapBot 1.40 64bits

    Task automation software for accessing and manipulating website data.

    ScrapBot is a task automation software that allows you to access, authenticate, extract, and insert data on any website. The software utilizes JavaScript to execute tasks, eliminating the need for server or additional software installations. The system can control the accessed webpage through JavaScript, and the entire navigation can be viewed in the program window. The main.js script runs in a separate frame from the navigation frame but can access all page content without any restrictions.
    Downloads: 2 This Week
  • 8
    magnetW

    Magnet link aggregation search

    ... such advertisements. This application is open source and free, and is intended only for crawler technology exchange and learning. The search results all come from the source sites, and no responsibility is assumed for them. The project is licensed under the GNU General Public License v3.0. Online playback works together with the desktop version of webtorrent, which must be downloaded separately. Clicking online play switches to webtorrent and adds the task there.
    Downloads: 1 This Week
  • 9
    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't lose...
    Downloads: 1 This Week
  • 10
    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can set up a multi-threaded web crawler in a few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The shouldVisit function decides whether the given URL should be crawled or not. In the project's example, shouldVisit rejects .css, .js and media files and only allows pages within...
    Downloads: 1 This Week
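The kind of URL filter that shouldVisit performs can be sketched in Python (crawler4j itself is a Java library, so this is an analogy, not its API). The function name `should_visit`, the extension list, and the host `www.example.com` are assumptions made for illustration; the listing above is truncated and does not say which site the original example restricts itself to.

```python
import re
from urllib.parse import urlparse

# Assumed set of extensions to skip: stylesheets, scripts, and common media files.
EXCLUDED_EXTENSIONS = re.compile(
    r".*\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz)$", re.IGNORECASE
)

# Hypothetical host used only for this sketch.
ALLOWED_HOST = "www.example.com"


def should_visit(url):
    """Return True when the URL is on the allowed host and is not a static asset."""
    parsed = urlparse(url)
    is_asset = EXCLUDED_EXTENSIONS.match(parsed.path) is not None
    return (not is_asset) and parsed.hostname == ALLOWED_HOST
```

A crawler loop would call `should_visit` on every discovered link before adding it to the queue of pages to fetch.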
  • 11
    OpenSearchServer Search Engine

    An open source search engine with RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl), you will be able to quickly and easily integrate advanced full-text search capabilities in your application: full-text with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...
    Downloads: 13 This Week
  • 12
    Catberry

    Catberry is an isomorphic framework

    ... dependencies and create plugins, and Flux, for the data layer. Search crawler receives a full page from the server. The whole state of the application is restored from URL. Server-side progressive rendering based on node.js streams and parallel rendering of components in a browser. The framework is well-tested (code coverage is about 90%) and it is already used in production.
    Downloads: 1 This Week
  • 13
    Perl Web Scraping Project

    Perl Web Scraping Project

    Web scraping (web harvesting or web data extraction) is data scraping used for extracting data from websites.[1] Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central...
    Downloads: 0 This Week
  • 14
    Pathfinder Wiki-fr Crawler

    All spells, monsters, feats, and magic items in French

    All information comes from http://www.pathfinder-fr.org/Wiki/Pathfinder-RPG.MainPage.ashx. The software can also create detailed spell lists and export each type of data.
    Downloads: 0 This Week
  • 15
    WebCrawler

    Get a web page, including HTML, CSS and JS files

    This tool is for people who want to learn from a website or web page, especially web developers. It helps you get a web page's source code. Enter the web page's address and press the start button, and the tool will fetch the page and, following the page's references, download all files used in the page, including CSS and JavaScript files. The HTML file is named 'index.html'; other files keep their source names. Note: only the Windows platform and the HTTP protocol are supported.
    Downloads: 0 This Week
  • 16
    go_spider

    An awesome Go concurrent crawler (spider) framework

    An awesome Go concurrent crawler (spider) framework. The crawler is flexible and modular. It can easily be extended into an individualized crawler, or you can use only the default crawl components. Spider gets a Request from Scheduler that has the URL to be crawled. Then Downloader downloads the result (HTML, JSON, JSONP, text) of the Request. The result is saved in Page for parsing in PageProcesser. HTML parsing is based on the goquery package. JSON parsing is based on the simple JSON package. Jsonp...
    Downloads: 0 This Week
  • 17
    sitecheck

    Modular web site spider for web developers.

    More than just a link checker, sitecheck is a website spider (also known as a crawler) which can assist with SEO by testing an entire site plus both inbound links from search engines and outbound links to other sites for the following issues: looping redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation...
    Downloads: 1 This Week
  • 18
    Domain Analyzer Security Tool

    Finds all the security information for a given domain name

    Domain analyzer is a security analysis tool which automatically discovers and reports information about the given domain. Its main purpose is to analyze domains in an unattended way.
    Downloads: 0 This Week
  • 19
    A simple Crawler

    We can make a simple crawler using Java Servlet & JSP. A crawl

    ... - HelloResult.class - Bfs.class - Queue.class - WebSource.class - [hw5] - [WEB-INF] - [classes] - [mvc] - index.html ( first page for crawler ) - web.xml ( the configuration of all servlets ) - HelloController.java ( processes the HTTP request and response ) - HelloModel.java ( main process and crawler, URL matching ) - HelloView.java ( shows the result of crawler and search ) - HelloResult.java ( shows the search result )
    Downloads: 0 This Week
  • 20
    anonme.sh

    anonymous tools [discontinued]

    anonme.sh {bash script} V1.0. Operating systems supported: Linux. Dependencies: slowloris, macchanger, decrypter.py. Description of the script: this script simplifies tasks such as DoS attacks, changing your MAC address, injecting XSS on a target website, file upload vulns, MD5 decrypter, and webcrawler (scan websites for vulns), and we can use WGET to download files from the target domain or retrieve the whole website... Tutorial: http://www.youtube.com/watch?v=PrlrBuioCMc
    Downloads: 0 This Week
  • 21
    This fifth-generation Selenium web crawler crawls through the web pages of a host website, searching for static and dynamic links, and is able to detect honeypot links.
    Downloads: 0 This Week
  • 22
    This software will "crawl" pages for a given URL.
    Downloads: 0 This Week
  • 23
    This game is a mouse-driven dungeon crawler. The player moves the PC around, buys equipment, and attacks enemies using the mouse. More info on the wiki page. Use git to get the code.
    Downloads: 0 This Week
  • 24
    Crawler
    Crawler is a bare-bones spider designed to quickly and effectively build an index of all files and pages on a given Web site as well as the link relationship (both incoming and outgoing) between each page. More open source at https://github.com/fcc.
    Downloads: 2 This Week
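An index of incoming and outgoing link relationships, as described for Crawler, could be represented roughly as follows. This is a sketch under assumptions, not the project's actual data structure: the function name `build_link_index` and the sample paths are hypothetical, and the input is assumed to be a mapping from each crawled page to the pages it links to.

```python
from collections import defaultdict


def build_link_index(outgoing):
    """Invert a page -> outgoing-links mapping into a page -> incoming-links mapping."""
    incoming = defaultdict(set)
    for source, targets in outgoing.items():
        incoming[source]  # make sure pages with no incoming links still appear
        for target in targets:
            incoming[target].add(source)
    # Sort for stable, readable output.
    return {page: sorted(sources) for page, sources in incoming.items()}


# Assumed sample crawl result: each key is a page, each value its outgoing links.
outgoing_links = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/about"],
}
incoming_links = build_link_index(outgoing_links)
```

With both mappings available, a lookup for any page gives the pages it links to and the pages that link to it.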
  • 25
    Started as an MSc project, this is a tweet crawler and search engine that finds relationships among the results with the help of a page graph generated by the crawling system.
    Downloads: 0 This Week