Showing 682 open source projects for "ex-crawler"

View related business solutions
  • Cloud-Based Speech Analytics Solutions Icon
    Cloud-Based Speech Analytics Solutions

    Transform Your QA with the Speech Analytics Experts

    CallFinder’s speech analytics software automates outdated, manual QA processes to save time and provide immediate insights so you can make data-driven decisions. Spend your valuable time coaching agents on what matters most to you and your customers.
  • Best Ad Builder and Advertising Platform Icon
    Best Ad Builder and Advertising Platform

    For media publishers, agencies and brands in need of a solution to build their ads and create landing pages

    Our self-service ad builder and landing page creation studio serve as bespoke advertising solutions for media platforms.
  • 1
    ACHE Focused Crawler

    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ACHE is a focused web crawler. It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., that matches every page that contains a specific word) or a machine-learning-based classification model...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    File System Crawler for Elasticsearch

    File System Crawler for Elasticsearch

    Elasticsearch File System Crawler (FS Crawler)

    This crawler helps to index binary documents such as PDF, Open Office, MS Office. Local file system (or a mounted drive) crawling and indexing new files, updating existing ones, and removing old ones. Remote file system over SSH/FTP crawling. REST interface to let you “upload” your binary documents to elastic search.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 3
    Cryptocurrency Exchange Feed Handler

    Cryptocurrency Exchange Feed Handler

    Cryptocurrency Exchange Websocket data feed handler

    Handles multiple cryptocurrency exchange data feeds and returns normalized and standardized results to client registered callbacks for events like trades, book updates, ticker updates, etc. Utilizes WebSockets when possible, but can also poll data via REST endpoints if a WebSocket is not provided. Create a FeedHandler object and add subscriptions. For the various data channels that an exchange supports, you can supply callbacks for data events, or use provided backends to handle the data for...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    EasySpider

    EasySpider

    A visual no-code/code-free web crawler/spider

    A visual code-free/no-code web crawler/spider, supporting both Chinese and English.
    Downloads: 44 This Week
    Last Update:
    See Project
  • The all-in-one software to manage your HOA Icon
    The all-in-one software to manage your HOA

    PayHOA is association management software for HOAs, COAs, and any community looking for robust, easy-to-use, affordable software

    PayHOA is modern software for self-managed communities. Trusted by more than 11,000 associations, PayHOA automates operations, so you can focus on what's important. We provide free onboarding, free unlimited support, a 30-day free trial, and no contracts. Claim your free trial and see why HOAs across the country are choosing PayHOA.
  • 5
    LosslessCut

    LosslessCut

    The swiss army knife of lossless video/audio editing

    ... losing quality. Or you can add a music or subtitle track to your video without needing to encode. Everything is extremely fast because it does an almost direct data copy, fueled by the awesome FFmpeg which does all the grunt work. Lossless merge/concatenation of arbitrary files (with identical codecs parameters, e.g. from the same camera). Lossless stream editing: Combine arbitrary tracks from multiple files (ex. add music or subtitle track to a video file).
    Downloads: 300 This Week
    Last Update:
    See Project
  • 6
    TensorFlow

    TensorFlow

    TensorFlow is an open source library for machine learning

    Originally developed by Google for internal use, TensorFlow is an open source platform for machine learning. Available across all common operating systems (desktop, server and mobile), TensorFlow provides stable APIs for Python and C as well as APIs that are not guaranteed to be backwards compatible or are 3rd party for a variety of other languages. The platform can be easily deployed on multiple CPUs, GPUs and Google's proprietary chip, the tensor processing unit (TPU). TensorFlow...
    Downloads: 28 This Week
    Last Update:
    See Project
  • 7
    Goutte

    Goutte

    Goutte, a simple PHP Web Scraper

    Goutte is a screen scraping and web crawling library for PHP. Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses. Goutte depends on PHP 7.1+. Add fabpot/goutte as a require dependency in your composer.json file. Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser). Make requests with the request() method. The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler). To use your own HTTP settings, you may...
    Downloads: 10 This Week
    Last Update:
    See Project
  • 8
    WebMagic

    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a scalable crawler framework. It covers the whole lifecycle of crawler, downloading, url management, content extraction and persistent. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework. You can develop a crawler easily based on it. WebMagic has a simple core with high flexibility, a simple API for html extracting. It also provides annotation with POJO to customize a crawler, and no configuration is needed. Some other features...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 9
    Tesseract.js

    Tesseract.js

    A pure Javascript Multilingual OCR

    Tesseract.js is a pure Javascript port of the popular Tesseract OCR engine. Tesseract.js' library supports more than 100 languages, automatic text orientation and script detection, a simple interface for reading paragraph, word, and character bounding boxes. Tesseract.js can run either in a browser and on a server with NodeJS. Tesseract.js is a javascript library that gets words in almost any spoken language out of images. The main Tesseract.js functions (ex. recognize, detect) take an image...
    Downloads: 12 This Week
    Last Update:
    See Project
  • All-in-One Business Property Management Platform Icon
    All-in-One Business Property Management Platform

    We enable hoteliers to automate heaps of daily tasks, bridge gaps in guest service, and run profitable businesses from anywhere

    HotelFriend is a cloud-based hotel management software. It covers the organization of work, direct room & service sales, simplifies the processes of serving guests, making their stay at the hotel even more comfortable. It gives an ability to manage the sales online from any device in any part of the world, analyze guest behavior and increase the income.
  • 10
    TorBot

    TorBot

    Dark Web OSINT Tool

    Contributions to this project are always welcome. To add a new feature fork the dev branch and give a pull request when your new feature is tested and complete. If its a new module, it should be put inside the modules directory. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>. On Linux platforms, you can make an executable for TorBot by using the install.sh script. You will need to give the script the correct permissions using chmod +x...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 11
    Solar2D Game Engine

    Solar2D Game Engine

    Solar2D Game Engine main repository (ex Corona SDK)

    This is a fully open source project that is forked of the well-established and widely used Corona SDK game engine, which is no longer commercially supported. Development is lead by Vlad Shcherban, former technical lead engineer at Corona Labs Inc. If you are using this engine, consider supporting its development. Develop for mobile, desktop, and connected TV devices with just one code base: iOS, tvOS, Android, Android TV, macOS, Windows, Linux, or HTML5. Update your code, save the changes,...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12
    miniblink49

    miniblink49

    Lighter, faster browser kernel of blink to integrate HTML UI in apps

    ... electron). Customize as you wish, simulate another browser environment. Perfect HTML5 support, friendly to various front-end libraries (support HTML5, and friendly to front framework). After turning off the cross-domain switch, you can use various cross-domain functions (support cross-domain). Headless mode, which greatly saves resources and is suitable for crawlers (headless mode, be suitable for Web Crawler).
    Downloads: 4 This Week
    Last Update:
    See Project
  • 13
    crawley

    crawley

    The unix-way web crawler

    Crawls web pages and prints any link it can find. Fast HTML SAX-parser (powered by golang.org/x/net/html) Small (below 1500 SLOC), idiomatic, 100% test-covered codebase. Grabs most of useful resources URLs (pics, videos, audios, forms, etc...) Found URLs are streamed to stdout and guaranteed to be unique (with fragments omitted) Scan depth (limited by starting host and path, by default - 0) can be configured. Can crawl rules and sitemaps from robots.txt. Brute mode - scan HTML comments for...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Heritrix

    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 15
    Quod Libet

    Quod Libet

    Music player and music library manager for Linux, Windows, and macOS

    Quod Libet is a cross-platform audio/music management program. It provides many ways to view your local library, and supports streaming audio and feeds (podcasts, etc). It has extremely flexible metadata editing and searching capabilities. With over 90 plugins included, you can extend and integrate with almost anything, or write your own! Ex Falso is a bare-bones tag editor with the same editing interface as Quod Libet. Quod Libet is a GTK+-based audio player written in Python, using...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Gerapy

    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Someone who has worked as a crawler with Python may use Scrapy. Scrapy is indeed a very powerful crawler framework. It has high crawling efficiency and good scalability. It is basically a necessary tool for developing crawlers using Python. If you use Scrapy as a crawler, then of course we can use our own host to crawl when crawling, but when the crawl is very large, we can’t run...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Nebula libp2p DHT

    Nebula libp2p DHT

    A libp2p DHT crawler, monitor, and measurement tool

    A libp2p DHT crawler and monitor that tracks the liveness of peers. The crawler connects to DHT bootstrap peers and then recursively follows all entries in their k-buckets until all peers have been visited. The crawler supports the IPFS, Filecoin, Polkadot, Kusama, Rococo, Westend networks and more. The crawler can store its results as JSON documents or in a postgres database - the --dry-run flag prevents it from doing either. Nebula will print a summary of the crawl at the end instead. A crawl...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    crwlr

    crwlr

    Library for Rapid (Web) Crawler and Scraper Development

    This library provides kind of a framework and a lot of ready-to-use, so-called steps, that you can use as building blocks, to build your own crawlers and scrapers with. Before diving into the library, let's have a look at the terms crawling and scraping. For most real-world use cases, those two things go hand in hand, which is why this library helps with and combines both. A (web) crawler is a program that (down)loads documents and follows the links in it to load them as well. A crawler could...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Proxy_Pool

    Proxy_Pool

    Python crawler proxy IP pool (proxy pool)

    The main function of the crawler agent IP pool project is to regularly collect free agents published on the Internet for verification and storage, and to regularly verify and store agents to ensure the availability of agents, and to provide API and CLI. At the same time, you can also expand the proxy source to increase the quality and quantity of the proxy pool IP.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Please use docker-compose to one-click to start up. By doing so, you don't even have to configure MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. Master node and worker nodes communicate...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Laravel Sitemap

    Laravel Sitemap

    Create and generate sitemaps with ease

    ... it in the callable you pass to hasCrawled. You can also instruct the underlying crawler to not crawl some pages by passing a callable to shouldCrawl. You can configure the crawler used by the sitemap generator. The sitemap generator can execute JavaScript on each page so it will discover links that are generated by your JS scripts. You can enable this feature by setting execute_javascript in the config file to true.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    ngx_waf

    ngx_waf

    Handy, High performance, ModSecurity compatible Nginx firewall module

    Handy, High-performance Nginx firewall module. Such as black and white list of IPs or IP range, uri black and white list, and request body black list, etc. Directives and rules are easy to write and readable. The IP detection is a constant-time operation. Most of the remaining inspections use caching to improve performance. Compatible with ModSecurity's rules, you can use OWASP ModSecurity Core Rule Set. Supports verifying Google, Bing, Baidu and Yandex crawlers and allowing them...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    HyperDX

    HyperDX

    An open source observability platform unifying session replays & logs

    HyperDX helps engineers figure out why production is broken faster by centralizing and correlating logs, metrics, traces, exceptions and session replays in one place. An open-source and developer-friendly alternative to Datadog and New Relic. The HyperDX stack ingests, stores, and searches/graphs your telemetry data. After standing up the Docker Compose stack, you'll want to instrument your app to send data over to HyperDX.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    ShowCaseView

    ShowCaseView

    Flutter plugin that allows you to showcase features on flutter app

    A Flutter package allows you to Showcase/Highlight your widgets step by step. Auto Scrolling to active showcase feature will not work properly in scroll views that renders widgets on demand(ex, ListView, GridView). In order to scroll to a widget it needs to be attached with widget tree. So, If you are using a scrollview that renders widgets on demand, it is possible that the widget on which showcase is applied is not attached in widget tree. So, flutter won't be able to scroll to that widget...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    LambdaHack

    LambdaHack

    Haskell game engine library for roguelike dungeon crawlers

    Haskell game engine library for roguelike dungeon crawlers. LambdaHack is a Haskell game engine library for ASCII roguelike games of arbitrary theme, size and complexity, with optional tactical squad combat. It's packaged together with a sample dungeon crawler in a quirky fantasy setting. To use the engine, you need to specify the content to be procedurally generated. You declare what the game world is made of (entities, their relations, physics and lore) and the engine builds the world...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next