Showing 48 open source projects for "scrape"

  • 2
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from them. Portable and written in Python, it runs on Windows, Linux, macOS and BSD. Scrapy is powerful, fast, simple and easily extensible: write the rules to extract the data, and add new functionality if you wish without touching the core. Scrapy does the rest. It can be used in many applications, including data mining, monitoring...
    Downloads: 34 This Week
  • 3
    DocSearch

    The easiest way to add search to your documentation

    ... with the interaction patterns of each OS. We scrape your documentation or technical blog, configure the Algolia application and send you the snippet you'll have to integrate. It's that simple. You don't need to configure any settings or even have an Algolia account; we take care of this for you. We'll send you a small snippet to integrate DocSearch into your website and an invite to your fully configured Algolia application.
    Downloads: 4 This Week
  • 4
    JMX Exporter

    A process for exposing JMX Beans via HTTP for Prometheus consumption

    JMX to Prometheus exporter: a collector that can configurably scrape and expose MBeans of a JMX target. This exporter is intended to be run as a Java agent, exposing an HTTP server and serving metrics of the local JVM. It can also be run as a standalone HTTP server that scrapes remote JMX targets, but this has various disadvantages, such as being harder to configure and being unable to expose process metrics (e.g., memory and CPU usage). Running the exporter as a Java agent is strongly encouraged.
    Downloads: 3 This Week
  • 5
    rvest

    Simple web scraping for R

    rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and is inspired by libraries like Beautiful Soup and RoboBrowser. If you're scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you respect robots.txt and don't hammer the site with too many requests.
    Downloads: 3 This Week
  • 6
    Elasticsearch Exporter

    Elasticsearch stats exporter for Prometheus

    Prometheus exporter for various metrics about Elasticsearch, written in Go. The exporter fetches information from an Elasticsearch cluster on every scrape, so a scrape interval that is too short can put load on ES master nodes, particularly if you run with --es.all and --es.indices. We suggest measuring how long fetching /_nodes/stats and /_all/_stats takes for your ES cluster to determine whether your scrape interval is too short. As a last resort, you can scrape this exporter...
    Downloads: 2 This Week
  • 7
    jsoup

    Java library for working with real-world HTML

    jsoup is a Java library for working with real-world HTML. It provides a very convenient API for fetching URLs and extracting and manipulating data, using the best of HTML5 DOM methods and CSS selectors. jsoup implements the WHATWG HTML5 specification and parses HTML to the same DOM as modern browsers do. It is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup, and will create a sensible parse tree. The parser will make...
    Downloads: 4 This Week
  • 8
    Automa

    A chrome extension for automating your browser by connecting blocks

    ... Dozens of workflows have been shared by Automa users, and you can add and customize them. Auto-fill forms, do a repetitive task, take a screenshot, or scrape website data: the choice is yours. You can even schedule when the automation will execute. Browse the Automa marketplace, where you can share workflows with others and download theirs.
    Downloads: 2 This Week
  • 9
    Artisan View

    Manage your views in Laravel projects through artisan

    This package adds a handful of view-related commands to Artisan in your Laravel project. Generate Blade files that extend other views, scaffold out sections to add to those templates, and more. All from the command line we know and love.
    Downloads: 2 This Week
  • 10
    Linkedin Scraper

    A library that scrapes LinkedIn for user data

    Linkedin Scraper is a library that scrapes LinkedIn for user data. Versions 2.0.0 and earlier are called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper. LinkedIn has recently blocked people from viewing certain profiles without first signing in, so by setting scrape=False, the profile is not scraped automatically, but Chrome will still open the LinkedIn page. You can log in and log out, and the cookie will stay...
    Downloads: 1 This Week
  • 11
    crawlee

    A web scraping and browser automation library for Node.js

    ... that make your crawlers look human-like. It's not unblockable, but it will save you money in the long run. Crawlee is built by people who scrape for a living and use it every day to scrape millions of pages. Meet our community on Discord. We believe websites are best scraped in the language they're written in. Crawlee runs on Node.js and is built in TypeScript to improve code completion in your IDE, even if you don't use TypeScript yourself.
    Downloads: 0 This Week
  • 12
    URS (Universal Reddit Scraper)

    A comprehensive Reddit scraping command-line tool written in Python

    URS (Universal Reddit Scraper) is a comprehensive Reddit scraping command-line tool written in Python that integrates multiple features. Whether you are using URS for enterprise or personal use, I am very interested in hearing about your use case and how it has helped you achieve a goal. All files except those generated by the wordcloud tool are exported to JSON by default; wordcloud files are exported to PNG by default. All exported files are saved...
    Downloads: 0 This Week
  • 13
    Prometheus Redis Metrics Exporter

    Prometheus Exporter for Redis Metrics. Supports Redis 2.x, 3.x, 4.x, 5

    ... for the Redis instances, then you can set the password via the exporter's --redis.password command-line option (this means you can currently only use one password across the instances you try to scrape this way; use several exporters if this is a problem). If your Redis instance requires authentication, there are several ways to supply a username (new in Redis 6.x with ACLs) and a password.
    Downloads: 0 This Week
  • 14
    Ferret

    Declarative web scraping

    A web scraping system aiming to simplify data extraction from the web. Ferret has a declarative query language that makes it easy to focus on the data you need to get. It can scrape JS-rendered pages, handle all page events, and emulate user interactions. Ferret was designed as a library from the ground up, so it can be easily embedded into any Go application. Ferret helps you focus on the data you need using an easy-to-learn declarative language. Ferret uses Chrome...
    Downloads: 1 This Week
  • 15
    Soketi

    Just another simple, fast, and resilient open-source WebSockets server

    Ever dreamed of serverless WebSockets? Soketi can be deployed to Cloudflare Workers, all around the world and closer to your users, with the same Pusher protocol. Powered by Cloudflare's Durable Objects and KV, it can deliver high speeds at the edge for your users.
    Downloads: 0 This Week
  • 16
    Rod

    A Devtools driver for web automation and scraping

    Rod is a high-level driver for the DevTools Protocol. It's widely used for web automation and scraping. Rod can automate most things in the browser that can be done manually. Features include: a chained context design that makes it intuitive to time out or cancel long-running tasks; automatic waiting for elements to be ready; debugging-friendly automatic input tracing and remote monitoring of the headless browser; thread safety for all operations; automatically finding or downloading a browser; high-level helpers like WaitStable, WaitRequestIdle,...
    Downloads: 0 This Week
  • 17
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. Its design, inspired by Flask, aims to let you build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in pre-alpha, so please expect breaking changes. You can run your scraper from the terminal by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 18
    Roach

    The complete web scraping toolkit for PHP

    Roach is a complete web scraping toolkit for PHP. It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core package...
    Downloads: 0 This Week
  • 19
    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping, to make scraping easy. It takes a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, a URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. You can then use this learned object with new URLs to get similar content or the exact same element from those new pages.
    Downloads: 2 This Week
  • 20
    Instagram Scraper

    Scrapes an Instagram user's photos and videos

    instagram-scraper is a command-line application written in Python that scrapes and downloads an Instagram user's photos and videos. Use responsibly. To scrape a private user's media, you must be an approved follower. Providing a username and password is optional; if they are not supplied, the scraper runs as a guest. In this case, private users' media will be unavailable, as will all users' stories and high-resolution profile pictures. By default, downloaded media will be placed...
    Downloads: 3 This Week
  • 21

    htmLawed

    PHP code to purify & filter HTML

    The htmLawed PHP script makes HTML more secure and standards- and policy-compliant. The customizable HTML filter/purifier can balance tags, ensure proper nesting, neutralize XSS, restrict HTML, beautify code like Tidy, implement anti-spam measures, etc.
    Downloads: 79 This Week
  • 22
    Catbird Linux

    Linux for content creation, web scraping, coding, and data analysis.

    Catbird Linux is an operating system built for media creation, web scraping, and software coding. It is the daily driver you want for retrieving data, making videos or podcasts, and making software tools to automate repetitive tasks. It is ready for work in Python, Lua, and Go, with numerous packages for web scraping or downloading data via API calls. Using Catbird Linux, it is possible to accomplish in-depth stock market analysis, track weather trends, follow social media...
    Downloads: 4 This Week
  • 23
    TorrentPier

    🐂 Bull-powered BitTorrent tracker engine

    TorrentPier is a bull-powered BitTorrent public/private tracker engine written in PHP, with high speed, simple modification, and a high-load architecture. In addition, there is a very helpful official support forum where you can get support and download modifications for the engine.
    Downloads: 3 This Week
  • 24
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    An automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command-line arguments. By scraping and reviewing regularly, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can easily be automated to run nightly with crontab. If you have...
    Downloads: 0 This Week
  • 25
    SCVZ menza

    App for displaying the menu of the student cafeteria (menza) in Varaždin

    An application built with the Flutter framework, a project I worked on in my spare time and want to present when applying for a student internship. The app communicates with a Flask server that scrapes data from the cafeteria's official website. The server returns JSON containing today's menu to the app. The server is currently hosted on the Heroku platform: https://scvzmenza-app.herokuapp.com/danas
    Downloads: 0 This Week
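The rvest entry recommends pairing it with polite so that a scraper respects robots.txt and does not hammer a site with requests. rvest and polite are R packages, so the sketch below only illustrates the same idea in Python with the standard library's urllib.robotparser; it is not polite's API. The robots.txt content, the example-bot user agent, and the helper names make_parser and polite_urls are hypothetical, and the 1-second fallback delay is an assumption used when no Crawl-delay is given.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download
# the site's /robots.txt instead of using a hard-coded string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "example-bot"  # hypothetical user-agent string


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from raw file content (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_urls(parser, urls, agent=USER_AGENT, sleep=time.sleep):
    """Yield only URLs that robots.txt allows, pausing between requests.

    The pause honours Crawl-delay when present; otherwise it falls back
    to an assumed 1 second. `sleep` is injectable so it can be tested.
    """
    delay = parser.crawl_delay(agent) or 1
    first = True
    for url in urls:
        if not parser.can_fetch(agent, url):
            continue  # skip paths robots.txt disallows
        if not first:
            sleep(delay)  # rate-limit so the site is not hammered
        first = False
        yield url
```

The caller would fetch each yielded URL; keeping the allow-check and the delay in one generator means no request can bypass either rule.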
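The Elasticsearch Exporter entry suggests timing /_nodes/stats and /_all/_stats before choosing a scrape interval. Below is a minimal sketch of that check, assuming you supply your own fetch function (for example an HTTP GET against one of those endpoints). The helper names and the 2x headroom factor are assumptions for illustration, not values from the exporter's documentation.

```python
import time
from typing import Callable


def fetch_duration(fetch: Callable[[], object], repeats: int = 3) -> float:
    """Time several calls to `fetch` and return the slowest, in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch()  # e.g. a request to /_nodes/stats (supplied by the caller)
        durations.append(time.perf_counter() - start)
    return max(durations)


def interval_too_short(scrape_interval: float, duration: float,
                       headroom: float = 2.0) -> bool:
    """Flag an interval shorter than `headroom` times the fetch duration.

    The default headroom of 2.0 is an assumed rule of thumb.
    """
    return scrape_interval < duration * headroom
```

Using the slowest of several runs, rather than the average, errs on the side of a longer interval when the cluster is under load.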
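JobFunnel's tagline promises a single spreadsheet with no duplicates. The sketch below shows one simple way to merge newly scraped postings into a master CSV while skipping duplicates, keyed on the posting URL. This is an assumption for illustration: JobFunnel's own duplicate detection may work differently, and the FIELDS column layout and the merge_jobs helper are hypothetical.

```python
import csv
import io
from typing import Dict, Iterable

FIELDS = ["title", "company", "url"]  # hypothetical column layout


def merge_jobs(master_csv: str, new_jobs: Iterable[Dict[str, str]],
               key: str = "url") -> str:
    """Return master CSV text with new jobs appended, skipping duplicates."""
    rows = list(csv.DictReader(io.StringIO(master_csv)))  # existing rows
    seen = {row[key] for row in rows}
    for job in new_jobs:
        if job[key] in seen:
            continue  # already in the master file (or earlier in this batch)
        seen.add(job[key])
        rows.append(job)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because the merge is idempotent, running it nightly (as the entry suggests with crontab) never grows the file with repeated postings.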
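The AutoScraper entry describes learning scraping rules from sample data and reusing them on new pages. The sketch below is a simplified version of that idea using only html.parser. Each text node is keyed by its path of tag names (plus class attribute), the paths whose text matches a sample become the learned rules, and applying the rules returns every text node on another page with the same path. This is not AutoScraper's API or its exact matching algorithm; learn_rules, apply_rules, and the path format are hypothetical.

```python
from html.parser import HTMLParser

VOID = {"br", "img", "hr", "meta", "link", "input"}  # tags with no end tag


class _PathCollector(HTMLParser):
    """Record (tag path, text) for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # void elements never get a closing tag
        cls = dict(attrs).get("class") or ""
        self.stack.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split(".", 1)[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append((tuple(self.stack), text))


def _text_nodes(html):
    parser = _PathCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def learn_rules(html, wanted):
    """Return the set of tag paths whose text equals one of the samples."""
    wanted = set(wanted)
    return {path for path, text in _text_nodes(html) if text in wanted}


def apply_rules(html, rules):
    """Return all text under a learned path, in document order."""
    return [text for path, text in _text_nodes(html) if path in rules]
```

Including the class attribute in the path is what separates, for example, titles from prices that share the same tag.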
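The dude entry describes a Flask-inspired design in which scrapers are plain functions registered with decorators. The sketch below shows that registration pattern under stated assumptions: the select decorator, keyed on tag names, and the run helper are hypothetical and are not dude's real API, which may select elements differently. Each decorated function receives the text of the innermost matching element and returns a result dict.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

VOID = {"br", "img", "hr", "meta", "link", "input"}  # tags with no end tag
_handlers: Dict[str, List[Callable[[str], dict]]] = {}


def select(tag: str):
    """Hypothetical decorator: register a function for text inside `tag`."""
    def decorator(func):
        _handlers.setdefault(tag, []).append(func)
        return func
    return decorator


class _Runner(HTMLParser):
    """Walk the HTML and call registered handlers on matching text."""

    def __init__(self, handlers):
        super().__init__()
        self.handlers = handlers
        self.stack = []
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the most recent matching tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            for func in self.handlers.get(self.stack[-1], []):
                self.results.append(func(text))


def run(html: str) -> List[dict]:
    runner = _Runner(_handlers)
    runner.feed(html)
    runner.close()
    return runner.results


@select("h2")
def title(text):
    return {"title": text}


@select("a")
def link_text(text):
    return {"link": text.upper()}
```

Registering handlers at import time, as Flask does with routes, keeps each scraper rule a small standalone function.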