Showing 10 open source projects for "data integration"

View related business solutions
  • Streamline Azure Security with Palo Alto Networks VM-Series Icon
    Streamline Azure Security with Palo Alto Networks VM-Series

    Centrally manage physical and virtualized firewalls with Panorama

    Improve your security posture and reduce incident response time. Use the VM-Series to natively analyze Azure traffic and dynamically drive policy updates based on workload changes.
    Learn more
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 1
    fess

    fess

    Open source enterprise search server for websites, files, and data

    Fess is an open source enterprise search server designed to provide powerful full-text search capabilities across multiple data sources. It enables organizations to quickly deploy a scalable search environment without requiring deep knowledge of underlying search technologies. Fess is built on top of OpenSearch and offers an integrated solution for crawling, indexing, and searching documents from websites, file systems, and various data stores. Fess includes a built-in crawler that can...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 2
    newpipeextractor

    newpipeextractor

    Library for extracting streaming site data without official APIs

    NewPipeExtractor is an open source Java library designed to extract data from streaming platforms by analyzing their web interfaces instead of relying on official APIs. It serves as the core extraction component used by the NewPipe Android application, but it is built as a standalone library that can also be integrated into other software projects. NewPipeExtractor provides a unified framework for retrieving information such as video streams, playlists, channels, and search results from...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    skycaiji

    skycaiji

    Open source web scraping system for automated data collection tasks

    SkyCaiji is an open source web scraping and data collection system designed to gather information from websites through configurable extraction rules. It focuses on simplifying the process of building crawlers by allowing users to visually define scraping rules rather than writing complex code. It can collect structured or unstructured data from many types of webpages and automate the extraction process for large datasets. SkyCaiji is designed to run on a variety of hosting environments...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    FEAPDER

    FEAPDER

    Powerful Python crawler framework for scalable web scraping tasks

    feapder is a Python-based web crawling framework designed to simplify the process of building scalable and efficient web scrapers. It focuses on providing a developer-friendly environment that makes it easier to create, run, and manage crawlers for a variety of data collection tasks. It includes several built-in spider types, such as AirSpider, Spider, TaskSpider, and BatchSpider, which address different crawling scenarios ranging from lightweight scraping to distributed and batch-based...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 5
    news-please

    news-please

    Python tool for crawling and extracting structured data from news site

    news-please is an open source news crawler and information extraction tool designed to collect and structure articles from online news websites. It provides an integrated pipeline that crawls news sites, retrieves article pages, and extracts structured information such as headlines, authors, publication dates, and article text. news-please can recursively follow internal links and read RSS feeds to gather both recent and archived articles from a news outlet when given only the root URL of a...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 6
    Scrapling

    Scrapling

    An adaptive Web Scraping framework

    ...The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    spider_collection

    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    ...In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Please use docker-compose to one-click to start up. By doing so, you don't even have to configure MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. Master node and worker nodes communicate...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    mlscraper

    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages. Once trained, the generated scraper...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 10
    Blackfire Player

    Blackfire Player

    Web Crawling, Web Testing, and Web Scraping application

    Blackfire Player is a powerful Web Crawling, Web Testing, and Web Scraper application. It provides a nice DSL to crawl HTTP services, assert responses, and extract data from HTML/XML/JSON responses. Some Blackfire Player use cases: Crawl a website/API and check expectations -- aka Acceptance Tests; Scrape a website/API and extract values; Monitor a website; Test code with unit test integration (PHPUnit, Behat, Codeception, ...); Test code behavior from the outside thanks to the native Blackfire Profiler integration -- aka Unit Tests from the HTTP layer (tm). ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB