Showing 48 open source projects for "data processing"

  • 1
    text-dedup

    All-in-one text de-duplication

    text-dedup is a Python library that enables efficient deduplication of large text corpora by using MinHash and other probabilistic techniques to detect near-duplicate content. This is especially useful for NLP tasks where duplicated training data can skew model performance. text-dedup scales to billions of documents and offers tools for chunking, hashing, and comparing text efficiently with low memory usage. It supports Jaccard similarity thresholding, parallel execution, and flexible...
    Downloads: 0 This Week
  • 2
    SnappyData

    Memory optimized analytics database, based on Apache Spark

    ...SnappyData delivers high throughput, low latency, and high concurrency for a unified analytics workload. By fusing an in-memory hybrid database inside Apache Spark, it provides analytic query processing, mutability/transactions, access to virtually all big data sources, and stream processing, all in one unified cluster. One common use case for SnappyData is to provide analytics at interactive speeds over large volumes of data with minimal or no pre-processing of the dataset. For instance, there is often no need to pre-aggregate/reduce or generate cubes over your large data sets for ad-hoc visual analytics. ...
    Downloads: 7 This Week
  • 3
    Padasip

    Python Adaptive Signal Processing

    Padasip (Python Adaptive Signal Processing) is a Python library tailored for adaptive filtering and online learning applications, particularly in signal processing and time series forecasting. It includes a variety of adaptive filter algorithms such as LMS, RLS, and their variants, offering real-time adaptation to changing environments. The library is lightweight, well-documented, and ideal for research, prototyping, or teaching purposes. Padasip supports both supervised and unsupervised...
    Downloads: 5 This Week
  • 4
    Amadeus

    Harmonious distributed data analysis in Rust

    Amadeus is a high-performance, distributed data processing framework written in Rust, designed to offer an ergonomic and safe alternative to tools like Apache Spark. It provides both streaming and batch capabilities, allowing users to work with real-time and historical data at scale. Thanks to Rust’s memory safety and zero-cost abstractions, Amadeus delivers performance gains while reducing the complexity and bugs common in large-scale data pipelines. ...
    Downloads: 7 This Week
  • 5
    FLOGO

    Simplify building efficient & modern serverless functions and apps

    Project Flogo is an ultra-light, Go-based open source ecosystem for building event-driven apps. Event-driven, you say? Yup, the notion of triggers and actions is leveraged to process incoming events. An action, a common interface, exposes key capabilities such as application integration, stream processing, etc. All capabilities within the Flogo Ecosystem have a few things in common: they all process events (in a manner suitable for the specific purpose) and they all implement the action...
    Downloads: 6 This Week
  • 6
    ksqlDB

    The database purpose-built for stream processing applications

    Build applications that respond immediately to events. Craft materialized views over streams. Receive real-time push updates, or pull current state on demand. Seamlessly leverage your existing Apache Kafka® infrastructure to deploy stream-processing workloads and bring powerful new capabilities to your applications. Use a familiar, lightweight syntax to pack a powerful punch. Capture, process, and serve queries using only SQL. No other languages or services are required. ksqlDB enables you...
    Downloads: 0 This Week
  • 7
    DSPatch

    The Refreshingly Simple C++ Dataflow Framework

    Website: http://flowbasedprogramming.com DSPatch, pronounced "dispatch", is a powerful C++ dataflow framework. DSPatch is not limited to any particular domain or data type: from reactive programming to stream processing, its generic, object-oriented API allows you to create virtually any dataflow system imaginable. See also: DSPatcher ( https://github.com/MarcusTomlinson/DSPatcher ), a cross-platform graphical tool for building DSPatch circuits, and DSPatchables ( https://github.com/MarcusTomlinson/DSPatchables ), a DSPatch component repository.
    Downloads: 0 This Week
  • 8
    Wally

    Distributed Stream Processing

    ...Provide high-performance & low-latency data processing. Be portable and deploy easily (i.e., run on-prem or any cloud). Manage in-memory state for the application. Allow applications to scale as needed, even when they are live and up-and-running. The primary API for Wally is written in Pony. Wally applications are written using this Pony API.
    Downloads: 8 This Week
  • 9
    Cosmos DB Spark

    Apache Spark Connector for Azure Cosmos DB

    ...The connector allows you to easily read from and write to Azure Cosmos DB via Apache Spark DataFrames in Python and Scala. It also allows you to easily create a lambda architecture for batch processing, stream processing, and a serving layer while being globally replicated and minimizing the latency involved in working with big data.
    Downloads: 0 This Week
  • 10
    SPar: Stream Parallelism in Multi-Cores

    An Embedded C++ Domain-Specific Language

    SPar is an internal C++ Domain-Specific Language (DSL) suitable to model and implement classical stream parallel patterns. The DSL uses standard C++ attributes to introduce annotations tagging the notable components of stream parallel applications: stream sources and stream processing stages. Latest version can be downloaded from the SVN using the following command: svn checkout svn://svn.code.sf.net/p/spar-dsl-compiler/svn/ spar
    Downloads: 0 This Week
  • 11
    Dataflow Java SDK

    Google Cloud Dataflow provides a simple, powerful model

    The Dataflow Java SDK is the open-source Java library that powers Apache Beam pipelines for Google Cloud Dataflow, a serverless and scalable platform for processing large datasets in both batch and stream modes. This SDK allows developers to write Beam-based pipelines in Java and execute them on Dataflow, taking advantage of features like autoscaling, dynamic work rebalancing, and fault-tolerant distributed processing. While it has been largely superseded by the unified Beam SDKs, it remains relevant for legacy systems and offers insight into the underlying mechanisms that power scalable data workflows on Google Cloud.
    Downloads: 0 This Week
  • 12
    horizon

    Horizon is a realtime, open-source backend for JavaScript apps

    Horizon is an open-source developer platform for building sophisticated realtime apps. It provides a complete backend that makes it dramatically simpler to build, deploy, manage, and scale engaging JavaScript web and mobile apps. Horizon is extensible, integrates with the Node.js stack, and allows building modern, arbitrarily complex applications. While technologies like RethinkDB and WebSocket make it possible to build engaging realtime apps, empirically there is still too much friction for...
    Downloads: 1 This Week
  • 13
    TeleScope

    XML Data Stream Broker/Replicator

    TeleScope is an efficient, intensive-load XML data stream broker, replicator, and simple event processing (SEP) platform written in C for the Fedora 17-18, Slackware 13-14, and Red Hat Enterprise Linux 6 (RHEL-6) Linux distributions. The platform is intended to operate on single number/word values and is not meant to be deployed for full-text XML stream analysis. TeleScope has an internal query language with a set of standard logical operators that allows relatively complex query expressions to be constructed. ...
    Downloads: 0 This Week
  • 14
    A production-stable Java utility library with convenience methods for string and stream processing, file handling, XML, XSLT and XPath, checksums, console formatting, and more. The project is developed by the State and University Library of Denmark.
    Downloads: 0 This Week
  • 15

    LogsGrep

    A grep-like utility for log files.

    LogsGrep is a grep-like utility designed specifically for log files containing multi-line entries. The primary target is Java log files (Log4J, common, ...), where multi-line log entries are very common (for example, log entries with a stack trace). It follows the Unix philosophy: it does only its primary job and expects its input to be generated by other, more advanced tools (tail, cat, type, find...). Compatibility with Unix grep is not a goal. LogsGrep is...
    Downloads: 0 This Week
  • 16
    activeinsight
    ActiveInsight provides real-time detection of, and reaction to, events and patterns. It is a platform that enables the detection of meaningful events within multiple high-frequency event streams.
    Downloads: 0 This Week
  • 17
    MXQuery is a low-footprint implementation of XQuery 1.0, XQuery Update 1.0, XQuery Fulltext 1.0 and XQuery Scripting 1.0 as well as a subset of XQuery 1.1 (windowing, try/catch). It provides extensions to do data stream processing/CEP and SOAP/REST
    Downloads: 0 This Week
  • 18
    An experimental CEP (Complex Event Processing) engine. It implements event stream processing as a library embeddable in C++ and Perl. It has since been renamed to Triceps, so please look at the new location https://sourceforge.net/projects/t
    Downloads: 0 This Week
  • 19
    StreamScale is a Java-based data stream processing system.
    Downloads: 0 This Week
  • 20
    Video Stream Processing On CBE Project
    Downloads: 0 This Week
  • 21
    A Middleware for Distributed Data Stream Processing
    Downloads: 0 This Week
  • 22
    Sed.py is a Python module that provides an easy way to do text stream processing. As the name suggests, it does the kind of work that sed does, but in Python's way rather than sed's. Using this module requires knowledge of regular expressions.
    Downloads: 0 This Week
  • 23
    StreamMine is a distributed event processing (streaming) infrastructure. You can create low-latency, fault-tolerant stream processing functionality with any stream-oriented operators that can be implemented in Python.
    Downloads: 0 This Week
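The text-dedup entry in this listing mentions MinHash and Jaccard similarity thresholding for near-duplicate detection. As a rough illustration of that general technique, here is a minimal pure-Python sketch. It is not text-dedup's API: the function names (`shingles`, `jaccard`, `minhash_signature`, `estimated_jaccard`), the word 3-gram shingling, and the use of salted BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib
import re


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in text (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _hash(item, seed):
    # One member of the hash family: BLAKE2b salted with the seed (an assumption,
    # chosen only because it is in the standard library).
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=128):
    """Signature = minimum hash value of the set under each of num_perm hash functions."""
    if not items:
        raise ValueError("cannot compute a MinHash signature of an empty set")
    return [min(_hash(s, seed) for s in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this estimates the Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox jumps over the lazy cat"
    set_a, set_b = shingles(doc_a), shingles(doc_b)
    print("exact Jaccard:", jaccard(set_a, set_b))  # 6 shared of 8 total shingles -> 0.75
    print("MinHash estimate:",
          estimated_jaccard(minhash_signature(set_a), minhash_signature(set_b)))
```

In a deduplication pass, two documents would typically be flagged as near-duplicates when the estimate exceeds a chosen threshold. Comparing fixed-length signatures instead of full shingle sets is what keeps memory use low on large corpora.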