Search Results for "data transformation" - Page 2

Showing 242 open source projects for "data transformation"

  • 1
    Lantern Database

    PostgreSQL vector database extension for building AI applications

    Lantern is a real-time data transformation engine that enables data engineers to build, run, and monitor streaming data pipelines with SQL. It’s designed to process events in motion, offering low-latency stream transformations, aggregations, and enrichment in a declarative way. Lantern is especially suited for modern data infrastructure and analytics platforms.
    Downloads: 0 This Week
  • 2
    SymmetricDS

    SymmetricDS is database replication and file synchronization software

    SymmetricDS is an open-source platform for database replication and synchronization across heterogeneous systems. It supports multi-master and one-way replication with conflict resolution and works over unreliable networks. SymmetricDS is ideal for distributed applications, data warehousing, and edge deployments where consistency and availability are key.
    Downloads: 4 This Week
  • 3
    json-joy

    json-joy — JSON CRDT, JSON CRDT Patch, JSON Patch

    The json-joy library implements cutting-edge real-time and collaborative editing algorithms and other utilities for JSON data models. Its major focus is the development of the JSON CRDT protocol, a Conflict-free Replicated Data Type that enables seamless merging of changes in JSON data models, avoiding conflicts between replicas.
    Downloads: 2 This Week
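To make the idea of conflict-free merging between replicas concrete, here is a minimal sketch of a last-writer-wins (LWW) map, one of the simplest CRDTs. It is an illustration only: the `Entry` type and `merge` function are hypothetical names, and this is neither json-joy's API nor its JSON CRDT algorithm.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


# Hypothetical illustration: a last-writer-wins map, not json-joy's algorithm.
@dataclass(frozen=True)
class Entry:
    value: Any
    # (logical clock, replica id); the replica id breaks ties deterministically.
    stamp: Tuple[int, str]


def merge(a: Dict[str, Entry], b: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge two replicas key by key, keeping the entry with the larger stamp."""
    merged = dict(a)
    for key, entry in b.items():
        if key not in merged or entry.stamp > merged[key].stamp:
            merged[key] = entry
    return merged


replica_a = {"title": Entry("Draft", (1, "a")), "tags": Entry(["x"], (3, "a"))}
replica_b = {"title": Entry("Final", (2, "b"))}

# Merging in either order converges on the same state.
assert merge(replica_a, replica_b) == merge(replica_b, replica_a)
```

Because every replica applies the same deterministic rule, no coordination is needed to agree on the merged document.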
  • 4
    Databend

    Cloud-native open source data warehouse for analytics and AI queries

    ...Databend provides a unified engine capable of handling analytics, vector search, and full-text search within a single platform. Databend supports SQL-based workflows and enables real-time data ingestion, transformation, and analysis through streaming and task orchestration features. With its cloud-native design and distributed architecture, Databend can run both as a self-hosted system or within managed environments to power data analytics, AI workloads, and large-scale data.
    Downloads: 4 This Week
  • 5
    osm2pgsql

    Import OpenStreetMap data into a PostgreSQL/PostGIS database

    osm2pgsql is a powerful tool for importing OpenStreetMap (OSM) data into a PostgreSQL/PostGIS database, enabling geographic data analysis and map rendering. It offers output modes such as "flex", which is configured through Lua scripts, to customize how data is loaded and indexed. Designed for performance and scalability, osm2pgsql is widely used in map tile generation pipelines and by GIS professionals handling large-scale spatial datasets.
    Downloads: 1 This Week
  • 6
    Symfony Serializer

    Handles serializing and deserializing data structures

    Symfony Serializer is a PHP component that converts objects into various formats (like JSON, XML, or YAML) and vice versa. It handles data normalization and denormalization, making it ideal for APIs and data transformation tasks. The serializer supports complex object graphs and custom normalization logic, allowing developers to convert data in a structured and efficient manner.
    Downloads: 0 This Week
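The normalization and denormalization steps mentioned above can be pictured as a two-stage round trip: object to plain data structure, then plain data structure to an encoded format, and back again. The sketch below shows that idea in Python with the standard library only. The `User` and `Address` classes and the four helper functions are hypothetical, and this is not the Symfony component's PHP API.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Address:
    city: str


@dataclass
class User:
    name: str
    address: Address


def normalize(user: User) -> dict:
    """Object graph -> plain nested dict (asdict recurses into Address)."""
    return asdict(user)


def denormalize(data: dict) -> User:
    """Plain nested dict -> object graph."""
    return User(name=data["name"], address=Address(**data["address"]))


def serialize(user: User) -> str:
    """Normalize, then encode the plain data as JSON."""
    return json.dumps(normalize(user))


def deserialize(text: str) -> User:
    """Decode JSON, then denormalize back into objects."""
    return denormalize(json.loads(text))


user = User("Ada", Address("London"))
assert deserialize(serialize(user)) == user
```

Keeping the plain-dict stage separate from the encoding stage means another format could be supported by swapping only the encoder.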
  • 7
    Instill Core

    Instill Core is a full-stack AI infrastructure tool for data

    Instill Core is an open-source, full-stack AI infrastructure platform designed to orchestrate data pipelines, machine learning models, and unstructured data processing into a unified, production-ready system. It provides an end-to-end solution that enables developers to build, deploy, and manage AI-powered applications without needing to manually stitch together multiple tools across the data and model lifecycle. The platform focuses heavily on handling unstructured data such as documents,...
    Downloads: 3 This Week
  • 8
    Humanizer Skill

    Claude Code skill that removes signs of AI-generated writing from text

    Humanizer Skill is a utility library focused on transforming technical or machine-oriented text into expressions that are more natural, readable, and “human-friendly.” It provides a suite of algorithms that convert timestamps, identifiers, file sizes, code tokens, and structured data into phrases that resemble typical human phrasing rather than compact machine output. For example, date and time values can be expressed as relative terms (“two hours ago”), and file sizes can be shown in...
    Downloads: 120 This Week
  • 9
    eleventy

    A simpler site generator. Transforms a directory of templates

    A static site generator for modern web development, focusing on flexibility and customization.
    Downloads: 0 This Week
  • 10
    Fondant

    Production-ready data processing made easy and shareable

    Fondant is a modular, pipeline-based framework designed to simplify the preparation of large-scale datasets for training machine learning models, especially foundation models. It offers an end-to-end system for ingesting raw data, applying transformations, filtering, and formatting outputs—all while remaining scalable and traceable. Fondant is designed with reproducibility in mind and supports containerized steps using Docker, making it easy to share and reuse data processing components....
    Downloads: 0 This Week
  • 11
    Jitsu

    Jitsu is an open-source Segment alternative

    ...Jitsu can either stream data in real time or send it in micro-batches (up to once a minute). Apply any transformation with Jitsu. Just write JavaScript code right in the UI to do anything with incoming data. And yes, the code editor supports code completion, debugging, and much more. It feels like a full-featured IDE!
    Downloads: 0 This Week
  • 12
    Apache DevLake

    Apache DevLake is an open-source dev data platform

    ...You can ask Apache DevLake many questions regarding your development process. Just connect and query. Your Dev Data lives in many silos and tools. DevLake brings them all together to give you a complete view of your Software Development Life Cycle (SDLC). From DORA to scrum retros, DevLake implements metrics effortlessly with prebuilt dashboards supporting common frameworks and goals. DevLake fits teams of all shapes and sizes, and can be readily extended to support new data sources, metrics, and dashboards, with a flexible framework for data collection and transformation.
    Downloads: 1 This Week
  • 13
    Apache Spark

    A unified analytics engine for large-scale data processing

    Apache Spark is a unified engine for large-scale data processing, offering APIs for batch jobs, streaming, machine learning, and graph computation. It builds on resilient distributed datasets (RDDs) and the newer DataFrame/Dataset abstractions to provide fault-tolerant, in-memory computation across clusters. Spark’s execution engine handles scheduling, shuffles, caching, and data locality so users can focus on transformations rather than infrastructure plumbing. With Spark Streaming...
    Downloads: 7 This Week
  • 14
    Embedding Studio

    Framework which allows you to transform your Vector Database

    Embedding Studio is a framework that transforms vector databases into feature-rich search engines. It leverages embeddings to enhance search capabilities, enabling more accurate and context-aware retrieval of information. Embedding Studio supports various data types and integrates seamlessly with existing databases, providing tools for fine-tuning and optimizing embeddings to suit specific application needs.
    Downloads: 0 This Week
  • 15
    sparklyr

    R interface for Apache Spark

    sparklyr is an R package that provides seamless interfacing with Apache Spark clusters—either local or remote—while letting users write code in familiar R paradigms. It supplies a dplyr-compatible backend, Spark machine learning pipelines, SQL integration, and I/O utilities to manipulate and analyze large datasets distributed across cluster environments.
    Downloads: 0 This Week
  • 16
    Chimney

    Scala library for boilerplate-free, type-safe data transformations

    Chimney is a Scala library that facilitates boilerplate-free, type-safe data transformations between different data types. It enables developers to define mappings between source and target types, ensuring that transformations are checked at compile time, thereby reducing runtime errors and enhancing code reliability.
    Downloads: 0 This Week
  • 17
    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...
    Downloads: 0 This Week
  • 18
    Flix

    The Flix Programming Language

    Flix is a statically typed programming language combining functional, imperative, and logic paradigms, with first‑class Datalog constraints and a polymorphic effect system. Designed to run on the JVM, Flix enforces purity tracking at compile time, supports algebraic data types, tail‑call elimination, and allows entire Datalog programs as values.
    Downloads: 1 This Week
  • 19
    Apache Hamilton

    Helps data scientists define testable self-documenting dataflows

    ...This approach encourages modular, testable, and maintainable data pipelines because each transformation is isolated and easily unit tested. The framework also automatically tracks lineage and metadata about how data is produced, which improves debugging, reproducibility, and transparency in data workflows.
    Downloads: 0 This Week
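The description above is truncated, so the following is a sketch under an assumption: that "this approach" means declaring each transformation as a plain function whose parameter names identify its inputs. The tiny `execute` resolver and the `cleaned`, `total`, and `mean` functions are hypothetical, written with the standard library only, and do not use the Hamilton package.

```python
import inspect
from typing import Any, Callable, Dict, List, Tuple


# Each transformation is an isolated function; its parameter names are its inputs.
def cleaned(raw: list) -> list:
    return [x for x in raw if x is not None]


def total(cleaned: list) -> float:
    return float(sum(cleaned))


def mean(total: float, cleaned: list) -> float:
    return total / len(cleaned)


def execute(
    target: str, funcs: Dict[str, Callable], inputs: Dict[str, Any]
) -> Tuple[Any, Dict[str, List[str]]]:
    """Resolve `target` by recursively computing the inputs each function names.

    Returns the result plus a lineage map: node name -> names it was built from.
    """
    values = dict(inputs)
    lineage: Dict[str, List[str]] = {}

    def resolve(name: str) -> Any:
        if name not in values:
            fn = funcs[name]
            deps = list(inspect.signature(fn).parameters)
            lineage[name] = deps
            values[name] = fn(**{dep: resolve(dep) for dep in deps})
        return values[name]

    return resolve(target), lineage


funcs = {f.__name__: f for f in (cleaned, total, mean)}
result, lineage = execute("mean", funcs, {"raw": [1, None, 3]})
assert result == 2.0
```

Each function can be unit tested on its own (for example `total([1, 3])`), and the lineage map records which inputs produced each value.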
  • 20
    kpt

    Automate Kubernetes Configuration Editing

    ...Because they are expected to be used for in-place transformation, the functions need to be idempotent. The package orchestrator enables the magic behind the unique WYSIWYG experience.
    Downloads: 1 This Week
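The idempotency requirement for in-place transformation can be illustrated with a small sketch: running a function a second time over its own output must change nothing. Plain dictionaries stand in for configuration resources here. `set_label` and `append_suffix` are hypothetical examples, not part of kpt or its function SDKs.

```python
import copy
from typing import Any, Dict, List

Resource = Dict[str, Any]


def set_label(resources: List[Resource], key: str, value: str) -> List[Resource]:
    """Idempotent: sets metadata.labels[key] to a fixed value."""
    out = copy.deepcopy(resources)
    for resource in out:
        labels = resource.setdefault("metadata", {}).setdefault("labels", {})
        labels[key] = value
    return out


def append_suffix(resources: List[Resource], suffix: str) -> List[Resource]:
    """Counter-example, NOT idempotent: every run makes the name longer."""
    out = copy.deepcopy(resources)
    for resource in out:
        resource["metadata"]["name"] += suffix
    return out


package = [{"kind": "Deployment", "metadata": {"name": "web"}}]

once = set_label(package, "env", "prod")
twice = set_label(once, "env", "prod")
assert once == twice  # safe to re-run over files already transformed

assert append_suffix(append_suffix(package, "-x"), "-x") != append_suffix(package, "-x")
```

A function like `append_suffix` would corrupt a package that is edited in place and re-processed, which is why the first form is the one to aim for.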
  • 21
    PyVista

    3D plotting and mesh analysis through a streamlined interface

    PyVista is a helper module for the Visualization Toolkit (VTK) that takes a different approach to interfacing with VTK, working through NumPy and direct array access. This package provides a Pythonic, well-documented interface exposing VTK’s powerful visualization backend to facilitate rapid prototyping, analysis, and visual integration of spatially referenced datasets. This module can be used for...
    Downloads: 1 This Week
  • 22
    Hacks

    A collection of hacks and one-off scripts

    Hacks is a collection of experimental scripts, utilities, and one-off tools created to solve specific problems in security research, data processing, and automation. Rather than being a single cohesive application, it serves as a repository of practical command-line tools that can be used independently or combined into workflows. The scripts cover a wide range of tasks, including URL manipulation, parameter replacement, data extraction, and reconnaissance automation. Many of the tools in the...
    Downloads: 0 This Week
  • 23
    Monocle

    Optics library for Scala

    Monocle is a purely functional optics library for Scala that provides immutable data access and transformation tools, including Lens, Prism, Iso, Optional, and Traversal. It enables composable, declarative modifications of deeply nested immutable structures in a concise and type-safe fashion.
    Downloads: 0 This Week
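To show what a composable lens over a nested immutable structure looks like, here is a minimal sketch of the concept in Python. The `Lens` class, its `getter`/`setter` fields, and the `Person`/`Address` types are hypothetical and written for illustration. Monocle's actual Scala API differs, and Python cannot provide the same compile-time type safety.

```python
from dataclasses import dataclass, replace
from typing import Callable, Generic, TypeVar

S = TypeVar("S")
A = TypeVar("A")
B = TypeVar("B")


@dataclass(frozen=True)
class Lens(Generic[S, A]):
    """A getter/setter pair focused on one part A of a whole S."""

    getter: Callable[[S], A]
    setter: Callable[[S, A], S]  # returns a new S, never mutates

    def modify(self, whole: S, fn: Callable[[A], A]) -> S:
        return self.setter(whole, fn(self.getter(whole)))

    def compose(self, inner: "Lens[A, B]") -> "Lens[S, B]":
        """Focus deeper: S -> A -> B."""
        return Lens(
            getter=lambda s: inner.getter(self.getter(s)),
            setter=lambda s, b: self.setter(s, inner.setter(self.getter(s), b)),
        )


@dataclass(frozen=True)
class Address:
    street: str


@dataclass(frozen=True)
class Person:
    name: str
    address: Address


address_lens = Lens(lambda p: p.address, lambda p, a: replace(p, address=a))
street_lens = Lens(lambda a: a.street, lambda a, s: replace(a, street=s))
person_street = address_lens.compose(street_lens)

ada = Person("Ada", Address("main st"))
updated = person_street.modify(ada, str.upper)
assert updated.address.street == "MAIN ST"
assert ada.address.street == "main st"  # the original is untouched
```

Composition is what removes the nested copy boilerplate: the update path is described once and reused for both reading and writing.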
  • 24
    TorchRL

    A modular, primitive-first, python-first PyTorch library

    TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch. TorchRL provides PyTorch and python-first, low and high-level abstractions for RL that are intended to be efficient, modular, documented, and properly tested. The code is aimed at supporting research in RL. Most of it is written in Python in a highly modular way, such that researchers can easily swap components, transform them, or write new ones with little effort.
    Downloads: 61 This Week
  • 25
    Datumaro

    Dataset Management Framework, a Python library and a CLI tool to build

    Datumaro is a flexible Python-based dataset management framework and command-line tool for building, analyzing, transforming, and converting computer vision datasets in many popular formats. It supports importing and exporting annotations and images across a wide variety of standards like COCO, PASCAL VOC, YOLO, ImageNet, Cityscapes, and many more, enabling easy integration with different training pipelines and tools. Datumaro makes it easy to merge datasets, split them into...
    Downloads: 4 This Week