Search Results for "data processing" - Page 2

Showing 400 open source projects for "data processing"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 1
    DeepBI

    DeepBI

    LLM based data scientist, AI native data application

    DeepBI is an AI-native data analysis platform. DeepBI leverages the power of large language models to explore, query, visualize, and share data from any data source. Users can use DeepBI to gain data insight and make data-driven decisions.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 2
    DataDreamer

    DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models

    DataDreamer is a tool designed to assist in the generation and manipulation of synthetic data for various applications, including testing and machine learning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Sparrow

    Sparrow

    Structured data extraction and instruction calling with ML, LLM

    ...The architecture is modular, allowing developers to build customizable processing pipelines that integrate with external tools and data extraction frameworks. Sparrow also includes workflow orchestration tools that allow multiple extraction tasks to be combined into automated pipelines for large-scale document processing.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 4
    Datasets

    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 5
    MineContext

    MineContext

    MineContext is your proactive context-aware AI partner

    ...It is built around a context engineering framework that manages the full lifecycle of data, including capture, processing, storage, retrieval, and consumption. The platform emphasizes privacy through a local-first architecture, allowing users to keep their data stored and processed on their own device rather than relying on external cloud services.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 6
    Classical Language Toolkit (CLTK)

    Classical Language Toolkit (CLTK)

    The Classical Language Toolkit

    The Classical Language Toolkit (CLTK) is a Python library offering natural language processing support for classical languages, including Latin, Greek, and others.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    Pyper

    Pyper

    Concurrent Python made simple

    Pyper is a Python-native orchestration and scheduling framework designed for modern data workflows, machine learning pipelines, and any task that benefits from a lightweight DAG-based execution engine. Unlike heavier platforms like Airflow, Pyper aims to remain lean, modular, and developer-friendly, embracing Pythonic conventions and minimizing boilerplate. It focuses on local development ergonomics and seamless transition to production environments, making it ideal for small teams and...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    python-small-examples

    python-small-examples

    Focus on creating classic Python small examples and cases

    python-small-examples is an open-source educational repository that contains hundreds of concise Python programming examples designed to illustrate practical coding techniques. The project focuses on teaching programming concepts through small, focused scripts that demonstrate common tasks in data processing, visualization, and general programming. Each example highlights a specific function or programming pattern so that learners can quickly understand how to apply Python features in real-world scenarios. The repository includes examples covering topics such as file processing, JSON manipulation, data visualization, and library usage. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    WiFi DensePose

    WiFi DensePose

    Turn WiFi signals into real-time human pose estimation and detection

    WiFi DensePose is a production-oriented implementation of a WiFi-based human pose estimation system that enables real-time full-body tracking using wireless signals rather than cameras. The project demonstrates how commodity mesh routers and signal processing techniques can be leveraged to infer dense human pose information, even through obstacles such as walls. It is designed to showcase the emerging field of RF-based sensing, where machine learning models interpret wireless channel data to reconstruct human movement and posture. The repository includes components for data processing, model inference, and real-time visualization, making it suitable for research and experimental deployments. ...
    Downloads: 48 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 10
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    SetFit

    SetFit

    Efficient few-shot learning with Sentence Transformers

    SetFit is an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers. It achieves high accuracy with little labeled data - for instance, with only 8 labeled examples per class on the Customer Reviews sentiment dataset, SetFit is competitive with fine-tuning RoBERTa Large on the full training set of 3k examples.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 12
    OCRmyPDF

    OCRmyPDF

    OCRmyPDF adds an OCR text layer to scanned PDF files

    OCRmyPDF adds an optical character recognition (OCR) text layer to scanned PDF files, allowing them to be searched. PDF is the best format for storing and exchanging scanned documents. Unfortunately, PDFs can be difficult to modify. OCRmyPDF makes it easy to apply image processing and OCR (recognized, searchable text) to existing PDFs.
    Downloads: 101 This Week
    Last Update:
    See Project
  • 13
    Kornia

    Kornia

    Open Source Differentiable Computer Vision Library

    Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer vision problems. At its core, the package uses PyTorch as its main backend both for efficiency and to take advantage of the reverse-mode auto-differentiation to define and compute the gradient of complex functions. Inspired by existing packages, this library is composed by a subset of packages containing operators that can be inserted within...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 14
    deepdoctection

    deepdoctection

    A Repo For Document AI

    ...For more specific text processing tasks use one of the many other great NLP libraries.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    Superlinked

    Superlinked

    Superlinked is a Python framework for AI Engineers

    Superlinked is a Python framework designed for AI engineers to build high-performance search and recommendation applications that combine structured and unstructured data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    LightAutoML

    LightAutoML

    Fast and customizable framework for automatic ML model creation

    LightAutoML is an automated machine learning (AutoML) framework optimized for efficient model training and hyperparameter tuning, focusing on both tabular and text data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    MindNLP

    MindNLP

    Easy-to-use and high-performance NLP and LLM framework

    MindNLP is a natural language processing library built on the MindSpore framework, providing tools and models for various NLP tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    AI-Media2Doc

    AI-Media2Doc

    AI tool converting video/audio into structured documents instantly

    ...It is designed to transform multimedia inputs into formats such as knowledge notes, summaries, mind maps, and social-style articles, making content easier to review and reuse. AI-Media2Doc emphasizes privacy by processing media locally in the browser using WebAssembly-based ffmpeg, ensuring that original video files are not uploaded externally. It separates client-side media handling from backend AI processing, reducing data exposure while still enabling transcription and document generation. AI-Media2Doc supports flexible customization through prompts, allowing users to tailor output styles based on their needs. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 19
    E2M

    E2M

    E2M converts various file types (doc, docx, epub, html, htm, url

    E2M is a SourceForge mirror of the e2m open-source project, which focuses on providing tools or services designed to convert or process content between different formats or systems. Projects with similar naming conventions typically emphasize automation workflows where input data from one environment is transformed into another representation or output structure. The mirrored repository allows users to access the project’s codebase independently from its original hosting platform while preserving the development history and release artifacts. Systems like e2m often serve as middleware components that connect different software systems or facilitate data processing pipelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Nexent

    Nexent

    Zero-code platform for building AI agents from natural language input

    ...It focuses on a zero-code approach, allowing users to define workflows and agent behavior purely through language prompts, significantly lowering the barrier to entry for AI development. Built on the MCP ecosystem, Nexent integrates a wide range of tools, models, and data sources into a unified environment for agent creation and execution. Nexent supports multi-agent collaboration, enabling multiple intelligent agents to interact and coordinate tasks within complex workflows. It also includes capabilities for data processing, knowledge tracing, and multimodal interaction, allowing agents to work with different input and output formats. ...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 21
    BambooAI

    BambooAI

    A Python library powered by Language Models (LLMs)

    BambooAI is a Python library powered by large language models (LLMs) for conversational data discovery and analysis, allowing users to interact with data through natural language.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    PaddleNLP

    PaddleNLP

    Easy-to-use and powerful NLP library with Awesome model zoo

    ...Provide rich industry-level pre-task capabilities Taskflow And process-wide text area API: Support for the loading of rich Chinese data sets Dataset API, can flexibly and efficiently complete data pretreatment Data API, Preset 60 + pre-training word vector Embedding API, Providing 100 + pre-training model Transformer API Wait, the efficiency of NLP task modeling can be greatly improved.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    DataChain

    DataChain

    AI-data warehouse to enrich, transform and analyze unstructured data

    ...The resulting datasets can be saved, versioned, and sent directly to PyTorch and TensorFlow for training. Datachain can persist features of Python objects returned by AI models, and enables vectorized analytical operations over them. The typical use cases are data curation, LLM analytics and validation, image segmentation, pose detection, and GenAI alignment. Datachain is especially helpful if batch operations can be optimized – for instance, when synchronous API calls can be parallelized or where an LLM API offers batch processing.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    TorchAudio

    TorchAudio

    Data manipulation and transformation for audio signal processing

    The aim of torchaudio is to apply PyTorch to the audio domain. By supporting PyTorch, torchaudio follows the same philosophy of providing strong GPU acceleration, having a focus on trainable features through the autograd system, and having consistent style (tensor names and dimension names). Therefore, it is primarily a machine learning library and not a general signal processing library. The benefits of PyTorch can be seen in torchaudio through having all the computations be through PyTorch...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    Video-subtitle-remover (VSR)

    Video-subtitle-remover (VSR)

    AI tool that removes hardcoded subtitles and text from videos locally

    ...It allows users to define a specific subtitle region so that only text in that area is removed rather than modifying the entire frame. It can also automatically remove text throughout the whole video when a position is not specified. In addition to video processing, the project supports removing text-like watermarks from images through similar techniques. The processing runs locally without requiring any external API services, enabling offline use and greater control over the data being processed.
    Downloads: 131 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB