Showing 39 open source projects for "manipulation"

  • 1
    Poetiq

    Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1

    ...Instead of relying on a single prompt or fixed strategy, their solver dynamically adapts the reasoning path, selecting what to ask or analyze next depending on intermediate results, effectively combining reasoning, perception, and program synthesis (or symbolic manipulation) in a loop. The repository allows others to reproduce their results, experiment with different LLM backends (e.g. by supplying keys for supported models), and observe how the adaptive meta-system handles the logic and abstraction challenges.
    Downloads: 0 This Week
  • 2
    DreamO

    A Unified Framework for Image Customization

    DreamO is a unified, open-source framework from ByteDance for advanced image customization and generation that consolidates multiple “image manipulation” tasks into a single system, rather than requiring separate specialized models. Built on a diffusion-transformer (DiT) backbone, it supports a diverse set of tasks — including identity preservation, virtual “try-on” (e.g. clothing, accessories), style transfer, IP adaptation (objects/characters), and layout/condition-aware customizations — all handled within the same unified architecture. ...
    Downloads: 0 This Week
  • 3
    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and even video question answering. ...
    Downloads: 0 This Week
  • 4
    Step-Audio-EditX

    LLM-based Reinforcement Learning audio edit model

    Step-Audio-EditX is an open-source, 3 billion-parameter audio model from StepFun AI designed to make expressive and precise editing of speech and audio as easy as text editing. Rather than treating audio editing as low-level waveform manipulation, this model converts speech into a sequence of discrete “audio tokens” (via a dual-codebook tokenizer) — combining a linguistic token stream and a semantic (prosody/emotion/style) token stream — thereby abstracting audio editing into high-level token operations. This allows users to modify not only what is said (the text) but also how it's said: emotion, tone, speaking style, prosody, accent, even paralinguistic cues. ...
    Downloads: 0 This Week
  • 5
    MuJoCo MPC

    Real-time behaviour synthesis with MuJoCo, using Predictive Control

    ...The system supports multi-shooting optimization, enabling precise motion planning across diverse domains like quadruped locomotion, humanoid tracking, and dexterous manipulation. In addition to its C++ core, MJPC includes an experimental Python API, enabling integration with custom models and MuJoCo tasks for flexible scripting and experimentation.
    Downloads: 0 This Week
  • 6
    DragGAN

    Official Code for DragGAN (SIGGRAPH 2023)

    ...DragGAN has gained attention for making complex image edits, such as pose changes or shape adjustments, accessible through an intuitive interface. The repository provides code and GUI tooling that allow researchers and advanced users to experiment with this next-generation controllable image manipulation technique.
    Downloads: 0 This Week
  • 7
    OpenAI Glow

    Code for "Glow: Generative Flow with Invertible 1x1 Convolutions"

    ...Unlike models that rely on approximate inference, Glow uses invertible transformations to directly learn the data distribution, allowing for exact likelihood computation and efficient sampling. The model is capable of producing high-quality synthetic images while maintaining interpretable latent spaces that enable meaningful manipulation of generated outputs. Glow’s architecture is based on reversible layers and efficient flow operations, which allow large-scale training while keeping memory usage manageable. The repository provides training code, pretrained models, and scripts for generating samples or reproducing key results from the original research. Glow is primarily intended for researchers and practitioners exploring generative modeling, likelihood-based training, and interpretable deep learning systems.
    Downloads: 3 This Week
  • 8
    ALAE

    Adversarial Latent Autoencoders

    ALAE (Adversarial Latent Autoencoders) is a deep learning research implementation that combines autoencoders with generative adversarial networks to produce high-quality image synthesis models. The project implements the architecture introduced in the CVPR research paper on Adversarial Latent Autoencoders, which focuses on improving generative modeling by learning latent representations aligned with adversarial training objectives. Unlike traditional GANs that directly generate images from...
    Downloads: 0 This Week
  • 9
    GIMP ML

    AI for GNU Image Manipulation Program

    This repository introduces GIMP3-ML, a set of Python plugins for the widely used GNU Image Manipulation Program (GIMP). It brings recent advances in computer vision to the conventional image editing pipeline. Deep learning applications such as monocular depth estimation, semantic segmentation, mask generative adversarial networks, image super-resolution, de-noising, and coloring have been integrated into GIMP through Python-based plugins.
    Downloads: 13 This Week
  • 10
    StarGAN

    Official PyTorch Implementation

    ...Unlike earlier GAN approaches that required separate models for each domain pair, StarGAN enables flexible attribute transfer across multiple domains within one network, significantly improving efficiency and scalability. The repository includes full training and inference pipelines for tasks such as facial attribute manipulation and style transfer. It demonstrates adversarial training strategies, domain classification losses, and generator-discriminator coordination required for stable multi-domain translation. Researchers and practitioners often use the project as a reference when studying conditional GANs and advanced image synthesis techniques. Overall, the repository provides a clean and practical baseline for experimenting with multi-domain generative modeling in PyTorch.
    Downloads: 0 This Week
  • 11
    TensorFlow Machine Learning Cookbook

    Code for TensorFlow Machine Learning Cookbook

    ...The repository contains numerous Python scripts and Jupyter notebooks that demonstrate how to implement machine learning algorithms and neural networks using the TensorFlow framework. Each section focuses on a different aspect of machine learning development, including tensor manipulation, model training, optimization strategies, and data processing techniques. The examples illustrate how TensorFlow operations and tensors can be used to build machine learning pipelines and perform tasks such as regression, classification, and clustering. By combining theoretical explanations with executable code, the project helps developers understand how TensorFlow algorithms operate internally while also providing working examples that can be adapted for real projects.
    Downloads: 0 This Week
  • 12
    Neural Photo Editor

    A simple interface for editing natural photos

    Neural Photo Editor is an experimental machine learning application that demonstrates how generative neural networks can be used as an interactive photo editing tool. The project implements the system described in the research paper Neural Photo Editing with Introspective Adversarial Networks, which introduces a generative model capable of modifying images in semantically meaningful ways. Instead of editing images by directly manipulating pixels, the software allows users to influence...
    Downloads: 0 This Week
  • 13
    Tesseract-gui
    Tesseract-GUI is not a front-end for tesseract-ocr. It is just a graphical way to use it with simple image manipulation through ImageMagick.
    Downloads: 12 This Week
  • 14
    The CMU personal robotics package offers many robotics algorithms/controllers/drivers that enable robots to perform basic tasks like manipulation and vision. The main infrastructure used is OpenRAVE and Robot Operating System (ROS).
    Downloads: 0 This Week