Showing 48 open source projects for "realistic"

  • 1
    MAI-UI

    Real-World Centric Foundation GUI Agents

    ...Developed by Tongyi-MAI (Alibaba’s research initiative), the MAI-UI models are multimodal agents trained to understand user instructions and corresponding screenshots, grounding those instructions to on-screen elements and generating sequences of GUI actions such as taps, swipes, text input, and system commands. Unlike traditional UI frameworks, MAI-UI emphasizes realistic deployment by supporting agent–user interaction (clarifying ambiguous instructions), integration with external tool APIs using MCP calls, and a device–cloud collaboration mechanism that dynamically routes computation to on-device or cloud models based on task state and privacy constraints.
    Downloads: 0 This Week
  • 2
    The SpeechBrain Toolkit

    A PyTorch-based Speech Toolkit

    ...SpeechBrain supports state-of-the-art methods for end-to-end speech recognition, including models based on CTC, CTC+attention, transducers, transformers, and neural language models relying on recurrent neural networks and transformers. Speaker recognition is already deployed in a wide variety of realistic applications. SpeechBrain provides different models for speaker recognition, including X-vector, ECAPA-TDNN, PLDA, and contrastive learning. Spectral masking, spectral mapping, and time-domain enhancement are different methods already available within SpeechBrain. Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well. ...
    Downloads: 0 This Week
  • 3
    Auto Synced & Translated Dubs

    Automatically translates the text of a video based on a subtitle file

    Auto-Synced-Translated-Dubs is a toolchain that automatically translates and re-dubs videos using AI voices while keeping the new speech aligned to the original timing via subtitle files. It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken...
    Downloads: 1 This Week
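The timing step described above can be illustrated with a minimal sketch. It is not the project's own code: the function name `cue_durations_ms` and the sample subtitle text are hypothetical, and it assumes standard SRT timing lines of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`. It shows only how a per-line target duration could be derived from subtitle timestamps.

```python
import re
from datetime import timedelta

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def cue_durations_ms(srt_text: str) -> list[int]:
    """Return the duration of every subtitle cue in milliseconds (hypothetical helper)."""
    durations = []
    for match in TIMING.finditer(srt_text):
        h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
        start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
        end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
        # Floor division of two timedeltas yields an integer count of milliseconds.
        durations.append((end - start) // timedelta(milliseconds=1))
    return durations


# Made-up sample input, for illustration only.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,250 --> 00:00:06,000
Welcome back.
"""

if __name__ == "__main__":
    print(cue_durations_ms(SAMPLE_SRT))
```

A dubbing pipeline could then stretch or compress each synthesized clip toward its cue's duration; how the actual tool does this is not covered by the description above.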
  • 4
    Robyn

    Experimental, AI/ML-powered and open sourced Marketing Mix Modeling

    ...Robyn takes in historical data (spend on different marketing channels, conversions or revenue, and optional context or organic-media variables) and combines several techniques to estimate the incremental impact of each marketing channel: regularized regression (Ridge), time-series decomposition (trend, seasonality, holiday effects), and hyperparameter optimization (via evolutionary algorithms). It explicitly models “carry-over” (adstock) and diminishing-returns (saturation) effects per channel, enabling realistic modeling of how advertising persists over time and saturates.
    Downloads: 0 This Week
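To make the carry-over and saturation ideas concrete, here is a small generic sketch. It is an illustration under stated assumptions, not Robyn's implementation: it assumes a geometric adstock (each period carries over a fixed fraction of the previous period's effect) and a Hill-type saturation curve, and the function names and parameter values are hypothetical.

```python
def geometric_adstock(spend, decay):
    """Carry over a fixed fraction `decay` of the previous period's effect."""
    carried = 0.0
    adstocked = []
    for x in spend:
        carried = x + decay * carried
        adstocked.append(carried)
    return adstocked


def hill_saturation(x, half_saturation, shape=1.0):
    """Map spend to a response in [0, 1) with diminishing returns.

    `half_saturation` is the spend level at which the response equals 0.5.
    """
    if x <= 0:
        return 0.0
    return x**shape / (x**shape + half_saturation**shape)


if __name__ == "__main__":
    # A single burst of spend keeps having an effect in later periods.
    effect = geometric_adstock([100, 0, 0], decay=0.5)
    print(effect)
    # Doubling the adstocked spend less than doubles the response.
    print([hill_saturation(v, half_saturation=50) for v in effect])
```

In a model of this kind, the decay and half-saturation values would be hyperparameters tuned per channel rather than fixed constants.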
  • 5
    Prompt Engineering Interactive Tutorial

    Anthropic's Interactive Prompt Engineering Tutorial

    ...It starts with the anatomy of a good prompt and moves into techniques that deliver the “80/20” gains—separating instructions from data, specifying schemas, and setting evaluation criteria. The course leans heavily on realistic failure modes (ambiguity, hallucination, brittle instructions) and shows how to iteratively debug prompts the way you would debug code. Lessons include building prompts from scratch for common tasks like extraction, classification, transformation, and step-by-step reasoning, with checkpoints that let you compare your outputs against solid baselines. ...
    Downloads: 1 This Week
  • 6
    StyleTTS 2

    Towards Human-Level Text-to-Speech through Style Diffusion

    StyleTTS2 is a state-of-the-art text-to-speech system that aims for human-level naturalness by combining style diffusion, adversarial training, and large speech language models. It extends the original StyleTTS idea by introducing a style diffusion model that can sample rich, realistic speaking styles conditioned on reference speech, allowing highly expressive and diverse prosody. The architecture uses a two-stage training process and leverages an auxiliary speech language model to guide generation toward more natural and coherent utterances. StyleTTS2 supports both single-speaker and multi-speaker configurations, with the ability to sample or transfer styles from reference audio, making it powerful for expressive TTS and character voices. ...
    Downloads: 2 This Week
  • 7
    VideoCrafter2

    Overcoming Data Limitations for High-Quality Video Diffusion Models

    VideoCrafter is an open-source video generation and editing toolbox designed to create high-quality video content. It features models for both text-to-video and image-to-video generation. The system is optimized for generating videos from textual descriptions or still images, leveraging advanced diffusion models. VideoCrafter2, an upgraded version, improves on its predecessor by enhancing motion dynamics and concept combinations, especially in low-data scenarios. Users can explore a wide...
    Downloads: 4 This Week
  • 8
    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 0 This Week
  • 9
    WebArena

    Code repo for "WebArena to build Autonomous Agents

    WebArena is a realistic web environment designed for building and testing autonomous agents, providing a platform for developing web-based AI agents.
    Downloads: 0 This Week
  • 10
    AnyTrading

    The most simple, flexible, and comprehensive OpenAI Gym trading

    gym-anytrading is an OpenAI Gym-compatible environment designed for developing and testing reinforcement learning algorithms on trading strategies. It simulates trading environments for financial markets, including stocks and forex.
    Downloads: 0 This Week
  • 11
    OGB

    Benchmark datasets, data loaders, and evaluators for graph machine

    The Open Graph Benchmark (OGB) is a collection of realistic, large-scale, and diverse benchmark datasets for machine learning on graphs. OGB datasets are automatically downloaded, processed, and split using the OGB Data Loader. The model performance can be evaluated using the OGB Evaluator in a unified manner. OGB is a community-driven initiative in active development. We expect the benchmark datasets to evolve.
    Downloads: 0 This Week
  • 12
    texturize

    Generate photo-realistic textures based on source images

    Generate photo-realistic textures based on source images. Remix, remake, mashup! Useful if you want to create variations on a theme or elaborate on an existing texture. A command-line tool and Python library to automatically generate new textures similar to a source image or photograph. It's useful in the context of computer graphics if you want to make variations on a theme or expand the size of an existing texture.
    Downloads: 0 This Week
  • 13
    DiffSinger

    Singing Voice Synthesis via Shallow Diffusion Mechanism

    ...The core idea is to view generation of a sung voice (mel-spectrogram) as a diffusion process: starting from noise, the model iteratively “denoises” while being conditioned on a music score (lyrics, pitch, musical timing). This avoids some of the typical problems of prior SVS models — like over-smoothing or unstable GAN training — and produces more realistic, expressive, and natural-sounding singing. The method introduces a “shallow diffusion” mechanism: instead of diffusing over many steps, generation begins at a shallow step determined adaptively, which leverages prior knowledge learned by a simple mel-spectrogram decoder and speeds up inference.
    Downloads: 45 This Week
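The "shallow" starting point can be sketched with the standard closed-form forward process of a denoising diffusion model. This is a toy illustration, not DiffSinger's code: the linear beta schedule, its endpoint values, the step count, and the function names are all assumptions, and the learned denoiser that would run the reverse steps is not shown. The sketch noises a coarse mel-spectrogram frame up to an intermediate step `k`, so that reverse denoising would need only `k` steps instead of the full schedule.

```python
import math
import random


def alpha_bars(num_steps, beta_start=1e-4, beta_end=0.06):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        bars.append(running)
    return bars


def diffuse_to_step(coarse_mel, k, bars, rng=random):
    """Sample x_k from q(x_k | x_0 = coarse_mel) using the closed-form forward process.

    `k` is 1-based; `coarse_mel` stands in for the output of a simple decoder.
    """
    signal = math.sqrt(bars[k - 1])
    noise = math.sqrt(1.0 - bars[k - 1])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in coarse_mel]


if __name__ == "__main__":
    bars = alpha_bars(num_steps=100)
    coarse = [0.2, -0.1, 0.4]  # made-up values for one mel frame
    x_k = diffuse_to_step(coarse, k=50, bars=bars)
    # Reverse denoising would now run for 50 steps (k down to 1), not 100.
    print(x_k)
```

Here `k` is passed in by hand; in the method described above the starting step is determined adaptively.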
  • 14
    Talking Head Anime from a Single Image

    Demo for the "Talking Head Anime from a Single Image"

    ...The underlying model uses deep learning techniques to predict how different facial features and body parts should move based on pose parameters or input signals. This allows the software to create realistic animated frames while preserving the identity and appearance of the original character. The repository includes demo applications that allow users to interact with the system through graphical controls or webcam input to drive character motion. These demonstrations illustrate how generative neural rendering can be used to build real-time avatar systems for virtual characters.
    Downloads: 1 This Week
  • 15
    Google Research Football

    Check out the new game server

    Google Research Football is a reinforcement learning environment simulating soccer matches. It focuses on learning complex behaviors such as team collaboration and strategy formation in competitive settings.
    Downloads: 0 This Week
  • 16
    Deep Exemplar-based Video Colorization

    The source code of CVPR 2019 paper "Deep Exemplar-based Colorization"

    ...Video frames are colorized in sequence based on the colorization history, and coherency is further enforced by a temporal consistency loss. All of these components, learned end-to-end, help produce realistic videos with good temporal stability. Experiments show our results are superior to state-of-the-art methods both quantitatively and qualitatively. To colorize your own video, you need to extract the video frames and provide a reference image as an example.
    Downloads: 1 This Week
  • 17
    Consistent Depth

    We estimate dense, flicker-free, geometrically consistent depth

    ...The system builds upon traditional structure-from-motion (SfM) techniques to provide geometric constraints while integrating a convolutional neural network trained for single-image depth estimation. During inference, the model fine-tunes itself to align with the geometric constraints of a specific input video, ensuring stable and realistic depth maps even in less-constrained regions. This approach achieves improved geometric consistency and visual stability compared to prior monocular reconstruction methods. The project can process challenging hand-held video footage, including those with moderate dynamic motion, making it practical for real-world usage.
    Downloads: 0 This Week
  • 18
    Image Super-Resolution (ISR)

    Super-scale your images and run experiments with Residual Dense

    The goal of this project is to upscale and improve the quality of low-resolution images. This project contains Keras implementations of different Residual Dense Networks for Single Image Super-Resolution (ISR) as well as scripts to train these networks using content and adversarial loss components. Docker scripts and Google Colab notebooks are available to carry out training and prediction. Also, we provide scripts to facilitate training on the cloud with AWS and Nvidia-docker with only a few...
    Downloads: 0 This Week
  • 19
    vid2vid

    Pytorch implementation of our method for high-resolution

    ...Built on top of image-to-image translation techniques like pix2pixHD, it extends these ideas into the temporal domain by ensuring consistency across video frames. The system can synthesize complex outputs such as realistic talking faces, human motion animations, or dynamic street scenes by learning temporal relationships between frames. It uses generative adversarial networks combined with temporal modeling strategies to maintain coherence and reduce flickering artifacts. The framework is capable of producing high-resolution outputs and is widely used in research related to video synthesis, animation, and simulation. ...
    Downloads: 0 This Week
  • 20
    SG2Im

    Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201

    ...Instead of conditioning on free-form text alone, it leverages graph structure to control layout and interactions, generating scenes that respect constraints like “person left of dog” or “cup on table.” The pipeline typically predicts object layouts (bounding boxes and masks) from the graph, then renders a realistic image conditioned on those layouts. This separation lets the model reason about geometry and composition before committing to texture and color, improving spatial fidelity. The repository includes training code, datasets, and evaluation scripts so researchers can reproduce baselines and extend components such as the graph encoder or image generator. ...
    Downloads: 0 This Week
  • 21
    FastPhotoStyle

    Style transfer, deep learning, feature transform

    FastPhotoStyle is a deep learning-based image stylization framework designed to transfer the style of one photograph onto another while preserving photorealistic quality. Unlike traditional artistic style transfer methods that produce painterly outputs, this approach focuses on maintaining realistic textures, lighting, and spatial consistency. The method is based on a two-step process that includes a stylization phase followed by a smoothing operation, ensuring that the output image remains coherent and free of visual artifacts. It is computationally efficient due to its closed-form solution, allowing fast processing compared to iterative optimization-based methods. ...
    Downloads: 0 This Week
  • 22
    House3D

    A Realistic and Rich 3D Environment

    House3D is a large-scale virtual 3D simulation environment designed to support research in embodied AI, reinforcement learning, and vision-language navigation. It provides more than 45,000 richly annotated indoor scenes sourced from the SUNCG dataset, covering diverse architectural layouts such as studios, multi-floor homes, and spaces with detailed furnishings and room types. Each environment includes fully labeled 3D objects, allowing agents to perceive and interact with their surroundings...
    Downloads: 0 This Week
  • 23
    Dia-1.6B

    Dia-1.6B generates lifelike English dialogue and vocal expressions

    Dia-1.6B is a 1.6 billion parameter text-to-speech model by Nari Labs that generates high-fidelity dialogue directly from transcripts. Designed for realistic vocal performance, Dia supports expressive features like emotion, tone control, and non-verbal cues such as laughter, coughing, or sighs. The model accepts speaker conditioning through audio prompts, allowing limited voice cloning and speaker consistency across generations. It is optimized for English and built for real-time performance on enterprise GPUs, though CPU and quantized versions are planned. ...
    Downloads: 0 This Week