Showing 88 open source projects for "resolution"

View related business solutions
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • AI-powered service management for IT and enterprise teams Icon
    AI-powered service management for IT and enterprise teams

    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
    Try it Free
  • 1
    VideoCrafter2

    VideoCrafter2

    Overcoming Data Limitations for High-Quality Video Diffusion Models

    VideoCrafter is an open-source video generation and editing toolbox designed to create high-quality video content. It features models for both text-to-video and image-to-video generation. The system is optimized for generating videos from textual descriptions or still images, leveraging advanced diffusion models. VideoCrafter2, an upgraded version, improves on its predecessor by enhancing motion dynamics and concept combinations, especially in low-data scenarios. Users can explore a wide...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 2
    Conscious Artificial Intelligence

    Conscious Artificial Intelligence

    It's possible for machines to become self-aware.

    This project is a quest for conscious artificial intelligence. A number of prototypes will be developed as the project progresses. This project has 2 subprojects: Object Pascal based CAI NEURAL API - https://github.com/joaopauloschuler/neural-api Python based K-CAI NEURAL API - https://github.com/joaopauloschuler/k-neural-api A video from the first prototype has been made: http://www.youtube.com/watch?v=qH-IQgYy9zg Above video shows a popperian agent collecting mining ore from 3...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 3
    FLUX.1 Krea

    FLUX.1 Krea

    Powerful open source image generation model

    FLUX.1 Krea [dev] is an open-source 12-billion parameter image generation model developed collaboratively by Krea and Black Forest Labs, designed to deliver superior aesthetic control and high image quality. It is a rectified-flow model distilled from the original Krea 1, providing enhanced sampling efficiency through classifier-free guidance distillation. The model supports generation at resolutions between 1024 and 1280 pixels with recommended inference steps between 28 and 32 for optimal...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Qwen-VL

    Qwen-VL

    Chat & pretrained large vision language model

    Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 5
    Recurrent Interface Network (RIN)

    Recurrent Interface Network (RIN)

    Implementation of Recurrent Interface Network (RIN)

    Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in Pytorch. The author unawaredly reinvented the induced set-attention block from the set transformers paper. They also combine this with the self-conditioning technique from the Bit Diffusion paper, specifically for the latents. The last ingredient seems to be a new noise function based around the sigmoid, which the author claims is better than cosine...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    BackgroundMattingV2

    BackgroundMattingV2

    Real-Time High-Resolution Background Matting

    Official repository for the paper Real-Time High-Resolution Background Matting. Our model requires capturing an additional background image and produces state-of-the-art matting results at 4K 30fps and HD 60fps on an Nvidia RTX 2080 TI GPU.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    MMEditing

    MMEditing

    MMEditing is a low-level vision toolbox based on PyTorch

    MMEditing is an open-source toolbox for low-level vision. It supports various tasks. MMEditing is a low-level vision toolbox based on PyTorch, supporting super-resolution, inpainting, matting, video interpolation, etc. We decompose the editing framework into different components and one can easily construct a customized editor framework by combining different modules. The toolbox directly supports popular and contemporary inpainting, matting, super-resolution and generation tasks. The toolbox provides state-of-the-art methods in inpainting/matting/super-resolution/generation. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    SuperAGI

    SuperAGI

    A dev-first open source autonomous AI agent framework

    ...Control token usage to manage costs effectively. Enable your agents to learn and adapt by storing their memory. Get notified when agents get stuck in the loop, and provide proactive resolution. Read and store files generated by Agents.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    CogView

    CogView

    Text-to-Image generation. The repo for NeurIPS 2021 paper

    ...The model incorporates innovations such as PB-relax and Sandwich-LN to enable stable training of very deep transformers without NaN loss issues. CogView supports multiple tasks beyond text-to-image, including image captioning, post-selection (ranking candidate images by relevance to a prompt), and super-resolution (upscaling model-generated images). The repository provides pretrained models, inference scripts, and training examples, along with a Docker environment for reproducibility.
    Downloads: 1 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    DiT (Diffusion Transformers)

    DiT (Diffusion Transformers)

    Official PyTorch Implementation of "Scalable Diffusion Models"

    ...The model architecture parallels large language models but for image tokens—each block refines noisy latent representations toward cleaner outputs through iterative denoising steps. DiT achieves strong results on benchmarks like ImageNet and LSUN while being architecturally simple and highly modular. It supports variable resolution, conditioning on class or text embeddings, and integration with latent autoencoders (like those used in Stable Diffusion).
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    PIFuHD

    PIFuHD

    High-Resolution 3D Human Digitization from A Single Image

    PIFuHD (Pixel-Aligned Implicit Function for 3D human reconstruction at high resolution) is a method and codebase to reconstruct high-fidelity 3D human meshes from a single image. It extends prior PIFu work by increasing resolution and detail, enabling fine geometry in cloth folds, hair, and subtle surface features. The method operates by learning an implicit occupancy / surface function conditioned on the image and camera projection; at inference time it queries dense points to reconstruct a mesh via marching cubes. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12
    Stable-Dreamfusion

    Stable-Dreamfusion

    Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion

    ...Different from Imagen, Stable-Diffusion is a latent diffusion model, which diffuses in a latent space instead of the original image space. Therefore, we need the loss to propagate back from the VAE's encoder part too, which introduces extra time costs in training. We use the multi-resolution grid encoder to implement the NeRF backbone (implementation from torch-ngp), which enables much faster rendering.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    texturize

    texturize

    Generate photo-realistic textures based on source images

    Generate photo-realistic textures based on source images. Remix, remake, mashup! Useful if you want to create variations on a theme or elaborate on an existing texture. A command-line tool and Python library to automatically generate new textures similar to a source image or photograph. It's useful in the context of computer graphics if you want to make variations on a theme or expand the size of an existing texture. This software is powered by deep learning technology, using a combination...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Point-E

    Point-E

    Point cloud diffusion for 3D model synthesis

    point-e is the official repository for Point-E, a generative model developed by OpenAI that produces 3D point clouds from textual (or image) prompts. Its principal advantage is speed: it can generate 3D assets in just 1–2 minutes on a single GPU, which is significantly faster than many competing text-to-3D models. The model works via a two-stage diffusion approach: first, it uses a text → image diffusion network to produce a synthetic 2D view consistent with the prompt; then a second...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Karlo

    Karlo

    Text-conditional image generation model based on OpenAI's unCLIP

    Karlo is a text-conditional image generation model based on OpenAI's unCLIP architecture with the improvement over the standard super-resolution model from 64px to 256px, recovering high-frequency details only in the small number of denoising steps. We train all components from scratch on 115M image-text pairs including COYO-100M, CC3M, and CC12M. In the case of Prior and Decoder, we use ViT-L/14 provided by OpenAI’s CLIP repository. Unlike the original implementation of unCLIP, we replace the trainable transformer in the decoder into the text encoder in ViT-L/14 for efficiency. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    BCI

    BCI

    BCI: Breast Cancer Immunohistochemical Image Generation

    Breast Cancer Immunohistochemical Image Generation through Pyramid Pix2pix. We have released the trained model on BCI and LLVIP datasets. We host a competition for breast cancer immunohistochemistry image generation on Grand Challenge. Project pix2pix provides a python script to generate pix2pix training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene, these can be pairs {HE, IHC}. Then we can learn to translate A(HE images)...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    ruDALL-E

    ruDALL-E

    Generate images from texts. In Russian

    We present a family of generative models from SberDevices and Sber AI! Models allow you to create images that did not exist before. All you need is a text description in Russian or another language. Try to create unique images together with generative artists using your own formulations. Ask generative artists to depict something special for you as well. The Kandinsky 2.0 model uses the reverse diffusion method and creates colorful images on various topics in a matter of seconds by text...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    Mask2Former

    Mask2Former

    Code release for "Masked-attention Mask Transformer

    ...A pixel decoder fuses multi-scale features and feeds masked attention in the transformer so each query focuses computation on its current spatial support. This leads to accurate masks with sharp boundaries and strong small-object performance while remaining efficient on high-resolution inputs. The project provides extensive configurations and pretrained models across popular benchmarks like COCO, ADE20K, and Cityscapes. Built on top of Detectron2, it includes training scripts, inference tools, and visualization utilities that make experimentation straightforward.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    RQ-Transformer

    RQ-Transformer

    Implementation of RQ Transformer, autoregressive image generation

    ...I also think there is something deeper going on, and have generalized this to any number of dimensions. You can use it by importing the HierarchicalCausalTransformer. For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence length is important for an AR model to reduce its computational costs to consider long-range interactions of codes. However, we postulate that previous VQ cannot shorten the code sequence and generate high-fidelity images together in terms of the rate-distortion trade-off.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Piano transcription

    Piano transcription

    Task of transcribing piano recordings into MIDI files

    Piano transcription is an open-source high-resolution piano transcription system by ByteDance that converts raw audio recordings of piano performance into symbolic MIDI files — detecting note onsets, offsets, pitch, velocity, and even pedal usage. The system is implemented in Python (PyTorch) and is capable of accurate transcription of polyphonic piano recordings, even with complex passages and pedal techniques, making it suitable for classical piano music.
    Downloads: 6 This Week
    Last Update:
    See Project
  • 21
    Fashion-MNIST

    Fashion-MNIST

    A MNIST-like fashion product database

    ...It was designed as a direct replacement for the original MNIST handwritten digits dataset, maintaining the same structure and image size so that researchers could easily switch datasets without modifying their experimental pipelines. The dataset consists of 70,000 images in total, with 60,000 examples used for training and 10,000 reserved for testing. Each image has a resolution of 28 by 28 pixels and belongs to one of ten clothing classes, making it suitable for evaluating classification models. Because the dataset represents real-world objects rather than handwritten digits, it offers a more challenging benchmark for testing machine learning algorithms.
    Downloads: 9 This Week
    Last Update:
    See Project
  • 22
    GANformer

    GANformer

    Generative Adversarial Transformers

    ...The network employs a bipartite structure that enables long-range interactions across the image, while maintaining computation of linearly efficiency, that can readily scale to high-resolution synthesis. The model iteratively propagates information from a set of latent variables to the evolving visual features and vice versa, to support the refinement of each in light of the other and encourage the emergence of compositional representations of objects and scenes. In contrast to the classic transformer architecture, it utilizes multiplicative integration that allows flexible region-based modulation and can thus be seen as a generalization of the successful StyleGAN network. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    GiantMIDI-Piano

    GiantMIDI-Piano

    Classical piano MIDI dataset

    GiantMIDI-Piano is a large-scale symbolic classical piano music dataset built by applying the piano_transcription system on a vast collection of piano performance recordings. The dataset contains thousands of piano works, spanning a large number of composers and styles, with each piece transcribed into high-precision MIDI files capturing note events, pedal usage, velocities, etc. It provides a resource for music information retrieval (MIR), symbolic music modeling, composer classification,...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 24
    PaddleGAN

    PaddleGAN

    PaddlePaddle GAN library, including lots of interesting applications

    PaddlePaddle GAN library, including lots of interesting applications like First-Order motion transfer, Wav2Lip, picture repair, image editing, photo2cartoon, image style transfer, GPEN, and so on. PaddleGAN provides developers with high-performance implementation of classic and SOTA Generative Adversarial Networks, and supports developers to quickly build, train and deploy GANs for academic, entertainment, and industrial usage. GAN-Generative Adversarial Network, was praised by "the Father...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    VoiceFixer

    VoiceFixer

    General Speech Restoration

    VoiceFixer is a machine-learning framework for “speech restoration”: given a degraded or distorted audio recording — with noise, clipping, low sampling rate, reverberation, or other artifacts — it attempts to recover high-fidelity, clean speech. The architecture works in two stages: first an analysis stage that tries to extract “clean” intermediate features from the noisy audio (e.g. removing noise, denoising, dereverberation, upsampling), and then a neural vocoder-based synthesis stage that...
    Downloads: 2 This Week
    Last Update:
    See Project
Auth0 Logo