image text input free download

Transformers

State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX

...Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities: text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages; images, for tasks like image classification, object detection, and segmentation; and audio, for tasks like speech recognition and audio classification. Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on our model hub. ...

Downloads: 1 This Week

Last Update: 3 days ago

ImageBind

ImageBind: One Embedding Space to Bind Them All

ImageBind is a multimodal embedding framework that learns a shared representation space across six modalities—images, text, audio, depth, thermal, and IMU (inertial motion) data—without requiring explicit pairwise training for every modality combination. Instead of aligning each pair independently, ImageBind uses image data as the central binding modality, aligning all other modalities to it so they can interoperate zero-shot. This creates a unified embedding space where representations from any modality can be compared or retrieved against any other (e.g., matching sound to text or depth to image). ...
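The cross-modal retrieval idea described above can be illustrated with a minimal sketch. This is not ImageBind's API: the vectors below are hypothetical toy embeddings written by hand, and the helper names `cosine_similarity` and `retrieve` are assumptions made for illustration. The sketch only shows that once every modality lives in one shared space, matching an audio clip to a text label reduces to a nearest-neighbor lookup.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query, candidates):
    """Return the key in `candidates` whose vector is most similar to `query`."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))

# Hypothetical toy embeddings; a real model would produce these vectors.
audio_embedding = [0.9, 0.1, 0.0]  # stands in for an audio clip of a dog barking
text_embeddings = {
    "a dog": [1.0, 0.0, 0.1],
    "a car": [0.0, 1.0, 0.0],
    "rain": [0.1, 0.0, 1.0],
}

# Audio-to-text retrieval is a similarity search in the shared space.
best_match = retrieve(audio_embedding, text_embeddings)
print(best_match)
```

Because all modalities share one space, the same `retrieve` call would work unchanged for depth-to-image or text-to-audio lookups, given embeddings for those modalities.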

Downloads: 0 This Week

Last Update: 2025-11-21

AutoGluon

AutoGluon: AutoML for Image, Text, and Tabular Data

...Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use-case. AutoGluon is modularized into sub-modules specialized for tabular, text, or image data. You can reduce the number of dependencies required by solely installing a specific sub-module via: python3 -m pip install <submodule>.

Downloads: 0 This Week

Last Update: 2025-12-19

DocArray

The data structure for multimodal data

DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. It is a gateway to the multimodal world: a highly expressive data structure for representing complicated, mixed, or nested text, image, video, audio, and 3D mesh data. It is the foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt, etc. ...

Downloads: 0 This Week

Last Update: 2025-03-21

Make-A-Video - Pytorch (wip)

Implementation of Make-A-Video, a new SOTA text-to-video generator

...The gist of the paper is this: take a SOTA text-to-image model (here they use DALL-E 2, but the same learning points would easily apply to Imagen), make a few minor modifications for attention across time and other ways to save on compute cost, do frame interpolation correctly, and get a great video model out. When passing in images (if one were to pretrain on images first), both temporal convolution and attention are automatically skipped.

Downloads: 0 This Week

Last Update: 2024-05-03

Raster Vision

Open source framework for deep learning on satellite and aerial imagery

Raster Vision is an open source framework for Python developers building computer vision models on satellite, aerial, and other large imagery sets (including oblique drone imagery). There is built-in support for chip classification, object detection, and semantic segmentation using PyTorch. Raster Vision allows engineers to quickly and repeatably configure pipelines that go through core components of a machine learning workflow: analyzing training data, creating training chips, training...

Downloads: 0 This Week

Last Update: 2024-08-30

ktrain

ktrain is a Python library that makes deep learning and AI more accessible

ktrain is a Python library that makes deep learning and AI more accessible and easier to apply. It is a lightweight wrapper for the deep learning library TensorFlow Keras (and other libraries) that helps build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, ktrain is aimed at both newcomers and experienced practitioners. With only a few lines...

Downloads: 0 This Week

Last Update: 2024-06-19

DeepSpeed MII

MII makes low-latency and high-throughput inference possible

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. The Deep Learning (DL) open-source community has seen tremendous growth in the last few months. Incredibly powerful text generation models such as Bloom 176B, or image generation models such as Stable Diffusion, are now available to anyone with access to a handful of GPUs or even a single GPU through platforms such as Hugging Face. While open-sourcing has democratized access to AI capabilities, their application is still restricted by two critical factors: inference latency and cost. ...

Downloads: 0 This Week

Last Update: 2025-03-25

Jina

Build cross-modal and multimodal applications on the cloud

...Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP, GraphQL protocols with TLS. Intuitive design pattern for high-performance microservices. Seamless Docker container integration: sharing, exploring, sandboxing, versioning and dependency control via Jina Hub. Fast deployment to Kubernetes, Docker Compose and Jina Cloud. ...

Downloads: 0 This Week

Last Update: 2024-11-12

Deep Daze

Simple command-line tool for text-to-image generation

Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). In true deep learning fashion, more layers will yield better results. The default is 16 layers, but this can be increased to 32 depending on your resources. A technique first devised and shared by Mario Klingemann allows you to prime the generator network with a starting image before it is steered towards the text.

Downloads: 0 This Week

Last Update: 2022-03-13

Interactive Deep Colorization

Deep learning software for colorizing black and white images

Interactive Deep Colorization is a software project for colorizing black-and-white (grayscale) images using deep learning, allowing users to add a few hints (e.g. scribbles) and get a plausible, fully colorized output. The idea is to merge automatic colorization (via neural networks) with optional user guidance — so if the automatic model’s guess isn’t quite right, the user can nudge colors via hints to steer the result, achieving more controlled, satisfying outputs. The project includes...

Downloads: 0 This Week

Last Update: 2025-12-09

PaddlePaddle models

Pre-trained and Reproduced Deep Learning Models

Pre-trained and reproduced deep learning models (the official "Flying Paddle" model library, including a variety of deep learning models from the academic frontier and models verified in industrial scenarios). Flying Paddle's industrial-level model library includes a large number of mainstream models that have been polished through long industrial practice, as well as models that have won championships in international competitions. It covers many scenarios, such as semantic understanding, image classification, object detection, image segmentation, text recognition, and speech synthesis, and provides an end-to-end development kit that meets the needs of enterprises for low-cost development and rapid integration. The model library is tailored around the actual R&D process of domestic enterprises, serving enterprises in many fields such as energy, finance, industry, and agriculture.

Downloads: 0 This Week

Last Update: 2022-08-01

Consistent Depth

We estimate dense, flicker-free, geometrically consistent depth

...The system builds upon traditional structure-from-motion (SfM) techniques to provide geometric constraints while integrating a convolutional neural network trained for single-image depth estimation. During inference, the model fine-tunes itself to align with the geometric constraints of a specific input video, ensuring stable and realistic depth maps even in less-constrained regions. This approach achieves improved geometric consistency and visual stability compared to prior monocular reconstruction methods. ...

Downloads: 0 This Week

Last Update: 5 days ago

Tensor2Tensor

Library of deep learning models and datasets

Deep Learning (DL) has enabled the rapid advancement of many useful technologies, such as machine translation, speech recognition and object detection. In the research community, one can find code open-sourced by the authors to help in replicating their results and further advancing deep learning. However, most of these DL systems use unique setups that require significant engineering effort and may only work for a specific problem or architecture, making it hard to run new experiments and...

Downloads: 0 This Week

Last Update: 2021-05-24

Search Results for "image text input"

Showing 14 open source projects for "image text input"
