Showing 41 open source projects for "image text input"

  • 1
    CogVideo

    Text and image to video generation: CogVideoX (2024) and CogVideo

    CogVideo is an open source text-/image-/video-to-video generation project that hosts the CogVideoX family of diffusion-transformer models and end-to-end tooling. The repo includes SAT and Diffusers implementations, turnkey demos, and fine-tuning pipelines (including LoRA) designed to run across a wide range of NVIDIA GPUs, from desktop cards (e.g., RTX 3060) to data-center hardware (A100/H100).
    Downloads: 19 This Week
  • 2
    ImageReward

    [NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences

    ImageReward is the first general-purpose human preference reward model (RM) designed for evaluating text-to-image generation, introduced alongside the NeurIPS 2023 paper ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation. Trained on 137k expert-annotated image pairs, ImageReward significantly outperforms existing scoring methods like CLIP, Aesthetic, and BLIP in capturing human visual preferences. It is provided as a Python package (image-reward) that enables quick scoring of generated images against textual prompts, with APIs for ranking, scoring, and filtering outputs. ...
    Downloads: 2 This Week
  • 3
    Eisvogel

    A pandoc LaTeX template to convert markdown files to PDF or LaTeX

    A clean pandoc LaTeX template to convert your markdown files to PDF or LaTeX. It is designed for lecture notes and exercises with a focus on computer science. The template is compatible with Pandoc 3. Alternatively, if you don't want to install LaTeX, you can use the Docker image named pandoc/extra. The image contains pandoc, LaTeX, and a curated selection of components such as the eisvogel template, pandoc filters, and open source fonts.
    Downloads: 0 This Week
  • 4
    Everywhere

    Context-aware desktop AI assistant that understands screen content

    Everywhere is a context-aware desktop AI assistant designed to interact directly with the content displayed on a user’s screen. It distinguishes itself from traditional AI tools by eliminating the need for manual input methods such as copying text or taking screenshots, instead allowing users to invoke assistance instantly through a shortcut. It can analyze on-screen information in real time and provide contextual responses, making it useful for tasks like troubleshooting errors, summarizing articles, translating text, and refining written content. ...
    Downloads: 0 This Week
  • 5
    VisualGLM-6B

    Chinese and English multimodal conversational language model

    VisualGLM-6B is an open-source multimodal conversational language model developed by ZhipuAI that supports both images and text in Chinese and English. It builds on the ChatGLM-6B backbone, with 6.2 billion language parameters, and incorporates a BLIP2-Qformer visual module to connect vision and language. In total, the model has 7.8 billion parameters. Trained on a large bilingual dataset — including 30 million high-quality Chinese image-text pairs from CogView and 300 million English pairs — VisualGLM-6B is designed for image understanding, description, and question answering. ...
    Downloads: 0 This Week
  • 6
    Generative AI for Beginners (Version 3)

    21 Lessons, Get Started Building with Generative AI

    ...Lessons are split into “Learn” modules for core concepts and “Build” modules with hands-on code in Python and TypeScript, so you can jump in at any point that matches your goals. The course covers everything from model selection, prompt engineering, and chat/text/image app patterns to secure development practices and UX for AI. It also walks through modern application techniques such as function calling, RAG with vector databases, working with open source models, agents, fine-tuning, and using SLMs. Each lesson includes a short video, a written guide, runnable samples for Azure OpenAI, the GitHub Marketplace Model Catalog, and the OpenAI API, plus a “Keep Learning” section for deeper study.
    Downloads: 6 This Week
  • 7
    Mesh R-CNN

    Code for Mesh R-CNN (ICCV 2019)

    ...Unlike voxel-based or point-based approaches, Mesh R-CNN uses a differentiable mesh representation, allowing it to efficiently refine surface geometry while maintaining high spatial detail. The system combines 2D detection from Mask R-CNN with 3D reasoning modules that output full mesh reconstructions aligned with the input image. It has been evaluated on datasets such as Pix3D, where it demonstrates state-of-the-art performance in reconstructing real-world object geometry.
    Downloads: 0 This Week
  • 8
    Rig

    Rust framework for building modular and scalable LLM-powered apps

    ...Rig includes built-in support for agent workflows, allowing systems to perform multi-turn reasoning, tool calling, and retrieval-based tasks within structured pipelines. It also supports capabilities such as text generation, embeddings, transcription, image generation, and audio generation depending on the provider used. Developers can integrate language models into their software with minimal boilerplate while maintaining flexibility for complex AI workflows.
    Downloads: 0 This Week
  • 9
    RuoYi AI

    Enterprise AI platform for building, deploying, and managing apps

    RuoYi AI is a full-stack enterprise-oriented AI development platform designed to help developers rapidly build, deploy, and manage intelligent applications using modern large language models and AI ecosystems. It provides a unified framework for integrating multiple AI models from different providers, allowing teams to switch or combine models through a consistent interface without vendor lock-in. RuoYi AI includes built-in support for retrieval-augmented generation, enabling organizations...
    Downloads: 0 This Week
  • 10
    AI File Sorter

    Local AI file organization with categorization and rename suggestions

    AI File Sorter is a cross-platform desktop application that uses AI (local LLMs run on your computer) to organize files and suggest meaningful file names based on real content, not just filenames or extensions. The app can analyze images locally and propose descriptive rename suggestions (for example, IMG_2048.jpg → clouds_over_lake.jpg). It can also analyze document text to improve categorization and renaming. Supported formats include PDF, DOCX, XLSX, PPTX, ODT, ODS, ODP, and common...
    Downloads: 218 This Week
  • 11
    bldndx

    Create webpages for very large collections of images and descriptions

    Bldndx helps organize and describe large collections of images. It is great for preserving information about images for others to read and enjoy. Bldndx keeps descriptions in a text file, which it uses to generate HTML webpages about the image collections. A web browser is ideal for viewing images next to their descriptions. Each webpage shows medium-resolution images with a title added at the top and links to high-resolution images for added detail. The octfont reviews serve as a question-and-answer area for bldndx and other josephms SourceForge projects and are checked regularly. ...
    Downloads: 2 This Week
  • 12
    Snowmix

    Video mixer for mixing live and recorded video and audio feeds

    ...Control via both the CLI and TCP connections. Video inputs and outputs can be handled through GStreamer pipelines or the GStreamer shmsrc/shmsink API. Supported on Ubuntu, Mint, Debian, Alma, CentOS, Fedora, Rocky, Mageia, Manjaro, MX Linux, openSUSE, EndeavourOS and macOS/OS X. Free support is available in the discussion forum. See Snowmix in action on YouTube: http://www.youtube.com/user/Snowmix4video
    Downloads: 4 This Week
  • 13
    LPub3D

    LDraw™ editor for LEGO® style digital building instructions.

    LPub3D is an open source WYSIWYG editing application for creating LEGO® style digital building instructions. LPub3D is developed and maintained by Trevor SANDY. It uses the LDraw™ parts library, the most comprehensive library of digital open source LEGO® bricks available (www.ldraw.org), and reads the LDraw LDR and MPD model file formats. LPub3D is available for free under the GNU General Public License v3 and runs on Windows, Linux and macOS. LPub3D is also...
    Downloads: 30 This Week
  • 14
    CogView

    Text-to-image generation: the repo for the NeurIPS 2021 paper

    CogView is a large-scale pretrained text-to-image transformer model, introduced in the NeurIPS 2021 paper CogView: Mastering Text-to-Image Generation via Transformers. With 4 billion parameters, it was one of the earliest transformer-based models to successfully generate high-quality images from natural language descriptions in Chinese, with partial support for English via translation.
    Downloads: 2 This Week
  • 15
    command-output-to-html-table

    A shell script to convert any file or command output into an HTML table

    A video on the project page shows how to convert any file or command output into a nice HTML table in less than 5 minutes. The output HTML file can then be browsed from any location, using a local web server or an internet domain. Usage examples (type them in a terminal):
        cd ~/Downloads/tabulate   # location
        chmod +x *.sh
        cat "student_marks.csv" | { cat ; echo ; } | ./tabulate.sh -d "," -t "My School" -h "First Term" > "marks.html"   # or > "/var/www/html/marks.html"
    -d specifies...
    Downloads: 13 This Week
  • 16
    Consistent Depth

    We estimate dense, flicker-free, geometrically consistent depth

    ...The system builds upon traditional structure-from-motion (SfM) techniques to provide geometric constraints while integrating a convolutional neural network trained for single-image depth estimation. During inference, the model fine-tunes itself to align with the geometric constraints of a specific input video, ensuring stable and realistic depth maps even in less-constrained regions. This approach achieves improved geometric consistency and visual stability compared to prior monocular reconstruction methods. ...
    Downloads: 0 This Week
  • 17
    LaTeX Reference Card Creator

    A Makefile-based build system for creating LaTeX reference cards

    LaTeX Reference Card Creator is a Makefile-based build system for creating reference cards. It compiles content into PDF, DjVu, TeX DVI, HTML and PostScript output formats and produces a three-column reference card. Features include batch image format conversion, spell checking, broken link checking, automatic backups and .zip and .tar.gz distribution building. LaTeX Reference Card Creator provides many LaTeX examples which can be used to make a reference card.
    Downloads: 0 This Week
  • 18
    PWMScan

    A Web-based genome-wide Position Weight Matrix (PWM) Scanner

    ...The TF binding score for a given k-mer sequence is then obtained by simply adding up the base-specific scores at respective positions of the binding site. PWMScan takes as input a PWM, the background probabilities for the letters of the DNA alphabet, and a threshold score or a p-value. The search is carried out across the entire genome sequence. It can accept PWMs, such as those available in the Transfac or Jaspar databases as well as plain-text PWMs. It computes all occurrences of the PWM in the genome sequence for a given p-value threshold or cut-off. ...
    Downloads: 8 This Week
  • 19
    Create Website From Text File

    A website builder script to create a website from a text file

    A video on the project page shows this website builder script in action. In less than 5 minutes, you can create a website or webpage from a text file using the script included in the zip file available for download. A custom Puppy Linux operating system has been created for running the script on various client computers. You can download it here: https://sourceforge.net/projects/command-output-to-html-table/files/OS/ Wherever possible, give...
    Downloads: 0 This Week
  • 20
    aioulinux

    Linux for Arduino and maker developers

    Hello, I'm the Aioulinux founder, eager to professionally revive the project. Since 2018, the demand for an IoT and Arduino-tailored environment has been evident. Seeking partners for a 2024 version targeting schools and IoT companies, aiming for a secure and comprehensive platform. If you share this vision and wish to collaborate, reach out. Let's revive Aioulinux stronger than ever! Now seeking partners: Live Distro Specialist: Expert in live distributions to ensure...
    Downloads: 1 This Week
  • 21
    Emacs Anywhere

    Configurable automation + hooks called with application information

    Emacs Anywhere provides configurable automation and hooks containing window information, so you can bust moves anywhere in a quick, customizable fashion. On macOS, open System Preferences and navigate to Keyboard > Shortcuts > Services. Check the box beside "Emacs Anywhere", click "Add Shortcut" and key in a shortcut. On Linux, Emacs Anywhere requires an Xorg session (Xorg is the display server, not a window manager). On Ubuntu, you can switch sessions at the login screen by clicking the cog icon and selecting Xorg.
    Downloads: 0 This Week
  • 22
    Bash Shell Scripting in a Minute

    Learn Bash Shell Scripting in a Minute using this Collection of Scripts

    Learn Bash shell scripting in a minute using this collection of scripts, which are real-world examples ranging from a simple date-based theme to standalone kiosk-type user input forms. The learning is based on an "Observe and Understand" strategy: you run these scripts, observe their outputs, understand the scripts that generate them, and learn the whole thing. These scripts cover almost all the basics of Bash shell scripting and summarize them in the last script for user...
    Downloads: 0 This Week
  • 23
    Antorca
    A performance- and usability-focused Linux distribution based on 64-bit Debian testing. It is the successor to illume OS. To use the live ISO image, log in as "root" with the password "antorca".
    Downloads: 0 This Week
  • 24
    Expose

    Expose

    A simple static site generator for photoessays

    ...Implemented as a Bash script, it converts directories of media files into cleanly structured static websites with built-in themes. By default, it includes both a blog-style layout and a Medium-inspired theme, but users can also build their own templates. Expose reads associated text files, YAML metadata, and folder structures to automatically generate navigation menus, captions, and styling for each gallery. It supports image and video customization through ImageMagick and FFmpeg, enabling batch effects, filters, watermarks, and even video stabilization. With minimal setup, users can transform raw media collections into polished photoessays suitable for personal portfolios, storytelling, or lightweight publishing.
    Downloads: 4 This Week
  • 25
    update hosts

    The latest available Google hosts file (domestic mirror)

    It is recommended to use the application to obtain the latest hosts file automatically. To replace it manually, open the system hosts file with a text editor (such as Notepad++), copy the entire content of the downloaded hosts file into it, and save it. When replacing hosts manually, it is recommended to clear the original content of the hosts file first and then paste the new content. After the hosts file is replaced, the new records may not take effect immediately. You can turn the network off and on, or enable and disable airplane mode, to...
    Downloads: 0 This Week
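The PWMScan entry describes scoring a k-mer by adding up the base-specific scores at each position of the binding site, then reporting every genome position that passes a cut-off. The Python sketch below illustrates that idea under stated assumptions: the PWM is a hypothetical in-memory list of per-position dictionaries converted to log2-odds against the background letter probabilities, only the forward strand is scanned, and a raw score threshold stands in for PWMScan's p-value option. The function names and the toy motif are illustrative, not PWMScan's interface or file format.

```python
import math
from typing import Dict, List, Tuple

# One dict per binding-site position, mapping each DNA letter to a score.
# (Assumed in-memory representation, not PWMScan's input file format.)
PWM = List[Dict[str, float]]
ALPHABET = "ACGT"


def log_odds_pwm(freqs: List[Dict[str, float]], background: Dict[str, float]) -> PWM:
    """Turn per-position letter probabilities into log2-odds scores
    relative to the background letter probabilities."""
    return [
        {base: math.log2(column[base] / background[base]) for base in ALPHABET}
        for column in freqs
    ]


def score_kmer(pwm: PWM, kmer: str) -> float:
    """Binding score: add up the base-specific score at each position."""
    return sum(column[base] for column, base in zip(pwm, kmer))


def scan(pwm: PWM, sequence: str, threshold: float) -> List[Tuple[int, str, float]]:
    """Slide the matrix along the forward strand and return
    (0-based start, k-mer, score) for every window scoring >= threshold."""
    k = len(pwm)
    seq = sequence.upper()  # tolerate soft-masked (lowercase) sequence
    hits = []
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if any(base not in ALPHABET for base in kmer):
            continue  # skip windows containing N or other non-ACGT letters
        s = score_kmer(pwm, kmer)
        if s >= threshold:
            hits.append((start, kmer, s))
    return hits


if __name__ == "__main__":
    # Toy 3-position motif favouring "TGA" (hypothetical numbers).
    motif = [
        {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
        {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
        {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    ]
    uniform = {base: 0.25 for base in ALPHABET}
    pwm = log_odds_pwm(motif, uniform)
    for start, kmer, s in scan(pwm, "ACTGACCTGAT", threshold=4.0):
        print(start, kmer, round(s, 2))
```

With a uniform background, a perfect TGA match scores 3 × log2(0.7/0.25) ≈ 4.46, while a single mismatch drops to about 1.65, so the 4.0 cut-off keeps only the two exact matches, at positions 2 and 7. A genome-wide scanner would also score the reverse complement and map scores to p-values, which this sketch leaves out.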
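The command-output-to-html-table entry pipes a CSV file through tabulate.sh with a delimiter (-d), a title (-t) and a heading (-h) to produce an HTML table. As a rough illustration of the same transformation, here is a Python sketch; it is not the project's shell script. Assumptions: the first row holds column names, the title goes into the page's title element and an h1, the heading into an h2, and the positional command-line arguments are hypothetical.

```python
import csv
import html
import io
import sys
from typing import List


def to_html_table(text: str, delimiter: str = ",", title: str = "", heading: str = "") -> str:
    """Render delimited text as a small standalone HTML page holding one table.

    Assumption: the first non-empty row contains column names (rendered as <th>).
    """
    rows: List[List[str]] = [
        row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row
    ]
    parts = [
        "<!DOCTYPE html>",
        "<html>",
        '<head><meta charset="utf-8">',
        f"<title>{html.escape(title)}</title></head>",
        "<body>",
    ]
    if title:
        parts.append(f"<h1>{html.escape(title)}</h1>")
    if heading:
        parts.append(f"<h2>{html.escape(heading)}</h2>")
    parts.append('<table border="1">')
    for i, row in enumerate(rows):
        tag = "th" if i == 0 else "td"  # header cells for the first row only
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts += ["</table>", "</body>", "</html>"]
    return "\n".join(parts)


if __name__ == "__main__":
    # Hypothetical positional arguments: DELIMITER TITLE HEADING (all optional).
    args = sys.argv[1:] + ["", "", ""]
    delimiter = args[0] or ","
    sys.stdout.write(to_html_table(sys.stdin.read(), delimiter, args[1], args[2]) + "\n")
```

For example: python to_html_table.py "," "My School" "First Term" < student_marks.csv > marks.html (the script file name is hypothetical). Using the csv module keeps quoted fields that contain the delimiter in a single cell, and html.escape stops characters such as < and & in the data from breaking the page.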