Showing 104 open source projects for "open source png text"

View related business solutions
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 1
    text-extract-api

    text-extract-api

    Document (PDF, Word, PPTX ...) extraction and parse API

    text-extract-api is an open-source service designed to extract readable text from a wide variety of document formats through a simple API interface. The project focuses on converting complex files such as PDFs, images, scanned documents, and office files into structured plain text that can be processed by downstream applications or language models.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Text-to-LoRA (T2L)

    Text-to-LoRA (T2L)

    Hypernetworks that adapt LLMs for specific benchmark tasks

    Text-to-LoRA is a research project that introduces a method for dynamically adapting large language models using hypernetworks that generate LoRA parameters directly from textual descriptions. Instead of training a new LoRA adapter for every task or dataset, the system can produce task-specific adaptations based solely on a text description of the desired capability. This approach enables models to rapidly internalize new contextual knowledge without performing traditional fine-tuning steps....
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    LLaMA-Mesh

    LLaMA-Mesh

    Unifying 3D Mesh Generation with Language Models

    LLaMA-Mesh is a research framework that extends large language models so they can understand and generate 3D mesh data alongside text. The system introduces a method for representing 3D meshes in a textual format by encoding vertex coordinates and face definitions as sequences that can be processed by a language model. By serializing 3D geometry into text tokens, the approach allows existing transformer architectures to generate and interpret 3D models without requiring specialized visual...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    Qwen3-Omni

    Qwen3-Omni

    Qwen3-omni is a natively end-to-end, omni-modal LLM

    ...It achieves state-of-the-art results: across 36 audio and audio-visual benchmarks, it hits open-source SOTA on 32 and overall SOTA on 22, outperforming or matching strong closed-source models such as Gemini-2.5 Pro and GPT-4o. To reduce latency, especially in audio/video streaming, Talker predicts discrete speech codecs via a multi-codebook scheme and replaces heavier diffusion approaches.
    Downloads: 1 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 5
    LLM-Aided OCR Project

    LLM-Aided OCR Project

    Enhances Tesseract OCR output using LLMs (local or API)

    LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on context. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    Qwen-Image

    Qwen-Image

    Qwen-Image is a powerful image generation foundation model

    Qwen-Image is a powerful 20-billion parameter foundation model designed for advanced image generation and precise editing, with a particular strength in complex text rendering across diverse languages, especially Chinese. Built on the MMDiT architecture, it achieves remarkable fidelity in integrating text seamlessly into images while preserving typographic details and layout coherence. The model excels not only in text rendering but also in a wide range of artistic styles, including...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 7
    Qwen3 Embedding

    Qwen3 Embedding

    Designed for text embedding and ranking tasks

    Qwen3-Embedding is a model series from the Qwen family designed specifically for text embedding and ranking tasks. It builds upon the Qwen3 base/dense models and offers several sizes (0.6B, 4B, 8B parameters), for both embedding and reranking, with high multilingual capability, long‐context understanding, and reasoning. It achieves state-of-the-art performance on benchmarks like MTEB (Multilingual Text Embedding Benchmark) and supports instruction-aware embedding (i.e. embedding task...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 8
    kg-gen

    kg-gen

    Knowledge Graph Generation from Any Text

    kg-gen is an open-source framework developed by the STAIR Lab that automatically generates knowledge graphs from unstructured text using large language models. The system is designed to transform plain text sources such as documents, articles, or conversation transcripts into structured graphs composed of entities and relationships. Instead of relying on traditional rule-based extraction techniques, KG-Gen uses language models to identify entities and their relationships, producing higher-quality graph structures from raw text.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    vim-ai

    vim-ai

    AI-powered code assistant for Vim. OpenAI and ChatGPT plugin for Vim

    vim-ai is an AI-powered assistant plugin for Vim and Neovim that brings language-model features directly into the editor. It allows users to generate code or text, edit selections in place, and carry on interactive chat-style conversations without leaving the terminal editing environment. The plugin is built around OpenAI-compatible APIs, which means it can work not only with OpenAI itself but also with compatible proxies and alternative providers. Its command set covers text completion,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    NarratoAI

    NarratoAI

    Using AI models to automatically provide commentary and edit videos

    NarratoAI is an open-source platform designed to automate the generation of narrative content using artificial intelligence. The system combines large language models with media processing capabilities to create scripts, stories, and structured narrative outputs from user inputs. NarratoAI supports workflows where users provide prompts, themes, or source materials, and the software organizes them into coherent narrative structures suitable for articles, scripts, or multimedia storytelling. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 11
    SkillOpt

    SkillOpt

    Text-space optimizer that trains reusable natural-language skills

    SkillOpt is a Microsoft research project for improving frozen LLM agents by optimizing reusable natural-language skill documents. Instead of changing model weights, it treats a compact skill file as the trainable state of the agent. The system learns from agent rollouts, reflection, bounded edits, and validation gates to produce better instructions over time. Its output is a deployable best_skill.md artifact that can be reused across agent tasks. The project is focused on making agents more...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12
    Qwen-2.5-VL

    Qwen-2.5-VL

    Qwen2.5-VL is the multimodal large language model series

    ...The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation (exceeding 8,000 tokens), and structured data comprehension, such as tables and JSON formats. They support context lengths up to 128,000 tokens and offer multilingual capabilities in over 29 languages, including Chinese, English, French, Spanish, and more. The models are open-source under the Apache 2.0 license, with resources and documentation available on platforms like Hugging Face and ModelScope.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 13
    NExT-GPT

    NExT-GPT

    Code and models for ICML 2024 paper, NExT-GPT

    NExT-GPT is an open-source research framework that implements an advanced multimodal large language model capable of understanding and generating content across multiple modalities. Unlike traditional models that primarily handle text, NExT-GPT supports input and output combinations involving text, images, video, and audio in a unified architecture.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    GLM-4-Voice

    GLM-4-Voice

    GLM-4-Voice | End-to-End Chinese-English Conversational Model

    GLM-4-Voice is an open-source speech-enabled model from ZhipuAI, extending the GLM-4 family into the audio domain. It integrates advanced voice recognition and generation with the multimodal reasoning capabilities of GLM-4, enabling smooth natural interaction via spoken input and output. The model supports real-time speech-to-text transcription, spoken dialogue understanding, and text-to-speech synthesis, making it suitable for conversational AI, virtual assistants, and accessibility applications. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    StarVector

    StarVector

    StarVector is a foundation model for SVG generation

    StarVector is a multimodal foundation model designed for generating Scalable Vector Graphics (SVG) from images or textual descriptions. The system treats vector graphics creation as a code generation problem, producing SVG code that can render detailed vector images. Its architecture combines computer vision techniques with language modeling capabilities so it can understand visual inputs and textual prompts simultaneously. The model converts raster images or text instructions into...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 16
    Train LLM From Scratch

    Train LLM From Scratch

    A straightforward method for training your LLM

    Train LLM From Scratch is an educational PyTorch project that shows how to build and train a transformer-based language model from the ground up. It is based on the architecture described in Attention Is All You Need and is designed to make the training pipeline understandable rather than hidden behind a large framework. The repository walks through the process from downloading data to generating text with a trained model. It supports training smaller or larger models, including million- and...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 17
    LlamaGen

    LlamaGen

    Autoregressive Model Beats Diffusion

    LlamaGen is an open-source research project that introduces a new approach to image generation by applying the autoregressive next-token prediction paradigm used in large language models to visual generation tasks. Instead of relying on diffusion models, the framework treats images as sequences of tokens that can be generated progressively using transformer architectures similar to those used for text generation.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    NVIDIA NeMo

    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Qwen2.5-Omni

    Qwen2.5-Omni

    Capable of understanding text, audio, vision, video

    Qwen2.5-Omni is an end-to-end multimodal flagship model in the Qwen series by Alibaba Cloud, designed to process multiple modalities (text, images, audio, video) and generate responses both as text and natural speech in streaming real-time. It supports “Thinker-Talker” architecture, and introduces innovations for aligning modalities over time (for example synchronizing video/audio), robust speech generation, and low-VRAM/quantized versions to make usage more accessible. It holds...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    LiteLLM

    LiteLLM

    lightweight package to simplify LLM API calls

    Call all LLM APIs using the OpenAI format [Anthropic, Huggingface, Cohere, Azure OpenAI etc.] liteLLM supports streaming the model response back, pass stream=True to get a streaming iterator in response. Streaming is supported for OpenAI, Azure, Anthropic, and Huggingface models.
    Downloads: 14 This Week
    Last Update:
    See Project
  • 21
    LongWriter

    LongWriter

    Unleashing 10,000+ Word Generation from Long Context LLMs

    LongWriter is an open-source framework and set of large language models designed to enable ultra-long text generation that can exceed 10,000 words while maintaining coherence and structure. Traditional large language models can process large inputs but often struggle to generate long outputs due to limitations in training data and alignment strategies.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 22
    CogView4

    CogView4

    CogView4, CogView3-Plus and CogView3(ECCV 2024)

    CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets, enabling stronger alignment between textual prompts and generated visual content. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    MarkPDFDown

    MarkPDFDown

    A high-quality PDF to Markdown tool based on large language model

    MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Kor

    Kor

    LLM

    This is a half-baked prototype that “helps” you extract structured data from text using LLMs. Specify the schema of what should be extracted and provide some examples. Kor will generate a prompt, send it to the specified LLM and parse out the output. You might even get results back.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    MiniMax-01

    MiniMax-01

    Large-language-model & vision-language-model based on Linear Attention

    MiniMax-01 is the official repository for two flagship models: MiniMax-Text-01, a long-context language model, and MiniMax-VL-01, a vision-language model built on top of it. MiniMax-Text-01 uses a hybrid attention architecture that blends Lightning Attention, standard softmax attention, and Mixture-of-Experts (MoE) routing to achieve both high throughput and long-context reasoning. It has 456 billion total parameters with 45.9 billion activated per token and is trained with advanced parallel...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • 4
  • 5
  • Next