Showing 498 open source projects for "text to"

View related business solutions
  • Outgrown Windows Task Scheduler? Icon
    Outgrown Windows Task Scheduler?

    Free diagnostic identifies where your workflow is breaking down—with instant analysis of your scheduling environment.

    Windows Task Scheduler wasn't built for complex, cross-platform automation. Get a free diagnostic that shows exactly where things are failing and provides remediation recommendations. Interactive HTML report delivered in minutes.
    Download Free Tool
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 1
    DeerFlow

    DeerFlow

    Deep Research framework, combining language models with tools

    DeerFlow is an open-source, community-driven “deep research” framework / multi-agent orchestration platform developed by ByteDance. It aims to combine the reasoning power of large language models (LLMs) with automated tool-use — such as web search, web crawling, Python execution, and data processing — to enable complex, end-to-end research workflows. Instead of a monolithic AI assistant, DeerFlow defines multiple specialized agents (e.g. “planner,” “searcher,” “coder,” “report generator”)...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    ticket

    ticket

    Fast, powerful, git-native ticket tracking in a single bash script

    ticket is a lightweight, git-native ticket management tool implemented as a single Bash script that brings powerful issue tracking directly into your Git workflows without requiring a database or complex setup. It stores each ticket as a Markdown file with YAML frontmatter, making them human-readable and easy to version control alongside your code, while also allowing IDEs to jump straight to ticket definitions. The CLI provides common subcommands to create, list, edit, close, and manage...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 3
    HumanEval

    HumanEval

    Code for the paper "Evaluating Large Language Models Trained on Code"

    human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a standard for evaluating code generation models, including those in the Codex and GPT families. Researchers can use the dataset to run reproducible comparisons across models and track improvements in functional code synthesis. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    GPT-2

    GPT-2

    Code for the paper Language Models are Unsupervised Multitask Learners

    This repository contains the code and model weights for GPT-2, a large-scale unsupervised language model described in the OpenAI paper “Language Models are Unsupervised Multitask Learners.” The intent is to provide a starting point for researchers and engineers to experiment with GPT-2: generate text, fine‐tune on custom datasets, explore model behavior, or study its internal phenomena. The repository includes scripts for sampling, training, downloading pre-trained models, and utilities for tokenization and model handling. Support for memory-saving gradient techniques/optimizations during training. Sampling/generation scripts (conditional, unconditional, interactive).
    Downloads: 10 This Week
    Last Update:
    See Project
  • Atera all-in-one platform IT management software with AI agents Icon
    Atera all-in-one platform IT management software with AI agents

    Ideal for internal IT departments or managed service providers (MSPs)

    Atera’s AI agents don’t just assist, they act. From detection to resolution, they handle incidents and requests instantly, taking your IT management from automated to autonomous.
    Learn More
  • 5
    1D Visual Tokenization and Generation

    1D Visual Tokenization and Generation

    This repo contains the code for 1D tokenizer and generator

    ...This compact representation makes sampling and generation many times faster compared to previous tokenization methods (claims of ~410× speedups relative to heavyweight models) while still producing competitive image quality. The repo also bundles a full generative modeling pipeline (e.g. the framework “MaskGen” / “TA-TiTok”) that demonstrates how this 1D tokenizer can be used in text-to-image generation or image reconstruction tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    NVIDIA NeMo Framework

    NVIDIA NeMo Framework

    Scalable generative AI framework built for researchers and developers

    NVIDIA NeMo is a scalable, cloud-native generative AI framework aimed at researchers and PyTorch developers working on large language models, multimodal models, and speech AI (ASR and TTS), with growing support for computer vision. It provides collections of domain-specific modules and reference implementations that make it easier to pre-train, fine-tune, and deploy very large models on multi-GPU and multi-node infrastructure. NeMo 2.0 introduces a Python-based configuration system,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Qwen2-Audio

    Qwen2-Audio

    Repo of Qwen2-Audio chat & pretrained large audio language model

    ...It is trained to accept various audio signal inputs (including speech, sounds, etc.) and perform both voice chat and audio analysis, producing textual responses. It supports two major modes: Voice Chat (interactive voice only input) and Audio Analysis (audio + text instructions), with both base and instruction-tuned models. It is evaluated on many benchmarks (speech recognition, translation, sound classification, emotion, etc.), and offers pretrained models (e.g. 7B) released via ModelScope and Hugging Face. Code & examples provided with Hugging Face transformers, and usage via AutoProcessor, model classes etc. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    TextAttack

    TextAttack

    Python framework for adversarial attacks, and data augmentation

    Generating adversarial examples for NLP models. TextAttack is a Python framework for adversarial attacks, data augmentation, and model training in NLP.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    MCP Server OpenDAL

    MCP Server OpenDAL

    Model Context Protocol Server for Apache OpenDAL™

    Model Context Protocol Server for Apache OpenDAL™ is an MCP server implementation that provides access to various storage services via Apache OpenDAL. It enables seamless interactions with multiple storage backends through a unified interface. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • Desktop and Mobile Device Management Software Icon
    Desktop and Mobile Device Management Software

    It's a modern take on desktop management that can be scaled as per organizational needs.

    Desktop Central is a unified endpoint management (UEM) solution that helps in managing servers, laptops, desktops, smartphones, and tablets from a central location.
    Learn More
  • 10
    MindNLP

    MindNLP

    Easy-to-use and high-performance NLP and LLM framework

    MindNLP is a natural language processing library built on the MindSpore framework, providing tools and models for various NLP tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    LightAutoML

    LightAutoML

    Fast and customizable framework for automatic ML model creation

    LightAutoML is an automated machine learning (AutoML) framework optimized for efficient model training and hyperparameter tuning, focusing on both tabular and text data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Evidently

    Evidently

    Evaluate and monitor ML models from validation to production

    Evidently is an open-source Python library for data scientists and ML engineers. It helps evaluate, test, and monitor ML models from validation to production. It works with tabular, text data and embeddings.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Android Use

    Android Use

    Automate native Android apps with AI using accessibility APIs

    ...This approach bypasses expensive vision-based models and provides faster, cheaper automation with fine-grained interaction capabilities (for example, tapping buttons, typing text, navigating screens).
    Downloads: 4 This Week
    Last Update:
    See Project
  • 14
    MegaTTS 3

    MegaTTS 3

    Official PyTorch Implementation

    MegaTTS3 is an open-source text-to-speech (TTS) and voice-cloning system from ByteDance that aims to deliver high-quality, expressive speech synthesis, including zero-shot voice cloning of previously unseen speakers. Its backbone is a lightweight diffusion-transformer (on the order of ~0.45 B parameters), which enables efficient inference while still producing high-fidelity audio.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 15
    Unstructured.IO

    Unstructured.IO

    Open source libraries and APIs to build custom preprocessing pipelines

    The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. unstructured modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and is efficient in transforming unstructured data into structured outputs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Step-Audio 2

    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    ...It integrates a latent-space audio encoder, discrete acoustic tokens, and reinforcement-learning–based training (CoT + RL) to enhance its ability to capture and reproduce voice styles, intonations, and subtle vocal cues. Moreover, Step-Audio2 supports tool-calling and retrieval-augmented generation (RAG), allowing it to access external knowledge sources or audio/text databases, thus reducing hallucinations and improving coherence in complex dialogues.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Vidi2

    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    files-to-prompt

    files-to-prompt

    Concatenate a directory full of files into a single prompt

    ...It includes rich filtering controls, letting you limit by extension, include or skip hidden files, and ignore paths that match glob patterns or .gitignore rules. The output format is flexible: you can emit plain text, Markdown with fenced code blocks, or a Claude-XML style format designed for structured multi-file prompts. It can read file paths from stdin (including NUL-separated paths), which makes it easy to combine with find, rg, or other shell tools.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    Milvus Bootcamp

    Milvus Bootcamp

    Dealing with all unstructured data, such as reverse image search

    Milvus Bootcamp is a collection of tutorials, examples, and best practices for using Milvus, an open-source vector database designed for AI-powered similarity search and retrieval applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Alibi Detect

    Alibi Detect

    Algorithms for outlier, adversarial and drift detection

    Alibi Detect is an open source Python library focused on outlier, adversarial and drift detection. The package aims to cover both online and offline detectors for tabular data, text, images and time series. Both TensorFlow and PyTorch backends are supported for drift detection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    DFlash

    DFlash

    Block Diffusion for Ultra-Fast Speculative Decoding

    DFlash is an open-source framework for ultra-fast speculative decoding using a lightweight block diffusion model to draft text in parallel with a target large language model, dramatically improving inference speed without sacrificing generation quality. It acts as a “drafter” that proposes likely continuations which the main model then verifies, enabling significant throughput gains compared to traditional autoregressive decoding methods that generate token by token.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    PPTAgent

    PPTAgent

    PPTAgent: Generating and Evaluating Presentations

    PPTAgent is a research system for generating and evaluating slide decks that goes beyond simple text-to-slides. It follows a two-stage, edit-based workflow: first it analyzes reference presentations to infer slide roles and structure, then it drafts an outline and iteratively performs editing actions to produce new slides. The project includes both the generation agent and an evaluation framework, PPTEval, to score content quality, design, and coherence.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    Self-Operating Computer

    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    The Self-Operating Computer Framework is an innovative system that enables multimodal models to autonomously operate a computer by interpreting the screen and executing mouse and keyboard actions to achieve specified objectives. This framework is compatible with various multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVa. Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen....
    Downloads: 5 This Week
    Last Update:
    See Project
  • 24
    Kimi-Audio

    Kimi-Audio

    Audio foundation model excelling in audio understanding

    ...Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one system, enabling developers to build rich, multimodal audio applications without stitching together disparate components. It uses a novel model setup that combines continuous acoustic features with discrete semantic tokens to richly capture sound and meaning across speech, music, and environmental audio.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    Acontext

    Acontext

    Context data platform for building observable, self-learning AI agents

    Acontext is a cloud-native context data platform designed to support the development and operation of advanced AI agents. It provides a unified system to store and manage contexts, multimodal messages, artifacts, and task workflows, enabling developers to engineer context effectively for their agent products. The platform observes agent tasks and user feedback in real time, offering robust observability into workflows and helping teams understand how agents perform over time. Acontext also...
    Downloads: 3 This Week
    Last Update:
    See Project