Showing 633 open source projects for "python text"

View related business solutions
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 1
    ChatGPT Academic

    ChatGPT Academic

    ChatGPT extension for scientific research work

    ChatGPT extension for scientific research work, specially optimized academic paper polishing experience, supports custom shortcut buttons, supports custom function plug-ins, supports markdown table display, double display of Tex formulas, complete code display function, new local Python/C++/Go project tree Analysis function/Project source code self-translation ability, newly added PDF and Word document batch summary function/PDF paper full-text translation function. All buttons are dynamically generated by reading functional.py, you can add custom functions at will, and liberate the pasteboard. Support for markdown tables output by GPT. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    Self-Operating Computer

    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    The Self-Operating Computer Framework is an innovative system that enables multimodal models to autonomously operate a computer by interpreting the screen and executing mouse and keyboard actions to achieve specified objectives. This framework is compatible with various multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVa. Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen....
    Downloads: 9 This Week
    Last Update:
    See Project
  • 3
    Qwen-Image-Layered

    Qwen-Image-Layered

    Qwen-Image-Layered: Layered Decomposition for Inherent Editablity

    Qwen-Image-Layered is an extension of the Qwen series of multimodal models that introduces layered image understanding, enabling the model to reason about hierarchical visual structures — such as separating foreground, background, objects, and contextual layers within an image. This architecture allows richer semantic interpretation, enabling use cases such as scene decomposition, object-level editing, layered captioning, and more fine-grained multimodal reasoning than with flat image...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    MiniMax-M1

    MiniMax-M1

    Open-weight, large-scale hybrid-attention reasoning model

    MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    LangChain

    LangChain

    ⚡ Building applications with LLMs through composability ⚡

    Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge. This library is aimed at assisting in the development of those types of applications.
    Downloads: 11 This Week
    Last Update:
    See Project
  • 6
    1D Visual Tokenization and Generation

    1D Visual Tokenization and Generation

    This repo contains the code for 1D tokenizer and generator

    The 1D Visual Tokenization and Generation project from ByteDance introduces a novel “one-dimensional” tokenizer designed for images: instead of representing images with large grids of 2D tokens (as in many prior generative/image-modeling systems), it compresses images into as few as 32 discrete tokens (or more, optionally) — thereby achieving a very compact, efficient representation that drastically speeds up generation and reconstruction while retaining strong fidelity. This compact...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    TEN

    TEN

    Open-source framework for conversational voice AI agents

    TEN (Transformative Extensions Network) is an open source framework designed to empower developers to build real-time multimodal AI agents capable of voice, video, text, image, and data-stream interaction with ultra-low latency. It includes a full ecosystem, TEN Turn Detection, TEN Agent, and TMAN Designer, allowing developers to rapidly assemble human-like, responsive agents that can see, speak, hear, and interact. With support for languages like Python, C++, and Go, it offers flexible deployment on both edge and cloud environments. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    DocArray

    DocArray

    The data structure for multimodal data

    DocArray is a library for nested, unstructured, multimodal data in transit, including text, image, audio, video, 3D mesh, etc. It allows deep-learning engineers to efficiently process, embed, search, recommend, store, and transfer multimodal data with a Pythonic API. Door to multimodal world: super-expressive data structure for representing complicated/mixed/nested text, image, video, audio, 3D mesh data. The foundation data structure of Jina, CLIP-as-service, DALL·E Flow, DiscoArt etc. Data...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DeepAnalyze

    DeepAnalyze

    Autonomous LLM agent for end-to-end data science workflows

    DeepAnalyze is an open source project that introduces an agentic large language model designed to perform autonomous data science tasks from start to finish. It is built to handle the entire data science pipeline, including data preparation, analysis, modeling, visualization, and report generation without requiring continuous human guidance. DeepAnalyze is capable of conducting open-ended data research across multiple data formats such as structured tables, semi-structured files, and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Auth0 B2B Essentials: SSO, MFA, and RBAC Built In Icon
    Auth0 B2B Essentials: SSO, MFA, and RBAC Built In

    Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

    Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.
    Sign Up Free
  • 10
    HunyuanVideo-Foley

    HunyuanVideo-Foley

    Multimodal Diffusion with Representation Alignment

    HunyuanVideo-Foley is a multimodal diffusion model from Tencent Hunyuan for high-fidelity Foley (sound effects) audio generation synchronized to video scenes. It is designed to generate audio that matches both visual content and textual semantic cues, for use in video production, film, advertising, games, etc. The model architecture aligns audio, video, and text representations to produce realistic synchronized soundtracks. Produces high-quality 48 kHz audio output suitable for professional...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Kimi-Audio

    Kimi-Audio

    Audio foundation model excelling in audio understanding

    Kimi-Audio is an ambitious open-source audio foundation model designed to unify a wide array of audio processing tasks — from speech recognition and audio understanding to generative conversation and sound event classification — within a single cohesive architecture. Instead of fragmenting work across specialized models, Kimi-Audio handles automatic speech recognition (ASR), audio question answering, automatic audio captioning, speech emotion recognition, and audio-to-text chat in one...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    YuE

    YuE

    Open source AI model for generating full songs from lyrics prompts

    YuE is an open source project that provides a foundation model designed for full-song music generation using artificial intelligence. It focuses on transforming text inputs such as lyrics and genre prompts into complete musical compositions that include both vocal and instrumental tracks. Unlike many shorter audio generators, the model is capable of producing songs that last several minutes while maintaining coherent musical structure and alignment with the provided lyrics. YuE introduces a...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    Biomni

    Biomni

    Biomni: a general-purpose biomedical AI agent

    Biomni is a general-purpose biomedical AI agent designed to autonomously perform complex research tasks across a wide range of scientific domains, combining language model reasoning with structured planning and execution. It integrates retrieval-augmented generation with code-based execution, allowing it to access external knowledge, process data, and generate testable hypotheses in scientific workflows. The system is built to support researchers by automating repetitive and time-consuming...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    AI-Researcher

    AI-Researcher

    AI-Researcher: Autonomous Scientific Innovation

    AI-Researcher is an open-source system designed to automate complex research tasks end-to-end using large language models and structured workflows, aiming to replicate parts of a human research assistant’s capabilities. It lets users input high-level research goals or questions in natural language and then automatically plans, decomposes, and executes tasks such as literature surveying, summarization, synthesis, experiment design, and draft generation. The system integrates retrieval...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Step-Audio 2

    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. It...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Gemma in PyTorch

    Gemma in PyTorch

    The official PyTorch implementation of Google's Gemma models

    gemma_pytorch provides the official PyTorch reference for running and fine-tuning Google’s Gemma family of open models. It includes model definitions, configuration files, and loading utilities for multiple parameter scales, enabling quick evaluation and downstream adaptation. The repository demonstrates text generation pipelines, tokenizer setup, quantization paths, and adapters for low-rank or parameter-efficient fine-tuning. Example notebooks walk through instruction tuning and evaluation...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    OpenMLSys-ZH

    OpenMLSys-ZH

    Machine Learning Systems: Design and Implementation

    This repository is the Chinese translation (or localization) of the OpenMLSys project documentation. Its aim is to make the technical content, tutorials, architecture descriptions, and user guides of the OpenMLSys system more accessible to Chinese-speaking users. The repo mirrors the structure of the original OpenMLSys docs: sections on system design, API references, deployment instructions, module overviews, and example workflows. It helps bridge language barriers in open machine learning...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Extractous

    Extractous

    Fast and efficient unstructured data extraction

    Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    MLX Engine

    MLX Engine

    LM Studio Apple MLX engine

    MLX Engine is the Apple MLX-based inference backend used by LM Studio to run large language models efficiently on Apple Silicon hardware. Built on top of the mlx-lm and mlx-vlm ecosystems, the engine provides a unified architecture capable of supporting both text-only and multimodal models. Its design focuses on high-performance on-device inference, leveraging Apple’s MLX stack to accelerate computation on M-series chips. The project introduces modular VisionAddOn components that allow image...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    OpenAI Harmony

    OpenAI Harmony

    Renderer for the harmony response format to be used with gpt-oss

    Harmony is a response format developed by OpenAI for use with the gpt-oss model series. It defines a structured way for language models to produce outputs, including regular text, reasoning traces, tool calls, and structured data. By mimicking the OpenAI Responses API, Harmony provides developers with a familiar interface while enabling more advanced capabilities such as multiple output channels, instruction hierarchies, and tool namespaces. The format is essential for ensuring gpt-oss...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    Cleanlab

    Cleanlab

    The standard data-centric AI package for data quality and ML

    cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    minbpe

    minbpe

    Minimal, clean code for the Byte Pair Encoding (BPE) algorithm

    minbpe is a minimal, clean implementation of byte-level Byte Pair Encoding (BPE), the tokenization approach widely used in modern language models. It operates on UTF-8 encoded bytes rather than Unicode characters, which makes it robust to arbitrary text inputs and avoids needing a language-specific character vocabulary. The repository is structured as a teaching-oriented implementation that shows how to train a tokenizer by learning merge rules, then apply those merges to encode text into...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Agent Sprite Forge

    Agent Sprite Forge

    Agent Skill for generating 2D sprite sheets and map, transparent PNG

    Agent Sprite Forge is an AI-powered asset generation toolkit designed to create 2D game sprites, transparent PNG frames, animated GIFs, and sprite sheets directly from text prompts. The project functions as an “agent skill” that can integrate with coding assistants and AI workflows to automate parts of the game asset creation pipeline. It focuses on generating production-friendly pixel art and animation assets that can be used in indie games, prototypes, and rapid iteration workflows. The...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    myGPTReader

    myGPTReader

    AI Slack bot for reading, summarizing, and chatting with content

    myGPTReader is an AI-powered Slack bot designed to help users read, summarize, and interact with various types of digital content through conversational interfaces. It enables users to quickly understand web pages, documents, and even video content by transforming them into interactive discussions rather than static reading experiences. myGPTReader supports a wide range of file formats, including eBooks, PDFs, and text-based documents, making it flexible for both casual and professional use...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    DreamCraft3D

    DreamCraft3D

    Official implementation of DreamCraft3D

    DreamCraft3D is DeepSeek’s generative 3D modeling framework / model family that likely extends their earlier 3D efforts (e.g. Shap-E or Point-E style models) with more capability, control, or expression. The name suggests a “dream crafting” metaphor—users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or...
    Downloads: 1 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB