Showing 380 open source projects for "python text"

View related business solutions
  • Cloud-based help desk software with ServoDesk Icon
    Cloud-based help desk software with ServoDesk

    Full access to Enterprise features. No credit card required.

    What if You Could Automate 90% of Your Repetitive Tasks in Under 30 Days? At ServoDesk, we help businesses like yours automate operations with AI, allowing you to cut service times in half and increase productivity by 25% - without hiring more staff.
    Try ServoDesk for free
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    GPT-2

    GPT-2

    Code for the paper Language Models are Unsupervised Multitask Learners

    This repository contains the code and model weights for GPT-2, a large-scale unsupervised language model described in the OpenAI paper “Language Models are Unsupervised Multitask Learners.” The intent is to provide a starting point for researchers and engineers to experiment with GPT-2: generate text, fine‐tune on custom datasets, explore model behavior, or study its internal phenomena. The repository includes scripts for sampling, training, downloading pre-trained models, and utilities for...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    txtai

    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    segment-geospatial

    segment-geospatial

    A Python package for segmenting geospatial data with the SAM

    The segment-geospatial package draws its inspiration from segment-anything-eo repository authored by Aliaksandr Hancharenka. To facilitate the use of the Segment Anything Model (SAM) for geospatial data, I have developed the segment-anything-py and segment-geospatial Python packages, which are now available on PyPI and conda-forge. My primary objective is to simplify the process of leveraging SAM for geospatial data analysis by enabling users to achieve this with minimal coding effort. I...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    ChatGPT Academic

    ChatGPT Academic

    ChatGPT extension for scientific research work

    ChatGPT extension for scientific research work, specially optimized academic paper polishing experience, supports custom shortcut buttons, supports custom function plug-ins, supports markdown table display, double display of Tex formulas, complete code display function, new local Python/C++/Go project tree Analysis function/Project source code self-translation ability, newly added PDF and Word document batch summary function/PDF paper full-text translation function. All buttons are dynamically generated by reading functional.py, you can add custom functions at will, and liberate the pasteboard. Support for markdown tables output by GPT. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Free and Open Source HR Software Icon
    Free and Open Source HR Software

    OrangeHRM provides a world-class HRIS experience and offers everything you and your team need to be that HR hero you know that you are.

    Give your HR team the tools they need to streamline administrative tasks, support employees, and make informed decisions with the OrangeHRM free and open source HR software.
    Learn More
  • 5
    HumanEval

    HumanEval

    Code for the paper "Evaluating Large Language Models Trained on Code"

    human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    PaddleSpeech

    PaddleSpeech

    Easy-to-use Speech Toolkit including Self-Supervised Learning model

    PaddleSpeech is an open-source toolkit on PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-art and influential models. Via the easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial application and academic research, including training, inference & testing modules, and deployment process. Low barriers to install, CLI, Server, and Streaming Server is available to quick-start your journey. We provide...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 7
    Universal Tool Calling Protocol (UTCP)

    Universal Tool Calling Protocol (UTCP)

    Official python implementation of UTCP. UTCP is an open standard

    The python-utcp repository is the official Python SDK implementation of the Universal Tool Calling Protocol (UTCP). UTCP is an open, modern standard designed to let AI agents call any tool or API directly—over HTTP, CLI, WebSocket, gRPC, and more—without the overhead of extra wrapper layers or middleware. It leverages a modular, plugin-based architecture built around Pydantic models and separates the core functionality into a lightweight client and extensible protocol plugins, enabling...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Style Aligned

    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even as...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    Mistral Finetune

    Mistral Finetune

    Memory-efficient and performant finetuning of Mistral's models

    mistral-finetune is an official lightweight codebase designed for memory-efficient and performant finetuning of Mistral’s open models (e.g. 7B, instruct variants). It builds on techniques like LoRA (Low-Rank Adaptation) to allow customizing models without full parameter updates, which reduces GPU memory footprint and training cost. The repo includes utilities for data preprocessing (e.g. reformat_data.py), validation scripts, and example YAML configs for training variants like 7B base or...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Cloud data warehouse to power your data-driven innovation Icon
    Cloud data warehouse to power your data-driven innovation

    BigQuery is a serverless and cost-effective enterprise data warehouse that works across clouds and scales with your data.

    BigQuery Studio provides a single, unified interface for all data practitioners of various coding skills to simplify analytics workflows from data ingestion and preparation to data exploration and visualization to ML model creation and use. It also allows you to use simple SQL to access Vertex AI foundational models directly inside BigQuery for text processing tasks, such as sentiment analysis, entity extraction, and many more without having to deal with specialized models.
    Try for free
  • 10
    DeepSeek Math

    DeepSeek Math

    Pushing the Limits of Mathematical Reasoning in Open Language Models

    DeepSeek-Math is DeepSeek’s specialized model (or dataset + evaluation) focusing on mathematical reasoning, symbolic manipulation, proof steps, and advanced quantitative problem solving. The repository is likely to include fine-tuning routines or task datasets (e.g. MATH, GSM8K, ARB), demonstration notebooks, prompt templates, and evaluation results on math benchmarks. The goal is to push DeepSeek’s performance in domains that require rigorous symbolic steps, calculus, linear algebra, number...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 11
    MiniCPM-o

    MiniCPM-o

    A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

    MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Infinity

    Infinity

    Low-latency REST API for serving text-embeddings

    Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under MIT License. Infinity powers inference behind Gradient.ai and other Embedding API providers.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    refinery

    refinery

    Open-source choice to scale, assess and maintain natural language data

    The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact. You are one of the people we've built refinery for. refinery helps you to build better NLP models in a data-centric approach. Semi-automate your labeling, find low-quality subsets in your training data, and monitor your data in one place. refinery doesn't get rid of manual labeling, but it makes sure that your valuable time is spent well. Also, the makers...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Argilla

    Argilla

    The open-source data curation platform for LLMs

    Argilla is a production-ready framework for building and improving datasets for NLP projects. Deploy your own Argilla Server on Spaces with a few clicks. Use embeddings to find the most similar records with the UI. This feature uses vector search combined with traditional search (keyword and filter based). Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can use and combine your preferred...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Llama Cookbook

    Llama Cookbook

    Solve end to end problems using Llama model family

    The Llama Cookbook is the official Meta LLaMA guide for inference, fine‑tuning, RAG, and multi-step use-cases. It offers recipes, code samples, and integration examples across provider platforms (WhatsApp, SQL, long context workflows), enabling developers to quickly harness LLaMA models
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Bolna

    Bolna

    Conversational voice AI agents

    Bolna is an end-to-end open-source platform for building conversational voice AI agents, enabling developers to create voice-first conversational assistants efficiently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    SetFit

    SetFit

    Efficient few-shot learning with Sentence Transformers

    SetFit is an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers. It achieves high accuracy with little labeled data - for instance, with only 8 labeled examples per class on the Customer Reviews sentiment dataset, SetFit is competitive with fine-tuning RoBERTa Large on the full training set of 3k examples.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Guardrails

    Guardrails

    Adding guardrails to large language models

    Guardrails is a Python package that lets a user add structure, type and quality guarantees to the outputs of large language models (LLMs). At the heart of Guardrails is the rail spec. rail is intended to be a language-agnostic, human-readable format for specifying structure and type information, validators and corrective actions over LLM outputs. We create a RAIL spec to describe the expected structure and types of the LLM output, the quality criteria for the output to be considered valid,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    spacy-llm

    spacy-llm

    Integrating LLMs into structured NLP pipelines

    Large Language Models (LLMs) feature powerful natural language understanding capabilities. With only a few (and sometimes no) examples, an LLM can be prompted to perform custom NLP tasks such as text categorization, named entity recognition, coreference resolution, information extraction and more. This package integrates Large Language Models (LLMs) into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    InvokeAI

    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    InvokeAI is an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator. It provides a streamlined process with various new features and options to aid the image generation process. It runs on Windows, Mac and Linux machines, and runs on GPU cards with as little as 4 GB or RAM. InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies....
    Downloads: 28 This Week
    Last Update:
    See Project
  • 21
    Gemma

    Gemma

    Gemma open-weight LLM library, from Google DeepMind

    Gemma, developed by Google DeepMind, is a family of open-weights large language models (LLMs) built upon the research and technology behind Gemini. This repository provides the official implementation of the Gemma PyPI package, a JAX-based library that enables users to load, interact with, and fine-tune Gemma models. The framework supports both text and multi-modal input, allowing natural language conversations that incorporate visual content such as images. It includes APIs for...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Agent S2

    Agent S2

    Agent S: an open agentic framework that uses computers like a human

    Simular's Agent S2 represents a leap forward in the development of computer-use agents, capable of autonomously interacting with a range of devices and interfaces. By integrating specialized AI models, Agent S2 delivers state-of-the-art performance, whether on desktop systems or smartphones. Through modular architecture, it efficiently handles complex tasks, such as navigating UIs, performing low-level actions like text selection, and executing high-level strategies like planning....
    Downloads: 4 This Week
    Last Update:
    See Project
  • 23
    Janus

    Janus

    Unified Multimodal Understanding and Generation Models

    Janus is a sophisticated open-source project from DeepSeek AI that aims to unify both visual understanding and image generation in a single model architecture. Rather than having separate systems for “look and describe” and “prompt and generate”, Janus uses an autoregressive transformer framework with a decoupled visual encoder—allowing it to ingest images for comprehension and to produce images from text prompts with shared internal representations. The design tackles long-standing...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    towhee

    towhee

    Framework that is dedicated to making neural data processing

    Towhee is an open-source machine-learning pipeline that helps you encode your unstructured data into embeddings. You can use our Python API to build a prototype of your pipeline and use Towhee to automatically optimize it for production-ready environments. From images to text to 3D molecular structures, Towhee supports data transformation for nearly 20 different unstructured data modalities. We provide end-to-end pipeline optimizations, covering everything from data decoding/encoding, to model inference, making your pipeline execution 10x faster. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    ktrain

    ktrain

    ktrain is a Python library that makes deep learning AI more accessible

    ktrain is a Python library that makes deep learning and AI more accessible and easier to apply. ktrain is a lightweight wrapper for the deep learning library TensorFlow Keras (and other libraries) to help build, train, and deploy neural networks and other machine learning models. Inspired by ML framework extensions like fastai and ludwig, ktrain is designed to make deep learning and AI more accessible and easier to apply for both newcomers and experienced practitioners. With only a few lines...
    Downloads: 0 This Week
    Last Update:
    See Project