Showing 385 open source projects for "python text parser"

View related business solutions
  • Gen AI apps are built with MongoDB Atlas Icon
    Gen AI apps are built with MongoDB Atlas

    Build gen AI apps with an all-in-one modern database: MongoDB Atlas

    MongoDB Atlas provides built-in vector search and a flexible document model so developers can build, scale, and run gen AI apps without stitching together multiple databases. From LLM integration to semantic search, Atlas simplifies your AI architecture—and it’s free to get started.
    Start Free
  • No-Nonsense Code-to-Cloud Security for Devs | Aikido Icon
    No-Nonsense Code-to-Cloud Security for Devs | Aikido

    Connect your GitHub, GitLab, Bitbucket, or Azure DevOps account to start scanning your repos for free.

    Aikido provides a unified security platform for developers, combining 12 powerful scans like SAST, DAST, and CSPM. AI-driven AutoFix and AutoTriage streamline vulnerability management, while runtime protection blocks attacks.
    Start for Free
  • 1
    VisualGLM-6B

    VisualGLM-6B

    Chinese and English multimodal conversational language model

    VisualGLM-6B is an open-source multimodal conversational language model developed by ZhipuAI that supports both images and text in Chinese and English. It builds on the ChatGLM-6B backbone, with 6.2 billion language parameters, and incorporates a BLIP2-Qformer visual module to connect vision and language. In total, the model has 7.8 billion parameters. Trained on a large bilingual dataset — including 30 million high-quality Chinese image-text pairs from CogView and 300 million English pairs...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    torchtext

    torchtext

    Data loaders and abstractions for text and NLP

    We recommend Anaconda as a Python package management system. Please refer to pytorch.org for the details of PyTorch installation. LTS versions are distributed through a different channel than the other versioned releases. Alternatively, you might want to use the Moses tokenizer port in SacreMoses (split from NLTK). You have to install SacreMoses. To build torchtext from source, you need git, CMake and C++11 compiler such as g++. When building from source, make sure that you have the same C...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Universal Tool Calling Protocol (UTCP)

    Universal Tool Calling Protocol (UTCP)

    Official python implementation of UTCP. UTCP is an open standard

    The python-utcp repository is the official Python SDK implementation of the Universal Tool Calling Protocol (UTCP). UTCP is an open, modern standard designed to let AI agents call any tool or API directly—over HTTP, CLI, WebSocket, gRPC, and more—without the overhead of extra wrapper layers or middleware. It leverages a modular, plugin-based architecture built around Pydantic models and separates the core functionality into a lightweight client and extensible protocol plugins, enabling secure...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Llama Cookbook

    Llama Cookbook

    Solve end to end problems using Llama model family

    The Llama Cookbook is the official Meta LLaMA guide for inference, fine‑tuning, RAG, and multi-step use-cases. It offers recipes, code samples, and integration examples across provider platforms (WhatsApp, SQL, long context workflows), enabling developers to quickly harness LLaMA models
    Downloads: 0 This Week
    Last Update:
    See Project
  • Photo and Video Editing APIs and SDKs Icon
    Photo and Video Editing APIs and SDKs

    Trusted by 150 million+ creators and businesses globally

    Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.
    Learn More
  • 5
    Bolna

    Bolna

    Conversational voice AI agents

    Bolna is an end-to-end open-source platform for building conversational voice AI agents, enabling developers to create voice-first conversational assistants efficiently.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    ChatGPT Academic

    ChatGPT Academic

    ChatGPT extension for scientific research work

    ChatGPT extension for scientific research work, specially optimized academic paper polishing experience, supports custom shortcut buttons, supports custom function plug-ins, supports markdown table display, double display of Tex formulas, complete code display function, new local Python/C++/Go project tree Analysis function/Project source code self-translation ability, newly added PDF and Word document batch summary function/PDF paper full-text translation function. All buttons are dynamically...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Style Aligned

    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    CLIP-as-service

    CLIP-as-service

    Embed images and sentences into fixed-length vectors

    CLIP-as-service is a low-latency high-scalability service for embedding images and text. It can be easily integrated as a microservice into neural search solutions. Serve CLIP models with TensorRT, ONNX runtime and PyTorch w/o JIT with 800QPS[*]. Non-blocking duplex streaming on requests and responses, designed for large data and long-running tasks. Horizontally scale up and down multiple CLIP models on single GPU, with automatic load balancing. Easy-to-use. No learning curve, minimalist design...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    PPTAgent

    PPTAgent

    PPTAgent: Generating and Evaluating Presentations

    PPTAgent is a research system for generating and evaluating slide decks that goes beyond simple text-to-slides. It follows a two-stage, edit-based workflow: first it analyzes reference presentations to infer slide roles and structure, then it drafts an outline and iteratively performs editing actions to produce new slides. The project includes both the generation agent and an evaluation framework, PPTEval, to score content quality, design, and coherence. The repository highlights the EMNLP 2025...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Deliver secure remote access with OpenVPN. Icon
    Deliver secure remote access with OpenVPN.

    Trusted by nearly 20,000 customers worldwide, and all major cloud providers.

    OpenVPN's products provide scalable, secure remote access — giving complete freedom to your employees to work outside the office while securely accessing SaaS, the internet, and company resources.
    Get started — no credit card required.
  • 10
    Cleanlab

    Cleanlab

    The standard data-centric AI package for data quality and ML

    cleanlab helps you clean data and labels by automatically detecting issues in a ML dataset. To facilitate machine learning with messy, real-world data, this data-centric AI package uses your existing models to estimate dataset problems that can be fixed to train even better models. cleanlab cleans your data's labels via state-of-the-art confident learning algorithms, published in this paper and blog. See some of the datasets cleaned with cleanlab at labelerrors.com. This package helps you...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Key-book

    Key-book

    Proofs, cases, concept supplements, and reference explanations

    The book "Introduction to Machine Learning Theory" (hereinafter referred to as "Introduction") written by Zhou Zhihua, Wang Wei, Gao Wei, and other teachers fills the regret of the lack of introductory works on machine learning theory in China. This book attempts to provide an introductory guide for readers interested in learning machine learning theory and researching machine learning theory in an easy-to-understand language. "Guide" mainly covers seven parts, corresponding to seven...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Jina

    Jina

    Build cross-modal and multimodal applications on the cloud

    Jina is a framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. Build applications that deliver fresh insights from multiple data types such as text, image, audio, video, 3D mesh, PDF with Jina AI’s DocArray. Polyglot gateway that supports gRPC, Websockets, HTTP...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Stable Diffusion in Docker

    Stable Diffusion in Docker

    Run the Stable Diffusion releases in a Docker container

    ... a suitable GPU you can set the options --device cpu and --onnx instead. Since it uses the model, you will need to create a user access token in your Huggingface account. Save the user access token in a file called token.txt and make sure it is available when building the container. Create an image from an existing image and a text prompt. Modify an existing image with its depth map and a text prompt.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Super Easy AI Installer Tool

    Super Easy AI Installer Tool

    Application that simplifies the installation of AI-related projects

    .... But remains a great solution for users with minimal technical knowledge or expertise. Fixes underway. A tool that can generate animations and music from text, ideal for producing short videos and GIFs, as well as creating brief cinematic scenes.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    CogView

    CogView

    Text-to-Image generation. The repo for NeurIPS 2021 paper

    CogView is a large-scale pretrained text-to-image transformer model, introduced in the NeurIPS 2021 paper CogView: Mastering Text-to-Image Generation via Transformers. With 4 billion parameters, it was one of the earliest transformer-based models to successfully generate high-quality images from natural language descriptions in Chinese, with partial support for English via translation. The model incorporates innovations such as PB-relax and Sandwich-LN to enable stable training of very deep...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    funNLP

    funNLP

    Resources, corpora, and tools for Chinese natural language processing

    FunNLP is a large, curated collection of resources, corpora, and tools for Chinese natural language processing (NLP). It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and pretrained model references, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 17
    Medusa

    Medusa

    Framework for Accelerating LLM Generation with Multiple Decoding Heads

    Medusa is a framework aimed at accelerating the generation capabilities of Large Language Models (LLMs) by employing multiple decoding heads. This approach allows for parallel processing during text generation, significantly enhancing throughput and reducing response times. Medusa is designed to be simple to implement and integrates with existing LLM infrastructures, making it a practical solution for scaling LLM applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    MusicLM - Pytorch

    MusicLM - Pytorch

    Implementation of MusicLM music generation model in Pytorch

    Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch. They are basically using text-conditioned AudioLM, but surprisingly with the embeddings from a text-audio contrastive learned model named MuLan. MuLan is what will be built out in this repository, with AudioLM modified from the other repository to support the music generation needs here.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    StoryTeller

    StoryTeller

    Multimodal AI Story Teller, built with Stable Diffusion, GPT, etc.

    A multimodal AI story teller, built with Stable Diffusion, GPT, and neural text-to-speech (TTS). Given a prompt as an opening line of a story, GPT writes the rest of the plot; Stable Diffusion draws an image for each sentence; a TTS model narrates each line, resulting in a fully animated video of a short story, replete with audio and visuals. To develop locally, install dev dependencies and install pre-commit hooks. This will automatically trigger linting and code quality checks before each...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 20
    LLaMA

    LLaMA

    Inference code for Llama models

    “Llama” is the repository from Meta (formerly Facebook/Meta Research) containing the inference code for LLaMA (Large Language Model Meta AI) models. It provides utilities to load pre-trained LLaMA model weights, run inference (text generation, chat, completions), and work with tokenizers. Tokenizer utilities, download scripts, shell helpers to fetch model weights with correct licensing/permissions. Includes example scripts for chat completions and text completions to show how to call the models...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 21
    Basaran

    Basaran

    Basaran, an open-source alternative to the OpenAI text completion API

    Basaran is an open-source alternative to the OpenAI text completion API. It provides a compatible streaming API for your Hugging Face Transformers-based text generation models. The open source community will eventually witness the Stable Diffusion moment for large language models (LLMs), and Basaran allows you to replace OpenAI's service with the latest open-source model to power your application without modifying a single line of code. Stream generation using various decoding strategies...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    Nougat

    Nougat

    Implementation of Nougat Neural Optical Understanding

    Nougat is a multi-modal generative modeling framework that bridges vision and text modalities with structured generation control (e.g. layout, scene composition) rather than treating images as flat contexts. It combines object-centric modules with transformer-based reasoning to propose, refine, and render scenes in a generative pipeline. The architecture allows you to specify or prompt a layout (which objects should be where) and then the model fills in appearance, context, lighting...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 23
    UniEM

    UniEM

    Unified embedding model

    UniEM is a unified embedding model designed to create high-quality text embeddings for various natural language processing tasks.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 24
    OpenFlamingo

    OpenFlamingo

    An open-source framework for training large multimodal models

    Welcome to our open source version of DeepMind's Flamingo model! In this repository, we provide a PyTorch implementation for training and evaluating OpenFlamingo models. We also provide an initial OpenFlamingo 9B model trained on a new Multimodal C4 dataset (coming soon). Please refer to our blog post for more details. This repo is still under development, and we hope to release better-performing and larger OpenFlamingo models soon. If you have any questions, please feel free to open an...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 25
    doccano

    doccano

    Open source annotation tool for machine learning practitioners

    doccano is an open-source text annotation tool for humans. It provides annotation features for text classification, sequence labeling and sequence-to-sequence tasks. So, you can create labeled data for sentiment analysis, named entity recognition, text summarization and so on. Just create a project, upload data and start annotating. You can build a dataset in hours.
    Downloads: 2 This Week
    Last Update:
    See Project
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.