Showing 471 open source projects for "python text parser"

  • 1
    HumanEval

    Code for the paper "Evaluating Large Language Models Trained on Code"

    human-eval is a benchmark dataset and evaluation framework created by OpenAI for measuring the ability of language models to generate correct code. It consists of hand-written programming problems with unit tests, designed to assess functional correctness rather than superficial metrics like text similarity. Each task includes a natural language prompt and a function signature, requiring the model to generate an implementation that passes all provided tests. The benchmark has become a...
    Downloads: 1 This Week
  • 2
    Phi-3-MLX

    Phi-3.5 for Mac: Locally-run Vision and Language Models

    Phi-3-Vision-MLX is an Apple MLX (machine learning on Apple silicon) implementation of Phi-3 Vision, a lightweight multi-modal model designed for vision and language tasks. It focuses on running vision-language AI efficiently on Apple hardware like M1 and M2 chips.
    Downloads: 0 This Week
  • 3
    spacy-llm

    Integrating LLMs into structured NLP pipelines

    Large Language Models (LLMs) feature powerful natural language understanding capabilities. With only a few (and sometimes no) examples, an LLM can be prompted to perform custom NLP tasks such as text categorization, named entity recognition, coreference resolution, information extraction and more. This package integrates Large Language Models (LLMs) into spaCy, featuring a modular system for fast prototyping and prompting, and turning unstructured responses into robust outputs for various...
    Downloads: 0 This Week
  • 4
    LangChain

    ⚡ Building applications with LLMs through composability ⚡

    Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge. This library is aimed at assisting in the development of those types of applications.
    Downloads: 9 This Week
  • 5
    Shap-E

    Generate 3D objects conditioned on text or images

    The shap-e repository provides the official code and model release for Shap-E, a conditional generative model designed to produce 3D assets (implicit functions, meshes, neural radiance fields) from text or image prompts. The model is built with a two-stage architecture: first an encoder that maps existing 3D assets into parameterizations of implicit functions, and then a conditional diffusion model trained on those parameterizations to generate new assets. Because it works at the level of...
    Downloads: 3 This Week
  • 6
    OpenMLSys-ZH

    Machine Learning Systems: Design and Implementation

    This repository is the Chinese translation (or localization) of the OpenMLSys project documentation. Its aim is to make the technical content, tutorials, architecture descriptions, and user guides of the OpenMLSys system more accessible to Chinese-speaking users. The repo mirrors the structure of the original OpenMLSys docs: sections on system design, API references, deployment instructions, module overviews, and example workflows. It helps bridge language barriers in open machine learning...
    Downloads: 0 This Week
  • 7
    Scikit-LLM

    Seamlessly integrate LLMs into scikit-learn

    Seamlessly integrate powerful language models like ChatGPT into scikit-learn for enhanced text analysis tasks. At the moment, most Scikit-LLM estimators are compatible only with some of the OpenAI models, so a user-provided OpenAI API key is required. Additionally, Scikit-LLM will ensure that the obtained response contains a valid label. If this is not the case, a label will be selected randomly (label probabilities are proportional to label occurrences in the training...
    Downloads: 0 This Week
  • 8
    Step-Audio 2

    Multi-modal large language model designed for audio understanding

    Step-Audio2 is an advanced, end-to-end multimodal large language model designed for high-fidelity audio understanding and natural speech conversation: unlike many pipelines that separate speech recognition, processing, and synthesis, Step-Audio2 processes raw audio, reasons about semantic and paralinguistic content (like emotion, speaker characteristics, non-verbal cues), and can generate contextually appropriate responses — including potentially generating or transforming audio output. It...
    Downloads: 0 This Week
  • 9
    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    Vidi is a family of large multimodal models developed for deep video understanding and editing tasks, integrating vision, audio, and language to allow sophisticated querying and manipulation of video content. It’s designed to process long-form, real-world videos and answer complex queries such as “when in this clip does X happen?” or “where in the frame is object Y during that moment?” — offering temporal retrieval, spatio-temporal grounding (i.e. locating objects over time + space), and...
    Downloads: 0 This Week
  • 10
    fairseq2

    FAIR Sequence Modeling Toolkit 2

    fairseq2 is a modern, modular sequence modeling framework developed by Meta AI Research as a complete redesign of the original fairseq library. Built from the ground up for scalability, composability, and research flexibility, fairseq2 supports a broad range of language, speech, and multimodal content generation tasks, including instruction fine-tuning, reinforcement learning from human feedback (RLHF), and large-scale multilingual modeling. Unlike the original fairseq—which evolved into a...
    Downloads: 1 This Week
  • 11
    DeepSeek Math

    Pushing the Limits of Mathematical Reasoning in Open Language Models

    DeepSeek-Math is DeepSeek’s specialized model (or dataset + evaluation) focusing on mathematical reasoning, symbolic manipulation, proof steps, and advanced quantitative problem solving. The repository is likely to include fine-tuning routines or task datasets (e.g. MATH, GSM8K, ARB), demonstration notebooks, prompt templates, and evaluation results on math benchmarks. The goal is to push DeepSeek’s performance in domains that require rigorous symbolic steps, calculus, linear algebra, number...
    Downloads: 1 This Week
  • 12
    DreamCraft3D

    Official implementation of DreamCraft3D

    DreamCraft3D is DeepSeek’s generative 3D modeling framework / model family that likely builds on earlier 3D generation work (e.g. Shap-E or Point-E style models) with more capability, control, or expression. The name suggests a “dream crafting” metaphor: users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or...
    Downloads: 1 This Week
  • 13
    cognee

    Deterministic LLM Outputs for AI Applications and AI Agents

    We build for developers who need a reliable, production-ready data layer for AI applications. Cognee implements scalable, modular data pipelines that allow for creating the LLM-enriched data layer using graph and vector stores. Cognee acts as a semantic memory layer, unveiling hidden connections within your data and infusing it with your company's language and principles. This self-optimizing process ensures ultra-relevant, personalized, and contextually aware LLM retrievals. Any kind of data...
    Downloads: 1 This Week
  • 14
    MegaTTS 3

    Official PyTorch Implementation

    MegaTTS3 is an open-source text-to-speech (TTS) and voice-cloning system from ByteDance that aims to deliver high-quality, expressive speech synthesis, including zero-shot voice cloning of previously unseen speakers. Its backbone is a lightweight diffusion-transformer (on the order of ~0.45 B parameters), which enables efficient inference while still producing high-fidelity audio. Given a reference audio sample (and corresponding latent representation), MegaTTS3 can generate speech in the...
    Downloads: 0 This Week
  • 15
    MiniMax-M1

    Open-weight, large-scale hybrid-attention reasoning model

    MiniMax-M1 is presented as the world’s first open-weight, large-scale hybrid-attention reasoning model, designed to push the frontier of long-context, tool-using, and deeply “thinking” language models. It is built on the MiniMax-Text-01 foundation and keeps the same massive parameter budget, but reworks the attention and training setup for better reasoning and test-time compute scaling. Architecturally, it combines Mixture-of-Experts layers with lightning attention, enabling the model to...
    Downloads: 0 This Week
  • 16
    ChatGPT Discord Bot

    Integrate ChatGPT into your own discord bot

    Build your own Discord bot using ChatGPT. Using certain personas may generate vulgar or disturbing content; use at your own risk. In public mode (default), the bot replies directly on the channel. In private mode, the bot's reply can only be seen by the person who used the command. Invite your bot to your server via the OAuth2 URL Generator. Email/password authentication is available (not supported for Google/Microsoft accounts). A system prompt is invoked when the bot is first started or reset. You can set it up by...
    Downloads: 0 This Week
  • 17
    Large Concept Model

    Language modeling in a sentence representation space

    Large Concept Model is a research codebase centered on concept-centric representation learning at scale, aiming to capture shared structure across many categories and modalities. It organizes training around concepts (rather than just raw labels), encouraging models to understand attributes, relations, and compositional structure that transfer across tasks. The repository provides training loops, data tooling, and evaluation routines to learn and probe these concept embeddings, typically...
    Downloads: 1 This Week
  • 18
    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    The Self-Operating Computer Framework is an innovative system that enables multimodal models to autonomously operate a computer by interpreting the screen and executing mouse and keyboard actions to achieve specified objectives. This framework is compatible with various multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVa. Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen....
    Downloads: 2 This Week
  • 19
    nanochat

    The best ChatGPT that $100 can buy

    nanochat is a from-scratch, end-to-end “mini ChatGPT” that shows the entire path from raw text to a chatty web app in one small, dependency-lean codebase. The repository stitches together every stage of the lifecycle: tokenizer training, pretraining a Transformer on a large web corpus, mid-training on dialogue and multiple-choice tasks, supervised fine-tuning, optional reinforcement learning for alignment, and finally efficient inference with caching. Its north star is approachability and...
    Downloads: 0 This Week
  • 20
    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 0 This Week
  • 21
    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    Ferret is Apple’s end-to-end multimodal large language model designed specifically for flexible referring and grounding: it can understand references of any granularity (boxes, points, free-form regions) and then ground open-vocabulary descriptions back onto the image. The core idea is a hybrid region representation that mixes discrete coordinates with continuous visual features, so the model can fluidly handle “any-form” referring while maintaining precise spatial localization. The repo...
    Downloads: 0 This Week
  • 22
    MetaCLIP

    ICLR2024 Spotlight: curation/training code, metadata, distribution

    MetaCLIP is a research codebase from Meta's FAIR team that focuses on how CLIP's training data is curated, rather than on changing the model or the training objective. It selects image-text pairs from a raw web pool by matching their text against a set of metadata entries and then balancing the resulting distribution over those entries. As the project tagline indicates, the repository provides the curation and training code, the metadata, and the training data distribution...
    Downloads: 0 This Week
  • 23
    Transformers4Rec

    Transformers4Rec is a flexible and efficient library

    Transformers4Rec is an advanced recommendation system library that leverages Transformer models for sequential and session-based recommendations. The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation...
    Downloads: 0 This Week
  • 24
    windsurf.vim

    Free, ultrafast Copilot alternative for Vim and Neovim

    windsurf.vim is a plugin for Vim and Neovim by Exafunction (formerly part of the Codeium project) that brings in AI-driven code completion and assistance capabilities. The aim is to provide a “free, ultrafast” alternative to other AI code assistants (such as GitHub Copilot) directly within Vim/Neovim. Once installed and configured, windsurf.vim can suggest code completions, generate multi-line snippets based on comments or invitation in code, and make the editing experience more predictive...
    Downloads: 0 This Week
  • 25
    Laminar

    Open-source all-in-one platform for engineering AI products

    ...You can use it to set up better evaluations, as dynamic few-shot examples, and for fine-tuning. All traces are sent in the background via gRPC with minimal overhead. Tracing of text and image models is supported; audio models are coming soon. You can set up LLM-as-a-judge or Python script evaluators to run on each received span. Evaluators label spans, which is more scalable than human labeling and especially helpful for smaller teams. Laminar lets you go beyond a single prompt: you can build and host complex chains, including mixtures of agents or self-reflecting LLM pipelines.
    Downloads: 3 This Week
    Last Update:
    See Project