75 projects for "python text parser" with 2 filters applied:

  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 1
    Mistral Finetune

    Mistral Finetune

    Memory-efficient and performant finetuning of Mistral's models

    ... or instruct models. It supports function-calling style datasets (via "messages" keys) as well as plain text formats, with guidelines on formatting, tokenization, and vocabulary extension (e.g. extending vocab to 32768 for some models) before finetuning. The project also provides tutorial notebooks (e.g. mistral_finetune_7b.ipynb) to walk through the steps.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 2
    ArXiv MCP Server

    ArXiv MCP Server

    A Model Context Protocol server for searching and analyzing arXiv

    arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    ML Ferret

    ML Ferret

    Refer and Ground Anything Anywhere at Any Granularity

    ... presents the vision-language pipeline, model assets, and paper resources that show how Ferret answers questions, follows instructions, and returns grounded outputs rather than just text. In practice, this enables tasks like “find that small red icon next to the chart and describe it” where both the linguistic reference and the visual region are ambiguous without fine spatial reasoning.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    GPT-2 Output Dataset

    GPT-2 Output Dataset

    Dataset of GPT-2 outputs for research in detection, biases, and more

    The GPT-2 Output Dataset is a large collection of model-generated text, released by OpenAI alongside the GPT-2 research paper to study the behaviors and limitations of large language models. It contains 250,000 samples of GPT-2 outputs, generated with different sampling strategies such as top-k truncation, to highlight the diversity and quality of model completions. The dataset also includes corresponding human-written text for comparison, enabling researchers to explore methods...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Keep company data safe with Chrome Enterprise Icon
    Keep company data safe with Chrome Enterprise

    Protect your business with AI policies and data loss prevention in the browser

    Make AI work your way with Chrome Enterprise. Block unapproved sites and set custom data controls that align with your company's policies.
    Download Chrome
  • 5
    CLIP

    CLIP

    CLIP, Predict the most relevant text snippet given an image

    CLIP (Contrastive Language-Image Pretraining) is a neural model that links images and text in a shared embedding space, allowing zero-shot image classification, similarity search, and multimodal alignment. It was trained on large sets of (image, caption) pairs using a contrastive objective: images and their matching text are pulled together in embedding space, while mismatches are pushed apart. Once trained, you can give it any text labels and ask it to pick which label best matches a given...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    HunyuanDiT

    HunyuanDiT

    Diffusion Transformer with Fine-Grained Chinese Understanding

    HunyuanDiT is a high-capability text-to-image diffusion transformer with bilingual (Chinese/English) understanding and multi-turn dialogue capability. It trains a diffusion model in latent space using a transformer backbone and integrates a Multimodal Large Language Model (MLLM) to refine captions and support conversational image generation. It supports adapters like ControlNet, IP-Adapter, LoRA, and can run under constrained VRAM via distillation versions. LoRA, ControlNet (pose, depth, canny...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    OpenMLSys-ZH

    OpenMLSys-ZH

    Machine Learning Systems: Design and Implementation

    This repository is the Chinese translation (or localization) of the OpenMLSys project documentation. Its aim is to make the technical content, tutorials, architecture descriptions, and user guides of the OpenMLSys system more accessible to Chinese-speaking users. The repo mirrors the structure of the original OpenMLSys docs: sections on system design, API references, deployment instructions, module overviews, and example workflows. It helps bridge language barriers in open machine learning...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    DeepSeek VL

    DeepSeek VL

    Towards Real-World Vision-Language Understanding

    ... model weights (or pointers to them), evaluation metrics on standard vision + language benchmarks, and configuration or architecture files. It also supports inference tools for forwarding image + prompt through the model to produce text output. DeepSeek-VL is a predecessor to their newer VL2 model, and presumably shares core design philosophy but with earlier scaling, fewer enhancements, or capability tradeoffs.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 9
    nanochat

    nanochat

    The best ChatGPT that $100 can buy

    nanochat is a from-scratch, end-to-end “mini ChatGPT” that shows the entire path from raw text to a chatty web app in one small, dependency-lean codebase. The repository stitches together every stage of the lifecycle: tokenizer training, pretraining a Transformer on a large web corpus, mid-training on dialogue and multiple-choice tasks, supervised fine-tuning, optional reinforcement learning for alignment, and finally efficient inference with caching. Its north star is approachability and speed...
    Downloads: 1 This Week
    Last Update:
    See Project
  • Photo and Video Editing APIs and SDKs Icon
    Photo and Video Editing APIs and SDKs

    Trusted by 150 million+ creators and businesses globally

    Unlock Picsart's full editing suite by embedding our Editor SDK directly into your platform. Offer your users the power of a full design suite without leaving your site.
    Learn More
  • 10
    Gemma in PyTorch

    Gemma in PyTorch

    The official PyTorch implementation of Google's Gemma models

    gemma_pytorch provides the official PyTorch reference for running and fine-tuning Google’s Gemma family of open models. It includes model definitions, configuration files, and loading utilities for multiple parameter scales, enabling quick evaluation and downstream adaptation. The repository demonstrates text generation pipelines, tokenizer setup, quantization paths, and adapters for low-rank or parameter-efficient fine-tuning. Example notebooks walk through instruction tuning and evaluation so...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    DreamCraft3D

    DreamCraft3D

    Official implementation of DreamCraft3D

    DreamCraft3D is DeepSeek’s generative 3D modeling framework / model family that likely extends their earlier 3D efforts (e.g. Shap-E or Point-E style models) with more capability, control, or expression. The name suggests a “dream crafting” metaphor—users probably supply textual or image prompts and generate 3D assets (point clouds, meshes, scenes). The repository includes model code, inference scripts, sample prompts, and possibly dataset preparation pipelines. It may integrate rendering or...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    State of Open Source AI

    State of Open Source AI

    Clarity in the current fast-paced mess of Open Source innovation

    ... the AI domain moves quickly, part of the aim is to make the content maintainable and updateable by the community. The structure includes chapters or sections about model formats, evaluation benchmarks, hardware/backends, MLOps systems, alignment and safety issues, and open datasets. The repository contains the text (in Markdown or similar), configuration for build or publishing (static site or e-book), and contributor guidelines.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    OSS-Fuzz Gen

    OSS-Fuzz Gen

    LLM powered fuzzing via OSS-Fuzz

    OSS-Fuzz-Gen is a companion project that helps automatically create or improve fuzz targets for open-source codebases, aiming to increase coverage in OSS-Fuzz with minimal maintainer effort. It analyses a library’s APIs, examples, and tests to propose harnesses that exercise parsers, decoders, or protocol handlers—precisely the code where fuzzing pays off. The system integrates with modern LLM-assisted workflows to draft harness code and then iterates based on build errors or low coverage...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    Generative AI Docs

    Generative AI Docs

    Documentation for Google's Gen AI site - including Gemini API & Gemma

    Generative AI Docs is Google’s official documentation repository for Gemini, Vertex AI, and related generative AI APIs. It contains guides, API references, and examples for developers building applications using Google’s large language models, text-to-image models, embeddings, and multimodal capabilities. The repository includes markdown source files that power the Google AI developer documentation site, as well as sample code snippets in Python, JavaScript, and other languages that demonstrate...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Universal Tool Calling Protocol (UTCP)

    Universal Tool Calling Protocol (UTCP)

    Official python implementation of UTCP. UTCP is an open standard

    The python-utcp repository is the official Python SDK implementation of the Universal Tool Calling Protocol (UTCP). UTCP is an open, modern standard designed to let AI agents call any tool or API directly—over HTTP, CLI, WebSocket, gRPC, and more—without the overhead of extra wrapper layers or middleware. It leverages a modular, plugin-based architecture built around Pydantic models and separates the core functionality into a lightweight client and extensible protocol plugins, enabling secure...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Style Aligned

    Style Aligned

    Official code for Style Aligned Image Generation via Shared Attention

    StyleAligned is a diffusion-model editing technique and codebase that preserves the visual “style” of an original image while applying new semantic edits driven by text. Instead of fully re-generating an image—and risking changes to lighting, texture, or rendering choices—the method aligns internal features across denoising steps so the target edit inherits the source style. This alignment acts like a constraint on the model’s evolution, steering composition, palette, and brushwork even...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    PPTAgent

    PPTAgent

    PPTAgent: Generating and Evaluating Presentations

    PPTAgent is a research system for generating and evaluating slide decks that goes beyond simple text-to-slides. It follows a two-stage, edit-based workflow: first it analyzes reference presentations to infer slide roles and structure, then it drafts an outline and iteratively performs editing actions to produce new slides. The project includes both the generation agent and an evaluation framework, PPTEval, to score content quality, design, and coherence. The repository highlights the EMNLP 2025...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    funNLP

    funNLP

    Resources, corpora, and tools for Chinese natural language processing

    FunNLP is a large, curated collection of resources, corpora, and tools for Chinese natural language processing (NLP). It aggregates datasets, lexicons, wordlists, sentiment dictionaries, knowledge graphs, and pretrained model references, serving as a one-stop resource hub for Chinese NLP practitioners. The repository is organized into categories such as sentiment analysis, text classification, named entity recognition, knowledge graphs, and various lexicons (e.g. sensitive words, emotion...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 19
    MusicLM - Pytorch

    MusicLM - Pytorch

    Implementation of MusicLM music generation model in Pytorch

    Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch. They are basically using text-conditioned AudioLM, but surprisingly with the embeddings from a text-audio contrastive learned model named MuLan. MuLan is what will be built out in this repository, with AudioLM modified from the other repository to support the music generation needs here.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    Metaseq

    Metaseq

    Repo for external large-scale work

    ... implementation for scaling transformer architectures efficiently across GPUs and nodes. It supports both pretraining and fine-tuning workflows with data pipelines for text, multilingual corpora, and custom tokenization schemes. Metaseq also includes APIs for evaluation, generation, and model serving, enabling seamless transitions from training to inference.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 21
    DiT (Diffusion Transformers)

    DiT (Diffusion Transformers)

    Official PyTorch Implementation of "Scalable Diffusion Models"

    ... noisy latent representations toward cleaner outputs through iterative denoising steps. DiT achieves strong results on benchmarks like ImageNet and LSUN while being architecturally simple and highly modular. It supports variable resolution, conditioning on class or text embeddings, and integration with latent autoencoders (like those used in Stable Diffusion).
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Stable Diffusion

    Stable Diffusion

    High-Resolution Image Synthesis with Latent Diffusion Models

    Stable Diffusion Version 2. The Stable Diffusion project, developed by Stability AI, is a cutting-edge image synthesis model that utilizes latent diffusion techniques for high-resolution image generation. It offers an advanced method of generating images based on text input, making it highly flexible for various creative applications. The repository contains pretrained models, various checkpoints, and tools to facilitate image generation tasks, such as fine-tuning and modifying the models...
    Downloads: 67 This Week
    Last Update:
    See Project
  • 23
    AnimateDiff

    AnimateDiff

    Plug-n-play module turning text-to-image models into animation

    AnimateDiff is an open-source project designed to enhance text-to-image diffusion models by adding animation capabilities. It allows users to turn static images generated by popular text-to-image models into animated sequences without requiring additional model training. This plug-and-play tool is compatible with a wide range of community models and facilitates the generation of animation directly from pre-existing text-to-image models. It supports various configurations to create animations...
    Downloads: 11 This Week
    Last Update:
    See Project
  • 24
    VideoCrafter2

    VideoCrafter2

    Overcoming Data Limitations for High-Quality Video Diffusion Models

    VideoCrafter is an open-source video generation and editing toolbox designed to create high-quality video content. It features models for both text-to-video and image-to-video generation. The system is optimized for generating videos from textual descriptions or still images, leveraging advanced diffusion models. VideoCrafter2, an upgraded version, improves on its predecessor by enhancing motion dynamics and concept combinations, especially in low-data scenarios. Users can explore a wide range...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 25
    CSM (Conversational Speech Model)

    CSM (Conversational Speech Model)

    A Conversational Speech Generation Model

    The CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that creates RVQ audio codes from text and audio inputs. It uses a Llama backbone and a smaller audio decoder to produce audio codes for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. CSM offers a flexible setup and is compatible with CUDA-enabled GPUs for efficient execution.
    Downloads: 4 This Week
    Last Update:
    See Project
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.