Search Results for "python text parser" - Page 14

Showing 1756 open source projects for "python text parser"

View related business solutions
  • Stop Storing Third-Party Tokens in Your Database Icon
    Stop Storing Third-Party Tokens in Your Database

    Auth0 Token Vault handles secure token storage, exchange, and refresh for external providers so you don't have to build it yourself.

    Rolling your own OAuth token storage can be a security liability. Token Vault securely stores access and refresh tokens from federated providers and handles exchange and renewal automatically. Connected accounts, refresh exchange, and privileged worker flows included.
    Try Auth0 for Free
  • Streamline Azure Security with Palo Alto Networks VM-Series Icon
    Streamline Azure Security with Palo Alto Networks VM-Series

    Centrally manage physical and virtualized firewalls with Panorama

    Improve your security posture and reduce incident response time. Use the VM-Series to natively analyze Azure traffic and dynamically drive policy updates based on workload changes.
    Learn more
  • 1
    MCP Server OpenDAL

    MCP Server OpenDAL

    Model Context Protocol Server for Apache OpenDAL™

    Model Context Protocol Server for Apache OpenDAL™ is an MCP server implementation that provides access to various storage services via Apache OpenDAL. It enables seamless interactions with multiple storage backends through a unified interface. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    MindNLP

    MindNLP

    Easy-to-use and high-performance NLP and LLM framework

    MindNLP is a natural language processing library built on the MindSpore framework, providing tools and models for various NLP tasks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    LightAutoML

    LightAutoML

    Fast and customizable framework for automatic ML model creation

    LightAutoML is an automated machine learning (AutoML) framework optimized for efficient model training and hyperparameter tuning, focusing on both tabular and text data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Toot

    Toot

    toot - Mastodon CLI & TUI

    Toot is a CLI and TUI tool for interacting with Mastodon instances from the command line.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    video-use

    video-use

    Edit videos with Claude Code

    Video Use is an open-source AI-powered video editing tool that allows users to transform raw footage into polished videos using natural language commands. Designed to work with Claude Code, it automates the entire editing process—from cutting clips to rendering the final output—without requiring manual timelines or complex software interfaces. The system intelligently analyzes audio transcripts and visual cues to make precise, context-aware editing decisions. It supports a wide range of...
    Downloads: 24 This Week
    Last Update:
    See Project
  • 6
    LLMs-from-scratch

    LLMs-from-scratch

    Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

    ...By the end, you have a grounded sense of how data pipelines, optimization, and inference interact to produce fluent text.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Scrapling

    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling,...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    Book4_Power-of-Matrix

    Book4_Power-of-Matrix

    Book_4_Matrix Power | The Iris Book: From Addition, Subtraction

    ...The project is designed to help readers progress from basic arithmetic toward machine learning fundamentals by building a strong conceptual understanding of vectors, matrices, and their operations. It combines explanatory text, diagrams, and Python examples to bridge theory and practical computation. The material emphasizes geometric interpretation and visual reasoning, which makes abstract linear algebra topics more accessible to beginners and self-learners. The repository is continuously updated and intended to accompany the broader Visualize-ML learning ecosystem. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    ESPnet

    ESPnet

    End-to-end speech processing toolkit

    ESPnet is a comprehensive end-to-end speech processing toolkit covering a wide spectrum of tasks, including automatic speech recognition (ASR), text-to-speech (TTS), speech translation (ST), speech enhancement, speaker diarization, and spoken language understanding. It uses PyTorch as its deep learning engine and adopts a Kaldi-style data processing pipeline for features, data formats, and experimental recipes. This combination allows researchers to leverage modern neural architectures while...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 10
    Dia

    Dia

    A TTS model capable of generating ultra-realistic dialogue

    Dia is a neural text-to-speech model designed specifically for generating ultra-realistic dialogue in a single pass. Instead of focusing on isolated sentences or flat narration, it is optimized for conversational audio, complete with natural turn-taking, prosody, and pacing. The model can be conditioned on a reference audio sample, allowing you to control emotion, tone, and other stylistic aspects of the speech. It can also produce nonverbal vocalizations like laughter, coughs, clearing the...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    AutoGluon

    AutoGluon

    AutoGluon: AutoML for Image, Text, and Tabular Data

    AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code. Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge. Leverage automatic hyperparameter tuning,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 12
    Auto Synced & Translated Dubs

    Auto Synced & Translated Dubs

    Automatically translates the text of a video based on a subtitle file

    Auto-Synced-Translated-Dubs is a toolchain that automatically translates and re-dubs videos using AI voices while keeping the new speech aligned to the original timing via subtitle files. It assumes you have a human-made SRT (or similar) subtitle file; the script then uses translation services such as Google Cloud or DeepL to generate translated subtitle tracks in one or more target languages. Using the timestamps of each subtitle line, it computes the required duration of each spoken...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    HunyuanOCR

    HunyuanOCR

    OCR expert VLM powered by Hunyuan's native multimodal architecture

    HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent‑Hunyuan. It’s designed to unify the entire OCR pipeline, detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation, into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 14
    Unstructured.IO

    Unstructured.IO

    Open source libraries and APIs to build custom preprocessing pipelines

    The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. unstructured modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and is efficient in transforming unstructured data into...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 15
    Cactus

    Cactus

    Low-latency AI inference engine optimized for mobile devices

    Cactus is a low-latency, energy-efficient AI inference framework designed specifically for mobile devices and wearables, enabling advanced machine learning capabilities directly on-device. It provides a full-stack architecture composed of an inference engine, a computation graph system, and highly optimized hardware kernels tailored for ARM-based processors. Cactus emphasizes efficient memory usage through techniques such as zero-copy computation graphs and quantized model formats, allowing...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    Semantra

    Semantra

    Multi-tool for semantic search

    Semantra is an open-source semantic search tool designed to help users explore large collections of documents by meaning rather than simple keyword matching. The software analyzes text and PDF documents stored locally and creates embeddings that allow queries to retrieve results based on conceptual similarity. It is primarily intended for individuals who need to extract insights from large document collections, including researchers, journalists, students, and historians. The system runs...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 17
    HY-Motion 1.0

    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    HY-Motion 1.0 is an open-source, large-scale AI model suite developed by Tencent’s Hunyuan team that generates high-quality 3D human motion from simple text prompts, enabling the automatic production of fluid, diverse, and semantically accurate animations without manual keyframing or rigging. Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 18
    CogView4

    CogView4

    CogView4, CogView3-Plus and CogView3(ECCV 2024)

    CogView4 is the latest generation in the CogView series of vision-language foundation models, developed as a bilingual (Chinese and English) open-source system for high-quality image understanding and generation. Built on top of the GLM framework, it supports multimodal tasks including text-to-image synthesis, image captioning, and visual reasoning. Compared to previous CogView versions, CogView4 introduces architectural upgrades, improved training pipelines, and larger-scale datasets,...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 19
    Natural Language Toolkit
    The Natural Language Toolkit (NLTK) is a widely used open-source Python library designed for working with human language data and building natural language processing (NLP) applications. It provides a comprehensive suite of modules, datasets, and tutorials that support both symbolic and statistical approaches to language processing. The toolkit includes implementations of many foundational NLP algorithms and utilities, enabling developers to perform tasks such as tokenization, stemming,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Code-Graph-RAG

    Code-Graph-RAG

    The ultimate RAG for your monorepo

    Code-Graph-RAG is an advanced retrieval-augmented generation system designed specifically for understanding and interacting with large, multi-language codebases by transforming them into structured knowledge graphs. It uses Tree-sitter to parse source code into abstract syntax trees, extracting relationships between functions, classes, and modules to build a graph-based representation of the entire codebase. This structured approach enables more accurate and context-aware querying compared...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 21
    RecoverPy

    RecoverPy

    Interactively find and recover deleted or overwritten files

    RecoverPy is a powerful tool that leverages your system capabilities to recover lost files. Unlike others, you can not only recover deleted files but also overwritten data. Every block of your partition will be scanned. You can even find a string in binary files.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 22
    SentenceTransformers

    SentenceTransformers

    Multilingual sentence & image embeddings with BERT

    SentenceTransformers is a Python framework for state-of-the-art sentence, text and image embeddings. The initial work is described in our paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. You can use this framework to compute sentence / text embeddings for more than 100 languages. These embeddings can then be compared e.g. with cosine-similarity to find sentences with a similar meaning.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    LinkChecker

    LinkChecker

    Check links in web documents or full websites

    LinkChecker is a free, GPL licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version in the pip repository may be old, to find out how to get the latest code, plus platform-specific information and other advice see doc/install.txt in the source code archive. If you do not want to install any additional libraries/dependencies you can use the Docker image which is published on GitHub...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    Milvus Bootcamp

    Milvus Bootcamp

    Dealing with all unstructured data, such as reverse image search

    Milvus Bootcamp is a collection of tutorials, examples, and best practices for using Milvus, an open-source vector database designed for AI-powered similarity search and retrieval applications.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    Ark Pixel Font

    Ark Pixel Font

    Ark pixel font - Open source Pan-CJK pixel font

    Ark pixel font - Open source Pan-CJK pixel font / Ark pixel font - Open source Pan-CJK pixel font. An open-source pan-Chinese-Japanese-Korean pixel font. 10, 12 and 16 pixels are supported. The goal is to provide an out-of-the-box pixel font solution for game development that can be used for text. This project not only provides all the glyph source files, but also provides the complete program needed to build the font. This font is still in early development and still lacks a large number of...
    Downloads: 17 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB