text parsing free download

Showing 238 open source projects for "text parsing"

1

text-extract-api

Document (PDF, Word, PPTX ...) extraction and parse API

...Instead of requiring developers to integrate multiple document parsing libraries individually, the system centralizes text extraction capabilities into a unified API that standardizes the output. The platform supports automated processing pipelines that detect file types and apply the appropriate extraction method to obtain the most accurate text representation possible. It can be integrated into document analysis systems, knowledge retrieval tools, and AI pipelines that rely on clean textual data. ...

Downloads: 2 This Week

Last Update: 2026-03-05
2

TextFSM

Python module for parsing semi-structured text into Python tables

TextFSM is a Python library created by Google that provides a template-based state machine engine for parsing semi-structured text. It is particularly useful for extracting structured data from command-line interface (CLI) outputs, such as those from network devices, routers, and switches. By defining parsing logic through reusable template files, TextFSM transforms unstructured text into structured data like lists or tables without requiring complex regular expression code. ...
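The template-driven state machine idea described above can be sketched with the standard library alone. This is a minimal illustration and not TextFSM's API or template syntax: the `RULES` table, the `parse` function, and the sample CLI output are all hypothetical names and data invented for the example.

```python
import re

# Hypothetical rule table (not TextFSM's template format):
# state name -> list of (pattern, action, next_state).
RULES = {
    "Start": [
        (re.compile(r"^Interface\s+(?P<name>\S+)"), "save", "Interface"),
    ],
    "Interface": [
        (re.compile(r"^\s+Status:\s+(?P<status>up|down)"), "save", "Interface"),
        (re.compile(r"^\s+MTU:\s+(?P<mtu>\d+)"), "record", "Start"),
    ],
}


def parse(text):
    """Walk the text line by line, returning one dict per recorded row."""
    state = "Start"
    row = {}
    table = []
    for line in text.splitlines():
        # Try the rules of the current state in order; first match wins.
        for pattern, action, next_state in RULES[state]:
            match = pattern.search(line)
            if match is None:
                continue
            row.update(match.groupdict())
            if action == "record":
                # Emit the accumulated row and start a fresh one.
                table.append(row)
                row = {}
            state = next_state
            break
        # Lines that match no rule in the current state are ignored.
    return table


# Made-up sample output, for illustration only.
SAMPLE = """\
Interface eth0
  Status: up
  MTU: 1500
Interface eth1
  Status: down
  MTU: 9000
"""

rows = parse(SAMPLE)
for r in rows:
    print(r)
```

Keeping the patterns in a data table rather than in control flow is the point of the design: supporting a new output format means editing the rule table, not the parsing loop.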

Downloads: 0 This Week

Last Update: 2025-10-11
3

LiteParse

A fast, helpful, and open-source document parser

LiteParse is an open-source lightweight parsing library designed to extract structured data from unstructured text using large language models in an efficient and cost-effective manner. It focuses on simplifying the process of turning raw text into structured outputs such as JSON by providing a streamlined interface for prompt-based parsing. The system is designed to minimize overhead, making it suitable for applications where performance and cost are critical considerations. ...

Downloads: 2 This Week

Last Update: 5 days ago
4

ChordSheetJS

A JavaScript library for parsing and formatting chords and chord sheets

ChordSheetJS is a JavaScript library for parsing, formatting, and transposing chord sheets. It supports various chord sheet formats and provides tools for rendering and manipulating chord and lyric data.

Downloads: 6 This Week

Last Update: 2026-04-17
5

YAML

JavaScript parser and stringifier for YAML

yaml is a definitive library for YAML, the human-friendly data serialization standard. The library supports both YAML 1.1 and YAML 1.2 and all common data schemas, and passes all of the yaml-test-suite tests. It can accept any string as input without throwing, parsing as much YAML out of it as it can, and supports parsing, modifying, and writing YAML comments and blank lines. The library is released under the ISC open source license, and the code is available on GitHub. It has no external...

Downloads: 2 This Week

Last Update: 2026-03-21
6

LlamaParse

Parse files for optimal RAG

LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents). Load in 160+ data sources and data formats, from unstructured and semi-structured to structured data (APIs, PDFs, documents, SQL, etc.). Store and index your data for different use cases. Integrate with 40+ vector stores, document stores, graph stores, and SQL database providers.

Downloads: 4 This Week

Last Update: 2026-02-13
7

commonmark-java

Java library for parsing and rendering CommonMark (Markdown)

Java library for parsing and rendering Markdown text according to the CommonMark specification (and some extensions). Provides classes for parsing input to an abstract syntax tree of nodes (AST), visiting and manipulating nodes, and rendering to HTML. It started out as a port of commonmark.js, but has since evolved into a full library with a nice API.

Downloads: 3 This Week

Last Update: 2026-03-31
8

markdown-it

Markdown parser, done right. 100% CommonMark support, extensions

markdown-it is a fast and extensible JavaScript-based Markdown parser designed to convert Markdown text into HTML while maintaining strict compliance with the CommonMark specification and offering additional syntax enhancements. It is widely used in web applications, documentation tools, and content platforms due to its high performance and flexibility. The library is built with a rule-based parsing system that allows developers to customize or replace syntax rules, making it adaptable to a wide variety of use cases. ...
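To make the idea of a customizable rule table concrete, here is a toy sketch in Python. It is not markdown-it's API and is far simpler than a real CommonMark parser: the class `TinyInlineRenderer`, its `disable` and `replace` methods, and the two regex rules are assumptions made purely for illustration.

```python
import html
import re


class TinyInlineRenderer:
    """Toy rule-table renderer; all names here are hypothetical."""

    def __init__(self):
        # name -> (compiled pattern, replacement template).
        # Dicts keep insertion order, so "strong" runs before "em".
        self.rules = {
            "strong": (re.compile(r"\*\*(.+?)\*\*"), r"<strong>\1</strong>"),
            "em": (re.compile(r"\*(.+?)\*"), r"<em>\1</em>"),
        }
        self.disabled = set()

    def disable(self, name):
        """Switch a rule off without removing it from the table."""
        self.disabled.add(name)

    def replace(self, name, pattern, template):
        """Swap in a different pattern and template for a named rule."""
        self.rules[name] = (re.compile(pattern), template)

    def render(self, text):
        # Escape raw HTML first, then apply each enabled rule in order.
        out = html.escape(text, quote=False)
        for name, (pattern, template) in self.rules.items():
            if name not in self.disabled:
                out = pattern.sub(template, out)
        return "<p>" + out + "</p>"


md = TinyInlineRenderer()
default_html = md.render("**bold** and *it*")

md.disable("em")
no_em_html = md.render("**bold** and *it*")

md.replace("strong", r"\*\*(.+?)\*\*", r"<b>\1</b>")
custom_html = md.render("**bold** and *it*")

print(default_html)
print(no_em_html)
print(custom_html)
```

Because each rule is an entry in a table rather than hard-coded logic, callers can turn syntax off or change its output without touching the render loop.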

Downloads: 1 This Week

Last Update: 2026-03-29
9

npm-pdfreader

Parse text and tables from PDF files.

npm-pdfreader is a Node.js library for reading text and parsing tables from PDF files. It supports tabular data with automatic column detection and rule-based parsing, making it useful for extracting structured data from PDFs.

Downloads: 0 This Week

Last Update: 2025-11-01
10

GROBID

Machine learning software for extracting information

GROBID is a machine learning library for extracting, parsing, and restructuring raw documents such as PDFs into structured XML/TEI-encoded documents, with a particular focus on technical and scientific publications. Development started in 2008 as a hobby, and the tool was made available as open source in 2011. Work on GROBID has been steady as a side project since the beginning and is expected to continue as such. Header extraction and parsing from articles in PDF format. The...

Downloads: 4 This Week

Last Update: 2026-04-07
11

Ksoup

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML

Ksoup is a lightweight Kotlin Multiplatform library for parsing HTML, extracting HTML tags, attributes, and text, and encoding and decoding HTML entities.

Downloads: 0 This Week

Last Update: 2025-06-08
12

ANTLR

Parser generator to read, process, or translate structured text

ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. From a grammar, ANTLR generates a parser that can build and walk parse trees. It is widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. ...

Downloads: 15 This Week

Last Update: 2024-08-03
13

Markdig

A fast, powerful, CommonMark compliant, extensible Markdown processor

A fast, powerful, CommonMark-compliant, extensible Markdown processor for .NET. It has a very fast parser and HTML renderer (no regular expressions) and is very lightweight in terms of GC pressure. Its abstract syntax tree carries precise source code locations, which is useful when building a Markdown editor. Check out MarkdownEditor for Visual Studio, powered by Markdig! Even the core Markdown/CommonMark parsing is pluggable, so you can disable built-in Markdown/CommonMark parsing (e.g., disable HTML parsing) or...

Downloads: 1 This Week

Last Update: 6 days ago
14

zpdf

Zero-copy PDF text extraction library written in Zig

zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches.
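The memory-mapped, low-copy scanning pattern mentioned above can be illustrated in Python with the standard `mmap` module. This is only a sketch of the general technique, not zpdf's implementation or API (zpdf is written in Zig): the `count_marker` helper and the fake PDF-like bytes are hypothetical.

```python
import mmap
import os
import tempfile


def count_marker(path, marker):
    """Count non-overlapping occurrences of `marker` in a file via mmap.

    The operating system pages the file in on demand, so the search runs
    over the mapping instead of reading the whole file into a bytes object.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            count = 0
            pos = view.find(marker)
            while pos != -1:
                count += 1
                pos = view.find(marker, pos + len(marker))
            return count


# Made-up bytes that only loosely resemble PDF objects, for illustration.
FAKE_PDF = (
    b"1 0 obj\nstream\nHello\nendstream\nendobj\n"
    b"2 0 obj\nstream\nWorld\nendstream\nendobj\n"
)

tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(FAKE_PDF)
    tmp.close()
    found = count_marker(tmp.name, b"endstream")
finally:
    os.remove(tmp.name)

print(found)
```

Note that mapping an empty file raises `ValueError` in Python, so a real caller would need to check the file size first.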

Downloads: 0 This Week

Last Update: 2026-02-01
15

RAG Anything

RAG-Anything: All-in-One RAG Framework

...The system uses a multi-stage pipeline (e.g., document parsing, content analysis, knowledge graph construction, intelligent retrieval) so queries can navigate across modalities with deeper understanding and relevance.

Downloads: 0 This Week

Last Update: 2026-03-24
16

amrlib

A Python library that makes AMR parsing, generation and visualization simple

amrlib is a Python module designed to make processing for Abstract Meaning Representation (AMR) simple. It provides Sentence to Graph (StoG) parsing to create AMR graphs from English sentences, Graph to Sentence (GtoS) generation for turning AMR graphs into English sentences, and a Qt-based GUI to facilitate the conversion of sentences to graphs and back to sentences. Methods to plot AMR graphs...

Downloads: 0 This Week

Last Update: 2026-03-07
17

tree-sitter

An incremental parsing system for programming tools

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited. General enough to parse any programming language. Fast enough to parse on every keystroke in a text editor. Robust enough to provide useful results even in the presence of syntax errors. Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application. ...

Downloads: 2 This Week

Last Update: 2026-03-31
18

SafeClaw

Chat with it via text and voice

SafeClaw is an open-source, entirely local alternative to cloud-based AI assistants like OpenClaw, enabling users to build a personal assistant that runs on their own machine without incurring API usage charges or exposing data to third-party services. It emphasizes privacy and predictability by using traditional programming, rule-based intent parsing, and established machine learning tools rather than large language models, meaning there are no per-token API costs and its behavior is deterministic. The assistant offers features such as voice control using fully local speech-to-text (Whisper) and text-to-speech (Piper) capabilities, news aggregation with extractive summarization, and smart home or Bluetooth device control. ...

Downloads: 5 This Week

Last Update: 2026-03-24
19

dots.ocr

Multilingual Document Layout Parsing in a Single Vision-Language Model

dots.ocr is a cutting-edge multilingual document parsing system built on a unified vision-language model that combines layout detection, text recognition, and structural understanding into a single architecture. Unlike traditional OCR pipelines that rely on multiple specialized components, dots.ocr integrates these processes end-to-end, reducing error propagation and improving consistency across tasks.

Downloads: 0 This Week

Last Update: 2026-03-24
20

Helix

A post-modern modal text editor

Helix is a modal (Kakoune/Vim‑inspired) terminal-based text editor written in Rust. It features modern modal editing, multiple selections, built-in language server (LSP) integration for code intelligence, and smart syntax highlighting that leverages tree‑sitter for fast, incremental parsing.

Downloads: 2 This Week

Last Update: 2025-07-31
21

Extractous

Fast and efficient unstructured data extraction

Extractous is a Rust-based unstructured data extraction library focused on fast local parsing of documents and other content-heavy files. Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. ...

Downloads: 0 This Week

Last Update: 2026-03-06
22

ELisp Tree-sitter

Tree-sitter bindings for Emacs Lisp

...The minor mode tree-sitter-mode provides a buffer-local syntax tree, which is kept up-to-date with changes to the buffer’s text. Run M-x tree-sitter-hl-mode to replace the regex-based highlighting provided by font-lock-mode with tree-based syntax highlighting.

Downloads: 0 This Week

Last Update: 2026-01-16
23

Umi-OCR

OCR software, free and offline

Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...

Downloads: 50 This Week

Last Update: 2026-01-15
24

Hazm

Persian NLP Toolkit

Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.

Downloads: 0 This Week

Last Update: 2026-04-01
25

Notion-to-MD

Convert Notion pages, blocks, and lists of blocks to Markdown

Notion-to-MD is a Node.js package that converts Notion pages, blocks, and lists of blocks to Markdown (with support for nesting) using notion-sdk-js.

Downloads: 0 This Week

Last Update: 2025-07-19