Page 2 | language processing free download

ExtractThinker

ExtractThinker is a Document Intelligence library for LLMs

ExtractThinker is a tool designed to facilitate the extraction and analysis of information from various data sources, aiding in data processing and knowledge discovery.

Downloads: 4 This Week

Last Update: 2025-06-09

See Project

Search-Index

A persistent, network resilient, full text search library

Search-Index is a lightweight and fast JavaScript-based search engine that enables full-text search indexing and retrieval for web applications.

Downloads: 17 This Week

Last Update: 2025-03-12

See Project

Hazm

Persian NLP Toolkit

Hazm is a natural language processing (NLP) library for Persian text, offering various tools for text preprocessing, tokenization, part-of-speech tagging, and more.

Downloads: 0 This Week

Last Update: 2026-04-01

See Project

Lingua-RS

The most accurate natural language detection library for Rust

Lingua-RS is a language detection library implemented in Rust, designed to accurately identify the language of given text samples. It tells you which language some text is written in. This is very useful as a preprocessing step for linguistic data in natural language processing applications such as text classification and spell checking. Other use cases, for instance, might include routing e-mails to the right geographically located customer service department, based on the e-mails' languages.

Downloads: 0 This Week

Last Update: 2026-03-09

See Project

Sparrow

Structured data extraction and instruction calling with ML, LLM

...The architecture is modular, allowing developers to build customizable processing pipelines that integrate with external tools and data extraction frameworks. Sparrow also includes workflow orchestration tools that allow multiple extraction tasks to be combined into automated pipelines for large-scale document processing.

Downloads: 8 This Week

Last Update: 6 days ago

See Project

WikiChat

WikiChat is an improved RAG

WikiChat is a chatbot framework designed to interactively retrieve and summarize Wikipedia information, allowing users to ask questions and get context-aware responses?

Downloads: 2 This Week

Last Update: 2025-04-29

See Project

OpenVINO

OpenVINO™ Toolkit repository

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime, Post-Training Optimization Tool, as well as CPU, GPU, MYRIAD, multi device and heterogeneous plugins to accelerate deep learning inferencing on Intel® CPUs and Intel® Processor Graphics. ...

Downloads: 26 This Week

Last Update: 2026-05-25

See Project

Keras Hub

Pretrained model hub for Keras 3

Keras Hub is a repository of pre-trained models for Keras 3, offering a collection of ready-to-use models for various machine-learning tasks. KerasHub is an extension of the core Keras API; KerasHub components are provided as Layer and Model implementations. If you are familiar with Keras, congratulations. You already understand most of KerasHub.

Downloads: 9 This Week

Last Update: 2026-06-02

See Project

MarkPDFDown

A high-quality PDF to Markdown tool based on large language model

MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists.

Downloads: 7 This Week

Last Update: 2026-03-06

See Project

LLM.swift

LLM.swift is a simple and readable library

LLM.swift is a Swift package that enables developers to run Large Language Models (LLMs) directly on Apple devices, including iOS, macOS, and watchOS. By leveraging Apple's hardware and software optimizations, LLM.swift facilitates on-device natural language processing tasks, ensuring user privacy and reducing latency associated with cloud-based solutions.

Downloads: 8 This Week

Last Update: 2025-12-06

See Project

Whisper

Robust Speech Recognition via Large-Scale Weak Supervision

OpenAI Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification. A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. ...

Downloads: 56 This Week

Last Update: 2025-06-26

See Project

textlint

The pluggable natural language linter for text and markdown

Textlint is an extensible linting tool for text and markdown files, designed to enforce style guidelines, detect errors, and improve writing quality.

Downloads: 3 This Week

Last Update: 2026-05-18

See Project

DeepSparse

Sparsity-aware deep learning inference runtime for CPUs

A sparsity-aware enterprise inferencing system for AI models on CPUs. Maximize your CPU infrastructure with DeepSparse to run performant computer vision (CV), natural language processing (NLP), and large language models (LLMs).

Downloads: 0 This Week

Last Update: 2025-06-02

See Project

DocETL

A system for agentic LLM-powered data processing and ETL

DocETL is an open-source system designed to build and execute data processing pipelines powered by large language models, particularly for analyzing complex collections of documents and unstructured datasets. The platform allows developers and researchers to construct structured workflows that extract, transform, and organize information from sources such as reports, transcripts, legal documents, and other text-heavy data.

Downloads: 7 This Week

Last Update: 2026-03-05

See Project

LLM-Aided OCR Project

Enhances Tesseract OCR output using LLMs (local or API)

LLM Aided OCR is an open-source system designed to improve optical character recognition accuracy by combining traditional OCR tools with large language models. The project addresses common OCR challenges such as distorted text, unusual fonts, historical documents, and complex layouts that often produce inaccurate results with standard OCR pipelines. The system first extracts raw text using OCR engines and then applies language models to analyze and correct recognition errors based on context. ...

Downloads: 4 This Week

Last Update: 2026-03-22

See Project

natural

General natural language facilities for node

"Natural" is a general natural language facility for nodejs. It offers a broad range of functionalities for natural language processing. Tokenizing, stemming, classification, phonetics, tf-idf, WordNet, string similarity, and some inflections are currently supported. It’s still in the early stages, so we’re very interested in bug reports, contributions and the like. Note that many algorithms from Rob Ellis’s node-nltools are being merged into this project and will be maintained from here onward. ...

Downloads: 1 This Week

Last Update: 2026-02-18

See Project

Awesome Fraud Detection Research Papers

A curated list of data mining papers about fraud detection

A curated list of data mining papers about fraud detection from several conferences.

Downloads: 0 This Week

Last Update: 2026-01-05

See Project

Datasets

Hub of ready-to-use datasets for ML models

Datasets is a library for easily accessing and sharing datasets, and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use our powerful data processing methods to quickly get your dataset ready for training in a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads without any memory constraints for optimal speed and efficiency. ...

Downloads: 7 This Week

Last Update: 6 days ago

See Project

BEIR

A Heterogeneous Benchmark for Information Retrieval

BEIR is a benchmark framework for evaluating information retrieval models across various datasets and tasks, including document ranking and question answering.

Downloads: 7 This Week

Last Update: 2025-06-04

See Project

Unstructured.IO

Open source libraries and APIs to build custom preprocessing pipelines

The unstructured library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of unstructured revolve around streamlining and optimizing the data processing workflow for LLMs. unstructured modular bricks and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and is efficient in transforming unstructured data into...

Downloads: 3 This Week

Last Update: 19 hours ago

See Project

Chinese-XLNet

Chinese XLNet pre-trained model

Chinese-XLNet is a Chinese language pre-trained model based on the XLNet architecture, providing an advanced foundation for natural language processing tasks in Mandarin and other Chinese dialects. Unlike traditional masked language modeling, XLNet uses a permutation language modeling objective that captures bidirectional context more effectively by training over all possible token orderings, yielding richer contextual representations.

Downloads: 0 This Week

Last Update: 2026-04-19

See Project

VideoCaptioner

AI-powered tool for generating, optimizing, and translating subtitles

VideoCaptioner is an open source AI-powered subtitle processing tool designed to simplify the workflow of creating subtitles for videos. It integrates speech recognition, language processing, and translation technologies to automatically generate and refine subtitles from video or audio sources. VideoCaptioner uses speech-to-text engines such as Whisper variants to transcribe spoken content and convert it into subtitle text with accurate timestamps.

Downloads: 24 This Week

Last Update: 2026-05-24

See Project

PaperAI

Semantic search and workflows for medical/scientific papers

PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.

Downloads: 7 This Week

Last Update: 2025-07-01

See Project

FastRAG

Efficient Retrieval Augmentation and Generation Framework

fastRAG is a research framework for efficient and optimized retrieval augmented generative pipelines, incorporating state-of-the-art LLMs and Information Retrieval. fastRAG is designed to empower researchers and developers with a comprehensive tool set for advancing retrieval augmented generation.

Downloads: 7 This Week

Last Update: 2025-01-24

See Project

LiteParse

A fast, helpful, and open-source document parser

...LiteParse supports integration with multiple language models, allowing developers to choose the best balance between accuracy and efficiency. It also includes mechanisms for validation and error handling, ensuring that outputs conform to expected schemas and reducing the need for manual postprocessing. The library is particularly useful for tasks such as data extraction, document processing, and building pipelines that require structured outputs from natural language input.

Downloads: 7 This Week

Last Update: 3 days ago

See Project

Search Results for "language processing" - Page 2

Showing 554 open source projects for "language processing"

ExtractThinker

Search-Index

Hazm

Lingua-RS

Sparrow

WikiChat

OpenVINO

Keras Hub

MarkPDFDown

LLM.swift

Whisper

textlint

DeepSparse

DocETL

LLM-Aided OCR Project

natural

Awesome Fraud Detection Research Papers

Datasets

BEIR

Unstructured.IO

Chinese-XLNet

VideoCaptioner

PaperAI

FastRAG

LiteParse

Search Results for "language processing" - Page 2

Showing 554 open source projects for "language processing"

ExtractThinker

Search-Index

Hazm

Lingua-RS

Sparrow

WikiChat

OpenVINO

Keras Hub

MarkPDFDown

LLM.swift

Whisper

textlint

DeepSparse

DocETL

LLM-Aided OCR Project

natural

Awesome Fraud Detection Research Papers

Datasets

BEIR

Unstructured.IO

Chinese-XLNet

VideoCaptioner

PaperAI

FastRAG

LiteParse

Related Searches

Related Categories