Page 3 | extraction free download

Showing 257 open source projects for "extraction"

Filtered by category: Artificial Intelligence

1

TikTok MCP

Model Context Protocol (MCP) with TikTok integration

The TikTok MCP integrates TikTok access into AI applications like Claude AI via TikNeuron. It enables analysis and interaction with TikTok content to determine virality factors and extract video content.

Downloads: 0 This Week

Last Update: 2026-02-27
2

PaddleOCR-json

Offline OCR command-line program for Windows

PaddleOCR-json is an OCR engine based on the PaddleOCR project that provides a command-line interface and tools for extracting text from images and exporting results in structured JSON format. It wraps the PaddleOCR models, which are capable of detecting and recognizing text in a wide variety of languages and layouts, into a self-contained executable that can be run locally without needing a deep learning environment configured manually. This makes it practical for developers or system...

Downloads: 8 This Week

Last Update: 2026-01-15
3

FinGPT

Open-Source Financial Large Language Models

...The platform typically includes tools for fine-tuning, context engineering, and prompt templating, enabling users to build specialized assistants for tasks like sentiment analysis, earnings summary generation, risk profiling, trading signal interpretation, and document extraction from financial reports.

Downloads: 17 This Week

Last Update: 2026-04-03
4

DeepCamera

Open-Source AI Camera. Empower any camera/CCTV

...SharpAI yolov7_reid is an open-source Python application that uses AI to detect intruders with traditional surveillance cameras. It uses YOLOv7 as a person detector, FastReID for person feature extraction, Milvus as a local vector database for self-supervised learning to identify unseen persons, and Label Studio to host images locally for further use, such as labeling data and training your own classifier. It also integrates with Home Assistant to bring AI technology to smart homes.

Downloads: 17 This Week

Last Update: 2026-03-20
5

Search1API MCP

A Model Context Protocol (MCP) server

The Search1API MCP Server is a Model Context Protocol server that provides search and crawl functionality using Search1API. It enables web and news searches, content extraction, and sitemap retrieval, integrating seamlessly with MCP clients.

Downloads: 0 This Week

Last Update: 2025-04-08
6

Superlinked

Superlinked is a Python framework for AI Engineers

Superlinked is a Python framework designed for AI engineers to build high-performance search and recommendation applications that combine structured and unstructured data.

Downloads: 0 This Week

Last Update: 2025-10-22
7

Chonkie

The no-nonsense RAG chunking library

Chonkie is a lightweight library for splitting text into chunks for retrieval-augmented generation (RAG) pipelines.

Downloads: 0 This Week

Last Update: 2025-03-01
8

HunyuanOCR

OCR expert VLM powered by Hunyuan's native multimodal architecture

HunyuanOCR is an open-source, end-to-end OCR (optical character recognition) Vision-Language Model (VLM) developed by Tencent-Hunyuan. It is designed to unify the entire OCR pipeline (detection, recognition, layout parsing, information extraction, translation, and even subtitle or structured output generation) into a single model inference instead of a cascade of separate tools. Despite being fairly lightweight (about 1 billion parameters), it delivers state-of-the-art performance across a wide variety of OCR tasks, outperforming many traditional OCR systems and even other multimodal models on benchmark suites. ...

Downloads: 0 This Week

Last Update: 2026-04-08
9

AgenticSeek

Fully Local Manus AI. No APIs, No $200 monthly bills

...AgenticSeek includes intelligent agent selection, allowing it to determine the best internal agent to handle a given request. It also supports hands-free workflows such as automated web form interaction and information extraction. Overall, the project functions as a self-hosted, multi-capability AI agent designed for users who prioritize autonomy, privacy, and local execution.

Downloads: 4 This Week

Last Update: 7 days ago
10

Director

AI video agents framework for next-gen video interactions

Director is a framework for building AI video agents that organize, search, and retrieve large collections of video content efficiently.

Downloads: 0 This Week

Last Update: 2025-01-29
11

Recognizers-Text

Recognition and resolution of numbers, units, date/time, etc.

Recognizers-Text is a multilingual text recognition library that extracts structured information such as dates, numbers, and currency values from unstructured text.

Downloads: 0 This Week

Last Update: 2025-02-12
12

BrowserOS

Agentic browser; privacy-first alternative to ChatGPT Atlas

BrowserOS is an open-source, agentic web browser built on a Chromium base that integrates AI agents directly into the browsing experience. Rather than just doing standard browsing, it places AI at the core: you can connect your own API keys (e.g., for OpenAI, Anthropic, or Google Gemini) or run local models (e.g., via Ollama) so that your browsing data and automation stay on your machine. Privacy and control are emphasized throughout. The interface remains familiar to users of...

Downloads: 16 This Week

Last Update: 2026-04-08
13

TTime

Screenshots, word marking, OCR, AI, translation software

TTime is a desktop productivity tool that combines translation, OCR, and screen capture capabilities into a unified application designed for fast and efficient text processing workflows. It allows users to translate text through multiple methods, including direct input, screenshot-based capture, and real-time word selection, making it versatile for both casual use and professional tasks. The software integrates a wide range of translation engines and OCR services, including cloud-based...

Downloads: 3 This Week

Last Update: 2026-03-18
14

LangChain Extract

Did you say you like data?

LangChain Extract is an open-source reference application designed to demonstrate how large language models can be used to extract structured data from unstructured text and document files. The project implements a lightweight web service that allows developers to define extraction schemas and apply them to various sources such as plain text, HTML, or PDF documents. Built using FastAPI and the LangChain framework, the application exposes a REST API that can process documents and return structured outputs that match user-defined JSON schemas. Developers can create reusable “extractors” that define what type of information should be pulled from a document, along with example prompts that improve extraction quality through in-context learning.

Downloads: 0 This Week

Last Update: 2026-03-09
15

AutoClip

AI-powered video clipping and highlight generation

AutoClip is an open-source, AI-powered video processing system designed to automate the extraction of “highlight” segments from full-length videos — ideal for creators who want to generate bite-sized clips, compilations, or highlight reels without manually sifting through hours of footage. The system supports downloading videos from major platforms (e.g. YouTube, Bilibili), or accepting local uploads, and then applies AI analysis to identify segments worth clipping based on content (e.g. high energy moments, speech, or other heuristics). ...

Downloads: 17 This Week

Last Update: 2025-12-08
16

Kor

LLM

This is a half-baked prototype that “helps” you extract structured data from text using LLMs. Specify the schema of what should be extracted and provide some examples. Kor will generate a prompt, send it to the specified LLM and parse out the output. You might even get results back.

Downloads: 0 This Week

Last Update: 2024-07-20
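
The workflow in the Kor description (specify a schema, provide examples, generate a prompt, send it to an LLM, parse the output) can be sketched roughly as follows. This is a minimal illustration, not Kor's actual API: `build_prompt`, `parse_output`, `extract`, and `fake_llm` are hypothetical names, the JSON reply format is an assumption, and the LLM is replaced by a stub that returns a canned reply.

```python
import json


def build_prompt(schema, examples, text):
    """Assemble a few-shot extraction prompt from a schema and examples."""
    lines = ["Extract the following fields and answer with JSON only:"]
    for name, description in schema.items():
        lines.append(f"- {name}: {description}")
    for example_text, example_output in examples:
        lines.append(f"Input: {example_text}")
        lines.append(f"Output: {json.dumps(example_output)}")
    lines.append(f"Input: {text}")
    lines.append("Output:")
    return "\n".join(lines)


def parse_output(raw, schema):
    """Parse the model's reply, keeping only fields named in the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # the model did not return valid JSON
    if not isinstance(data, dict):
        return {}
    return {key: data[key] for key in schema if key in data}


def extract(text, schema, examples, llm):
    """Run the full loop: prompt generation, model call, output parsing."""
    return parse_output(llm(build_prompt(schema, examples, text)), schema)


def fake_llm(prompt):
    # Hypothetical stand-in for a real model call; always returns the same reply.
    return '{"name": "Ada", "age": 36, "mood": "happy"}'


schema = {"name": "the person's first name", "age": "age in years"}
examples = [("Bob is 25.", {"name": "Bob", "age": 25})]
result = extract("Ada just turned 36.", schema, examples, fake_llm)
```

Filtering the parsed reply against the schema discards fields the model adds on its own (here, `mood`), and malformed replies yield an empty result instead of raising an error.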
17

designlang

Extract any website's complete design system with one command

designlang is a powerful tool that extracts complete design systems from existing websites using automated analysis and converts them into reusable assets and tokens. It generates structured outputs such as design tokens, semantic components, and styling systems that can be used across multiple platforms. The tool supports exporting to frameworks like Tailwind, SwiftUI, Flutter, and WordPress, making it highly versatile for cross-platform development. It also integrates with tools like Figma...

Downloads: 6 This Week

Last Update: 2 days ago
18

GalTransl

Automated translation solution for visual novels

GalTransl is an automated translation system specifically designed for visual novels, particularly those in the “galgame” genre, leveraging large language models to streamline and enhance the translation process. It supports multiple LLMs, such as GPT-4, Claude, and DeepSeek, enabling high-quality, context-aware translations that go beyond traditional machine translation approaches. The platform is built to handle the unique structure of visual...

Downloads: 6 This Week

Last Update: 3 days ago
19

GLM-OCR

Accurate × Fast × Comprehensive

GLM-OCR is an open-source multimodal optical character recognition (OCR) model built on a GLM-V encoder–decoder foundation that brings robust, accurate document understanding to complex real-world layouts and modalities. Designed to handle text recognition, table parsing, formula extraction, and general information retrieval from documents containing mixed content, GLM-OCR excels across major benchmarks while remaining highly efficient with a relatively compact parameter size (~0.9B), enabling deployment in high-concurrency services and edge environments. The model’s multimodal capabilities allow it to reason across image and text content holistically, capturing structured and unstructured information from pages that include dense tables, seals, code snippets, and varied document graphics. ...

Downloads: 6 This Week

Last Update: 2026-04-08
20

Docling

Get your documents ready for gen AI

...The project focuses on converting and parsing many document formats into a unified structured representation that downstream systems can easily consume. It supports advanced PDF understanding, including layout detection, table extraction, and reading order analysis, enabling high-fidelity document intelligence pipelines. Docling is designed to run efficiently on commodity hardware and can be used both as a Python API and a command-line tool. Its modular architecture allows developers to extend functionality and integrate specialized models for tasks such as OCR and audio transcription. ...

Downloads: 2 This Week

Last Update: 6 days ago
21

Pot Desktop

A cross-platform software for text translation and recognition

...It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images, or any other on-screen text and instantly getting translations or text extraction. The tool supports external plugin extensions, which means its functionality can be expanded far beyond the built-in options: you can add translation engines, OCR backends, TTS engines, vocabulary export (e.g., for language learning), and more. Pot-Desktop works on Windows, macOS, and Linux (including Wayland environments), and offers convenient installers or package-manager installation methods (e.g., via brew or .deb), so it’s accessible for users on all major desktop OSes.

Downloads: 10 This Week

Last Update: 2025-11-28
22

Transformers

State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX

...Using pre-trained models can reduce your compute costs and carbon footprint and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities: text, for tasks like text classification, information extraction, question answering, summarization, translation, and text generation, in over 100 languages; images, for tasks like image classification, object detection, and segmentation; and audio, for tasks like speech recognition and audio classification. Transformers provides APIs to quickly download and use those pretrained models on a given text, fine-tune them on your own datasets, and then share them with the community on our model hub. ...

Downloads: 7 This Week

Last Update: 13 hours ago
23

spaCy models

Models for the spaCy Natural Language Processing (NLP) library

...The library respects your time, and tries to avoid wasting it. It's easy to install, and its API is simple and productive. spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. If your application needs to process entire web dumps, spaCy is the library you want to be using. Since its release in 2015, spaCy has become an industry standard with a huge ecosystem. Choose from a variety of plugins, integrate with your machine learning stack and build custom components and workflows.

Downloads: 3 This Week

Last Update: 2026-03-18
24

Actors MCP Server

Model Context Protocol (MCP) Server for Apify's Actors

The Apify Actors MCP Server is a Model Context Protocol (MCP) server that enables AI assistants to interact with Apify Actors. This integration allows AI models to utilize various web scraping and automation tools provided by Apify, facilitating tasks such as data extraction and web automation.

Downloads: 0 This Week

Last Update: 2 days ago
25

Dendrite

Tools to build web AI agents that can authenticate

Dendrite Python SDK is a toolkit for building web AI agents that can authenticate, interact with, and extract data from any website, facilitating web automation tasks.

Downloads: 0 This Week

Last Update: 2025-01-29