Page 5 | extraction free download

Showing 257 open source projects for "extraction"

View related business solutions

Artificial Intelligence Clear Filters & Widen Search

MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
Auth0 B2B Essentials: SSO, MFA, and RBAC Built In
Unlimited organizations, 3 enterprise SSO connections, role-based access control, and pro MFA included. Dev and prod tenants out of the box.

Auth0's B2B Essentials plan gives you everything you need to ship secure multi-tenant apps. Unlimited orgs, enterprise SSO, RBAC, audit log streaming, and higher auth and API limits included. Add on M2M tokens, enterprise MFA, or additional SSO connections as you scale.

Sign Up Free
1

AI Website Cloner Template

Clone any website with one command using AI coding agents

AI Website Cloner Template is a reusable project template that enables AI coding agents to reverse-engineer and recreate existing websites as modern, production-ready applications. It automates the process of analyzing a target website, extracting its design system, and rebuilding it using technologies such as Next.js, React, and Tailwind CSS. The system operates through a multi-stage pipeline that includes reconnaissance, component specification, parallel generation, and final assembly,...

Downloads: 0 This Week

Last Update: 2026-03-30
See Project
2

Markdownify MCP Server

Convert files and web content into clean, usable Markdown easily

...It supports formats such as PDFs, images, audio with transcription, DOCX, XLSX, and PPTX, along with web sources like YouTube transcripts, Bing results, and general webpages. Markdownify MCP is designed to simplify content extraction and make data easier to read, share, and reuse in structured workflows. Developers can install dependencies, build, and run the server locally, then extend functionality by modifying its TypeScript-based tools and server logic. It also allows retrieval of existing Markdown files, making it useful for documentation, research, and AI-assisted workflows. ...

Downloads: 0 This Week

Last Update: 5 hours ago
See Project
3

BrowserNode

Make websites accessible for AI agents. Automate tasks online

Browsernode is an open-source TypeScript framework that allows AI agents to interact directly with web browsers in order to automate tasks and gather information from websites. The project acts as a bridge between AI models and browser automation tools, enabling language models to control web pages programmatically. Built as an implementation compatible with the Browser-use ecosystem, Browsernode allows agents to perform actions such as navigating pages, extracting information, filling...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
4

browserable

Open source and self-hostable browser automation library for AI agents

Browserable is an open-source browser automation framework designed specifically for AI agents that need to interact with web interfaces in a human-like way. The project provides tools that allow automated agents to navigate websites, click buttons, fill out forms, and extract information from pages without manual scripting of each step. Built primarily in JavaScript, the framework offers both a developer-friendly SDK and a REST API that allow integration with AI applications and automation...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
Full-stack observability with actually useful AI | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
5

MarkPDFDown

A high-quality PDF to Markdown tool based on large language model

MarkPDFdown is an open-source document processing tool designed to convert PDF files into structured Markdown output that can be easily used for documentation, content pipelines, and AI processing workflows. The project focuses on extracting text, formatting, and structural information from complex PDF documents and transforming that information into clean Markdown that preserves the original hierarchy of headings, paragraphs, tables, and lists. By producing Markdown rather than raw text,...

Downloads: 0 This Week

Last Update: 2026-03-06
See Project
6

nano-graphrag

A simple, easy-to-hack GraphRAG implementation

nano-graphrag is a lightweight implementation of the GraphRAG approach designed to simplify experimentation with graph-based retrieval-augmented generation systems. GraphRAG expands traditional RAG pipelines by constructing knowledge graphs from documents and using relationships between entities to improve the quality and reasoning of AI responses. The nano-GraphRAG project focuses on reducing complexity by providing a compact and readable codebase that preserves the core functionality of...

Downloads: 0 This Week

Last Update: 2026-03-05
See Project
7

GPT Crawler

Crawl a site to generate knowledge files to create your own custom GPT

GPT Crawler is an open-source tool designed to automatically crawl websites and generate structured knowledge that can be used to build AI assistants and retrieval systems. It focuses on extracting high-quality textual content from web pages and preparing it in formats suitable for embedding, indexing, or fine-tuning workflows. The project is especially useful for teams that want to turn documentation sites or knowledge bases into conversational AI backends without building custom scrapers...

Downloads: 0 This Week

Last Update: 2026-03-02
See Project
8

Playwriter

Chrome extension to let agents control your browser

...The system enables browser automation by running Playwright commands through a persistent session managed by a background extension, allowing agents or scripts to navigate, interact with, and query browser contexts without losing state between commands. This makes it valuable for scenarios where AI agents need to perform complex web automation tasks—like multi-step navigation, form interaction, or content extraction—without reinitializing context or state every time. Playwriter’s architecture supports both extension-based control for real browser windows and CLI integration, giving developers flexibility in how they build and run browser automation workflows.

Downloads: 0 This Week

Last Update: 2026-04-16
See Project
9

agentation

The visual feedback tool for agents

Agentation is a visual annotation and feedback tool designed to make interacting with AI coding agents more intuitive and precise by letting developers visually click on frontend elements in a browser and annotate them with context before sending structured feedback to an agent. Instead of describing UI elements in text — like “the blue button in the sidebar” — users click directly on elements to automatically capture selectors, positions, and contextual metadata that can be consumed by AI...

Downloads: 0 This Week

Last Update: 2026-03-25
See Project
AI-powered service management for IT and enterprise teams
Enterprise-grade ITSM, for every business

Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.

Try it Free
10

GetProfile

User profile and long-term memory for your AI agent

GetProfile is a drop-in proxy layer that sits in front of your LLM provider to turn otherwise stateless chat requests into a system with persistent user profiles and long-term memory. Instead of forcing you to redesign your application, you route your model calls through GetProfile and it captures conversation context automatically as traffic flows. It then extracts structured traits and “memories” from those conversations, stores them, and injects the most relevant profile context back into...

Downloads: 0 This Week

Last Update: 2026-01-06
See Project
11

NeMo Curator

Scalable data pre processing and curation toolkit for LLMs

NeMo Curator is a Python library specifically designed for fast and scalable dataset preparation and curation for large language model (LLM) use-cases such as foundation model pretraining, domain-adaptive pretraining (DAPT), supervised fine-tuning (SFT) and paramter-efficient fine-tuning (PEFT). It greatly accelerates data curation by leveraging GPUs with Dask and RAPIDS, resulting in significant time savings. The library provides a customizable and modular interface, simplifying pipeline...

Downloads: 0 This Week

Last Update: 2026-02-23
See Project
12

Browserbase MCP Server

Allow LLMs to control a browser with Browserbase and Stagehand

Browserbase MCP Server is a server implementation of the Model Context Protocol (MCP) that enables large language models to interact with web browsers programmatically through cloud-based automation. The project provides a standardized interface for connecting AI systems to real-world web environments, allowing them to navigate pages, extract structured data, and perform user-like actions such as clicking, typing, and form submission. It leverages Browserbase infrastructure along with...

Downloads: 0 This Week

Last Update: 2026-03-31
See Project
13

Memori

SQL-native memory layer enabling persistent context for AI agents

Memori is an open source SQL-native memory engine designed to add persistent memory capabilities to AI applications, large language models, and multi-agent systems. It provides a memory layer that automatically captures conversations and interactions between users and AI models, allowing systems to retain knowledge across sessions instead of operating statelessly. It extracts structured information such as facts, preferences, rules, and summaries from interactions and stores them in standard...

Downloads: 0 This Week

Last Update: 2 days ago
See Project
14

FlexLLMGen

Running large language models on a single GPU

FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware. The architecture distributes computation and memory usage across the GPU, CPU, and disk in order to maximize the number of tokens processed during inference. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
15

Read Frog

Open Source Immersive Translate

Read Frog is an open-source browser extension designed to transform everyday web reading into an immersive language learning experience powered by artificial intelligence. The tool integrates translation, contextual explanations, and content analysis directly into the browsing workflow so users can learn languages naturally while reading authentic online content. Instead of forcing learners to switch between translation tools and the original text, the extension displays translations...

Downloads: 0 This Week

Last Update: 6 hours ago
See Project
16

Reader LLM

Convert any URL to an LLM-friendly input with a simple prefix

Reader LLM is an open-source tool designed to convert web content into formats that are easier for large language models to process. The system works by transforming a webpage into a clean text or Markdown representation that removes unnecessary formatting and highlights the core information within the page. Developers can use a simple URL prefix to retrieve a version of a webpage that has been optimized for machine consumption, making it suitable for use in AI agents or retrieval-augmented...

Downloads: 0 This Week

Last Update: 2026-04-16
See Project
17

ArXiv MCP Server

A Model Context Protocol server for searching and analyzing arXiv

arxiv-mcp-server bridges AI assistants and the arXiv repository through a clean MCP interface, enabling search, metadata retrieval, and content access without bespoke scraping. With simple tools like “search” and “fetch,” an agent can find papers, pull abstracts, and download PDFs for downstream summarization or analysis. The project includes packaging and CI to publish to PyPI, plus tests and linting for reliability. Issue threads show feature requests such as extracting embedded LaTeX and...

Downloads: 0 This Week

Last Update: 2026-04-06
See Project
18

Firecrawl MCP Server

Adds powerful web scraping and search to Cursor and Claude

firecrawl-mcp-server is the official MCP integration for Firecrawl that brings high-recall web scraping, crawling, and search into IDEs and agent runtimes. It exposes tools for single-page scrape, multi-URL batch jobs, site discovery, and search enrichment, returning cleaned, structured content suitable for downstream LLM reasoning. The server is designed to run with Firecrawl’s hosted API or self-hosted deployments, making it flexible for enterprise data-governance requirements. Built-in...

Downloads: 0 This Week

Last Update: 2025-10-08
See Project
19

MiniCPM-o

A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming

MiniCPM-o 2.6 is a cutting-edge multimodal large language model (MLLM) designed for high-performance tasks across vision, speech, and video. Capable of running on end-side devices such as smartphones and tablets, it provides powerful features like real-time speech conversation, video understanding, and multimodal live streaming. With 8 billion parameters, MiniCPM-o 2.6 surpasses its predecessors in versatility and efficiency, making it one of the most robust models available. It supports...

Downloads: 0 This Week

Last Update: 2025-05-15
See Project
20

Docspell

Assist in organizing your piles of documents

Docspell is a personal document organizer. Or sometimes called a "Document Management System" (DMS). You'll need a scanner to convert your papers into files. Docspell can then assist in organizing the resulting mess. It can unify your files from scanners, emails, and other sources. It is targeted for home use, i.e. families, households, and also for smaller groups/companies. You can associate tags, set correspondent,s and lots of other predefined and custom metadata. If your documents are...

Downloads: 0 This Week

Last Update: 2025-03-15
See Project
21

eos

A lightweight 3D Morphable Face Model library in modern C++

eos is a lightweight 3D Morphable Face Model fitting library that provides basic functionality to use face models, as well as camera and shape fitting functionality. It's written in modern C++11/14. MorphableModel and PcaModel classes to represent 3DMMs, with basic operations like draw_sample(). Supports the Surrey Face Model (SFM), 4D Face Model (4DFM), Basel Face Model (BFM) 2009 and 2017, and the Liverpool-York Head Model (LYHM) out-of-the-box.

Downloads: 0 This Week

Last Update: 2024-12-10
See Project
22

Node.js Client For NLP Cloud

NLP Cloud serves high performance pre-trained or custom models

...NLP Cloud serves high-performance pre-trained or custom models for NER, sentiment analysis, classification, summarization, dialogue summarization, paraphrasing, intent classification, product description and ad generation, chatbot, grammar and spelling correction, keywords and keyphrases extraction, text generation, image generation, blog post generation, text generation, question answering, automatic speech recognition, machine translation, language detection, semantic search, semantic similarity, tokenization, POS tagging, embeddings, and dependency parsing. It is ready for production, and served through a REST API. You can either use the NLP Cloud pre-trained models, fine-tune your own models, or deploy your own models.

Downloads: 0 This Week

Last Update: 2024-11-27
See Project
23

Vearch

A distributed system for embedding-based vector retrieval

...Through the module of the plugin, a complete default visual search system can be deployed just with one click. Otherwise, you can easily customize your own image, video, or text feature extraction algorithm plugin. This GIF provides a clear demonstration of the project vearch usage and its internal structure. The use of vearch is mainly divided into three steps. Firstly, create DB and Space, then import your data, and finally, you can search on your own dataset.

Downloads: 0 This Week

Last Update: 2026-02-04
See Project
24

Weaviate

Weaviate is a cloud-native, modular, real-time vector search engine

...Weaviate in detail: Weaviate is a low-latency vector search engine with out-of-the-box support for different media types (text, images, etc.). It offers Semantic Search, Question-Answer-Extraction, Classification, Customizable Models (PyTorch/TensorFlow/Keras), and more. Built from scratch in Go, Weaviate stores both objects and vectors, allowing for combining vector search with structured filtering with the fault-tolerance of a cloud-native database, all accessible through GraphQL, REST, and various language clients.

Downloads: 1 This Week

Last Update: 6 days ago
See Project
25

Prompt Engineering Interactive Tutorial

Anthropic's Interactive Prompt Engineering Tutorial

...The course leans heavily on realistic failure modes (ambiguity, hallucination, brittle instructions) and shows how to iteratively debug prompts the way you would debug code. Lessons include building prompts from scratch for common tasks like extraction, classification, transformation, and step-by-step reasoning, with checkpoints that let you compare your outputs against solid baselines. You’ll also practice advanced patterns such as tool use, constrained generation, and response validation so outputs are trustworthy and machine-consumable.

Downloads: 1 This Week

Last Update: 2025-10-06
See Project