Showing 987 open source projects for "extraction"

  • 1
    Fapello.Downloader

    NSFW Windows app to batch download images and videos

    Fapello.Downloader is a Python-based desktop application designed to automate the bulk downloading of images and videos from the Fapello platform through a simple graphical interface. The tool allows users to paste a content URL and retrieve all associated media in a single operation, eliminating the need for manual downloading of individual files. It is built entirely in Python and leverages libraries such as BeautifulSoup and requests for scraping and data retrieval, while using a...
    Downloads: 74 This Week
  • 2
    KaraKeep

    A self-hostable bookmark-everything app

    ...Automatic fetching of link titles, descriptions, and images streamlines saving content without manual edits, while rule-based management lets users define customized workflows. With support for image OCR and structured data extraction, Karakeep functions as a flexible personal knowledge base for researchers, content creators, and heavy bookmarkers.
    Downloads: 1 This Week
  • 3
    PDFMathTranslate

    PDF scientific paper translation with preserved formats

    PDFMathTranslate is a Python-based tool that uses AI translation to convert academic PDFs into bilingual (e.g. Chinese-English) documents while preserving formatting, including math notation. It supports OCR-enhanced content and offers CLI, GUI, Docker, and Zotero integration under AGPL v3.
    Downloads: 26 This Week
  • 4
    Ferret

    Declarative web scraping

    Ferret is a web scraping system that aims to simplify data extraction from the web. Its declarative query language is easy to learn and lets you focus on the data you need. Ferret can scrape JS-rendered pages, handle page events, and emulate user interactions, using Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered pages. It was designed as a library from the ground up, so it can be embedded in any Go application. Ferret is also highly extensible: custom functions and types are easy to create. ...
    Downloads: 1 This Week
  • 5
    docext

    An on-premises, OCR-free unstructured data extraction toolkit

    docext is a document intelligence toolkit that uses vision-language models to extract structured information from documents such as PDFs, forms, and scanned images. The system is designed to operate entirely on-premises, allowing organizations to process sensitive documents without relying on external cloud services. Unlike traditional document processing pipelines that rely heavily on optical character recognition, docext leverages multimodal AI models capable of understanding both visual...
    Downloads: 2 This Week
  • 6
    GoWall

    A tool to convert a Wallpaper's color scheme / palette, image to pixel

    Gowall is a versatile command-line tool for processing images, initially created to convert wallpapers to match specific color schemes. It has evolved to include features like image-to-pixel-art conversion, color palette extraction, background removal, and more, making it a powerful utility for image manipulation.
    Downloads: 6 This Week
  • 7
    ripgrep

    Regex pattern directory search tool that respects your .gitignore

    ...By default, ripgrep respects your .gitignore and automatically skips hidden files, hidden directories, and binary files. ripgrep has first-class support on Windows, macOS, and Linux, with binary downloads available for every release. ripgrep is similar to other popular search tools like The Silver Searcher, ack, and grep. ripgrep supports arbitrary input preprocessing filters, which could be PDF text extraction, decompression of less widely supported formats, decryption, automatic encoding detection, and so on. In other words, use ripgrep if you like speed, filtering by default, fewer bugs, and Unicode support.
    Downloads: 88 This Week
  • 8
    PDFPatcher

    A versatile toolkit for PDF manipulation

    PDFPatcher (aka “PDF补丁丁”) is a versatile toolkit for PDF manipulation—editing document metadata, bookmarks, page layout, content restrictions, rotation, compression, merging/splitting, image extraction, and more, all within an intuitive interface. Merge/split PDFs or images, preserve or add bookmarks, and set page dimensions. Batch style/color/target changes, regex/XPath search/replace, mid‑page positioning. Modify PDF metadata, page numbers, links, initial view mode, and remove open actions.
    Downloads: 53 This Week
  • 9
    Kaldi

    kaldi-asr/kaldi is the official location of the Kaldi project

    ...Kaldi is designed for researchers who need a highly customizable environment to experiment with new algorithms, as well as for practitioners who want robust, production-ready ASR pipelines. It includes extensive tools for data preparation, feature extraction, acoustic and language modeling, decoding, and evaluation. With its modular design, Kaldi allows users to adapt the system to a wide range of languages and domains. As one of the most influential projects in speech recognition, it has become a foundation for much of the modern work in ASR.
    Downloads: 4 This Week
  • 10
    imagefap-dl

    ImageFap gallery downloader

    imagefap-dl is a command-line downloader designed to automate the retrieval of galleries and media from ImageFap, focusing on efficiency, reliability, and structured output. The tool enables users to download entire galleries or specific content collections by parsing URLs and systematically fetching associated media files. It is optimized for batch downloading scenarios, allowing users to archive large sets of images with minimal manual intervention. The program typically includes...
    Downloads: 23 This Week
  • 11
    The Web MCP

    A powerful Model Context Protocol (MCP) server

    Bright Data’s Web MCP server gives AI assistants robust, real-time web capabilities through an MCP interface designed to avoid blocks, rate limits, and CAPTCHAs. It presents search, crawl, navigate, and extraction tools that agents can call directly, replacing brittle scraping prompts with typed operations. The README markets it as a “gateway” to the live web so assistants don’t fall back to stale training data. Bright Data also advertises a getting-started tier with a free monthly allotment, plus options for remote or self-hosted operation depending on governance needs. ...
    Downloads: 1 This Week
  • 12
    JavaScript Obfuscator

    A powerful obfuscator for JavaScript and Node.js

    JavaScript Obfuscator is a Node.js library and CLI that transforms readable JavaScript into hardened, difficult-to-reverse code. It applies techniques such as identifier mangling, string array extraction/encoding, control-flow flattening, dead-code injection, and numeric literal transformations to disguise intent. Advanced options include self-defending code, domain locking, debug/console protection, and property key transformation, allowing you to tailor defenses to your threat model. The tool supports source maps and granular “threshold”/whitelist settings so you can balance protection with performance and debuggability. ...
    Downloads: 1 This Week
  • 13
    warp

    A super-easy, composable, web server framework for warp speeds

    The fundamental building block of warp is the Filter: Filters can be combined and composed to express rich requirements on requests. A Filter in warp is essentially a function that operates on some input, either something from a request or something from a previous Filter, and returns some output, which could be an app-specific type you wish to pass around or a reply to send back as an HTTP response. That might sound simple, but the exciting part is the combinators that exist...
    Downloads: 1 This Week
  • 14
    Tamagui

    Style React fast with 100% parity on React Native

    ...Tamagui also includes a full UI kit with both styled and unstyled components, enabling flexible design system creation. Its compiler performs advanced optimizations such as CSS extraction, tree flattening, and dead code elimination, reducing bundle size and improving rendering speed. The system includes robust theming capabilities with support for design tokens, responsive props, and dynamic themes like dark mode.
    Downloads: 2 This Week
  • 15
    CL4R1T4S

    Archive of leaked AI system prompts and internal instruction sets

    ...According to its description, the initiative is motivated by the idea that understanding the input instructions behind an AI system helps users better interpret its outputs. Contributors are encouraged to add newly discovered or extracted prompts, along with information such as the model version and extraction date.
    Downloads: 2 This Week
  • 16
    DINOv2

    PyTorch code and models for the DINOv2 self-supervised learning method

    ...The core promise is that a single pretrained backbone can transfer well to many downstream tasks, from linear probing on classification to retrieval, detection, and segmentation, often requiring little or no fine-tuning. The repository includes code for training, evaluation, and feature extraction, with utilities to run k-NN or linear evaluation baselines to assess representation quality. Pretrained checkpoints cover multiple model sizes so practitioners can trade accuracy for speed and memory depending on their deployment constraints.
    Downloads: 2 This Week
  • 17
    ReClip

    Download videos from almost any website

    ReClip is a lightweight, self-hosted media downloader that provides a simple web-based interface for downloading videos and audio from a wide range of online platforms. Built around the yt-dlp engine, it supports over a thousand websites, including major platforms like YouTube, TikTok, and Instagram, allowing users to retrieve media content in various formats. The application emphasizes simplicity and minimalism, featuring a clean interface built with plain HTML, CSS, and JavaScript without...
    Downloads: 83 This Week
  • 18
    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. ...
    Downloads: 54 This Week
  • 19
    nhentai

    A library for interacting with the nhentai API

    nhentai is a JavaScript and TypeScript library designed to interact with the nhentai API and retrieve doujinshi metadata and content information. It enables developers to programmatically access galleries, titles, tags, covers, and page URLs from the nhentai platform. The library supports both CommonJS and ES6 module imports, making it easy to integrate into different Node.js projects. Developers can use it to fetch specific doujin entries, explore associated metadata, and process gallery...
    Downloads: 53 This Week
  • 20
    web-access

    Skill for installing full networking capabilities for Claude Code

    web-access is a tool designed to give AI agents structured and controlled access to web content, enabling them to retrieve, navigate, and process information from online sources in real time. It abstracts common web interactions such as page loading, data extraction, and navigation into reusable functions that can be invoked by agents. The system emphasizes safety and control, likely including mechanisms to manage permissions, rate limits, and content filtering. This allows agents to operate within defined boundaries while still benefiting from dynamic, up-to-date information. The architecture supports integration with broader agent frameworks, making it a key component for building systems that require external knowledge. ...
    Downloads: 0 This Week
  • 21
    BlockArrays.jl

    BlockArrays for Julia

    ...The type BlockArray stores each block contiguously, while the type PseudoBlockArray stores the full matrix contiguously. This means that BlockArray supports fast, non-copying extraction and insertion of blocks, while PseudoBlockArray supports fast access to the full matrix for use in, for example, a linear solver.
    Downloads: 0 This Week
  • 22
    FFsubsync

    Automagically synchronize subtitles with video

    Language-agnostic automatic synchronization of subtitles with video, so that subtitles are aligned to the correct starting point within the video. First, make sure ffmpeg is installed. Make sure ffmpeg is on your path and can be referenced from the command line! Next, grab the script. It should work with both Python 2 and Python 3. There may be occasions where you have a correctly synchronized srt file in a language you are unfamiliar with, as well as an unsynchronized srt file in your...
    Downloads: 44 This Week
  • 23
    JS Analyzer

    Burp Suite extension for JavaScript static analysis

    JS Analyzer is a powerful static analysis tool implemented as a Burp Suite extension that helps security researchers and web developers automatically uncover important artifacts in JavaScript files during web application testing. It parses JavaScript responses intercepted by Burp Suite and intelligently extracts API endpoints, full URLs (including cloud storage links), secrets like API keys or tokens, and email addresses while filtering out noise from irrelevant code patterns. The extension...
    Downloads: 1 This Week
  • 24
    3FS

    A high-performance distributed file system

    3FS (the Fire-Flyer File System) is a high-performance distributed file system designed for AI training and inference workloads. It uses modern SSDs and RDMA networks to provide a shared storage layer for distributed applications.
    Downloads: 1 This Week
  • 25
    Zotero

    Tool to help you collect, organize, annotate, cite, and share research

    Zotero is a powerful, free, open-source research management application designed to help students, academics, and professionals collect, organize, annotate, cite, and share research sources and materials for papers, projects, or books. It can save web pages, PDFs, books, articles, and more with metadata, automatically extract bibliographic information, and organize items into collections and tag systems, while supporting notes and annotations directly alongside references. Zotero’s interface...
    Downloads: 26 This Week