Showing 1224 open source projects for "extract"

  • 1
    AUTOMATIC1111 Stable Diffusion web UI
    AUTOMATIC1111's stable-diffusion-webui is a powerful, user-friendly web interface built on the Gradio library that allows users to easily interact with Stable Diffusion models for AI-powered image generation. Supporting both text-to-image (txt2img) and image-to-image (img2img) generation, this open-source UI offers a rich feature set including inpainting, outpainting, attention control, and multiple advanced upscaling options. With a flexible installation process across Windows, Linux, and...
    Downloads: 293 This Week
  • 2
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects.
    Downloads: 0 This Week
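LLM Scraper itself is written in TypeScript and uses Zod or JSON Schema. The Python sketch below only illustrates the schema-first idea it describes: the model returns JSON, and that JSON is checked against a declared structure before it becomes a typed object. The `Product` schema and the `fake_llm` stand-in for a model call are hypothetical, and the check is a minimal type test rather than full schema validation.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Product:  # hypothetical target schema
    name: str
    price: float


def fake_llm(page_text: str) -> str:
    """Stand-in for a language-model call (hypothetical); returns JSON text."""
    return json.dumps({"name": "Red Shoes", "price": 59.99})


def extract(page_text: str, schema=Product):
    """Ask the 'model' for JSON, then require every schema field with the declared type."""
    data = json.loads(fake_llm(page_text))
    values = {}
    for f in fields(schema):
        value = data.get(f.name)
        if not isinstance(value, f.type):
            raise ValueError(f"field {f.name!r} missing or not {f.type.__name__}")
        values[f.name] = value
    return schema(**values)


print(extract("<html>...</html>"))
```

Rejecting output that does not match the schema, instead of passing it through, is what makes the result safe to treat as a typed object downstream.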
  • 3
    mtail

    Extract internal monitoring data from application logs

    Extract internal monitoring data from application logs for collection in a time-series database. mtail is a tool for extracting metrics from application logs to be exported into a time-series database or time-series calculator for alerting and dashboarding. It fills a monitoring niche by being the glue between applications that do not export their own internal state (other than via logs) and existing monitoring systems, so that system operators do not need to patch those applications to instrument them or write custom extraction code for every such application. ...
    Downloads: 4 This Week
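mtail is configured with its own program language. The Python sketch below only illustrates the underlying idea of turning log lines into counters that a time-series system could collect. The log format and the metric (requests per HTTP status) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical access-log format: "<method> <path> <status>"
LINE_RE = re.compile(r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$")


def extract_metrics(lines):
    """Count requests per HTTP status code, like a log-derived counter metric."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not match the pattern are ignored
            counts[match["status"]] += 1
    return counts


logs = ["GET /index.html 200", "POST /login 500", "GET /about 200", "garbage line"]
print(dict(extract_metrics(logs)))
```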
  • 4
    Stagehand

    An AI web browsing framework focused on simplicity and extensibility

    ...Each Stagehand function takes in an atomic instruction, such as act("click the login button") or extract("find the red shoes"), generates the appropriate Playwright code to accomplish that instruction, and executes it.
    Downloads: 1 This Week
  • 5
    OpenAPI.NET

    Object model for OpenAPI documents in .NET

    The OpenAPI.NET SDK contains a useful object model for OpenAPI documents in .NET along with common serializers to extract raw OpenAPI JSON and YAML documents from the model. The OpenAPI.NET project holds the base object model for representing OpenAPI documents as .NET objects. Some developers have found the need to write processors that convert other data formats into this OpenAPI.NET object model, and the project curates a list of those processors in its readme.
    Downloads: 2 This Week
  • 6
    LosslessCut

    The swiss army knife of lossless video/audio editing

    ...The main feature is lossless trimming and cutting of video and audio files, which is great for saving space by rough-cutting your large video files taken from a video camera, GoPro, drone, etc. It lets you quickly extract the good parts from your videos and discard many gigabytes of data without doing a slow re-encode and thereby losing quality. Or you can add a music or subtitle track to your video without needing to encode. Everything is extremely fast because it does an almost direct data copy, fueled by the awesome FFmpeg which does all the grunt work. ...
    Downloads: 157 This Week
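LosslessCut drives FFmpeg, but its exact invocations are not shown here. As a rough sketch under that caveat, the Python helper below builds a generic FFmpeg command that keeps one segment using stream copy (`-c copy`), which avoids re-encoding. The file names and times are made up. With stream copy, the cut usually snaps to a keyframe near the requested start, not the exact frame.

```python
import shlex


def lossless_cut_command(src, dst, start, end):
    """Build an FFmpeg argv that keeps [start, end) seconds without re-encoding."""
    if end <= start:
        raise ValueError("end must be after start")
    return [
        "ffmpeg",
        "-ss", f"{start:.3f}",       # input seeking: seek before reading the input
        "-i", src,
        "-t", f"{end - start:.3f}",  # duration of the kept segment
        "-c", "copy",                # stream copy: no re-encode, so no quality loss
        dst,
    ]


print(shlex.join(lossless_cut_command("GX010123.MP4", "clip.mp4", 12.5, 47.0)))
```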
  • 7
    Scribe.js

    JavaScript OCR and text extraction for images and PDFs

    Scribe.js is a JavaScript library that provides Optical Character Recognition (OCR) and text extraction capabilities for both images and PDF documents, aimed at developers who want to build OCR features directly into their applications. The library can take image files (such as PNG or JPEG) and recognize the text they contain, and it can also extract text from PDF files that either already contain text or are image-based scans, using modern web standards and WebAssembly under the hood. In addition to simple text extraction, Scribe.js supports writing or injecting a high-quality invisible text layer back into PDFs, effectively making them searchable and improving usability for indexing or accessibility. ...
    Downloads: 2 This Week
  • 8
    Interface Design

    Design engineering for Claude Code

    ...The plugin prompts users to confirm a design direction early in the process and then applies those principles consistently — from button sizes to spacing scales and color tokens — so work stays aligned with the established system. It also offers commands to inspect the current design system status, audit inconsistencies, and extract patterns back into a reusable format, making it a live feedback loop for quality UI work.
    Downloads: 2 This Week
  • 9
    Umi-OCR

    OCR software, free and offline

    Umi-OCR is a free and open-source optical character recognition (OCR) tool designed to provide fast, offline text extraction from images, screenshots, PDFs, and more without requiring a network connection. It includes a highly efficient offline OCR engine with built-in multilingual recognition libraries, so users can extract text across multiple languages with high accuracy directly on their machines. The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. ...
    Downloads: 51 This Week
  • 10
    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines.
    Downloads: 1 This Week
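Chandra's actual output schema is not described here, so the `Block` type below is a hypothetical stand-in for one layout element carrying a bounding box. The sketch shows how positional metadata can drive a naive reading order when rendering Markdown. Multi-column pages would need a smarter ordering than sorting by top edge, then left edge.

```python
from dataclasses import dataclass


@dataclass
class Block:  # hypothetical shape of one layout element
    kind: str    # e.g. "heading" or "paragraph"
    bbox: tuple  # (x0, y0, x1, y1) in page coordinates, y growing downward
    text: str


def to_markdown(blocks):
    """Order blocks top-to-bottom, then left-to-right, and render simple Markdown."""
    ordered = sorted(blocks, key=lambda b: (b.bbox[1], b.bbox[0]))
    parts = [f"# {b.text}" if b.kind == "heading" else b.text for b in ordered]
    return "\n\n".join(parts)


page = [
    Block("paragraph", (50, 200, 500, 260), "Body text."),
    Block("heading", (50, 100, 500, 130), "Invoice"),
]
print(to_markdown(page))
```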
  • 11
    PDFPatcher

    A versatile toolkit for PDF manipulation

    PDFPatcher (aka “PDF补丁丁”) is a versatile toolkit for PDF manipulation—editing document metadata, bookmarks, page layout, content restrictions, rotation, compression, merging/splitting, image extraction, and more, all within an intuitive interface. Merge/split PDFs or images, preserve or add bookmarks, and set page dimensions. Batch style/color/target changes, regex/XPath search/replace, mid‑page positioning. Modify PDF metadata, page numbers, links, initial view mode, and remove open actions.
    Downloads: 34 This Week
  • 12
    Link-Preview-JS

    Extract web links information: title, description, images, videos, etc

    link-preview-js is a lightweight TypeScript library that extracts metadata from URLs or HTML content to generate rich link previews. By parsing Open Graph tags and other metadata, it retrieves information such as titles, descriptions, images, and videos. Designed primarily for Node.js and mobile environments, it facilitates the creation of link previews similar to those found on social media platforms.
    Downloads: 0 This Week
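link-preview-js is a TypeScript library. This Python sketch, which uses only the standard library's `html.parser`, illustrates the Open Graph parsing step it describes: read `<meta property="og:...">` tags and fall back to `<title>`. It is not the library's API, and the returned dictionary shape is an assumption.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..." content="..."> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content") is not None:
                self.og.setdefault(prop[3:], attrs["content"])  # keep the first value seen
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def preview(html):
    """Return a minimal preview dict; og:title wins over the <title> element."""
    parser = OpenGraphParser()
    parser.feed(html)
    return {
        "title": parser.og.get("title", parser.title.strip()),
        "description": parser.og.get("description"),
        "image": parser.og.get("image"),
    }
```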
  • 13
    Voice-Pro

    Comprehensive Gradio WebUI for audio processing

    Voice-Pro is a Gradio WebUI for transcription, translation, and text-to-speech that can be installed with one click. It creates a virtual environment using Miniconda that runs completely separately from the Windows system (fully portable). It supports real-time transcription and translation as well as batch mode.
    Downloads: 33 This Week
  • 14
    Tailwind CSS

    A utility-first CSS framework for rapid UI development

    Rapidly build modern websites without ever leaving your HTML. A utility-first CSS framework packed with classes like flex, pt-4, text-center and rotate-90 that can be composed to build any design, directly in your markup. Utility classes help you work within the constraints of a system instead of littering your stylesheets with arbitrary values. They make it easy to be consistent with color choices, spacing, typography, shadows, and everything else that makes up a well-engineered design...
    Downloads: 91 This Week
  • 15
    Epublifier

    Converts some web novels to EPUB format

    A tool to convert website-based books or lists of pages to ePub format to read on your eReader/Kindle/etc.
    Downloads: 3 This Week
  • 16
    GraphRAG

    A modular graph-based Retrieval-Augmented Generation (RAG) system

    The GraphRAG project is a data pipeline and transformation suite that is designed to extract meaningful, structured data from unstructured text using the power of LLMs.
    Downloads: 2 This Week
  • 17
    PDFIO.jl

    PDF Reader Library for Native Julia.

    PDFIO is a native Julia implementation for reading PDF files. It's a 100% Julia implementation of the PDF specification. Other than a few well-established algorithms like flate decode (zlib library) or cryptographic operations (OpenSSL library), almost all of the APIs are written in native Julia. PDF files have existed for over three decades. PDF writer implementations do not always conform to the specification, and they may even vary significantly from vendor to vendor. Every time, you...
    Downloads: 0 This Week
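The description mentions flate decoding through zlib. In a PDF, a stream whose filter is `/FlateDecode` holds zlib-compressed data, so decoding it is a single decompression call. This Python sketch is not PDFIO's Julia API. It also ignores `/DecodeParms` predictors, which some PDFs apply on top of flate.

```python
import zlib


def decode_flate(stream_bytes: bytes) -> bytes:
    """Decode a PDF stream filtered with /FlateDecode (zlib-wrapped deflate data).
    Predictor parameters from /DecodeParms are not handled in this sketch."""
    return zlib.decompress(stream_bytes)


# Simulate a page content stream as a PDF writer might store it.
raw = b"BT /F1 12 Tf 72 712 Td (Hello) Tj ET"
encoded = zlib.compress(raw)
print(decode_flate(encoded))
```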
  • 18
    go-i18n

    Translate your Go program into multiple languages

    ...Code and tests are automatically generated from CLDR data. Supports strings with named variables using text/template syntax. Supports message files of any format (e.g. JSON, TOML, YAML). Use goi18n extract to extract all i18n.Message struct literals in Go source files to a message file for translation. Create an empty message file for the language that you want to add (e.g. translate.es.toml). Run goi18n merge active.en.toml translate.es.toml to populate translate.es.toml with the messages to be translated. The goi18n command manages message files used by the i18n package.
    Downloads: 0 This Week
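To make the `goi18n merge` step more concrete, the Python sketch below finds the source messages that still lack a translation and writes them in a simplified `Id = "text"` TOML layout. This is a hypothetical simplification for illustration. The real command does its own bookkeeping, and its file layout may differ.

```python
import json


def messages_to_translate(source, translated):
    """Return {message_id: source_text} for ids missing from the target language."""
    return {mid: text for mid, text in source.items() if mid not in translated}


def render_toml(messages):
    """Render messages as simple `Id = "text"` lines (simplified layout, an assumption)."""
    # For ordinary text, json.dumps yields a quoted, escaped string that is also a TOML basic string.
    return "\n".join(
        f"{mid} = {json.dumps(text, ensure_ascii=False)}"
        for mid, text in sorted(messages.items())
    )


active_en = {"HelloWorld": "Hello World!", "Goodbye": "Goodbye, {{.Name}}"}
active_es = {"HelloWorld": "¡Hola Mundo!"}
print(render_toml(messages_to_translate(active_en, active_es)))
```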
  • 19
    YouTube Music Downloader

    A simple app to get songs from YouTube in mp3 format with artist name

    YouTube Music Downloader is a command-line music downloader written in Python that retrieves audio from YouTube and enriches it with detailed metadata from external sources. It combines tools like yt-dlp and FFmpeg to extract high-quality audio while automatically tagging files with artist name, album, release date, and artwork. The application distinguishes itself by integrating metadata providers such as Spotify and iTunes, ensuring that downloaded tracks resemble properly organized music library entries. It supports downloading single songs, playlists, or batches of tracks using flexible command-line options. ytmdl also allows customization of output formats, directory structures, and metadata handling through configuration files. ...
    Downloads: 47 This Week
  • 20
    adw-gtk3

    The theme from libadwaita ported to GTK-3.
    Downloads: 1 This Week
  • 21
    docext

    An on-premises, OCR-free toolkit for unstructured data extraction

    ...This allows the system to detect and extract structured elements such as tables, signatures, key fields, and layout information while maintaining semantic understanding of the document content. The toolkit can also convert complex documents into structured markdown representations that preserve formatting and contextual relationships.
    Downloads: 0 This Week
  • 22
    AWS Encryption SDK for Dafny

    ...This repo uses Duvet to directly document the specification alongside this implementation. Refer to the specification for how to install duvet in order to generate reports. By default duvet_report will extract the spec only if it cannot find the compliance directory in the specification repo, but will re-use a previous extraction if it exists.
    Downloads: 0 This Week
  • 23
    pdfly

    CLI tool to extract (meta)data from PDF and manipulate PDF files

    A Python library designed for manipulating PDF files with functionalities for extraction, transformation, and document generation.
    Downloads: 1 This Week
  • 24
    deepfakes_faceswap

    Deepfakes Software For All

    Faceswap is the leading free and open source multi-platform deepfakes software. When faceswapping was first developed and published, the technology was groundbreaking; it was a huge step in AI development. It was also completely ignored outside of academia because the code was confusing and fragmentary, required a thorough understanding of complicated AI techniques, and took a lot of effort to figure out. That changed when one individual brought it together into a single, cohesive collection.
    Downloads: 7 This Week
  • 25
    monolith

    CLI tool for saving complete web pages as a single HTML file

    A data hoarder’s dream come true, bundle any web page into a single HTML file. You can finally replace that gazillion of open tabs with a gazillion of .html files stored somewhere on your precious little drive. Unlike the conventional “Save page as”, monolith not only saves the target document, it embeds CSS, image, and JavaScript assets all at once, producing a single HTML5 document that is a joy to store and share. If compared to saving websites with wget -mpk, this tool embeds all assets...
    Downloads: 7 This Week
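One common way to embed assets into a single HTML file is a base64 `data:` URI (RFC 2397). The Python sketch below shows that encoding for one image. It illustrates the idea and is not monolith's own (Rust) code. The file name and bytes are placeholders.

```python
import base64
import mimetypes


def to_data_uri(filename: str, payload: bytes) -> str:
    """Encode an asset as a base64 data: URI so it can be inlined into HTML."""
    mime, _ = mimetypes.guess_type(filename)
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime or 'application/octet-stream'};base64,{encoded}"


def inline_image(filename: str, payload: bytes) -> str:
    """Return an <img> tag whose source is embedded rather than fetched."""
    return f'<img src="{to_data_uri(filename, payload)}">'


print(inline_image("dot.png", b"\x89PNG"))
```

Inlining trades file size for self-containment. Base64 inflates each asset by about a third, but the saved page no longer depends on the network.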