Showing 45 open source projects for "extraction"

View related business solutions
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    zpdf

    zpdf

    Zero-copy PDF text extraction library written in Zig

    zpdf is a high-performance PDF text extraction library written in Zig that focuses on speed, low overhead, and modern parsing techniques. It leans heavily on memory-mapped file reading and zero-copy patterns where possible, so it can scan large PDFs without repeatedly copying data around in memory. The library supports streaming extraction using efficient arena allocation, making it well suited for workloads that need to process big documents quickly or in batches.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    X-Crawl

    X-Crawl

    Flexible Node.js AI-assisted crawler library

    A high-performance web crawling and scraping framework for Node.js, designed for large-scale data extraction.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 3
    Symfony DomCrawler

    Symfony DomCrawler

    Eases DOM navigation for HTML and XML documents

    Symfony DomCrawler is a PHP component that provides powerful tools for navigating and extracting data from HTML and XML documents. It allows developers to parse, filter, and manipulate web pages using CSS selectors and XPath expressions. DomCrawler is widely used for web scraping, testing, and processing structured content, and integrates well with other Symfony components like BrowserKit.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 4
    Tamagui

    Tamagui

    Style React fast with 100% parity on React Native

    ...Tamagui also includes a full UI kit with both styled and unstyled components, enabling flexible design system creation. Its compiler performs advanced optimizations such as CSS extraction, tree flattening, and dead code elimination, reducing bundle size and improving rendering speed. The system includes robust theming capabilities with support for design tokens, responsive props, and dynamic themes like dark mode.
    Downloads: 10 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    Hacks

    Hacks

    A collection of hacks and one-off scripts

    ...Rather than being a single cohesive application, it serves as a repository of practical command-line tools that can be used independently or combined into workflows. The scripts cover a wide range of tasks, including URL manipulation, parameter replacement, data extraction, and reconnaissance automation. Many of the tools in the repository are designed for efficiency and simplicity, enabling users to perform complex operations with minimal overhead. It is particularly popular among security researchers and developers who need quick, flexible solutions for niche problems.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 6
    MatImage

    MatImage

    Image Processing library for Matlab

    matImage is an open-source MATLAB library for image processing and analysis. It provides a variety of tools for image enhancement, segmentation, and feature extraction. It’s especially useful for users working on biomedical images or those needing detailed image analysis in MATLAB.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Article Extractor

    Article Extractor

    To extract main article from given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8
    unipdf

    unipdf

    Golang PDF library for creating and processing PDF files (pure go)

    UniDoc UniPDF is a PDF library for Go (golang) with capabilities for creating and reading, processing PDF files. The library is written and supported by FoxyUtils.com, where the library is used to power many of its services. Every release of our libraries is automatically tested against known vulnerabilities and do not pass unless everything is remediated. All changes are carefully reviewed by our team.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 9
    sharp

    sharp

    High performance Node.js image processing module

    ...Colour spaces, embedded ICC profiles and alpha transparency channels are all handled correctly. Lanczos resampling ensures quality is not sacrificed for speed. As well as image resizing, operations such as rotation, extraction, compositing and gamma correction are available. Most modern macOS, Windows and Linux systems running Node.js v10+ do not require any additional install or runtime dependencies. This module supports reading JPEG, PNG, WebP, AVIF, TIFF, GIF and SVG images. Output images can be in JPEG, PNG, WebP, AVIF and TIFF formats as well as uncompressed raw pixel data. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Train ML Models With SQL You Already Know Icon
    Train ML Models With SQL You Already Know

    BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

    Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.
    Try Free
  • 10
    Compiled

    Compiled

    A familiar and performant compile time CSS-in-JS library for React

    ...Using APIs and behavior you may already be familiar with, write your styles in JavaScript with the full power of CSS, leveraging the language to create expressive & dynamic experiences. Build with your bundler of choice or just Babel, resulting in very performant components that have their styles built ahead of time. Turn on extraction and all components styled in your app and sourced through NPM will have their runtime stripped and styles extracted to an atomic style sheet. Compiled brings distributed styles from the platform, product, and the wider ecosystem together. Add the loader to your Webpack config. Make sure this is defined after other loaders so it runs first. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    Python-Spider

    Python-Spider

    Python3 web crawler practice

    Python-Spider is a repository intended to teach or provide examples for writing web spiders / crawlers in Python — part of a broader learning and resource collection by its author. The code and documentation are oriented toward beginners or intermediate learners who want to learn how to fetch, parse, and extract data from websites programmatically. As part of the author’s public learning-path repositories, python-spider likely includes examples of HTTP requests, HTML parsing, maybe...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Parsera

    Parsera

    Lightweight library for scraping web-sites with LLMs

    Scrape data from any website with only a link and column descriptions. Parsera is a tool designed to scrape web content, specifically handling poorly structured or messy websites.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 13
    Prompt Engineering Interactive Tutorial

    Prompt Engineering Interactive Tutorial

    Anthropic's Interactive Prompt Engineering Tutorial

    ...The course leans heavily on realistic failure modes (ambiguity, hallucination, brittle instructions) and shows how to iteratively debug prompts the way you would debug code. Lessons include building prompts from scratch for common tasks like extraction, classification, transformation, and step-by-step reasoning, with checkpoints that let you compare your outputs against solid baselines. You’ll also practice advanced patterns such as tool use, constrained generation, and response validation so outputs are trustworthy and machine-consumable.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    LibPDF

    LibPDF

    A modern PDF library for TypeScript

    ...The library offers full read and write manipulation, including support for encryption with RC4 and modern AES cipher suites, form filling and flattening, digital signature creation and verification, page merging/splitting, rich text extraction with layout information, and font embedding with subsetting.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 15
    LangExtract

    LangExtract

    A Python library for extracting structured information

    ...LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. The system excels at handling long documents using optimized chunking, multi-pass extraction, and parallel processing to ensure both high recall and structured consistency.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    Integrant

    Integrant

    Micro-framework for data-driven architecture

    Integrant is a minimalistic micro-framework for building applications following a data-driven architecture. It lets you define system components declaratively as configuration data and handles lifecycle actions (init, halt, resume) in dependency order, serving as a modern alternative to Component or Mount. Integrant was built as a reaction to fix some perceived weaknesses with Component. In Component, systems are created programmatically. Constructor functions are used to build records,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    UCO3D

    UCO3D

    Uncommon Objects in 3D dataset

    ...The repository includes automated downloaders with checksum verification, fine-grained controls to fetch only selected modalities or super-categories, and a lightweight Python API for loading frames, geometry, and splats on demand. Metadata is indexed in SQLite for quick queries at scale, and helper builders handle alignment, undistortion, frame extraction from videos, and cropping around the object.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Down

    Down

    Streaming downloads using Net::HTTP, http.rb or HTTPX

    Down is a small, reliable Ruby library for downloading files that favors correctness, streaming, and clear error handling. It follows redirects safely, supports timeouts and retries, and streams responses to disk to keep memory usage low—ideal for large downloads or server environments. The API returns file-like objects (often Tempfile) with helpful metadata such as original filename and content type, which plays nicely with file-attachment libraries and background jobs. Multiple HTTP...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    java-pdf-table-extractor-lib

    java-pdf-table-extractor-lib

    Java Pdf Table extraction library

    The command line application is an example of usage of the Java library. The library is based on pdfbox library and works by looking for the layout of each selected pdf page, and looking for table structure patterns. After calling the library (passing the pdf filename, and the page range), the result is a List<PdfTextElement>. PdfTextElement is an interface that has two implementations. * A basic text (outside the tables) * And PdfTextTabulaElement, for table structures. That...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 20
    DocWire SDK

    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20AI driven data processing tool, has received award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and seamless integration with various systems. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 21
    7-Zip-JBinding

    7-Zip-JBinding

    Java wrapper for 7z archiver engine

    Native (JNI) cross-platform library to extract (password protected, multi-part) 7z Zip Rar Tar Split Lzma Iso HFS GZip Cpio BZip2 Z Arj Chm Lhz Cab Nsis Deb Rpm Wim Udf archives and create 7z, Zip, Tar, GZip & BZip2 from Java.
    Leader badge
    Downloads: 24 This Week
    Last Update:
    See Project
  • 22
    Specter

    Specter

    Clojure(Script)'s missing piece

    Specter is a powerful Clojure (and ClojureScript) library that revolutionizes navigation and manipulation of deeply nested and recursive data structures through a flexible, high-performance API beyond what vanilla Clojure offers. Specter has an extremely simple core, just a single abstraction called "navigator". Queries and transforms are done by composing navigators into a "path" precisely targeting what you want to retrieve or change. Navigators can be composed with any other navigators,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 23
    Exifr

    Exifr

    The fastest and most versatile JS EXIF reading library

    Exifr is a fast and very versatile JavaScript EXIF reading library that works everywhere, parses everything and handles just about anything you throw at it. It can handle any input: buffers, url, <img> tag and more; .jpg, .tif, and .heic files; and TIFF (EXIF, GPS, etc.), XMP, ICC, IPTC, JFIF segments. It skips parsing tags you don’t need, and reads only the first few bytes. There’s no need to read the whole file to see if there’s an EXIF file in it, or extract all the data when you just...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 24
    WBBlades

    WBBlades

    WBBlades is a tool set based on Mach-O file parsing

    WBBlades is a toolset based on Mach-O file parsing, including one-click detection for the app (supports OC and Swift), package size analysis (supports a single static library/dynamic library), point-to-point crash analysis (analyze system crash log, based on symbol file and without symbol files), Class automatic extraction and Hook capability based on Mach-O file. It mainly uses __Text assembly code analysis, architecture extraction, DYSM file stripping, symbol table stripping, and crash file (ips) analysis technology. Support big method/small method parsing and iOS 15 above about dyld_chained_Fixups processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 25
    CNN for Image Retrieval
    cnn-for-image-retrieval is a research-oriented project that demonstrates the use of convolutional neural networks (CNNs) for image retrieval tasks. The repository provides implementations of CNN-based methods to extract feature representations from images and use them for similarity-based retrieval. It focuses on applying deep learning techniques to improve upon traditional handcrafted descriptors by learning features directly from data. The code includes training and evaluation scripts that...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • Next
MongoDB Logo MongoDB