Showing 131 open source projects for "extraction"

  • 1
    OSRFramework

    OSRFramework, the Open Sources Research Framework, is an AGPLv3+ project

    ...They include references to a bunch of different applications related to username checking, DNS lookups, information leak research, deep web search, regular expression extraction and many others. At the same time, by means of ad-hoc Maltego transforms, OSRFramework provides a way of making these queries graphically, as well as several interfaces to interact with, such as OSRFConsole or a Web interface. If everything went correctly (we hope so!), it's time to try usufy, mailfy and so on. But where are they locally? ...
    Downloads: 5 This Week
  • 2
    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a simple but scalable crawler framework that covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence. It simplifies the development of a specific crawler: WebMagic has a simple, highly flexible core and a simple API for HTML extraction, and it supports annotated POJOs for customizing a crawler with no configuration needed.
    Downloads: 3 This Week
  • 3
    LangExtract

    A Python library for extracting structured information

    ...LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. The system excels at handling long documents using optimized chunking, multi-pass extraction, and parallel processing to ensure both high recall and structured consistency.
    Downloads: 7 This Week
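The long-document strategy described for LangExtract (chunking plus parallel processing) can be illustrated with a minimal Python sketch. It is not LangExtract's implementation or API. The fixed-size character windows with overlap, the `extract_fn` callback standing in for a model call, and the de-duplicating merge are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def chunk_text(text: str, max_chars: int, overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if max_chars <= 0 or not 0 <= overlap < max_chars:
        raise ValueError("need max_chars > 0 and 0 <= overlap < max_chars")
    chunks: List[str] = []
    step = max_chars - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def extract_parallel(
    text: str,
    extract_fn: Callable[[str], List[str]],  # hypothetical stand-in for an LLM call
    max_chars: int = 1000,
    overlap: int = 100,
    workers: int = 4,
) -> List[str]:
    """Run extract_fn on every chunk concurrently and merge the results in order."""
    chunks = chunk_text(text, max_chars, overlap)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_chunk = list(pool.map(extract_fn, chunks))  # map preserves chunk order
    merged: List[str] = []
    seen = set()
    for items in per_chunk:
        for item in items:
            # Naive de-duplication: drops items found twice in overlapping
            # regions, but also collapses genuine repeats of the same value.
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged
```

The overlap keeps an entity that straddles a chunk boundary whole in at least one window. The cost is that the merge step has to reconcile duplicates, which a production system would do with span offsets rather than by comparing values.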
  • 4
    Certificate Ripper

    A CLI tool to extract server certificates

    ...It can be used with or without Java; native executables are available in the releases. It extracts all the sub-fields of the certificate, can format certificates as PEM, and can extract certificates from multiple different URLs in bulk with a single command. Extracted certificates can be stored automatically in a p12 trust store. It also works behind a proxy.
    Downloads: 0 This Week
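Certificate Ripper is a Java tool, and the Python below does not reproduce it. As a sketch only, it shows what the PEM formatting mentioned above amounts to: the DER-encoded certificate bytes are base64-encoded, split into 64-character lines, and wrapped in BEGIN/END CERTIFICATE markers. The helper names `der_to_pem` and `pem_to_der` are hypothetical.

```python
import base64

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def der_to_pem(der: bytes) -> str:
    """Encode DER certificate bytes as a PEM block (64-character base64 lines)."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([PEM_HEADER, *lines, PEM_FOOTER]) + "\n"


def pem_to_der(pem: str) -> bytes:
    """Reverse der_to_pem: check the markers, drop line breaks, base64-decode the body."""
    body = pem.strip()
    if not (body.startswith(PEM_HEADER) and body.endswith(PEM_FOOTER)):
        raise ValueError("not a PEM certificate block")
    inner = body[len(PEM_HEADER):-len(PEM_FOOTER)]
    return base64.b64decode("".join(inner.split()))
```

Python's standard library offers the same conversion as `ssl.DER_cert_to_PEM_cert`. To fetch a live server certificate, `ssl.get_server_certificate(("example.com", 443))` returns it already in PEM form. That call yields only the leaf certificate, not the chain, so it covers just part of what a dedicated extraction tool does.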
  • 5
    h265web.js

    A HEVC/H.265 Web Player

    h265web.js is a WebAssembly-powered video decoding library designed to enable playback and processing of H.265/HEVC video streams directly in web browsers without relying on native browser codec support. It provides a low-level decoding API that allows developers to build custom video players capable of handling raw H.265 streams, which most browsers do not support natively. The project includes components for parsing H.265 bitstreams into NAL units and decoding them into...
    Downloads: 3 This Week
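Splitting an H.265 stream into NAL units usually starts from the Annex B byte-stream format. In that format each NAL unit is preceded by a `00 00 01` (or `00 00 00 01`) start code. The Python sketch below shows this general technique and decodes the 2-byte HEVC NAL unit header. It is not code from h265web.js, and it ignores details a real parser needs, such as removing emulation-prevention bytes from the payload.

```python
from typing import List, NamedTuple

START_CODE = b"\x00\x00\x01"


class NalHeader(NamedTuple):
    nal_unit_type: int          # 6 bits, e.g. 32 = VPS, 33 = SPS, 34 = PPS, 19 = IDR_W_RADL
    nuh_layer_id: int           # 6 bits
    nuh_temporal_id_plus1: int  # 3 bits


def split_annexb(stream: bytes) -> List[bytes]:
    """Return the NAL units of an Annex B byte stream, without start codes."""
    positions = []
    pos = stream.find(START_CODE)
    while pos != -1:
        positions.append(pos)
        pos = stream.find(START_CODE, pos + len(START_CODE))
    nals = []
    for k, pos in enumerate(positions):
        begin = pos + len(START_CODE)
        end = positions[k + 1] if k + 1 < len(positions) else len(stream)
        # Trailing zero bytes belong to a following 4-byte start code
        # (or are trailing_zero_8bits), never to the NAL unit itself.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            nals.append(nal)
    return nals


def parse_hevc_header(nal: bytes) -> NalHeader:
    """Decode the 16-bit HEVC NAL unit header (forbidden bit, type, layer, temporal id)."""
    if len(nal) < 2:
        raise ValueError("HEVC NAL unit header is 2 bytes")
    b0, b1 = nal[0], nal[1]
    if b0 & 0x80:
        raise ValueError("forbidden_zero_bit is set")
    return NalHeader(
        nal_unit_type=(b0 >> 1) & 0x3F,
        nuh_layer_id=((b0 & 0x01) << 5) | (b1 >> 3),
        nuh_temporal_id_plus1=b1 & 0x07,
    )
```

A player typically uses the header type to route parameter sets (VPS/SPS/PPS) to decoder configuration and slice NAL units to the decoder proper.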
  • 6
    KodExplorer

    A web based file manager, web IDE / browser based code editor

    ...Selectable files and folders (mouse click, Ctrl, Shift, words, and keyboard shortcuts). Background file upload with HTML5 drag-and-drop support; folder upload in Chrome, Firefox, and Edge. Direct extraction to the current working directory.
    Downloads: 8 This Week
  • 7
    txtai

    Build AI-powered semantic search applications

    ...Innovation is happening at a rapid pace; models can understand concepts in documents, audio, images and more. Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extraction that generates structured databases. The following applications are powered by txtai.
    Downloads: 6 This Week
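Similarity search, the simplest of the txtai applications listed above, reduces to ranking stored embedding vectors by their similarity to a query vector. The sketch below does a brute-force cosine-similarity ranking in plain Python. The hand-written 3-dimensional vectors in the usage are placeholders for real model embeddings. It is not txtai's API, and `search` is a hypothetical helper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the top-k (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

For example, `search([1.0, 0.0, 0.0], {"cat": [1.0, 0.0, 0.0], "kitten": [0.9, 0.1, 0.0], "car": [0.0, 0.0, 1.0]}, k=2)` ranks "cat" first and "kitten" second. The linear scan is fine for small collections. Larger ones usually switch to an approximate nearest-neighbor index.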
  • 8
    Playwriter

    Chrome extension to let agents control your browser

    ...The system enables browser automation by running Playwright commands through a persistent session managed by a background extension, allowing agents or scripts to navigate, interact with, and query browser contexts without losing state between commands. This makes it valuable for scenarios where AI agents need to perform complex web automation tasks (like multi-step navigation, form interaction, or content extraction) without reinitializing context or state every time. Playwriter’s architecture supports both extension-based control for real browser windows and CLI integration, giving developers flexibility in how they build and run browser automation workflows.
    Downloads: 1 This Week
  • 9
    Integrant

    Micro-framework for data-driven architecture

    Integrant is a minimalistic micro-framework for building applications that follow a data-driven architecture. It lets you define system components declaratively as configuration data and handles lifecycle actions (init, halt, resume) in dependency order, serving as a modern alternative to Component or Mount. Integrant was built in reaction to some perceived weaknesses of Component. In Component, systems are created programmatically. Constructor functions are used to build records,...
    Downloads: 0 This Week
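The core idea described for Integrant is to initialize components in dependency order and halt them in reverse. It can be sketched in Python with the standard library's `graphlib` (Python 3.9+). This is only an analogy: Integrant is a Clojure library driven by configuration maps and its own multimethods. The `deps` mapping and the `init_key`/`halt_key` callbacks here are hypothetical stand-ins, and the sketch assumes every dependency is itself a key in `config`.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Set, Tuple


def init_system(
    config: Dict[str, Any],
    deps: Dict[str, Set[str]],  # key -> keys it depends on (hypothetical format)
    init_key: Callable[[str, Any, Dict[str, Any]], Any],
) -> Tuple[Dict[str, Any], List[str]]:
    """Start every component after the components it depends on."""
    graph = {key: deps.get(key, set()) for key in config}
    order = list(TopologicalSorter(graph).static_order())  # raises CycleError on cycles
    system: Dict[str, Any] = {}
    for key in order:
        # Each component sees the already-started components it may reference.
        system[key] = init_key(key, config[key], system)
    return system, order


def halt_system(
    system: Dict[str, Any], order: List[str], halt_key: Callable[[str, Any], None]
) -> None:
    """Stop components in reverse start order, so dependents stop before dependencies."""
    for key in reversed(order):
        halt_key(key, system[key])
```

With a chain such as `server -> handler -> db`, this starts `db`, then `handler`, then `server`, and halts them in the opposite order. Keeping the wiring in plain data (`config` and `deps`) rather than in constructor code is the data-driven point the description makes.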
  • 10
    ANTLR

    Parser generator to read, process, or translate structured text

    ...Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.
    Downloads: 4 This Week
  • 11
    Figma Code Connect

    A tool for connecting your design system components

    Figma Code Connect is an open-source tool that enhances collaboration between designers and developers by synchronizing design components with source code in real time. Instead of treating design files and codebases as separate artifacts, it creates a continuous link so when a designer updates a UI element in Figma, developers see corresponding code changes or annotations immediately, making handoffs more precise and frictionless. The system supports multiple frameworks and languages,...
    Downloads: 1 This Week
  • 12
    Gooo

    Toolkit for developing web applications in Vue, Templ, and Go

    ...The project emphasizes simplicity and flexibility, enabling users to integrate its components into scripts or larger systems. While not as feature-heavy as enterprise frameworks, it serves as a foundation for experimentation and rapid prototyping in data extraction or automation tasks. Its design reflects a developer-centric approach, prioritizing extensibility and ease of modification over polished interfaces.
    Downloads: 0 This Week
  • 13
    Scalatra

    Tiny Scala high-performance, async web framework

    Scalatra is a lightweight, high-performance micro web framework written in Scala, inspired by the Ruby framework Sinatra. Its goal is to provide a minimal but expressive foundation for building web applications or REST APIs in Scala without the verbosity or steep learning curve of larger frameworks. It supports asynchronous request handling, routing, filters, content negotiation, and easy integration with templating, JSON libraries, and other web middleware. Being unopinionated, it lets...
    Downloads: 0 This Week
  • 14
    react-docgen

    A CLI and toolbox to extract information from React component files

    react-docgen is a CLI and toolbox that helps extract information from React components and generate documentation from it. It uses @babel/parser to parse the source into an AST and provides methods to process this AST to extract the desired information. The output (or return value) is a JSON blob (a JavaScript object). It provides a default implementation for React components defined via React.createClass, ES2015 class definitions or functions (stateless components). These component definitions...
    Downloads: 0 This Week
  • 15
    ffsend

    Easily and securely share files from the command line

    Easily and securely share files and directories from the command line through a safe, private and encrypted link using a single simple command. Files are shared using the Send service and may be up to 1GB. Others are able to download these files with this tool, or through their web browser. All files are always encrypted on the client, and secrets are never shared with the remote host. An optional password may be specified, and a default file lifetime of 1 (up to 20) download or 24 hours is...
    Downloads: 1 This Week
  • 16
    UCO3D

    Uncommon Objects in 3D dataset

    ...The repository includes automated downloaders with checksum verification, fine-grained controls to fetch only selected modalities or super-categories, and a lightweight Python API for loading frames, geometry, and splats on demand. Metadata is indexed in SQLite for quick queries at scale, and helper builders handle alignment, undistortion, frame extraction from videos, and cropping around the object.
    Downloads: 0 This Week
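Checksum verification, as done by the UCO3D downloaders, means hashing the downloaded file and comparing the digest with a published value. A minimal Python sketch follows, assuming SHA-256 and a hex-encoded expected digest. The dataset's actual hash algorithm and manifest format are not stated in the description, and `file_sha256`/`verify_file` are hypothetical helpers, not part of the UCO3D API.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path: Path, expected_hex: str) -> None:
    """Raise ValueError if the file's SHA-256 digest differs from the expected hex string."""
    actual = file_sha256(path)
    if actual != expected_hex.strip().lower():
        raise ValueError(f"checksum mismatch for {path}: expected {expected_hex}, got {actual}")
```

Verifying each archive before unpacking it means a truncated or corrupted download fails loudly instead of producing partially missing frames later.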
  • 17
    Down

    Streaming downloads using Net::HTTP, http.rb or HTTPX

    Down is a small, reliable Ruby library for downloading files that favors correctness, streaming, and clear error handling. It follows redirects safely, supports timeouts and retries, and streams responses to disk to keep memory usage low, which makes it well suited to large downloads and server environments. The API returns file-like objects (often Tempfile) with helpful metadata such as original filename and content type, which plays nicely with file-attachment libraries and background jobs. Multiple HTTP...
    Downloads: 0 This Week
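Down itself is a Ruby library. The Python sketch below only illustrates the streaming idea described above: copy a response body to a temporary file in fixed-size chunks, so memory use stays bounded regardless of download size, and abort once a size limit is passed. Any readable binary stream stands in for the HTTP response here. `stream_to_tempfile`, `max_size` and `DownloadTooLarge` are hypothetical names, not Down's API.

```python
import tempfile
from typing import BinaryIO, Optional


class DownloadTooLarge(Exception):
    """Raised when the streamed body exceeds the configured size limit."""


def stream_to_tempfile(
    source: BinaryIO, chunk_size: int = 64 * 1024, max_size: Optional[int] = None
) -> BinaryIO:
    """Copy `source` into an anonymous temporary file chunk by chunk and rewind it."""
    target = tempfile.TemporaryFile()  # binary read/write, deleted when closed
    written = 0
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            written += len(chunk)
            if max_size is not None and written > max_size:
                raise DownloadTooLarge(f"body exceeds {max_size} bytes")
            target.write(chunk)
    except BaseException:
        target.close()  # do not leak the temporary file on failure
        raise
    target.seek(0)
    return target
```

The object returned by `urllib.request.urlopen(url)` has a `read(n)` method, so it can be passed as `source` directly. Checking the limit while streaming, rather than trusting a `Content-Length` header, also catches servers that send more data than they announced.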
  • 18
    Patch-NetVLAD

    Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition

    This repository contains code for the CVPR 2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition".
    Downloads: 5 This Week
  • 19
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 3 This Week
  • 20
    java-pdf-table-extractor-lib

    Java PDF table extraction library

    The command line application is an example of how to use the Java library. The library is based on PDFBox and works by analyzing the layout of each selected PDF page and looking for table structure patterns. After calling the library (passing the PDF filename and the page range), the result is a List<PdfTextElement>. PdfTextElement is an interface with two implementations:
      * a basic text element (for text outside the tables)
      * PdfTextTabulaElement, for table structures.
    That...
    Downloads: 1 This Week
  • 21
    OmniPull

    Just pull anything

    OmniPull is a powerful, cross-platform download manager built with Python and PySide6. It provides a modern, intuitive interface for managing downloads with advanced features like multi-threading, queue management, and media extraction.
    Downloads: 7 This Week
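Multi-threaded download managers commonly split a file into byte ranges and fetch each range on its own connection with an HTTP `Range` header, which requires a server that supports range requests. The Python sketch below computes such ranges. It illustrates the general technique under that assumption and is not OmniPull's code; `byte_ranges` and `range_header` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def byte_ranges(total_size: int, parts: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into `parts` contiguous inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    parts = min(parts, total_size)  # never create empty ranges
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        end = start + length - 1  # HTTP byte ranges are inclusive
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(byte_range: Tuple[int, int]) -> Dict[str, str]:
    """Build the request header for one range, e.g. {'Range': 'bytes=0-3'}."""
    start, end = byte_range
    return {"Range": f"bytes={start}-{end}"}
```

For a 10-byte file and 3 workers this yields `(0, 3)`, `(4, 6)` and `(7, 9)`. Each worker writes its bytes at offset `start` in the output file. A server that ignores `Range` answers `200` with the full body instead of `206 Partial Content`, so a downloader has to check the status before trusting a partial response.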
  • 22

    wx-c.so.recompile

    Recompile of wx-c and wxGTK-2.8.12 for x86_64

    wx-c and wxGTK are both packed in the file wx-c-0-9-0-2_x64_wxGTK2.8u.tar.gz. After extraction, the .so files should be placed under /usr/local/lib or another directory on the system library search path.
    Downloads: 1 This Week
  • 23

    anti-copy-paster

    A plugin for IntelliJ IDEA for just-in-time extraction of code duplicates

    The plugin monitors the copying and pasting that takes place inside the IDE. As soon as a code fragment is pasted, the plugin checks whether it introduces code duplication. If it does, the plugin calculates a set of code metrics for the fragment and compares them against the currently selected metric thresholds. If the chosen thresholds are exceeded, the plugin suggests that the developer perform the Extract Method refactoring and applies the refactoring if necessary.
    Downloads: 0 This Week
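The paste-time check described for anti-copy-paster can be sketched in Python. The real plugin runs inside IntelliJ IDEA and uses its own metric set. The whitespace-insensitive substring test, the two metrics (`lines`, `tokens`), and the threshold format below are simplified, hypothetical stand-ins.

```python
import re
from typing import Dict


def normalize(code: str) -> str:
    """Collapse all whitespace so formatting differences do not hide a duplicate."""
    return " ".join(code.split())


def fragment_metrics(fragment: str) -> Dict[str, int]:
    """Two hypothetical metrics: non-blank lines and word/symbol tokens."""
    lines = [line for line in fragment.splitlines() if line.strip()]
    tokens = re.findall(r"\w+|[^\w\s]", fragment)
    return {"lines": len(lines), "tokens": len(tokens)}


def suggest_extract_method(pasted: str, existing_code: str, thresholds: Dict[str, int]) -> bool:
    """True if the paste duplicates existing code and every metric exceeds its threshold."""
    # Naive duplicate test: a plain substring match on normalized text, which can
    # also match across token boundaries; a real tool would compare AST or token streams.
    if normalize(pasted) not in normalize(existing_code):
        return False  # no duplication introduced, nothing to suggest
    metrics = fragment_metrics(pasted)
    return all(metrics[name] > limit for name, limit in thresholds.items())
```

Gating on metrics as well as duplication keeps the plugin quiet for trivial pastes, such as a single repeated call, where extracting a method would add noise rather than remove it.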
  • 24
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20, AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, enabling efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and seamless integration with various systems. ...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 25
    [ARCHIVAL] The central forum for the MWE community. Share your open-source data sets and MWE extraction tools, exchange ideas on evaluation strategies and further development of the tools, and discuss theoretical definitions and linguistic properties of MWEs.
    Downloads: 0 This Week