Showing 120 open source projects for "extraction"

  • 1
    RAG Anything

    RAG-Anything: All-in-One RAG Framework

    RAG-Anything is an open-source unified framework that extends the Retrieval-Augmented Generation (RAG) paradigm to fully multimodal document and knowledge retrieval, enabling systems to ingest, parse, represent, and query rich content that includes text, images, tables, formulas, and other structured or visual elements. Traditional RAG systems are typically limited to text and cannot effectively work across heterogeneous document layouts, but RAG-Anything addresses this by modeling...
    Downloads: 8 This Week
  • 2
    Prompt Engineering Interactive Tutorial

    Anthropic's Interactive Prompt Engineering Tutorial

    ...The course leans heavily on realistic failure modes (ambiguity, hallucination, brittle instructions) and shows how to iteratively debug prompts the way you would debug code. Lessons include building prompts from scratch for common tasks like extraction, classification, transformation, and step-by-step reasoning, with checkpoints that let you compare your outputs against solid baselines. You’ll also practice advanced patterns such as tool use, constrained generation, and response validation so outputs are trustworthy and machine-consumable.
    Downloads: 0 This Week
  • 3
    LangExtract

    A Python library for extracting structured information

    ...LangExtract supports a wide range of models, including Google Gemini, OpenAI GPT, and local LLMs via Ollama, making it adaptable to different deployment environments and compliance needs. The system excels at handling long documents using optimized chunking, multi-pass extraction, and parallel processing to ensure both high recall and structured consistency.
    Downloads: 10 This Week
  • 4
    OSRFramework

    OSRFramework, the Open Sources Research Framework, is an AGPLv3+ project

    ...They include references to a bunch of different applications related to username checking, DNS lookups, information leaks research, deep web search, regular expression extraction and many others. At the same time, by means of ad-hoc Maltego transforms, OSRFramework provides a way of making these queries graphically, as well as several interfaces to interact with, such as OSRFConsole or a web interface. If everything went correctly (we hope so!), it's time to try usufy, mailfy and so on. But where are they locally? ...
    Downloads: 4 This Week
  • 5
    KodExplorer

    A web based file manager, web IDE / browser based code editor

    ...Selectable files & folders support (mouse click & Ctrl & Shift & words & Keyboard shortcuts). Background file upload with Drag & Drop HTML5 support; Folder upload with Chrome, Firefox and Edge. Direct extraction to the current working directory.
    Downloads: 11 This Week
  • 6
    h265web.js

    A HEVC/H.265 Web Player

    h265web.js is a WebAssembly-powered video decoding library designed to enable playback and processing of H.265/HEVC video streams directly in web browsers without relying on native browser codec support. It provides a low-level decoding API that allows developers to build custom video players capable of handling raw H.265 streams, which are typically not widely supported natively in browsers. The project includes components for parsing H.265 bitstreams into NAL units and decoding them into...
    Downloads: 3 This Week
  • 7
    txtai

    Build AI-powered semantic search applications

    ...Innovation is happening at a rapid pace; models can understand concepts in documents, audio, images and more. Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction. Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extractions to generate structured databases. The following applications are powered by txtai.
    Downloads: 8 This Week
  • 8
    Playwriter

    Chrome extension to let agents control your browser

    ...The system enables browser automation by running Playwright commands through a persistent session managed by a background extension, allowing agents or scripts to navigate, interact with, and query browser contexts without losing state between commands. This makes it valuable for scenarios where AI agents need to perform complex web automation tasks—like multi-step navigation, form interaction, or content extraction—without reinitializing context or state every time. Playwriter’s architecture supports both extension-based control for real browser windows and CLI integration, giving developers flexibility in how they build and run browser automation workflows.
    Downloads: 1 This Week
  • 9
    ANTLR

    Parser generator to read, process, or translate structured text

    ...Twitter search uses ANTLR for query parsing, with over 2 billion queries a day. The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR for information extraction from legal texts. Oracle uses ANTLR within SQL Developer IDE and their migration tools. NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.
    Downloads: 6 This Week
  • 10
    Integrant

    Micro-framework for data-driven architecture

    Integrant is a minimalistic micro-framework for building applications following a data-driven architecture. It lets you define system components declaratively as configuration data and handles lifecycle actions (init, halt, resume) in dependency order, serving as a modern alternative to Component or Mount. Integrant was built as a reaction to fix some perceived weaknesses with Component. In Component, systems are created programmatically. Constructor functions are used to build records,...
    Downloads: 0 This Week
  • 11
    Gooo

    Toolkit for developing web applications in Vue, Templ, and Go

    ...The project emphasizes simplicity and flexibility, enabling users to integrate its components into scripts or larger systems. While not as feature-heavy as enterprise frameworks, it serves as a foundation for experimentation and rapid prototyping in data extraction or automation tasks. Its design reflects a developer-centric approach, prioritizing extensibility and ease of modification over polished interfaces.
    Downloads: 0 This Week
  • 12
    Scalatra

    Tiny Scala high-performance, async web framework

    Scalatra is a lightweight, high-performance micro web framework written in Scala, inspired by the Ruby framework Sinatra. Its goal is to provide a minimal but expressive foundation for building web applications or REST APIs in Scala without the verbosity or steep learning curve of larger frameworks. It supports asynchronous request handling, routing, filters, content negotiation, and easy integration with templating, JSON libraries, and other web middleware. Being unopinionated, it lets...
    Downloads: 0 This Week
  • 13
    Figma Code Connect

    A tool for connecting your design system components

    Figma Code Connect is an open-source tool that enhances collaboration between designers and developers by synchronizing design components with source code in real time. Instead of treating design files and codebases as separate artifacts, it creates a continuous link so when a designer updates a UI element in Figma, developers see corresponding code changes or annotations immediately, making handoffs more precise and frictionless. The system supports multiple frameworks and languages,...
    Downloads: 0 This Week
  • 14
    react-docgen

    A CLI and toolbox to extract information from React component files

    react-docgen is a CLI and toolbox that helps extract information from React components and generate documentation from it. It uses @babel/parser to parse the source into an AST and provides methods to process this AST to extract the desired information. The output / return value is a JSON blob / JavaScript object. It provides a default implementation for React components defined via React.createClass, ES2015 class definitions or functions (stateless components). These component definitions...
    Downloads: 0 This Week
  • 15
    ffsend

    Easily and securely share files from the command line

    Easily and securely share files and directories from the command line through a safe, private and encrypted link using a single simple command. Files are shared using the Send service and may be up to 1GB. Others are able to download these files with this tool, or through their web browser. All files are always encrypted on the client, and secrets are never shared with the remote host. An optional password may be specified, and a default file lifetime of 1 (up to 20) download or 24 hours is...
    Downloads: 1 This Week
  • 16
    Down

    Streaming downloads using Net::HTTP, http.rb or HTTPX

    Down is a small, reliable Ruby library for downloading files that favors correctness, streaming, and clear error handling. It follows redirects safely, supports timeouts and retries, and streams responses to disk to keep memory usage low—ideal for large downloads or server environments. The API returns file-like objects (often Tempfile) with helpful metadata such as original filename and content type, which plays nicely with file-attachment libraries and background jobs. Multiple HTTP...
    Downloads: 0 This Week
  • 17
    Patch-NetVLAD

    Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition

    This repository contains code for the CVPR2021 paper "Patch-NetVLAD: Multi-Scale Fusion of Locally-Global Descriptors for Place Recognition".
    Downloads: 6 This Week
  • 18
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 4 This Week
  • 19
    java-pdf-table-extractor-lib

    Java Pdf Table extraction library

    The command line application is an example of how to use the Java library. The library is based on the pdfbox library and works by analyzing the layout of each selected PDF page and looking for table structure patterns. After calling the library (passing the PDF filename and the page range), the result is a List<PdfTextElement>. PdfTextElement is an interface that has two implementations: a basic text element (outside the tables), and PdfTextTabulaElement, for table structures. That...
    Downloads: 1 This Week
  • 20
    OmniPull

    Just pull anything

    OmniPull is a powerful, cross-platform download manager built with Python and PySide6. It provides a modern, intuitive interface for managing downloads with advanced features like multi-threading, queue management, and media extraction.
    Downloads: 14 This Week
  • 21
    anti-copy-paster

    A plugin for IntelliJ IDEA for just-in-time code duplicates extraction

    The plugin monitors the copying and pasting that takes place inside the IDE. As soon as a code fragment is pasted, the plugin checks whether it introduces code duplication. If it does, the plugin calculates a set of code metrics for the fragment and compares them against the currently selected metric thresholds. If the chosen thresholds are surpassed, the plugin suggests that the developer perform the Extract Method refactoring and applies the refactoring if necessary.
    Downloads: 0 This Week
  • 22
    wx-c.so.recompile

    Recompile of wx-c and wxGTK-2.8.12 for x86_64

    wx-c and wxGTK are both packed in the file wx-c-0-9-0-2_x64_wxGTK2.8u.tar.gz. After extraction, the .so files should be placed under /usr/local/lib or elsewhere on the system library search path.
    Downloads: 1 This Week
  • 23
    DocWire SDK

    Award-winning modern data processing SDK in C++20

    DocWire SDK, a standout C++20 AI-driven data processing tool, has received an award from SourceForge and strong backing from Microsoft. It handles nearly 100 file types, empowering efficient text extraction, web data extraction, and document analysis. For businesses, the shift to DocWire SDK signifies a leap forward. It promises comprehensive document format support and the ability to extract valuable insights from email boxes, databases, and websites using cutting-edge AI. DocWire SDK aims to expand its capabilities, focusing on versatile data extraction, platform support, and seamless integration with various systems. ...
    Downloads: 8 This Week
  • 24
    [ARCHIVAL] The central forum for the MWE community. Share your open-source data sets and MWE extraction tools, exchange ideas on evaluation strategies and further development of the tools, and discuss theoretical definitions and linguistic properties of MWEs.
    Downloads: 0 This Week
  • 25
    Botpress

    Dev tools to reliably understand text and automate conversations

    ...We offer a complete dev-friendly platform that ships with all the tools you need to build, deploy and manage production-grade chatbots in record time. Built-in Natural Language Processing tasks such as intent recognition, spell checking, entity extraction, and slot tagging (and many others). A visual conversation studio to design multi-turn conversations and workflows. An emulator & a debugger to simulate conversations and debug your chatbot. Support for popular messaging channels like Slack, Telegram, MS Teams, Facebook Messenger, and an embeddable web chat. An SDK and code editor to extend the capabilities. ...
    Downloads: 14 This Week