Showing 20 open source projects for "extraction"

  • 1
    xq

    Command-line XML and HTML beautifier and content extractor

    xq is a command-line XML and HTML beautifier and content extractor. It provides syntax highlighting, automatic indentation, formatting, automatic pagination, and node content extraction.
    Downloads: 24 This Week
  • 2
    mtail

    Extract internal monitoring data from application logs

    ...It fills a monitoring niche by being the glue between applications that do not export their own internal state (other than via logs) and existing monitoring systems, so that system operators do not need to patch those applications to instrument them or write custom extraction code for every such application. The extraction is controlled by mtail programs, which define patterns and actions. Metrics are exported for scraping by a collector in JSON or Prometheus format over HTTP, or can be periodically sent to a collectd, StatsD, or Graphite collector socket. Precompiled binaries for released versions are available on the Releases page on GitHub. ...
    Downloads: 6 This Week
  • 3
    Ferret

    Declarative web scraping

    Ferret is a web scraping system that aims to simplify data extraction from the web. Its easy-to-learn declarative query language lets you focus on the data you need. Ferret can scrape JS-rendered pages, handle all page events, and emulate user interactions, using Chrome/Chromium via the Chrome DevTools Protocol to handle dynamically rendered web pages. Ferret was designed as a library from the ground up, so it can be easily embedded into any Go application. It is also extensible: creating custom functions and types is straightforward. ...
    Downloads: 7 This Week
  • 4
    Hacks

    A collection of hacks and one-off scripts

    ...Rather than being a single cohesive application, it serves as a repository of practical command-line tools that can be used independently or combined into workflows. The scripts cover a wide range of tasks, including URL manipulation, parameter replacement, data extraction, and reconnaissance automation. Many of the tools in the repository are designed for efficiency and simplicity, enabling users to perform complex operations with minimal overhead. It is particularly popular among security researchers and developers who need quick, flexible solutions for niche problems.
    Downloads: 1 This Week
  • 5
    Portable Executable Parser

    Lightweight Go package to parse, analyze and extract metadata

    Saferwall PE is a lightweight Go package for parsing, analyzing, and extracting metadata from Portable Executable (PE) binaries. Designed with malware analysis in mind, it is robust against malformed PE files and provides detailed insights into executable structures.
    Downloads: 16 This Week
  • 6
    unipdf

    Golang PDF library for creating and processing PDF files (pure Go)

    UniDoc UniPDF is a PDF library for Go (golang) with capabilities for creating, reading, and processing PDF files. The library is written and supported by FoxyUtils.com, where it is used to power many of the site's services. Every release of our libraries is automatically tested against known vulnerabilities and does not pass unless everything is remediated. All changes are carefully reviewed by our team.
    Downloads: 5 This Week
  • 7
    Dungbeetle

    A distributed job server

    Dungbeetle is a distributed job server developed by Zerodha for queueing and executing heavy SQL read jobs, such as report generation, asynchronously. Results are written to a separate results database, so applications can fetch them later without putting load on the source databases.
    Downloads: 5 This Week
  • 8
    chromedp

    A faster, simpler way to drive browsers supporting the Chrome DevTools

    ...Because it communicates directly with Chrome’s debugging interface, chromedp offers high performance and reliable automation compared with tools that rely on intermediary drivers. It is frequently used for web scraping, automated testing, performance monitoring, and browser-based data extraction workflows.
    Downloads: 0 This Week
  • 9
    katana

    Fast CLI web crawler for discovering endpoints in modern web apps

    Katana is an open source command-line web crawling and spidering framework developed by ProjectDiscovery. It is designed to efficiently crawl websites and web applications in order to discover endpoints, resources, and other useful information that may not be easily visible through manual browsing. Katana focuses on speed and automation, making it suitable for use in security reconnaissance workflows and automated pipelines. Katana supports both standard HTTP crawling and headless browser...
    Downloads: 28 This Week
  • 10
    Vearch

    A distributed system for embedding-based vector retrieval

    ...Using the plugin module, a complete default visual search system can be deployed with one click. Alternatively, you can customize your own image, video, or text feature extraction algorithm plugin. Using vearch involves three main steps: first create a DB and Space, then import your data, and finally search your own dataset.
    Downloads: 4 This Week
  • 11
    paperless-gpt

    Use LLMs and LLM Vision (OCR) to handle paperless-ngx

    paperless-gpt is an AI-powered extension for document management systems that enhances the capabilities of paperless-ngx by integrating large language models and vision-based OCR to automate document processing and organization. It is designed to transform scanned or uploaded documents into structured, searchable, and intelligently categorized data without requiring manual tagging or sorting. The system uses OCR combined with LLM reasoning to extract text, classify documents, and generate...
    Downloads: 1 This Week
  • 12
    Weaviate

    Weaviate is a cloud-native, modular, real-time vector search engine

    ...Weaviate in detail: Weaviate is a low-latency vector search engine with out-of-the-box support for different media types (text, images, etc.). It offers Semantic Search, Question-Answer-Extraction, Classification, Customizable Models (PyTorch/TensorFlow/Keras), and more. Built from scratch in Go, Weaviate stores both objects and vectors, so vector search can be combined with structured filtering and the fault tolerance of a cloud-native database, all accessible through GraphQL, REST, and various language clients.
    Downloads: 7 This Week
  • 13
    GitHound

    Search GitHub for leaked API keys, credentials, and exposed secrets

    GitHound is a reconnaissance and security scanning tool designed to search GitHub for exposed secrets such as API keys, credentials, and other sensitive tokens. It works by combining GitHub search queries (often called “GitHub dorks”) with pattern matching techniques to locate potential secrets across public repositories. Instead of scanning only a limited set of repositories, the tool leverages GitHub’s Code Search API to analyze results from across the entire public GitHub ecosystem,...
    Downloads: 3 This Week
  • 14
    Toonily-dl

    Download comics from Toonily.com website

    ...The project provides a command-line interface that allows users to input URLs or identifiers and automatically download entire series or selected chapters. It is built to handle structured content extraction, organizing downloaded files into directories that reflect series and chapter hierarchies. The tool typically includes features such as retry mechanisms, rate limiting, and progress tracking to ensure stable downloads even when dealing with large collections. It can be integrated into scripts or automation pipelines, making it suitable for users who want to batch download content or maintain personal archives. ...
    Downloads: 4 This Week
  • 15
    Amplify

    Automatic enrichment, enhancement, and explanation of your data

    Amplify attaches afterburners to your data. It explains your data through metadata extraction, classification, tagging, and reporting. It enriches your data by generating derivative data such as thumbnails, previews, and conversions. It enhances your data with batteries-included value-adds such as data quality reports, image augmentation, OCR, and translations. Amplify leverages the decentralized compute provided by Bacalhau to enrich your data. A built-in suite of pipelines decides what your data is and how best to improve upon it. ...
    Downloads: 0 This Week
  • 16
    ldetool

    Code generator for fast log file parsers

    ldetool (Line Data Extraction Tool) is a command-line utility that generates Go code for fast log file parsing. By defining parsing rules, developers can produce efficient parsers tailored to specific log formats, outperforming traditional regex-based approaches. It's particularly useful for processing large volumes of log data.
    Downloads: 0 This Week
  • 17
    yubikey-agent

    yubikey-agent is a seamless ssh-agent for YubiKeys

    yubikey-agent is a seamless SSH agent specifically built for secure hardware tokens such as YubiKey (and other PIV tokens). It aims to replace the standard SSH agent with a version tailored for these security devices; the key is generated on the hardware token (so it can’t be extracted), every session requires a PIN and a physical touch, and the agent is resilient to unplugging, sleep/suspend, and restarts. Setup is simple, one command and one environment variable, and then the agent just...
    Downloads: 0 This Week
  • 18
    crawlergo

    Headless Chrome crawler for collecting URLs for vulnerability scans

    crawlergo is a browser-based web crawler designed to collect URLs and request data that can be used by web vulnerability scanning tools. It uses a Chrome headless environment to render web pages and observe behavior during the DOM rendering stage in order to capture as many accessible endpoints as possible. By monitoring the page lifecycle and interacting with web elements, the crawler automatically triggers JavaScript events and navigational actions that would normally occur during real...
    Downloads: 1 This Week
  • 19
    gocrawl

    Polite concurrent web crawler library for Go with flexible hooks

    gocrawl is a lightweight web crawling library written in the Go programming language that enables developers to build custom web crawlers and data extraction tools. gocrawl focuses on providing a minimal yet powerful crawling engine that can be easily extended and adapted for different web scraping or indexing tasks. It is designed to be polite when accessing websites by respecting crawling rules such as robots.txt policies and applying crawl delays for each host. It executes requests concurrently using Go’s goroutines, allowing efficient and scalable page retrieval across multiple URLs. ...
    Downloads: 4 This Week
  • 20
    OpenDiablo2

    An open source re-implementation of Diablo 2

    This is an open-source re-implementation of the classic action-RPG Diablo II (including its expansion) — or rather, a game engine that can run it. The engine is written in Go and cross-platform, aiming to bring the feel of the original 2000s-era ARPG to modern systems. Because the project does not include the original game assets, users must supply their legally purchased copy of Diablo II / Lord of Destruction; the engine then loads the MPQ archives and runs the game. The project is...
    Downloads: 10 This Week