Showing 1224 open source projects for "extract"

  • 1
    npmhub

    A browser extension to explore npm dependencies on GitHub

    npmhub is a browser extension that enhances GitHub repositories by displaying a list of npm dependencies directly on the repository page. This makes it easier for developers to inspect a project's dependencies without navigating to external sites.
    Downloads: 0 This Week
  • 2
    ytDownloader

    Desktop App for downloading Videos and Audios from hundreds of sites

    ytDownloader is a modern desktop application designed to download videos and extract audio from hundreds of online platforms through a clean graphical user interface. Built as a cross-platform tool for Windows, macOS, and Linux, it leverages tools like yt-dlp and FFmpeg under the hood while abstracting their complexity into an intuitive user experience. The application supports downloading from major platforms such as YouTube, Facebook, TikTok, Instagram, Twitch, and Twitter, offering users the ability to retrieve content in multiple formats and resolutions including MP4, MP3, and WebM. ...
    Downloads: 10 This Week
  • 3
    DouK-Downloader

    TikTok releases/likes/compilations/live streams/videos/atlases/music

    DouK-Downloader is a fully open-source data acquisition and media downloading tool designed to extract, collect, and download content from TikTok and its Chinese counterpart Douyin at scale. Built using Python and modern asynchronous networking libraries such as HTTPX, it enables batch downloading of videos, images, live streams, and metadata from accounts, playlists, and individual links. The software goes beyond simple downloading by offering comprehensive data collection features, including comments, user statistics, and trending data such as hot boards and search results. ...
    Downloads: 13 This Week
  • 4
    Scrapy

    A fast, high-level web crawling and web scraping framework

    ...Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring and automated testing.
    Downloads: 13 This Week
  • 5
    i18n ally

    All in one i18n extension for VS Code

    Lokalise is the fastest growing language cloud technology made by developers, for developers. As a collaborative productivity platform, it helps structure and automate the translation and localization process for any company in the world. This extension itself supports i18n as well. It will be auto-matched to the display language you use in your VS Code editor. Supports multi-root workspaces. Supports remote development. Supports numerous popular frameworks. Supports linked locale messages....
    Downloads: 1 This Week
  • 6
    Scriberr

    Self-hosted AI audio transcription

    ...The application includes a polished user interface that simplifies the management of recordings, transcripts, and annotations, making it suitable for both casual users and professionals handling large volumes of audio. Beyond transcription, Scriberr also integrates features such as summarization, tagging, and interaction with language models, allowing users to extract insights from conversations or meetings efficiently.
    Downloads: 6 This Week
  • 7
    AgentQL MCP

    Model Context Protocol server that integrates AgentQL's data

    The AgentQL MCP Server is a Model Context Protocol (MCP) server that integrates AgentQL's data extraction capabilities, enabling users to extract structured data from web pages using natural language prompts.
    Downloads: 0 This Week
  • 8
    PaperAI

    Semantic search and workflows for medical/scientific papers

    PaperAI is an open-source framework for searching and analyzing scientific papers, particularly useful for researchers looking to extract insights from large-scale document collections.
    Downloads: 0 This Week
  • 9
    unipdf

    Golang PDF library for creating and processing PDF files (pure go)

    UniDoc UniPDF is a PDF library for Go (golang) with capabilities for creating, reading, and processing PDF files. The library is written and supported by FoxyUtils.com, where it powers many of the site's services. Every release of our libraries is automatically tested against known vulnerabilities and does not pass unless everything is remediated. All changes are carefully reviewed by our team.
    Downloads: 0 This Week
  • 10
    GoldenCheetah

    Performance Software for Cyclists, Runners, Triathletes and Coaches

    Analyze using summary metrics like BikeStress, TRIMP, or RPE. Extract insight via models like Critical Power and W'bal. Track and predict performance using models like Banister and PMC. Optimize aerodynamics using Virtual Elevation. Train indoors with ANT and BTLE trainers. Upload and Download with many cloud services including Strava, Withings, and Today's Plan. Import and export data to and from a wide range of bike computers and file formats.
    Downloads: 10 This Week
  • 11
    Scope Sentry

    Cyberspace asset mapping and vulnerability scanning platform

    ...ScopeSentry combines multiple reconnaissance and vulnerability assessment capabilities such as subdomain enumeration, port scanning, directory scanning, and sensitive information detection. ScopeSentry can automatically identify assets and services, extract URLs, and crawl websites to collect useful security data for further analysis. It also includes vulnerability scanning and subdomain takeover detection to help identify common security weaknesses across web infrastructure. It supports distributed scanning with multiple nodes, allowing large scanning tasks to be performed efficiently across different systems.
    Downloads: 3 This Week
  • 12
    PDFCraft

    PDFCraft is a free, privacy-focused PDF toolkit

    ...At its core, the project provides a clean, modern UI where you can rearrange pages, annotate text, insert images, fill forms, and export to multiple formats, all without needing a heavyweight commercial PDF suite. But beyond manual editing, it also offers a programmable layer so developers can write scripts to batch process documents, generate templated reports, or extract structured data from PDFs for integration in workflows. The design emphasizes quality and compatibility: output PDFs render accurately across readers, preserve metadata, and support interactive elements like hyperlinks and form fields.
    Downloads: 7 This Week
  • 13
    pikepdf

    A Python library for reading and writing PDF, powered by QPDF

    pikepdf is a Python library allowing the creation, manipulation, and repair of PDFs. It provides a Pythonic wrapper around the C++ PDF content transformation library, QPDF. Python + QPDF = “py” + “qpdf” = “pyqpdf”, which looks like a dyslexia test and is no fun to type. But say “pyqpdf” out loud, and it sounds like “pikepdf”. pikepdf is a library intended for developers who want to create, manipulate, parse, repair, and abuse the PDF format. It supports reading and writing PDFs, including...
    Downloads: 3 This Week
  • 14
    FFmpeg Sidecar

    Wrap a standalone FFmpeg binary in an intuitive Iterator interface

    ...The library leverages FFmpeg’s CLI instead of low-level bindings, simplifying setup and avoiding heavy dependencies. It enables streaming data through stdin and stdout, making it suitable for real-time processing pipelines. Additionally, it parses FFmpeg logs to extract structured information such as progress, metadata, and errors. The project is designed for cross-platform compatibility and minimal configuration. Its architecture emphasizes simplicity while retaining access to FFmpeg’s powerful capabilities.
    Downloads: 2 This Week
  • 15
    Pot Desktop

    A cross-platform software for text translation and recognition

    Pot-Desktop is a cross-platform productivity tool aimed at helping users quickly translate, perform OCR (optical character recognition), and synthesize speech for selected text or images — all with minimal friction. It supports picking text via mouse selection (“highlight-and-translate”), clipboard listening, or screenshot-based OCR; this makes it ideal for reading webpages, documents, images — or any on-screen text — and instantly getting translations or text extraction. The tool supports...
    Downloads: 12 This Week
  • 16
    esquisse

    RStudio add-in to make plots interactively with ggplot2

    The purpose of this add-in is to let you explore your data quickly to extract the information they hold. You can create visualizations with {ggplot2}, filter data with {dplyr}, and retrieve the generated code. The add-in lets you explore your data interactively by visualizing it with the ggplot2 package. It can draw bar plots, curves, scatter plots, histograms, box plots, and sf objects, and then export the graph or retrieve the code to reproduce it.
    Downloads: 2 This Week
  • 17
    Hugo Theme Stack

    Card-style Hugo theme designed for bloggers

    Stack is a simple card-style Hugo theme designed for bloggers. Its features include responsive image support, lazy-loaded images, dark mode, local search, PhotoSwipe integration, an archive page template, subsection support, a table of contents, multilingual mode, RTL support, and properly cropped thumbnails. It is written in native JavaScript, with no jQuery, other JavaScript frameworks, or CSS framework, to keep it simple and minimal. It's necessary to use...
    Downloads: 2 This Week
  • 18
    AI-Crawler

    Crawl a website starting from a URL, find relevant pages

    ...Unlike traditional web scrapers that rely on static selectors and manual scripting, it uses AI to dynamically identify and prioritize pages based on user intent, making it more flexible and resilient to changes in website structure. Users can define their data requirements in plain English, and the system will interpret those instructions to crawl a domain and extract structured data. The tool supports output formats such as JSON and Markdown, and it can generate or accept schemas to ensure that extracted data is structured according to application needs. It is designed as a low-code solution, reducing the complexity of building and maintaining custom scraping pipelines.
    Downloads: 2 This Week
  • 19
    AudioMuse-AI

    AudioMuse-AI is an Open Source Dockerized environment

    ...AudioMuse-AI integrates with several popular self-hosted music servers including Jellyfin, Navidrome, and Emby, allowing users to extend existing media servers with advanced AI-powered recommendation capabilities. The system uses machine learning and audio analysis tools such as Librosa and ONNX models to extract features directly from audio tracks.
    Downloads: 7 This Week
  • 20
    Open Semantic Search

    Open source semantic search and text analytics for large document sets

    ...Open Semantic Search includes an ETL framework that can ingest documents, process them through analysis steps, and enrich the data with extracted information such as named entities and metadata. It also supports optical character recognition to extract text from images and scanned documents, including images embedded inside PDF files. It integrates text mining and analytics capabilities that allow users to examine relationships, topics, and structured data within document collections.
    Downloads: 7 This Week
  • 21
    TaxHacker

    Self-hosted AI accounting app. LLM analyzer for receipts

    ...The system is designed to simplify bookkeeping by automatically processing financial documents such as receipts, invoices, and transaction records. It integrates large language models to analyze these documents, extract relevant financial information, and categorize expenses or income based on configurable rules. Users can deploy the application on their own infrastructure, ensuring that financial data remains private and under their control rather than being processed by external services. The software provides tools for tracking income streams, monitoring expenses, and organizing financial records in a structured format. ...
    Downloads: 4 This Week
  • 22
    dategrep

    Print lines matching a time range

    dategrep is a command-line utility designed to extract lines from log files that fall within a specified time range. It efficiently processes large log files by performing a binary search to locate the relevant entries, making it a valuable tool for system administrators and developers analyzing time-specific events.
    Downloads: 0 This Week
  • 23
    MCP Server RAG Web Browser

    An MCP Server for the RAG Web Browser Actor

    The MCP Server for the RAG Web Browser Actor allows AI assistants and LLMs to perform web searches and extract information from web pages. It facilitates interaction with the web, enabling up-to-date context retrieval for AI applications.
    Downloads: 0 This Week
  • 24
    Dendrite

    Tools to build web AI agents that can authenticate

    Dendrite Python SDK is a toolkit for building web AI agents that can authenticate, interact with, and extract data from any website, facilitating web automation tasks.
    Downloads: 0 This Week
  • 25
    fastdup

    An unsupervised and free tool for image and video dataset analysis

    fastdup is a powerful free tool designed to rapidly extract valuable insights from your image and video datasets. It helps you improve the quality of your dataset's images and labels and reduce your data operations costs at scale.
    Downloads: 1 This Week