Showing 31 open source projects for "extract"

  • 1
    Documind

    Open-source platform for extracting structured data from documents

    Documind is an advanced document processing tool that leverages AI to extract structured data from PDFs. It is built to handle PDF conversions, extract relevant information, and format results as specified by customizable schemas.
    Downloads: 0 This Week
  • 2
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects.
    Downloads: 4 This Week
  • 3
    Stagehand

    An AI web browsing framework focused on simplicity and extensibility

    ...Each Stagehand function takes in an atomic instruction, such as act("click the login button") or extract("find the red shoes"), generates the appropriate Playwright code to accomplish that instruction, and executes it.
    Downloads: 5 This Week
  • 4
    react-docgen

    A CLI and toolbox to extract information from React component files

    react-docgen is a CLI and toolbox that helps extract information from React components and generate documentation from it. It uses @babel/parser to parse the source into an AST and provides methods to process this AST to extract the desired information. The output (or return value) is a JSON blob (or JavaScript object). It provides a default implementation for React components defined via React.createClass, ES2015 class definitions, or functions (stateless components). These component definitions must follow certain guidelines in order to be analyzable. Installing the module adds a react-docgen executable, which allows you to convert a single file, multiple files, or an input stream. ...
    Downloads: 0 This Week
  • 5
    Link-Preview-JS

    Extract information from web links: title, description, images, videos, etc.

    link-preview-js is a lightweight TypeScript library that extracts metadata from URLs or HTML content to generate rich link previews. By parsing Open Graph tags and other metadata, it retrieves information such as titles, descriptions, images, and videos. Designed primarily for Node.js and mobile environments, it facilitates the creation of link previews similar to those found on social media platforms.
    Downloads: 0 This Week
  • 6
    Epublifier

    Converts some webnovels to epub format

    A tool to convert website-based books or lists of pages to ePub format to read on your eReader/Kindle/etc.
    Downloads: 2 This Week
  • 7
    LiteParse

    A fast, helpful, and open-source document parser

    LiteParse is an open-source lightweight parsing library designed to extract structured data from unstructured text using large language models in an efficient and cost-effective manner. It focuses on simplifying the process of turning raw text into structured outputs such as JSON by providing a streamlined interface for prompt-based parsing. The system is designed to minimize overhead, making it suitable for applications where performance and cost are critical considerations.
    Downloads: 4 This Week
  • 8
    OpenBrand

    Extract brand assets (logos, colors, backdrops) from any website

    OpenBrand is an open-source platform aimed at helping users generate, manage, and experiment with branding assets using modern AI-driven workflows. It focuses on simplifying the creation of brand identities by integrating tools for generating logos, visual assets, and design systems in a cohesive environment. The project is built with extensibility in mind, allowing developers to integrate additional AI models or design pipelines to expand its capabilities. It provides a structured approach...
    Downloads: 2 This Week
  • 9
    TaxHacker

    Self-hosted AI accounting app. LLM analyzer for receipts

    ...The system is designed to simplify bookkeeping by automatically processing financial documents such as receipts, invoices, and transaction records. It integrates large language models to analyze these documents, extract relevant financial information, and categorize expenses or income based on configurable rules. Users can deploy the application on their own infrastructure, ensuring that financial data remains private and under their control rather than being processed by external services. The software provides tools for tracking income streams, monitoring expenses, and organizing financial records in a structured format. ...
    Downloads: 6 This Week
  • 10
    Verba

    Retrieval Augmented Generation (RAG) chatbot powered by Weaviate

    Welcome to Verba: The Golden RAGtriever, a community-driven open-source application designed to offer an end-to-end, streamlined, and user-friendly interface for Retrieval-Augmented Generation (RAG) out of the box. In just a few steps, you can explore your datasets and extract insights, either locally with Ollama and Huggingface or through LLM providers such as Anthropic, Cohere, and OpenAI. This project is built with and for the community; please be aware that it might not be maintained with the same urgency as other Weaviate production applications.
    Downloads: 0 This Week
  • 11
    HeadlessX

    The undetected self-hosted browser automation platform

    HeadlessX is an open-source, self-hosted browser automation platform designed to run headless browsers for tasks such as web scraping, automation, and testing. The system provides a centralized service that allows developers to programmatically control browser sessions and extract data from websites through a structured API. It is built using modern technologies including Node.js, Next.js, TypeScript, and Playwright, and uses a specialized browser engine called Camoufox based on Firefox. One of the platform’s goals is to bypass common bot-detection systems by implementing advanced fingerprint spoofing and stealth techniques. ...
    Downloads: 4 This Week
  • 12
    Hugo Theme Stack

    Card-style Hugo theme designed for bloggers

    Stack is a simple card-style Hugo theme designed for bloggers. Its features include responsive image support, lazy-loaded images, dark mode, local search, PhotoSwipe integration, an archive page template, properly cropped thumbnails, subsection support, a table of contents, multilingual mode, and RTL support. It uses native JavaScript only, with no jQuery, no other JavaScript framework, and no CSS framework, to keep it simple and minimal. It's necessary to use...
    Downloads: 1 This Week
  • 13
    i18n ally

    All in one i18n extension for VS Code

    Lokalise is the fastest growing language cloud technology made by developers, for developers. As a collaborative productivity platform, it helps structure and automate the translation and localization process for any company in the world. This extension itself supports i18n as well. It will be auto-matched to the display language you use in your VS Code editor. Supports multi-root workspaces. Supports remote development. Supports numerous popular frameworks. Supports linked locale messages....
    Downloads: 0 This Week
  • 14
    Summarize

    Point at any URL/YouTube/Podcast or file

    Summarize is a toolset that lets you point at almost any content and quickly extract the gist, whether that content is a webpage, a YouTube video, a podcast, or a local file. It’s built around a CLI workflow so you can summarize from the terminal, but it also includes a Chrome extension so you can do the same thing directly while browsing. The project pairs an on-device “daemon” style background service with user-facing commands and extension UI, so summaries can feel immediate and repeatable once installed. ...
    Downloads: 3 This Week
  • 15
    Figma Sprite Generator

    A Figma plugin to generate sprite sheets and JSON files from selected

    ...This tool is ideal for designers and developers working on web projects or UI libraries, as it automates sprite generation for more efficient workflows. Users can quickly convert their icons into PNG sprites and extract JSON files with icon dimensions and positions, streamlining the process of incorporating sprites into websites or applications.
    Downloads: 0 This Week
  • 16
    Browserbase MCP Server

    Allow LLMs to control a browser with Browserbase and Stagehand

    Browserbase MCP Server is a server implementation of the Model Context Protocol (MCP) that enables large language models to interact with web browsers programmatically through cloud-based automation. The project provides a standardized interface for connecting AI systems to real-world web environments, allowing them to navigate pages, extract structured data, and perform user-like actions such as clicking, typing, and form submission. It leverages Browserbase infrastructure along with Stagehand to deliver high-performance browser automation with improved speed and efficiency through caching and optimized execution pipelines. The system supports multiple AI models and integrates seamlessly into agent workflows, making it suitable for applications such as web scraping, testing, and intelligent automation. ...
    Downloads: 2 This Week
  • 17
    DocStrange

    Extract and convert data from any document: images, PDFs, Word docs

    DocStrange is an open-source document understanding and extraction library designed to convert complex files into structured, LLM-ready outputs such as Markdown, JSON, CSV, and HTML. Developed by Nanonets, the project combines OCR, layout detection, table understanding, and structured extraction into one end-to-end pipeline, which reduces the need to stitch together multiple separate services. It is built for developers who need high-quality parsing from scans, photos, PDFs, office files,...
    Downloads: 3 This Week
  • 18
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. WaterCrawl supports customizable extraction rules so users can focus only on relevant elements while ignoring unnecessary page components. ...
    Downloads: 2 This Week
  • 19
    Open Deep Research

    An AI-powered research assistant that performs iterative research

    ...It is intentionally kept compact, with a codebase under roughly 500 lines, making it highly approachable for experimentation and learning. The architecture demonstrates how modern agent pipelines can continuously gather evidence, extract learnings, and adjust research direction over time.
    Downloads: 0 This Week
  • 20
    NGX-Translate

    The internationalization (i18n) library for Angular

    ...The main part of the library is named core. You can use it on its own, but it is usually a good idea to add a loader to load your translations into your application. You can also extract the strings from your code with the extractor. This makes it really easy to start and maintain your translations. By default, there is no loader available. You can add translations manually using setTranslation but it is better to use a loader. You can write your own loader, or import an existing one.
    Downloads: 0 This Week
  • 21
    Reader LLM

    Convert any URL to an LLM-friendly input with a simple prefix

    ...In addition to converting individual pages, the service can perform web searches and return relevant content that can be ingested directly by AI systems. The tool relies on specialized models and parsing techniques to handle complex HTML structures and extract meaningful content while preserving important context.
    Downloads: 0 This Week
  • 22
    linaria

    Zero-runtime CSS in JS library

    ...Optionally use any CSS preprocessor such as Sass or PostCSS. Easily find where the style was defined with CSS source maps. Linaria currently supports webpack and Rollup to extract the CSS at build time. Optionally, add the @linaria preset to your Babel configuration at the end of the presets list to avoid errors when importing the components in your server code or tests. Linaria can be used with any framework, with additional helpers for React.
    Downloads: 0 This Week
  • 23
    BRAID

    Themeable design system for the SEEK Group

    Braid aims to make cross-brand UI development as fast as possible while maintaining a high level of quality and accessibility. To achieve this, Braid provides a set of React components and CSS-variable-based styling themes using vanilla-extract. As much as possible, we want Braid code to make sense to non-developers. We’re aggressively focused on the simplicity and composability of its API. Along with our work on Playroom, our goal is to empower designers and developers to iterate together in the same medium using the same components, reducing the need for high-fidelity mock-ups before development starts. ...
    Downloads: 0 This Week
  • 24
    XIAOJUSURVEY

    Powerful survey system for creating, managing, and analyzing forms

    ...Xiaoju Survey includes tools for managing survey workflows, enabling teams to organize responses and monitor participation efficiently. It also focuses on data analysis capabilities, helping users extract insights from collected responses through built-in reporting and visualization features. Xiaoju Survey is designed with extensibility in mind, allowing developers to customize or integrate it into existing systems. Its architecture supports high availability and scalability, making it suitable for enterprise-level deployments. Overall, Xiaoju Survey aims to streamline the entire lifecycle of survey management from creation to actionable insights.
    Downloads: 2 This Week
  • 25
    Nativefier

    Make any web page a desktop application

    Nativefier is a command-line tool designed to create a desktop app for any website with minimal configuration. Apps are wrapped by Electron (using Chromium under the hood) in an OS executable (.app, .exe, etc.) for use on Windows, macOS, and Linux. Nativefier will try to determine the app name, as well as lots of other options. If desired, these options can be overridden. For example, to override the name, run nativefier --name 'My Medium App' 'medium.com'. Read the API documentation or run...
    Downloads: 4 This Week
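The Link-Preview-JS entry above describes building link previews by parsing Open Graph tags. As a rough illustration of that general technique (not of link-preview-js itself), here is a dependency-free TypeScript sketch. The names extractOpenGraph, parseAttributes, and LinkPreview are hypothetical, and the regex scan assumes quoted attribute values, no ">" inside attribute values, and no HTML entity decoding; a real implementation would normally use an HTML parser instead.

```typescript
// Minimal sketch of Open Graph extraction from an HTML string.
// Hypothetical helpers for illustration only; NOT the link-preview-js API.
// Assumptions: attribute values are quoted, contain no ">", and
// HTML entities (e.g. &amp;) are left undecoded.

interface LinkPreview {
  title?: string;
  description?: string;
  image?: string;
  video?: string;
}

// Collect quoted attributes (name="v" or name='v') from one tag,
// lowercasing the attribute names.
function parseAttributes(tag: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const attrPattern = /([a-zA-Z_:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')/g;
  let m: RegExpExecArray | null;
  while ((m = attrPattern.exec(tag)) !== null) {
    attrs[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
  }
  return attrs;
}

function extractOpenGraph(html: string): LinkPreview {
  const og: Record<string, string> = {};
  const metaPattern = /<meta\b[^>]*>/gi;
  let m: RegExpExecArray | null;
  while ((m = metaPattern.exec(html)) !== null) {
    const attrs = parseAttributes(m[0]);
    // Open Graph uses property="og:*"; some pages use name="og:*" instead.
    const key = (attrs["property"] || attrs["name"] || "").toLowerCase();
    const content = attrs["content"];
    // Keep only og:* keys, and only the first occurrence of each.
    if (key.indexOf("og:") === 0 && content !== undefined && !(key in og)) {
      og[key] = content;
    }
  }
  // Fall back to the document <title> when og:title is absent.
  const titleMatch = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  const fallbackTitle = titleMatch ? titleMatch[1].trim() : undefined;
  return {
    title: og["og:title"] !== undefined ? og["og:title"] : fallbackTitle,
    description: og["og:description"],
    image: og["og:image"],
    video: og["og:video"],
  };
}

// Usage with an inline sample document (example.com is a placeholder).
const sampleHtml = `<html><head>
  <title>Fallback title</title>
  <meta property="og:title" content="Example page">
  <meta property="og:image" content="https://example.com/cover.png">
  <meta name="description" content="Not an Open Graph tag">
</head></html>`;

const preview = extractOpenGraph(sampleHtml);
console.log(preview.title, preview.image); // logs: Example page https://example.com/cover.png
```

Keeping the first occurrence of each og: key and falling back to the title element are choices made for this sketch, not documented link-preview-js behavior.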