Search Results for "extract website content"

Showing 965 open source projects for "extract website content"

  • 1
    Website Stalker

    Track changes on websites via git

    This tool checks all the websites listed in its config. When a change is detected, the new site is added to a git commit. It can then be inspected via normal git tooling. The config describes a list of sites. Each site has a URL. Additionally, each site can have editors which are used before saving the file. Each editor manipulates the content of the URL.
    Downloads: 0 This Week
  • 2
    ScrapeGraphAI

    Python scraper based on AI

    Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract and the library will do it for you.
    Downloads: 5 This Week
  • 3
    PdfPig

    Read and extract text and other content from PDFs in C#

    This project allows users to read and extract text and other content from PDF files. In addition, the library can be used to create simple PDF documents containing text and geometrical shapes.
    Downloads: 4 This Week
  • 4
    watercrawl

    AI-ready web crawler that extracts and structures website content

    WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. ...
    Downloads: 0 This Week
  • 5
    Java Design Patterns Website

    Next generation website for Java Design Patterns

    This project is the VuePress-powered web front end for the well-known “java-design-patterns” project, which documents classic and modern design patterns in Java. Its purpose is to present that large body of content as a browsable, fast, static website with organized navigation, search, and pattern categorization. Instead of reading patterns only in GitHub markdown, users can consume them in a more pleasant documentation format with sections, sidebars, and themed pages. The site structure makes it easier to discover related patterns, see intent and applicability, and jump between creational, structural, and behavioral groups. ...
    Downloads: 1 This Week
  • 6
    Article Extractor

    Extract the main article from a given URL with Node.js

    A Node.js library for extracting main content from web articles, removing unnecessary clutter like ads and navigation elements.
    Downloads: 0 This Week
  • 7
    Epublifier

    Converts some webnovels to epub format

    A tool to convert website-based books or lists of pages to ePub format to read on your eReader/Kindle/etc.
    Downloads: 0 This Week
  • 8
    Reader LLM

    Convert any URL to an LLM-friendly input with a simple prefix

    ...In addition to converting individual pages, the service can perform web searches and return relevant content that can be ingested directly by AI systems. The tool relies on specialized models and parsing techniques to handle complex HTML structures and extract meaningful content while preserving important context.
    Downloads: 4 This Week
  • 9
    Microweber

    Drag and Drop Website Builder and CMS with E-commerce

    ...Its revolutionary Real-Time Text Writing & Editing feature alongside its Drag and Drop feature means the user experience is significantly improved, and users are able to achieve a visually appealing website and easier content management with a lot less time and effort.
    Downloads: 4 This Week
  • 10
    TikTok MCP

    Model Context Protocol (MCP) with TikTok integration

    The TikTok MCP integrates TikTok access into AI applications like Claude AI via TikNeuron. It enables analysis and interaction with TikTok content to determine virality factors and extract video content.
    Downloads: 0 This Week
  • 11
    Spider

    High-performance Rust web crawler and scraper for large-scale data

    ...It focuses on speed, concurrency, and reliability by using asynchronous and multi-threaded processing to handle large volumes of web pages. It can rapidly crawl websites to collect links, retrieve page content, and extract structured information from HTML documents. Spider can operate concurrently across many pages, allowing it to gather large datasets in a short period of time. Spider also provides mechanisms for subscribing to crawl events so developers can process page data such as URLs, status codes, or HTML content as it is discovered. ...
    Downloads: 2 This Week
  • 12
    PyPDF

    A pure-python PDF library capable of splitting, merging, cropping

    pypdf is a pure Python library for working with PDF files, allowing developers to split, merge, rotate, encrypt, and extract content from PDFs. It’s an actively maintained fork of PyPDF2, improving performance, compatibility, and support for modern PDF standards. Suitable for both automation scripts and full-featured applications, pypdf handles PDFs without requiring external dependencies.
    Downloads: 5 This Week
  • 13
    Keiyoushi Extensions

    Extension repository for Mihon and variants

    ...It includes a wide variety of sources covering different languages, regions, and content types, making it highly adaptable for global audiences. The repository is actively maintained by contributors who update sources to keep up with website changes and ensure continued functionality. It also enforces certain standards for extension development, promoting consistency and reliability across different providers.
    Downloads: 47 This Week
  • 14
    ldif-extract

    Extract selected entries from LDIF files, like grep

    ldif-extract is a small grep-like tool to extract and convert data from LDIF files. It can be used standalone or in a pipe together with other tools like ldapsearch.
    Downloads: 0 This Week
  • 15
    LLM Scraper

    Extract structured data from webpages using LLM-powered scraping

    LLM Scraper is a TypeScript library designed to extract structured data from webpages using large language models. Instead of relying on fragile HTML selectors or manual parsing rules, the tool interprets webpage content with language models and converts it into structured data according to a defined schema. Developers can specify the data structure using tools such as Zod or JSON Schema, enabling the model to extract relevant information directly into typed objects. ...
    Downloads: 1 This Week
  • 16
    Skyvern

    Automate browser-based workflows with LLMs and Computer Vision

    Skyvern uses a combination of computer vision and AI to understand content on a webpage, making it adaptable to any website. Skyvern takes instructions in natural language, allowing it to execute complex objectives with simple commands. Skyvern is an API-first product. Workflows execute in the cloud, allowing it to run hundreds of workflows at the same time. Skyvern's AI decisions come with built-in explanations, providing clear summaries and justifications for every action. ...
    Downloads: 1 This Week
  • 17
    Dendrite

    Tools to build web AI agents that can authenticate

    Dendrite Python SDK is a toolkit for building web AI agents that can authenticate, interact with, and extract data from any website, facilitating web automation tasks.
    Downloads: 0 This Week
  • 18
    Astro

    The web framework for content-driven websites

    Astro powers the world's fastest marketing sites, blogs, e-commerce websites, and more. Astro improves website performance by rendering components on the server, sending lightweight HTML to the browser with zero unnecessary JavaScript overhead. Astro was designed to work with your content, no matter where it lives. Load data from your file system, external API, or your favorite CMS. Extend Astro with your favorite tools. Bring your own JavaScript UI components, CSS libraries, themes, integrations, and more. ...
    Downloads: 3 This Week
  • 19
    Laravel Sharp

    Laravel 10+ Content management framework

    Sharp is a content management framework: a toolset that helps build a CMS section in a website, with some rules in mind. The public website should not have any knowledge of the CMS; the CMS is a part of the system, not the center of it. In fact, removing the CMS should not have any effect on the project. Content administrators should work with their data and terminology, not CMS terms.
    Downloads: 2 This Week
  • 20
    Hugo

    The world’s fastest framework for building websites

    Hugo is a popular, fast and flexible open source static site generator written in Go. It’s designed for speed and flexibility, while also being very easy to use. Hugo has the amazing ability to render a typical, moderately-sized website in just a fraction of a second. It takes Hugo around 1 millisecond to render each piece of content, making it the fastest tool of its kind. Hugo supports unlimited content types, and ships with pre-made templates to make SEO, analytics and many other functions quick and easy to achieve. It’s got a robust theming system, capable of producing even the most complex websites. ...
    Downloads: 6 This Week
  • 21
    sm-website

    Content management system. Goal is to make it possible to rapidly finn

    Downloads: 0 This Week
  • 22
    Planet

    Build and host decentralized blogs and websites on your Mac

    ...Did you know that you can use an Ethereum Name (ENS) to set up a website? It's true! You can use the Content Hash field, just like you would use an A or CNAME record for a traditional domain name. The standard for this is EIP-1577, and the Content Hash field can accept a few different values.
    Downloads: 0 This Week
  • 23
    model-viewer

    Easily display interactive 3D models on the web and in AR

    Easily display interactive 3D models on the web & in AR. Use our Editor to test your 3D models and download a starter website. Generate your own 3D Twitter card for any website. :focus-visible is an as-yet unimplemented web platform feature that enables content authors to style a component on the condition that it received focus in such a way that suggests the focus state should be visibly evident. The :focus-visible capability has not been implemented in any stable browsers yet. ...
    Downloads: 8 This Week
  • 24
    Chandra

    OCR model for complex documents with layout-aware structured outputs

    Chandra is an advanced OCR model designed to extract and structure information from complex documents such as tables, forms, handwritten notes, and mathematical content. It focuses on preserving full document layout, meaning that extracted text is accompanied by positional metadata like bounding boxes for each element. Chandra supports multiple output formats including Markdown, HTML, and JSON, making it suitable for downstream processing and integration into data pipelines. ...
    Downloads: 2 This Week
  • 25
    Link-Preview-JS

    Extract web links information: title, description, images, videos, etc

    link-preview-js is a lightweight TypeScript library that extracts metadata from URLs or HTML content to generate rich link previews. By parsing Open Graph tags and other metadata, it retrieves information such as titles, descriptions, images, and videos. Designed primarily for Node.js and mobile environments, it facilitates the creation of link previews similar to those found on social media platforms.
    Downloads: 0 This Week
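
Several of the projects listed above describe extracting metadata from HTML, and the Link-Preview-JS entry specifically mentions parsing Open Graph tags. As a rough illustration of that general technique, here is a minimal Python sketch using only the standard library. It is not the code or API of link-preview-js or any other listed project; the names `OpenGraphParser`, `extract_preview`, and `SAMPLE_HTML` are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect Open Graph <meta> tags and the <title> text from HTML."""

    def __init__(self):
        super().__init__()
        self.og = {}            # e.g. {"title": "...", "image": "..."}
        self.title = ""         # fallback when og:title is absent
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            content = attrs.get("content")
            if prop.startswith("og:") and content is not None:
                # Keep the first value seen for each og:* property.
                self.og.setdefault(prop[3:], content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html):
    """Return a small preview dict built from Open Graph tags (hypothetical helper)."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("title") or parser.title.strip(),
        "description": parser.og.get("description", ""),
        "image": parser.og.get("image", ""),
    }


# Made-up sample document, used only to exercise the sketch.
SAMPLE_HTML = """
<html><head>
<title>Fallback title</title>
<meta property="og:title" content="Example page">
<meta property="og:description" content="A short summary.">
<meta property="og:image" content="https://example.com/cover.png">
</head><body></body></html>
"""

preview = extract_preview(SAMPLE_HTML)
print(preview)
```

A real link-preview tool would also need to fetch the page, resolve relative image URLs, and handle other metadata formats; this sketch covers only the tag-parsing step.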