Showing 7 open source projects for "scraping"

View related business solutions
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 1
    Happy DOM

    Happy DOM

    Happy DOM is a JavaScript implementation of a web browser

    Happy DOM is a JavaScript implementation of a web browser without its graphical user interface. It includes many web standards from WHATWG DOM and HTML. The goal of Happy DOM is to emulate enough of a web browser to be useful for testing, scraping web sites, and server-side rendering. Happy DOM focuses heavily on performance and can be used as an alternative to JSDOM. Happy DOM now supports Declarative Shadow DOM which can be used for server-side rendering of web components. This package makes it possible to use Happy DOM with Jest.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 2
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 8 This Week
    Last Update:
    See Project
  • 3
    unfluff

    unfluff

    Automatically extract body content (and other cool stuff) from HTML

    unfluff is a Node.js library designed to automatically extract the main content from an HTML document — stripping away navigation bars, ads, footers and other boilerplate to leave you with the “body content”, metadata (title, author, date) and other useful fields. It’s a tool very much aimed at content-analysis, web scraping, building datasets, or repurposing article text for downstream processing (like machine-learning or summarization). The API is simple: you feed in raw HTML and it returns a structured object with the extracted text and other fields. It supports caching internal representations to speed up repeated extractions. While its language support is best for English, it is still widely used in web-content-processing pipelines. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    Simple-Scrape is a simple web-scraping library that allows for programmatic access to HTML code. No further techniques are needed and the library is very compact and thus easy to use.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Ship Agents Faster Icon
    Ship Agents Faster

    Transform your applications and workflows into powerful agentic systems at global scale.

    Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.
    Get Started Free
  • 5

    HTML DOM Parser

    HTML parser which can be used for screen-scraping applications

    htmldom parses the HTML file and provides methods for iterating and searching the parse tree in a similar way as Jquery. To report bugs please mail me at bhimsen.pes@gmail.com
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    datalus
    PHP web API designed to simplify object handling(loading, saving, querying, displaying, and editing), abstract the data from its display structure, and layout and allow the target data to be delivered to any supported format without special logic.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7

    Xidel

    Xidel is a cli webpage scraping tool supporting XPath/XQuery 3 and CSS

    Xidel is a command line tool to download web pages and extract data from them. This data can be extracted using XPath/XQuery 3.0 (with a compatibility modes for XPath 2.0 and XQuery 1.0), JSONiq, CSS 3 selectors, and custom, pattern-matching templates that are like an annotated version of the processed page. It can download files over HTTP/S connections, follow redirections, links, or extracted values, and also process local files. The extracted values can then be exported as...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB