Showing 11 open source projects for "xpath"

View related business solutions
  • Build Agents and Models on One Platform Icon
    Build Agents and Models on One Platform

    Everything you need to build production-ready agents and models. Access 200+ Google and third-party AI models and tools.

    Gemini Enterprise Agent Platform is Google Cloud's comprehensive platform for developers to build, scale, govern, and optimize agents and models. Choose from Google's most advanced models and third-party models like Anthropic's Claude Model Family.
    Try It Free
  • $300 Free Credits for Your Google Cloud Projects Icon
    $300 Free Credits for Your Google Cloud Projects

    Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

    Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.
    Start Free Trial
  • 1
    Scrapy

    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from these websites. Portable and written in Python, it can run on Windows, Linux, macOS and BSD. Scrapy is powerful, fast and simple, and also easily extensible. Simply write the rules to extract the data, and add new functionality if you wish without having to touch the core. Scrapy does the rest, and can be used in a number of applications. It can be used for data mining, monitoring...
    Downloads: 23 This Week
    Last Update:
    See Project
  • 2
    gain

    gain

    Asyncio-based Python framework for building fast web crawling spiders

    ...Developers define crawlers using components such as spiders, parsers, and items, allowing them to organize crawling logic and data extraction rules clearly. Gain supports CSS selectors and XPath expressions for parsing page content and extracting specific elements. Gain also allows developers to configure headers, concurrency levels, and proxy settings to control how crawlers interact with target websites. Because it uses asynchronous programming, Gain can handle multiple requests efficiently while minimizing blocking operations.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Crawl4AI

    Crawl4AI

    Open-source LLM Friendly Web Crawler & Scraper

    Crawl4AI is a high-performance, AI‑ready web crawler tailored for LLM data ingestion and RAG pipelines. It supports adaptive crawling heuristics (stopping when enough info is gathered), structured markdown output, and high-speed parallel execution. Designed to operate at scale with optional Docker deployment and framework integrations.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    kimuraframework

    kimuraframework

    AI-first Ruby framework for building fast, flexible web scraping spide

    Kimurai is an open source web scraping framework written in Ruby that simplifies the process of building automated data extraction tools. It provides a clean domain-specific language that allows developers to define scraping logic and data schemas with minimal boilerplate code. Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance....
    Downloads: 1 This Week
    Last Update:
    See Project
  • Build Securely on AWS with Proven Frameworks Icon
    Build Securely on AWS with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 5
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well proved XML and text processing techologies in order to easely extract useful data from arbitrary web pages.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Scraper of Death
    Scraper of Death is a web scraper. Multiple Scraping Methods Requests + BeautifulSoup (fast, lightweight) Selenium (JavaScript support, dynamic content)
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    PHPScraper

    PHPScraper

    A universal web-util for PHP

    PHPScraper is a universal web-scraping util for PHP, built with simplicity in mind. The goal is to make xPath Selectors optional and avoid the commonly needed boilerplate code. Just create an instance of PHPScraper, go to a website, and start collecting data. All scraping functionality can be accessed either as a function call or a property call. For example, the title can be accessed in two ways. Many common use cases are covered already.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    dude uncomplicated data extraction

    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. The design, inspired by Flask, was to easily build a web scraper in just a few lines of code. Dude has an easy-to-learn syntax. Dude is currently in Pre-Alpha. Please expect breaking changes. You can run your scraper from terminal/shell/command-line by supplying URLs, the output filename of your choice and the paths to your python scripts to dude scrape command.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    mlscraper

    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages. Once trained, the generated scraper can process new pages and return the extracted data in structured formats such as dictionaries or lists. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure Icon
    Stop Cyber Threats with VM-Series Next-Gen Firewall on Azure

    Native application identity and user-based security for your Azure cloud

    Gain integrated visibility across all traffic in a single pass. Deploy Palo Alto Networks VM-Series to determine application identity and content while automating security policy updates via rich APIs.
    Get a free trial
  • 10
    SEO MACROSCOPE

    SEO MACROSCOPE

    SEO Macroscope is a website scanning tool, to check your website

    ...Generate and export text and XML sitemaps from the crawled pages. Analyze redirect chains. Use custom filters to verify the presence/absence of tracking tags. Use CSS Selectors, XPath Queries, and Regular Expressions to scrape website data.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    ...webStraktor features a scripting language to facilitate the collection, the extraction and the storage of information available on the web, including images. The scripting language uses elements of the Regular Expression and xPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy to master. The standard webStraktor output format is XML based, either in ASCII, UTF-8 or ISO-8859-1 (Latin1) code pages. webStraktor relies on the Apache HttpClient for retrieving content via the HTTP protocol. It adheres to the Robots Exclusion Protocol and it can be configured to operate in an anonymous way by connecting to the predominant types of web proxy servers. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
Auth0 Logo