Showing 65 open source projects for "simple-xml"

  • 1
    Scrapy

    A fast, high-level web crawling and web scraping framework

    Scrapy is a fast, open source, high-level framework for crawling websites and extracting structured data from them. Portable and written in Python, it runs on Windows, Linux, macOS, and BSD. Scrapy is powerful, fast, simple, and easily extensible: you write the rules to extract the data, Scrapy does the rest, and you can add new functionality without touching the core. It can be used in a wide range of applications, including data mining, monitoring, and automated testing.
    Downloads: 25 This Week
  • 2
    WebMagic

    A scalable web crawler framework for Java

    ...It covers the whole lifecycle of a crawler: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler. WebMagic is a simple but scalable crawler framework on which you can easily build a crawler. It has a simple, highly flexible core and a simple API for HTML extraction. It also supports POJO annotations for customizing a crawler, with no configuration needed. Other features include multithreading and support for distributed crawling. ...
    Downloads: 0 This Week
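    WebMagic is a Java framework, and its actual API is not shown in this listing. As a rough illustration of what the "URL management" stage of a crawler usually involves, the following Python sketch keeps a FIFO queue of pending URLs plus a set of already-scheduled ones, so each page is handed out only once. The class UrlFrontier and its methods are hypothetical names, not part of WebMagic.

```python
from collections import deque
from urllib.parse import urldefrag


class UrlFrontier:
    """Hypothetical URL manager: a FIFO queue of pending URLs plus a set of
    URLs already scheduled, so each page is handed out at most once."""

    def __init__(self, seeds):
        self._queue = deque()
        self._seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        # Drop the #fragment so "page#a" and "page#b" count as the same page.
        url, _ = urldefrag(url)
        if url not in self._seen:
            self._seen.add(url)
            self._queue.append(url)
            return True
        return False

    def next_url(self):
        # Return the oldest pending URL, or None when the frontier is empty.
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


frontier = UrlFrontier(["https://example.com/", "https://example.com/#top"])
frontier.add("https://example.com/about")
print(len(frontier))        # 2
print(frontier.next_url())  # https://example.com/
```

    A set gives constant-time duplicate checks; a larger crawler would likely also persist this state and apply per-host rate limits.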
  • 3
    ScrapeGraphAI

    Python scraper based on AI

    Extracts content from websites and local documents using LLMs. ScrapeGraphAI is a Python web scraping library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, Markdown, etc.). Just say which information you want to extract, and the library will do it for you.
    Downloads: 1 This Week
  • 4
    pandora-box

    Lightweight cross-platform desktop client for managing Mihomo proxies

    Pandora-Box is a lightweight desktop client designed to provide a graphical interface for the Mihomo proxy core. It allows users to manage proxy configurations and subscriptions through a simple and user-friendly interface rather than working directly with configuration files. Pandora-Box supports multiple proxy protocols and provides tools to organize and control network routing rules. It is designed to work for both casual users who want an easy setup and advanced users who need more control over proxy behavior. It also supports automatic rule grouping and features such as TUN mode to enable system-wide proxy routing. ...
    Downloads: 16 This Week
  • 5
    owllook

    Vertical novel search engine with unified reading and tracking tools

    ...Instead of redirecting users to different sites, the system parses content from many novel platforms and presents it in a unified reading interface. It focuses on providing a simple and comfortable reading experience with features such as searching for books, following updates, bookmarking chapters, and maintaining a personal bookshelf. It aggregates results from multiple search engines and applies parsing rules to extract novel metadata, chapters, and content in a consistent format. Owllook also includes functionality for tracking reading history, displaying rankings based on search activity, and recommending books using a similarity-based approach. ...
    Downloads: 2 This Week
  • 6
    Ferret

    Declarative web scraping

    A web scraping system that aims to simplify data extraction from the web. Ferret has an easy-to-learn declarative query language that lets you focus on the data you need. It can scrape JavaScript-rendered pages, handle all page events, and emulate user interactions. Ferret was designed as a library from the ground up, so it can be easily embedded into any Go application. Ferret uses...
    Downloads: 4 This Week
  • 7
    SimpDL

    A tool to scrape images from SimpCity

    SimpDL is an open-source media downloading tool designed to retrieve content from subscription-based or creator platforms, focusing on simplicity and ease of use. It enables users to download images, videos, and other media associated with specific creators or accounts, often through authenticated sessions. The project emphasizes a straightforward workflow where users provide login credentials or tokens, and the tool handles the retrieval and storage of content automatically. It is designed...
    Downloads: 3 This Week
  • 8
    videodl

    Lightweight Python tool for downloading videos from many platforms

    Videodl is a lightweight video downloader implemented entirely in Python that allows users to retrieve videos from a wide range of online media platforms. It focuses on providing a fast and simple way to parse video pages and download media files, often prioritizing high-definition versions without watermarks when available. It supports numerous video platforms across both Chinese and international streaming ecosystems, enabling users to fetch content from many popular services through a unified interface. Videodl works by implementing platform-specific client modules that extract video information and download links from supported services. ...
    Downloads: 2 This Week
  • 9
    Python API for JMComic

    Python crawler and API for downloading JMComic albums and images

    JMComic-Crawler-Python is a Python library and crawler framework designed to programmatically access and download comic content from the JMComic platform. It provides a structured API that allows developers to retrieve albums, chapters, and images using simple Python code while handling the necessary network requests and data processing behind the scenes. It supports both web-based and mobile API interfaces, enabling flexible interaction with the platform depending on the available endpoints. Its architecture includes components for configuration management, download orchestration, and client communication, allowing users to automate the retrieval of manga chapters or entire albums. ...
    Downloads: 2 This Week
  • 10
    changedetection.io

    The best free open source website change detection and restock service

    ...From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the browser steps configuration, add basic steps before performing change detection, such as logging into websites, adding a product to a cart, accepting cookie logins, entering dates, and refining searches. ...
    Downloads: 0 This Week
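    The core idea behind change detection can be sketched by fingerprinting each fetched page and comparing the result with the previous fingerprint. This is a minimal Python illustration under stated assumptions (whitespace-normalized SHA-256 hashes kept in memory, page fetching omitted), not how changedetection.io itself is implemented; ChangeWatcher and fingerprint are hypothetical names.

```python
import hashlib


def fingerprint(text: str) -> str:
    # Normalize whitespace so trivial reformatting does not count as a change,
    # then hash the result.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class ChangeWatcher:
    """Hypothetical watcher: remembers the last fingerprint per URL."""

    def __init__(self):
        self._last = {}

    def check(self, url: str, content: str) -> bool:
        """Return True if content differs from the previous check of url.

        The first check of a URL only records a baseline and returns False.
        """
        new = fingerprint(content)
        old = self._last.get(url)
        self._last[url] = new
        return old is not None and old != new


w = ChangeWatcher()
print(w.check("https://shop.example/item", "Out of stock"))   # False (baseline)
print(w.check("https://shop.example/item", "Out  of stock"))  # False (whitespace only)
print(w.check("https://shop.example/item", "In stock"))       # True
```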
  • 11
    rvest

    Simple web scraping for R

    rvest helps you scrape (or harvest) data from web pages. It is designed to work with magrittr to make it easy to express common web scraping tasks, and is inspired by libraries like Beautiful Soup and RoboBrowser. If you’re scraping multiple pages, I highly recommend using rvest in concert with polite. The polite package ensures that you respect robots.txt and don't hammer the site with too many requests.
    Downloads: 0 This Week
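    rvest and polite are R packages. For readers working in Python, the same robots.txt check can be sketched with the standard library's urllib.robotparser; this is an analogue, not polite's API. The robots.txt content and the user-agent string below are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it from
# https://<host>/robots.txt (RobotFileParser.set_url + read()).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

AGENT = "my-scraper"  # hypothetical user-agent string

for url in ["https://example.com/articles/1", "https://example.com/private/data"]:
    print(url, parser.can_fetch(AGENT, url))

# Seconds to wait between requests, or None if robots.txt sets no delay.
print(parser.crawl_delay(AGENT))
```

    The reported crawl delay can then be honored with time.sleep between requests.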
  • 12
    furl

    The easiest way to parse and modify URLs in Python

    ...It wraps URL components into convenient objects, so developers can work directly with schemes, usernames, passwords, hosts, ports, paths, queries, and fragments. The library supports simple path editing, query argument changes, fragment manipulation, inline method chaining, and URL joining. It also handles encoding automatically, including percent-encoding, Unicode domains, Unicode paths, and query strings. furl supports Python 3 and PyPy3 and is designed to be well tested and practical for everyday backend, scripting, and data-processing workflows. ...
    Downloads: 0 This Week
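    furl wraps these operations in its own object API, which is not reproduced here. To show what changing query arguments involves under the hood, here is a standard-library-only Python sketch; set_query_params is a hypothetical helper, and it collapses repeated query keys to their last value.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_query_params(url: str, **params: str) -> str:
    """Return url with the given query parameters added or replaced.

    Standard-library sketch only; furl exposes its own, richer object API.
    Repeated keys in the original query collapse to their last value.
    """
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(params)
    # urlencode percent-encodes keys and values (spaces become '+').
    return urlunsplit(parts._replace(query=urlencode(query)))


print(set_query_params("https://example.com/search?q=xml&page=1", page="2", lang="en"))
# https://example.com/search?q=xml&page=2&lang=en
```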
  • 13
    Grab Framework Project

    Web Scraping Framework

    Grab is a Python framework for building web scrapers. With Grab you can build web scrapers of varying complexity, from simple 5-line scripts to complex asynchronous crawlers that process millions of web pages. Grab provides an API for performing network requests and for handling the received content, e.g., interacting with the DOM tree of an HTML document. Its single request/response API lets you build a network request, perform it, and work with the received content.
    Downloads: 0 This Week
  • 14
    kimuraframework

    AI-first Ruby framework for building fast, flexible web scraping spiders

    ...Kimurai can use AI-assisted extraction to identify where data resides in HTML pages, automatically generating selectors that are cached for future use so subsequent scraping runs operate with pure Ruby performance. Kimurai supports scraping both static and JavaScript-rendered websites by working with multiple engines, including headless browsers and simple HTTP-based approaches. Developers can also interact with pages using browser automation features such as form filling, clicking elements, or navigating through dynamic content. It includes tools for scheduling, parallel scraping, and structured data output, making it suitable for building reliable large-scale crawlers.
    Downloads: 0 This Week
  • 15
    tumblr-crawler

    Python crawler to download photos and videos from Tumblr blogs

    ...Once executed, the script fetches media from the Tumblr API and stores the downloaded files in folders named after each blog. tumblr-crawler avoids re-downloading files that have already been saved, making repeated runs safe and useful for recovering missing media. It also supports optional proxy configuration, which can help when access to Tumblr content requires routing requests through a proxy server. With simple dependencies and straightforward configuration, the project offers a practical way to archive media content from Tumblr blogs.
    Downloads: 0 This Week
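    The "avoid re-downloading" behavior can be sketched as a filter that skips URLs whose target file already exists. This is a hypothetical helper, not tumblr-crawler's code; it assumes the local filename is the last path segment of the media URL, and the URLs are placeholders.

```python
import tempfile
from pathlib import Path


def pending_downloads(urls, target_dir):
    """Return the URLs whose files are not yet saved in target_dir.

    Hypothetical helper: the local filename is assumed to be the last
    path segment of the URL (query strings are ignored).
    """
    target = Path(target_dir)
    pending = []
    for url in urls:
        name = url.split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1]
        if not (target / name).exists():
            pending.append(url)
    return pending


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a file saved by an earlier run.
    (Path(tmp) / "photo1.jpg").write_bytes(b"already saved")
    todo = pending_downloads(
        ["https://media.example/photo1.jpg", "https://media.example/photo2.jpg?size=1280"],
        tmp,
    )
print(todo)  # ['https://media.example/photo2.jpg?size=1280']
```

    Because the check depends only on what is already on disk, rerunning the script after an interruption resumes where it stopped.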
  • 16
    QueryList

    Progressive PHP web crawler framework with jQuery-like DOM parsing

    QueryList is an extensible PHP web scraping and crawling framework designed to extract and process data from web pages. It provides a simple and expressive API that allows developers to collect structured information from HTML documents using familiar DOM traversal techniques. It is built on top of phpQuery and uses CSS3 selectors similar to those found in jQuery, making it easy for developers to query and manipulate page elements during scraping tasks. QueryList supports common data extraction scenarios such as retrieving lists of titles, links, images, and other page elements from structured or semi-structured content. ...
    Downloads: 0 This Week
  • 17
    Roach

    The complete web scraping toolkit for PHP

    ...It is a shameless clone heavily inspired by the popular Scrapy package for Python. Roach allows us to define spiders that crawl and scrape web documents. But wait, there’s more. Roach isn’t just a simple crawler, but includes an entire pipeline to clean, persist and otherwise process extracted data as well. It’s your all-in-one resource for web scraping in PHP. Roach doesn’t depend on a specific framework. Instead, you can use the core package on its own or install one of the framework-specific adapters. Currently, there’s a first-party adapter available to use Roach in your Laravel projects with more coming. ...
    Downloads: 0 This Week
  • 18
    WebHarvest - web data extraction tool
    Web data extraction (web data mining, web scraping) tool. It leverages well-proven XML and text-processing technologies to easily extract useful data from arbitrary web pages.
    Downloads: 3 This Week
  • 19
    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use it as a bulk downloader for image galleries, wallpapers, audio/music, video, documents, and other media from supported websites. It can also download sequential website URLs that follow a pattern (e.g., image01.png to image100.png), and its built-in site crawler supports advanced link search and extraction. There is also special support for forum media downloading, offline archiving of forum threads, RSS feed downloading, and open-directory downloading. It's a programmable downloader and also...
    Downloads: 262 This Week
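    The sequential-URL feature (e.g., image01.png to image100.png) boils down to expanding a numbered template with zero padding. A minimal Python sketch; sequential_urls and its "{}" placeholder convention are assumptions for illustration, not WFDownloader's own pattern syntax.

```python
def sequential_urls(template: str, start: int, stop: int, width: int) -> list[str]:
    """Expand a numbered URL pattern, zero-padding the number to `width` digits.

    `template` must contain one "{}" placeholder (hypothetical convention).
    Both `start` and `stop` are inclusive.
    """
    return [template.format(str(n).zfill(width)) for n in range(start, stop + 1)]


urls = sequential_urls("https://example.com/gallery/image{}.png", 1, 100, 2)
print(urls[0])    # https://example.com/gallery/image01.png
print(urls[-1])   # https://example.com/gallery/image100.png
print(len(urls))  # 100
```

    zfill only pads up to the given width, so 1 becomes "01" while 100 stays "100".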
  • 20
    PHPScraper

    A universal web-util for PHP

    ...You can find prepared extractors for various HTML tags, including interesting attributes. You can filter and combine these to suit your needs. In some cases there is an option to get a simple or a detailed version. PHPScraper can assist in collecting feeds such as RSS feeds, sitemap.xml entries, and static search indexes. This can be useful when deciding which page to crawl next or when building up a list of pages on a website.
    Downloads: 0 This Week
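    Collecting sitemap.xml entries amounts to reading the <loc> elements in the sitemaps.org namespace. PHPScraper's own API is not shown here; this is a Python standard-library sketch, and the sample sitemap is invented. It handles only <urlset> files, not <sitemapindex> files that point to other sitemaps.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Invented sample document for illustration.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/</loc><lastmod>2024-01-01</lastmod></url>
</urlset>"""


def sitemap_urls(xml_text: str) -> list[str]:
    """Return every <loc> value from a sitemap.xml <urlset> document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


print(sitemap_urls(SITEMAP_XML))
# ['https://example.com/', 'https://example.com/docs/']
```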
  • 21
    autocrawler

    Multiprocess Selenium crawler for downloading images by keywords

    ...AutoCrawler supports multiprocess and multithreaded downloading, which allows it to retrieve images faster by running several tasks simultaneously. Users provide search terms through a simple keyword file, and the crawler organizes downloaded images into directories for each keyword. It can download either thumbnails or full resolution images and supports multiple image formats such as JPG, GIF, and PNG. It also includes configuration options such as headless mode, download limits, proxy usage, and thread count to customize crawling behavior.
    Downloads: 0 This Week
  • 22
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    ...Several scripts also incorporate multi-threading and proxy usage to improve scraping efficiency and help avoid common anti-scraping limitations. In addition to raw data collection, some spiders include basic data processing and analysis using tools such as pandas and simple visualization with matplotlib. It also contains examples of proxy pool integration and encapsulation to support more reliable crawling when working with sites that enforce request limits.
    Downloads: 0 This Week
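    A proxy pool of the kind mentioned can be sketched as round-robin rotation that skips proxies marked as failing. This is a hypothetical Python illustration, not code from the repository; ProxyPool is an invented name and the proxy addresses are placeholders.

```python
import itertools


class ProxyPool:
    """Hypothetical round-robin proxy pool that skips proxies marked bad."""

    def __init__(self, proxies):
        self._proxies = list(proxies)
        self._bad = set()
        self._cycle = itertools.cycle(self._proxies)

    def mark_bad(self, proxy):
        # Call this when a request through `proxy` fails or gets blocked.
        self._bad.add(proxy)

    def next_proxy(self):
        # Walk at most one full cycle looking for a proxy not marked bad.
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            if proxy not in self._bad:
                return proxy
        raise RuntimeError("no usable proxies left")


pool = ProxyPool(["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"])
pool.mark_bad("http://10.0.0.2:8080")
picks = [pool.next_proxy() for _ in range(4)]
print(picks)
# ['http://10.0.0.1:8080', 'http://10.0.0.3:8080', 'http://10.0.0.1:8080', 'http://10.0.0.3:8080']
```

    The returned address could, for instance, be passed to urllib.request.ProxyHandler when building an opener.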
  • 23
    Web Link Collector 1000

    Automatically collect all links from websites to a clean txt file

    About
    Easily and automatically collect all your links into a neat txt list from a particular website or an entire section of a multi-page website network! Web Link Collector 1000 is a simple tool for gathering links from websites with minimal effort. It helps you collect resources for research, create reference lists, or save useful links without manual copying and pasting.
    Features
    - Two Collection Modes: Single page or multiple pages of a specific website section, or even the entire domain!
    - Smart Filtering: Include only same-domain links or gather external links too
    - Duplicate Prevention: Automatically removes duplicate links
    - Website-Friendly: Uses respectful delays between requests
    - Custom File Naming: Save your collections with custom, meaningful names
    - Modern Interface: Clean design with status updates
    - Link Normalization: Standardizes URLs for proper formatting
    Downloads: 1 This Week
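    Link normalization, same-domain filtering, and duplicate prevention can be sketched with Python's standard library. This is an illustrative assumption of how such a tool might work, not Web Link Collector 1000's source; collect_links and LinkParser are hypothetical names, and page fetching and the delays between requests are omitted.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkParser(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect_links(html: str, page_url: str, same_domain_only: bool = True) -> list[str]:
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    base_host = urlsplit(page_url).netloc.lower()
    seen, links = set(), []
    for href in parser.hrefs:
        # Normalize: resolve relative links and drop #fragments.
        absolute, _ = urldefrag(urljoin(page_url, href.strip()))
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if same_domain_only and parts.netloc.lower() != base_host:
            continue
        if absolute not in seen:  # duplicate prevention, order preserved
            seen.add(absolute)
            links.append(absolute)
    return links


PAGE = """
<a href="/docs">Docs</a>
<a href="/docs#install">Install</a>
<a href="https://other.example/">Elsewhere</a>
<a href="mailto:me@example.com">Mail</a>
"""
print(collect_links(PAGE, "https://example.com/index.html"))
# ['https://example.com/docs']
```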
  • 24
    ConsoleWebScraper

    It allows you to input a URL and it will scrape the HTML content...

    ConsoleWebScraper is a simple console application that allows you to scrape web pages and save the results.
    Usage:
    1. Open a command prompt as administrator.
    2. Navigate to the directory containing the utility.
    3. Run the utility from the command line, for example: .\ConsoleWebScraper.exe
    When you start the application, you'll see a title and a menu guide.
    Downloads: 0 This Week
  • 25
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    Dude is a very simple framework for writing web scrapers using Python decorators. Its Flask-inspired design aims to make it easy to build a web scraper in just a few lines of code, with an easy-to-learn syntax. Dude is currently in pre-alpha, so expect breaking changes. You can run your scraper from the terminal by passing URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
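    Decorator-based registration, the Flask-style pattern described for dude, can be sketched in a few lines: a decorator records each function alongside a selector, and a runner applies them all. This is not dude's actual API; select, run, and RULES are hypothetical names, and a regular expression stands in for a real HTML selector engine purely to keep the sketch dependency-free.

```python
import re
from typing import Callable

Extractor = Callable[[str], dict]
RULES: list[tuple[re.Pattern, Extractor]] = []


def select(pattern: str):
    """Hypothetical decorator: register a function to run on every regex match."""
    compiled = re.compile(pattern)

    def register(func: Extractor) -> Extractor:
        RULES.append((compiled, func))
        return func  # leave the decorated function itself unchanged

    return register


@select(r"<h1>(.*?)</h1>")
def title(text: str) -> dict:
    return {"title": text}


def run(html: str) -> list[dict]:
    # Apply every registered rule to every match, in registration order.
    results = []
    for pattern, func in RULES:
        for match in pattern.finditer(html):
            results.append(func(match.group(1)))
    return results


print(run("<h1>Hello</h1><h1>World</h1>"))
# [{'title': 'Hello'}, {'title': 'World'}]
```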