Showing 24 open source projects for "scripts"

  • 1
    crawler

    Collection of JS reverse engineering examples for web scraping study

    ...Each directory in the project focuses on a specific target service or scenario, showing how browser network requests and JavaScript code can be studied to reproduce API calls programmatically. Many examples illustrate techniques such as debugging scripts, intercepting requests, analyzing encrypted parameters, and understanding authentication flows. crawler also explores common anti-scraping defenses and demonstrates how developers can examine them through debugging tools and reverse engineering techniques.
    Downloads: 0 This Week
  • 2
    wombat

    Lightweight Ruby DSL for scraping structured data from web pages

    ...The DSL approach helps make scraping definitions more readable and maintainable, especially when dealing with multiple fields or nested data structures. Because it is implemented as a Ruby library, it integrates easily into Ruby applications and scripts that need to gather information from web pages. Wombat also includes examples and tests that demonstrate how scraping definitions can be written and executed within Ruby environments.
    Downloads: 0 This Week
  • 3
    Grab Framework Project

    Web Scraping Framework

    Grab is a Python framework for building web scrapers. With Grab you can build web scrapers of various complexity, from simple 5-line scripts to complex asynchronous website crawlers processing millions of web pages. Grab provides an API for performing network requests and for handling the received content, e.g. interacting with the DOM tree of the HTML document. The single request/response API allows you to build a network request, perform it, and work with the received content.
    Downloads: 1 This Week
  • 4
    single-file-cli

    CLI tool to save complete web pages as single self-contained HTML file

    SingleFile CLI is an open source command-line tool designed to save complete web pages as a single self-contained HTML file. It captures the rendered page in a headless browser and embeds all required resources directly into the output document, including stylesheets, scripts, images, and fonts. By consolidating every dependency into one file, it allows users to preserve a faithful copy of a web page that can be viewed offline without requiring external assets. SingleFile CLI works by controlling a browser through the Chrome DevTools Protocol, rendering the page before extracting and packaging all necessary resources. ...
    Downloads: 2 This Week
  • 5
    Pydoll

    Async Python library for automating Chromium browsers without WebDriver

    Pydoll is a Python library designed for automating Chromium-based web browsers such as Chrome and Edge without relying on a traditional WebDriver layer. Instead of using external drivers, it connects directly to the Chrome DevTools Protocol through WebSocket, allowing scripts to control browser behavior more efficiently and with fewer compatibility issues. It provides a high-level API that simplifies common browser automation tasks while still offering access to low-level protocol features for advanced control. Its architecture is built around asynchronous programming using Python’s asyncio framework, enabling concurrent automation of multiple tabs and browser contexts. ...
    Downloads: 0 This Week
  • 6
    spider_collection

    Collection of Python web scraping scripts for data extraction tasks

    spider_collection is a collection of Python web crawler scripts created primarily for experimentation, learning, and practical scraping tasks. spider_collection gathers multiple independent spiders designed to collect data from different platforms and services, demonstrating a variety of scraping techniques and workflows. These crawlers make use of common Python scraping tools such as requests, parsel, BeautifulSoup, and the Scrapy framework to extract structured information from web pages. ...
    Downloads: 2 This Week
  • 7
    dude uncomplicated data extraction

    dude uncomplicated data extraction: A simple framework

    ...Please expect breaking changes. You can run your scraper from the terminal by supplying URLs, an output filename of your choice, and the paths to your Python scripts to the dude scrape command.
    Downloads: 0 This Week
  • 8

    webotron

    Using industrial automation techniques for creating web scraping tools

    Industrial machines can easily maim or kill their operators and are often used in very adverse environments. In spite of this, production quality must be close to perfect without reliance on operator skill or attentiveness. Control programs must be robust, yet simple enough to be understood and maintained by non-programmer skilled trades such as electricians. The main programming model is the PLC, which implements double buffering and an event loop. The most advanced production model...
    Downloads: 0 This Week
  • 9
    DecryptLogin

    Python library providing APIs for automated website login workflows

    ...DecryptLogin supports a wide variety of online services and platforms, including social media sites, developer platforms, cloud services, and other web portals. Developers can integrate these login routines into automation scripts, crawlers, or data collection tools that require authenticated sessions. It also provides example utilities and automation scripts demonstrating how the login APIs can be used in practical scenarios.
    Downloads: 2 This Week
  • 10
    SecretAgent

    The web scraper that's nearly impossible to block

    SecretAgent is a headless browser that’s nearly impossible to detect. It achieves this by emulating real users. It also has powerful auto-replay functionality that lets you create and debug scripts in record-setting time.
    Downloads: 0 This Week
  • 11
    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    ...It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into scraping tools or automation scripts. Scylla also runs a built-in HTTP forward proxy server that can dynamically select a recently validated proxy whenever a request is made. In addition to the API, the system provides a web-based interface where users can view available proxies and monitor their global distribution through a visual dashboard. It is commonly used by developers who need scalable proxy management when gathering data from the internet or building datasets for machine learning.
    Downloads: 19 This Week
  • 12
    instagram-profilecrawl

    Instagram profile crawler that extracts posts, tags, and stats

    ...The collected data can include profile metadata, post details, engagement metrics, and commenter activity, allowing users to analyze account behavior or monitor profile growth over time. It also provides scripts for downloading images from crawled profiles and logging statistics into CSV files for tracking metrics like followers, likes, and comments. Authentication is optional, meaning the crawler can access public profile data without logging in.
    Downloads: 2 This Week
  • 13
    lxspider

    Educational Python web scraping case collection for many sites

    lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms,...
    Downloads: 0 This Week
  • 14
    GOPA

    GOPA, a spider written in Golang, for Elasticsearch

    GOPA, a spider written in Golang, for Elasticsearch. Lightweight, with a low footprint: memory requirement should be under 100MB. Easy to deploy, no runtime or dependency required. Easy to use, no programming or scripting ability needed, with out-of-the-box features. First of all, get it; there are two options: download the pre-built package or compile it yourself. Besides Elasticsearch, Gopa doesn't require any other dependencies; simply run ./gopa to start the program. It's safe to press Ctrl+C to stop the current...
    Downloads: 0 This Week
  • 15
    mzitu

    Python crawler that downloads image galleries and analyzes titles

    mzitu is a Python-based web crawling project designed to automatically download and organize image galleries from a specific photography site. It demonstrates how to build a scraper that navigates gallery pages, retrieves image links, and saves the images locally in a structured directory layout. It focuses on automating the collection of large sets of images by programmatically parsing page content and iterating through gallery entries. mzitu also includes a simple analysis script that...
    Downloads: 2 This Week
  • 16
    WeChatSogou

    Python library to crawl and retrieve data from WeChat accounts

    ...These components work together to process HTTP requests, handle verification mechanisms, and transform HTML or JSON responses into usable objects. Developers can integrate the library into scripts or larger data collection systems to automate gathering content from public accounts.
    Downloads: 0 This Week
  • 17
    jd-autobuy

    Python tool that automates JD.com login and product purchase tasks

    ...Users can configure parameters such as the product ID, quantity, refresh interval, and purchase behavior using command-line options. jd-autobuy is intended primarily for learning purposes and demonstrates how automated scripts can interact with web services and online shopping systems.
    Downloads: 2 This Week
  • 18

    newsscrape

    Collects news headlines and analyzes them to determine their category

    ...
    - Each news headline is matched against a Google News category such as Entertainment, Sports, etc.
    - Called from a scheduler to collect this data at 5-minute intervals and accumulate it in a database.
    - Contains R statistical computing scripts that learn which word patterns in a headline lead to a particular category.
    - To test its accuracy in predicting the category of a news headline, select a news title from another source (e.g. http://rss.news.yahoo.com/rss/entertainment) and feed it to the R script, which outputs the category it predicts for that title.
    Downloads: 0 This Week
  • 19

    Domain Analyzer Security Tool

    Finds all the security information for a given domain name

    Domain analyzer is a security analysis tool which automatically discovers and reports information about the given domain. Its main purpose is to analyze domains in an unattended way.
    Downloads: 0 This Week
  • 20
    webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content via the HTTP protocol and extract relevant information. webStraktor features a scripting language to facilitate the collection, extraction, and storage of information available on the web, including images. The scripting language uses elements of regular expression and XPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...
    Downloads: 0 This Week
  • 21
    A simple-to-set-up web scraper written in Java. It uses a modified regex syntax that lets you quickly write complex patterns to parse data out of a website. It contains a GUI tool for testing your configuration scripts and can be fully automated through the command line.
    Downloads: 0 This Week
  • 22
    APC Anti Crawler is a PHP 5 class based on APC that can be used to limit the number of HTTP requests per IP. It stops web crawlers from downloading your entire website.
    Downloads: 0 This Week
  • 23
    Larbin is a Web crawler intended to fetch a large number of Web pages; it should be able to fetch more than 100 million pages on a standard PC with much u/d. This set of PHP and Perl scripts, called webtools4larbin, can handle the output of Larbin and p
    Downloads: 0 This Week
  • 24
    Wadsworth is a Java-based web scripting engine. It uses user-defined XML scripts to define its actions. It can be used as a web testing tool, as a web scraper, or to automate any web actions you wish. It can also be invoked and controlled by another
    Downloads: 0 This Week