Open Source Python Internet Software - Page 5

Python Internet Software

Browse free open source Python Internet Software and projects below.

1

Linkedin Scraper

A library that scrapes Linkedin for user data

Linkedin Scraper is a library that scrapes LinkedIn for user data. Versions 2.0.0 and earlier are called linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper. LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. Setting scrape=False works around this: the profile is not scraped automatically, but Chrome still opens the LinkedIn page. You can log in and log out, and the cookie will stay in the browser without affecting your profile views. When you then run person.scrape(), it scrapes the profile and closes the browser. A driver using Chrome is created by default; if a driver is passed in, that one is used instead.

Downloads: 1 This Week

Last Update: 2026-04-10
See Project
2

RPA for Python

Python package for doing RPA

Python package for doing RPA. RPA for Python's simple and powerful API makes robotic process automation fun! You can use it to quickly automate away repetitive, time-consuming tasks on websites, desktop applications, or the command line. See the sample Python script, the RPA Challenge solution, and the RedMart groceries example. To send a Telegram app notification, simply look up @rpapybot to allow receiving messages. To automate the Chrome browser invisibly, use headless mode. To run 10X faster instead of normal human speed, use turbo mode (read the caveats!). Some CAPTCHAs can be solved using services like 2Captcha or directly by replicating user actions. TagUI is a leading open-source RPA software with tens of thousands of users. It was created in 2016-2017 when I left DBS Bank as a test automation engineer, for a one-year sabbatical to Eastern Europe. Most of its code base was written in Novi Sad, Serbia. In 2018, I joined AI Singapore to continue development of TagUI.

Downloads: 1 This Week

Last Update: 2023-07-07
See Project
3

Sanic

Async Python 3.6+ web server/framework

Build fast, run fast with Sanic! Sanic is a Python 3.6+ web server and web framework designed to go fast. It provides a way to get a highly performant HTTP server up and running quickly, while also making it easy to build, expand, and eventually scale. Sanic aspires to be as simple as possible while delivering the performance that you require. It supports the async/await syntax added in Python 3.5, which lets you write non-blocking, speedy request handlers. It is also ASGI compliant, so it can be deployed with an alternative ASGI web server.
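
To illustrate why async/await matters for a web server, here is a minimal sketch using only the standard library's asyncio. It does not use Sanic's API; the handle and serve_all names are hypothetical, and asyncio.sleep stands in for real network or database I/O.

```python
import asyncio
import time


async def handle(request_id, delay):
    # Hypothetical handler: awaiting yields control to the event loop,
    # so other requests can make progress while this one waits on "I/O".
    await asyncio.sleep(delay)
    return f"response {request_id}"


async def serve_all():
    # Three simulated requests are handled concurrently on one thread.
    return await asyncio.gather(*(handle(i, 0.1) for i in range(3)))


def run_demo():
    start = time.perf_counter()
    responses = asyncio.run(serve_all())
    elapsed = time.perf_counter() - start
    return responses, elapsed


if __name__ == "__main__":
    responses, elapsed = run_demo()
    print(responses, f"{elapsed:.2f}s")
```

Because the three handlers wait concurrently, the total time is close to a single delay rather than the sum of all three. A handler that called a blocking function such as time.sleep instead would stall every other request.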

Downloads: 1 This Week

Last Update: 2025-12-31
See Project
4

Scrapling

An adaptive Web Scraping framework

Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling, pause and resume functionality, and real-time streaming of scraped data. Scrapling combines high performance, memory efficiency, and extensive async support to deliver blazing-fast scraping workflows. With a developer-friendly API, CLI tools, MCP server integration for AI-assisted extraction, and Docker support, it offers a complete solution for modern web scrapers.

Downloads: 1 This Week

Last Update: 2026-05-11
See Project
5

Selectolax

Python binding to Modest and Lexbor engines

A fast HTML5 parser with CSS selectors using Modest and Lexbor engines. Selectolax supports two backends: Modest and Lexbor. By default, all examples use the Modest backend. Most features are nearly identical across the two backends, but there are still some differences. Currently, the Lexbor backend is in beta and is missing some features. To use Lexbor, just import its parser and use it in a similar way to HTMLParser.

Downloads: 1 This Week

Last Update: 5 days ago
See Project
6

SeleniumBase

A framework for browser automation and testing with Selenium

SeleniumBase automatically handles common WebDriver actions such as launching web browsers before tests, saving screenshots during failures, and closing web browsers after tests. SeleniumBase lets you customize test runs from the command line. SeleniumBase uses simple syntax for commands. pytest includes automatic test discovery. If you don't specify a specific file or folder to run, pytest will automatically search through all subdirectories for tests to run. No More Flaky Tests! SeleniumBase methods automatically wait for page elements to finish loading before interacting with them (up to a timeout limit). This means you no longer need random time.sleep() statements in your scripts. SeleniumBase includes an automated/manual hybrid solution called MasterQA, which speeds up manual testing by having automation perform all the browser actions while the manual tester handles validation.

Downloads: 1 This Week

Last Update: 18 hours ago
See Project
7

SimpDL

A tool to scrape images from SimpCity

SimpDL is an open-source media downloading tool designed to retrieve content from subscription-based or creator platforms, focusing on simplicity and ease of use. It enables users to download images, videos, and other media associated with specific creators or accounts, often through authenticated sessions. The project emphasizes a straightforward workflow where users provide login credentials or tokens, and the tool handles the retrieval and storage of content automatically. It is designed to reduce the complexity of manual downloading while still offering flexibility in how content is saved and organized. SimpDL typically supports batch downloads, allowing users to archive entire profiles or content collections efficiently. The tool is often used for offline access or backup purposes, especially for platforms where content may be time-limited.

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
8

autocrawler

Multiprocess Selenium crawler for downloading images by keywords

AutoCrawler is a Python-based image crawling tool designed to automatically download large numbers of images from search engines using automated browser interaction. It uses Selenium and a Chrome browser driver to navigate image search pages and collect image sources based on keywords provided by the user. AutoCrawler supports multiprocess and multithreaded downloading, which allows it to retrieve images faster by running several tasks simultaneously. Users provide search terms through a simple keyword file, and the crawler organizes downloaded images into directories for each keyword. It can download either thumbnails or full resolution images and supports multiple image formats such as JPG, GIF, and PNG. It also includes configuration options such as headless mode, download limits, proxy usage, and thread count to customize crawling behavior.

Downloads: 1 This Week

Last Update: 1 day ago
See Project
9

changedetection.io

The best free open source website change detection and restock service

Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the browser steps configuration, add basic steps before performing change detection, such as logging into websites, adding a product to a cart, accepting cookie logins, entering dates, and refining searches. Monitor and track PDF file changes, and know when a PDF file has text changes. Know when your favourite product is on sale, or other special deals are announced before anyone else. Detect and monitor changes in JSON API responses.
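
The core idea behind change detection can be sketched with content fingerprints. This is a minimal illustration using the standard library, not changedetection.io's actual implementation; the function names and the whitespace and key-order normalization rules are assumptions made for the example.

```python
import hashlib
import json


def fingerprint(text):
    # Collapse whitespace so purely cosmetic reformatting is not a "change".
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, current_text):
    # Compare the stored fingerprint with the newly fetched content.
    return previous_fingerprint != fingerprint(current_text)


def json_fingerprint(payload):
    # Re-serialize with sorted keys so key order alone never triggers an alert.
    canonical = json.dumps(json.loads(payload), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    baseline = fingerprint("Price: $10\nIn stock")
    print(has_changed(baseline, "Price: $12\nIn stock"))
```

A monitor built this way would store the fingerprint after each check and send a notification only when has_changed reports a difference.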

Downloads: 1 This Week

Last Update: 6 days ago
See Project
10

dxy-covid-19-crawler

Realtime crawler for COVID-19 outbreak statistics from DXY data

DXY-COVID-19-Crawler is a Python-based project designed to collect real-time COVID-19 infection data from the public dataset provided by Ding Xiang Yuan (DXY). The crawler periodically retrieves pandemic statistics and stores them in a database so that historical changes in the outbreak can be preserved and analyzed later. It was created to make up-to-date infection data more accessible for developers, researchers, and analysts who wanted to build visualizations or conduct data analysis during the early stages of the pandemic. DXY-COVID-19-Crawler automatically crawls data at regular intervals, typically every minute, ensuring that newly published statistics are captured as quickly as possible. Retrieved data is stored in MongoDB and archived so that the entire progression of the outbreak can be traced over time. It also provided an API that allowed developers to easily access the collected data for building dashboards, visualizations, and other analytical tools.

Downloads: 1 This Week

Last Update: 1 day ago
See Project
11

finvizfinance

Finviz analysis python library

finvizfinance is a package that collects financial information from the FinViz website: stock charts, fundamental and technical information, insider information, and stock news, as well as forex and crypto charts and performance. Screener and Group provide data frames for comparing stocks according to different filters and trading signals. For an individual stock, it can retrieve fundamentals, a description, outer ratings, stock news, and insider trades.

Downloads: 1 This Week

Last Update: 2026-01-03
See Project
12

googler

Google from the terminal

googler is a power tool to Google (web, news, videos and site search) from the command line. It shows the title, URL and abstract for each result, which can be directly opened in a browser from the terminal. Results are fetched in pages (with page navigation). Supports sequential searches in a single googler instance. googler was initially written to cater to headless servers without X. You can integrate it with a text-based browser. However, it has grown into a very handy and flexible utility that delivers much more. For example, fetch any number of results or start anywhere, limit the search by any duration, define aliases to google search any number of websites, and switch domains easily, all of this in a very clean interface without ads or stray URLs. The shell completion scripts make sure you don't need to remember any options.

Downloads: 1 This Week

Last Update: 2022-05-12
See Project
13

grab-site

Web crawler for archiving and backing up sites into WARC archives

grab-site is an open source web crawling tool designed to archive and back up websites by recursively downloading their content. It works by taking a starting URL and systematically following links across the site, capturing pages and resources and saving them into WARC archive files for long-term preservation. Internally, the crawler uses a fork of the wpull engine to fetch and process web pages efficiently during large-scale crawls. grab-site includes a built-in dashboard that displays real-time crawl activity, including which URLs are currently being processed and how many remain in the queue. Users can dynamically apply ignore patterns during an active crawl, allowing them to skip problematic or unnecessary URLs that could slow down or block the archiving process. grab-site also provides predefined ignore sets for common site structures such as forums and other complex web platforms. Additional mechanisms like duplicate page detection help avoid re-crawling identical content.
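
The ignore-pattern and duplicate-detection ideas described above can be sketched as follows. This is a simplified illustration, not grab-site's or wpull's code; the CrawlQueue class and its method names are hypothetical, and duplicates are detected here by hashing the full response body, which is an assumption.

```python
import hashlib
import re
from collections import deque


class CrawlQueue:
    """Hypothetical URL queue with live ignore patterns and duplicate detection."""

    def __init__(self, start_url):
        self.pending = deque([start_url])
        self.ignore_patterns = []
        self.seen_bodies = set()

    def add_ignore(self, pattern):
        # May be called while a crawl is running: the new pattern also
        # prunes URLs that are already waiting in the queue.
        compiled = re.compile(pattern)
        self.ignore_patterns.append(compiled)
        self.pending = deque(u for u in self.pending if not compiled.search(u))

    def enqueue(self, url):
        # Refuse any URL that matches an active ignore pattern.
        if any(p.search(url) for p in self.ignore_patterns):
            return False
        self.pending.append(url)
        return True

    def is_duplicate(self, body):
        # Identical page bodies hash to the same digest.
        digest = hashlib.sha256(body).hexdigest()
        if digest in self.seen_bodies:
            return True
        self.seen_bodies.add(digest)
        return False


if __name__ == "__main__":
    queue = CrawlQueue("https://example.com/")
    queue.enqueue("https://example.com/calendar?day=1")
    queue.add_ignore(r"/calendar\?")
    print(list(queue.pending))
```

Pruning the pending queue when a pattern is added is what lets an operator rescue a crawl that has wandered into an endless calendar or session-ID trap.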

Downloads: 1 This Week

Last Update: 6 days ago
See Project
14

jd-autobuy

Python tool that automates JD.com login and product purchase tasks

jd-autobuy is an open source Python-based automation tool designed to simulate the purchasing process on the JD e-commerce platform. It uses web scraping and HTTP request techniques to log into an account, check product availability, and attempt to purchase specified items automatically. It supports login through methods such as QR code authentication, allowing users to sign in through the platform’s mobile application. Once authenticated, the script can retrieve product details including price, stock status, and item information. It can automatically add items to the shopping cart and prepare an order submission workflow for faster purchasing during high-demand sales or limited stock releases. Users can configure parameters such as the product ID, quantity, refresh interval, and purchase behavior using command-line options. jd-autobuy is intended primarily for learning purposes and demonstrates how automated scripts can interact with web services and online shopping systems.

Downloads: 1 This Week

Last Update: 1 day ago
See Project
15

node-gyp

Node.js native addon build tool

node-gyp is a cross-platform command-line tool written in Node.js for compiling native addon modules for Node.js. It contains a vendored copy of the gyp-next project that was previously used by the Chromium team, extended to support the development of Node.js native addons. Note that node-gyp is not used to build Node.js itself. Multiple target versions of Node.js are supported (i.e. 0.8, ..., 4, 5, 6, etc.), regardless of what version of Node.js is actually installed on your system (node-gyp downloads the necessary development files or headers for the target version). node-gyp requires that you have installed a compatible version of Python, one of: v3.6, v3.7, v3.8, or v3.9. If you have multiple Python versions installed, you can identify which Python version node-gyp should use. A binding.gyp file describes the configuration to build your module, in a JSON-like format. This file gets placed in the root of your package, alongside package.json.
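
As a rough illustration of the JSON-like binding.gyp format, the sketch below generates a minimal file body from Python. The targets, target_name, and sources keys are standard gyp fields; the addon name and source path are placeholders, and a real addon will usually need more settings than this.

```python
import json


def minimal_binding_gyp(target_name, sources):
    # One build target with a name and a list of source files.
    config = {
        "targets": [
            {
                "target_name": target_name,
                "sources": list(sources),
            }
        ]
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # "my_addon" and "src/addon.cc" are hypothetical names for illustration.
    print(minimal_binding_gyp("my_addon", ["src/addon.cc"]))
```

The printed text would be saved as binding.gyp in the package root, next to package.json.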

Downloads: 1 This Week

Last Update: 2026-04-21
See Project
16

proxy.py

Utilize all available CPU cores for accepting new client connections

proxy.py is made with performance in mind. By default, proxy.py will try to utilize all CPU cores available to it for accepting new client connections. This is achieved by starting AcceptorPool, which listens on the configured server port. AcceptorPool then starts Acceptor processes (--num-acceptors) to accept incoming client connections. Alongside, if --threadless is enabled, ThreadlessPool is set up, which starts Threadless processes (--num-workers) to handle the incoming client connections. Each Acceptor process delegates the accepted client connection to a Threadless process via the Work class. Currently, HttpProtocolHandler is the default work class. HttpProtocolHandler simply assumes that incoming clients will follow the HTTP specification. Specific HTTP proxy and HTTP server implementations are written as plugins of HttpProtocolHandler.

Downloads: 1 This Week

Last Update: 2025-02-18
See Project
17

pyinfra

pyinfra turns Python code into shell commands

pyinfra is a high-performance infrastructure automation and configuration management framework that uses Python instead of YAML to define deployments and operational workflows. The system converts Python code into shell commands and executes them across servers, Docker containers, and local machines through an agentless architecture. Designed as an alternative to tools like Ansible, pyinfra prioritizes speed, scalability, and developer flexibility while maintaining a declarative operational model. It supports ad-hoc command execution, reusable operations, inventory management, and parallel deployments across thousands of hosts. The framework integrates naturally with existing DevOps ecosystems and allows users to create highly customizable deployment logic using native Python syntax. Its architecture combines infrastructure-as-code concepts with efficient remote execution, making it suitable for modern cloud and server automation workflows.

Downloads: 1 This Week

Last Update: 2026-05-06
See Project
18

python-proxy

HTTP/HTTP2/HTTP3/Socks4/Socks5/Shadowsocks/ShadowsocksR/SSH

python-proxy, also known as pproxy, is a lightweight proxy tool written in Python for flexible local and remote traffic forwarding. It supports multiple proxy protocols, making it useful for developers, testers, and network administrators who need a compact proxy layer without a heavy service stack. The project can operate as a client, server, forward proxy, reverse proxy, or protocol bridge depending on how it is configured. It supports HTTP, SOCKS4, SOCKS5, Shadowsocks, and newer transport options such as HTTP/2, HTTP/3, and QUIC when the required dependencies are installed. python-proxy also includes SSL-related options, chaining behavior, and multiple connection modes for advanced routing setups. Its main value is giving users a scriptable, portable, and protocol-diverse proxy utility in a Python ecosystem.

Downloads: 1 This Week

Last Update: 2026-05-23
See Project
19

tumblr-crawler

Python crawler to download photos and videos from Tumblr blogs

tumblr-crawler is an open source Python-based utility designed to download media content from Tumblr blogs. It provides a script that automatically retrieves photos and videos from specified Tumblr sites and saves them locally for offline access. Users can specify one or multiple blogs to crawl by editing a configuration file or by passing parameters through the command line. Once executed, the script fetches media from the Tumblr API and stores the downloaded files in folders named after each blog. tumblr-crawler avoids re-downloading files that have already been saved, making repeated runs safe and useful for recovering missing media. It also supports optional proxy configuration, which can help when access to Tumblr content requires routing requests through a proxy server. With simple dependencies and straightforward configuration, the project offers a practical way to archive media content from Tumblr blogs.
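
The skip-if-already-saved behavior can be sketched as below. This is an illustration of the general technique rather than tumblr-crawler's own code; save_if_missing and the fake fetch callback are hypothetical names, and the existence check is by file path only.

```python
import os
import tempfile


def save_if_missing(folder, filename, fetch):
    """Save fetch() to folder/filename unless that file already exists.

    Returns True when a download happened, False when it was skipped.
    """
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, filename)
    if os.path.exists(target):
        return False  # already saved on an earlier run
    data = fetch()
    with open(target, "wb") as handle:
        handle.write(data)
    return True


def demo():
    calls = []

    def fake_fetch():
        # Stands in for an HTTP download; records how often it is called.
        calls.append(1)
        return b"image-bytes"

    with tempfile.TemporaryDirectory() as root:
        blog_folder = os.path.join(root, "example-blog")
        first = save_if_missing(blog_folder, "photo.jpg", fake_fetch)
        second = save_if_missing(blog_folder, "photo.jpg", fake_fetch)
    return first, second, len(calls)


if __name__ == "__main__":
    print(demo())
```

Checking for the file before fetching is what makes repeated runs cheap: only media that is missing from the blog's folder triggers a new request.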

Downloads: 1 This Week

Last Update: 1 day ago
See Project
20

watercrawl

AI-ready web crawler that extracts and structures website content

WaterCrawl is an open source web crawling and data extraction platform designed to transform website content into structured data suitable for machine learning and AI workflows. It enables developers and researchers to crawl web pages, extract meaningful information, and convert it into formats that are easier to process and analyze. It provides a modern crawling system that can automatically navigate links, control crawl depth, and collect content from targeted sections of a website. WaterCrawl supports customizable extraction rules so users can focus only on relevant elements while ignoring unnecessary page components. WaterCrawl also offers real-time monitoring capabilities, allowing users to track crawling progress, performance metrics, and errors during large data collection jobs. Developers can integrate the tool into applications through a REST API and multiple client SDKs, enabling automated data pipelines and AI data preparation workflows.
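
Crawl-depth control, as mentioned above, is commonly implemented as a breadth-first traversal that stops following links past a depth limit. The sketch below shows that general technique on an in-memory link graph; it is not WaterCrawl's API, and the crawl function, get_links callback, and SITE data are hypothetical.

```python
from collections import deque


def crawl(start, get_links, max_depth):
    """Breadth-first crawl that follows links at most max_depth hops from start."""
    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                order.append(link)
                queue.append((link, depth + 1))
    return order


# Hypothetical site structure standing in for real fetched pages.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/"],
    "/blog": ["/blog/post-1"],
    "/docs/api": [],
    "/blog/post-1": [],
}


if __name__ == "__main__":
    print(crawl("/", lambda url: SITE.get(url, []), max_depth=1))
```

The visited set prevents loops such as /docs linking back to /, and the depth carried with each queued URL keeps the crawl within the requested number of hops.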

Downloads: 1 This Week

Last Update: 2026-05-20
See Project
21

FSP - File Service Protocol Suite

UDP File transfer protocol

FSP - File Service Protocol. FSP is a lightweight UDP-based protocol for transferring files. It is designed for anonymous transfers over unreliable networks.

2 Reviews

Downloads: 9 This Week

Last Update: 2026-04-28
See Project
22

Gnuplot.py

A Python interface to the gnuplot plotting program.

8 Reviews

Downloads: 5 This Week

Last Update: 2012-12-06
See Project
23

DenyHosts

DenyHosts is a Python program that automatically blocks SSH attacks by adding entries to /etc/hosts.deny. DenyHosts will also inform Linux administrators about offending hosts, attacked users, and suspicious logins. This project is being actively developed on GitHub (https://github.com/denyhosts).
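
The general approach of turning failed SSH logins into hosts.deny entries can be sketched as follows. This is not DenyHosts' own code: the regular expression assumes the common OpenSSH "Failed password for ... from ..." log message, the threshold is arbitrary, and the sample log lines use documentation-reserved IP addresses.

```python
import re
from collections import Counter

# Assumed log format: the usual OpenSSH failed-password message (IPv4 only).
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?(\S+) from (\d{1,3}(?:\.\d{1,3}){3})"
)


def offending_hosts(log_lines, threshold):
    # Count failed logins per source address and keep the repeat offenders.
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(2)] += 1
    return sorted(ip for ip, count in failures.items() if count >= threshold)


def deny_entries(hosts):
    # hosts.deny lines take the form "daemon: client".
    return [f"sshd: {ip}" for ip in hosts]


SAMPLE_LOG = [
    "Jan 10 12:00:01 host sshd[101]: Failed password for root from 203.0.113.5 port 4242 ssh2",
    "Jan 10 12:00:03 host sshd[101]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2",
    "Jan 10 12:00:09 host sshd[102]: Accepted password for alice from 198.51.100.7 port 5000 ssh2",
    "Jan 10 12:01:00 host sshd[103]: Failed password for bob from 192.0.2.9 port 6000 ssh2",
]


if __name__ == "__main__":
    print(deny_entries(offending_hosts(SAMPLE_LOG, threshold=2)))
```

The sketch only produces the lines; appending them to /etc/hosts.deny requires root privileges and should be done with care to avoid locking out legitimate users.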

17 Reviews

Downloads: 4 This Week

Last Update: 2020-05-19
See Project
24

Cub Linux

Chromium + Ubuntu = Cub Linux

The best of Chromium and Ubuntu. Cub Linux is a project to replicate the Chromium OS experience on an Ubuntu Linux base system. Cub Linux is free to download and use forever.

Downloads: 18 This Week

Last Update: 2016-05-10
See Project
25

Strict DLP Chinese

Strict DLP Chinese (SDC) is a set of strict DLP (Dynamic Leech Protection) DLLs based on the eMule Xtreme Mod's official version. SDC variants put easyMule v2, easyMule v1 and/or eMule VeryCD Mod into "Soft Ban" or "Hard Ban" list because of GPL violation, private network, community leeching, and other behaviors.

3 Reviews

Downloads: 5 This Week

Last Update: 2026-03-30
See Project