Page 2 | web crawler source code free download

Showing 107 open source projects for "web crawler source code"

1

Crawlab

Distributed web crawler admin platform for spider management

Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Use docker-compose to start everything with one command; that way, you don't even have to configure a MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. Master node and worker nodes communicate...

Downloads: 10 This Week

Last Update: 2023-07-26
See Project
2

Gerapy

Distributed Crawler Management Framework Based on Scrapy

Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Anyone who has written crawlers in Python has probably used Scrapy. Scrapy is indeed a very powerful crawler framework: it has high crawling efficiency and good scalability, and it is basically an essential tool for developing crawlers in Python. If you crawl with Scrapy, you can of course run the crawl on your own host, but when the crawl is very large, we can’t...

Downloads: 0 This Week

Last Update: 2023-07-19
See Project
3

dirhunt

Web crawler that finds hidden web directories without brute force

Dirhunt is an open source security tool designed to discover web directories and analyze website structures without relying on brute-force techniques. Instead of sending large numbers of guess-based requests, it operates as a specialized crawler that intelligently explores websites to identify accessible or hidden directories. Dirhunt can detect directories that expose “Index Of” listings, which may reveal files and other resources that were not intended to be publicly visible. ...

Downloads: 8 This Week

Last Update: 2026-03-11
See Project
4

barcraft

A simple QR code / barcode generator in Python

A simple QR code / barcode generator that you can also use through its web version: https://secret-guest.github.io/barcraft/ Interface made with PyQt5; distributed as an MSI installer made with Inno Setup.

Downloads: 4 This Week

Last Update: 2026-02-17
See Project
5

RPA for Python

Python package for doing RPA

Python package for doing RPA. RPA for Python's simple and powerful API makes robotic process automation fun! You can use it to quickly automate away repetitive time-consuming tasks on websites, desktop applications, or the command line. See the sample Python script, the RPA Challenge solution, and the RedMart groceries example. To send a Telegram app notification, simply look up @rpapybot to allow receiving messages. To automate Chrome browser invisibly, use headless mode. To run 10X faster instead...

Downloads: 0 This Week

Last Update: 2023-07-07
See Project
6

Scrapyd

A service daemon to run Scrapy spiders

Scrapyd can manage multiple projects, and each project can have multiple versions uploaded, but only the latest one will be used for launching new spiders. A common (and useful) convention for the version name is the revision number from the version control tool you’re using to track your Scrapy project code, for example r23. Versions are not compared alphabetically but with a smarter algorithm (the same one packaging uses), so r10 compares greater than r9, for example. Scrapyd is an...
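
The difference between alphabetical and number-aware ordering of version names such as r9 and r10 can be illustrated with a short Python sketch. This is not Scrapyd's own code: the `natural_key` helper is a hypothetical name, and it uses only the standard library rather than the comparison routine the description refers to.

```python
import re


def natural_key(version):
    """Split a label into text and integer chunks so digits compare as numbers."""
    # re.split with a capturing group alternates text, digits, text, ...
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", version)]


versions = ["r9", "r10", "r23"]

# Plain string sorting compares character by character, so "r10" < "r9".
alphabetical = sorted(versions)

# Number-aware sorting treats 9, 10 and 23 as integers.
number_aware = sorted(versions, key=natural_key)

print(alphabetical)   # ['r10', 'r23', 'r9']
print(number_aware)   # ['r9', 'r10', 'r23']
```

With plain string sorting, r9 would wrongly be treated as the latest version; the number-aware key picks r23.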

Downloads: 0 This Week

Last Update: 2023-04-11
See Project
7

DecryptLogin

Python library providing APIs for automated website login workflows

DecryptLogin is a Python library designed to simplify automated login processes for many popular websites by providing ready-to-use APIs that simulate authentication behavior. It focuses on implementing login mechanisms through HTTP requests, allowing developers to programmatically authenticate with supported services without manually replicating complex login flows. It includes modules that handle different authentication modes such as PC login, mobile login, and QR code login depending on...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
8

FormaVid

Small Business Appliance

The FormaVid Small Business Appliance https://formavid.org is designed to integrate a content management system (CMS), an issue tracker and an invoicing application into a single, well-constructed offering. It is an excellent starting point for any developer(s) wishing to support the CMS or any of the other components, including the appliance itself. All components are stable, open source and well supported. The appliance is built using scripts, so there is no hidden "monkey business" and you can...

Downloads: 0 This Week

Last Update: 2023-11-17
See Project
9

grab-site

Web crawler for archiving and backing up sites into WARC archives

grab-site is an open source web crawling tool designed to archive and back up websites by recursively downloading their content. It works by taking a starting URL and systematically following links across the site, capturing pages and resources and saving them into WARC archive files for long-term preservation. Internally, the crawler uses a fork of the wpull engine to fetch and process web pages efficiently during large-scale crawls. grab-site includes a built-in dashboard that displays real-time crawl activity, including which URLs are currently being processed and how many remain in the queue. ...
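
The general idea of starting from one URL and following links until the queue is empty can be sketched as a breadth-first traversal. This is an illustration only, not grab-site's or wpull's implementation: the `SITE` dictionary is a hypothetical in-memory stand-in for real pages, and `crawl` and `get_links` are made-up names.

```python
from collections import deque

# Hypothetical site map: page URL -> links found on that page.
SITE = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/about"],
    "/blog/post-1": [],
}


def crawl(start, get_links):
    """Visit every page reachable from `start`, each exactly once."""
    seen = {start}              # URLs already queued or processed
    frontier = deque([start])   # URLs still waiting to be processed
    order = []
    while frontier:
        url = frontier.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
        # A dashboard could report `url` and len(frontier) at this point.
    return order


visited = crawl("/", lambda url: SITE.get(url, []))
print(visited)  # ['/', '/about', '/blog', '/blog/post-1']
```

The `seen` set is what keeps the crawl from looping when pages link back to each other.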

Downloads: 4 This Week

Last Update: 2 days ago
See Project
10

pspider

Simple Python framework for building multithreaded web crawlers

PSpider is a lightweight web crawling framework written in Python designed to simplify the development of custom web spiders. It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data. Tasks are managed through queues, allowing different parts of the crawler to process work...
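
A queue-connected fetch/parse/save pipeline of this general shape can be sketched with the standard library alone. This is not PSpider's API: `stage`, `run_pipeline`, `fetch` and `parse` are hypothetical names, and `fetch` fabricates a tiny HTML string instead of making a network request.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def fetch(url):
    # Stand-in for an HTTP request (assumption: no real network access).
    return f"<title>{url}</title>"


def parse(html):
    # Extract the text between the <title> tags produced by fetch().
    return html[len("<title>"):-len("</title>")]


def stage(inbox, outbox, work):
    """Take items from inbox, apply work, and pass results to outbox."""
    while True:
        item = inbox.get()
        if item is STOP:
            if outbox is not None:
                outbox.put(STOP)  # propagate shutdown to the next stage
            break
        result = work(item)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(urls):
    fetch_q, parse_q, save_q = queue.Queue(), queue.Queue(), queue.Queue()
    saved = []  # the "save" stage simply appends to this list
    threads = [
        threading.Thread(target=stage, args=(fetch_q, parse_q, fetch)),
        threading.Thread(target=stage, args=(parse_q, save_q, parse)),
        threading.Thread(target=stage, args=(save_q, None, saved.append)),
    ]
    for thread in threads:
        thread.start()
    for url in urls:
        fetch_q.put(url)
    fetch_q.put(STOP)
    for thread in threads:
        thread.join()
    return saved


print(run_pipeline(["page-a", "page-b"]))  # ['page-a', 'page-b']
```

Each stage runs in its own thread, so a slow fetch does not block parsing or saving of items already in the queues.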

Downloads: 1 This Week

Last Update: 6 days ago
See Project
11

instagram-profilecrawl

Instagram profile crawler that extracts posts, tags, and stats

instagram-profilecrawl is a Python-based automation script designed to collect publicly available information from Instagram profiles. It crawls profile data such as follower counts, post information, hashtags, and other engagement-related metadata. It operates by automating a web browser using Selenium and performing requests to gather structured information from the platform. instagram-profilecrawl can analyze multiple usernames in a single run and store the extracted information locally...

Downloads: 1 This Week

Last Update: 6 days ago
See Project
12

HomeTabs

HomeTabs project helps you to organize bookmarks for web browsers

The HomeTabs project helps you organize bookmarks for web browsers (like a standard browser's home page, but cooler and more comfortable). The design of HomeTabs was inspired by the Mozilla Firefox start page. I think this is the best way to organise bookmarks, but saving browsing history on the home page is a bad idea. GitHub: https://github.com/grildroid/HomeTabs Discord: https://discord.gg/6ZGDgFjDVm

Downloads: 0 This Week

Last Update: 2021-08-16
See Project
13

lxspider

Educational Python web scraping case collection for many sites

lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms,...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
14

speedtest-cli

Command line interface for testing internet bandwidth using speedtest

Command line interface for testing internet bandwidth using speedtest.net. It is not a goal of this application to be a reliable latency reporting tool. Latency reported by this tool should not be relied on as a value indicative of ICMP style latency. It is a relative value used for determining the lowest latency server for performing the actual speed test against. Speedtest CLI brings the trusted technology and global server network behind Speedtest to the command line. Measure internet...

Downloads: 2 This Week

Last Update: 2021-05-28
See Project
15

ruia

Async Python framework for fast and flexible web scraping spiders

Ruia is an asynchronous web scraping micro-framework built for Python that focuses on simplicity, speed, and flexibility when creating web crawlers. Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development. It provides a structured approach to building...
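
Concurrent requests on top of asyncio can be sketched as follows. This is not Ruia's API and does not use aiohttp: `fake_fetch` is a hypothetical stand-in that sleeps instead of performing a network request, and `crawl` and `max_concurrency` are assumed names chosen for the illustration.

```python
import asyncio


async def fake_fetch(url, delay=0.01):
    # Stand-in for an HTTP request; the sleep yields control to other tasks.
    await asyncio.sleep(delay)
    return f"response for {url}"


async def crawl(urls, max_concurrency=5):
    """Fetch all URLs concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded(url) for url in urls))


results = asyncio.run(crawl(["page-1", "page-2", "page-3"]))
print(results)
```

The semaphore is a common way to keep a crawler from opening too many connections at once while still overlapping the waiting time of individual requests.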

Downloads: 8 This Week

Last Update: 2026-03-11
See Project
16

TRACARDI - Customer Data Platform

TRACARDI free open-source customer data platform

...TRACARDI is a free open-source platform that you can extend the way you want. TRACARDI is a low-code framework you can integrate with other parts of your ecosystem. Integrate it with other open source platforms.

1 Review

Downloads: 0 This Week

Last Update: 2021-05-04
See Project
17

QPyDesk

Code editor and real-time QR code generator for QPython

QPyDesk is a code editor and real-time QR code generator for QPython. It is a Python code editor with syntax highlighting that also generates, in real time, the QR code that represents the code being edited. This application also allows you to print the generated QR code to distribute the created application. However, because QR codes have a limited storage capacity, if the code is very long, QPyDesk creates a QR code that is only valid while the application is running, that is, the QR code generated...

Downloads: 0 This Week

Last Update: 2021-01-10
See Project
18

TCellXTalk

TCellXTalk Web-App from LP CSIC/UAB

TCellXTalk is a comprehensive database of experimentally detected phosphorylation, ubiquitination and acetylation sites in human T cells. The web-app at www.TCellXTalk.org makes TCellXTalk accessible from the Internet and enables the in silico prediction of potential co-modified peptides to facilitate their experimental detection, using targeted or directed mass spectrometry, for the study of protein post-translational modification cross-talk. More detailed information on TCellXTalk and...

Downloads: 0 This Week

Last Update: 2020-07-13
See Project
19

BotSlayer

BotSlayer Community Edition

BotSlayer is an application that helps track and detect potential manipulation of information spreading on Twitter. The tool is developed by the Observatory on Social Media at Indiana University, the same lab that brought you Botometer and Hoaxy. BotSlayer is not a tool to detect and remove likely social bots from your list of Twitter followers or friends. For that purpose, check out Botometer. If you just want to visualize the spread of some piece of information, consider Hoaxy....

Downloads: 0 This Week

Last Update: 2023-07-13
See Project
20

ECommerceCrawlers

Collection of Python ecommerce and website crawler example projects

ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social...

Downloads: 2 This Week

Last Update: 1 hour ago
See Project
21

Requests-HTML

Pythonic HTML Parsing for Humans

This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When using this library you automatically get: full JavaScript support (using Chromium, thanks to puppeteer); CSS selectors (a.k.a. jQuery-style, thanks to PyQuery); XPath selectors, for the faint of heart; a mocked user-agent (like a real web browser); automatic following of redirects; connection pooling and cookie persistence. The Requests experience you know and love, with magical parsing...

Downloads: 1 This Week

Last Update: 2023-04-10
See Project
22

Jupyter Server Proxy

Jupyter notebook server extension to proxy web services.

Jupyter Server Proxy lets you run arbitrary external processes (such as RStudio, Shiny Server, Syncthing, PostgreSQL, Code Server, etc) alongside your notebook server and provide authenticated web access to them using a path like /rstudio next to others like /lab. Alongside the Python package that provides the main functionality, the JupyterLab extension (@jupyterhub/jupyter-server-proxy) provides buttons in the JupyterLab launcher window to get to RStudio for example.

Downloads: 0 This Week

Last Update: 2023-12-21
See Project
23

WeChatSogou

Python library to crawl and retrieve data from WeChat accounts

WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in...

Downloads: 8 This Week

Last Update: 2026-03-10
See Project
24

pyspider

A powerful Spider (Web Crawler) system in Python

pyspider is a powerful Spider (Web Crawler) system in Python. Components are connected by a message queue. Every component, including the message queue, runs in its own process/thread and is replaceable. That means that when processing is slow, you can run many instances of the processor and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast. Since pyspider has various components, you can just run pyspider to start a standalone and...

Downloads: 0 This Week

Last Update: 2021-03-31
See Project
25

haipproxy

Distributed proxy IP pool for web crawlers using Scrapy and Redis

HAipproxy is a distributed proxy IP pool system designed to collect, manage, and provide large numbers of proxy addresses for web crawling tasks. It automatically crawls proxy resources from the internet and aggregates them into a centralized pool that can be accessed by distributed spiders and scraping systems. It is built using Python and relies on Scrapy for high-performance crawling while Redis is used for data storage, communication, and task coordination between components. It includes...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project