Showing 1049 open source projects for "python web crawler"

  • 1
    HTTPie

    A CLI, cURL-like tool for humans

    HTTPie is a modern command-line HTTP client that makes CLI interaction with web services as human-friendly as possible. It offers a plethora of friendly features that make it an excellent curl alternative. It is equipped with an intuitive UI, JSON support, syntax highlighting and so much more. HTTPie provides a single http command for sending arbitrary HTTP requests with a simple, natural syntax, and displays the results in formatted, colorized terminal output. HTTPie can be installed on macOS,...
    Downloads: 3 This Week
  • 2
    OpenWPM

    A web privacy measurement framework

    OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of websites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection. Check out the instrumentation section below for more details. OpenWPM is tested on Ubuntu 18.04 via TravisCI and is commonly used via the docker container that this repo builds, which is also based on Ubuntu. Although we...
    Downloads: 0 This Week
  • 3
    Listen 1

    One for all free music in China (Chrome extension)

    ...Select "Load unpacked extension..." and select the folder you just unpacked. Download the Windows zip file and choose the 32-bit or 64-bit version according to your system. The original web player uses Python to run a web server. It can run directly on a server, or you can use the packaged Windows and Mac versions to run the web server locally. The desktop version for Windows, Mac, and Linux uses the Electron framework and is built on the JS library of the Listen 1 Chrome extension.
    Downloads: 4 This Week
  • 4
    JobFunnel

    Scrape job websites into a single spreadsheet with no duplicates.

    Automated tool for scraping job postings into a .csv file. You can search for jobs with YAML configuration files or by passing command arguments. By performing regular scraping and reviewing, you can cut through the noise of even the busiest job markets. Run funnel with your settings YAML to populate your master CSV file with jobs from available providers. JobFunnel can be easily automated to run nightly with crontab. If you...
    Downloads: 1 This Week
  • 5
    Gunicorn

    WSGI HTTP Server for UNIX, fast clients and sleepy applications

    Gunicorn 'Green Unicorn' is a Python WSGI HTTP Server for UNIX. It uses a pre-fork worker model. The Gunicorn server is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly speedy. You can run Gunicorn from the command line or integrate it with popular frameworks like Django, Pyramid, or TurboGears. For deploying Gunicorn in production see Deploying Gunicorn.
    Downloads: 0 This Week
  • 6
    changedetection.io

    The best free open source website change detection and restock service

    Loved by smart shoppers, data journalists, research engineers, data scientists, security researchers, and more. From simply monitoring website pages that have a change (such as watching prices, and restocking notifications), to deep inspection such as PDF text support, JSON and XML monitoring, and extensive text triggers. Monitor out-of-stock products and get alerts when those products are back in stock, get restock alerts via Discord, Slack, email, and many other platforms. Using the...
    Downloads: 2 This Week
  • 7
    CapRover

    Scalable PaaS (automated Docker+nginx), aka Heroku on Steroids

    CapRover is an extremely easy-to-use app/database deployment & web server manager for your NodeJS, Python, PHP, ASP.NET, Ruby, MySQL, MongoDB, Postgres, WordPress (etc.) applications! It's blazingly fast and very robust as it uses Docker, Nginx, LetsEncrypt and NetData under the hood behind its simple-to-use interface. It is for developers who do not like spending hours and days setting up a server, building tools, sending code to the server, building it, getting an SSL certificate, installing it, and updating nginx over and over again. ...
    Downloads: 1 This Week
  • 8
    Synapse Machine Learning

    Simple and distributed Machine Learning

    SynapseML (previously MMLSpark) is an open source library to simplify the creation of scalable machine learning pipelines. SynapseML builds on Apache Spark and SparkML to enable new kinds of machine learning, analytics, and model deployment workflows. SynapseML adds many deep learning and data science tools to the Spark ecosystem, including seamless integration of Spark Machine Learning pipelines with the Open Neural Network Exchange (ONNX), LightGBM, The Cognitive Services, Vowpal Wabbit,...
    Downloads: 5 This Week
  • 9
    Letterboxd Recommendations

    Scraping publicly-accessible Letterboxd data for movie recommendations

    Scraping publicly-accessible Letterboxd data and creating a movie recommendation model with it that can generate recommendations when provided with a Letterboxd username. A user's "star" ratings are scraped from their Letterboxd profile and assigned numerical ratings from 1 to 10 (accounting for half stars). Their ratings are then combined with a sample of ratings from the top 4000 most active users on the site to create a collaborative filtering recommender model using singular value...
    Downloads: 0 This Week
  • 10
    Mezzanine

    CMS framework for Django

    Mezzanine is a powerful open source content management platform built using the Django framework. In many ways it is like many other content management tools, offering an intuitive interface for managing all of your content. But Mezzanine is different in that it provides most of its functionality by default. While other platforms rely heavily on modules or reusable applications, Mezzanine comes ready with all the functionality you need, making it the more efficient choice. Mezzanine has a...
    Downloads: 0 This Week
  • 11
    ConsoleMe

    A central control plane for AWS permissions and access

    ConsoleMe is a web service that makes AWS IAM permissions and credential management easier for end-users and cloud administrators. ConsoleMe provides numerous ways to log in to the AWS Console. An IAM Self-Service Wizard lets users request IAM permissions in plain English. Cross-account resource policies will be automatically generated and can be applied with a single click for certain resource types. Weep (ConsoleMe’s CLI) supports 5 different ways of serving AWS credentials locally. Cloud...
    Downloads: 1 This Week
  • 12
    Requests for PHP

    Requests for PHP is a humble HTTP request library

    Requests is an HTTP library written in PHP, for human beings. It is roughly based on the API from the excellent Requests Python library. Requests is ISC Licensed (similar to the new BSD license) and has no dependencies, except for PHP 5.6+. Despite PHP’s use as a language for the web, its tools for sending HTTP requests are severely lacking. cURL has an interesting API, to say the least, and you can’t always rely on it being available. Sockets provide only low-level access and require you to build most of the HTTP response parsing yourself. ...
    Downloads: 8 This Week
  • 13
    django CMS

    Easy-to-use and developer-friendly enterprise CMS powered by Django

    Create modern websites that content editors love. django CMS was originally conceived by web developers frustrated with the technical and security limitations of other systems. Its lightweight core makes it easy to integrate with other software and put to use immediately, while its ease of use makes it the go-to choice for content managers, content editors and website admins. Developers can integrate other existing Django applications rapidly, or build brand new compatible apps that take...
    Downloads: 0 This Week
  • 14
    Playwright for .NET

    .NET version of the Playwright testing and automation library

    Playwright for .NET is the official language port of Playwright, the library to automate Chromium, Firefox and WebKit with a single API. Playwright is built to enable cross-browser web automation that is ever-green, capable, reliable and fast. Cross-browser. Playwright supports all modern rendering engines including Chromium, WebKit, and Firefox. Cross-platform. Test on Windows, Linux, and macOS, locally or on CI, headless or headed. Cross-language. Use the Playwright API in TypeScript, JavaScript, Python, .NET, Java. ...
    Downloads: 0 This Week
  • 15
    ahCrawler

    A PHP search engine for your website and web analytics tool. GNU GPL3

    ahCrawler is a toolset for implementing your own search on your website and an analyzer for your web content. It can be used on shared hosting. It consists of a crawler (spider) and indexer, a search for your website(s), search statistics, and a website analyzer (HTTP header, short titles and keywords, link checker, ...). You need to install it on your own server, so all crawled data stays in your environment. You never know when an external webspider updated your content. ...
    Downloads: 16 This Week
  • 16
    Gerapy

    Distributed Crawler Management Framework Based on Scrapy

    Distributed Crawler Management Framework Based on Scrapy, Scrapyd, Scrapyd-Client, Scrapyd-API, Django and Vue.js. Anyone who has written crawlers in Python has probably used Scrapy. Scrapy is indeed a very powerful crawler framework. It has high crawling efficiency and good scalability. It is practically an essential tool for developing crawlers in Python.
    Downloads: 0 This Week
  • 17
    Crawlab

    Distributed web crawler admin platform for spiders management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, PHP and various web crawler frameworks including Scrapy, Puppeteer, Selenium. Use docker-compose for a one-click startup. By doing so, you don't even have to configure a MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. ...
    Downloads: 0 This Week
  • 18
    Ascoos Web Server

    A web server for all web developers and web designers

    For PHP 5.6 - 8.4.X, see Ascoos Web Extended Studio (AWES): https://sourceforge.net/projects/ascoos-web-extended-studio/
    ASCOOS Web Server is a rich package designed as a versatile web server for development purposes. It incorporates third-party components such as PHP, MySQL, pgSQL, MongoDB and FileZilla and stands out through a compact setup and a well-built administrative panel. ASCOOS Web Server allows you to work with multiple versions of PHP and MySQL without having to...
    Downloads: 2 This Week
  • 19
    Ascoos Web Extended Studio

    A portable web server suite for 64-bit Windows, for web development.

    Ascoos Web Extended Studio (AWES) is a portable, free 64-bit web server environment for Windows, designed for professional web developers and designers who need flexibility, modularity, and multi-version testing capabilities. It provides a complete local development stack based on technologies such as Apache, PHP, Node.js, Python, MariaDB, MongoDB, FileZilla, and other essential tools.
    Downloads: 9 This Week
  • 20
    Web Link Collector 1000

    Automatically collect all links from websites to a clean txt file

    Easily and automatically collect all your links into a neat txt list from a particular website or an entire section of a multi-page website network! Web Link Collector 1000 is a simple tool for gathering links from websites with minimal effort. It helps you collect resources for research, create reference lists, or save useful links without manual copying and pasting. Features: Two collection modes: single page or multiple pages of a specific website section, or even the...
    Downloads: 1 This Week
  • 21
    WallPaper (alias crawlpaper)
    WallPaper (alias crawlpaper) is a desktop changer (NOT a screensaver) that includes a web crawler for picture download, an audio stream ripper, an audio player, a mini mp3 tag editor, etc. It also includes support for .zip and .rar files and an interface to the Berkeley DB code for small databases.
    Downloads: 1 This Week
  • 22
    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use as an image gallery, wallpaper, audio/music, video, document, and other media bulk downloader from supported websites. Also use it to download sequential website URLs that follow a certain pattern (e.g. image01.png to image100.png), or use the app's built-in site crawler for advanced link search or extraction. There is also special support for forum media and open directory downloading. It's a programmable downloader and also works with password-protected sites. Say goodbye to downloading one...
    Downloads: 232 This Week
  • 23
    KemonoDownloader

    Kemono Downloader - A cross-platform Python app built with PyQt6

    Welcome to Kemono Downloader, a versatile Python-based desktop application built with PyQt6, designed to download content from Kemono.su. This tool enables users to archive individual posts or entire creator profiles from services like Patreon, Fanbox, and more, supporting a wide range of file types with customizable settings and advanced features.
    Downloads: 358 This Week
  • 24
    Eric Integrated Development Environment

    Python Development Environment with all batteries included

    Eric is a Python IDE written using PyQt and QScintilla. It provides various features such as any number of open editors, an integrated (remote) debugger, project management facilities, unit test, refactoring and much more.
    Downloads: 231 This Week
  • 25
    ZEG / Zero-Effort-Groupware

    SOGo Zero-Effort-Groupware

    The ZEG (Zero Effort Groupware) edition of SOGo is intended to provide a complete out-of-the-box testing environment of SOGo, the Open Source messaging and calendaring software.
    Downloads: 24 This Week