Showing 61 open source projects for "python data analysis"

View related business solutions
  • Earn up to 16% annual interest with Nexo. Icon
    Earn up to 16% annual interest with Nexo.

    More flexibility. More control.

    Generate interest, access liquidity without selling, and execute trades seamlessly. All in one platform. Geographic restrictions, eligibility, and terms apply.
    Get started with Nexo.
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • 1
    Scrapling

    Scrapling

    An adaptive Web Scraping framework

    Scrapling is an adaptive web scraping framework designed to handle everything from a single HTTP request to large-scale, concurrent crawls. Built for modern websites, it intelligently adapts to structural changes by automatically relocating elements when page layouts update. The framework includes advanced fetchers capable of bypassing anti-bot protections such as Cloudflare Turnstile using stealth and browser automation techniques. Its powerful spider system supports multi-session crawling,...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 2
    python-fxxk-spider

    python-fxxk-spider

    Collection of 100+ Python web scraping projects and crawler examples

    python-fxxk-spider is a curated collection of Python web scraping and crawler projects gathered in a single repository for reference and learning. It aggregates many independent scraping examples that target a wide range of websites, online services, and public data sources. Instead of being a single crawler tool, it functions as a catalog of ready-made Python spider implementations that demonstrate different scraping techniques. python-fxxk-spider includes scrapers for social media, e-commerce platforms, job listings, music services, video platforms, and various content sites. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 3
    bilili

    bilili

    Command-line Bilibili video and danmaku downloader with batch support

    bilili is a command-line tool designed to download videos and related content from the Bilibili video platform. It focuses on enabling users to retrieve user-uploaded videos as well as serialized content such as bangumi episodes directly from the terminal environment. It provides automated downloading capabilities that handle video streams and associated data efficiently while minimizing manual interaction. bilili supports retrieving both the video files and danmaku comments, which are the...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 4
    Crawlab

    Crawlab

    Distributed web crawler admin platform for spiders management

    ...Tasks are scheduled by the task scheduler module in the master node, and received by the task handler module in worker nodes, which executes these tasks in task runners. Task runners are actually processes running spider or crawler programs, and can also send data through gRPC (integrated in SDK) to other data sources, e.g. MongoDB.
    Downloads: 6 This Week
    Last Update:
    See Project
  • Full-stack observability with actually useful AI | Grafana Cloud Icon
    Full-stack observability with actually useful AI | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 5
    dirhunt

    dirhunt

    Web crawler that finds hidden web directories without brute force

    Dirhunt is an open source security tool designed to discover web directories and analyze website structures without relying on brute-force techniques. Instead of sending large numbers of guess-based requests, it operates as a specialized crawler that intelligently explores websites to identify accessible or hidden directories. Dirhunt can detect directories that expose “Index Of” listings, which may reveal files and other resources that were not intended to be publicly visible. It can also...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 6
    scraper-with-chatgpt
    It is a powerful data scraping tool that helps you extract information from various online sources. Easily collect data from Google SERP, Maps, Shopify, Zillow, and more. With a user-friendly interface, you can scrape and save data in JSON or Excel formats. Unlock insights from the web effortlessly with scrape-it.cloud API.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    DecryptLogin

    DecryptLogin

    Python library providing APIs for automated website login workflows

    DecryptLogin is a Python library designed to simplify automated login processes for many popular websites by providing ready-to-use APIs that simulate authentication behavior. It focuses on implementing login mechanisms through HTTP requests, allowing developers to programmatically authenticate with supported services without manually replicating complex login flows.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8
    mlscraper

    mlscraper

    ML-based HTML scraper that learns extraction rules from examples

    mlscraper is a Python library designed to automatically extract structured data from HTML pages without requiring developers to manually write CSS selectors or XPath rules. Instead of defining extraction logic by hand, users provide a few examples of the data they want to retrieve from a webpage. It analyzes those examples within the HTML document and determines patterns or rules that can be used to extract the same type of information from similar pages.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 9
    AutoScraper

    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 8 Monitoring Tools in One APM. Install in 5 Minutes. Icon
    8 Monitoring Tools in One APM. Install in 5 Minutes.

    Errors, performance, logs, uptime, hosts, anomalies, dashboards, and check-ins. One interface.

    AppSignal works out of the box for Ruby, Elixir, Node.js, Python, and more. 30-day free trial, no credit card required.
    Start Free
  • 10
    pspider

    pspider

    Simple Python framework for building multithreaded web crawlers

    PSpider is a lightweight web crawling framework written in Python designed to simplify the development of custom web spiders. It focuses on providing an easy-to-understand architecture while still supporting concurrent crawling for improved performance. It uses a multithreaded model that separates the crawling workflow into several components responsible for fetching, parsing, and saving data.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    Scylla

    Scylla

    Intelligent proxy pool for collecting and managing public proxies

    Scylla is an open source proxy pool system designed to collect, validate, and manage large numbers of public proxy servers for use in web scraping and data extraction workflows. It automatically crawls the internet to discover proxy IP addresses and evaluates their availability and reliability before adding them to a usable pool. It includes a JSON API that allows developers and applications to retrieve proxy information programmatically, making it easier to integrate proxy rotation into...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    instagram-profilecrawl

    instagram-profilecrawl

    Instagram profile crawler that extracts posts, tags, and stats

    instagram-profilecrawl is a Python-based automation script designed to collect publicly available information from Instagram profiles. It crawls profile data such as follower counts, post information, hashtags, and other engagement-related metadata. It operates by automating a web browser using Selenium and performing requests to gather structured information from the platform. instagram-profilecrawl can analyze multiple usernames in a single run and store the extracted information locally in structured formats such as JSON. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    lxspider

    lxspider

    Educational Python web scraping case collection for many sites

    lxSpider is a collection of web scraping examples designed primarily for learning and experimentation with data extraction techniques. It gathers numerous crawler implementations that demonstrate how to collect data from a wide range of websites and online services. It focuses heavily on practical cases that illustrate how different platforms handle requests, authentication parameters, and anti-scraping protections. lxSpider includes examples targeting areas such as e-commerce platforms, social media services, content sites, research databases, and information portals. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    ruia

    ruia

    Async Python framework for fast and flexible web scraping spiders

    Ruia is an asynchronous web scraping micro-framework built for Python that focuses on simplicity, speed, and flexibility when creating web crawlers. Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 15
    ast-hook-forjs-re

    ast-hook-forjs-re

    AST-based JavaScript reverse engineering and variable tracing toolkit

    ast-hook-for-js-RE is an open source JavaScript reverse engineering toolkit designed to help analysts locate and understand client-side encryption logic used by web applications. It works by intercepting browser traffic through a local proxy server and modifying JavaScript code before it executes in the browser. Using Abstract Syntax Tree (AST) transformations, it injects hook functions into the code to monitor variable assignments and other runtime changes during execution. This allows...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 16
    yabasta

    yabasta

    Yet Another BAsic Scraper and Text Analysis

    YA BASTA! is a Python/R application for Lyrics Web Scraper and Text Analysis. Web scraping is developed in Python, text analysis in R as Python subprocesses. YA BASTA! is only tested on windows OS. To run YA BASTA! just type on window command prompt: python.exe yabasta.py
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    GoogleScraper

    GoogleScraper

    Python tool for scraping search engine results from many providers

    GoogleScraper is a Python-based tool designed to automatically collect and process search engine results from multiple providers. It enables developers and researchers to programmatically query search engines and extract useful information such as links, titles, and result descriptions. GoogleScraper supports several major search engines and can be used to gather structured datasets from search result pages for further analysis.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 18
    xsrfprobe

    xsrfprobe

    Advanced toolkit for detecting and exploiting CSRF vulnerabilities

    XSRFProbe is an advanced security auditing toolkit designed to detect and analyze Cross Site Request Forgery (CSRF/XSRF) vulnerabilities in web applications. It uses an automated crawling engine that continuously scans a target application, collects forms and endpoints, and evaluates them for potential CSRF weaknesses. XSRFProbe performs numerous systematic checks to determine whether a web endpoint is vulnerable, including inspection of anti-CSRF tokens, cookie validation behavior, and...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    ECommerceCrawlers

    ECommerceCrawlers

    Collection of Python ecommerce and website crawler examples projects

    ECommerceCrawlers is a collection of practical Python web crawler projects designed to gather data from a variety of ecommerce platforms, websites, and online services. It aggregates many independent crawler examples created by contributors and organized into separate subprojects that target specific sites or data sources. These examples demonstrate how to build and operate web scrapers capable of collecting structured information such as product listings, news content, job postings, social media data, and other publicly available web data. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 20
    Requests-HTML

    Requests-HTML

    Pythonic HTML Parsing for Humans

    This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When using this library you automatically get full JavaScript support! (Using Chromium, thanks to puppeteer) CSS Selectors (a.k.a jQuery-style, thanks to PyQuery). XPath Selectors, for the faint of heart. Mocked user-agent (like a real web browser). Automatic following of redirects. Connection–pooling and cookie persistence. The Requests experience you know and love, with magical parsing...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    django-dynamic-scraper

    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    Django Dynamic Scraper (DDS) is an app for Django build on top of the scraping framework Scrapy. While preserving many of the features of Scrapy it lets you dynamically create and manage spiders via the Django admin interface. With Django Dynamic Scraper (DDS) you can define your Scrapy scrapers dynamically via the Django admin interface and save your scraped items in the database you defined for your Django project. Since it simplifies things DDS is not usable for all kinds of scrapers, but...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 22
    mzitu

    mzitu

    Python crawler that downloads image galleries and analyzes titles

    ...Using text segmentation and frequency analysis, the project can create a word cloud representing common keywords found in the dataset. This makes the repository both a scraping example and a small data analysis experiment built around the collected content. Overall, mzitu serves as a learning-oriented implementation of Python web scraping, data processing, and visualization techniques.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    Twitter Intelligence

    Twitter Intelligence

    Twitter Intelligence OSINT project performs tracking and analysis

    A project written in Python for Twitter tracking and analysis without using Twitter API. This project is a Python 3.x application. The package dependencies are in the file requirements.txt. Run that command to install the dependencies. SQLite is used as the database. Tweet data is stored on the Tweet, User, Location, Hashtag, HashtagTweet tables. The database is created automatically. analysis.py performs analysis processing.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    WeChatSogou

    WeChatSogou

    Python library to crawl and retrieve data from WeChat accounts

    WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in applications or analysis pipelines. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 25
    pyspider

    pyspider

    A powerful Spider(Web Crawler) system in Python

    pyspider is a powerful Spider(Web Crawler) system in Python. Components are connected by message queue. Every component, including message queue, is running in their own process/thread, and replaceable. That means, when process is slow, you can have many instances of processor and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast. benchmarking.
    Downloads: 0 This Week
    Last Update:
    See Project
MongoDB Logo MongoDB