Showing 143 open source projects for "web pages"

  • 1
    AutoScraper

    A Smart, Automatic, Fast and Lightweight Web Scraper for Python

    This project is made for automatic web scraping to make scraping easy. It gets a URL or the HTML content of a web page and a list of sample data that we want to scrape from that page. This data can be text, URL or any HTML tag value of that page. It learns the scraping rules and returns similar elements. Then you can use this learned object with new URLs to get similar content or the exact same element of those new pages.
    Downloads: 0 This Week
  • 2
    Webifier

    A GitHub Action to deploy Notebooks, Markdowns

    Webifier is a stand-alone build tool for converting any repository into a deployable Jekyll website. You can define your pages via YAML files and provide notebooks, Markdown, PDF and other files for Webifier to render. It uses Python Markdown, providing additional control over attributes and other extensive functionality. It lets you define and direct how your web pages feel and automatically manages your assets, making it well suited to fast static website development and a straightforward tool for creating GitHub Pages sites as a GitHub Action. ...
    Downloads: 0 This Week
  • 3
    ruia

    Async Python framework for fast and flexible web scraping spiders

    Ruia is an asynchronous web scraping micro-framework built for Python that focuses on simplicity, speed, and flexibility when creating web crawlers. Ruia is powered by Python’s asyncio library along with aiohttp, enabling developers to perform concurrent network requests efficiently and scrape data from websites with minimal overhead. Ruia follows a “write less, run faster” philosophy, emphasizing concise code and streamlined spider development. It provides a structured approach to building...
    Downloads: 2 This Week
  • 4
    googler

    Google from the terminal

    googler is a power tool to Google (web, news, videos and site search) from the command line. It shows the title, URL and abstract for each result, which can be directly opened in a browser from the terminal. Results are fetched in pages (with page navigation). Supports sequential searches in a single googler instance. googler was initially written to cater to headless servers without X.
    Downloads: 0 This Week
  • 5
    googler

    Google Search, Google Site Search, Google News from the terminal

    googler is a power tool to Google (Web & News) and Google Site Search from the command-line. It shows the title, URL and abstract for each result, which can be directly opened in a browser from the terminal. Results are fetched in pages (with page navigation). Supports sequential searches in a single googler instance. googler was initially written to cater to headless servers without X.
    Downloads: 0 This Week
  • 6
    Python Handout

    Turn Python scripts into handouts with Markdown and figures

    Handout is a lightweight library for embedding rich, interactive components such as exercises, charts, and interactive diagrams directly into static documents like Markdown, Jupyter notebooks, or static HTML pages, enabling authors to create more engaging technical handouts, tutorials, and interactive essays. It’s particularly aimed at educators, presenters, and researchers who want to make their written material come alive with runnable demonstrations and interactive problem sets without bundling a full web framework. Handout supports embedding executable exercises where learners can type code, run it in place, and receive immediate feedback inline; it also integrates seamlessly with charting libraries so that data visualizations can be interactive rather than static. ...
    Downloads: 0 This Week
  • 7
    GoogleScraper

    Python tool for scraping search engine results from many providers

    GoogleScraper is a Python-based tool designed to automatically collect and process search engine results from multiple providers. It enables developers and researchers to programmatically query search engines and extract useful information such as links, titles, and result descriptions. GoogleScraper supports several major search engines and can be used to gather structured datasets from search result pages for further analysis. It provides two different scraping approaches: sending direct...
    Downloads: 0 This Week
  • 8
    xsrfprobe

    Advanced toolkit for detecting and exploiting CSRF vulnerabilities

    XSRFProbe is an advanced security auditing toolkit designed to detect and analyze Cross Site Request Forgery (CSRF/XSRF) vulnerabilities in web applications. It uses an automated crawling engine that continuously scans a target application, collects forms and endpoints, and evaluates them for potential CSRF weaknesses. XSRFProbe performs numerous systematic checks to determine whether a web endpoint is vulnerable, including inspection of anti-CSRF tokens, cookie validation behavior, and...
    Downloads: 0 This Week
  • 9
    Budou

    Budou is an auto organizer tool for beautiful line breaking in CJK

    Budou is a Python library developed by Google to improve web typography for CJK (Chinese, Japanese, Korean) languages by producing semantically meaningful line breaks. Unlike English, CJK scripts lack spaces or hyphenation cues, often resulting in awkward or unreadable text wrapping on web pages. Budou addresses this issue by segmenting sentences into logical lexical chunks and wrapping each chunk in non-breaking HTML <span> tags.
    Downloads: 0 This Week
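The wrapping step described for Budou can be sketched roughly as follows. This is a minimal illustration that does not use Budou's own API: the segmentation is supplied by hand, and the function name `wrap_chunks` and the class name `chunk` are hypothetical.

```python
from html import escape


def wrap_chunks(chunks, class_name="chunk"):
    """Wrap each pre-segmented chunk in its own <span> element.

    The chunks are assumed to come from some segmenter; this sketch
    does not perform any linguistic segmentation itself.
    """
    return "".join(
        f'<span class="{class_name}">{escape(chunk)}</span>' for chunk in chunks
    )


# Hypothetical hand-made segmentation of a short Japanese sentence.
chunks = ["今日は", "天気です。"]
print(wrap_chunks(chunks))
```

For the spans to stay unbroken, a page would presumably also style them, for example with `display: inline-block`; that styling is an assumption here, not something the listing states.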
  • 10
    Fav-up

    Look up IP addresses using favicon hashes via Shodan

    ...This technique is commonly used in security research and OSINT investigations to discover related infrastructure or services that may belong to the same organization. fav-up can retrieve favicon data from several sources, including local files, direct favicon URLs, or full web pages where the favicon is automatically extracted. fav-up then computes the favicon hash and performs Shodan queries to locate IP addresses that match the same hash. To support larger investigations, the tool can iterate over lists of URLs, domains, or favicon files in bulk. Results can be printed to the console or exported into structured formats such as CSV or JSON for further analysis and reporting.
    Downloads: 0 This Week
  • 11
    AET

    Detects visual changes on websites and performs page health checks

    AET is a system that detects visual changes on websites and performs basic page health checks (such as W3C compliance, accessibility, HTTP status codes, JS error checks and others). AET is designed as a flexible system that can be adapted and tailored to the regression requirements of a given project. The tool has been developed to aid front-end client-side layout regression testing of websites or portfolios, in essence assessing the impact or change of a website from one snapshot to the next.
    Downloads: 0 This Week
  • 12
    lazynlp

    Library to scrape and clean web pages to create massive datasets

    LazyNLP is a lightweight tool for collecting and curating large-scale text datasets for machine learning and NLP applications with minimal manual effort.
    Downloads: 0 This Week
  • 13
    mzitu

    Python crawler that downloads image galleries and analyzes titles

    mzitu is a Python-based web crawling project designed to automatically download and organize image galleries from a specific photography site. It demonstrates how to build a scraper that navigates gallery pages, retrieves image links, and saves the images locally in a structured directory layout. It focuses on automating the collection of large sets of images by programmatically parsing page content and iterating through gallery entries. mzitu also includes a simple analysis script that processes downloaded folder names to generate statistics and visualizations. ...
    Downloads: 4 This Week
  • 14
    WeChatSogou

    Python library to crawl and retrieve data from WeChat accounts

    WechatSogou is an open source Python library designed to retrieve data from WeChat official accounts by using the Sogou WeChat search service as its data source. It provides developers with a programmatic way to search for public accounts and collect article information without manually browsing the search interface. It functions as a crawler interface that sends requests to the search engine, retrieves results, and converts the returned pages into structured data that can be used in...
    Downloads: 0 This Week
  • 15
    Toapi

    Convert websites into structured APIs automatically with Python tool

    Toapi is a Python library designed to transform ordinary websites into usable API services. Instead of building a traditional web crawler that collects and stores data before exposing it through an API, Toapi simplifies the process by allowing developers to define data structures that automatically generate an API layer from existing web pages. It works by parsing HTML content from a source site and mapping selected elements into structured data that can be returned as JSON through API endpoints. ...
    Downloads: 3 This Week
  • 16
    pyspider

    A powerful spider (web crawler) system in Python

    pyspider is a powerful spider (web crawler) system in Python. Components are connected by a message queue. Every component, including the message queue, runs in its own process or thread and is replaceable. That means that when processing is slow, you can run many instances of the processor and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast. Since pyspider has various components, you can just run pyspider to start a standalone and...
    Downloads: 0 This Week
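The queue-connected architecture described for pyspider can be illustrated with a small standard-library sketch. It does not use pyspider's API: the function `run_pipeline`, the stand-in fetch step, and the use of threads rather than separate processes are all assumptions made for the example.

```python
import queue
import threading


def run_pipeline(urls, n_processors=3):
    """Run one fetcher and several processor workers joined by queues."""
    fetched = queue.Queue()   # fetcher -> processors
    results = queue.Queue()   # processors -> caller

    def fetcher():
        for url in urls:
            # Placeholder for a real HTTP request.
            fetched.put((url, f"<html>{url}</html>"))
        # One stop marker per processor so every worker terminates.
        for _ in range(n_processors):
            fetched.put(None)

    def processor():
        while True:
            item = fetched.get()
            if item is None:
                break
            url, body = item
            results.put((url, len(body)))

    threads = [threading.Thread(target=fetcher)]
    threads += [threading.Thread(target=processor) for _ in range(n_processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)


print(run_pipeline(["a", "bb"]))
```

Because the workers only communicate through the queues, raising `n_processors` adds capacity without changing the fetcher, which is the property the description attributes to the design.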
  • 17
    sqliv

    Massive SQL injection vulnerability scanner for automated web testing

    SQLiv is a command-line security tool designed to identify SQL injection vulnerabilities in web applications through automated scanning techniques. Written primarily in Python, the project focuses on discovering potentially vulnerable web pages by analyzing URLs that contain database query parameters. It can perform large-scale scanning by using search engine queries known as SQL injection dorks to collect candidate websites and then test them for vulnerabilities. ...
    Downloads: 1 This Week
  • 18
    pyadselfservice
    pyadselfservice is software written in Python 3.5 and Django 1.10. The project aims to provide end users with a web-based interface for changing the password of their Active Directory account. Users do not need to enter their current password when changing it, which means they can change their password even if they have forgotten the current one. Moreover, while changing the password, the software automatically unlocks the user account if it is locked. The...
    Downloads: 0 This Week
  • 19
    MangaPark-DL

    A Python script to download mangas from MangaPark

    ...The tool allows users to specify individual chapters or ranges, automating the retrieval of images and their conversion into a single document format. It is particularly useful for mobile users who prefer reading manga in PDF form rather than browsing online chapter pages. The script includes options for resizing images to reduce file size and avoid processing issues during PDF generation. It is lightweight and relies on standard Python dependencies, making it easy to run in most environments. The project emphasizes simplicity and efficiency, focusing on a single workflow that converts web-hosted manga into portable reading files.
    Downloads: 2 This Week
  • 20
    htmlarea

    Small, powerful, full featured WYSIWYG editor

    HTMLArea 4 is a browser-based WYSIWYG editor that easily replaces the TEXTAREA in your web pages. It is written in JavaScript and is suitable for use in any modern web browser and on any page of your web site. Current version is 4.0-2016-08-29.
    Downloads: 1 This Week
  • 21
    QAL

    Query Abstraction Layer

    Project has moved to: https://github.com/OptimalBPM/qal QAL is a collection of libraries for mining, transforming and writing data from and to a number of places. Sources and destinations include different SQL and NoSQL backends, file formats like .csv, XML and Excel, and even untidy HTML web pages. It has a database abstraction layer that supports connectivity to Postgres, MySQL, DB2, Oracle and MS SQL Server. JSON and MongoDB support is coming. It uses XML/JSON formats (self-generated SQL schemas) for representing queries, transformation and merging, making it scriptable. This means that QAL can be backend agnostic about a subset of SQL features and data types. ...
    Downloads: 0 This Week
  • 22
    COAR-DMS

    DMS for Linux: C++ library, server, web UI, SOAP

    COAR-DMS is a document management system for 32/64-bit Linux. It acts as a library, server and tools. Library features: - storage management, free pages recycling - transaction log - indexing: full text, tags, metadata, document attributes - inverted index - versioning, collaboration - document trees, tree versioning - folders - plugins for auth (PAM, LDAP), db and file types - tags - metadata (key-value pairs) - object-level security, folder and document ACLs - unix...
    Downloads: 0 This Week
  • 23

    sitecheck

    Modular web site spider for web developers.

    More than just a link checker, sitecheck is a website spider (also known as a crawler) which can assist with SEO by testing an entire site plus both inbound links from search engines and outbound links to other sites for the following issues: looping redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation...
    Downloads: 0 This Week
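The Flesch Reading Ease test mentioned for sitecheck uses the standard formula 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The sketch below applies it with a crude vowel-group syllable counter; that counter, the tokenising regular expressions and the function names are assumptions for illustration and say nothing about how sitecheck itself computes the score.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: number of vowel groups, at least one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Return the Flesch Reading Ease score for English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


print(round(flesch_reading_ease("The cat sat."), 2))
```

Higher scores indicate easier text, so a checker of this kind would flag pages whose score falls below some chosen threshold.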
  • 24

    SE Auditor

    Free SEO audit software.

    SE Auditor is a program for analyzing web pages for search engines. It is an application that you can use to view statistical data about your website in order to improve its position within web search results. SE Auditor is aimed at SEO professionals, website designers, developers, website testers and owners. It enables you to check the meta description, keywords, sitemap, the number of links and keyword consistency, the text/HTML ratio and many more ranking / usability / social factors. ...
    Downloads: 0 This Week
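One of the checks named for SE Auditor, the text/HTML ratio, can be approximated with the standard library. The definition used here (length of visible text divided by length of the raw markup, ignoring `script` and `style` contents) is an assumption, since tools differ in how they compute it, and the names `text_html_ratio` and `_TextExtractor` are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def text_html_ratio(html):
    """Return len(visible text) / len(markup), or 0.0 for empty input."""
    if not html:
        return 0.0
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so indentation does not inflate the ratio.
    text = " ".join(" ".join(parser.parts).split())
    return len(text) / len(html)


print(text_html_ratio("<p>hello</p>"))
```

Joining text nodes with a space is an approximation: it can add a space inside a word that is split by inline tags.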
  • 25

    WebChemViewer

    A simple program for sharing molecular structures with associated data

    Sharing lists of molecular structures with associated chemical properties is a common task in computer-aided drug design and medicinal chemistry. WebChem Viewer is a simple, free, open-source program that generates HTML-formatted output that can be viewed in any modern web browser, on any operating system (including mobile), without requiring the installation of additional software. The output can also be easily incorporated into existing web pages. WebChem Viewer is released under the FreeBSD license. It was created by Jacob Durrant, a post-doc in the lab of Rommie E. Amaro.
    Downloads: 0 This Week