Showing 334 open source projects for "python web crawler"

  • 1
    Odigos

    Distributed tracing without code changes

    Odigos supports any application written in Java, Python, .NET, Node.js and Go. Historically, compiled languages like Go have been difficult to instrument without code changes. Odigos solves this problem by uniquely leveraging eBPF. Odigos currently supports all the popular managed and open source destinations. By producing data in the OpenTelemetry format, Odigos can be used with any observability tool that supports OTLP. Odigos automatically scales OpenTelemetry collectors based...
    Downloads: 28 This Week
  • 2
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt. Heritrix is designed to respect the robots.txt exclusion directives...
    Downloads: 7 This Week
  • 3
    Crawlab

    Distributed web crawler admin platform for spider management

    A Golang-based distributed web crawler management platform that supports multiple languages, including Python, NodeJS, Go, Java, and PHP, and multiple web crawler frameworks, including Scrapy, Puppeteer, and Selenium. Start it with a single docker-compose command; doing so means you don't even have to configure a MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS, and the worker nodes. Master node and worker nodes communicate...
    Downloads: 1 This Week
  • 4
    QR Code generator library

    High-quality QR Code generator library in Java, TypeScript/JavaScript

    ... to TypeScript, Python, Rust, C++, and C. It is open source under the MIT License. For each language, the codebase is roughly 1000 lines of code and has no dependencies other than the respective language’s standard library.
    Downloads: 13 This Week
  • 5
    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a simple but scalable crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence, which simplifies the development of a specific crawler. WebMagic has a simple core with high flexibility and a simple API for HTML extraction. It also provides POJO annotations for customizing a crawler, with no configuration needed. Some other features...
    Downloads: 0 This Week
  • 6
    PULSAR

    Distributed pub-sub messaging system

    ... durability guarantees. Configurable replication between data centers across multiple geographic regions. Built from the ground up as a multi-tenant system. Supports isolation, authentication, authorization and quotas. Persistent message storage based on Apache BookKeeper. IO-level isolation between write and read operations. Flexible messaging models with high-level APIs for Java, Go, Python, C++, Node.js, WebSocket and C#.
    Downloads: 1 This Week
  • 7
    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    ... to various endpoints in real time. Agile development experience with SQL-like query language and graphical drag-and-drop editor supporting event simulation. Lightweight runtime that can natively run on Kubernetes, Docker, VM, or bare metal, and embedded in any Java or Python application. Scalable, and highly available distributed event processing on Kubernetes, with NATS Streaming and Siddhi Kubernetes Operator.
    Downloads: 0 This Week
  • 8
    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ACHE is a focused web crawler. It collects web pages that satisfy specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in the sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., one that matches every page that contains a specific word) or a machine-learning-based classification model...
    Downloads: 1 This Week
  • 9
    Framework Benchmarks

    Source for the TechEmpower Framework Benchmarks project

    If you're new to the project, welcome! Please feel free to ask questions here. We encourage new frameworks and contributors to ask questions. We're here to help! This project provides representative performance measures across a wide field of web application frameworks. With much help from the community, coverage is quite broad and we are happy to broaden it further with contributions. The project presently includes frameworks on many languages including Go, Python, Java, Ruby, PHP, C#, F...
    Downloads: 0 This Week
  • 10
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

    The Files section includes WebCrawlerMySQL.jar, which supports MySQL connections. Free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed. - Free web spider, parser, extractor, crawler - Extraction of emails, phone numbers, and custom text from the web - Export to Excel file - Data saved into Derby and MySQL databases - Written in Java, cross-platform Also see Free email Sender...
    Downloads: 30 This Week
  • 11
    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use as an image gallery, wallpaper, audio/music, video, document, and other media bulk downloader from supported websites. Also use to download sequential website urls that have a certain pattern (e.g. image01.png to image100.png). Also use app's built-in site crawler for advanced link search or extraction. There is also special support for forum media and open directory downloading. It's a programmable downloader and also works with password protected sites. Say goodbye to downloading one...
    Downloads: 256 This Week
  • 12
    HEALPix

    Data Analysis, Simulations and Visualization on the Sphere

    Software for pixelization, hierarchical indexation, synthesis, analysis, and visualization of data on the sphere. Please acknowledge HEALPix by quoting the web page http://healpix.sourceforge.net (or https://healpix.sourceforge.io) and publication: K.M. Gorski et al., 2005, Ap.J., 622, p.759 Full software documentation available at https://healpix.sourceforge.io/documentation.php Wiki Pages: https://sourceforge.net/p/healpix/wiki/Home Exchanging Data with HEALPix (in FITS files): https...
    Downloads: 227 This Week
  • 13
    The Sashimi project hosts the Trans-Proteomic Pipeline (TPP), a mature suite of tools for mass-spec (MS, MS/MS) based proteomics: statistical validation, quantitation, visualization, and converters from raw MS data to the open mzML/mzXML formats.
    Downloads: 32 This Week
  • 14
    ZK - Simply Ajax and Mobile
    Ajax+Mobile Java Web framework. With 200+ Ajax components and an event-driven model, Ajax/RIA apps are as effortless and rich as desktop apps and HTML/XUL pages. Supports JSP/JSF/JavaEE/Spring, Ajax Push and Client-fusion; also Java/Groovy/Python/JavaScript.
    Downloads: 17 This Week
  • 15
    StrongKey FIDO Server (SKFS)

    FIDO® Certified StrongKey FIDO Server (SKFS)

    An open source implementation of the FIDO2 protocol to support passwordless strong authentication using public-key cryptography. Supports registration, authentication (all platforms), and transaction authorization (for native Android apps).
    Downloads: 15 This Week
  • 16
    YehDown

    A video downloader for YouTube, Vimeo, and all major sites

    A video downloader. The official home page for the YehDown tool has been published for new feature updates: https://Yehdown.yehigo.com The new YehDown software downloads videos with improved download speed. The current update has a more user-friendly UI. The tool supports live, real-time updates for new features. Tested on Windows 11.
    Downloads: 10 This Week
  • 17

    PMS for REGZA

    A DLNA-compliant UPnP Media Server

    PMS for REGZA is a DLNA-compliant media server. A fork of the well-known "PS3 Media Server", it aims especially to improve functionality on TOSHIBA REGZA TVs while preserving compatibility with other renderers. Details: Home Page: http://www32.atwiki.jp/pms_regza
    Downloads: 6 This Week
  • 18
    This is a C library to check the validity of German and Austrian bank account numbers. All test methods currently defined by Deutsche Bundesbank (Dec 2017: 00 to E4) are implemented. Modules for AWK, Perl, PHP, Python, Ruby, C#.net and VB.net are included too. The package also includes an IBAN converter to generate (German) IBANs and BICs from account data. All IBAN rules currently defined by Deutsche Bundesbank are implemented (Dec 2017: 57 rules) and tested against independent solutions.
    Downloads: 3 This Week
  • 19
    EulerSharp

    Euler Yet another proof Engine

    EYE [1] is a reasoning engine supporting the Semantic Web layers [2]. It performs controlled chaining and it supports Euler paths [3]. Via N3 [4] it is interoperable with Cwm [5]. [1] http://eulersharp.sourceforge.net/README [2] http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a [3] http://mathworld.wolfram.com/KoenigsbergBridgeProblem.html [4] http://www.w3.org/TeamSubmission/n3/ [5] http://www.w3.org/2000/10/swap/doc/cwm
    Downloads: 2 This Week
  • 20
    Autoplot

    Autoplot is an interactive browser for data on the web

    Autoplot is an interactive browser for data on the web. Give Autoplot a URL or local file name and it creates a sensible plot of the data. Autoplot allows you to interactively browse data stored in ascii, .cdf, netcdf, and many other formats. Autoplot's source has been moved to GitHub. Thanks to SourceForge for many years of hosting!
    Downloads: 2 This Week
  • 21
    NanoH5 (tsl2nano)

    Java bean / database-driven zero-code application framework

    NanoH5 (or FullRelation) is a fullstack UI implementation framework providing a model driven design (MDA). Build a complete html5 application through a given class- or database-model without coding (coding APIs are available).
    Downloads: 2 This Week
  • 22
    lixa

    LIXA, LIbre XA, is a free and open source XA transaction manager

    ... technology enables every application container, like a web server or a shell, to become a two phase commit application server. The client/server architecture of LIXA allows many application containers to share a single LIXA (state) server: this is ideal when horizontal scalability is a must and many identical application containers must refer to a single transactional environment. LIXA can be used with the C, C++, Java, Python and COBOL programming languages.
    Downloads: 0 This Week
  • 23
    eCxx

    A C++ library for AVR and NodeMCU

    NOTE: This project is marked with 'Status: Abandoned' on SourceForge because not enough time can be dedicated to it. However, it may still get sporadic commits to the repository. eCxx is a library for AVR and NodeMCU tailored for micro LED displays and lighting effects. eCxx uses a Makefile build system. Java- and Python-based applications/tools are also included to ease development and debugging on the host PC. On one side, eCxx supports the original...
    Downloads: 0 This Week
  • 24
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the web using Java regex

    The Files section includes WebCrawlerMySQL.jar, which supports MySQL connections. Please follow this link to get the latest version: https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby or MySQL database and is not lost if the spider is force-closed. - Free web spider, parser, extractor, crawler - Extraction of emails, phone numbers, and custom text from the web - Export...
    Downloads: 1 This Week
  • 25
    Repeat

    Mouse/keyboard record/replay and automation hotkeys/macros creation

    Full-fledged mouse/keyboard record/replay and automation hotkeys/macros creation using modern programming languages, plus more advanced automation features. Works across three major OSes: Windows, OSX, and Linux. See more at https://github.com/repeats/Repeat Repeat yourself with some intelligence. Used correctly, this can greatly improve your productivity.
    Downloads: 70 This Week