Showing 334 open source projects for "python web crawler"

  • 1
    WebMagic

    A scalable web crawler framework for Java

    WebMagic is a simple but scalable crawler framework. It covers the whole crawler lifecycle: downloading, URL management, content extraction, and persistence, and it simplifies the development of a specific crawler. WebMagic has a simple, highly flexible core and a simple API for HTML extraction. It also provides POJO annotations for customizing a crawler, with no configuration needed. Some other...
    Downloads: 0 This Week
  • 2
    Heritrix

    Internet Archive's open-source, web-scale, web crawler project

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. Heritrix (sometimes spelled heretrix, or misspelled or missaid as heratrix/heritix/heretix/heratix) is an archaic word for heiress (woman who inherits). Since our crawler seeks to collect and preserve the digital artifacts of our culture for the benefit of future researchers and generations, this name seemed apt.
    Downloads: 3 This Week
  • 3
    Odigos

    Distributed tracing without code changes

    ...Odigos automatically scales OpenTelemetry collectors based on observability data volume. Manage and configure collectors via a convenient web UI. Installing Odigos takes less than 5 minutes, and requires no code changes.
    Downloads: 0 This Week
  • 4
    Crawlab

    Distributed web crawler admin platform for spider management

    Golang-based distributed web crawler management platform, supporting various languages including Python, NodeJS, Go, Java, and PHP, and various web crawler frameworks including Scrapy, Puppeteer, and Selenium. Use docker-compose for a one-click startup; that way, you don't even have to configure a MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes. ...
    Downloads: 0 This Week
  • 5
    QR Code generator library

    High-quality QR Code generator library in Java, TypeScript/JavaScript

    This project aims to be the best, clearest library for generating QR Codes. My primary goals are flexible options and absolute correctness. The secondary goals are compact implementation size and good documentation comments. This work is an independent implementation based on reading the official ISO specification documents. I believe that my library has a more intuitive API and shorter code length than competing libraries out there. The library is designed first in Java and then ported to...
    Downloads: 16 This Week
  • 6
    Siddhi Core Libraries

    Stream Processing and Complex Event Processing Engine

    Fully open source, cloud-native, scalable, micro streaming, and complex event processing system capable of building event-driven applications for use cases such as real-time analytics, data integration, notification management, and adaptive decision-making. Event processing logic can be written using Streaming SQL queries via graphical and source editors, to capture events from diverse data sources, process and analyze them, integrate with multiple services and data stores, and publish...
    Downloads: 0 This Week
  • 7
    PULSAR

    Distributed pub-sub messaging system

    Apache Pulsar is a cloud-native, distributed messaging and streaming platform originally created at Yahoo! and now a top-level Apache Software Foundation project. Easy to deploy, lightweight compute process, developer-friendly APIs, no need to run your own stream processing engine. Run in production at Yahoo! scale for over 5 years, with millions of messages per second across millions of topics. Expand capacity seamlessly to hundreds of nodes. Low publish latency (< 5ms) at scale with strong...
    Downloads: 0 This Week
  • 8
    Framework Benchmarks

    Source for the TechEmpower Framework Benchmarks project

    ...Please feel free to ask questions here. We encourage new frameworks and contributors to ask questions. We're here to help! This project provides representative performance measures across a wide field of web application frameworks. With much help from the community, coverage is quite broad and we are happy to broaden it further with contributions. The project presently includes frameworks in many languages, including Go, Python, Java, Ruby, PHP, C#, F#, Clojure, Groovy, Dart, JavaScript, Erlang, Haskell, Scala, Perl, Lua, C, and others. ...
    Downloads: 0 This Week
  • 9
    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use it as a bulk downloader for image galleries, wallpapers, audio/music, video, documents, and other media from supported websites. It can also download sequential website URLs that follow a pattern (e.g. image01.png to image100.png), and its built-in site crawler supports advanced link search or extraction. There is also special support for forum media and open directory downloading. It's a programmable downloader and also works with password-protected sites. Say goodbye to downloading one...
    Downloads: 237 This Week
  • 10
    HEALPix

    Data Analysis, Simulations and Visualization on the Sphere

    Software for pixelization, hierarchical indexation, synthesis, analysis, and visualization of data on the sphere. Please acknowledge HEALPix by quoting the web page http://healpix.sourceforge.net (or https://healpix.sourceforge.io) and publication: K.M. Gorski et al., 2005, Ap.J., 622, p.759. Full software documentation is available at https://healpix.sourceforge.io/documentation.php Wiki Pages: https://sourceforge.net/p/healpix/wiki/Home Exchanging Data with HEALPix (in FITS files):...
    Downloads: 433 This Week
  • 11
    The Sashimi project hosts the Trans-Proteomic Pipeline (TPP), a mature suite of tools for mass-spec (MS, MS/MS) based proteomics: statistical validation, quantitation, visualization, and converters from raw MS data to the open mzML/mzXML formats.
    Downloads: 243 This Week
  • 12
    ZK - Simply Ajax and Mobile
    ZK is an open-source Java framework for building modern web and mobile applications. It enables developers to create rich, interactive UIs using only Java — no JavaScript required. With 200+ Ajax-powered components, event-driven architecture, and support for popular technologies like Spring, Java EE, and JSP/JSF, ZK makes it simple to deliver powerful and user-friendly web applications.
    Downloads: 44 This Week
  • 13
    NanoH5 (tsl2nano)

    Java bean / database-driven zero-code application framework

    NanoH5 (or FullRelation) is a full-stack UI implementation framework providing a model-driven design (MDA). Build a complete HTML5 application from a given class or database model without coding (coding APIs are available).
    Downloads: 774 This Week
  • 14
    YehDown

    A video downloader for YouTube, Vimeo, and all major sites

    A video downloader. The official home page for the YehDown tool, with new feature updates, is at https://Yehdown.yehigo.com The new YehDown software downloads videos with improved download speed. The current update has a user-friendly UI. The tool supports live, real-time updates for new features. Tested on Windows 11.
    Downloads: 9 This Week
  • 15
    EulerSharp

    Euler Yet another proof Engine

    EYE [1] is a reasoning engine supporting the Semantic Web layers [2]. It performs controlled chaining and it supports Euler paths [3]. Via N3 [4] it is interoperable with Cwm [5]. [1] http://eulersharp.sourceforge.net/README [2] http://www.w3.org/DesignIssues/diagrams/sweb-stack/2006a [3] http://mathworld.wolfram.com/KoenigsbergBridgeProblem.html [4] http://www.w3.org/TeamSubmission/n3/ [5] http://www.w3.org/2000/10/swap/doc/cwm
    Downloads: 1 This Week
  • 16
    Autoplot

    Autoplot is an interactive browser for data on the web

    Autoplot is an interactive browser for data on the web. Give Autoplot a URL or local file name and it creates a sensible plot of the data. Autoplot allows you to interactively browse data stored in ascii, .cdf, netcdf, and many other formats. Autoplot's source has been moved to GitHub. Thanks to SourceForge for many years of hosting!
    Downloads: 1 This Week
  • 17
    StrongKey FIDO Server (SKFS)

    FIDO® Certified StrongKey FIDO Server (SKFS)

    An open source implementation of the FIDO2 protocol to support passwordless strong authentication using public-key cryptography. Supports registration, authentication (all platforms), and transaction authorization (for native Android apps).
    Downloads: 14 This Week
  • 18
    This is a C library to check the validity of German and Austrian bank account numbers. All test methods currently defined by Deutsche Bundesbank (Dec 2017: 00 to E4) are implemented. Modules for AWK, Perl, PHP, Python, Ruby, C#.net and VB.net are included too. The package also includes an IBAN converter to generate (German) IBANs and BICs from account data. All IBAN rules currently defined by Deutsche Bundesbank are implemented (Dec 2017: 57 rules) and tested against independent solutions.
    Downloads: 4 This Week
  • 19
    lixa

    LIXA, LIbre XA, is a free and open source XA transaction manager

    ...LIXA is a Transaction Manager but it's not a Transaction Monitor: this is the distinguishing feature of the project. LIXA technology enables every application container, like a web server or a shell, to become a two phase commit application server. The client/server architecture of LIXA allows many application containers to share a single LIXA (state) server: this is ideal when horizontal scalability is a must and many identical application containers must refer to a single transactional environment. LIXA can be used with the C, C++, Java, Python and COBOL programming languages.
    Downloads: 1 This Week
  • 20
    eCxx

    A C++ library for AVR and NodeMCU

    NOTE: This project is marked with 'Status: Abandoned' on SourceForge because not enough time can be dedicated to it. However, it may still get sporadic commits to the repository. eCxx is a library for AVR and NodeMCU tailored for micro LED displays and lighting effects. eCxx uses a Makefile build system. Java- and Python-based applications/tools are also included to ease development and debugging on the host PC. On one side, eCxx supports the original...
    Downloads: 0 This Week
  • 21
    ACHE Focused Crawler

    ACHE is a web crawler for domain-specific search

    ACHE is a focused web crawler. It collects web pages that satisfy some specific criteria, e.g., pages that belong to a given domain or that contain a user-specified pattern. ACHE differs from generic crawlers in the sense that it uses page classifiers to distinguish between relevant and irrelevant pages in a given domain. A page classifier can be defined as a simple regular expression (e.g., one that matches every page that contains a specific word) or a machine-learning-based classification model. ...
    Downloads: 0 This Week
  • 22

    PMS for REGZA

    A DLNA-compliant UPnP Media Server

    PMS for REGZA is a DLNA-compliant Media Server. As a fork of the well-known "PS3 Media Server", it aims especially to improve functionality on TOSHIBA REGZA TVs while preserving compatibility with other renderers. Details: Home Page: http://www32.atwiki.jp/pms_regza
    Downloads: 9 This Week
  • 23
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the Web using Java regex

    The Files section includes WebCrawlerMySQL.jar, which supports MySQL connections. Free web spider and crawler that extracts information from the Web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed. Features: free web spider, parser, extractor, and crawler; extraction of emails, phone numbers, and custom text from the Web; export to Excel file; data saved into Derby and MySQL databases; written in Java, cross-platform. See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 3 This Week
  • 24
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers, and custom text from the Web using Java regex

    The Files section includes WebCrawlerMySQL.jar, which supports MySQL connections. Please follow this link to get the latest version: https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free web spider and crawler that extracts information from the Web by parsing millions of pages. Data is stored in a Derby or MySQL database and is not lost if the spider is force-closed. Features: free web spider, parser, extractor, and crawler; extraction of emails, phone numbers, and custom text from the Web; export to Excel file; data saved into a Derby database; written in Java, cross-platform. See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 0 This Week
  • 25
    Repeat

    Mouse/keyboard record/replay and automation hotkeys/macros creation

    Full-fledged mouse/keyboard record/replay and automation hotkeys/macros creation using modern programming languages, plus more advanced automation features. Works across three major OSes: Windows, OSX, and Linux. See more at https://github.com/repeats/Repeat Repeat yourself with some intelligence. This, if used correctly, can improve your productivity greatly.
    Downloads: 87 This Week