Showing 53 open source projects for "python web crawler"

  • 1
    JupyterLab

    JupyterLab computational environment

    JupyterLab is the next-generation web-based user interface for Project Jupyter. Try it on Binder. JupyterLab follows the Jupyter Community Guides. JupyterLab enables you to work with documents and activities such as Jupyter notebooks, text editors, terminals, and custom components in a flexible, integrated, and extensible manner. You can arrange multiple documents and activities side by side in the work area using tabs and splitters. Documents and activities integrate with each other, enabling...
    Downloads: 74 This Week
  • 2
    Render Farm Manager, Project Tracker.

    CGRU: Afanasy render farm manager and RULES project tracker.

    CGRU is an open source CG tools pack that includes the Afanasy render farm manager and the RULES project tracker.
    Downloads: 21 This Week
  • 3
    migrid

    A grid middleware with minimal user and resource requirements

    Minimum intrusion Grid (MiG) is an attempt to design a new platform for Grid computing driven by a stand-alone approach to Grid, rather than by integration with existing systems. The goal of the MiG project is to provide Grid infrastructure where the requirements on users and resources alike are as small as possible (minimum intrusion). MiG strives for minimum intrusion but also seeks to provide a feature-rich and dependable Grid solution. In line with the minimum intrusion concept,...
    Downloads: 0 This Week
  • 4
    lixa

    LIXA, LIbre XA, is a free and open source XA transaction manager

    ... technology enables every application container, like a web server or a shell, to become a two-phase commit application server. The client/server architecture of LIXA allows many application containers to share a single LIXA (state) server: this is ideal when horizontal scalability is a must and many identical application containers must refer to a single transactional environment. LIXA can be used with the C, C++, Java, Python, and COBOL programming languages.
    Downloads: 0 This Week
  • 5
    Pholcus

    Distributed high-concurrency crawler software written in pure Go

    Pholcus is high-concurrency, distributed crawler software written in pure Go, intended only for programming study and research. It supports three operating modes (stand-alone, server, and client) and three interfaces (Web, GUI, and command line), along with simple and flexible rules, concurrent batch tasks, and rich output methods (mysql/mongodb/kafka/csv/excel, etc.). In addition, it supports horizontal and vertical grabbing modes, and a series of advanced...
    Downloads: 0 This Week
  • 6
    Unified Sessions Manager

    Pioneering Private and Public Cloud Management since 2008

    The UnifiedSessionsManager supports the integrated management of user sessions within private clouds comprising heterogeneous IT landscapes of various physical and virtual machines, hypervisor management, and virtual user sessions with remote desktops. For the extracted documentation, see https://sourceforge.net/projects/ctys-doc.
    Downloads: 0 This Week
  • 7

    Ganglia

    Scalable, distributed monitoring system for high-performance computing

    Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters and supports clusters of up to 2000 nodes.
    Downloads: 6 This Week
  • 8
    GloVe

    GloVe model for distributed word representation

    ... are made available under the Public Domain Dedication and License. If the web datasets above don't match the semantics of your end use case, you can train word vectors on your own corpus. The demo.sh script downloads a small corpus consisting of the first 100M characters of Wikipedia. It collects unigram counts, constructs and shuffles co-occurrence data, and trains a simple version of the GloVe model. It also runs a word analogy evaluation script in Python to verify word vector quality.
    Downloads: 0 This Week
  • 9
    The goal of this project is to make it possible to access a Progress database from any external program that can use sockets. The server (broker and agents) is written in Progress 4GL and uses the socket capabilities of Progress V9.
    Downloads: 0 This Week
  • 10

    ReorJS

    Distributed Computing with JavaScript

    Create your own distributed computer that can distribute JavaScript-based applications to any computer with a web browser, headless browser, or Node.js installation. For more information and updates, see our website: http://reorjs.com.
    Downloads: 0 This Week
  • 11
    TaskManager

    TaskManager manages calculation jobs in a computer cluster environment

    TaskManager is an open source infrastructure software for distributing and managing calculation jobs in a Unix computer cluster environment. The TaskManager was designed to control the utilization of a set of hosts even if you are not the administrator of the system. The hosts are embedded in a Unix environment and the user's home directories are mounted on each host. The hosts may have different numbers of CPUs/cores and different kernels. Keep in mind that a user is able to log into each...
    Downloads: 1 This Week
  • 12

    RainforestCluster

    Dynamically manage Amazon EC2 clusters

    RainforestCluster is an Amazon EC2 Python program that manages and load-balances dynamic clusters for maximum workflow flexibility and speed at minimal cost. It lets you quickly and cheaply create dynamic compute clusters in the cloud, which can then run computational pipelines generically. It can also optimize the use of spot instances: idle computers in Amazon's cloud that are available at drastically reduced cost (5x-10x cheaper) but can be terminated at any moment...
    Downloads: 0 This Week
  • 13

    Ganglia Job Monarch

    Batch system monitoring and archiving

    Job Monarch is an add-on to the Ganglia Monitoring System that provides batch job monitoring and archiving plus a graphical overview of clusters and assorted batch systems. Fully supported batch systems: Torque, PBS, and SLURM. Experimental: LSF and SGE.
    Downloads: 0 This Week
  • 14
    Portable Linux

    Portable Ubuntu Linux for Scientific Computing

    Released August 22, 2013: Lubuntu Blends: Biochemistry 13.04 (Raring) v5.44, Linux kernel image 3.8.0-29. Lubuntu Blends are pre-installed Wubi disk image remixes of Ubuntu and Debian Science meta packages. A custom boot loader allows installations to be copied to and automatically booted from most external or USB flash drives. Once up and running, use the earlier Lubuntu Remix README instructions until the documentation is updated...
    Downloads: 1 This Week
  • 15
    WatchTower

    WatchTower is a cloud server monitoring and management tool

    WatchTower is a cloud server monitoring and management tool. It is actually a suite of tools that includes a dashboard and the associated RESTful web services required for managing the servers and services. The dashboard uses PHP/MySQL (requires PHP 5+), HTML, and CSS. It's all open source and very easy to work with and modify. The client I'm using is included and is written in Python. It is currently tested with Python 2.4 and 2.6 but should work with any version. I'm not using anything special...
    Downloads: 0 This Week
  • 16
    Ex-Crawler
    Ex-Crawler is divided into three subprojects (a crawler daemon, a distributed GUI client, and a (web) search engine) that together provide a flexible and powerful search engine supporting distributed computing. More information: http://ex-crawler.sourceforge.net
    Downloads: 2 This Week
  • 17
    A metering and charge-back mechanism for cloud services based on real-time pricing (RTP). This project addresses the need for a dynamic pay-per-use model in the cloud environment.
    Downloads: 0 This Week
  • 18
    HaDeS is a deployment system for large-scale installations. Designed to scale with the number of nodes and to be agnostic with respect to the deployed OS, it has a Web Service interface that allows easy integration into complex SOA systems.
    Downloads: 0 This Week
  • 19
    Spyse is a software framework for building multi-agent systems. It allows Python developers to build distributed intelligent systems of multiple cooperative agents based on FIPA, OWL, SOA, and many others. Spyse is designed for ease of use and fun.
    Downloads: 1 This Week
  • 20
    This project provides a fast distributed image processing system written in Python. It is intended to be used as a service by PHP, Perl, and Python application servers.
    Downloads: 1 This Week
  • 21
    Because of huge customer demand that we are working hard to support, we have not had time to contribute to the open source community in recent months. But this is only a temporary situation: we will be back!
    Downloads: 0 This Week
  • 22
    A highly modular client remote/web services library written in Python that supports multiple protocols and transports through a unified interface. All modules are as independent of each other as possible to ensure high reusability.
    Downloads: 0 This Week
  • 23
    IDEAIS is an enterprise service bus integration platform for software development tools and activities. It uses Web Services (SOAP/HTTP) to integrate best-of-breed software development tools (Eclipse, Subversion, Bugzilla, dotProject, vTiger).
    Downloads: 0 This Week
  • 24
    Universal Information Crawler (Uicrawler) is a fast, precise, and reliable Internet crawler. It is a program/automated script that browses the World Wide Web in a methodical, automated manner and creates an index of the documents it accesses.
    Downloads: 0 This Week
  • 25
    AKIRA aims to create a C++ development framework for building cognitive architectures and complex artificially intelligent agents. Features: KQML, fuzzy logic, neural nets, fuzzy cognitive maps, and DIPRA (a distributed BDI, Belief-Desire-Intention, goals model).
    Downloads: 0 This Week
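The GloVe entry notes that its demo script collects unigram counts and constructs co-occurrence data before training the model. The counting step can be sketched roughly as follows. This is a minimal illustration, not the GloVe project's own code; it assumes a symmetric context window and the 1/d distance weighting the GloVe authors describe (words d positions apart add 1/d to the count), and the function names and default window size are hypothetical. Shuffling and model training are omitted.

```python
from collections import Counter, defaultdict


def unigram_counts(tokens):
    """Count how often each word occurs in the token list."""
    return Counter(tokens)


def cooccurrence_counts(tokens, window=2):
    """Symmetric, distance-weighted co-occurrence counts (illustrative sketch).

    Two words that are d positions apart (1 <= d <= window) add 1/d to both
    X[(w, c)] and X[(c, w)], so nearer context words contribute more.
    """
    X = defaultdict(float)
    for i, word in enumerate(tokens):
        # Scan only to the right; updating both orders keeps X symmetric.
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            context = tokens[j]
            X[(word, context)] += 1.0 / d
            X[(context, word)] += 1.0 / d
    return dict(X)


if __name__ == "__main__":
    corpus = "the cat sat on the mat".split()
    print(unigram_counts(corpus)["the"])  # 2
    X = cooccurrence_counts(corpus, window=2)
    print(X[("the", "sat")])  # 1.0: "the" occurs 2 positions from "sat" twice
```

A real corpus the size of the demo's Wikipedia sample would need a sparse, disk-backed structure rather than an in-memory dict; the dict keyed by word pairs simply keeps the weighting rule easy to read.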
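Since this search is for "python web crawler", the Uicrawler entry's description (a program that browses the web methodically and builds an index of the documents it visits) can be illustrated with a small breadth-first crawler using only the Python standard library. This is a minimal sketch, not code from any listed project. Assumptions: page retrieval is abstracted as a caller-supplied `fetch(url)` function (hypothetical) so the example runs without network access; the index is a plain mapping from lowercase word to the set of URLs containing it; and there is no robots.txt handling, politeness delay, URL-scheme filtering, or error handling.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkAndTextParser(HTMLParser):
    """Collects <a href> targets and text content from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url; returns {word: set of URLs}.

    fetch(url) is a caller-supplied (hypothetical) function that returns the
    page's HTML as a string, or None if the page cannot be retrieved.
    """
    index = {}
    seen = {start_url}          # URLs already queued, so none is fetched twice
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages += 1
        parser = LinkAndTextParser()
        parser.feed(html)
        parser.close()  # flush any text still buffered by the parser
        # Add every lowercase alphanumeric word on the page to the index.
        text = " ".join(parser.text_parts).lower()
        for word in re.findall(r"[a-z0-9]+", text):
            index.setdefault(word, set()).add(url)
        # Resolve relative links, drop #fragments, and queue unseen URLs.
        for href in parser.links:
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


if __name__ == "__main__":
    # A tiny in-memory "web" stands in for real HTTP requests.
    fake_web = {
        "http://example.com/": '<p>Python crawler</p><a href="/a">A</a> <a href="b#top">B</a>',
        "http://example.com/a": '<p>crawler index</p><a href="/">home</a>',
        "http://example.com/b": "<p>python</p>",
    }
    index = crawl("http://example.com/", fake_web.get)
    print(sorted(index["python"]))  # ['http://example.com/', 'http://example.com/b']
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so a real HTTP client, a cache, or a rate limiter could be substituted without changing `crawl`; the `seen` set is what keeps the crawl from revisiting pages that link back to each other.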