Showing 88 open source projects for "python web crawler"

  • 1
    This is an ***old archive*** of tools developed to facilitate the use of Creative Commons licenses and metadata. For the most up-to-date representation of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
    Downloads: 8 This Week
  • 2
    elk is a powerful open-source, Python-based command-line web crawler that can recursively search for files and text on websites.
    Downloads: 0 This Week
  • 3
    Nucular Archiving System for creating full-text indices for fielded data. Python API, web, and command-line interfaces. Fast. Very lightweight. Concurrent reads/writes with no possible locking issues. No server process. Proximity. Facets. Funny name.
    Downloads: 0 This Week
  • 4
    A threaded Web graph (power-law random graph) generator written in Python. It can generate a synthetic Web graph of about one million nodes in a few minutes on a desktop machine. It implements a threaded variant of the RMAT algorithm.
    Downloads: 0 This Week
  • 5
    "Filtered communication" is the source code for a website which facilitates collaborative filtering of information on the internet. Users can create "filters", criteria which are defined in English. Activity mode (http://bayleshanks.com/pamv1): aslee
    Downloads: 0 This Week
  • 6
    pyTube is a Python-based command-line YouTube search tool. You can search for videos and display them in your default web browser. Requires Python 2.5 and gdata.
    Downloads: 0 This Week
  • 7
    This project implements a Python API for the Yahoo Search Webservices API. pYsearch is an OO abstraction of the web services, with emphasis on ease of use and extensibility.
    Downloads: 0 This Week
  • 8
    Cortez aims to create a new news service model for RSS and blogging. It provides an environment to create posts, read news through RSS (Atom), and syndicate across multiple blogs.
    Downloads: 0 This Week
  • 9
    FTP crawler provides an easy web interface for searching files on FTP servers, along with a crawler to index those files.
    Downloads: 0 This Week
  • 10
    This project aims to provide an offline version of Wikipedia that can be browsed from a web browser.
    Downloads: 0 This Week
  • 11
    DiskAt is a disk/media catalogue app supporting multiple categories per item, good search, and features that allow it to be used as a movie/DVD/etc. database. Written with PHP/Python/SQLite.
    Downloads: 0 This Week
  • 12
    This is a Python script that parses your irssi logs and loads them into a MySQL database, which you can then use to search and display your logs on the web. It updates the database incrementally from the logs and is ideally run frequently as a cron job.
    Downloads: 0 This Week
  • 13
    Milim fetches the lyrics for your Hebrew songs from the web. The project features plugins for various media-players.
    Downloads: 0 This Week
  • 14
    Ruya is a Python-based, breadth-first, level-based, delayed, event-based crawler for English and Japanese websites. It is aimed solely at developers who want crawling functionality and crawl control in their projects through an API.
    Downloads: 0 This Week
  • 15
    zSearch is a simple Python-based crawler and search engine. Raw HTML is stored in bzip2 archives, the index is created using PyLucene, and Twisted provides the internal HTTP server. Results are sent back as XML over HTTP.
    Downloads: 0 This Week
  • 16
    metamax_en is a simple but very useful web tool for generating HTML meta tags. It can be used to improve the search relevance of your own page. You can also offer it as a free tool in your download area. See: http://www.eudict.eu/metamax_en.html
    Downloads: 0 This Week
  • 17
    This plug-in for Google Desktop is a simple web spider (Könguló is Icelandic for spider) that crawls websites you specify, e.g. intranet websites, and dumps them into Google Desktop. You must install Google Desktop prior to installing the plug-in.
    Downloads: 0 This Week
  • 18
    imgSeekWeb is based on the imgSeek project. The final goal is a distributed, server-side, content-based image search engine.
    Downloads: 0 This Week
  • 19
    CLOD: CL On Demand. Caches Craigslist in your database for free-text searching and alerts.
    Downloads: 0 This Week
  • 20
    Each user can run their own threaded search engine and contribute to a global search database, searching only the sites they want. It is built using TurboGears.
    Downloads: 0 This Week
  • 21
    A configurable knowledge management framework. It works out of the box, but it is meant mainly as a framework for building complex information retrieval and analysis systems. The three major components (Crawler, Analyzer, and Indexer) can also be used separately.
    Downloads: 0 This Week
  • 22
    Eligante is software for archiving, managing, and browsing (with full-text search) all your communications, whether via email, chat (IRC, ICQ, MSN,...), or even messaging websites (hi5, orkut,...).
    Downloads: 0 This Week
  • 23
    A drop-in framework for adding tagging (folksonomy) capabilities to existing applications
    Downloads: 0 This Week
  • 24
    Fast SMB Search is a search engine for local SMB-based networks (e.g., Windows networks). Its key feature is the ability to quickly search for a file in a large network. It also supports FTP search, so the project name is not strictly accurate.
    Downloads: 0 This Week
  • 25
    A web-based search interface tailored to the New Zealand Gazette PDF archive for the NZ library community. A generic Python-based Swish-e search interface.
    Downloads: 0 This Week
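
Several entries above describe breadth-first crawling with level and delay controls (Ruya, for example). The following is a minimal sketch of that general idea, not the code of any listed project. The function name `crawl_breadth_first`, its parameters, and the in-memory `SITE` dictionary are all hypothetical, and the link fetcher is passed in as a plain callable so that the example runs without network access.

```python
from collections import deque
import time


def crawl_breadth_first(start_url, fetch_links, max_level=2, delay=0.0, on_page=None):
    """Visit pages level by level, starting from start_url.

    fetch_links: callable taking a URL and returning an iterable of linked URLs
                 (hypothetical; swap in real HTTP fetching and HTML parsing).
    max_level:   deepest link level to visit (the start page is level 0).
    delay:       seconds to sleep before each visit after the first.
    on_page:     optional callback invoked as on_page(url, level) on each visit.
    """
    seen = {start_url}                 # URLs already queued, to avoid revisits
    queue = deque([(start_url, 0)])    # FIFO queue gives breadth-first order
    visited = []
    while queue:
        url, level = queue.popleft()
        if visited and delay > 0:
            time.sleep(delay)          # simple politeness delay between visits
        if on_page is not None:
            on_page(url, level)        # event hook for the caller
        visited.append((url, level))
        if level >= max_level:
            continue                   # do not expand links past the level limit
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return visited


# Toy in-memory "site" standing in for real pages (illustrative data only).
SITE = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "a"],
    "d": ["e"],
    "e": [],
}

order = crawl_breadth_first("a", lambda url: SITE.get(url, []), max_level=2)
print(order)
```

With `max_level=2`, page `e` is never visited because it is only reachable at level 3. A real crawler would also need URL normalization, error handling, and robots.txt checks, all of which this sketch leaves out.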