Showing 107 open source projects for "web crawler source code"

  • 1
    gain

    Asyncio-based Python framework for building fast web crawling spiders

    Gain is a Python web crawling framework designed to simplify the process of building efficient and scalable web scrapers. It is built on top of asynchronous technologies such as asyncio, aiohttp, and uvloop to support high-performance crawling with concurrent network requests. It provides a structured framework for creating spiders that can navigate websites, extract structured data, and process the collected results. Developers define crawlers using components such as spiders, parsers, and...
    Downloads: 1 This Week
  • 2
    Toapi

    Convert websites into structured APIs automatically with Python tool

    Toapi is a Python library designed to transform ordinary websites into usable API services. Instead of building a traditional web crawler that collects and stores data before exposing it through an API, Toapi simplifies the process by allowing developers to define data structures that automatically generate an API layer from existing web pages. It works by parsing HTML content from a source site and mapping selected elements into structured data that can be returned as JSON through API endpoints. ...
    Downloads: 0 This Week
  • 3
    Holarse

    Website software for Holarse

    HolaCMS 3 source code, which will power the new Holarse website.
    Downloads: 0 This Week
  • 4
    jd-autobuy

    Python tool that automates JD.com login and product purchase tasks

    jd-autobuy is an open source Python-based automation tool designed to simulate the purchasing process on the JD e-commerce platform. It uses web scraping and HTTP request techniques to log into an account, check product availability, and attempt to purchase specified items automatically. It supports login through methods such as QR code authentication, allowing users to sign in through the platform’s mobile application.
    Downloads: 1 This Week
  • 5
    icemac.addressbook

    Multi-user address book application accessible through the web.

    Store, edit, search, and export addresses, phone numbers, … using a web browser. Code moved to https://bitbucket.org/icemac/icemac.addressbook Documentation: see https://icemacaddressbook.readthedocs.io/en/latest/ New releases (after 6.0.2): see https://pypi.org/project/icemac.addressbook/#history
    Downloads: 0 This Week
  • 6
    NTK RTMP SERVER

    Naam Tamilar Web TV Live Streamer

    Naam Tamilar RTMP Server. This project has been released as open source, as a contribution to the Naam Tamilar political party and for its future use. In case I cannot support the party over the long term, I am sharing the source code so that other software developers in the community can help maintain and upgrade it. The source is forked from https://github.com/arut/nginx-rtmp-module and modified with multiple broadcast...
    Downloads: 1 This Week
  • 7

    PyCancerDB

    Cancer Proteomics Database display and management

    PyCancerDB is a source code distribution providing a Web-based interface for browsing and updating the Cancer Proteomics Database, together with scripts for maintaining the database.
    Downloads: 0 This Week
  • 8

    sitecheck

    Modular web site spider for web developers.

    More than just a link checker, sitecheck is a website spider (also known as a crawler) which can assist with SEO by testing an entire site plus both inbound links from search engines and outbound links to other sites for the following issues: looping redirects (HTTP 301/302), broken links (HTTP 404), server errors (HTTP 500), spelling mistakes, low readability scores (using the Flesch Reading Ease test), missing/empty/duplicate meta tags, duplicate content, slow page speed, W3C validation...
    Downloads: 0 This Week
  • 9

    Domain Analyzer Security Tool

    Finds all the security information for a given domain name

    Domain analyzer is a security analysis tool which automatically discovers and reports information about the given domain. Its main purpose is to analyze domains in an unattended way.
    Downloads: 0 This Week
  • 10
    Felix Felicis

    Felix Felicis (aka liquidluck) is a static blog generator in Python

    LiquidLuck is an open-source static site generator written in Python, originally built to enable fast, flexible blog and documentation sites with modern features such as Markdown authoring, custom taxonomies, tags, collections, and theming. It is similar in purpose to Jekyll, Hugo, or Hexo, but emphasizes ease of customization and rich templating, allowing authors to define custom content models and build plug-in extensions. The generator supports live previews and incremental builds so content...
    Downloads: 0 This Week
  • 11
    Framework (scripts, configuration, code) to build free and public services around travel and leisure data. The project makes extensive use of existing data sources such as Geonames and dbPedia, and adds some glue around them (e.g., links).
    Downloads: 0 This Week
  • 12
    UpStage
    WE ARE NO LONGER USING SOURCEFORGE. Please visit http://www.upstage.org.nz for the most up-to-date code (v3 to be released January 2014, beta version available November 2013) and information. UpStage is a web-based venue for cyberformance: artists compile digital media in real time to create live theatrical performances for online audiences.
    Downloads: 0 This Week
  • 13

    Web Crawler Security Tool

    A web crawler oriented to information security.

    Last update on Tue Mar 26 16:25 UTC 2012. Web Crawler Security is a Python-based tool that automatically crawls a web site. It is a web crawler oriented to help in penetration testing tasks. Its main task is to search and list all the links (pages and files) in a web site. The crawler was completely rewritten in v1.0, bringing many improvements: improved data visualization, an interactive option to download files, increased crawling speed, exports list of...
    Downloads: 0 This Week
  • 14
    This is where web developers can get tools that make their lives easier. Web technologies and languages used include, but are not limited to, HTML, XHTML, CSS, JavaScript, PHP, and AJAX. All code is extremely slim, fast running, and W3C compliant.
    Downloads: 1 This Week
  • 15
    This is a "high-end" wiki system based on the Django framework for Python. The intention of this project was to write a transparent wiki system that allows direct editing of HTML on wiki pages. Notice: Recent code is available on the Launchpad(!) project page: https://launchpad.net/aintnowiki/
    Downloads: 0 This Week
  • 16
    The web lint checks HTML and XHTML pages for possible markup problems. It attempts to find problems with your code that an HTML validator does not.
    Downloads: 0 This Week
  • 17
    PTML is a Python module which lets you embed Python code in text documents. Its most common application is dynamic content generation on web servers; however, it can be used anywhere you need to generate text files on the fly.
    Downloads: 0 This Week
  • 18
    zetadb is a Python/Zope tool that allows rapid application development of relational-database-oriented web applications. It generates translatable and user-friendly applications to maintain data over the web. It also implements OpenOffice integration.
    Downloads: 0 This Week
  • 19
    PyH
    A powerful Python module that lets you output HTML code from within a Python script in a very efficient and convenient fashion. Code your web page like a GUI! Create tags and modify their attributes at any time during your script. http://pyh/googlecod
    Downloads: 0 This Week
  • 20
    Cheetah is a template engine and code generation tool, written in Python. Web development is its principal use, but Cheetah is very flexible and is also being used to generate C++ game code, Java, SQL, form emails, and even Python code.
    Downloads: 2 This Week
  • 21
    elk is a powerful open-source Python-based command-line web crawler that can recursively search for files and text on websites.
    Downloads: 0 This Week
  • 22
    "Filtered communication" is the source code for a website which facilitates collaborative filtering of information on the internet. Users can create "filters", criteria which are defined in English. Activity mode (http://bayleshanks.com/pamv1): aslee
    Downloads: 0 This Week
  • 23
    Framework for the development of new Content Types in Zope/CMF/Plone. Schema driven automatic form generation, simple integration with rich content types, and a lower entry bar to the complex requirements Zope places on new content objects.
    Downloads: 7 This Week
  • 24
    Txt2tags converts a text file with minimal markup to HTML, XHTML, SGML, LaTeX, Lout, UNIX Man Page, Wikipedia, Google Code Wiki, DokuWiki, MoinMoin, MagicPoint(mgp), PageMaker. Features: simple, fast, automatic TOC, macros, filters, include, GUI/CLI/
    Downloads: 0 This Week
  • 25
    A toolkit of nitty-gritty classes from real-life projects. Contains generic snippets along with things like server-side DOM implementation or RSA or code generation tools.
    Downloads: 0 This Week