Showing 921 open source projects for "python web crawler"

  • 1
    Zero Install
    Zero Install is a decentralised cross-distribution software installation system. Create one package that works everywhere! With dependency handling and automatic updates, full support for shared libraries, and integration with native package managers.
    Downloads: 2,405 This Week
  • 2
    googler

    Google Search, Google Site Search, Google News from the terminal

    googler is a power tool to Google (Web & News) and Google Site Search from the command-line. It shows the title, URL and abstract for each result, which can be directly opened in a browser from the terminal. Results are fetched in pages (with page navigation). Supports sequential searches in a single googler instance. googler was initially written to cater to headless servers without X. You can integrate it with a text-based browser. However, it has grown into a very handy and flexible...
    Downloads: 1 This Week
  • 3
    TimothyDocs

    Timothy is a cloud-based storage system designed to document your work

    Timothy is a cloud-based documentation system. Timothy will document any endeavor because it stores not only the documents created during the project but also information about those files. Like most storage schemes, Timothy creates a hierarchy of categories through which one may browse. Timothy displays information about the document or category as well as its name. This use of metadata explains the structure and content of the project to the user as he browses. Users...
    Downloads: 0 This Week
  • 4

    PHP mini vulnerability suite

    Multiple server/webapp vulnerability scanner

    github: https://github.com/samedog/phpmvs
    Downloads: 0 This Week
  • 5
    Decentralized Internet

    SDK for building decentralized web and distributed computing projects

    This project was created in order to support a new internet. One that is more open, free, and censorship-resistant in comparison to the old internet. An internet that eventually wouldn't need to rely on telecom towers, an outdated grid, or all these other "old school" forms of tech. We believe P2P compatibility is an important part of the future of the net. Grid Computing also plays a role in having a better means of transferring information in a speedy, more cost-efficient and reliable manner.
    Downloads: 0 This Week
  • 6
    LymPHOS2

    LymPHOS2 Web-App

    LymPHOS2 is a web-based Application at www.LymPHOS.org containing peptidic and protein sequences and spectrometric information on the PhosphoProteome of human T-Lymphocytes. - Nguyen, TD., Vidal-Cortes, O., Gallardo, Ó., Abian, J., Carrascal, M., LymPHOS 2.0: an update of a phosphosite database of primary human T cells. Database 2015, 2015. DOI: 10.1093/database/bav115 - Carrascal, M., Ovelleiro, D., Casas, V., Gay, M., Abian, J., Phosphorylation analysis of primary human T lymphocytes...
    Downloads: 0 This Week
  • 7
    istSOS

    Free and Open Source Sensor Observation Service Data Management System

    istSOS is an OGC SOS server implementation written in Python. istSOS allows managing and dispatching observations from monitoring sensors according to the Sensor Observation Service standard. The project also provides a graphical user interface that eases daily operations and a RESTful web API for automating administration procedures. istSOS is released under the GPL License and runs on all major platforms (Windows, Linux, Mac OS X), even though tests were conducted under a Linux environment.
    Downloads: 58 This Week
  • 8
    SFM2Web reads text and database files encoded with SFMs (Standard Format Markers) and then generates a web site according to flags specified in control files. This is useful for web publication of MDF lexicons, USFM Bible books, texts, phrasebooks, etc.
    Downloads: 0 This Week
  • 9
    CountBookmarks

    Makes a detailed count of your browser bookmarks by folder

    This simple program performs a detailed count of exported web browser bookmarks by folder. Its output file can be imported into a spreadsheet and sorted to show the relative size of all your bookmark folders.
    Downloads: 0 This Week
  • 10
    BotSlayer

    BotSlayer Community Edition

    BotSlayer is an application that helps track and detect potential manipulation of information spreading on Twitter. The tool is developed by the Observatory on Social Media at Indiana University, the same lab that brought you Botometer and Hoaxy. BotSlayer is not a tool to detect and remove likely social bots from your list of Twitter followers or friends. For that purpose, check out Botometer. If you just want to visualize the spread of some piece of information, consider Hoaxy....
    Downloads: 0 This Week
  • 11

    pyindi-client

    Python binding to the libindi library

    ...PyQt applications may also be built on top of IndiClient, thus allowing rapid development of GUI Indi clients. Besides Python there are also bindings for node.js, Tcl (incomplete) and PHP (not useful). As application examples you will find a Python Websocket server with which you may build a web application interacting with Indi servers, and a simple PyQt application similar to the Kstars Indi Control Panel (was built as an exercise). Finally there is an equatorial mount 3D simulator written with Freecad and Python, planned to be connected with the PyIndi module. *** The pyindi-client binding has moved to github...
    Downloads: 0 This Week
  • 12
    AET

    Detects visual changes on websites and performs page health checks

    AET is a system that detects visual changes on websites and performs basic page health checks (like w3c compliance, accessibility, HTTP status codes, JS Error checks and others). AET is designed as a flexible system that can be adapted and tailored to the regression requirements of a given project. The tool has been developed to aid front-end client-side layout regression testing of websites or portfolios, in essence assessing the impact or change of a website from one snapshot to the next.
    Downloads: 8 This Week
  • 13
    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 0 This Week
  • 14
    YouTube Video Downloader

    Allows you to download YouTube videos in a video or audio format.

    YouTube Video Downloader by Chase. This tool, developed in Python, uses web scraping to fetch videos from YouTube and download them to your machine in a video or audio format. It has an easy-to-use GUI with a dark theme.
    Downloads: 2 This Week
  • 15
    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of the features of Scrapy, it lets you dynamically create and manage spiders via the Django admin interface. With Django Dynamic Scraper (DDS) you can define your Scrapy scrapers dynamically via the Django admin interface and save your scraped items in the database you defined for your Django project. Since it simplifies things, DDS is not usable for all kinds of scrapers, but...
    Downloads: 0 This Week
  • 16
    Jupyter Server Proxy

    Jupyter notebook server extension to proxy web services.

    Jupyter Server Proxy lets you run arbitrary external processes (such as RStudio, Shiny Server, Syncthing, PostgreSQL, Code Server, etc) alongside your notebook server and provide authenticated web access to them using a path like /rstudio next to others like /lab. Alongside the Python package that provides the main functionality, the JupyterLab extension (@jupyterhub/jupyter-server-proxy) provides buttons in the JupyterLab launcher window to get to RStudio for example.
    Downloads: 0 This Week
  • 17
    Transcrypt

    Python in the Browser

    Lean and mean Python 3.6 to JavaScript compiler. Supports multiple inheritance, operator overloading and Python source level debugging, even of minified JavaScript files. Transcrypt code is as fast and compact as its JavaScript counterpart, and it is precompiled for page load speed. You can now develop your web applications completely in Python, with full access to any JavaScript library.
    Downloads: 0 This Week
  • 18

    gdpr

    Tool to maintain a GDPR data protection declaration

    Admins often maintain multiple web pages, each of which under EU-GDPR requires a privacy statement. In order to keep them coherent, up-to-date and at the same time avoiding doing the same work multiple times, this project provides a tool to automatically create the appropriate statements for each page from a single source. The project is currently available in PHP, however if anyone is willing to provide a version in Python or Perl or whatever, it is more than welcome. ...
    Downloads: 0 This Week
  • 19
    Twitter Intelligence

    Twitter Intelligence OSINT project performs tracking and analysis

    A project written in Python for Twitter tracking and analysis without using the Twitter API. This project is a Python 3.x application. The package dependencies are in the file requirements.txt. Run that command to install the dependencies. SQLite is used as the database. Tweet data is stored in the Tweet, User, Location, Hashtag, and HashtagTweet tables. The database is created automatically. analysis.py performs analysis processing. User, hashtag, and location analyses are performed. You must write...
    Downloads: 0 This Week
  • 20
    pyspider

    A powerful Spider (Web Crawler) system in Python

    pyspider is a powerful Spider (Web Crawler) system in Python. Components are connected by a message queue. Every component, including the message queue, runs in its own process/thread and is replaceable. That means that when processing is slow, you can run many instances of the processor and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast.
    Downloads: 0 This Week
  • 21
    OpenSearchServer Search Engine

    An open source search engine with RESTful API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST/API, Ruby, Rails, Node.js, PHP, Perl) you will be able to quickly and easily integrate advanced full-text search capabilities in your application: full-text with basic semantics, join queries, boolean queries, facet and filter, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on...
    Downloads: 7 This Week
  • 22
    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can set up a multi-threaded web crawler in a few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The shouldVisit function decides whether the given URL should be crawled or not.
    Downloads: 0 This Week
  • 23

    PiHass

    Pre-defined and easy to use Home-Assistant image for Raspberry Pi

    This is a Raspbian Stretch base image with Home-Assistant on it. I used a virtual-env-based installation and added some custom UI and custom components. I have also configured a MySQL server and database, along with some scripts, sensors and groups to help users start working with the system.
    Downloads: 0 This Week
  • 24
    Holarse

    website software for holarse

    HolaCMS 3 Source Code which will power the new Holarse website.
    Downloads: 0 This Week
  • 25

    survol

    RDF-based framework monitoring business systems activity

    A Python agent and a web interface that aim to help analyze and investigate a legacy application: a set of machines, processes, databases, programs, etc., all communicating with each other, manipulating your data, and whose software architecture has become, over time, complicated, difficult to understand, and undocumented. Data are aggregated with an RDF inference engine, creating a global view of the business information processing.
    Downloads: 0 This Week