Showing 1049 open source projects for "python web crawler"

  • 1
    CountBookmarks

    Makes a detailed count of your browser bookmarks by folder

    This simple program performs a detailed count of exported web browser bookmarks by folder. Its output file can be imported into a spreadsheet and sorted to show the relative size of all your bookmark folders.
    Downloads: 0 This Week
  • 2
    AET

    Detects visual changes on websites and performs page health checks

    AET is a system that detects visual changes on websites and performs basic page health checks (such as W3C compliance, accessibility, HTTP status codes, JS error checks and others). AET is designed as a flexible system that can be adapted and tailored to the regression requirements of a given project. The tool was developed to aid front-end, client-side layout regression testing of websites and portfolios, assessing how a website changes from one snapshot to the next.
    Downloads: 0 This Week
  • 3
    X-RAY

    The next web scraper, see through the <html> noise

    Supports strings, arrays, arrays of objects, and nested object structures. The schema is not tied to the structure of the page you're scraping, allowing you to pull the data in the structure of your choosing. The API is entirely composable, giving you great flexibility in how you scrape each page. Paginate through websites, scraping each page. X-ray also supports a request delay and a pagination limit. Scraped pages can be streamed to a file, so if there's an error on one page, you won't...
    Downloads: 1 This Week
  • 4

    pyindi-client

    Python binding to the libindi library

    ... there are also bindings for Node.js, Tcl (incomplete) and PHP (not useful). As application examples you will find a Python WebSocket server, with which you can build a web application that interacts with INDI servers, and a simple PyQt application similar to the KStars INDI Control Panel (built as an exercise). Finally, there is an equatorial mount 3D simulator written with FreeCAD and Python, planned to be connected to the PyIndi module. *** The pyindi-client binding has moved to GitHub. ***
    Downloads: 0 This Week
  • 5
    Requests-HTML

    Pythonic HTML Parsing for Humans

    This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When using this library you automatically get: full JavaScript support (using Chromium, thanks to pyppeteer); CSS selectors (a.k.a. jQuery-style, thanks to PyQuery); XPath selectors, for the faint of heart; a mocked user-agent (like a real web browser); automatic following of redirects; connection pooling and cookie persistence; the Requests experience you know and love, with magical parsing...
    Downloads: 1 This Week
  • 6
    django-dynamic-scraper

    Creating Scrapy scrapers via the Django admin interface

    Django Dynamic Scraper (DDS) is an app for Django built on top of the scraping framework Scrapy. While preserving many of the features of Scrapy, it lets you dynamically create and manage spiders via the Django admin interface. With Django Dynamic Scraper (DDS) you can define your Scrapy scrapers dynamically via the Django admin interface and save your scraped items in the database you defined for your Django project. Since it simplifies things, DDS is not usable for all kinds of scrapers, but...
    Downloads: 0 This Week
  • 7
    Rendora

    dynamic server-side rendering using headless Chrome

    Rendora is a dynamic renderer that provides zero-configuration server-side rendering, mainly to web crawlers, to effortlessly improve SEO for websites developed in modern JavaScript frameworks such as React.js, Vue.js, Angular.js, etc. Rendora works totally independently of your frontend and backend stacks. Rendora can be seen as a reverse HTTP proxy server sitting between your backend server (e.g. Node.js/Express.js, Python/Django, etc.) and potentially your frontend proxy server (e.g...
    Downloads: 0 This Week
  • 8
    Jupyter Server Proxy

    Jupyter notebook server extension to proxy web services.

    Jupyter Server Proxy lets you run arbitrary external processes (such as RStudio, Shiny Server, Syncthing, PostgreSQL, Code Server, etc.) alongside your notebook server and provide authenticated web access to them using a path like /rstudio next to others like /lab. Alongside the Python package that provides the main functionality, the JupyterLab extension (@jupyterhub/jupyter-server-proxy) provides buttons in the JupyterLab launcher window to get to RStudio, for example.
    Downloads: 1 This Week
  • 9
    YouTube Video Downloader

    Allows you to download YouTube videos in a video or audio format.

    YouTube Video Downloader by Chase is a tool developed in Python. It uses web scraping to fetch videos from YouTube and download them to your machine in a video or audio format, and it has an easy-to-use GUI with a dark theme.
    Downloads: 1 This Week
  • 10
    Twitter Intelligence

    Twitter Intelligence OSINT project performs tracking and analysis

    A project written in Python for Twitter tracking and analysis without using the Twitter API. This project is a Python 3.x application. The package dependencies are listed in the file requirements.txt; install them before running the application. SQLite is used as the database. Tweet data is stored in the Tweet, User, Location, Hashtag and HashtagTweet tables. The database is created automatically. analysis.py performs the analysis processing: user, hashtag, and location analyses are performed. You must write...
    Downloads: 1 This Week
  • 11
    Grab Framework Project

    Web Scraping Framework

    Grab is a Python framework for building web scrapers. With Grab you can build web scrapers of varying complexity, from simple 5-line scripts to complex asynchronous website crawlers processing millions of web pages. Grab provides an API for performing network requests and for handling the received content, e.g. interacting with the DOM tree of the HTML document. The single request/response API lets you build a network request, perform it, and work with the received content. The API is built...
    Downloads: 0 This Week
  • 12
    pyspider

    A powerful Spider (Web Crawler) system in Python

    pyspider is a powerful Spider (Web Crawler) system in Python. Its components are connected by a message queue. Every component, including the message queue, runs in its own process/thread and is replaceable. That means that when processing is slow, you can run many instances of the processor and make full use of multiple CPUs, or deploy to multiple machines. This architecture makes pyspider really fast. Since pyspider has various components, you can just run pyspider to start a standalone...
    Downloads: 1 This Week
  • 13
    crawler4j

    Open source web crawler for Java

    crawler4j is an open source web crawler for Java which provides a simple interface for crawling the Web. Using it, you can set up a multi-threaded web crawler in a few minutes. You need to create a crawler class that extends WebCrawler. This class decides which URLs should be crawled and handles the downloaded page. The shouldVisit function decides whether the given URL should be crawled or not. In the project's example, shouldVisit rejects .css, .js and media files and only allows pages within...
    Downloads: 2 This Week
  • 14
    Transcrypt

    Python in the Browser

    Lean and mean Python 3.6 to JavaScript compiler. Supports multiple inheritance, operator overloading and Python source-level debugging, even of minified JavaScript files. Transcrypt code is as fast and compact as its JavaScript counterpart, and it is precompiled for page load speed. You can now develop your web applications completely in Python, with full access to any JavaScript library.
    Downloads: 0 This Week
  • 15

    gdpr

    Tool to maintain gdpr data protection declaration

    Admins often maintain multiple web pages, each of which requires a privacy statement under the EU GDPR. To keep them coherent and up to date while avoiding doing the same work multiple times, this project provides a tool that automatically creates the appropriate statement for each page from a single source. The project is currently available in PHP; however, if anyone is willing to provide a version in Python, Perl or another language, it is more than welcome. The project...
    Downloads: 1 This Week
  • 16
    OpenSearchServer Search Engine

    An open source search engine with RESTFul API and crawlers

    OpenSearchServer is a powerful, enterprise-class search engine program. Using the web user interface, the crawlers (web, file, database, etc.) and the client libraries (REST API, Ruby, Rails, Node.js, PHP, Perl), you can quickly and easily integrate advanced full-text search capabilities into your application: full-text search with basic semantics, join queries, boolean queries, facets and filters, document (PDF, Office, etc.) indexing, web scraping, etc. OpenSearchServer runs on Windows...
    Downloads: 20 This Week
  • 17

    blog99

    A blog engine that serves HTML and Gopher

    This is a blog engine for HTML and Gopher. Blog entries are written as HTML files. For HTML, it is an Apache/MySQL/Python application using WSGI. For Gopher, it is a Gophernicus/MySQL/Python application using CGI.
    Downloads: 0 This Week
  • 18

    PiHass

    Pre-defined, easy-to-use Home Assistant image for Raspberry Pi

    This is a Raspbian Stretch base image with Home Assistant on it. I used a virtualenv-based installation and added some custom UI and custom components. I have also configured a MySQL server and database, as well as some scripts, sensors and groups to help users start working with the system.
    Downloads: 0 This Week
  • 19
    Holarse

    Website software for Holarse

    HolaCMS 3 Source Code which will power the new Holarse website.
    Downloads: 0 This Week
  • 20

    survol

    RDF-based framework monitoring business systems activity

    A Python agent and a web interface that help analyze and investigate a legacy application: a set of machines, processes, databases, programs, etc., all communicating with each other and manipulating your data, whose software architecture has over time become complicated, difficult to understand, and undocumented. Data are aggregated with an RDF inference engine, creating a global view of the business information processing.
    Downloads: 0 This Week
  • 21
    icemac.addressbook

    Multi-user address book application accessible through the web.

    Multi-user address book application accessible through the web. Store, edit, search and export addresses, phone numbers, … using a web browser. The code has moved to https://bitbucket.org/icemac/icemac.addressbook (documentation: https://icemacaddressbook.readthedocs.io/en/latest/). For new releases (after 6.0.2), see https://pypi.org/project/icemac.addressbook/#history
    Downloads: 1 This Week
  • 22
    Perl Web Scraping Project

    Web scraping (web harvesting or web data extraction) is data scraping used for extracting data from websites.[1] Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol, or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central...
    Downloads: 1 This Week
  • 23

    Offline Websites

    The Website2Pdf application saves web pages for offline use.

    Favorite web pages can be made available offline as PDF files. Enter a website URL and, with a single click, PDF files are created without losing any CSS or HTML styling. All the web files are retained. Please use the help button before you convert web pages to offline files.
    Downloads: 1 This Week
  • 24

    PortableWinPy

    Portable server for Python web development
    Downloads: 2 This Week
  • 25
    SpiderFoot

    Open Source Intelligence Automation.

    SpiderFoot is an open source intelligence automation tool. Its goal is to automate the process of gathering intelligence about a given target, which may be an IP address, domain name, hostname or network subnet. SpiderFoot can be used offensively, i.e. as part of a black-box penetration test to gather information about the target or defensively to identify what information your organisation is freely providing for attackers to use against you.
    Downloads: 194 This Week
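Sketch for the CountBookmarks entry (result 1). This is a minimal Python illustration of counting bookmarks by folder. It assumes two things: the input is the Netscape bookmark HTML format that Firefox and Chrome use for exports, and the output is a two-column CSV that a spreadsheet can import and sort. The class and function names are hypothetical and are not taken from the project.

```python
import csv
import io
from collections import Counter
from html.parser import HTMLParser


class BookmarkFolderCounter(HTMLParser):
    """Count <A> links per folder in a Netscape-format bookmark export (assumed input)."""

    def __init__(self):
        super().__init__()
        self.path = []               # stack of currently open folder names
        self.pending_folder = None   # text of the last <H3>, waiting for its <DL>
        self.in_h3 = False
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "H3" arrives as "h3".
        if tag == "h3":
            self.in_h3 = True
            self.pending_folder = ""
        elif tag == "dl":
            # A <DL> opens the contents of the most recent <H3> (or the root list).
            self.path.append(self.pending_folder or "(root)")
            self.pending_folder = None
        elif tag == "a":
            self.counts["/".join(self.path)] += 1

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False
        elif tag == "dl" and self.path:
            self.path.pop()

    def handle_data(self, data):
        if self.in_h3:
            self.pending_folder += data.strip()


def count_bookmarks(html_text):
    parser = BookmarkFolderCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


def counts_to_csv(counts):
    """Render folder counts as CSV text that a spreadsheet can import and sort."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["folder", "bookmarks"])
    for folder, n in sorted(counts.items()):
        writer.writerow([folder, n])
    return buf.getvalue()


SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<H1>Bookmarks</H1>
<DL><p>
  <DT><A HREF="https://example.com/">Example</A>
  <DT><H3>Dev</H3>
  <DL><p>
    <DT><A HREF="https://docs.python.org/">Python docs</A>
    <DT><A HREF="https://scrapy.org/">Scrapy</A>
  </DL><p>
</DL><p>
"""

print(counts_to_csv(count_bookmarks(SAMPLE)))
```

Nested folders are reported as slash-joined paths, so sorting the CSV by the second column ranks folders by size.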
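Sketch for the AET entry (result 2). The listing does not describe AET's own comparison logic. This snippet only illustrates the general idea of comparing one snapshot with the next. It assumes the screenshots are already decoded into equal-sized grids of grayscale values (0 to 255). The tolerance and the 1% threshold are arbitrary illustrative defaults, and the function names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale pixels, 0-255, row-major (assumed representation)


def changed_fraction(before: Image, after: Image, tolerance: int = 0) -> float:
    """Fraction of pixels whose value differs by more than `tolerance`."""
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("snapshots must have the same dimensions")
    total = changed = 0
    for row_before, row_after in zip(before, after):
        for p, q in zip(row_before, row_after):
            total += 1
            if abs(p - q) > tolerance:
                changed += 1
    return changed / total if total else 0.0


def is_visual_regression(before: Image, after: Image,
                         tolerance: int = 8, max_changed: float = 0.01) -> bool:
    """Flag a regression when more than `max_changed` of the pixels changed."""
    return changed_fraction(before, after, tolerance) > max_changed


baseline = [[255] * 10 for _ in range(10)]    # a blank 10x10 "screenshot"
current = [row[:] for row in baseline]
current[0][0] = 0                              # one changed pixel: 1% of the image
print(is_visual_regression(baseline, current))  # exactly 1% is not above the threshold
```

The tolerance absorbs small rendering noise such as anti-aliasing, while the threshold decides how much real change counts as a regression.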
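Sketch for the X-RAY entry (result 3). X-ray itself is a JavaScript library, so this is not its API. It is a Python illustration of the pagination behaviour the entry describes: follow the next page, pause between requests, and stop at a page limit. The fetch_page callable and the fake SITE dictionary are hypothetical stand-ins for real HTTP requests and HTML extraction.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page fetcher: returns (items scraped from the page, next page URL or None).
FetchPage = Callable[[str], Tuple[List[dict], Optional[str]]]


def paginate(start_url: str, fetch_page: FetchPage, limit: int = 5,
             delay: float = 1.0,
             sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Follow "next page" links, waiting `delay` seconds between requests,
    and stop after `limit` pages or when there is no next page."""
    url: Optional[str] = start_url
    pages = 0
    while url is not None and pages < limit:
        if pages:
            sleep(delay)              # polite pause before every request after the first
        items, url = fetch_page(url)
        pages += 1
        yield from items              # hand items to the caller as soon as a page is done


# A fake three-page site so the sketch runs without network access.
SITE = {
    "/page/1": ([{"title": "a"}], "/page/2"),
    "/page/2": ([{"title": "b"}], "/page/3"),
    "/page/3": ([{"title": "c"}], None),
}

titles = [item["title"] for item in paginate("/page/1", SITE.__getitem__, limit=2, delay=0.0)]
print(titles)
```

Because paginate is a generator, a caller can write each item out as it arrives instead of waiting for the whole crawl.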
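Sketch for the Requests-HTML entry (result 5). This snippet does not use Requests-HTML. It uses only the standard library's html.parser and urllib.parse.urljoin to show what collecting a page's links and resolving them to absolute URLs involves, which is the kind of chore a parsing library like this is meant to simplify. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip in-page anchors and non-HTTP pseudo-links.
        if href and not href.startswith(("#", "javascript:", "mailto:")):
            self.links.add(urljoin(self.base_url, href))


def absolute_links(html: str, base_url: str) -> set:
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = '<a href="/docs/">Docs</a> <a href="https://other.org/x">X</a> <a href="#top">Top</a>'
print(sorted(absolute_links(page, "https://example.com/index.html")))
```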
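Sketch for the Rendora entry (result 7). The core decision in dynamic rendering is whether a request should get pre-rendered HTML (crawlers) or go straight to the backend (everyone else). The keyword list and excluded path prefixes below are illustrative assumptions, not Rendora's configuration format. The sketch only makes the routing decision; it does not proxy or render anything.

```python
# Hypothetical keyword list; a real deployment would configure its own.
BOT_KEYWORDS = ("bot", "slurp", "crawler", "spider")
EXCLUDED_PREFIXES = ("/api/",)   # hypothetical: never pre-render API calls


def should_prerender(user_agent: str, path: str, method: str = "GET") -> bool:
    """Return True when a request looks like a crawler asking for a page."""
    if method.upper() != "GET":
        return False
    if path.startswith(EXCLUDED_PREFIXES):
        return False
    ua = (user_agent or "").lower()
    return any(keyword in ua for keyword in BOT_KEYWORDS)


def route(user_agent: str, path: str, method: str = "GET") -> str:
    # "renderer": send through headless Chrome; "backend": proxy unchanged.
    return "renderer" if should_prerender(user_agent, path, method) else "backend"


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0"
print(route(googlebot, "/products"), route(firefox, "/products"))
```

Substring matching on the User-Agent is deliberately simple and can misclassify unusual clients. It is shown here only to make the routing split concrete.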
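Sketch for the Jupyter Server Proxy entry (result 8). This is a minimal illustration of path-prefix routing. It assumes a fixed table that maps prefixes such as /rstudio to local ports, and that the prefix is stripped before forwarding. The table and function are hypothetical. The real extension also handles authentication, process launching and the actual request forwarding, none of which is shown.

```python
from typing import Dict, Optional

# Hypothetical prefix -> local port table (RStudio Server commonly listens on 8787).
ROUTES: Dict[str, int] = {"/rstudio": 8787, "/code": 8080}


def resolve(path: str, routes: Dict[str, int] = ROUTES) -> Optional[str]:
    """Map an incoming path to a local backend URL, stripping the prefix."""
    for prefix in sorted(routes, key=len, reverse=True):   # longest prefix first
        # Match whole path segments only, so "/rstudiox" is not captured by "/rstudio".
        if path == prefix or path.startswith(prefix + "/"):
            rest = path[len(prefix):] or "/"
            return f"http://127.0.0.1:{routes[prefix]}{rest}"
    return None   # not proxied: left to the notebook server (/lab, /tree, ...)


print(resolve("/rstudio/session/1"))
```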
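Sketch for the Twitter Intelligence entry (result 10). The table names Tweet, User, Location, Hashtag and HashtagTweet come from the entry. The columns, the hashtag extraction regex and the hypothetical add_tweet and top_hashtags helpers are assumptions. Together they show how a link table like HashtagTweet supports a simple hashtag analysis in SQLite.

```python
import re
import sqlite3

# Table names come from the project description; the columns are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS User     (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS Location (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS Hashtag  (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS Tweet (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER REFERENCES User(id),
    location_id INTEGER REFERENCES Location(id),
    text        TEXT
);
CREATE TABLE IF NOT EXISTS HashtagTweet (
    hashtag_id INTEGER REFERENCES Hashtag(id),
    tweet_id   INTEGER REFERENCES Tweet(id),
    PRIMARY KEY (hashtag_id, tweet_id)
);
"""


def _get_or_create(conn, table, name):
    # `table` is always one of the fixed names above, never user input.
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()[0]


def add_tweet(conn, user, text, location=None):
    user_id = _get_or_create(conn, "User", user)
    location_id = _get_or_create(conn, "Location", location) if location else None
    tweet_id = conn.execute(
        "INSERT INTO Tweet (user_id, location_id, text) VALUES (?, ?, ?)",
        (user_id, location_id, text),
    ).lastrowid
    # Link each distinct (case-folded) hashtag in the text to this tweet.
    for tag in {t.lower() for t in re.findall(r"#(\w+)", text)}:
        conn.execute("INSERT OR IGNORE INTO HashtagTweet VALUES (?, ?)",
                     (_get_or_create(conn, "Hashtag", tag), tweet_id))
    return tweet_id


def top_hashtags(conn, n=5):
    """Hashtags ranked by the number of tweets that use them."""
    return conn.execute(
        """SELECT h.name, COUNT(*) AS uses
           FROM HashtagTweet AS ht JOIN Hashtag AS h ON h.id = ht.hashtag_id
           GROUP BY h.name ORDER BY uses DESC, h.name LIMIT ?""",
        (n,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_tweet(conn, "alice", "Learning #Python and #SQLite", "Berlin")
add_tweet(conn, "bob", "More #python tips")
print(top_hashtags(conn))
```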
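Sketch for the pyspider entry (result 12). This is a minimal, single-process illustration of the architecture the entry describes: stages communicate only through message queues, so a slow stage can be scaled by starting more instances of it. Here, queue.Queue and threads stand in for pyspider's real message queue and processes, FAKE_WEB replaces HTTP fetching, and the function names are hypothetical. Unlike pyspider, these processors do not feed newly discovered URLs back to the scheduler.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_pipeline(urls: Iterable[str], fetch: Callable[[str], str],
                 parse: Callable[[str], object], n_processors: int = 3) -> Dict[str, object]:
    """Scheduler -> fetcher -> processors, each stage connected by a queue."""
    fetch_q = queue.Queue()
    process_q = queue.Queue()
    results = queue.Queue()

    def fetcher():
        while (url := fetch_q.get()) is not None:
            process_q.put((url, fetch(url)))
        for _ in range(n_processors):      # one stop signal per processor
            process_q.put(None)

    def processor():
        while (job := process_q.get()) is not None:
            url, body = job
            results.put((url, parse(body)))

    workers = [threading.Thread(target=fetcher)]
    workers += [threading.Thread(target=processor) for _ in range(n_processors)]
    for w in workers:
        w.start()
    for url in urls:                       # the "scheduler" stage
        fetch_q.put(url)
    fetch_q.put(None)                      # no more URLs
    for w in workers:
        w.join()
    return dict(results.get() for _ in range(results.qsize()))


FAKE_WEB = {"/a": "hello world", "/b": "one two three"}   # stand-in for HTTP fetches
print(run_pipeline(["/a", "/b"], FAKE_WEB.__getitem__, lambda body: len(body.split())))
```

Raising n_processors adds parallel consumers on the same queue without touching the fetcher, which is the scaling property the entry attributes to the message-queue design.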
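Sketch for the crawler4j entry (result 13). crawler4j is written in Java, and its shouldVisit method is overridden in a WebCrawler subclass. The snippet below is only a Python analogue of that kind of URL filter. The extension list and the ALLOWED_PREFIX scope are assumptions, since the listing truncates the allowed domain. Note that URLs with query strings, such as site.css?v=2, slip through this simple regex.

```python
import re

# Stylesheets, scripts and common media/archive extensions to skip (assumed list).
FILTERS = re.compile(r".*\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz)$")
ALLOWED_PREFIX = "https://www.example.com/"   # hypothetical crawl scope


def should_visit(url: str) -> bool:
    """Visit only in-scope URLs that do not point at filtered file types."""
    href = url.lower()
    return not FILTERS.match(href) and href.startswith(ALLOWED_PREFIX)


for u in ["https://www.example.com/docs/page.html",
          "https://www.example.com/static/site.css",
          "https://www.example.com/img/Logo.PNG",
          "https://other.org/page.html"]:
    print(u, should_visit(u))
```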
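Sketch for the gdpr entry (result 15). The project itself is written in PHP. This Python snippet only illustrates the single-source idea, using the standard library's string.Template: one template plus per-site facts yields one statement per page. The template text, site records and field names are hypothetical and are not legal wording.

```python
from string import Template

# Hypothetical single source: one statement template plus per-site facts.
STATEMENT = Template(
    "Privacy statement for $site\n"
    "Controller: $controller\n"
    "This site processes: $data.\n"
)

SITES = [
    {"site": "blog.example.org", "controller": "Jane Doe", "data": ["server logs"]},
    {"site": "shop.example.org", "controller": "Jane Doe",
     "data": ["server logs", "order details"]},
]


def render_statements(sites, template=STATEMENT):
    """Produce one statement per site from the shared template."""
    return {
        s["site"]: template.substitute(
            site=s["site"], controller=s["controller"], data=", ".join(s["data"]))
        for s in sites
    }


for site, text in render_statements(SITES).items():
    print(text)
```

Template.substitute raises KeyError if a site record lacks a field, so an incomplete entry fails loudly instead of producing a half-filled statement.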
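Sketch for the SpiderFoot entry (result 25). Before scanning, a tool of this kind must decide which of the target types named in the entry it was given: an IP address, a network subnet, a domain name or a hostname. This heuristic uses Python's ipaddress module. The function name, the return labels and the two-label domain rule are assumptions, not SpiderFoot's own logic.

```python
import ipaddress
import re

# One DNS label: letters, digits and hyphens, not starting or ending with a hyphen.
HOST_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def classify_target(target: str) -> str:
    """Guess which kind of scan target a string is (heuristic sketch)."""
    t = target.strip().lower()
    try:
        ipaddress.ip_address(t)
        return "ip_address"
    except ValueError:
        pass
    if "/" in t:
        try:
            ipaddress.ip_network(t, strict=False)
            return "network_subnet"
        except ValueError:
            pass
    labels = t.rstrip(".").split(".")
    if len(labels) >= 2 and all(HOST_LABEL.match(label) for label in labels):
        # Crude heuristic: "example.com" is a domain, "www.example.com" a hostname.
        # It misreads multi-label suffixes such as "example.co.uk".
        return "domain_name" if len(labels) == 2 else "hostname"
    raise ValueError(f"unrecognised target: {target!r}")


for t in ["192.0.2.10", "192.0.2.0/24", "example.com", "www.example.com"]:
    print(t, classify_target(t))
```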