Showing 80 open source projects for "extensible web spider"

  • 1
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers and custom text from the web using Java regex

    The Files section contains WebCrawlerMySQL.jar, which supports MySQL connections. Free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby database and is not lost if the spider is force-closed.
    - Free web spider, parser, extractor and crawler
    - Extraction of emails, phone numbers and custom text from the web
    - Export to Excel file
    - Data saved into Derby and MySQL databases
    - Written in Java, cross-platform
    See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
    Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 4 This Week
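
    The project says it extracts emails and phone numbers with Java regex but does not publish its patterns here. A minimal sketch of that kind of extraction, assuming simplified patterns of our own (the class `ContactExtractor` and both regexes are hypothetical, not the project's code):

    ```java
    import java.util.LinkedHashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Hypothetical helper, not the project's code: extracts e-mail addresses and
    // phone-like strings from page text with java.util.regex.
    class ContactExtractor {
        // Simplified e-mail pattern (assumption); valid addresses can be more varied.
        private static final Pattern EMAIL =
                Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

        // Assumed phone format: optional '+', then digits separated by spaces,
        // dots or dashes, at least 7 characters long.
        private static final Pattern PHONE =
                Pattern.compile("\\+?\\d[\\d .-]{5,}\\d");

        // Returns all distinct matches in first-seen order.
        private static Set<String> findAll(Pattern pattern, String text) {
            Set<String> found = new LinkedHashSet<>();
            Matcher m = pattern.matcher(text);
            while (m.find()) {
                found.add(m.group());
            }
            return found;
        }

        static Set<String> emails(String text) {
            return findAll(EMAIL, text);
        }

        static Set<String> phones(String text) {
            return findAll(PHONE, text);
        }
    }
    ```

    The patterns trade precision for brevity; the phone pattern in particular will also match other long digit runs such as dates or order numbers.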
  • 2
    Eclipse Jetty Canonical Repository

    Eclipse Jetty - Web Container & Clients - supports HTTP/2, HTTP

    Jetty provides a web server and servlet container, additionally providing support for HTTP/2, WebSocket, OSGi, JMX, JNDI, JAAS and many other integrations. These components are open source and are freely available for commercial use and distribution. Jetty is used in a wide variety of projects and products, both in development and production. Jetty has long been loved by developers due to its long history of being easily embedded in devices, tools, frameworks, application servers, and modern...
    Downloads: 2 This Week
  • 3
    Lobo Evolution - Java Web Browser

    Lobo Evolution is an extensible all-Java web browser and RIA platform

    Lobo Evolution is a fork of Lobo Browser that continues the work of Lobo Browser (lobochief). Lobo Evolution is an extensible all-Java web browser and RIA platform. It supports HTML 4, HTML5, JavaScript, CSS 3 and Java (Swing) rendering. CobraEvolution is the web browser's renderer API and also a JavaScript-aware HTML parser. Lobo Evolution 5.0 has been released. CHANGELOG: https://github.com/LoboEvolution/LoboEvolution/releases Read the wiki: https://loboevolution.github.io/LoboEvolution/project-info.html Javadoc site: https://oswetto.github.io/LoboEvolution Now you can fork the project and help me with the code. ...
    Downloads: 4 This Week
  • 4
    TJWS is an open source HTTP server and servlet container written in 100% Java. It is designed to be lightweight, high-performing, secure, embeddable, extensible and flexible. It has a very small footprint (~100K), is CGI and J2EE/JSP compatible, and supports Servlet spec 3.1.
    Downloads: 1 This Week
  • 5
    WFDownloader App

    Free batch downloader for image, wallpaper, video, audio, document,

    Use it as a bulk downloader for image galleries, wallpapers, audio/music, video, documents and other media from supported websites. It can also download sequential website URLs that follow a pattern (e.g. image01.png to image100.png), and its built-in site crawler supports advanced link search and extraction. There is also special support for forum media downloading, offline archiving of forum threads, RSS feed downloading, and open directory downloading. It's a programmable downloader and also...
    Downloads: 358 This Week
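
    Expanding a numbered URL pattern like the image01.png to image100.png example is mostly zero-padding arithmetic. A minimal sketch, assuming a hypothetical `{n}` placeholder in the URL and a caller-supplied minimum digit width (neither is WFDownloader's documented syntax):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch, not WFDownloader's implementation: expands a URL
    // template containing an assumed "{n}" placeholder into a numbered sequence.
    class SequentialUrls {
        static List<String> expand(String template, int from, int to, int width) {
            if (width < 1) {
                throw new IllegalArgumentException("width must be >= 1");
            }
            List<String> urls = new ArrayList<>();
            for (int i = from; i <= to; i++) {
                // Zero-pad to the minimum width, e.g. 1 -> "01" when width is 2.
                String number = String.format("%0" + width + "d", i);
                // String.replace is literal, so '%' in the URL (e.g. %20) is safe.
                urls.add(template.replace("{n}", number));
            }
            return urls;
        }
    }
    ```

    With width 2, numbers below 10 are padded (`image01.png`) while 100 simply grows to three digits (`image100.png`), which matches the range in the example.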
  • 6
    LogicalDOC Document Management - DMS

    smart and open source document management system

    LogicalDOC is both a document management and a collaboration system. The software is loaded with features and lets any organization or individual organize, index, retrieve, control and distribute important business documents securely. Gone are the days when companies used paper-based processes such as printing, mailing and manual filing of paper documents; our document management system replaces all of this with electronic procedures that allow your...
    Downloads: 120 This Week
  • 7
    Scribe
    Scribe is a CMS for the Liferay Portal framework. It includes Web Content Management as well as Learning Management System features.
    Downloads: 0 This Week
  • 8
    eXo Platform - Digital Workplace

    The open-source digital workplace for growing teams and enterprises.

    eXo Platform is an open-source digital workplace solution for growing teams and enterprises, covering: ✅ Internal Communications ✅ Team Collaboration ✅ Knowledge Management ✅ Productivity and Employee Recognition use cases. eXo stands out through: 👍 its fluid and integrated employee experience, on desktop and mobile 👍 the platform’s ease of use 👍 innovative employee engagement features. eXo Platform is developed on open-source technology and supports open...
    Downloads: 10 This Week
  • 9
    Crawlab

    Distributed web crawler admin platform for spider management

    A Golang-based distributed web crawler management platform that supports various languages, including Python, NodeJS, Go, Java and PHP, and various web crawler frameworks, including Scrapy, Puppeteer and Selenium. Use docker-compose to start it up with one command; that way you don't even have to configure the MongoDB database. The frontend app interacts with the master node, which communicates with other components such as MongoDB, SeaweedFS and worker nodes.
    Downloads: 4 This Week
  • 10
    Web Spider, Web Crawler, Email Extractor

    Free tool that extracts emails, phone numbers and custom text from the web using Java regex

    The Files section contains WebCrawlerMySQL.jar, which supports MySQL connections. Please follow this link to get the latest version: https://sourceforge.net/projects/web-spider-web-crawler-extract/
    Free web spider and crawler that extracts information from the web by parsing millions of pages. Data is stored in a Derby or MySQL database and is not lost if the spider is force-closed.
    - Free web spider, parser, extractor and crawler
    - Extraction of emails, phone numbers and custom text from the web
    - Export to Excel file
    - Data saved into Derby database
    - Written in Java, cross-platform
    See also the free email sender: https://sourceforge.net/projects/gitst-free-email-ender/
    Please install Microsoft OpenJDK to start the application: https://www.microsoft.com/openjdk
    Downloads: 1 This Week
  • 11
    Cerberus Content Management System

    Cerberus Content Management System is a monolithic and modular content management system written in 100% pure PHP code with 100% pure HTML output, and it supports multiple database management systems. The Cerberus Content Management System source code is completely handwritten by the author(s). The CerberusCMS project is focused on data security and ease of use, so we have decided to make very little use of JavaScript in the PurePHP releases. The still-secure, and...
    Downloads: 0 This Week
  • 12
    Zeleos
    The Zeleos Project aims to provide a bottom-up open source solution for developing Rich Internet Applications. The Zeleos Web Toolkit is a rich web client application framework designed to be highly extensible and very easy to use.
    Downloads: 1 This Week
  • 13
    phoneutria
    A Java web crawler: multi-threaded, scalable, high-performance, extensible and polite. It can be used to crawl and index any web or enterprise domain and is configurable through an XML configuration file.
    Downloads: 0 This Week
  • 14
    eXtensible Text Framework (XTF)

    Framework for search and display of heterogeneous document collections.

    NOTICE: This code repository is deprecated. Please visit https://github.com/cdlib/xtf for the latest updates. Obsolete Description: The eXtensible Text Framework (XTF) is an architecture that supports searching across collections of heterogeneous textual data (XML, PDF, HTML, text, and more), and the presentation of results and documents in a highly configurable manner. Includes highly customized versions of the proven open-source components Lucene and Saxon.
    Downloads: 0 This Week
  • 15
    Site monitoring

    Monitoring of websites with spider and email notifications

    Free website monitoring software that is easy to set up and use. It is a web application written in Java. You can monitor HTML pages, JSON and XML, the pages in a sitemap, and even your whole website using the spider. You can check multiple websites, verify HTTP result codes, and even check the contents of the monitored pages. Websites are checked periodically using a built-in cron mechanism.
    Downloads: 0 This Week
  • 16
    Pachyderm
    Pachyderm is a web-based, rich-media, interactive (Flash) presentation authoring and publishing system that meets most accessibility requirements. It is built on an extensible template system. Released under the Creative Commons General Public License.
    Downloads: 0 This Week
  • 17
    webStraktor is a programmable World Wide Web data extraction client. Its purpose is to scrape HTML-based content via HTTP and extract relevant information. webStraktor features a scripting language to facilitate the collection, extraction and storage of information available on the web, including images. The scripting language uses elements of regular expression and XPath syntax. The webStraktor scripting language has a small instruction set and its syntax is easy...
    Downloads: 0 This Week
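
    webStraktor's scripting language combines XPath selection with regular expressions. Its actual syntax isn't shown here, so this sketch uses the JDK's standard `javax.xml.xpath` and `java.util.regex` APIs instead: XPath selects nodes, then a regex picks the value out of each node's text. It assumes well-formed XHTML input (most real-world HTML would first need an HTML-to-XML cleanup step); `XPathRegexExtractor` is a hypothetical name, not part of webStraktor.

    ```java
    import java.io.ByteArrayInputStream;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    // Hypothetical sketch: select nodes with XPath, then pull a value out of each
    // node's text with a regex. Assumes the page is well-formed XHTML.
    class XPathRegexExtractor {
        static List<String> extract(String xhtml, String xpathExpr, Pattern valuePattern) {
            try {
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                // Refuse DOCTYPEs so untrusted pages cannot pull in external entities.
                factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
                Document doc = factory.newDocumentBuilder().parse(
                        new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

                XPath xpath = XPathFactory.newInstance().newXPath();
                NodeList nodes = (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);

                List<String> values = new ArrayList<>();
                for (int i = 0; i < nodes.getLength(); i++) {
                    Matcher m = valuePattern.matcher(nodes.item(i).getTextContent());
                    if (m.find()) {
                        // Use the first capture group if there is one, else the whole match.
                        values.add(m.groupCount() > 0 ? m.group(1) : m.group());
                    }
                }
                return values;
            } catch (Exception e) {
                throw new IllegalArgumentException("extraction failed", e);
            }
        }
    }
    ```

    For example, with the XPath `//li[@class='price']` and the regex `(\d+\.\d+)`, a list item reading `Price: 12.50 EUR` yields `12.50`.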
  • 18
    Tornado HTTP Server is a multi-threaded web server written in Java. It aims to be secure, efficient and portable, and to provide a full implementation of HTTP/1.1. Advanced features such as GZip output compression and web-based administration are planned.
    Downloads: 0 This Week
  • 19
    The archive-crawler project is building Heritrix: a flexible, extensible, robust and scalable web crawler capable of fetching, archiving and analyzing the full diversity and breadth of internet-accessible content.
    Downloads: 11 This Week
  • 20
    A web spider written in Java that can be used either as a standalone application or as the core of other applications that build on its functionality.
    Downloads: 0 This Week
  • 21
    ItSucks
    This project is a Java web spider (web crawler) that can download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available as a separate library.
    Downloads: 1 This Week
  • 22
    ABK (secure)SiteHoster
    ABK SiteHoster is aLEHNS (a Lightweight Extensible HTTP Network Server), developed in pure Java. It currently supports a subset of the HTTP/1.1 protocol, and features are being added to make it fully compliant. It aims to be a full-fledged website server with all Web Service
    Downloads: 0 This Week
  • 23
    An automated website testing framework. It includes a utility that spiders a site to determine its content and a variety of testing plugins that check the content for validity and accessibility. A report with the test results is then generated.
    Downloads: 0 This Week
  • 24
    Other spiders have a limited link depth, follow links in a non-randomized order, or are combined with heavy indexing machines. This spider has no link depth limit and randomly chooses the next URL to check for new URLs.
    Downloads: 0 This Week
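
    A spider with no depth limit that picks the next URL at random can be sketched as a frontier from which a random element is removed on each step. This is an illustration, not the project's code: `RandomFrontierSpider`, the `fetchLinks` callback (standing in for downloading and parsing a page) and the `maxPages` budget are all assumptions; the budget is there because, without a depth limit, something else has to stop the crawl.

    ```java
    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Random;
    import java.util.Set;
    import java.util.function.Function;

    // Hypothetical sketch, not the project's code: a spider with no link-depth
    // limit that picks the next URL to visit at random from its frontier.
    class RandomFrontierSpider {
        static List<String> crawl(String seed, Function<String, List<String>> fetchLinks,
                                  Random random, int maxPages) {
            List<String> frontier = new ArrayList<>(List.of(seed));
            Set<String> seen = new HashSet<>(frontier);   // URLs already queued
            List<String> visited = new ArrayList<>();

            while (!frontier.isEmpty() && visited.size() < maxPages) {
                // Pick a random frontier entry; swap in the last one and drop the tail (O(1)).
                int i = random.nextInt(frontier.size());
                String url = frontier.get(i);
                frontier.set(i, frontier.get(frontier.size() - 1));
                frontier.remove(frontier.size() - 1);

                visited.add(url);
                // No depth is tracked: every newly discovered link is eligible.
                for (String link : fetchLinks.apply(url)) {
                    if (seen.add(link)) {
                        frontier.add(link);
                    }
                }
            }
            return visited;
        }
    }
    ```

    Removing the chosen element by swapping in the last one keeps each pick O(1); the `seen` set ensures each URL is queued only once even when pages link back to each other.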
  • 25
    Groovy + Facelets = Gracelets. This combination lets you develop, prototype and live-edit your JSF views, controllers and libraries in the Groovy language.
    Downloads: 0 This Week