Showing 19 open source projects for "web archive extractor"

View related business solutions
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • Forever Free Full-Stack Observability | Grafana Cloud Icon
    Forever Free Full-Stack Observability | Grafana Cloud

    Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

    Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.
    Create free account
  • 1
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby and MySQL Database - Written in Java Cross Platform Also See Free email Sender : https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    newpipeextractor

    newpipeextractor

    Library for extracting streaming site data without official APIs

    ...It handles many low-level tasks involved in web data extraction, including parsing responses, managing platform-specific logic, and handling errors, allowing developers to focus on implementing application features rather than scraping mechanics. Each supported service is implemented through its own extractor components that conform to a common interface, enabling consistent access to data across different platforms.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 3
    ResCarta

    ResCarta

    Archive your personal history

    ResCarta Toolkit offers an open source solution to creating, storing, viewing, and searching digital collections. Applications in the toolkit let users create and edit metadata, convert data to open standard ResCarta format, index and host collections.
    Leader badge
    Downloads: 38 This Week
    Last Update:
    See Project
  • 4
    Web Spider, Web Crawler, Email Extractor

    Web Spider, Web Crawler, Email Extractor

    Free Extracts Emails, Phones and custom text from Web using JAVA Regex

    In Files there is WebCrawlerMySQL.jar which supports MySql Connection Please follow this link to get latest version https://sourceforge.net/projects/web-spider-web-crawler-extract/ Free Web Spider & Crawler. Extracts Information from Web by parsing millions of pages. Store data into Derby OR MySQL Database and data are not being lost after force closing the spider. - Free Web Spider , Parser, Extractor, Crawler - Extraction of Emails , Phones and Custom Text from Web - Export to Excel File - Data Saved into Derby Database - Written in Java Cross Platform See also Free Email Sender in this link: https://sourceforge.net/projects/gitst-free-email-ender/ Please install Microsoft OpenJDK to start the application https://www.microsoft.com/openjdk
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    MyCoRe

    MyCoRe

    your repository framework

    MyCoRe is an Open Source project for the development of Repositories, Digital Library and archive solutions. The technical base of the system is formed of Java class libraries, XML technology and different database backends. Since 2015 we use https://mycore.atlassian.net/ for bug tracking. Please use our ticket system there.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    Piggydb

    Piggydb

    Piggydb helps you have more fun with knowledge creation.

    Piggydb is a flexible and scalable knowledge building platform that supports a heuristic or bottom-up approach to discover new concepts or ideas based on your input. You can begin with using it as a flexible outliner, diary or notebook, and as your database grows, Piggydb helps you to shape or elaborate your own knowledge. Piggydb is a Web application provided as a self-contained package that contains a Web server and database engine. With Piggydb, you can create highly structured content...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 7
    The archive-crawler project is building Heritrix: a flexible, extensible, robust, and scalable web crawler capable of fetching, archiving, and analyzing the full diversity and breadth of internet-accesible content.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 8
    HttpdBase4J is an embeddable Java web server framework that supports HTTP, HTTPS, templated content and serving content from inside an archive.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 9
    DoxMentor4J is a standalone cross platform Web/Ajax based documentation library that is fully searchable and may be hosted in the file system, in an archive or embedded in the Java classpath.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build, govern, and optimize agents and models with Gemini Enterprise Agent Platform.
    Start Free
  • 10
    A tool that can extract data from the Calibre SQLite database, including a command line tool that generates OPDS catalogs. In everyone's words, it takes the metadata out of Calibre, and generates catalogs for Stanza, Aldiko and web browsers.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 11
    This is an ***old archive*** of tools developed for facilitating the use of Creative Commons licenses and metadata. --- For the most up to date representation of any of the projects listed here, please see: http://creativecommons.org/project/Developer.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 12
    RSS EXTRACTOR is a java library for generating RSS newsfeeds considering the RSS web feeds from multiple websites. It extracts the best of newsfeed entries and a produces a RSS file which is a fusion of newsfeed entries from several websites.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    agorum core is the Document Management System (DMS) with the drive-interface. It allows access to the stored documents with a network share. It offers a wide range of DMS-Functionality:Search,Versioning,Archive,Workflow,... See http://www.agorum.com
    Downloads: 0 This Week
    Last Update:
    See Project
  • 14
    SCAM is a development environment for building metadata stores for RDF and the Semantic Web. SCAM is built upon international technology standards and metadata standards. Such as RDF, Dublin Core, IEEE/LOM and IMS.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Kepler (http://kepler.cs.odu.edu) is the federated search service. It includes harvester which can harvests OAI-PMH 1.x and OAI-PMH 2.0 compliant repositories, basic search engine which is based on database and an OAI-PMH and self-archive data provider
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    DUAP is a server-side system for downloading DAISY (Digital Talking Books) files. It is based on Java. DUAP project was closed at its initial stage (2004) and the system was never developed into a more generic form.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Starting from an open archive content repository to an e-learning platform, backo provides an integrated web environment for organizations to manage the entire life cycle of knowledge creation, publication and dissemination.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    J2EE based picture archive software - allows browsing, online updates, e-postcards. (Was originally perl based. The perl code will remain for download but is not maintained)
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    A Java web archive for uploading files through a jsp container
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB