Search Results for "web archive extractor"

Showing 46 open source projects for "web archive extractor"


Active filter: Python

1

Archive Extractor

To use this tool, you need WinRAR installed at "C:\Program Files\WinRAR" (typically the default path). Alternatively, you can have 7z installed at "C:\Program Files\7-Zip" (usually the default as well). Note that if you only have 7z installed, you can extract only .zip and .7z files, not .rar files. This tool is primarily designed to extract files from password-protected Rar/Zip/7z archives, although it also works on...

Downloads: 1 This Week

Last Update: 2025-01-20
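
The tool requirements in the description above can be sketched as a small lookup. This is a hypothetical illustration, not the project's code: the function name `supported_extensions` is invented here, and the assumption that a WinRAR install covers all three formats is inferred from the description rather than confirmed by it.

```python
from pathlib import PureWindowsPath

# Default install locations quoted in the project description.
WINRAR_DIR = PureWindowsPath(r"C:\Program Files\WinRAR")
SEVENZIP_DIR = PureWindowsPath(r"C:\Program Files\7-Zip")


def supported_extensions(installed_dirs):
    """Return the archive extensions that can be extracted.

    `installed_dirs` is an iterable of directories where tools were found.
    Hypothetical helper: assumes WinRAR handles .rar, .zip and .7z,
    while 7z alone handles only .zip and .7z (per the description).
    """
    installed = {PureWindowsPath(d) for d in installed_dirs}
    if WINRAR_DIR in installed:
        return {".rar", ".zip", ".7z"}
    if SEVENZIP_DIR in installed:
        return {".zip", ".7z"}
    return set()
```

Passing the directory list in as an argument, rather than probing the disk inside the function, keeps the sketch usable on any operating system.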
2

Anna’s Archive

Comprehensive search engine for books, papers, comics, magazines

Anna’s Archive is a large-scale open-source search engine and data aggregation platform designed to index and provide access to a vast collection of books, academic papers, comics, magazines, and other digital texts through a unified interface. The project includes all the infrastructure required to run a full instance locally or in production, combining web servers, databases, and search indexing systems into a scalable architecture.

Downloads: 105 This Week

Last Update: 2026-03-23
3

Google CTF

Google CTF is the public repository that houses most of the challenges from Google’s Capture-the-Flag competitions since 2017 and the infrastructure used to run them. It’s a learning and practice archive: competitors and educators can replay tasks across categories like pwn, reversing, crypto, web, sandboxing, and forensics. The code and binaries contain vulnerabilities by design, so users can explore exploit chains and patching in realistic settings. The repo also includes infrastructure components and links to a scoreboard implementation, giving organizers reference material for hosting their own events. ...

Downloads: 5 This Week

Last Update: 2026-02-11
4

claude-code-transcripts

Tools for publishing transcripts for Claude Code sessions

claude-code-transcripts is a command-line utility that takes session files exported from Claude Code (in JSON or JSONL format) and turns them into clean, navigable HTML transcripts that can be viewed in any modern web browser. It is designed to make the often dense and verbose outputs from AI coding sessions easier to read, share, and archive by breaking conversations into paginated, annotated pages with navigable timelines of prompts and responses. Users can run this tool locally or fetch sessions from the Claude API, giving flexibility for individual workflows or team documentation practices. ...

Downloads: 2 This Week

Last Update: 2026-01-30
5

Trafilatura

Python & command-line tool to gather text on the Web

...The extractor tries to strike a balance between limiting noise (precision) and including all valid parts (recall). It also has to be robust and reasonably fast: it runs in production on millions of documents.

Downloads: 0 This Week

Last Update: 2024-12-03
6

yt-dlp

A youtube-dl fork with additional features and fixes

yt-dlp is a youtube-dl fork based on the now inactive youtube-dlc. The main focus of this project is adding new features and patches while also keeping up to date with the original project.

Downloads: 706 This Week

Last Update: 2026-03-17
7

bilibili-manga-downloader

Download and manage Bilibili Manga chapters with GUI downloader

...It also offers multiple output formats, allowing chapters to be saved as image folders or compressed comic archive formats suitable for local manga readers.

Downloads: 7 This Week

Last Update: 2026-03-13
8

ArchiveBox

Open source self-hosted web archiving

...Archive.org does a great job as a centralized service, but saved URLs have to be public, and it can't save every type of content. ArchiveBox is an open source tool that lets organizations & individuals archive both public & private web content while retaining control over their data. It can be used to save copies of bookmarks, preserve evidence for legal cases, back up photos from FB/Insta/Flickr or media from YT/Soundcloud/etc., save research papers, and more. ArchiveBox is an open-source, self-hosted web archiving tool for saving websites offline. ...

Downloads: 2 This Week

Last Update: 2024-12-15
9

AutoPkg

Automating packaging and software distribution on macOS

AutoPkg is a system that automatically prepares software for distribution to managed clients. Recipes let you specify a series of simple actions that, combined, can perform complex tasks, similar to Automator workflows or Unix pipes.

Downloads: 3 This Week

Last Update: 2026-02-03
10

SimpDL

A tool to scrape images from SimpCity

SimpDL is an open-source media downloading tool designed to retrieve content from subscription-based or creator platforms, focusing on simplicity and ease of use. It enables users to download images, videos, and other media associated with specific creators or accounts, often through authenticated sessions. The project emphasizes a straightforward workflow where users provide login credentials or tokens, and the tool handles the retrieval and storage of content automatically. It is designed...

Downloads: 3 This Week

Last Update: 2026-03-18
11

ipwb

A distributed and persistent archive replay system using IPFS

...An important aspect of archival replay systems is rewriting resource references so that a memento is reconstructed properly: each reference should be dereferenced from the archive at around the same datetime as the root memento, not from the live site (where the resource might have changed or gone missing). Many archival replay systems perform server-side rewriting, but this has its limitations when URIs are generated using JavaScript.

Downloads: 0 This Week

Last Update: 2024-10-24
12

ChatTTS webUI & API

A simple native web interface that uses ChatTTS to synthesize text

...From version 0.96 onward, ffmpeg installation is required for deployment, and previous CSV/PT voice tables are no longer valid, so users instead work with updated “voice value” parameters. For convenience, there is a prepackaged Windows build: you download a release archive, extract it, and double-click app.exe to start the web UI, which opens on localhost:9966.

Downloads: 3 This Week

Last Update: 2025-11-28
13

Paperless-AI

AI-powered document analysis and tagging for Paperless-ngx

...It integrates with multiple OpenAI-compatible services as well as local models, giving users flexibility in how document intelligence is handled. A key capability is its use of retrieval-augmented generation, which enables semantic search and natural language interaction across an entire document archive. Users can ask contextual questions about their files and receive precise answers based on full document understanding rather than simple keyword matching. Paperless-AI also includes a web interface for manual review and tagging, allowing greater control when handling sensitive or complex documents.

Downloads: 2 This Week

Last Update: 2026-03-17
14

LinkChecker

Check links in web documents or full websites

LinkChecker is a free, GPL-licensed website validator. LinkChecker checks links in web documents or full websites. It runs on Python 3 systems, requiring Python 3.8 or later. The version in the pip repository may be old; to find out how to get the latest code, plus platform-specific information and other advice, see doc/install.txt in the source code archive. If you do not want to install any additional libraries/dependencies, you can use the Docker image published on GitHub Packages.

Downloads: 0 This Week

Last Update: 2025-07-28
15

tumblr-crawler

Python crawler to download photos and videos from Tumblr blogs

tumblr-crawler is an open source Python-based utility designed to download media content from Tumblr blogs. It provides a script that automatically retrieves photos and videos from specified Tumblr sites and saves them locally for offline access. Users can specify one or multiple blogs to crawl by editing a configuration file or by passing parameters through the command line. Once executed, the script fetches media from the Tumblr API and stores the downloaded files in folders named after...

Downloads: 1 This Week

Last Update: 5 days ago
16

GNNPCSAFT Web App

Smart Thermodynamic Modeling with Graph Neural Networks

The GNNPCSAFT Web App is an implementation of our project that focuses on using Graph Neural Networks (GNN) to estimate the pure-component parameters of the Equation of State PC-SAFT. We developed this app so the scientific community can access the model's results easily. In this app, the estimated pure-component parameters can be used to calculate thermodynamic properties and compare them with experimental data from the ThermoML Archive.

Downloads: 3 This Week

Last Update: 6 days ago
17

Kemono Downloader

Kemono Downloader - A cross-platform Python app built with PyQt6

Welcome to Kemono Downloader, a versatile Python-based desktop application built with PyQt6, designed to download content from Kemono.su. This tool enables users to archive individual posts or entire creator profiles from services like Patreon, Fanbox, and more, supporting a wide range of file types with customizable settings and advanced features.

1 Review

Downloads: 1,555 This Week

Last Update: 2026-03-07
18

paramspider

Mine parameterized URLs from web archives for security testing

ParamSpider is an open source command-line tool designed to discover URLs that contain parameters by mining historical data from web archives such as the Wayback Machine. It helps security researchers, penetration testers, and bug bounty hunters collect potential attack surfaces by automatically gathering archived URLs related to a specific domain. Instead of returning every discovered URL, the tool intelligently filters results to highlight parameterized endpoints that are more useful for...

Downloads: 4 This Week

Last Update: 2026-03-06
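
The filtering idea in the description above (keep parameterized endpoints rather than every archived URL) can be sketched with the standard library. This is a minimal illustration under stated assumptions, not ParamSpider's actual logic: the function name `parameterized_urls` and the de-duplication key (host, path, and parameter names) are choices made here.

```python
from urllib.parse import parse_qsl, urlsplit


def parameterized_urls(urls):
    """Keep URLs that carry at least one query parameter.

    Hypothetical helper: URLs that share host, path and parameter names
    are treated as duplicates, and only the first one seen is kept.
    """
    seen = set()
    kept = []
    for url in urls:
        parts = urlsplit(url)
        pairs = parse_qsl(parts.query, keep_blank_values=True)
        names = tuple(sorted({name for name, _ in pairs}))
        if not names:
            continue  # no query parameters, so not an interesting endpoint
        key = (parts.netloc, parts.path, names)
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

De-duplicating on parameter names rather than values means `?id=1` and `?id=2` collapse into one candidate endpoint, which keeps the list short for manual review.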
19

Yark

Simple OSINT tool for archiving and browsing YouTube channels offline

...The project focuses on OSINT (Open Source Intelligence) workflows by allowing users to collect and store videos, metadata, and thumbnails from a YouTube channel in a structured local archive. Instead of simply downloading individual videos, Yark creates a self-contained archive directory that includes metadata files and organized folders for media assets. This format allows users to maintain a historical record of a channel and track updates or changes over time. The tool also provides a local offline web interface that lets users browse and watch archived videos directly in their browser. ...

Downloads: 0 This Week

Last Update: 2026-03-06
20

grab-site

Web crawler for archiving and backing up sites into WARC archives

grab-site is an open source web crawling tool designed to archive and back up websites by recursively downloading their content. It works by taking a starting URL and systematically following links across the site, capturing pages and resources and saving them into WARC archive files for long-term preservation. Internally, the crawler uses a fork of the wpull engine to fetch and process web pages efficiently during large-scale crawls. grab-site includes a built-in dashboard that displays real-time crawl activity, including which URLs are currently being processed and how many remain in the queue. ...

Downloads: 1 This Week

Last Update: 1 day ago
21

Whisper Library

Whisper is a file-based time-series database format for Graphite

Whisper is one of three components within the Graphite project. Whisper is a fixed-size database, similar in design and purpose to RRD (round-robin database). It provides fast, reliable storage of numeric data over time. Whisper allows recent data stored at a higher resolution (seconds per point) to degrade into lower resolutions for long-term retention of historical data. One of its utilities copies data from src to dst if it is missing there; unlike whisper-merge, it doesn't overwrite data that's already present in the target...

Downloads: 0 This Week

Last Update: 2024-05-24
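
The idea of recent high-resolution data degrading into lower resolutions can be illustrated with a small downsampling sketch. It is an assumption-laden illustration, not Whisper's file format or API: the function name `downsample` is invented here, and averaging is just one possible way to aggregate points.

```python
def downsample(points, factor):
    """Collapse each run of `factor` points into one lower-resolution point.

    Hypothetical helper: values are averaged, and missing points (None)
    are ignored; a run with no known values yields None.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for start in range(0, len(points), factor):
        known = [v for v in points[start:start + factor] if v is not None]
        out.append(sum(known) / len(known) if known else None)
    return out
```

For example, four one-second points reduced with `factor=2` become two two-second points, each the mean of its pair.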
22

OpenEMM e-mail & marketing automation

software for email automation (newsletters, transaction mails, etc.)

This is an archive for old versions of OpenEMM. You will find the latest version of OpenEMM at https://wiki.openemm.org. OpenEMM is a browser-based enterprise application for email automation, such as info and marketing newsletters, transaction mails or multi-stage email campaigns. OpenEMM offers tons of features for professional users, among them: a great user interface, template-based HTML mailings, automatic bounce management, mail opening and link tracking, lots of graphical realtime...

19 Reviews

Downloads: 0 This Week

Last Update: 2022-03-02
23

Reminiscence

Self-Hosted Bookmark And Archive Manager

Bookmark links and edit their metadata (like title, tags, summary) via the web interface. Archive linked content in HTML, PDF or full-page PNG format. Links to non-HTML content such as pdf, jpg or txt files are archived automatically, i.e. bookmarking such a link via the web interface saves the file on the server. Supports archival of media elements of a web page using third-party download managers.

Downloads: 0 This Week

Last Update: 2022-08-31
24

Paperless-ng

A supercharged version of paperless, scan, index and archive docs

Paperless is a simple Django application running in two parts, a Consumer (the thing that does the indexing) and a Web server (the part that lets you search & download already-indexed documents). Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. I wrote this to make “going paperless” easier. I do not have to...

Downloads: 0 This Week

Last Update: 2022-03-04
25

TransPose

PyTorch Implementation for "TransPose, Keypoint localization

TransPose is a human pose estimation model based on a CNN feature extractor, a Transformer Encoder, and a prediction head. Given an image, the attention layers built into the Transformer can efficiently capture long-range spatial relationships between keypoints and explain which dependencies the predicted keypoint locations rely on most.

Downloads: 1 This Week

Last Update: 2024-07-12