batch-waybackmachine-urlsaver is a project designed to automate the archiving of URLs in the Internet Archive's Wayback Machine.
Using a Python virtual environment separate from the global one is required on some operating systems, and is good practice in any case.
This local venv can be located inside each project's directory, or somewhere else so that it can be shared by all projects/scripts.
Clone this project repository to a new directory:
$ cd /projects/
$ git clone git://git.code.sf.net/p/batch-waybackmachine-urlsaver/code batch-waybackmachine-urlsaver-code
$ cd batch-waybackmachine-urlsaver-code/
Some operating systems do not allow installing modules/extensions system-wide. This can be forced, but it is not recommended:
$ python3 --version
Python 3.12.6
$ which python3
/opt/homebrew/bin/python3
$ which pip3
/opt/homebrew/bin/pip3
$ pip3 install beautifulsoup4
error: externally-managed-environment
× This environment is externally managed
╰─> To install Python packages system-wide, try brew install
xyz, where xyz is the package you are trying to
install.
If you wish to install a Python library that isn't in Homebrew,
use a virtual environment:
python3 -m venv path/to/venv
source path/to/venv/bin/activate
python3 -m pip install xyz
If you wish to install a Python application that isn't in Homebrew,
it may be easiest to use 'pipx install xyz', which will manage a
virtual environment for you. You can install pipx with
brew install pipx
You may restore the old behavior of pip by passing
the '--break-system-packages' flag to pip, or by adding
'break-system-packages = true' to your pip.conf file. The latter
will permanently disable this error.
If you disable this error, we STRONGLY recommend that you additionally
pass the '--user' flag to pip, or set 'user = true' in your pip.conf
file. Failure to do this can result in a broken Homebrew installation.
Read more about this behavior here: <https://peps.python.org/pep-0668/>
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
The recommendation is therefore to create a local (isolated) Python virtual environment inside the git repository:
$ python3 -m venv .venv
When the venv lives inside the repository, exclude it from git tracking (unless it is already ignored):
$ printf ".venv/\n" >> .gitignore
Then activate the venv each time a new shell/terminal/console is opened, unless the local venv is configured as the default:
$ source .venv/bin/activate
To check that the local venv is properly activated, print its location:
$ echo $VIRTUAL_ENV
/projects/batch-waybackmachine-urlsaver-code/.venv
$ python3 -c "import os; print(os.getenv('VIRTUAL_ENV'))"
/projects/batch-waybackmachine-urlsaver-code/.venv
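$VIRTUAL_ENV only shows that the activate script was sourced. As a complementary check (an illustration, not something the project's scripts are known to do), Python can tell whether the running interpreter itself belongs to a venv by comparing sys.prefix with sys.base_prefix:

```python
import os
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside a venv created with `python3 -m venv`, sys.prefix points at the
    venv directory while sys.base_prefix still points at the base install.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("VIRTUAL_ENV:", os.getenv("VIRTUAL_ENV"))
    print("sys.prefix: ", sys.prefix)
    print("in venv:    ", in_virtualenv())
```

This also catches the case where a venv's python3 is run directly by path without activation, in which VIRTUAL_ENV is unset but the interpreter still uses the venv's site-packages.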
Install the modules required by this project into the local venv (the Python scripts also check that they are available at startup):
$ python3 -m pip install waybackpy requests beautifulsoup4 urllib3 tqdm
Collecting waybackpy
Using cached waybackpy-3.0.6-py3-none-any.whl.metadata (9.9 kB)
Collecting requests
Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting beautifulsoup4
Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting urllib3
Using cached urllib3-2.2.3-py3-none-any.whl.metadata (6.5 kB)
Collecting tqdm
Using cached tqdm-4.66.5-py3-none-any.whl.metadata (57 kB)
Collecting click (from waybackpy)
Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting charset-normalizer<4,>=2 (from requests)
Using cached charset_normalizer-3.3.2-cp312-cp312-macosx_11_0_arm64.whl.metadata (33 kB)
Collecting idna<4,>=2.5 (from requests)
Using cached idna-3.10-py3-none-any.whl.metadata (10 kB)
Collecting certifi>=2017.4.17 (from requests)
Using cached certifi-2024.8.30-py3-none-any.whl.metadata (2.2 kB)
Collecting soupsieve>1.2 (from beautifulsoup4)
Using cached soupsieve-2.6-py3-none-any.whl.metadata (4.6 kB)
Using cached waybackpy-3.0.6-py3-none-any.whl (34 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Using cached beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Using cached urllib3-2.2.3-py3-none-any.whl (126 kB)
Using cached tqdm-4.66.5-py3-none-any.whl (78 kB)
Using cached certifi-2024.8.30-py3-none-any.whl (167 kB)
Using cached charset_normalizer-3.3.2-cp312-cp312-macosx_11_0_arm64.whl (119 kB)
Using cached idna-3.10-py3-none-any.whl (70 kB)
Using cached soupsieve-2.6-py3-none-any.whl (36 kB)
Using cached click-8.1.7-py3-none-any.whl (97 kB)
Installing collected packages: urllib3, tqdm, soupsieve, idna, click, charset-normalizer, certifi, requests, beautifulsoup4, waybackpy
Successfully installed beautifulsoup4-4.12.3 certifi-2024.8.30 charset-normalizer-3.3.2 click-8.1.7 idna-3.10 requests-2.32.3 soupsieve-2.6 tqdm-4.66.5 urllib3-2.2.3 waybackpy-3.0.6
$ python3 -m pip install waybackpy requests beautifulsoup4 urllib3 tqdm
Requirement already satisfied: waybackpy in ./.venv/lib/python3.12/site-packages (3.0.6)
Requirement already satisfied: requests in ./.venv/lib/python3.12/site-packages (2.32.3)
Requirement already satisfied: beautifulsoup4 in ./.venv/lib/python3.12/site-packages (4.12.3)
Requirement already satisfied: urllib3 in ./.venv/lib/python3.12/site-packages (2.2.3)
Requirement already satisfied: tqdm in ./.venv/lib/python3.12/site-packages (4.66.5)
Requirement already satisfied: click in ./.venv/lib/python3.12/site-packages (from waybackpy) (8.1.7)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.venv/lib/python3.12/site-packages (from requests) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in ./.venv/lib/python3.12/site-packages (from requests) (3.10)
Requirement already satisfied: certifi>=2017.4.17 in ./.venv/lib/python3.12/site-packages (from requests) (2024.8.30)
Requirement already satisfied: soupsieve>1.2 in ./.venv/lib/python3.12/site-packages (from beautifulsoup4) (2.6)
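A startup check like the one the scripts print ("Checking modules availability...") could look roughly like the sketch below. It is hypothetical, not the scripts' actual code; the REQUIRED mapping mirrors the pip command above, and it exists because the pip name and the import name differ for beautifulsoup4 (imported as bs4).

```python
import importlib.util

# pip distribution name -> importable top-level module name.
# Illustrative assumption based on the pip command above.
REQUIRED = {
    "waybackpy": "waybackpy",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "urllib3": "urllib3",
    "tqdm": "tqdm",
}


def missing_modules(required: dict[str, str]) -> list[str]:
    """Return the pip names of packages whose module cannot be found."""
    return [
        dist for dist, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    print("Checking modules availability...")
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules, install them with:")
        print("python3 -m pip install " + " ".join(missing))
```

importlib.util.find_spec only locates the module without importing it, so the check stays fast even for heavy packages.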
To deactivate the local venv and return to the global environment, just run deactivate. Afterwards, the same checks look like this:
$ deactivate
$ echo $VIRTUAL_ENV
$ python3 -c "import os; print(os.getenv('VIRTUAL_ENV'))"
None
recursive-crawler.py is a Python 3 script that scans the URL given as its argument (tested with https://packages-prod.broadcom.com/tools/releases/latest/) and recursively detects all files and directories listed as HTML links. It generates the files crawler_visited_urls.txt and crawler_extracted_urls.txt (the output can be used as input for the next script).
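The crawling approach can be sketched as a breadth-first traversal of the directory listings. This is not the script's code: to stay self-contained it uses the standard library's html.parser instead of BeautifulSoup, it takes a hypothetical fetch callable (in practice something like requests.get(url).text), and it assumes that directory links end with '/'.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url: str, fetch: Callable[[str], str]) -> tuple[list[str], list[str]]:
    """Breadth-first crawl of a directory listing rooted at start_url.

    Returns (visited, extracted): the listing pages that were fetched, and
    every link found under start_url. Links ending in '/' are crawled too.
    """
    visited: list[str] = []
    extracted: list[str] = []
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        visited.append(url)
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            # Stay inside the starting tree (skips "../" parent links and
            # external sites) and ignore links already seen.
            if not link.startswith(start_url) or link in seen:
                continue
            seen.add(link)
            extracted.append(link)
            if link.endswith("/"):
                queue.append(link)
    return visited, extracted
```

Writing visited and extracted one URL per line would give files in the spirit of crawler_visited_urls.txt and crawler_extracted_urls.txt. Injecting fetch keeps the traversal logic testable without network access.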
save-urls-online-waybackpy.py is a Python 3 script that reads the input text file given as its first argument, where each line must be a URL. Successfully saved URLs are removed from the input file and saved to completed.txt, while the ones that fail are moved to failed.txt. Pressing Q/q makes the script exit safely after the current URL is archived, without having to wait for all URLs to be processed. Exiting or pausing with Ctrl+C or Ctrl+Z is also safe.
It uses the waybackpy module to communicate with the Internet Archive servers, but it offers very little configuration and is prone to being shadowbanned.
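The input/output bookkeeping described for save-urls-online-waybackpy.py can be sketched roughly as follows. This is an illustrative reconstruction, not the script's code: the hypothetical save callable stands in for the actual archival call (returning True on success), and the stop Event stands in for whatever the Q/q key listener sets. The default 30 second delay mirrors the "Sleeping for 30 seconds" lines in the logs.

```python
import threading
from pathlib import Path
from typing import Callable


def process_urls(input_file: Path, save: Callable[[str], bool],
                 stop: threading.Event, delay: float = 30.0) -> None:
    """Archive the URLs listed in input_file, one per line, in order.

    After each attempt the URL is removed from input_file and appended to
    completed.txt or failed.txt in the same directory. `stop` is only checked
    between URLs, so an archival in progress always finishes first.
    """
    completed = input_file.with_name("completed.txt")
    failed = input_file.with_name("failed.txt")
    while not stop.is_set():
        urls = [u.strip() for u in input_file.read_text().splitlines() if u.strip()]
        if not urls:
            break
        url, remaining = urls[0], urls[1:]
        ok = save(url)
        with (completed if ok else failed).open("a") as out:
            out.write(url + "\n")
        # Rewrite the input without the processed URL via a temporary file and
        # an atomic rename, so an interruption cannot leave it half-written.
        tmp = input_file.with_suffix(".tmp")
        tmp.write_text("".join(u + "\n" for u in remaining))
        tmp.replace(input_file)
        # Event.wait returns True as soon as stop is set, ending the pause early.
        if remaining and stop.wait(delay):
            break
```

Using stop.wait(delay) instead of time.sleep(delay) is what lets a quit request cut the pause short, similar to the "Exiting by request." behavior seen in the logs.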
save-urls-online-headers.py works like the previous script, but instead of using the third-party waybackpy module, it sends custom-crafted headers similar to those sent by a desktop web browser. The Internet Archive persistently refuses to save pages unless the request comes from a desktop web browser.
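A request with browser-like headers to the Save Page Now endpoint (https://web.archive.org/save/ followed by the target URL, as seen in the logs) can be sketched with the standard library. The header values below are illustrative assumptions, not the exact set the script sends, and urllib is used here in place of whatever HTTP client the script uses.

```python
import urllib.request

SAVE_ENDPOINT = "https://web.archive.org/save/"

# Illustrative desktop-browser headers (assumed values, not the script's).
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/128.0.0.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://web.archive.org/",
}


def build_save_request(url: str) -> urllib.request.Request:
    """Build a GET request to Save Page Now for `url` with browser-like headers."""
    return urllib.request.Request(SAVE_ENDPOINT + url, headers=BROWSER_HEADERS)


def save(url: str, timeout: float = 120.0) -> str:
    """Send the request and return the final URL after redirects.

    urlopen raises urllib.error.HTTPError for error statuses such as 520.
    """
    with urllib.request.urlopen(build_save_request(url), timeout=timeout) as resp:
        return resp.url
```

As the rest of this section shows, browser-like headers alone are not enough; the returned URL still has to be checked.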
1) Sometimes the Wayback Machine does not return any error HTTP status code, which makes it look like the page has been archived correctly. However, when actually browsing the returned archived URL, different messages may appear as text in the page DOM:
<div class="row">
<div class="col-md-4 col-md-offset-4">
<h2>Sorry</h2>
<p>Job failed</p>
<div class="text-center">
<a href="/save">Return to Save Page Now</a>
</div>
</div>
</div>
<div class="row">
<div class="col-md-4 col-md-offset-4">
<h2>Sorry</h2>
<p>You cannot make more than (200,) captures per day. Please email us at "info@archive.org" if you would like to discuss this more.</p>
<div class="text-center">
<a href="/save">Return to Save Page Now</a>
</div>
</div>
</div>
<div class="row">
<div class="col-md-4 col-md-offset-4">
<h2>Sorry</h2>
<p>This URL is in the Save Page Now service block list and cannot be captured. Please email us at "info@archive.org" if you would like to discuss this more.</p>
<div class="text-center">
<a href="/save">Return to Save Page Now</a>
</div>
</div>
</div>
<noscript>
<div class="no-script-message">
The Wayback Machine requires your browser to support JavaScript, please email <a href="mailto:info@archive.org">info@archive.org</a><br/>if you have any questions about this.
</div>
</noscript>
<footer>
<div id="footerHome">
<p>
The Wayback Machine is an initiative of the
<a href="//archive.org/">Internet Archive</a>,
a 501(c)(3) non-profit, building a digital library of
Internet sites and other cultural artifacts in digital form.
<br>Other <a href="//archive.org/projects/">projects</a> include
<a href="https://openlibrary.org/">Open Library</a> &
<a href="https://archive-it.org">archive-it.org</a>.
</p>
<p>
Your use of the Wayback Machine is subject to the Internet Archive's
<a href="//archive.org/about/terms.php">Terms of Use</a>.
</p>
</div>
</footer>
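Because these failure messages only appear in the page body, one way to catch them is to look for the "Sorry" block. A minimal sketch, assuming the markup stays as in the samples above; a regex like this is brittle against markup changes, so checking the response URL format is the more robust signal.

```python
import re
from html import unescape
from typing import Optional

# The error pages above share an <h2>Sorry</h2> heading followed by a <p>
# with the reason; this pattern relies on that observed markup.
SORRY_RE = re.compile(r"<h2>\s*Sorry\s*</h2>\s*<p>(.*?)</p>", re.S | re.I)


def save_error_message(html: str) -> Optional[str]:
    """Return the 'Sorry' reason from a Save Page Now page, or None if absent."""
    match = SORRY_RE.search(html)
    if match is None:
        return None
    # Decode entities such as &quot; and collapse runs of whitespace.
    return " ".join(unescape(match.group(1)).split())
```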
This is a sample (trimmed) log from trying to archive some URLs with save-urls-online-headers.py. Note that the date and time are missing from the response URL:
$ python3 save-urls-online-headers.py urls.txt
Checking modules availability...
Starting key listener thread for exit command...
Starting jobs, press 'Q'/'q' to exit after any current iteration...
Archiving (1/359985): https://packages-prod.broadcom.com/tools/esx/4.0ep09/rhel6/i686/vmware-open-vm-tools-kmod-8.0.5-989856.el6.i686.rpm
Archived URL: https://web.archive.org/save/https://packages-prod.broadcom.com/tools/esx/4.0ep09/rhel6/i686/vmware-open-vm-tools-kmod-8.0.5-989856.el6.i686.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/4.0ep09/rhel6/i686/vmware-open-vm-tools-kmod-8.0.5-989856.el6.i686.rpm: Failed to retrieve archived page. Status Code: 520
Exit requested, waiting for current operation to complete...
Sleeping for 30 seconds (1/359984)...: 7%|▍ | 2/30 [00:02<00:28, 1.01s/s]
Exiting by request.
A valid URL would be something like this:
https://web.archive.org/web/20240923083252/https://packages-prod.broadcom.com/tools/esx/4.0ep09/rhel6/i686/vmware-open-vm-tools-kmod-8.0.5-989856.el6.i686.rpm
The correct way to detect whether a URL was archived, without inspecting the DOM itself, is to check the format of the response URL. For example, when archiving the URL:
https://www.rapidtables.com/web/color/orange-color.html
the correct response URL format is:
https://web.archive.org/web/20240822144645/https://www.rapidtables.com/web/color/orange-color.html
while a failed response URL is:
https://web.archive.org/save/https://www.rapidtables.com/web/color/orange-color.html
The Python script now detects this failure:
$ python3 save-urls-online-headers.py urls.txt
Checking modules availability...
Starting key listener thread for exit command...
Starting jobs, press 'Q'/'q' to exit after any current iteration...
Archiving (1/359981): https://packages-prod.broadcom.com/tools/esx/4.0ep09/rhel6/i686/vmware-open-vm-tools-xorg-utilities-8.0.5-989856.el6.i686.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/4.0ep09/rhel6/i686/vmware-open-vm-tools-xorg-utilities-8.0.5-989856.el6.i686.rpm: Archived URL does not start with 'https://web.archive.org/web/'.
Exit requested, waiting for current operation to complete...
Sleeping for 30 seconds (1/359980)...: 13%|▊ | 4/30 [00:04<00:26, 1.00s/s]
Exiting by request.
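The response-URL check from the log above ("Archived URL does not start with 'https://web.archive.org/web/'") can be made a little stricter by also requiring the timestamp segment. A sketch, assuming capture timestamps are the usual 14-digit YYYYMMDDhhmmss form in UTC:

```python
import re
from datetime import datetime, timezone
from typing import Optional

# A successful capture redirects to /web/<14-digit timestamp>/<original URL>;
# a failed one stays on /save/<original URL> (or /save/_embed/...).
ARCHIVED_RE = re.compile(r"^https://web\.archive\.org/web/(\d{14})/(.+)$")


def capture_timestamp(response_url: str) -> Optional[str]:
    """Return the capture timestamp if response_url looks like a real capture."""
    match = ARCHIVED_RE.match(response_url)
    return match.group(1) if match else None


def capture_time(timestamp: str) -> datetime:
    """Convert a 14-digit Wayback timestamp into an aware UTC datetime."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
```

The timestamp is not compared against the original URL on purpose: the Wayback Machine may normalize the URL (scheme, trailing slash), which would cause false negatives.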
The problem does not end there, because sending a valid 'Cookie' header with curl does not make it work either (even from a registered account that has made donations). The Wayback Machine requires JavaScript for that kind of unlimited archival, which is only available when using a fully-fledged desktop web browser. They want a physical person submitting the requests, to prevent abuse.
2) This is a sample (trimmed) log from trying to archive some URLs with save-urls-online-waybackpy.py:
$ python3 save-urls-online-waybackpy.py urls.txt
Checking modules availability...
Starting key listener thread for exit command...
Starting jobs, press 'Q'/'q' to exit after any current iteration...
Archiving (1/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/headers/vmware-tools-kmod-0-7.4.8-396269.423167.el4.x86_64.hdr
Archived URL: https://web.archive.org/web/20240920084704/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/headers/vmware-tools-kmod-0-7.4.8-396269.423167.el4.x86_64.hdr
Sleeping for 30 seconds (1/364016)...: 100%|█████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (8/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/open-vm-tools-xorg-drv-mouse-12.4.1.0-0.396269.423167.el4.x86_64.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/open-vm-tools-xorg-drv-mouse-12.4.1.0-0.396269.423167.el4.x86_64.rpm: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/open-vm-tools-xorg-drv-mouse-12.4.1.0-0.396269.423167.el4.x86_64.rpm.
Response URL:
https://web.archive.org/save/_embed/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/open-vm-tools-xorg-drv-mouse-12.4.1.0-0.396269.423167.el4.x86_64.rpm
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 08:53:29 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'x-app-server': 'wwwb-app52', 'x-ts': '520', 'x-tr': '24', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=1', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'BYPASS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (8/364016)...: 100%|█████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (19/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/vmware-tools-nox-7.4.8-396269.423167.el4.x86_64.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/vmware-tools-nox-7.4.8-396269.423167.el4.x86_64.rpm: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/vmware-tools-nox-7.4.8-396269.423167.el4.x86_64.rpm.
Response URL:
https://web.archive.org/save/_embed/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel4/x86_64/vmware-tools-nox-7.4.8-396269.423167.el4.x86_64.rpm
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 09:02:42 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'x-app-server': 'wwwb-app53', 'x-ts': '520', 'x-tr': '12', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=1', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'BYPASS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (19/364016)...: 100%|████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (20/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/.
Response URL:
https://web.archive.org/save/_embed/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 09:05:04 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'x-app-server': 'wwwb-app14', 'x-ts': '520', 'x-tr': '12', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=0', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'BYPASS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (20/364016)...: 100%|████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (25/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-7.4.8-396269.423167.el5.i686.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-7.4.8-396269.423167.el5.i686.rpm: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-7.4.8-396269.423167.el5.i686.rpm.
Response URL:
https://web.archive.org/save/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-7.4.8-396269.423167.el5.i686.rpm
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 09:21:56 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'x-app-server': 'wwwb-app14', 'x-ts': '520', 'x-tr': '26789', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=1', 'X-location': 'save-sync', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (25/364016)...: 100%|████| 30/30 [00:30<00:00, 1.01s/s]
Archiving (26/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-common-7.4.8-396269.423167.el5.i686.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-common-7.4.8-396269.423167.el5.i686.rpm: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-common-7.4.8-396269.423167.el5.i686.rpm.
Response URL:
https://web.archive.org/save/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-common-7.4.8-396269.423167.el5.i686.rpm
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 09:23:32 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'x-app-server': 'wwwb-app53', 'x-ts': '520', 'x-tr': '1278', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=1', 'X-location': 'save-sync', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (26/364016)...: 100%|████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (27/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-kmod-7.4.8-396269.423167.el5.i686.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-kmod-7.4.8-396269.423167.el5.i686.rpm: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-kmod-7.4.8-396269.423167.el5.i686.rpm.
Response URL:
https://web.archive.org/save/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-kmod-7.4.8-396269.423167.el5.i686.rpm
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 09:25:41 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'x-app-server': 'wwwb-app14', 'x-ts': '520', 'x-tr': '7730', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=0', 'X-location': 'save-sync', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (27/364016)...: 100%|████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (28/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-nox-7.4.8-396269.423167.el5.i686.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-nox-7.4.8-396269.423167.el5.i686.rpm: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-nox-7.4.8-396269.423167.el5.i686.rpm.
Response URL:
https://web.archive.org/save/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-nox-7.4.8-396269.423167.el5.i686.rpm
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 09:37:48 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '232', 'Connection': 'keep-alive', 'x-app-server': 'wwwb-app52', 'x-ts': '404', 'x-tr': '60025', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=1', 'X-location': 'save-sync', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (28/364016)...: 100%|████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (29/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-xorg-drv-display-10.15.0.0-0.396269.423167.el5.i686.rpm
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-xorg-drv-display-10.15.0.0-0.396269.423167.el5.i686.rpm: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-xorg-drv-display-10.15.0.0-0.396269.423167.el5.i686.rpm.
Response URL:
https://web.archive.org/save/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-xorg-drv-display-10.15.0.0-0.396269.423167.el5.i686.rpm
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 10:02:10 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '232', 'Connection': 'keep-alive', 'x-app-server': 'wwwb-app52', 'x-ts': '404', 'x-tr': '30015', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=1', 'X-location': 'save-sync', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (29/364016)...: 100%|████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (30/364017): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-xorg-drv-mouse-12.4.1.0-0.396269.423167.el5.i686.rpm
Exit requested, waiting for current operation to complete...
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-xorg-drv-mouse-12.4.1.0-0.396269.423167.el5.i686.rpm: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-xorg-drv-mouse-12.4.1.0-0.396269.423167.el5.i686.rpm.
Response URL:
https://web.archive.org/save/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/open-vm-tools-xorg-drv-mouse-12.4.1.0-0.396269.423167.el5.i686.rpm
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 10:13:34 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '232', 'Connection': 'keep-alive', 'x-app-server': 'wwwb-app14', 'x-ts': '404', 'x-tr': '30019', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=0', 'X-location': 'save-sync', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (6/363986)...: 100%|█████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (7/363987): https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/repodata/repomd.xml.asc
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/repodata/repomd.xml.asc: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/repodata/repomd.xml.asc.
Response URL:
https://web.archive.org/save/_embed/https://packages-prod.broadcom.com/tools/esx/3.5latest/rhel5/i686/repodata/repomd.xml.asc
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 19:08:04 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'x-app-server': 'wwwb-app53', 'x-ts': '520', 'x-tr': '11', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=1', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'BYPASS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Sleeping for 30 seconds (1/363839)...: 100%|█████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (2/363840): https://packages-prod.broadcom.com/tools/esx/3.5latest/ubuntu/dists/hardy/main/binary-amd64/vmware-open-vm-tools-common_7.4.8-0.396269.423167_ubuntu8.04.amd64.deb
Unexpected error while archiving https://packages-prod.broadcom.com/tools/esx/3.5latest/ubuntu/dists/hardy/main/binary-amd64/vmware-open-vm-tools-common_7.4.8-0.396269.423167_ubuntu8.04.amd64.deb: HTTPSConnectionPool(host='web.archive.org', port=443): Max retries exceeded with url: /save/https://packages-prod.broadcom.com/tools/esx/3.5latest/ubuntu/dists/hardy/main/binary-amd64/vmware-open-vm-tools-common_7.4.8-0.396269.423167_ubuntu8.04.amd64.deb (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x107b99160>: Failed to establish a new connection: [Errno 61] Connection refused'))
Sleeping for 30 seconds (2/363839)...: 100%|█████| 30/30 [00:30<00:00, 1.00s/s]
Archiving (3/363840): https://packages-prod.broadcom.com/tools/esx/3.5latest/ubuntu/dists/hardy/main/binary-amd64/vmware-open-vm-tools-kmod-2.6.24-16-generic_7.4.8-0.396269.423167_ubuntu8.04.amd64.deb
Failed to archive https://packages-prod.broadcom.com/tools/esx/3.5latest/ubuntu/dists/hardy/main/binary-amd64/vmware-open-vm-tools-kmod-2.6.24-16-generic_7.4.8-0.396269.423167_ubuntu8.04.amd64.deb: Tried 8 times but failed to save and retrieve the archive for https://packages-prod.broadcom.com/tools/esx/3.5latest/ubuntu/dists/hardy/main/binary-amd64/vmware-open-vm-tools-kmod-2.6.24-16-generic_7.4.8-0.396269.423167_ubuntu8.04.amd64.deb.
Response URL:
https://web.archive.org/save/https://packages-prod.broadcom.com/tools/esx/3.5latest/ubuntu/dists/hardy/main/binary-amd64/vmware-open-vm-tools-kmod-2.6.24-16-generic_7.4.8-0.396269.423167_ubuntu8.04.amd64.deb
Response Header:
{'Server': 'nginx', 'Date': 'Fri, 20 Sep 2024 23:20:50 GMT', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'cache-control': 'no-cache', 'x-app-server': 'wwwb-app52', 'x-ts': '520', 'x-tr': '10', 'server-timing': 'TR;dur=0,Tw;dur=0,Tc;dur=1', 'X-location': 'save-sync', 'X-RL': '0', 'X-NA': '0', 'X-Page-Cache': 'MISS', 'X-NID': '-', 'Referrer-Policy': 'no-referrer-when-downgrade', 'Permissions-Policy': 'interest-cohort=()'}
Exit requested, waiting for current operation to complete...
Sleeping for 30 seconds (3/363839)...: 3%|▏ | 1/30 [00:01<00:29, 1.01s/s]
Exiting by request.
Only one archival succeeded; the others failed. By default, the module retries 8 times for each URL, which makes the script very slow if the remote server starts throttling or shadowbanning the Wayback Machine.
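One mitigation, sketched here independently of waybackpy, is to make a single attempt per call and wrap it in your own bounded retry loop with a growing pause, so a throttled URL is given up quickly and moved to failed.txt. The attempt callable and the max_tries/backoff defaults are hypothetical; whether waybackpy itself lets you lower its retry count depends on the installed version and should be checked in its documentation.

```python
import time
from typing import Callable, Optional


def save_with_retries(url: str, attempt: Callable[[str], Optional[str]],
                      max_tries: int = 2, backoff: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Call attempt(url) up to max_tries times.

    attempt returns the archived URL on success or None on failure. Between
    failures the pause doubles (backoff, 2*backoff, ...). Returns None when
    every try failed, so the caller can move the URL to failed.txt.
    """
    for n in range(max_tries):
        result = attempt(url)
        if result is not None:
            return result
        if n < max_tries - 1:
            sleep(backoff * 2 ** n)
    return None
```

The sleep parameter exists so the waiting can be replaced in tests (or by an interruptible wait), without changing the retry logic.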