From: Dan L. <da...@la...> - 2025-03-05 16:08:38
On Wed, Mar 5, 2025, at 9:27 AM, Rob Gerber wrote:
>
> Robert Gerber
> 402-237-8692
> ro...@cr...
>
> On Wed, Mar 5, 2025, 8:06 AM Dan Langille <da...@la...> wrote:
>> On Tue, Mar 4, 2025, at 11:03 PM, Rob Gerber wrote:
>>> I don't think that the problem is in bacula, for sure. I suspect other traffic over the link might be similarly impacted. My searching indicated that 0a000119 is a generic openssl error. Could be many things. I might be suspicious of the openssl version or implementation installed on your new router / firewall. The router or firewall may have flawed firmware.
>>
>> Is that theory contingent upon some kind of hardware acceleration on said firewall?
>
> I don't know if hardware acceleration could be contributing to this. Maybe?
>
>> If so, I should be able to verify that that is occurring and perhaps disable that acceleration so it's all done in software, removing the firmware from the equation.
>
> I agree, swap to a simpler configuration and see if it has an impact.
>
>>> Consider running wireshark to analyze failed ssl transactions.
>>>
>>> I think I maybe got lucky when I searched this in duckduckgo. The top result contained something sort of relevant, with further breadcrumbs to chase. The next million results didn't even contain the 0a000119 keyword and look unrelated.
>>> Check out this, and follow the links therein, and the links inside those links. I have about 10 tabs open now and I see some interesting stuff. Some people turned off segmentation offloading on their nic, others made new certs, others got rid of their netgear router. 0a000119 is a vague error.
>>> https://forum.proxmox.com/threads/decryption-failed-or-bad-record-on-remote-sync.145131/
>>
>> Thank you for that research. It is appreciated.
>>
>>> Have you verified that data can be sent over the network link? I assume yes, so what about data larger than a single packet size (i.e., if a packet is fragmented, then what happens?)?
>>
>> The network link of the firewall? Yes. I think that is fine and working as expected. To test, I ran "wget https://download.freebsd.org/releases/ISO-IMAGES/14.2/FreeBSD-14.2-RELEASE-amd64-memstick.img".
>>
>> It completed in about 42s without errors. I verified the checksum is correct.
>>
>> Does that do the test you wanted? This test does not involve the VPN.
>
> I think that is a good test, because it does appear to verify that the network is up and functional on at least one end. I meant to test through the VPN, which you did in subsequent tests. However, verifying the stability and function of your network on EACH end in a manner like you did here is an important test, since it appears to be up, yet doesn't "just work".
>
> I would be curious to see if you are able to send traffic directly from host to host without any VPN involved, though I think simply testing the remote end's ability to download a large file successfully could be more important.

The hosts have been in place for years. This is not a new VPN - it's been around about 10 years. What is new: the gateway. It was replaced. It went from pfSense to vanilla FreeBSD. I think I'm missing some of the magic pfSense did in the configuration.

I just completed a test fetch (of the same file as above) on each client. FYI, fetch times ranged from 22s to 7.5 minutes - all hosts downloaded the file without error (checksum was verified to be correct).

> I would want to check each end for internet connection packet loss by running a continuous ping to some stable internet IP like 8.8.8.8.

Good idea. I just started this on each host:

ping -c 1000 8.8.8.8

Alexa: timer, 1000s

After the alarm, all completed like this:

--- 8.8.8.8 ping statistics ---
1000 packets transmitted, 1000 packets received, 0.0% packet loss

>> However, your suggestion made me try another test:
>>
>> [8:42 pro02 dan ~/tmp] % time scp -r foo.example:~bar/backups/Bacula .
>>
>> That grabs all the .bsr files I've backed up to that host. The copy involves about 2.6M and 221 files.
>>
>> Let's try that same backup over the VPN:
>>
>> [8:43 pro02 dan ~/tmp] % time scp -r foo.vpn.example.org:~rsyncer/backups/Bacula Bacula-vpn
>> .. about five files are copied
>> 0% 0 0.0KB/s --:-- ETA
>> ssh_dispatch_run_fatal: Connection to 10.14.0.217 port 22: message authentication code incorrect
>> scp: Connection closed
>> scp -r foo.vpn.example.org:~rsyncer/backups/Bacula Bacula-vpn 0.21s user 0.02s system 25% cpu 0.938 total
>>
>> To me, that says something is very wonky with the VPN.
>
> I agree, though I would want to test for packet loss on each end. Maybe file downloads are more resilient than OpenVPN traffic and your OpenVPN session is only serving as a canary here. If you have a basic network connectivity issue, it'd be better to find that before you get elbows deep into openvpn and troubleshooting openssl.
>
> I wonder if ping tests would successfully pass through the VPN. If yes, this implies that small data can pass, but not bigger data. That sort of failure mode, if verified, would indicate an issue when a packet is fragmented.
>
> If pings won't pass, then I'd wonder if you ever had a working VPN configuration despite the paths appearing to be up.

All the clients pass this test with similar results:

1000 packets transmitted, 999 packets received, 0.1% packet loss

(note: 5 of the 6 had 0 packet loss)

>> Which also means, this is not a Bacula issue but a transport issue - solve that first, and the Bacula issue should resolve.
>>
>> Does that make sense?
>
> I agree with your conclusion. This appears to be a transport issue.
>
>> Thank you

-- 
Dan Langille
da...@la...
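
One way to follow up on the fragmentation question raised above (small pings pass through the VPN, larger transfers fail) might be to probe with the Don't Fragment bit set and find the largest ICMP payload that still gets through. What follows is only a sketch under assumptions: the function names `probe_df` and `max_payload` and the `TARGET` variable are made up for illustration, and the ping flags are the FreeBSD ones (`-D` for Don't Fragment, `-t` for an overall timeout in seconds); other systems spell these differently (Linux uses `-M do` for Don't Fragment, for instance).

```shell
#!/bin/sh
# Sketch only: helper names and the TARGET variable are hypothetical.

# Send one ICMP echo of the given payload size with Don't Fragment set.
# Flags assume FreeBSD ping(8): -D = DF bit, -s = payload bytes,
# -c 1 = one packet, -t 2 = give up after 2 seconds.
probe_df() {
    ping -D -c 1 -t 2 -s "$1" "$TARGET" >/dev/null 2>&1
}

# Binary-search the largest size in [lo, hi] for which the probe succeeds.
# Usage: max_payload PROBE_COMMAND LO HI
# Assumes the probe succeeds at LO and that success is monotonic in size.
max_payload() {
    probe=$1
    lo=$2
    hi=$3
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi + 1) / 2 ))
        if "$probe" "$mid"; then
            lo=$mid            # mid passed: the answer is mid or larger
        else
            hi=$(( mid - 1 ))  # mid failed: the answer is below mid
        fi
    done
    echo "$lo"
}
```

Something like `TARGET=10.14.0.217 max_payload probe_df 0 1472` would then report the largest payload that passes unfragmented through the tunnel; 1472 is a 1500-byte Ethernet MTU minus the 20-byte IPv4 header and 8-byte ICMP header. A result well below that would only be a hint pointing at the tunnel MTU or fragment handling on the new gateway, not a diagnosis, and a single lost packet during the search could skew the answer, so it may be worth running more than once.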