I was hesitant to open this thread, since the problem seems to be with Samba - but I'm only having the issue when using Clonezilla, and only when backing up my NVMe drive.
I have been using Clonezilla 3.2.2-5. I know there is a later version, but I don't see any specific Samba fixes in it. Also, I don't even know if Clonezilla is truly the issue, or if it's more on my remote device. So, I'm mainly looking for some guidance.
My network is all gigabit Ethernet. I have a Raspberry Pi 4 running Debian Bookworm. It has a couple of USB hard drives attached and essentially serves as a "cheap/quick" NAS for me - via Samba shares. For everything except this issue, it works perfectly.
As an example, I have one PC that has a SATA based SSD, and it's able to back up to the Samba share with no issue. It takes about 40 minutes to back up the roughly 500GB of data on the 1TB drive.
Then my other PC has a 2 TB PCIe 4.0 NVMe drive with about 1.5 TB of data on it. When I start the backup for this one, it starts out stating it's running at around 10 GB/min, then the speed slowly decreases over time - for maybe about 30 minutes. Eventually it drops to 0 and just gets "soft locked" and stops transferring. I've tried with and without direct IO enabled for NVMe. What's interesting is that only when backing up this NVMe drive, the "smbd" process on the Raspberry Pi starts taking up all of the RAM and also about half of the swap space. I think it eventually runs out of RAM on that device and crashes, and I have to reboot the Raspberry Pi. But it's only triggered by backing up this NVMe.
I thought maybe it was due to the size of the image, so I backed up the NVMe to another drive that was local to the PC, and that worked. Now I'm copying that folder from the PC to that Samba share inside of Debian Bookworm, and the Raspberry Pi is taking it with no issue. There's only 100 MB of RAM in total being occupied on that device now during the transfer.
I can't explain why the memory usage of smbd gets so far out of hand that it brings down the Pi/NAS, especially since it only seems to happen when backing up from the NVMe. Unless it's a speed issue - when I'm backing up a SATA SSD or copying files from a mechanical drive, the source may just not be fast enough to cause smbd to eat all the available RAM.
Any advice on what might be happening? Is there any way to throttle the transfer on the Clonezilla side, if it is a speed issue - such as forcing 100 Mbps Ethernet, or other controls? Or are there any settings on the Samba side to restrict memory usage? I've seen notes on the "write cache size" parameter, but this is not in use on my server.
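For reference, the kind of client-side throttling I had in mind would be something like the following (the interface name eth0 is just an assumption, and neither command has anything to do with Clonezilla itself - they just slow the sender down to test the speed hypothesis):

```shell
# Option 1: force the client NIC down to 100 Mbps full duplex.
# Reverts on reboot, or with "autoneg on".
sudo ethtool -s eth0 speed 100 duplex full autoneg off

# Option 2: keep the 1 Gbps link but shape outgoing traffic to ~300 Mbit/s
# with a token bucket filter; remove later with:
#   sudo tc qdisc del dev eth0 root
sudo tc qdisc add dev eth0 root tbf rate 300mbit burst 512kb latency 50ms
```

Both require root on the sending machine; the tc option is the gentler test since it leaves link negotiation alone.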
Any assistance will be greatly appreciated.
Mm... Since you are talking about NVMe and RAM, maybe you can try to enter expert mode, and enable "-edio" to enable the direct IO. Not sure if this will ease this issue you encountered or not... Just a quick idea though.
Yes, that is the setting I was referring to when I mentioned with and without direct IO enabled. I couldn't remember the exact text of the setting.
The behavior is essentially the same for both. With it enabled, it starts at about 13 GB/min, but then slowly tapers off until it gets down to 0 and stops - about 7% or so into the backup of that partition.
If I disable that setting, the only difference is it only starts at around 10 GB/min, but then also slowly tapers off until it gets down to 0 GB/min - also around the 7% mark.
OK.
So, is this issue reproducible on a different Samba server? Say, an MS Windows machine with its SMB service enabled?
I should be able to try that during the week at some point. I don't believe the other two devices I have handy have enough storage for the full backup, but as long as it lets me start it I can see if the behavior is the same.
Or worst case I can temporarily move the USB drive over to one of the other devices, that shouldn't be too difficult for a short-term test.
I was hoping someone else may have already had the same issue and could chime in - but I don't mind doing the additional testing either.
OK.
BTW, since you are running Debian on your RPi4, have you tried running an SSH server or an NFS server instead? Is the issue also reproducible there?
I can try to set up NFS and see how it does, if it's not too involved.
I did try to use SSH before, but since there is also encryption in transit, it is very CPU intensive on the receiving end, and the transfer speed gets bottlenecked significantly by the Pi's CPU.
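If I do revisit SSH, one thing I could try is a cheaper cipher - from what I've read, the Pi 4's Cortex-A72 has no ARMv8 crypto extensions, so chacha20-poly1305 is usually the cheapest OpenSSH cipher there. Something like this (host and file names are placeholders):

```shell
# Check the cipher is available locally; prints the supported cipher list.
ssh -Q cipher | grep chacha20

# Then do a throwaway transfer with it to compare against the default:
scp -c chacha20-poly1305@openssh.com test.img user@raspberrypi:/srv/backup/
```

Whether that is enough to saturate gigabit on the Pi is something I'd have to measure.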
I looked into NFS and it's not one I'd be interested in setting up.
I may give SSH a try again. I may need to see about adding a private key to the SquashFS, since I sometimes boot from USB and sometimes from PXE. I remember it's slow, but I don't remember how slow. Maybe it's bearable if that's the option that works.
I did just replace that NVMe with a 4TB unit. So far doing the restore over Samba has not had any issues. So, I do imagine it had something to do with how fast the writing was occurring. The data was being read from NVMe, sent over gigabit Ethernet, but then landing on a mechanical disk over USB on the server side of Samba. So I'm assuming Samba was write caching even though it's not configured to do so, and the cache grew too large. Seems like a Samba issue to me.
Edit: Once I'm back up and running, I'll also still try backing up to a different Samba device. I have a Windows box and another Debian box I can use, but they're on another subnet and I also need to set up firewall rules for the test.
Edit 2: I found another potential solution. There is a forum post from 2021 where a few people were reporting the same issue (Samba using all RAM and swap when backing up from Clonezilla, or copying large files at high speed). One user reported they solved it by using the image split feature. I normally don't use that, as I don't need to. But if that's what does it, it would be a perfect solution.
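For the record, the split size can also be set non-interactively if you run the imaging engine directly. A hypothetical invocation would be something like this (the image and device names are examples; I'd double-check the flags against `ocs-sr --help` before relying on it):

```shell
# Sketch, not run here: save /dev/nvme0n1 as image "nvme-backup" with
# 10 GB split volumes. The -i option takes the split volume size in MB.
ocs-sr -q2 -c -j2 -z1p -i 10240 savedisk nvme-backup nvme0n1
```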
Last edit: Russ Kubes 2025-07-15
Just posting a final update. I have what I feel is a successful solution to this.
As noted in my previous post, there was a suggestion to use the split option. On my first attempt, I selected a split size of 10240 MB (just around 10 GB). Monitoring the receiving device, I see that the smbd process takes about 500 MB of RAM for most of the transfer. That said, once the "current file" is within the last 500 MB to be written to disk, the RAM usage of smbd goes up to 1 GB. Then once that file finishes writing, smbd usage goes back down to 500 MB of RAM as it starts writing the next file. So, it seems that smbd is indeed write caching on its own, and not just relying on the built-in OS caching. I don't know the logic it uses, as I'm not sure it can know the file size before the transfer begins. But it does seem like this write cache grows uncontrollably when the file being transferred is "very large." However, keeping the split size to 10 GB seems to keep this smbd write cache reined in to reasonable amounts.
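For anyone who wants to reproduce the monitoring, this is roughly what I run on the server to watch smbd (assumes a procps-style `ps`, as on Debian):

```shell
# Total resident memory of all smbd processes, in MiB.
# Prints "0 MiB" when no smbd is running; wrap the whole thing in
# `watch -n 5 '...'` to see the growth live during a backup.
total_kb=$(ps -C smbd -o rss= | awk '{s+=$1} END {print s+0}')
echo "smbd total RSS: $((total_kb / 1024)) MiB"
```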
OK, got it. Thanks for sharing that. At least you have found a workaround for this.
Maybe an updated version of cifs will fix this? Or did you try to find any related bugs?
I couldn't find any related bugs. I did realize that device was actually still on Bullseye. So I reimaged it to Bookworm, but the issue still occurred. I then installed the backport version of Samba, which is the latest 4.22.3. Still the issue persists.
It definitely seems like a memory leak that occurs with large files if you're able to sustain high speeds. I wonder if I'm somewhat of an edge case by transmitting 1 Gbps to a small Raspberry Pi CPU. Meaning I can't say whether 1 Gbps is the breaking point, or if the CPU speed also plays a role.
What I can say is that when I transfer from a slower device that can essentially only read around 500 Mbps, I have no issues even on the large files. Also, using the 10 GB file split did actually fail. The 500 MB of RAM that I noticed would go back down after finishing a file didn't actually ALWAYS release. Every so often it would stay occupied and grow by another 500 MB. The backup of the 1.5 TB of data got to 97% complete before Samba finally took all the RAM. Once it starts taking swap, it only takes a few seconds before it fills swap, too, and then the Pi becomes barely responsive.
On the latest backup I used just 4 GB for the split. This one finished successfully. But even then, Samba was taking about 100 MB of RAM that slowly grew to 500 MB. Now, almost a week later, Samba still hasn't released that 500 MB. This is why I think it is a leak, since there's no reason for it to still have that transfer cached this long after.
I did ponder about CIFS causing the issue, but I'm skeptical that it would be able to cause a leak if the problem wasn't in Samba itself. I'm going to see if I can reproduce it by copying large files from the NVMe in Windows to Samba on the Pi, and if so maybe open a ticket with the Samba team.
Last edit: Russ Kubes 2025-07-20
I feel like I'm getting close on this one.
I found this thread: https://bugzilla.samba.org/show_bug.cgi?id=15261
It suggests that OpLocks are the potential cause of the leak, specifically over 1 GbE - or essentially whenever the source can read faster than the destination can write.
Interestingly, the default cache size for the "SMB2 credits" (where the excess is cached in RAM before being written to disk) is exactly 500 MB. And that's the exact size the leak kept growing by. So I do feel that this correlates the most closely to my issue so far.
I don't fully understand what OpLocks have to do with whether these credits are used; the way I understand it, an OpLock basically tells the server that the client will cache the file on its side. But maybe Samba takes that to mean the file might change a lot before it should be flushed to disk.
There is a setting to limit the max number of credits, but since I know the leak grows over time, I don't anticipate making that smaller to "solve" the problem.
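A quick back-of-envelope supports the ~500 MB figure (assuming Samba's default "smb2 max credits" of 8192, and that each credit covers a 64 KiB SMB2 read/write - both assumptions on my part):

```shell
# 8192 credits x 64 KiB per credit = 512 MiB of data the client may have
# in flight - right around the ~500 MB steps the leak kept growing by.
credits=8192
kib_per_credit=64
echo "max in-flight data: $((credits * kib_per_credit / 1024)) MiB"
```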
I have disabled OpLocks on my shares, though. This configuration is server side. I then started a backup with 10GB splits. So far absolutely no memory issue. Samba is staying under 256MB of RAM usage and no indication of growing even 16 files in.
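For reference, the server-side change was along these lines (the share name and path here are just examples, not my real config):

```ini
# /etc/samba/smb.conf - per-share settings used for the test.
[backup]
   path = /srv/backup
   oplocks = no
   level2 oplocks = no
   # Leases are the SMB2/3 successor to oplocks and may need disabling
   # too; note "smb2 leases" is a [global]-only parameter in Samba,
   # so it would go in that section instead:
   # smb2 leases = no
```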
Now, one thought I have is that Clonezilla probably doesn't even benefit from OpLocks. I believe everything is just written out (when saving) and then read back all at once (if checksums or check image is selected), and I believe it is all sequential - no random access. Thus, I wonder if there's a way to prevent CIFS from requesting OpLocks on the Clonezilla side, if you needed to avoid them. Really, Samba should fix the leak, though. However, I could not find any reliable documentation on using mount.cifs in a way that disables OpLocks from the client side.
Last edit: Russ Kubes 2025-07-21
From the manual of mount.cifs:
"nolease"
Do not request lease/oplock when opening a file on the server. This turns off local caching of IO, byte-range lock and read meta-data operations (see actimeo for more details about metadata caching). Requires SMB2 and above (see vers).
Is this the one? Maybe you can mount your CIFS server like the following and do some tests:
sudo mount.cifs //server/share /home/partimag -o nolease,username=<username>,password=<password>,vers=3.0
I'm back to being puzzled and frustrated with this one.
So, I reverted my configuration change to prevent oplocks (to allow them again), in order to ensure I could recreate the problem again. As expected, the problem was readily reproducible.
I then tried the mount command that you suggested above, but oddly the problem still occurred.
I then tried putting the "oplocks = no" configuration back into the smb.conf file and restarted Samba on the server again. I also unmounted /home/partimag and remounted it using the UI (no special flags). Unfortunately, the issue was still reproducible even with the configuration that had previously been working.
I still believe this is a memory leak issue with Samba, specifically smbd. However, I am puzzled and frustrated as to why the previous workaround no longer seems to work.
Sometimes a bug is just a bug... No reason... Not logical... I know the frustration...
Anyhow, you mentioned your RPi is running Debian Linux. Maybe you can upgrade the samba package to a newer version?