From: <bac...@li...> - 2008-03-25 07:05:46
|
The following issue has been CLOSED ====================================================================== http://bugs.bacula.org/view.php?id=1038 ====================================================================== Reported By: silhouette Assigned To: ====================================================================== Project: bacula Issue ID: 1038 Category: Win32 File Daemon (client) Reproducibility: always Severity: major Priority: normal Status: closed Resolution: fixed Fixed in Version: 2.2.9 ====================================================================== Date Submitted: 01-14-2008 08:58 UTC Last Modified: 03-25-2008 07:05 UTC ====================================================================== Summary: Out of memory exception when restoring large fileset to Windows client Description: I was trying to restore a large fileset (1.6 Million files, about 110 GB) to a windows file daemon (Server is a OpenSuse 10.2 with Version 2.0.3 as well). After about 1.4 Million files the restore stopped as the windows file daemon ran out of memory (memory usage on the windows client was at about 2.5 GB). The bacula logs showed these messages on the server (the german message "Für diesen Befehl ist nicht genügend Speicher verfügbar." translates to "There is not enough memory available for this command."): <log> 11-Jan 20:56 windows-fd: RestoreFilesWindows.2008-01-11_19.34.25 Error: ../../filed/restore.c:918 Write error on F:/F/directories/fileX.htm: Für diesen Befehl ist nicht genügend Speicher verfügbar. ... (this message is repeated about 2500 times) ... 11-Jan 20:56 windows-fd: RestoreFilesWindows.2008-01-11_19.34.25 Error: ../../findlib/create_file.c:221 Could not create F:/F/directories/fileY.htm: ERR=Für diesen Befehl ist nicht genügend Speicher verfügbar. 11-Jan 20:56 windows-fd: RestoreFilesWindows.2008-01-11_19.34.25 Error: ../../findlib/create_file.c:221 Could not create F:/F/directories/fileZ.htm: ERR=Das System kann den angegebenen Pfad nicht finden. 11-Jan 20:56 windows-fd: RestoreFilesWindows.2008-01-11_19.34.25 Error: ../../findlib/create_file.c:221 Could not create F:/F/directories/fileA.htm: ERR=Das System kann den angegebenen Pfad nicht finden. 11-Jan 20:56 bacula-dir: RestoreFilesWindows.2008-01-11_19.34.25 Fatal error: bnet.c:241 Packet size too big from "File daemon:192.168.21.19:9102. Terminating connection. 11-Jan 20:56 bacula-dir: RestoreFilesWindows.2008-01-11_19.34.25 Fatal error: No Job status returned from FD. 11-Jan 20:56 bacula-dir: RestoreFilesWindows.2008-01-11_19.34.25 Error: Bacula 2.0.3 (06Mar07): 11-Jan-2008 20:56:49 End time: 11-Jan-2008 20:56:49 </log> I succeeded to restore the fileset in small steps (<= 1 Million files for each step). I noticed that the memory usage was about 1.8 GB per 1 Million files). ====================================================================== ---------------------------------------------------------------------- kern - 01-16-08 21:21 ---------------------------------------------------------------------- Could you attach the Job and FileSet portions of your bacula-dir.conf file (or the whole file) to this bug report. I would like to see what options you are using for the backup and for the FileSet. Also, the Bacula job output for the Windows job would be interesting to see. It sounds like Bacula or the OS is not releasing some resource during the restore ... ---------------------------------------------------------------------- silhouette - 01-17-08 08:35 ---------------------------------------------------------------------- The backup-job + jobdef + schedule is: Job { Name = "windows" Client = windows-fd JobDefs = "WindowsJob" Write Bootstrap = "/var/bacula/windows.bsr" } JobDefs { Name = "WindowsJob" Type = Backup Level = Incremental Client = windows-fd FileSet = "WindowsFileSet" Schedule = "WeeklyCycle" Storage = File Messages = Standard Pool = Default Priority = 1 } Schedule { Name = "WeeklyCycle" Run = Full fri at 18:00 Run = Incremental sun-thu at 20:00 } The job for restore is: Job { Name = "RestoreFilesWindows" Type = Restore Client = windows-fd FileSet = "WindowsFileSet" Storage = File Pool = Default Messages = Standard Where = "D:/Restore" } The Fileset is: # Windows FileSet { Name = "WindowsFileSet" Include { Options { signature = MD5 Exclude = yes IgnoreCase = yes # Exclude user's registry files - they're always in use anyway. WildFile = "[A-Z]:/Dokumente und Einstellungen/*/Lokale Einstellungen/Anwendungsdaten/Microsoft/Windows/usrclass.*" WildFile = "[A-Z]:/Dokumente und Einstellungen/*/ntuser.*" # Exclude directories full of lots and lots of useless little files WildDir = "[A-Z]:/Dokumente und Einstellungen/*/Cookies" WildDir = "[A-Z]:/Dokumente und Einstellungen/*/Zuletzt verwendete Dokumente" WildDir = "[A-Z]:/Dokumente und Einstellungen/*/Lokale Einstellungen/Verlauf" WildDir = "[A-Z]:/Dokumente und Einstellungen/*/Lokale Einstellungen/Temp" WildDir = "[A-Z]:/Dokumente und Einstellungen/*/Lokale Einstellungen/Temporary Internet Files" # These are always open and unable to be backed up WildFile = "[A-Z]:/Dokumente und Einstellungen/All Users/Anwendungsdaten/Microsoft/Network/Downloader/qmgr[01].dat" # Temporary directories & files WildDir = "[A-Z]:/WINNT/Temp" WildDir = "[A-Z]:/temp" WildFile = "*.tmp" WildDir = "[A-Z]:/tmp" WildDir = "[A-Z]:/var/tmp" # Recycle bins WildDir = "[A-Z]:/RECYCLER" # Swap files WildFile = "[A-Z]:/pagefile.sys" # Windows System files WildDir = "[A-Z]:/WINDOWS/NTDS" WildDir = "[A-Z]:/WINDOWS/NTFRS" WildFile = "[A-Z]:/WINDOWS/system32/config/SAM*" WildFile = "[A-Z]:/WINDOWS/system32/config/SECURITY*" WildFile = "[A-Z]:/WINDOWS/system32/config/software*" WildFile = "[A-Z]:/WINDOWS/system32/config/system*" WildFile = "[A-Z]:/WINDOWS/system32/config/default*" # XXXX archive-files WildDir = "D:/some_folder/XXXX/archiv" WildDir = "E:/some_other_folder/subfolder" } File = "C:/" File = "D:/" File = "E:/" File = "F:/" } } About the bacula job output for the windows job - you mean the complete bacula logfile for the restore from the bacula-server? ---------------------------------------------------------------------- kern - 01-17-08 09:46 ---------------------------------------------------------------------- Thanks for the bacula-dir.conf parts. That is exactly what I wanted to see, and it looks perfectly normal. For the rest, what I would like to see is the job report for a single BACKUP preferably the Full, not for the restore. I am interested to see if it was using VSS or not and if there were any unusual messages. This is not going to be so easy for me to duplicate since I don't have any Win32 machines with anywhere near that many files, but I think I can work on it, and presumably with my little 1GB memory machines (compared to your 4GB), it is going to die long before a million files ... ---------------------------------------------------------------------- silhouette - 01-17-08 10:05 ---------------------------------------------------------------------- Yes it's using VSS, here is the message from the backup that was later restored: 10-Jan 23:26 bacula-dir: No prior Full backup Job record found. 10-Jan 23:26 bacula-dir: No prior or suitable Full backup found in catalog. Doing FULL backup. 10-Jan 23:26 bacula-dir: Start Backup JobId 1150, Job=windows.2008-01-10_23.26.50 10-Jan 23:26 bacula-sd: Recycled volume "BackupVolume.bac" on device "FileStorage" (/mnt/data/backup), all previous data lost. 10-Jan 23:27 windows-fd: Generate VSS snapshots. Driver="VSS Win 2003", Drive(s)="CDEF" 11-Jan 02:49 bacula-sd: Job write elapsed time = 03:22:57, Transfer rate = 8.730 M bytes/second 11-Jan 02:50 windows-fd: VSS Writer (BackupComplete): "System Writer", State: 0x1 (VSS_WS_STABLE) 11-Jan 02:50 windows-fd: VSS Writer (BackupComplete): "MSDEWriter", State: 0x1 (VSS_WS_STABLE) 11-Jan 02:50 windows-fd: VSS Writer (BackupComplete): "Registry Writer", State: 0x1 (VSS_WS_STABLE) 11-Jan 02:50 windows-fd: VSS Writer (BackupComplete): "Event Log Writer", State: 0x1 (VSS_WS_STABLE) 11-Jan 02:50 windows-fd: VSS Writer (BackupComplete): "COM+ REGDB Writer", State: 0x1 (VSS_WS_STABLE) 11-Jan 02:50 windows-fd: VSS Writer (BackupComplete): "IIS Metabase Writer", State: 0x1 (VSS_WS_STABLE) 11-Jan 02:50 windows-fd: VSS Writer (BackupComplete): "WMI Writer", State: 0x1 (VSS_WS_STABLE) 11-Jan 02:50 windows-fd: VSS Writer (BackupComplete): "BITS Writer", State: 0x1 (VSS_WS_STABLE) 11-Jan 02:49 bacula-dir: Bacula 2.0.3 (06Mar07): 11-Jan-2008 02:49:55 JobId: 1150 Job: windows.2008-01-10_23.26.50 Backup Level: Full (upgraded from Incremental) Client: "windows-fd" 2.0.3 (06Mar07) Linux,Cross-compile,Win32 FileSet: "WindowsFileSet" 2007-05-08 20:00:00 Pool: "Default" (From Job resource) Storage: "File" (From Job resource) Scheduled time: 10-Jan-2008 23:26:33 Start time: 10-Jan-2008 23:26:56 End time: 11-Jan-2008 02:49:55 Elapsed time: 3 hours 22 mins 59 secs Priority: 1 FD Files Written: 2,220,751 SD Files Written: 2,220,751 FD Bytes Written: 105,876,094,230 (105.8 GB) SD Bytes Written: 106,309,108,720 (106.3 GB) Rate: 8693.3 KB/s Software Compression: None VSS: yes Encryption: no Volume name(s): BackupVolume.bac Volume Session Id: 6 Volume Session Time: 1199955679 Last Volume Bytes: 106,459,135,310 (106.4 GB) Non-fatal FD errors: 0 SD Errors: 0 FD termination status: OK SD termination status: OK Termination: Backup OK 11-Jan 02:49 bacula-dir: Begin pruning Jobs. 11-Jan 02:49 bacula-dir: No Jobs found to prune. 11-Jan 02:49 bacula-dir: Begin pruning Files. 11-Jan 02:49 bacula-dir: No Files found to prune. 11-Jan 02:49 bacula-dir: End auto prune. ---------------------------------------------------------------------- kern - 03-18-08 11:24 ---------------------------------------------------------------------- I have now duplicated this problem on version 2.2.8 with 839,000 file (the number is probably not significant). As far as I can tell, Bacula is not consuming memory during the restore, but the OS is. It is possible that Bacula is making some OS call that allocates memory and Bacula does not know to release the memory (this is an unfortunate design of some of the Win32 API's). I will look over all the Win32 APIs we make, but there are a lot made in the mingw32 code as well. This is not one of those cases that is easily debugged; unless some smart Win32 guru has any ideas. ---------------------------------------------------------------------- kern - 03-18-08 19:21 ---------------------------------------------------------------------- I've posted a BETA version of 2.2.9 that corrects the problem. It *should* work with your 2.0.3 Director, but I cannot guarantee it 100%. This BETA code also has proposed fixes for bugs http://bugs.bacula.org/view.php?id=1041 and http://bugs.bacula.org/view.php?id=1065 included, so please test carefully. You can find the beta code at: http://www.bacula.org/downloads/winbacula-2.2.9.exe and http://www.bacula.org/downloads/winbacula-2.2.9.exe.sig Feedback would be appreciated. ---------------------------------------------------------------------- kern - 03-22-08 10:34 ---------------------------------------------------------------------- Please note that I have posted a new BETA version of the Win32 binaries that I would appreciate you testing. They are in the same location that I previously mentioned. ---------------------------------------------------------------------- kern - 03-25-08 07:05 ---------------------------------------------------------------------- I believe that this problem is now fixed in the 2.2.9 beta version of Win32. This version will probably work with your 2.0.3 director, but before installing it please be sure you can fall back to your 2.0.3 version. Issue History Date Modified Username Field Change ====================================================================== 01-14-08 08:58 silhouette New Issue 01-14-08 08:59 silhouette Issue Monitored: silhouette 01-16-08 21:21 kern Note Added: 0003073 01-16-08 21:21 kern Status new => feedback 01-17-08 08:35 silhouette Note Added: 0003075 01-17-08 09:46 kern Note Added: 0003076 01-17-08 10:05 silhouette Note Added: 0003077 03-18-08 11:24 kern Note Added: 0003217 03-18-08 19:21 kern Note Added: 0003218 03-22-08 10:34 kern Note Added: 0003227 03-25-08 07:05 kern Note Added: 0003233 03-25-08 07:05 kern Status feedback => closed 03-25-08 07:05 kern Resolution open => fixed 03-25-08 07:05 kern Fixed in Version => 2.2.9 ====================================================================== |