From: Craig B. <cr...@at...> - 2002-07-27 05:33:13
> I'm a little confused by some of the statistics in the CGI's reports. On
> the main "status" screen, one of the items on my system says:
>
> "Pool hashing gives 3 repeated files with longest chain 1"
>
> Does this mean that there are only three files which are the same (and
> thus disk space saved due to hashing) on my system? Or just that there
> are 3 files whose MD5 sums were the same but are actually different
> files?

Yes, roughly the latter. It means that in 3 cases there are different
files that have the same MD5 checksum. This could happen in several ways:

 - four different files (f1, f2, f3, f4) that all have the same MD5
   checksum (ie: 3 repeats), or
 - a triple of different files (f1, f2, f3) and a pair of different
   files (g1, g2) (also 3 total repeated files), or
 - three pairs of files (f1, f2), (g1, g2) and (h1, h2) (also 3 total
   repeated files).

The longest chain tells you the worst-case number of repeated files
sharing a single MD5 digest. Since you see "longest chain 1", that means
it is the last case: three times you have a pair of files that have the
same MD5 digest.

The more useful number, which BackupPC doesn't report directly, is the
overall storage efficiency. Take the sum of the full and incremental
backups from the PC Summary page; that's how much data has been backed
up. Divide by the pool size on the main summary page (eg: "Pool is
XX.XX GB"). That tells you the overall benefit of compression and
pooling.

> Also, I had been running backuppc on only one host, and just recently
> added a second host. After that host's first full backup, its "home"
> screen reports under "File Size/Count Reuse Summary" that there are
> 11,601 MB total, 4,737 MB existing and 6,864 MB new. Does this mean that
> 4,737 MB worth of files were identical to those from the other host
> already in the pool? (This sounds about right, since I copy a lot of
> files to both places).

Yes, that's right.
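The chain accounting described above can be sketched in a few lines of Perl. This is illustrative only, not BackupPC code: "repeats" per digest is one less than the number of different files sharing that digest, and the longest chain is the largest such count.

```perl
# Sketch (not BackupPC code): how "repeated files" and "longest chain"
# relate to groups of different files that share an MD5 digest.
use strict;
use warnings;
use List::Util qw(max sum);

# Takes a hypothetical map of digest => number of *different* files
# stored under that digest; returns the two numbers the status page shows.
sub chain_stats {
    my %files_per_digest = @_;
    my @repeats = map { $_ - 1 } grep { $_ > 1 } values %files_per_digest;
    my $total   = sum(0, @repeats);    # "N repeated files"
    my $longest = max(0, @repeats);    # "longest chain M"
    return ($total, $longest);
}

# Three pairs of colliding files (the last case above):
my ($repeats, $chain) = chain_stats(f => 2, g => 2, h => 2);
print "Pool hashing gives $repeats repeated files with longest chain $chain\n";
```

Running it prints "Pool hashing gives 3 repeated files with longest chain 1", matching the status line being asked about.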
In addition, the "4,737 MB existing" also includes any repeated files on
the new host: BackupPC also finds repeated files within a single backup.

> Also, has anyone looked into generating graphs for the statistics?

I don't know of anyone who is doing this, but it would be a great
feature to add. GD::Graph makes it pretty easy. It would also be nice to
show extra statistics like server load (eg: average number of
BackupPC_dumps running each hour for the last 24 hours). It wouldn't be
too hard to make BackupPC keep track of these things.

> Or into editing config files from the CGI interface (maybe using webmin)?

Paul Lukins <pa...@ar...> has asked about config file editing too. What I
suggested as a first step was supporting user editing of some
parameters, so users can customize certain host-specific settings (eg:
Exclude/Include list, maybe full and incremental save counts and
periods, email frequency etc). The master config file could have a list
of which subset of parameters the user is actually allowed to edit, so
site-wide policy can be customized (eg: some sites might not want users
to change the number of fulls to keep for their client to 100).

Leon Letto <le...@le...> has expressed interest in helping on a webmin
module for BackupPC.

> I might be interested in taking either of these on, if I wasn't
> duplicating effort.

Great! If you want to hook up with Paul or Leon, or do it yourself, it
would be excellent! I'm happy to merge the new code into the next
release.

Craig
From: <pu...@to...> - 2002-07-27 07:11:17
> > Also, has anyone looked into generating graphs for the statistics?
>
> I don't know of anyone who is doing this, but it would be a
> great feature to add. GD::Graph makes it pretty easy. It
> would also be nice to show extra statistics like server load
> (eg: average number of BackupPC_dumps running each hour for
> the last 24 hours). It wouldn't be too hard to make BackupPC
> keep track of these things.

OK, I'll look into GD::Graph, and study the BackupPC code a little more.
Does the CGI interface currently calculate statistics "on the fly" each
time you view it? Of course that wouldn't scale too well for images, and
it seems like simply generating new images whenever a backup completes
would suffice.

> Paul Lukins <pa...@ar...> has asked about config file
> editing too. What I suggested as a first step was supporting
> user editing of some parameters, so users can customize
> certain host-specific settings
> (eg: Exclude/Include list, maybe full and incremental save
> counts and periods, email frequency etc). The master config
> file could have a list of which subset of parameters the user
> is actually allowed to edit, so site-wide policy can be
> customized (eg: some sites might not want users to change the
> number of fulls to keep for their client to 100).
>
> Leon Letto <le...@le...> has expressed interest in
> helping on a webmin module for BackupPC.

I haven't even installed webmin, and maybe it would be overkill for
something like this. Your first idea seems to make sense.
There could be an array of config parameters that you want to allow,
either in the site-wide or host-specific config file, something like:

    $Config{AllowUserConfig} = ['BackupFilesOnly', 'BackupFilesExclude',
                                'FullPeriod', 'IncrPeriod'];

The CGI page that allowed config editing could have an array of config
keys that it knew how to build edit forms for (in an order that made
sense), and for each one that's in the AllowUserConfig array, it could
call a method of the same name in a new module, say BackupPC::UserConfig.
That method would return the portion of the web page that would allow
editing of that parameter, in a way that made sense (i.e. not just a
text box for everything; use radio buttons or dropdowns where possible).

The problem, of course, would be writing everything back out to the
config files. I could see three solutions:

1. They would have to be in a specific format in order for it to be able
   to overwrite the correct entries. Probably not very desirable.

2. Simply look for some sort of "tag" in the config file that means
   "everything below this line is subject to being overwritten". Write
   such a tag if it doesn't exist, then simply dump any changed values
   below this line. Since you simply load in the config file as a Perl
   snippet, those would overwrite any entries in %Config defined earlier.

3. Kinda like 2, but have the editor store settings back to a different
   file, such as user-config.pl.

I'm thinking 2 sounds the best, although that might lead to some
confusion if someone didn't notice that some values were redefined
later.

Paul, have you done any work on this yet?
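Option 2 could be sketched roughly like this. The helper name, the tag text, and the assumption that the caller passes the full set of changed values are all made up here; this is just a sketch of the "truncate at the tag, re-dump below it" idea, using Data::Dumper so the re-dumped values override earlier assignments when the file is re-evaluated.

```perl
# Sketch of option 2 (hypothetical helper, not BackupPC code): rewrite
# everything below a marker tag in a config.pl-style file.
use strict;
use warnings;
use Data::Dumper;

my $TAG = "# --- values below written by the CGI editor; do not edit by hand ---\n";

sub save_user_config {
    my ($file, $changes) = @_;    # $changes: hashref of %Conf key => new value
    open my $fh, '<', $file or die "can't read $file: $!";
    my $text = do { local $/; <$fh> };    # slurp the whole config file
    close $fh;
    $text =~ s/\Q$TAG\E.*//s;    # drop any section from a previous edit
                                 # (caller passes the full set of changes)
    $text .= $TAG;
    local $Data::Dumper::Terse = 1;    # dump values without '$VAR1 ='
    for my $key (sort keys %$changes) {
        my $val = Dumper($changes->{$key});
        chomp $val;
        $text .= "\$Conf{$key} = $val;\n";
    }
    open $fh, '>', $file or die "can't write $file: $!";
    print $fh $text;
    close $fh;
}
```

Since the appended assignments come last in the file, re-loading the config as a Perl snippet makes them win over the hand-edited values above the tag, which is exactly the override behaviour described in option 2.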
From: Paul L. <pa...@zi...> - 2002-07-28 16:42:59
> OK, I'll look into GD::Graph, and study the BackupPC code a
> little more. Does the CGI interface currently calculate
> statistics "on the fly" each time you view it? Of course that
> wouldn't scale too well for images, and it seems like simply
> generating new images whenever a backup completes would suffice.

A quick hint on images, if not generated on the fly: use pragma no-cache
on pages that have the images to prevent 'stale' images from being
displayed.

> > Paul Lukins <pa...@ar...> has asked about config file
> > editing too. What I suggested as a first step was supporting
> > user editing of some parameters, so users can customize
> > certain host-specific settings
> > (eg: Exclude/Include list, maybe full and incremental save
> > counts and periods, email frequency etc). The master config
> > file could have a list of which subset of parameters the user
> > is actually allowed to edit, so site-wide policy can be
> > customized (eg: some sites might not want users to change the
> > number of fulls to keep for their client to 100).

<snip>

> Paul, have you done any work on this yet?

Not yet, but here's what I've been thinking... Something similar to your
'BackupPC::UserConfig' interface that provides both a definition of the
user parameters and a method for reading/writing their values. The
persistence of the data shouldn't matter, e.g. it could be anything from
a Data::Dumper-style flat file (that becomes a 'tied' hash) to an RDBMS
such as mysql.

BackupPC::UserConfig would define what the parameters are, whether the
parameter is user-editable, what type of parameter (list, text, boolean,
number, ...), a sensible default (whatever is used now), and possibly a
flexible interface to the persistence. Something like:

    sub GetUserConfig {
        ...
        my $config_def = {
            'BackupFilesExclude' => {
                user  => 1,
                type  => 'list',
                desc  => 'Files to exclude',
                value => GetUserParam('BackupFilesExclude')
                         || ['outlook.exe', 'virus.exe', ...],
            },
            'IncrKeepCnt' => {
                user  => 0,
                type  => 'number',
                desc  => 'Number of incrementals to keep',
                value => 10,
            },
            ...
        };
        return $config_def;
    }

The 'user', 'type', and 'desc' fields give hints to the CGI display
(whether to show up or not; render as a checkbox, selection or text box;
a description). If the 'value' field is user-editable, retrieve it with
the 'GetUserParam' method (half of the flexible interface to
persistence; the other half would be 'SetUserParam') or load the
sensible default.

Within 'Set/GetUserParam', you decide how the data is stored.
Personally, I would implement it using a persistent database connection
to, say, mysql, because it's fast and I'm doing this sort of thing
elsewhere. It could easily use a tied hash to a flat file instead,
though, so as not to impose the requirement of mysql/DBI on BackupPC.

I haven't gotten my BackupPC installation going yet, so I can't imagine
the gory details of how this would interface to the rest of the system.

IMHO,

Paul.
From: Craig B. <cr...@at...> - 2002-07-31 04:51:33
> OK, I'll look into GD::Graph, and study the BackupPC code a little more.
> Does the CGI interface currently calculate statistics "on the fly" each
> time you view it? Of course that wouldn't scale too well for images, and
> it seems like simply generating new images whenever a backup completes
> would suffice.

Currently the CGI interface computes some things on the fly, and other
things are computed by the applications and passed on to the CGI
interface.

Generating the graphs on the fly isn't too hard. Simply do an img src=
pointing back to the CGI script with some flags that tell it what image
to generate. It can then use GD::Graph to generate the image right
there. With a Content-Type of image/png and a Content-Transfer-Encoding
of binary you can just print out the graph. This isn't so great
performance-wise, but with mod_perl it will be fast. Also, GD::Graph3d
creates nice 3D pie, bar and line charts.

Craig
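The on-the-fly approach Craig describes could look roughly like this. This is only a sketch: it assumes GD::Graph is installed, the parameter names (xsize, ysize) are hypothetical, and the data here is made up rather than read from BackupPC's stats.

```perl
#!/usr/bin/perl
# Sketch of a CGI handler that renders a chart on the fly, per the idea
# above: the page's <img src> points back at this script, which prints
# a binary PNG with the appropriate headers.
use strict;
use warnings;
use CGI;
use GD::Graph::bars;

my $q     = CGI->new;
my $xsize = $q->param('xsize') || 400;
my $ysize = $q->param('ysize') || 300;

# Example data: dumps running per hour (would really come from BackupPC).
my @data = ( [ 0 .. 23 ], [ map { int(rand(5)) } 0 .. 23 ] );

my $graph = GD::Graph::bars->new($xsize, $ysize);
$graph->set(x_label => 'Hour', y_label => 'Dumps') or die $graph->error;
my $gd = $graph->plot(\@data) or die $graph->error;

# Emit the image directly: Content-Type image/png, binary body.
print $q->header(-type => 'image/png', -Content_Transfer_Encoding => 'binary');
binmode STDOUT;
print $gd->png;
```

Under plain CGI each request pays the Perl startup cost, which is the performance concern mentioned; under mod_perl the interpreter stays resident and this becomes cheap.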
From: Toby J. <pu...@to...> - 2002-07-31 15:59:47
> Generating the graphs on the fly isn't too hard.

I wasn't worried about the difficulty; I was thinking more of
performance. I figured it would be better to generate them only after a
backup and once each time the daemon wakes up. But now that I think
about it again, I can see where it would be useful, if several backups
were running, to monitor various statistics in real time.

> img src= pointing back to the CGI script with some flags that
> tell it what image to generate. It can then use GD::Graph to
> generate the image right there. With a Content-Type of image/png
> and a Content-Transfer-Encoding of binary you can just print out
> the graph. This isn't so great performance wise, but with
> mod_perl it will be fast. Also, GD::Graph3d creates nice 3D pie,
> bar and line charts.
>
> Craig
>
> ----------------------------------------------
> Filtered by despammed.com. Tracer: XAA055621028091394
> Need cheap U.S. dialup but want to keep despamming your mail? Check out
> U.S. national dialup access from $13.95 with Despammed filtration at
> http://www.myguard.net
From: Toby J. <pu...@to...> - 2002-08-13 16:09:09
Craig,

The way I'm going about this is to develop a separate CGI script,
BackupPC_StatChart, that will solely generate charts. It will look a lot
like the Admin script, i.e. it instantiates a new BackupPC::Lib if
needed, checks that the logged-in user is what's expected, gets some
info from the running server, then dispatches the request for a chart to
the appropriate subroutine.

I decided to create a separate script because:

* it means the user wouldn't have to wait for the images to be generated
  before seeing the page.
* it seems like the Admin script may get bloated if all of the chart
  code were included there.
* the current Admin script is geared towards creating an HTML page with
  an error message if one occurs. However, if something is expecting an
  image, we can't generate an HTML response. (Hmmm.. maybe I could use
  GD::Text to generate an image with the error message!)
* I could see where the script may be used in the future for other
  purposes, such as to send graphs in emails to the host owners.

However, it's not too late to just keep everything in the Admin script
if you think that's a better route.

toby

> > Generating the graphs on the fly isn't too hard.
>
> I wasn't worried about the difficulty; I was thinking more of
> performance. I figured it would be better to generate them only after a
> backup and once each time the daemon wakes up. But now that I think
> about it again, I can see where it would be useful, if several backups
> were running, to monitor various statistics in real-time.
>
> > img src= pointing back to the CGI script with some flags that
> > tell it what image to generate. It can then use GD::Graph to
> > generate the image right there. With a Content-Type of image/png
> > and a Content-Transfer-Encoding of binary you can just print out
> > the graph. This isn't so great performance wise, but with
> > mod_perl it will be fast. Also, GD::Graph3d creates nice 3D pie,
> > bar and line charts.
> >
> > Craig
>
> _______________________________________________
> BackupPC-users mailing list
> Bac...@li...
> https://lists.sourceforge.net/lists/listinfo/backuppc-users
From: Toby J. <pu...@to...> - 2002-07-31 16:13:07
There is an option to set the PASSWD environment variable within the
startup script /etc/init.d/backuppc. However, since passing the "--user"
argument to "daemon" (the shell function that starts and logs the
service) causes BackupPC to be started via su, the environment is reset,
even if PASSWD is exported. There is an option "-m" to su that preserves
the caller's environment; however, this is probably not desirable since
the service is usually started by root at system boot.

Setting PASSWD in the startup script (rather than config.pl) is
desirable to me since I can leave it readable by root only, while
config.pl is group-readable for admins. The workaround I have
implemented is to create a second command-line argument to BackupPC, -p,
which contains the password:

    daemon --user <username> /path/to/bin/BackupPC -d -p "'$PASSWD'"

(There can't be any quotes in the password this way, but I don't know
enough about shell scripting to do it better.) Then I read the password
from $opts{p} and put it into %ENV.

Which brings me to a second issue: the global %opts hash you use. I
haven't coded for mod_perl much, but my understanding is that global
variables, unless explicitly set, are in danger of being carried over to
subsequent calls of the script by the same httpd child process. I don't
believe 'getopts' actually clears out the %opts hash, so shouldn't that
be done explicitly beforehand with a "my %opts = ()"?

Lastly, the environment variable $PASSWD seems a little too generic.
Possibly better as $BACKUPPC_PASSWD or even $BACKUPPC_SMB_PASSWD?
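The %opts concern above can be illustrated with a small self-contained sketch (the parse_opts wrapper is made up for the example). getopts() only sets keys for flags it actually sees, so under a persistent interpreter a hash that survives between runs could retain a stale value; resetting the hash on every pass avoids that.

```perl
# Sketch of the %opts point: start from an empty hash each time options
# are parsed, so nothing carries over from a previous invocation.
use strict;
use warnings;
use Getopt::Std;

sub parse_opts {
    my @args = @_;
    local @ARGV = @args;      # getopts() reads from @ARGV
    my %opts = ();            # explicitly start empty, as suggested
    getopts('dp:', \%opts);   # -d is a flag, -p takes the password argument
    return %opts;
}

my %first  = parse_opts('-d', '-p', 'secret');
my %second = parse_opts('-d');    # no -p: must not inherit 'secret'
print defined $second{p} ? "leaked\n" : "clean\n";    # prints "clean"
```

With the `my %opts = ()` inside the sub, the second parse cannot see the first parse's password, which is the behaviour the message argues for.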
From: Toby J. <pu...@to...> - 2002-08-01 02:36:42
> Which brings me to a second issue: the global %opts hash you
> use. I haven't coded for mod_perl much, but my understanding
> is that global variables, unless explicitly set, are in
> danger of being carried over to subsequent calls of the
> script by the same httpd child process.

OK, forget that part.. the BackupPC program doesn't use mod_perl. Duh.

toby
From: Craig B. <cr...@at...> - 2002-08-14 06:51:58
> The way I'm going about this is to develop a separate CGI script,
> BackupPC_StatChart, that will solely generate charts. It will look a lot
> like the Admin script, i.e. it instantiates a new BackupPC::Lib if needed,
> checks that the logged-in user is what's expected, gets some info from the
> running server, then dispatches the request for a chart to the appropriate
> subroutine.
>
> I decided to create a separate script because:
>
> * it means the user wouldn't have to wait for the images to be generated
>   before seeing the page.
> * it seems like the Admin script may get bloated if
>   all of the chart code were included there.
> * the current Admin script is geared towards creating an HTML page with an
>   error message if one occurs. However, if something is expecting an image,
>   we can't generate an HTML response. (Hmmm.. maybe I could use GD::Text to
>   generate an image with the error message!)
> * I could see where the script may be used in the future for other purposes,
>   such as to send graphs in emails to the host owners.
>
> However, it's not too late to just keep everything in the Admin script if
> you think that's a better route.

I would prefer that the graphs also get generated by BackupPC_Admin,
mainly because of mod_perl: to speed things up, BackupPC_Admin keeps its
socket connection to BackupPC open between requests, so with, say, 8
httpds running, there will be 8 connections to BackupPC. Each
additional, independent mod_perl script with a BackupPC::Lib object
connected to the server will consume another 8 file descriptors in
BackupPC. Also, people who (finally) got their BackupPC_Admin running
correctly would need to go through more steps to get a new script
running too (eg: getting permissions right, updating httpd.conf).

Using the same script shouldn't change whether the user can see the page
before the images are complete. Each image will cause BackupPC_Admin to
run again in parallel with the page being rendered.
The main page will display faster if the <img src> tags include the
image size: this allows the main page to be rendered before the image
sizes are determined from the images.

All your other points are correct: BackupPC_Admin is already very
bloated, and making it bigger isn't great. How about most of the
graphing code goes in a module, BackupPC::StatChart or similar? That way
other non-CGI code can easily get to it, and BackupPC_Admin can have
just a couple of extra lines that call BackupPC::StatChart, eg, when
action=chart. The code in BackupPC::StatChart can do the rest. You could
have some standard arguments when action=chart, like type, xsize, ysize
etc.

Craig
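The action=chart hook Craig suggests could be sketched as a small dispatch table. BackupPC::StatChart and its interface are hypothetical here, as are the argument names; this only shows the shape of the "couple of extra lines" in BackupPC_Admin.

```perl
# Sketch of an action=chart dispatch (BackupPC::StatChart is hypothetical;
# real BackupPC_Admin dispatch works differently).
use strict;
use warnings;

my %action_table = (
    chart => sub {
        my ($in) = @_;
        # Hand off to the (hypothetical) charting module with the
        # standard arguments: type, xsize, ysize.
        return sprintf("render %s chart at %dx%d",
                       $in->{type}  // 'bars',
                       $in->{xsize} // 400,
                       $in->{ysize} // 300);
    },
    # ... the existing BackupPC_Admin actions would dispatch here too ...
);

sub handle_request {
    my (%in) = @_;
    my $handler = $action_table{ $in{action} || '' }
        or return "unknown action";
    return $handler->(\%in);
}

print handle_request(action => 'chart', type => 'pie',
                     xsize => 640, ysize => 480), "\n";
```

Keeping the handler a thin shim like this means non-CGI callers (e.g. an emailer generating graphs for host owners) could call the same module directly.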
From: Toby J. <pu...@to...> - 2002-08-14 16:14:03
> Using the same script shouldn't change whether the user can see the page
> before the images are complete.

Yes, I realized after I sent that message that this would be the case.

> The main page will display faster if the <img src> tags include the
> image size

My intent is to have the image dimensions as a %Config parameter, so
this also should not be a problem.

> How about most of the graphing code goes in a module
> BackupPC::StatChart or similar?

I was already thinking of a new module in order to take care of reading
and writing the historic data. The user could define the granularity and
age of the historic data to keep -- such as jobs per hour/day, disk
usage over a month period, etc. That way, every time the main daemon
wakes up or does something, it could call that module with current
stats. The module would decide whether it needed to save that info, and
take care of the actual writing to some sort of persistent historic data
file. I think just a flat text file should serve this purpose, since the
volume of data won't be that great.

toby
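The flat-file history described above could be as simple as appending one timestamped line per sample. The function name, file name, and line format here are all made up for the sketch.

```perl
# Sketch of the flat-text-file history idea: append a timestamped
# "key=value" line each time the daemon wakes up or a backup finishes.
use strict;
use warnings;

sub record_stats {
    my ($file, %stats) = @_;
    open my $fh, '>>', $file or die "can't append to $file: $!";
    print $fh join(' ', time(),
                   map { "$_=$stats{$_}" } sort keys %stats), "\n";
    close $fh;
}

# Example sample; the values would really come from the running server.
record_stats('status_history.txt', dumps_running => 2, pool_gb => 31.4);
```

The graphing code can then read the file back, filter by timestamp for the requested window, and discard lines older than the configured retention age.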
From: Chris E. <ce...@us...> - 2002-08-21 15:55:43
Hi all,

I just started using BackupPC 1.5.0. Everything is working great except
for one little detail..... the filtering. I currently have a "per-PC"
config.pl for 3 machines that looks like this:

-----------------------------------------------
$Conf{SmbShareName} = ['C$', 'D$'];

$Conf{BackupFilesExclude} = [
    '\WINNT\TEMP\*',
    '\TEMP\*',
    '*\Temporary Internet Files\*',
    '*\Local Settings\Temp\*',
    '*.mp3',
    '*.dmp'
];
-----------------------------------------------

However, all of these files and directories get backed up anyway. I've
tried several variations, searched through the docs, searched the
mailing list archives and even fiddled with the code a little
(everything is now back the way it was), but I can't seem to get any
filtering to work. Am I missing something?

Thanks,

- Chris
From: Timothy D. <dem...@ar...> - 2002-07-30 17:44:17
RRDTool <http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/> is a very
fast and flexible graphing tool. It will keep you from reinventing the
wheel, and keep BackupPC lightweight.

Tim

--On Friday, July 26, 2002 10:33 PM -0700 Craig Barratt <cr...@at...>
wrote:

> I don't know of anyone who is doing this, but it would be a great
> feature to add. GD::Graph makes it pretty easy. It would also be
> nice to show extra statistics like server load (eg: average number
> of BackupPC_dumps running each hour for the last 24 hours). It
> wouldn't be too hard to make BackupPC keep track of these things.

--
Timothy Demarest                ArrayComm, Inc.
dem...@ar...                    2480 North 1st Street, Suite 200
http://www.arraycomm.com        San Jose, CA 95131
From: Toby J. <pu...@to...> - 2002-07-30 20:12:14
Thanks for the tip, I may look into it more, but it looks like it can
only do (x,y) graphs. I think other types of graphs would make more
sense for some uses, such as a pie chart to show backup filesystem disk
usage.

> RRDTool <http://people.ee.ethz.ch/~oetiker/webtools/rrdtool/> is very fast
> and flexible graphing tool. It will keep you from reinventing the wheel,
> and keep BackupPC lightweight.
>
> Tim
>
> --On Friday, July 26, 2002 10:33 PM -0700 Craig Barratt <cr...@at...>
> wrote:
>
> > I don't know of anyone who is doing this, but it would be a great
> > feature to add. GD::Graph makes it pretty easy. It would also be
> > nice to show extra statistics like server load (eg: average number
> > of BackupPC_dumps running each hour for the last 24 hours). It
> > wouldn't be too hard to make BackupPC keep track of these things.