mbackup-devel Mailing List for Midnight Backup (Page 2)
Status: Alpha
Brought to you by:
jo2y
| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2000 | | | (1) | | (16) | (6) | (5) | (19) | (1) | (1) | (2) | (1) |
| 2001 | (3) | | | (7) | | | | | | (1) | | (2) |
| 2002 | (4) | (4) | | | | | | | | | | |
| 2009 | | | | | | | | | | | (1) | |
From: James O'K. <jo...@mi...> - 2000-10-15 03:37:20
I've spent dozens of hours the past week working on some threaded code, and I'm happy with what I have now. I'll be committing it to CVS shortly, as soon as I remember how to commit to the right branches. For a little longer I'm going to keep threads as a separate branch until I'm happy with the results. In some quick testing, a threaded version took only 80% of the time that the non-threaded version took, and I think this number can be improved further. A key thing will be making things self-tuning.

I talked to someone who was willing to try a BeOS port of the code. I don't remember if he subscribed to the list or not. I'd also like to find people to do ports to freebsd, openbsd, and netbsd. Ask your friends.

I was talking to a friend of mine about the project and he made the comment that it sounds like we're building a framework for building backup applications. I like the sound of that, and it's a closer description of what I have planned in my head for mbackup, so I'm going to change some of the sourceforge descriptions.

I'm going to post a help wanted on sourceforge and try to find someone to do the xml config file stuff. I'd really like to get some movement happening again.

I can't think of anything else off-hand.

-james
From: James O'K. <jo...@mi...> - 2000-09-22 19:58:16
I just thought I'd let people know that I didn't die or disappear. I've just had several deadlines at work that have been keeping me too busy. Hopefully I'll get some time in about 2 weeks to work on new code. In the meantime, if anyone has any code they want me to merge into CVS, send me the patches and I'll look at them.

thanks
-james
From: John H. <Jo...@mw...> - 2000-08-23 05:23:35
{{ Nick you know all about XML, is this reasonable??}}
----- Original Message -----
From: James O'Kane <jo...@mi...>
To: <mba...@li...>
Sent: Wednesday, 23 August 2000 09:49
Subject: Re: [mbackup-devel] xml as a tape header: the cons
> On Wed, 23 Aug 2000, John Huttley wrote:
> > Implementation wise, the client would create the XML header with the
> > data and pass it, locally or over a network, to the server process.
>
> I'm not sure I understand this. Do you plan to change the file_tag struct
> into a pointer to an xml formatted string? If that's the plan, I'm
> curious about the overhead of making each module understand how to read
> and add to the xml format. Right now, to change something in the file_tag
> struct you can just do a file_tag->current_size = 100. If we change that
> to xml, won't we have to call file_tag->xml_set_current_size(100)? I'm
> not sure I see how that is better.
> On the other hand, if we're just talking about having the tape-writing
> module create the xml header just as we write to tape then we're on the
> same page.
Not quite. The existing file_tag only makes sense in the context of a unix
system. If the intent is to back up _data_ as against _files_, then a
general-purpose interface is required.

However, this does not mean that we need to pass around an XML formatted
string and have every filter parse it and rewrite it. Your example shows it
as the worst case, because you have picked an attribute that is already
supported in the file_tag. A simple struct dereference is hard to beat!

There is certainly going to be overhead in having each filter understand
DOM, but it is, I think, survivable. It does not increase with the size of
the file, unlike compression, for example.
Suppose we were backing up a NetWare 3 file system. Our header needs to
look like this:

<stream_code>NETWARE 3</stream_code> <!-- actually I suppose we specify
                                          the DTD -->
<stream_id>3</stream_id>
<object_id>489894</object_id> <!-- quick ref to this file, makes
                                   subsequent headers much shorter -->
<length>1234567</length>
<owner>someone</owner>
<cdate>2000-03-01 23:45:01</cdate> <!-- ISO standard dates -->
<mdate>2000-08-21 20:45:01</mdate>
<!-- trustees are unlimited in number -->
<trustee type="user">someuser1<rights>RWCM</rights></trustee>
<trustee type="user">someuser2<rights>SRWCEMFA</rights></trustee>
<trustee type="group">somegroup1<rights>RF</rights></trustee>
<trustee type="group">admingroup1<rights>SRWCEMFA</rights></trustee>
<type>directory<IRmask>SRF</IRmask></type>
<attribute>DPR</attribute>
<server>MYSERVER</server>
<volume>VOL1</volume>
<path>home\john\documents</path>
<namespace>long
    <filename>mydatadirectory</filename>
    <OS2EA encoding="BASE64"> <!-- if resource fork size is a problem, we
                                   can write them out as data in their own
                                   block -->
    kilobytes of base 64 data
    </OS2EA>
</namespace>
<namespace>mac
    <filename>mydatadirectory</filename>
    <resource_fork encoding="BASE64">
    even more kilobytes of base 64 data
    </resource_fork>
</namespace>
<namespace>DOS
    <filename>MYDAT~1</filename>
</namespace>
<block>
    <sequence>1</sequence>
    <length>524288</length> <!-- we are writing out in 512kb blocks -->
    <offset>0</offset>
</block>
============
Then the next block can just have
<stream_code>NETWARE 3</stream_code>
<stream_id>3</stream_id>
<object_id>489894</object_id>
<block>
<sequence>2</sequence>
<length>524288</length>
<offset>524288</offset>
</block>
=============
When this hits the filters, they may act on the data to change the size of
the blocks (compression etc) and add additional entities into the header.

As you can see, there isn't much in there that matches the existing
file_tag structure. Netware 4 and higher is even worse.

XML/DOM headers are better in that we can do more with them. Not as simple
or as fast though.
> I was also planning on using XML to talk between modules and the GUI such
> as libglade and possibly using XML for talking between client and server,
> but I'm not sure if this would be the same XML format as the data on tape.
Good old glade! I never did work it out. I'm sure, though, that the XML
will be quite different.
Regards
John
From: James O'K. <jo...@mi...> - 2000-08-22 21:49:38
On Wed, 23 Aug 2000, John Huttley wrote:

> Implementation wise, the client would create the XML header with the data
> and pass it, locally or over a network, to the server process.

I'm not sure I understand this. Do you plan to change the file_tag struct into a pointer to an xml formatted string? If that's the plan, I'm curious about the overhead of making each module understand how to read and add to the xml format. Right now, to change something in the file_tag struct you can just do a file_tag->current_size = 100. If we change that to xml, won't we have to call file_tag->xml_set_current_size(100)? I'm not sure I see how that is better.

On the other hand, if we're just talking about having the tape-writing module create the xml header just as we write to tape, then we're on the same page.

I was also planning on using XML to talk between modules and the GUI such as libglade, and possibly using XML for talking between client and server, but I'm not sure if this would be the same XML format as the data on tape.

-james
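To make the tradeoff in this exchange concrete: a minimal C sketch of the two styles, with hypothetical names (this file_tag is simplified, and header_prop/header_set are invented for illustration; neither is mbackup's actual code). A fixed struct makes a metadata update one assignment; a generic DOM-style property list makes the same update a lookup plus a string conversion.

/*
 * Hypothetical sketch of the two approaches being debated. The names
 * (file_tag, header_prop, header_set) are illustrative, not mbackup's
 * real API.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Approach 1: the current fixed struct -- one cheap assignment. */
struct file_tag {
    char filename[256];
    long current_size;
};

/* Approach 2: a generic DOM-ish property list -- flexible, but every
 * update is a lookup plus a string conversion. */
struct header_prop {
    char name[32];
    char value[64];
    struct header_prop *next;
};

static void header_set(struct header_prop **head, const char *name, long v)
{
    struct header_prop *p;
    for (p = *head; p; p = p->next)         /* find an existing property */
        if (strcmp(p->name, name) == 0)
            break;
    if (!p) {                               /* or prepend a new one */
        p = calloc(1, sizeof(*p));
        strncpy(p->name, name, sizeof(p->name) - 1);
        p->next = *head;
        *head = p;
    }
    snprintf(p->value, sizeof(p->value), "%ld", v);
}

int main(void)
{
    struct file_tag tag = { "/usr/local/bin/foobar", 0 };
    struct header_prop *hdr = NULL;

    tag.current_size = 100;                 /* struct: direct and fast  */
    header_set(&hdr, "current_size", 100);  /* DOM-ish: search + format */

    printf("%ld vs %s\n", tag.current_size, hdr->value);
    return 0;
}

Either way the cost is per metadata field, not per byte of file data, which is the crux of John's argument that the overhead is survivable.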
From: John H. <Jo...@mw...> - 2000-08-22 21:29:01
> The only thing I have against using xml as a tape format is the extra
> tape space that it would use. I wrote this quick example:
>
> <?xml version="1.0" ?>
> <mbackup:header>
> <mbackup:date>966968837</mbackup:date>
> <mbackup:filename>/usr/local/bin/foobar</mbackup:filename>
> <mbackup:hostname>uhura.midnightlinux.com</mbackup:hostname>
> </mbackup:header>
>
> I'm not sure if that's true valid xml, but I think it's close enough for
> this example. I also have this more traditional header format:
>
> 96696883724uhura.midnightlinux.com21usr/local/bin/foobar
>
> 9 digit time, 2 digit next field length, hostname, 2 digit next field
> length, filename.
>
> Both give the same information. The xml one is 215 bytes and the other
> one is 58 bytes.
>
> I also gathered some data that is typical of the data we use at work:
>
> [root@cadillac round5]# du -a|wc -l
> 306050
> [root@cadillac round5]# du -s
> 4070060 .
>
> 306,050 files using about 4 gigs of space.
> With the xml header, there is 65M of header data to label 4gig of data.
> With the other header, there is 17M of header data to label 4gig of data.
>
> Granted, this is a simple header and doesn't have all the data a full
> header might have, but I feel that the xml header will grow faster than
> the other header, even if we don't use the mbackup namespace part.
>
> Just something to consider...
>
> -james

I've just checked out our web/ftp server. For /etc, which is classic for small files, 6390k/1333 files = 4.8K on average. Over the whole system, 5.5gb/106639 files = 52.2kb.

I was thinking of a much more verbose and detailed header, so let's say 1k per file. Nah, let's say 2k. So for a full backup this is going to take 213Mb. But that's trivial, not worth worrying over! For a DDS-3 that's an extra 3 mins of backup time. Of course, for a drive supporting compression, it's even smaller. The advantages of expandability and flexibility remain huge.

Implementation wise, the client would create the XML header with the data and pass it, locally or over a network, to the server process. When it comes into the server, it's parsed into a DOM object. (I don't see any reason why we need a validating parser, though.) Then a pointer to the DOM gets passed along with the data to the filters.

One possibility might be a compression filter that examines the DOM to see if the data is in a compressed native format (with Netware 4+, for example, you can open a compressed file and keep it compressed). It might then compress the data and add a new entity to the DOM:

<mbackup:compression>gzip</mbackup:compression>

At the point where it goes out to the device, it gets converted to an XML document.

Regards
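John's flow (parse the client's header into a DOM once, hand filters a pointer, serialize only at the device) maps directly onto libxml, the library he proposes elsewhere on this list. A minimal sketch using today's libxml2 API; the header string and the gzip step are illustrative, and only the compression element comes from his example.

/*
 * Minimal sketch of John's proposed flow using libxml2. The header string
 * and the filter step are hypothetical; only the idea (parse once, let
 * filters annotate the DOM, serialize at the device) is from the message.
 * Build with: gcc filter.c $(xml2-config --cflags --libs)
 */
#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>
#include <libxml/tree.h>

int main(void)
{
    const char *hdr =
        "<header><filename>/usr/local/bin/foobar</filename></header>";

    /* Server side: parse the client's header into a DOM object once. */
    xmlDocPtr doc = xmlReadMemory(hdr, (int)strlen(hdr), "header.xml",
                                  NULL, 0);
    if (doc == NULL)
        return 1;

    /* A compression filter annotates the DOM instead of rewriting text. */
    xmlNodePtr root = xmlDocGetRootElement(doc);
    xmlNewChild(root, NULL, BAD_CAST "compression", BAD_CAST "gzip");

    /* Device side: convert to an XML document only when writing out. */
    xmlChar *out;
    int len;
    xmlDocDumpFormatMemory(doc, &out, &len, 1);
    fwrite(out, 1, (size_t)len, stdout);

    xmlFree(out);
    xmlFreeDoc(doc);
    xmlCleanupParser();
    return 0;
}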
From: James O'K. <jo...@mi...> - 2000-08-22 21:00:57
It's listed on the webpage, but I don't think we ever mentioned it here. There is an IRC channel that I hang out on, waiting for people to stop by. Connect to any of the Open Projects servers, such as irc.linux.com, and join #mbackup. I'll probably be there, but idle, so give me a chance to check my screen before you disappear.

-james
From: John H. <Jo...@mw...> - 2000-08-22 20:54:26
Right! I'll check out the tar source to figure out how to do it.

Regards

> I think there is a way to find those holes because gnu tar can be smart
> and encode those holes in such a way that they don't use up tape. Try
> this:
>
> [jokane@lal jokane]$ dd if=/dev/zero of=foo count=1 bs=1k seek=1000
> 1+0 records in
> 1+0 records out
> [jokane@lal jokane]$ ls -la foo
> -rw-r--r--   1 jokane   505   1025024 Aug 22 15:09 foo
> [jokane@lal jokane]$ du foo
> 8 foo
> [jokane@lal jokane]$ tar cf foo.tar --sparse foo
> [jokane@lal jokane]$ ls -la foo.tar
> -rw-r--r--   1 jokane   505   10240 Aug 22 15:10 foo.tar
> [jokane@lal jokane]$ du foo.tar
> 12 foo.tar
>
> cool right? :)
>
> -james
From: James O'K. <jo...@mi...> - 2000-08-22 19:13:20
I think there is a way to find those holes because gnu tar can be smart and encode those holes in such a way that they don't use up tape. Try this:

[jokane@lal jokane]$ dd if=/dev/zero of=foo count=1 bs=1k seek=1000
1+0 records in
1+0 records out
[jokane@lal jokane]$ ls -la foo
-rw-r--r--   1 jokane   505   1025024 Aug 22 15:09 foo
[jokane@lal jokane]$ du foo
8 foo
[jokane@lal jokane]$ tar cf foo.tar --sparse foo
[jokane@lal jokane]$ ls -la foo.tar
-rw-r--r--   1 jokane   505   10240 Aug 22 15:10 foo.tar
[jokane@lal jokane]$ du foo.tar
12 foo.tar

cool right? :)

-james
From: James O'K. <jo...@mi...> - 2000-08-22 19:08:54
The only thing I have against using xml as a tape format is the extra tape space that it would use. I wrote this quick example:

<?xml version="1.0" ?>
<mbackup:header>
<mbackup:date>966968837</mbackup:date>
<mbackup:filename>/usr/local/bin/foobar</mbackup:filename>
<mbackup:hostname>uhura.midnightlinux.com</mbackup:hostname>
</mbackup:header>

I'm not sure if that's true valid xml, but I think it's close enough for this example. I also have this more traditional header format:

96696883724uhura.midnightlinux.com21usr/local/bin/foobar

9 digit time, 2 digit next field length, hostname, 2 digit next field length, filename.

Both give the same information. The xml one is 215 bytes and the other one is 58 bytes.

I also gathered some data that is typical of the data we use at work:

[root@cadillac round5]# du -a|wc -l
306050
[root@cadillac round5]# du -s
4070060 .

306,050 files using about 4 gigs of space. With the xml header, there is 65M of header data to label 4gig of data. With the other header, there is 17M of header data to label 4gig of data.

Granted, this is a simple header and doesn't have all the data a full header might have, but I feel that the xml header will grow faster than the other header, even if we don't use the mbackup namespace part.

Just something to consider...

-james
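James's compact format pins down in a few lines of C. A sketch (pack_header is purely illustrative, not mbackup code); it also exposes the format's hidden limit: 2-digit length fields cap hostnames and filenames at 99 bytes, one reason a self-describing header keeps coming up in this thread.

/*
 * Sketch of James's compact header: 9-digit epoch time, 2-digit hostname
 * length, hostname, 2-digit filename length, filename. pack_header is
 * illustrative, not mbackup code.
 */
#include <stdio.h>
#include <string.h>
#include <time.h>

static int pack_header(char *out, size_t outsz, time_t when,
                       const char *host, const char *file)
{
    size_t hl = strlen(host), fl = strlen(file);
    if (hl > 99 || fl > 99)
        return -1;  /* won't fit in a 2-digit length field */
    return snprintf(out, outsz, "%09ld%02zu%s%02zu%s",
                    (long)when, hl, host, fl, file);
}

int main(void)
{
    char buf[256];
    int n = pack_header(buf, sizeof(buf), 966968837,
                        "uhura.midnightlinux.com", "usr/local/bin/foobar");
    printf("%d bytes: %s\n", n, buf);
    return 0;
}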
From: John H. <Jo...@mw...> - 2000-08-22 03:35:08
Hello all. I've been thinking about unix files and the fact that they can have a logical length which is more than the actual allocation. There does not seem to be any way of statting the size and position of such holes. Can anyone confirm or refute this?

Regards
John
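For what it's worth, stat(2) settles half of this question: it can reveal that a file is sparse, because the allocated blocks fall short of the logical size, but not where the holes sit; finding those means scanning for zero-filled blocks the way gnu tar's --sparse does. A minimal check, assuming Linux's 512-byte st_blocks units:

/*
 * Minimal sketch: stat(2) can tell you THAT a file is sparse (allocated
 * blocks < logical size) but not WHERE the holes are. st_blocks is
 * counted in 512-byte units on Linux.
 */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }

    long long allocated = (long long)st.st_blocks * 512;
    printf("logical %lld, allocated %lld -> %s\n",
           (long long)st.st_size, allocated,
           allocated < (long long)st.st_size ? "sparse" : "not sparse");
    return 0;
}

Run against the foo file from James's dd experiment above, this should report a logical size of 1025024 with only 4096 bytes allocated.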
From: James O'K. <jo...@mi...> - 2000-08-14 21:37:25
I should have mentioned this earlier, but I'm at LinuxWorld Conference this week. If anyone else is in the area (San Jose, CA) I'd be happy to meet for lunch and brainstorm ideas or explain things. Email me privately if you're interested. -james
From: Known H. N. R. <ni...@in...> - 2000-08-10 03:01:47
>
>
>> >Does anyone have experience with xml and unicode?
>>
>> I have what I would consider a good working knowledge of XML.
>>
>
>
>Excellent! you can render an informed opinion on the suitability of XML as a
>header format?!
xml would make a good header format or metadata format. it's
for formalized, tagged data; I've been sort of selling it
for this use in mbackup since the beginning.
as always,
nick
ni...@gr... * http://www.fargus.net/nick
Developer - Systems Engineer - Mad System Guru - MOO Sales
Keep on GRAWK'n!
From: John H. <Jo...@mw...> - 2000-08-10 01:30:19
> > Does anyone have experience with xml and unicode?
>
> I have what I would consider a good working knowledge of XML.

Excellent! You can render an informed opinion on the suitability of XML as a header format?!

Regards
From: Known H. N. R. <ni...@in...> - 2000-08-10 01:25:15
>Does anyone have experience with xml and unicode?
I have what I would consider a good working knowledge of XML.
as always,
nick
ni...@gr... * http://www.fargus.net/nick
Developer - Systems Engineer - Mad System Guru - MOO Sales
Keep on GRAWK'n!
From: John H. <Jo...@mw...> - 2000-08-10 01:00:57
James and I have been having an interesting thread, and he has asked me to summarise for public scrutiny.

A NASA tech's musings on the limitations of backup systems:
http://www.computer.org/conferences/meta96/kobler/paper.html

The IEEE's musings. Typical committee!
http://www.ieee-sssc.org/projects.html

A place that has links to tape specs. Note sidf and mtf:
http://www.tapeguy.com

Parallelism in tape systems (a must to implement):
http://www.backupcentral.com/parallel.html

================================================
On support for AIO for file access:
http://oss.sgi.com/projects/kaio/

From reading that, I get the impression that kaio and glibc aio are function call compatible; just a change of #include <aio.h> to <linux/aio.h> and a recompile is needed.

So here's my plan. I'll most likely start working aio into the disk writing module as I rewrite it anyway. However, I'll probably use the glibc aio. My reasoning is, I want to get a very high acceptance rate with as many people as I can. I want mbackup to have the proportional mind share in backups that apache has in webservers. :) If I tell people that they must recompile their kernel to use a backup program, it will be harder to get acceptance. That's why it's fortunate that the two implementations are function call compatible. When I get around to creating a configure script, we can add the option of compiling against kaio. This seems to be a win for everyone.

#ifdef USE_KAIO
#include <linux/aio.h>
#else
#include <aio.h>
#endif /* USE_KAIO */

==============================================
Extended information using tar format.

If you create a tar archive which has a file repeated, each occurrence will be restored. The effect is that the last file overwrites the earlier ones. Therefore, it's possible to put the file's extended stats into a file and follow it with the data file (same name). tar will restore the data on the last file. Thus it is backwards compatible. Smart programs (mbackup) spot the duplication and interpret the leading file(s) as containing extended information.

================================================
And an idea from me about tape formats.

The situation of having multiple tape drives and doing raid 0/1/5 across them, or multiple independent streams to independent tape drives, or a combination of the above, is more likely to be relevant. However it's all the same really, just different types of metadata to be recorded. HSM too, I guess. If we can be sufficiently flexible and expandable, there is no problem.

Even proposals like that straw man from NASA have glaring errors. Did you see his proposed date format?
a. In USA format, not ISO.
b. Fixed length field size.
c. (giggle) Not y2k compliant.
And this was 1996!

So, a modest proposal... all the stat information is an xml document. This meets the requirements of standardisation, flexibility and expandability. Elements like file length are just a string of ascii digits. Dates are in ISO standard text form, i.e. yyyy-mm-dd hh:mm:ss.ssss... File names can also be in unicode. We can store extra binary information as a BASE64 encoded mime type. This would be OS/2 EAs, MAC resource forks etc. At the start of the media we can specify extra elements to define device level interleaving or anything else.

Then we can lay down the tape in two ways: our own format, using the xml header directly, or tar, using that clever 2-file system. Within limits this could give backward compatibility, limiting the need to perform format conversion.

The objections are: too hard, too slow, too verbose (big, inefficient).

Too hard. A lot of work goes into xml these days. We just steal it: use the libxml and libunicode libraries. It's just a learning curve.

Too slow. I'd be surprised. It's CPU stuff, and CPUs are much faster than IO and are getting faster at an accelerating rate. It can't match something like software data compression for slowness.

Too verbose. Compared to what? In most cases a k or so; worst case, with OS/2 EAs, 100k or so. 40% expansion of base64 over binary. Just not worth worrying about.

The advantage is unlimited flexibility, and it's all printable!

Regards
John (Hisdad)
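For reference, the glibc <aio.h> call pattern the plan above standardizes on looks like this in use; a minimal sketch (the output filename is illustrative), which would switch to kaio with exactly the #ifdef shown in the message. Link with -lrt.

/*
 * Minimal sketch of the glibc POSIX AIO interface the message proposes
 * standardizing on. The output filename is illustrative. Link with -lrt.
 */
#include <aio.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("backup.out", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    static char buf[] = "one block of backup data\n";
    struct aiocb cb;
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = sizeof(buf) - 1;
    cb.aio_offset = 0;

    if (aio_write(&cb) != 0) {      /* queue the write, don't block */
        perror("aio_write");
        return 1;
    }

    /* Overlap other work here, then wait for the write to complete. */
    const struct aiocb *list[1] = { &cb };
    aio_suspend(list, 1, NULL);

    printf("wrote %zd bytes asynchronously\n", aio_return(&cb));
    close(fd);
    return 0;
}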
From: John H. <Jo...@mw...> - 2000-08-10 00:59:43
Does anyone have experience with xml and unicode? aio?

Regards
John (hisdad)
From: James O'K. <jo...@mi...> - 2000-08-08 05:05:02
Someone reminded me that I forgot to include the table layout in the tar ball. Try this:

#
# Table structure for table 'filelocation'
#
CREATE TABLE filelocation (
  filelocation_id int(32) NOT NULL auto_increment,
  filename varchar(255),
  checksum varchar(32),
  location varchar(255),
  PRIMARY KEY (filelocation_id)
);

#
# Table structure for table 'metadata'
#
CREATE TABLE metadata (
  metadata_id int(32) NOT NULL auto_increment,
  checksum varchar(32),
  filename varchar(255),
  filtered_name varchar(255),
  partnum int(16),
  lastpart int(16),
  current_size int(16),
  filelocation_id int(32),
  hostname varchar(255),
  st_dev int(64),
  st_ino int(32),
  st_mode int(32),
  st_nlink int(32),
  st_uid int(16),
  st_gid int(16),
  st_rdev int(64),
  st_size int(32),
  st_blksize int(32),
  st_blocks int(32),
  st_atime int(32),
  st_mtime int(32),
  st_ctime int(32),
  PRIMARY KEY (metadata_id),
  KEY metadata_index (checksum,filename)
);
From: James O'K. <jo...@mi...> - 2000-08-03 00:10:13
I forgot: also mention things you're interested in regarding mbackup.

-james
From: James O'K. <jo...@mi...> - 2000-08-03 00:09:06
If people want to send me their userids on sourceforge, I'll add you to the developer's list. (Then I can start assigning bugs to you. ;)

-james
From: James O'K. <jo...@mi...> - 2000-08-01 22:32:24
If you haven't gotten notice already, I've put 0.5 up on the server. This release includes several changes. (diff -uNr between 0.4 and 0.5 was over 2500 lines.) I'll try to summarize them here. I don't remember if I posted this, but I did a restore under a very controlled setup, so real restores are close.

* Added md5 checksums
* Rewrote plain_reader and plain_writer so they can share code
* Changed p_r and p_w to generate a list of files upfront instead of as needed
* Changed p_w so that it dynamically generates an output pathname: $outpath/$host/$time/foo
* Added the concept of a controller module that, when asked, tells you if we should backup or restore
* Completely broke network_reader and network_writer :)
* Wrote unfilter for bzip2_filter

Current TODO list (if anyone is interested, speak up and I'll give more details):

* Fix network_reader/network_writer
* Add pthreads. I'm most interested in working on this next.
* Write unfilter for tar_filter
* Clean up the configuration process (still thinking about what this means)
* Add exclude paths to plain_reader
* Abstract the writing code in plain_reader and plain_writer so they can share common code
* Get logging_module into a robust working state
* More as I think of them...

Some people might be interested in this:
http://www.cosource.com/cgi-bin/cos.pl/wish/info/348

-james
From: James O'K. <jo...@mi...> - 2000-07-17 16:12:56
I've just checked in some code that people might be interested in. I added a control module which makes decisions on whether we should do a backup or a restore. In the client version it's fairly basic, but it could be expanded to allow it to be a scheduler, or run as a client daemon so the server could push restores to it. Very cool if done right.

I also rewrote plain_reader.c. It now creates a linked list of all the files to be backed up as a first step. This list is better formatted for what the tar module is looking for, but I think it still needs some more tweaks. With this change, I'm going to share code between plain_reader.c and plain_writer.c since they are counterparts to each other. This will almost finish the work to allow restores.

I'm planning another release and announcement once that is done, so if you have any code or patches, send them to me soon.

-james
From: James O'K. <jo...@mi...> - 2000-07-11 06:07:27
I've started to become aware of a design flaw in the current setup and I need some brainstorming help for a fix. Things have worked fine until now, when everything has been flowing from disk, through filters, and out to disk. Now that I'm trying to integrate restore functionality, that simplicity quickly falls apart.

For example, plain_reader.c and plain_writer.c are basically opposites, so I've started to merge them into one filter. The problem comes in when I need to call read_file(): is this a read for backup or for restore? If it's a backup, then currently we read from the config value startpath. It gets harder when we do a restore, because there isn't (currently) a config value to say where things are stored. All of this could be solved with a rewrite to these modules.

The other problem is when there is a network involved. I'm hoping to have just one server program that does restores and backups. Under the current plan, the network module on the server would have to communicate to the main program that we are doing a restore, because that calls for a whole different set and order of filters.

A third problem is communication between any random module and the indexing module. Since this could be on a remote server, we would prefer to not have every module making their own connection to the server to ask questions.

Anyone have any ideas? I've been thinking about this for a few days.

-james
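One possible shape for the first problem, offered purely as a hypothetical sketch (none of these names exist in mbackup): make the direction an explicit field in a context handed to every module, so read_file() never has to guess from config values.

/*
 * Hypothetical sketch for the read_file() ambiguity: pass every module an
 * explicit operation context instead of letting it guess from config.
 * None of these names are mbackup's actual API.
 */
#include <stdio.h>

enum mb_op { MB_BACKUP, MB_RESTORE };

struct mb_context {
    enum mb_op  op;
    const char *startpath;  /* source tree for a backup */
    const char *storepath;  /* where backed-up data lives */
};

/* The same entry point serves both directions unambiguously. */
static const char *read_file_source(const struct mb_context *ctx)
{
    return ctx->op == MB_BACKUP ? ctx->startpath : ctx->storepath;
}

int main(void)
{
    struct mb_context backup  = { MB_BACKUP,  "/home", "/var/mbackup" };
    struct mb_context restore = { MB_RESTORE, "/home", "/var/mbackup" };

    printf("backup reads from %s\n",  read_file_source(&backup));
    printf("restore reads from %s\n", read_file_source(&restore));
    return 0;
}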
From: James O'K. <jo...@mi...> - 2000-07-06 17:42:27
On Thu, 6 Jul 2000 ra...@Te... wrote:

> Let's think of users home directories in a larger networked environment.
> Home dirs may be placed on different hosts and on different disks within
> a single host. As requirements change $HOMEs may be moved between disks
> or hosts.

A feature I was planning was the use of md5 checksums, which get indexed along with other metadata. My idea was to use this for backing up things like /usr, which would be largely the same across machines. This idea could be extended to your problem: if the md5 sums are the same, then we don't back up, we just make a reference to the previously backed up stuff. Some of the details are still fuzzy in my mind, but I hope to clear them up as things move closer to that point. I'm open to hearing other ideas that might work.

> Another point that directly affects the above mentioned idea (in case of
> a file system backup -- opposed to a data base backup) is:
>
> What is the file? Is it the data behind the inode or the file name? If
> it's the inode, what should be done in case the backup set has been
> move to a new location (or restored on a new disk)? And should the data
> be backuped when the file name or permissions/ownership have changed?

I've been treating it as a file, but I've been keeping the inode with it for later use. One way to index this would be to have two tables like so:

Metadata                   data_location
--------                   -------------
metadatarecordnum          datarecordnum
filename                   tapenum
stat() info                position_on_tape
data_location_pointer
checksum

This is just off the top of my head, but if you have two files that happen to be hardlinks to each other, in the metadata table you would have the parts that are different (the filename, the info from a stat() call) and a reference to the data_location table. The second file that shares that inode would have different information in the metadata table, but a pointer to the same data_location. One tricky part, however, is getting them to be hardlinks on the restore.

Perhaps we could also add a previous_version pointer that referred to the version that came before. With some semi-complex SQL queries we could follow the chain of pointers through all the versions of the file. It's not very simple, but the possibility to do it is there. Volunteers to help code are welcome. :)

-james
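In code terms, the hardlink case is two metadata rows sharing one data_location row. A hypothetical C sketch (field names follow the tables above; the values are made up):

/*
 * Hypothetical sketch of the two-table idea: two hardlinked names share a
 * single data_location entry. Field names follow the tables above; the
 * values are made up.
 */
#include <stdio.h>

struct data_location {
    int  datarecordnum;
    int  tapenum;
    long position_on_tape;
};

struct metadata {
    int metadatarecordnum;
    const char *filename;
    long st_ino;                    /* kept so a restore can relink */
    struct data_location *location; /* shared between hardlinks */
    const char *checksum;
};

int main(void)
{
    struct data_location loc = { 1, 3, 102400 };
    struct metadata a = { 1, "/etc/passwd",     4242, &loc, "same-md5" };
    struct metadata b = { 2, "/backup/pw.link", 4242, &loc, "same-md5" };

    /* Same inode, same data_location: the data goes to tape once. */
    printf("%s and %s share tape %d, offset %ld\n",
           a.filename, b.filename,
           a.location->tapenum, b.location->position_on_tape);
    return 0;
}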
From: <mar...@ce...> - 2000-07-06 13:57:27
Hi, I saw your project on freshmeat.net, and from what I can read about it, it looks very promising. I also noted you are currently just developing it for the Linux platform, and am writing to ask if there is any interest in porting it to OpenBSD?

In order to help the opensource community I have built a "porting box", and I am now willing to hand out accounts to use it completely free of charge. If this sounds interesting, please take a look at http://ports.centus.com and see for yourself. It may not be the best looking site, but looks are not my aim. Also, don't let the legal mumbojumbo intimidate you or your co-developers, but do read it, since its main purpose is to stop abuse.

cheers.
Martin