From: Dragan K. <dk...@ly...> - 2003-09-14 21:14:47
|>> However, many administrators choose to have
|>> inferior filesystems, such as ext2, only
|>> because they won't consider using another
|>> backup utility
|>
|> ext3 is supported too, and can be compared in
|> terms of speed and reliability to reiserfs
|
| I'm no filesystem expert, but I think ext2/3
| is considered by many to be inferior to ReiserFS.
| So I avoided mentioning/implying any statement
| about filesystem efficiency by writing this instead:
|
| However, many administrators only choose among
| filesystems supported by dump, because they won't
| consider using another backup utility.

It is a valid point, Antonios. The old ext2/3 is in many respects inferior to reiserfs. Just about the only thing in favour of ext[23] is the existence of dump/restore, faithfully maintained by Mr. Pop. As you already emphasized, it is the only backup utility which doesn't fsck the inodes one way or another. Significant as that may be for you, and no doubt it is a very fine distinction, I would stress a couple of other advantages even more. dump also doesn't require the fs to be mounted in order to take stock of it. I liked the concept so much that I wrote a better dump for two other filesystems, HP's HFS (something like ext2) and VxFS (Veritas's extent-based fs), because it allowed me to offload data even from corrupted, unmountable volumes before fsck did irreparable damage while pretending to repair it.

A year or so ago I was in a dilemma over which fs to adopt for my production samba system, and I ran quite a number of tests to give the decision an objective basis. The performance advantages of reiserfs 3.6 were so overwhelming that I opted against ext3. I'll explain my tests and the resulting motivation in more detail.

I striped 4 IDE disks of 120 GB each using LVM. ext3 showed 454,035,480 kB formatted, but reiserfs showed 468,699,164 kB, almost 15 GB more. The respective mkfs times were 7 minutes for ext3, during which the disk meter showed furious writing activity at 45-50 MB/s; mkreiserfs took less than a minute, at nowhere near that level of disk activity.

Storing the same batch of 26,477 files in 4,105 dirs took 26,643,224 kB on ext3; on reiserfs it was about 15 MB less, 26,627,729 kB. Among the files there were 12 big tarballs, about 22.6 GB in all, containing partial backups of a dozen PCs: altogether 209,902 files in 14,310 directories with an average file size of 110 kB, although at this stage they were still just big files, about 2 GB each. The rest of the files were a typical mix of profile and home shares, 156 kB on average.

Using an LTO-2 tape drive, dump ran at 33.5 MB/s, while tar on reiserfs did a bit better, 39.5 MB/s. It must be noted that prior to writing to the tape, dump needed close to 5 more minutes to visit all the inodes and bitmaps, during which it registered a constant 105 to 116 MB/s on the disk meter. I did not count these 5 minutes fully in the speed computation, because that pass takes the same time whether the disk is full or, as here, only 6 % full, so I added just 6 % of that time to adjust the figure output by dump in a fair way.

Since the tape drive is obviously the limiting factor, I redid the measurements with /dev/null as the target for dump, and for tar a filter pipe which only reads the data and counts the bytes. I learned from Stelian that tar cheats when /dev/null is the target, so I wrote a filter program to keep it from cheating. Dumping to /dev/null went faster, at 44.7 MB/s, but nowhere near as fast as tar on reiserfs: 130.1 MB/s through the said read-and-discard filter.
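Such a read-and-discard filter is only a few lines of C. Here is a minimal sketch of the idea (not the actual program used for the measurements above): it consumes stdin to EOF, so tar really has to produce every byte, and reports the total on stderr.

    /* countbytes.c - read stdin to EOF, discard the data, report the
     * total. Illustrative sketch of a read-and-discard filter. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[1 << 16];            /* 64 kB read buffer */
        unsigned long long total = 0;
        ssize_t n;

        while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0)
            total += (unsigned long long)n;
        if (n < 0) {
            perror("read");
            return 1;
        }
        fprintf(stderr, "%llu bytes read\n", total);
        return 0;
    }

Timing a pipeline like "time tar cf - /some/dir | ./countbytes" then gives an honest throughput figure, because the output cannot be optimized away.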
There are more advantages of reiserfs over ext3 apart from backup speed with dump and tar respectively. I unpacked those big tarballs on both file systems and timed the action: ext3 went at 20.3 MB/s, but reiserfs did it considerably faster, at 32.2 MB/s. Deleting those tarballs took ext3 27.7 s; reiserfs made it in 15.1 s. In addition, the volume occupancy rose to 27,125,944 kB on ext3, while on reiserfs the increase was 68.6 MB smaller, ending at 27,041,864 kB.

With all those small files around, both utilities were slower, but differently so: dump could now manage 31.9 MB/s, while tar on reiserfs still got a decent 39.2 MB/s. Outputting the same data to /dev/null, dump was surprisingly much faster than before and whooshed along at 57 MB/s, whereas reiserfs's agility was reduced to 92 MB/s, again through that byte-eating filter. Deleting all those files took ext3 a full 291.8 s, whereas the same action on reiserfs finished in 27.4 s.

Quite a few other operations are an order of magnitude faster on reiserfs than on ext3, especially in the case of pathologically big directories with hundreds of thousands or millions of small files (which was not the case here; timings are in seconds):

     ext3   reiser   operation
    -----   ------   -------------------------------
    200.0      9.3   du -sk .
    203.8     13.9   find . | wc
    223.0     38.2   getfacl -R . > /dev/null
    304.0     20.6   quotacheck -vug /dev/vg01/lvol1

To eliminate caching bias I always made these timings after a fresh mount. Once performed, each of these operations is significantly faster the second time around, but even then the ratio remains: ext3 gets these things done in about 10 seconds on the second run (except quotacheck, which takes considerably longer), but reiser does it in 1 second flat. During many of these operations the disk meter shows very high disk usage for ext3 (over 100 MB/s) but next to insignificant disk activity for reiserfs (up to 4-5 MB/s in bursts).

Let's not forget the most important feature of every backup: the ability to restore data fast but faithfully. The read-after-write head on most modern streamers is fine and dandy, but it does not guarantee that the tape won't be scratched or creased by the rail guides just an inch down the line, as once happened to me with a vertically mounted DDS3 drive. You guessed it: the only time something like that happened and I hadn't done the test restore was when I actually needed it. I got a report from the manufacturer a few weeks later as to what had happened, but I had to restore the system immediately, so I excised a chunk of some 280 MB from the tape by splicing in a stream of null bytes. Fortunately, what was missing was part of an Oracle database whose data were also automatically offloaded with exp and could easily be reconstructed, because the .dmp file was, sanely, elsewhere on the tape. Only regular test restores can give you pointers as to the sanity of the whole procedure, and with the price of disks below $1/GB there is no excuse for not doing them on a regular basis.

So how do dump and tar compare on restore? They're both safe, thanks for asking. The dump of the big tarballs restored at 32.1 MB/s (sorry, I missed measuring the restore speed with the small files). Restoring the tar backup went at 37.6 MB/s with big files and 37.8 MB/s with many small ones.

In conclusion, I think it is time to move on. The good old ext[23] is pretty stable, which means it won't develop in any significant way. On the other hand, reiserfs is evolving in many interesting ways, promising even better performance and more features than any other fs in use.
For example, there will be a reiser-tapefs plug-in, which will reorganize the metadata in such a way that a tape can be mounted as a read-only file system, so that any ordinary file manager can be used for restore. It looks very much like dump in that it pulls all the directories ahead of the files, but it only backs up used space, unlike dump, which also ritually dumps the used/free inode bitmaps twice, full inode blocks, and 1 kB header blocks, most of them empty, for each directory, each file and every half-MB chunk of big files. The interesting part is that this plug-in will act like a cleaner/defragmenter upon full restore, optimizing the placement of parts for even faster access. In hindsight my choice of reiserfs was right even without the tapefs plug-in, because it allowed me to leverage an fs-neutral utility to back up not only the linux/samba server but also some twenty PCs with valuable data, all in the same format.

A little off-topic (sorry Stelian), but still to the point of backups, I'll describe how I back up my systems in more detail. I recognized early that having a tape drive directly attached to the server introduces complexity which is undesirable if the server is to remain on-line 24x7 (remember the guy who had to shut down a production system in order to reset the tape). Another linux box, acting as a standby and backup server, simplified things quite a bit. Their Gb NICs are connected by a crossed TP cable. I couldn't get quite the same transfer rates to tape, and I also noticed that the LTO-2 drive was doing a lot of shoe-shining when fed off the remote tar. The solution was a staged backup: let tar pump out a tarball, and transfer it to the tape later.

On the 1st of each month I tar the server (230 GB), index the tarball and put it on the tape, then tar all those smbmount'd PC shares (360 GB), index them and put them on the tape. After that I restore the server tar onto the staging disks, so it is ready should anything happen to the server's RAID. There is space enough to accumulate daily increments of the server and all the PC shares. The server's incremental tar is unpacked daily onto the standby copy, which therefore grows with time, because I don't trim deletions. Once a week I index the tar increments and put them on tape too.

You might wonder what I mean by "indexing". Well, the other great advantage of dump is that it backs up all of the directories ahead of the data files. This makes it easy to reconstruct the hierarchical fs tree, which lets the admin navigate through the off-line fs much as though it were on-line. It takes the guesswork out of the precise spelling of a path: you can easily verify whether a file is there at all by reading only a few MB into the tape, instead of wading through all the darned GBs only to find that either the file is not there or you got the case of a letter wrong or something like that. Now, I could live without being able to back up an unmountable volume, grudgingly I must say, but doing without an easy, user-friendly interface to the contents was more than I could swallow. So I made an indexing utility which parses all the staged tarballs, extracts the header info (path, size, fmode, mtime, owner, group and position on tape) and puts it into a doubly linked list which my vxfs restore can understand, to deliver the data on demand either from the staged tarballs or from tape. Since the program reads on-line files, it skips over the data from header to header with lseek(), so it takes only minutes to index hundreds of GBs.
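The header-to-header hop is cheap because the tar format makes the distance to the next header computable. Here is a minimal sketch of the technique (not the indexer described above; field offsets per the POSIX ustar format, GNU extensions such as long-name entries ignored for brevity):

    /* tarwalk.c - index a tar file by hopping from header to header
     * with lseek(), never reading the file data itself. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLK 512   /* tar block size */

    int main(int argc, char **argv)
    {
        unsigned char hdr[BLK];
        off_t pos = 0;
        int fd;

        if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
            perror("open");
            return 1;
        }
        /* the archive ends with zero blocks, hence the hdr[0] test */
        while (read(fd, hdr, BLK) == BLK && hdr[0] != '\0') {
            /* size field: 12 octal ASCII digits at offset 124 */
            unsigned long long size = strtoull((char *)hdr + 124, NULL, 8);
            printf("%12llu %.100s (data at %lld)\n",
                   size, (char *)hdr, (long long)(pos + BLK));
            /* skip the file data, rounded up to a whole block */
            pos = lseek(fd, pos + BLK + ((size + BLK - 1) / BLK) * BLK,
                        SEEK_SET);
        }
        close(fd);
        return 0;
    }

Since only one 512-byte block per archive member is ever read, indexing speed is limited by seek time rather than by the amount of data.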
At first I used the dump 64-bit format for this, but I realized how wasteful it was and made it more rational: for example, the info on all of the PC files (360 GB, over a million files and directories) is squeezed into less than 50 MB (12 MB after gzip). A whole month's worth of backup indices takes about 30 MB, so I can keep many tapes' worth of them without significantly wasting space, and can then easily search for versions of files even without mounting the tapes.

This TOC file is called "...index"; it is always the first file on the tape, and the rest of the tarballs are laid out in the alphabetical order in which they were also indexed. Upon reading this file, the restore utility presents the following ncurses-based user interface (you need a fixed-width font to see it properly):

|  Size in kBs  Dirs              Size(kB)  Paths
|  -----------  -------------    ---------  -----------------------
|       62,458  /01.c.030710  ||     5,661  /Documents and Settings
|       66,569  /01.c.030711  ||     2,594  /My Files
|        3,917  /01.c.030714  ||         0  /ProSim
|       13,468  /01.c.030716  ||         2  /Prg
|       12,605  /01.c.030717  ||     2,544  /SRC
|      237,167  /01.c.030719  || 1,858,356  /SYS_SAV
|    1,301,366  /01.d.030710  ||       245  /TEMP
|      303,833  /01.d.030711  ||    12,468  /WINNT
|       87,352  /01.d.030714  |
|       94,380  /01.d.030716  |
| -->1,881,874  /02.c.030711<---
|       34,208  /02.c.030714  |
|   .................. etc ..................
|    4,010,457  /90.s.030717  |
|    3,054,117  /90.s.030719  |
|
|  select:   ?help   Restore   Quit

Note: the "-->" and "<---" are actually placeholders for the inverse-video display of the current line. Basically, each day's increment of each host is a root directory in this tape repository. You can navigate the backed-up namespace with the customary hot-keys of a Windows file browser (arrows left/right/up/down, page up/down, home, end, tab, space, etc.) as well as their vi equivalents (h, j, k, l, etc.) to obtain something like this (again, a fixed-width font is needed if you don't want to see garbled text):

|  Size in kBs  Dirs                        Size(kB)  Paths
|  -----------  --------------------------  --------  -------------
|       62,458  /01.c.030710             ||        0  *2EEB3D0A.V01
|       66,569  /01.c.030711              |
|        3,917  /01.c.030714              |
|       13,468  /01.c.030716              |
|       94,380  /01.d.030716              |
|    1,881,874  /02.c.030711              |
|        5,661  | /Documents and Settings |
|            1  | | /All Users            |
|            0  | | | /Application Data   |
|            0  | | | | /Microsoft        |
|            0  | | | | | /Windows NT     |
|            0  | | | | | | /NTBackup     |
|  --------->0  | | | | | | | #catalogs51<--
|            1  | | | /Startmenue         |
|        5,659  | | /RonBrowne.QNO1       |
|        2,594  | /My Files               |
|            0  | /ProSim                 |
|            2  | /Prg                    |
|        2,544  | /SRC                    |
|    1,858,356  | /SYS_SAV                |
|          245  | /TEMP                   |
|       12,468  | /WINNT                  |
|       34,208  /02.c.030714              |
|   ............ etc ..............       |
|
|  select:   ?help   Restore   Quit

Selecting an item by pressing <Enter> on it, as shown above, changes its prefix (a file's blank prefix turns to *, a dir's slash becomes a #, recursively of course) and, if told to restore, the utility just picks it up from wherever it sits on the tape within a minute or so, or instantly if it's still in a staged tarball. Since all the attributes are in "...index", it doesn't waste time stopping at the constituent directory entries; they are mkdir'd ad hoc. It makes a positive impact on the users when every now and then they manage to delete something they need. And it's salutary for me too.
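As a footnote on the index format: the thread does not show the actual record layout, but an entry carrying the fields described above (path, size, fmode, mtime, owner, group, tape position) might look like the following hypothetical sketch. Fixed-width, byte-order-stable integers are what make such a file portable across machines, which is where the endianness discussion below comes in.

    /* Hypothetical on-disk record for one "...index" entry; the field
     * list comes from the description above, the layout is invented.
     * In a real format the fields would be written one by one, in a
     * fixed byte order, rather than as a raw struct, to avoid
     * compiler-dependent padding. */
    #include <stdint.h>

    struct index_rec {
        uint64_t tape_pos;   /* byte offset of the file data on tape   */
        uint64_t size;       /* file size in bytes                     */
        uint32_t fmode;      /* type and permission bits               */
        uint32_t mtime;      /* modification time, seconds since epoch */
        uint32_t uid, gid;   /* owner and group                        */
        uint16_t path_len;   /* length of the path that follows        */
        /* path_len bytes of path follow, then the next record */
    };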
From: Dragan K. <dk...@ly...> - 2003-11-29 20:02:27
Hi Stelian,

in a previous posting you mentioned that dump uses the host endianness, meaning mostly little-endian, because there are few big-endian processors today. I find that a bit awkward, and it breaks compatibility between platforms of different endianness.

Is there a way to bridge the gap? I tried reading a Linux dump on an HP system; it didn't work. I tried HP dump and vxdump on Linux; that didn't work either. Of course, your dump/restore sources can at least be built on the target platform, but I guess one would still have to fix a lot of binary reads to reverse the endianness on a big-endian system.

Was it ever a problem? Is there a provision for dealing with endianness in dump?

Cheers
Dragan

PS: It's not an academic issue that bothers me. I'm about to release a reiserfsdump/restore, and I used the ntohX family of routines to keep the format big-endian even though I developed it on an Intel Linux box. It was much easier to debug when the bytes are ordered most-significant-first (left to right). Now I would like to make it compatible with your dump and am scratching my head over how to do it. Any ideas?
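For readers unfamiliar with the trick: routing every multi-byte integer through htonl()/htons() on the way out and ntohl()/ntohs() on the way in pins the on-tape byte order to big-endian ("network order") regardless of the host. A minimal sketch; the header struct is invented for illustration, not the actual reiserfsdump format:

    /* Endian-stable serialization with the ntohX/htonX family. */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <unistd.h>

    struct hdr { uint32_t magic, nblocks, mtime; };  /* hypothetical */

    ssize_t write_hdr(int fd, const struct hdr *h)
    {
        struct hdr out;                 /* big-endian on any host */
        out.magic   = htonl(h->magic);
        out.nblocks = htonl(h->nblocks);
        out.mtime   = htonl(h->mtime);
        return write(fd, &out, sizeof out);
    }

    ssize_t read_hdr(int fd, struct hdr *h)
    {
        struct hdr in;
        ssize_t n = read(fd, &in, sizeof in);
        if (n == (ssize_t)sizeof in) {  /* back to host order */
            h->magic   = ntohl(in.magic);
            h->nblocks = ntohl(in.nblocks);
            h->mtime   = ntohl(in.mtime);
        }
        return n;
    }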
From: Stelian P. <st...@po...> - 2003-11-29 22:42:20
On Sat, Nov 29, 2003 at 08:55:21PM +0100, Dragan Krnic wrote:

> Hi Stelian,
>
> in a previous posting you mentioned that dump uses the
> host endianness, meaning mostly little-endian, because
> there are few big-endian processors today.

I don't recall having said that.

> I find that a bit awkward, and it breaks compatibility
> between platforms of different endianness.
>
> Is there a way to bridge the gap?

In fact dump writes the data in the platform endianness, and there is code in restore which tries to detect whether the machine on which the restore is done has a different endianness; if that is the case, it does the conversions. This applies to dump's own metadata (the dump header etc.).

Wrt the filesystem data, ext2/ext3 metadata is little-endian no matter what the host endianness is, so there is nothing to be done. This is often the case for filesystems, because they are designed to be read on any machine, big-endian or little-endian.

Stelian.

--
Stelian Pop <st...@po...>
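The usual way to implement such detection is to key it on a known magic number in the header: if the magic only matches after byte-swapping, every header field must be swapped the same way. A simplified sketch of the technique (the magic value is made up for illustration; this is not the actual restore source):

    /* Byte-order detection via a known header magic number. */
    #include <stdint.h>

    #define DUMP_MAGIC 0x19540119u   /* made-up value for illustration */

    static uint32_t swap32(uint32_t x)
    {
        return (x >> 24) | ((x >> 8) & 0xff00) |
               ((x << 8) & 0xff0000) | (x << 24);
    }

    /* Returns 0 = native order, 1 = foreign order (swap every field),
     * -1 = not a recognized header at all. */
    int detect_order(uint32_t magic_as_read)
    {
        if (magic_as_read == DUMP_MAGIC)
            return 0;
        if (swap32(magic_as_read) == DUMP_MAGIC)
            return 1;
        return -1;
    }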
From: Dragan K. <dk...@ly...> - 2003-11-30 10:05:56
>> in a previous posting you mentioned that dump uses the
>> host endianness...
>
> I don't recall having said that.
> ...
> In fact dump writes the data in the platform endianness

You just said it again :-)

> Wrt the filesystem data, ext2/ext3 metadata is little-endian
> no matter what the host endianness is, so there is nothing
> to be done.

Don't be so pessimistic. There's always a way out.

What about xfsdump? Of course, one can use xfsrestore to recover an xfsdump onto ext2/ext3, but is its format compatible with your dump's format?

Cheers
Dragan
From: Stelian P. <st...@po...> - 2003-11-30 11:54:28
On Sun, Nov 30, 2003 at 11:05:35AM +0100, Dragan Krnic wrote:

> >> in a previous posting you mentioned that dump uses the
> >> host endianness...
> >
> > I don't recall having said that.
> > ...
> > In fact dump writes the data in the platform endianness
>
> You just said it again :-)

Sorry, I misread you. I thought you were implying that I had said dump is in little-endian format because most of today's platforms are, and that we don't care about the big-endian ones.

> > Wrt the filesystem data, ext2/ext3 metadata is little-endian
> > no matter what the host endianness is, so there is nothing
> > to be done.
>
> Don't be so pessimistic. There's always a way out.

I am not pessimistic. "Nothing to be done" means "no effort is needed" here.

> What about xfsdump? Of course, one can use xfsrestore to
> recover an xfsdump onto ext2/ext3, but is its format
> compatible with your dump's format?

I have no idea; in fact I didn't even know that XFS had its own dump/restore tools :)

Stelian.

--
Stelian Pop <st...@po...>
From: Dragan K. <dk...@ly...> - 2003-11-30 15:13:49
>>> Wrt the filesystem data, ext2/ext3 metadata is little-endian
>>> no matter what the host endianness is, so there is nothing
>>> to be done.
>>
>> Don't be so pessimistic. There's always a way out.
>
> I am not pessimistic. "Nothing to be done" means
> "no effort is needed" here.

I misread you on this one.

Some other file systems, HP's HFS and Sun's UFS, are very similar to ext2. They differ mostly in name and magic number, and in where the superblock starts. I compiled my hfsdump under Solaris and it worked with few changes, but of course the output was unusable on the HP side and vice versa, because one platform is big-endian and the other little-endian.

Thanks for the replies.

Cheers
Dragan
From: Stelian P. <st...@po...> - 2003-11-30 21:27:36
On Sun, Nov 30, 2003 at 04:13:28PM +0100, Dragan Krnic wrote:

> >>> Wrt the filesystem data, ext2/ext3 metadata is little-endian
> >>> no matter what the host endianness is, so there is nothing
> >>> to be done.
> >>
> >> Don't be so pessimistic. There's always a way out.
> >
> > I am not pessimistic. "Nothing to be done" means
> > "no effort is needed" here.
>
> I misread you on this one.

:)

> Some other file systems, HP's HFS and Sun's UFS, are very
> similar to ext2. They differ mostly in name and magic number,
> and in where the superblock starts.

Yeah, this must be because all those filesystems are derived from the original UNIX one. However, I'd say there is a big possibility that those filesystems have since evolved in separate directions, implementing incompatible extensions...

> I compiled my hfsdump under Solaris and it worked with few
> changes, but of course the output was unusable on the HP side
> and vice versa, because one platform is big-endian and the
> other little-endian.

Yup, this is because HP and Sun each make both the hardware and the software. I imagine it must have been fun when they ported Solaris to Intel hardware. :)

Stelian.

--
Stelian Pop <st...@po...>
From: Kenneth P. <sh...@se...> - 2003-12-01 01:16:28
--On Sunday, November 30, 2003 10:25 PM +0100 Stelian Pop <st...@po...> wrote:

> I imagine it must have been fun when they ported Solaris to
> Intel hardware. :)

I had a Sun 386i running SunOS 4. You should have seen all the ifdefs in the kernel headers to deal with the fact that not only was the endianness different, but bit fields were assigned by the 386 compiler in the opposite direction from SPARC. (All the hardware was dealt with as structs of bit fields.)
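For anyone who has not run into this: C leaves bit-field allocation order implementation-defined, so a device register described as a struct of bit fields has to be declared once per allocation direction. A sketch of the kind of ifdef Kenneth means; the register layout is invented, and glibc's <endian.h> macros stand in for whatever platform test SunOS actually used (bit-field order conventionally follows byte order, though no standard guarantees it):

    /* Double declaration of a hardware register as bit fields.
     * Big-endian targets such as SPARC traditionally allocate
     * bit fields from the most significant bit down, the 386
     * compiler from the least significant bit up. */
    #include <endian.h>
    #include <stdint.h>

    struct status_reg {
    #if __BYTE_ORDER == __BIG_ENDIAN     /* e.g. SPARC */
        uint8_t ready : 1;               /* bit 7 */
        uint8_t error : 1;               /* bit 6 */
        uint8_t count : 6;               /* bits 5..0 */
    #else                                /* e.g. i386 */
        uint8_t count : 6;               /* bits 5..0 */
        uint8_t error : 1;               /* bit 6 */
        uint8_t ready : 1;               /* bit 7 */
    #endif
    };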
From: Stelian P. <st...@po...> - 2003-09-15 09:47:00
On Sun, Sep 14, 2003 at 10:13:59PM +0100, Dragan Krnic wrote:

> It is a valid point, Antonios. The old ext2/3 is
> in many respects inferior to reiserfs.
> [...]

Comparisons between reiserfs and ext3 performance are off-topic on this list.

Dragan, I've already told you in the past that your tests were not very "professional". You compare file system creation times (which are irrelevant, since you create a file system only once) and file system sizes (without saying which options you used), you don't distinguish between the 3 ext3 mount modes, etc. I pointed you towards the ext2/3 and linux-kernel mailing lists, where people have done real tests of those two filesystems, and the results were more or less equivalent. If you want to discuss your personal performance measurements, please do it there.

> In conclusion, I think it is time to move on.
> The good old ext[23] is pretty stable, which means
> it won't develop in any significant way.

ext3 is still in active development, both feature-wise and performance-wise.

> On the other hand, reiserfs is evolving in many
> interesting ways, promising even better performance
> and more features than any other fs in use. For
> example, there will be a reiser-tapefs plug-in, [...]

Interesting. That's IMHO the advantage of reiserfs: they are exploring different algorithms, new ideas, etc. But unfortunately they often tend to forget stability... I have never had a single problem with ext3, but the reiserfs history is filled with security, migration, data-loss and hash-collision problems. Yes, I am biased, but it has happened to me in real life.

Stelian.

--
Stelian Pop <st...@po...>