From: Steve S. <ste...@ho...> - 2003-05-11 21:11:47
|
>Ok, here is my take on recursive COW files and in addation a V3 >cow header layout is included. I'm happy to see the nested COW issue being addressed here. I had discussed it with Jeff, and he scared me off doing it myself by describing an (elegant) recursive driver design that he wanted to see. I wasn't convinced drivers calling drivers calling drivers... was efficient** (I think you've coded is an iterative design, btw, not a recursive one like your comments indicate). I was also the person interested in the ISAM format COW for non-sparse devices. Jeff and I discussed that, too, and agreed that a "format" field (0 == conventional bitmap index, 1 == ISAM, etc. ) in the COW header would suffice for now, and ISAM could be integrated later. To make that port easier, I envisioned a dev->ops like function lookup table that would point to the proper routines based on the format of the COW. It might even contain the functions to process the different version headers, like so: if((vops = get_COW_version_ops(workspace) == NULL) { ERROR("Unknown COW file version"); return NULL; } cow_name = vops->get_backing_name(workspace); vops->fill_devinfo(&tmp,workspace); ... Just a thought. >Now this is just a test version, I have a pretty good idea how to >hook it into the the ubd driver but I wanted to ask for comments >first, this version has lots of debugging to check for my map >allocation and which file is being read. 1) I'm not convinced this is proper: #include <sys/param.h> /* MAXPATHLEN */ [...] #define PATH_LEN_V1 256 #define PATH_LEN_V2 MAXPATHLEN While 256 may have been too short(?), MAXPATHLEN is ambigious, and possibly variable, depending on system, leading to more porting headaches. Pick an appropriate length, or define the length as a variable in the header. 2) if (memcmp(&workspace.v3.magic, &magic, sizeof(magic)) == 0) { if (ntohl(workspace.v3.version) == 3) { version = ntohl(workspace.v3.version); cow_name = workspace.v3.backing_file; tmp.data_length = ntohll(workspace.v3.data_length); tmp.data_offset = ntohll(workspace.v3.data_offset); tmp.mtime = ntohll(workspace.v3.mtime); tmp.cow_offset = ntohll(workspace.v3.cow_offset); tmp.cow_length = ntohll(workspace.v3.cow_length); tmp.data_offset = ntohll(workspace.v3.data_offset); tmp.data_length = ntohll(workspace.v3.data_length); The above two lines are typos (duplicates), I think? tmp.cylinders = ntohl(workspace.v3.cylinders); tmp.heads = ntohl(workspace.v3.heads); tmp.sectors = ntohl(workspace.v3.sectors); tmp.sector_size = ntohl(workspace.v3.sector_size); } } 3) #define COW_SPARSE 1 What is this for? Future ISAM support? There is only one instance of this define being addressed in the code, which leads me to: 4) The comments in the code mostly state the obvious, like: /* write out to the data section */ [...] tmp = pwrite64(dev->fd, buf, dev->sector_size, dev->data_offset + work); While extremely odd code segments, like this: if ((dev->flags & COW_SPARSE) && (memcmp(buf, zero,dev->sector_size) == 0)) { tmp = read_COW(dev, buf, 1, offset); if (tmp == 1) { if (memcmp(buf, zero, dev->sector_size) == 0) { goto one_done; } DEBUG("zero %s fd%d buf %p[%d]:%d\n", dev->name, dev->fd, zero, dev->sector_size, dev->data_offset + work); tmp = pwrite64(dev->fd, zero, dev->sector_size, dev->data_offset + work); if (tmp < dev->sector_size) { ERROR("write zero %d:buf[%p]+%xl:%d", dev->fd, zero, dev->data_offset + work, dev->sector_size); return result; } goto one_done; } } Are unfathomable to me and go completely unexplained. Why is this here? What special case is this code *supposed* to take care of. Why are blocks of zeros an issue? How CPU expensive is doing the memcmp on the blocks - not only once, but *twice*? What are the implications of the block *not* comparing to a bunch of zeros? I am glad someone's hacking at this, I would love to see nested COW files. Steve Schmidtke ** An easy optimization for nested COW files with James' iterative implementation, as opposed to Jeff's recursive proposal, would be to create a bitmap union of all the nested COW bitmaps. If a bit for a block is set in the union bitmap, then each of the COWs must be searched for that block, otherwise (and the most likely case) the block exists in the backing file. Another optimization would make the union bitmap an index array, such that a block entry in the index array also identifies which COW entry contains that block (with an index of 0 defining a backing file block) - no sequential searching or bit shifting required, at the cost of 1 byte per block for the first 255 nested COW files. _________________________________________________________________ Add photos to your messages with MSN 8. Get 2 months FREE*. http://join.msn.com/?page=features/featuredemail |