From: Phillip L. <ph...@lo...> - 2006-10-22 03:14:37
On 21 Oct 2006, at 08:07, hoo...@ya... wrote:

> Hello all,
>
> I have just started reading squashfs3.1
> (ftp.slax.org/SLAX-6.x/testing/technology-preview/devel/official-squashfs3.1.patch),
> and got some questions. I am unsure whether they are actual bugs or
> not. Won't you tell me the truth?

No problem doing that :-)

> o hardlinked inode
> Usually, in traditional unix filesystems, when a file is hardlinked,
> all files share a single inode object.
> In squashfs, the inode (at least in memory) is generated for each. Of
> course, the link count and inode number are correct. So it does no harm
> to users, but a stackable filesystem.
> If the inode is shared among the hardlinked files, it will make a stack
> filesystem (unionfs or aufs) happy. Since they are assuming the inode is
> shared as usual.

Hmm, I see your point. The Squashfs behaviour in 3.1 is a hang-over from Squashfs 2.x and earlier, when the inode numbers given to the VFS were not 100% guaranteed to be unique.

In Squashfs 2.x and earlier no real inode numbers were stored, and the inode number passed to the VFS was computed from the 48-bit disk location of the inode. Because VFS inode numbers were 32 bits (and still are on 32-bit systems), this required compacting the 48-bit disk location into a unique 32-bit number. The strategy used exploited the redundancy in the 48-bit disk location, but there was always the possibility that two different inodes would get the same inode number. Doing an iget() lookup would cause problems in this case. Using iget5() was a possibility, but doing new_inode() worked as far as users were concerned, because any duplicate inodes so created were identical and did not cause problems.

Squashfs 3.0 and later stores real inode numbers, which makes iget() lookup possible. Updating Squashfs to use iget() was something I intended to do for Squashfs 3.0 but didn't get the time to do. I instead chose to do this when I added NFS support, as it is needed for NFS.
I didn't consider the issue with regard to hard-links (only supported in Squashfs 3.0 and later) and stackable filesystems. The next version of Squashfs has NFS support, and the work to use iget() has been done. It should be released very soon, so this issue should go away.

> o release_meta_index() and SMP
> While this simple assignment to unlock is a good idea, it is
> doubtful to support SMP. How about appending smp_mb() after the
> assignment, or changing the type of meta->locked from 'unsigned short'
> to 'atomic_t'?

Correct. This could cause problems on SMP systems.

> o block_cache_mutex in squashfs_get_cached_block()
> block_cache_mutex is locked just AFTER testing
> msblk->block_cache[0...CACHED_BLKS].block. Should the lock be held just
> BEFORE testing? Is it safe when multiple processes enter here?

Yes, it is safe. The lock isn't acquired before the list search because this minimises the amount of time the lock is held, to maximise parallelism. After the list is searched, the lock is acquired. Once the lock is held, if the process found the cached block in the list, the list entry is rechecked to ensure it still holds the expected block. This guards against the case where the process has raced against another process which has updated the chosen list entry. If the entry has changed, the process retries the list search. If the entry is still the same (which is expected to be the normal case), the entry is guaranteed not to change, because the process holds the lock.

With regard to SMP systems, acquiring the lock ensures any memory barriers have been performed (the list is only updated with the lock held), which ensures the all-important second check is performed using up-to-date data.

> Also, is it safe the temporary unlock/lock around
>     msblk->block_cache[i].length = squashfs_read_data(...);

Yes, it is safe. Before the lock is released, the list entry is set to SQUASHFS_USED_BLK.
This is a special value which ensures that no cache search matches on it, and that the entry is not reused by any other process. It in effect locks the entry against use by any other process while squashfs_read_data() takes place. The lock is released around squashfs_read_data() because it is a slow operation (reading and decompressing data from disk).

> o fragment_mutex in get_cached_fragment()
> Updating fragment[i].block is done AFTER fragment_mutex is released.
>     msblk->fragment[i].block = start_block;
> Is it safe when multiple processes enter here?

The block update should really be performed with the lock held; however, the current code is safe. The entry in question has just been read from disk via squashfs_read_data(), and was updated to SQUASHFS_INVALID_BLK with the lock held to prevent matches and re-use, as discussed previously. Doing this with the lock held ensures any SMP system sees the SQUASHFS_INVALID_BLK update. The update in question is performed without the lock held, so an SMP system may read stale data; however, the stale data will be SQUASHFS_INVALID_BLK, and so the block will be ignored, which is harmless.

> o d_add() in squashfs_lookup()
> It might be better to call d_splice_alias() instead of d_add(), but I
> don't think this d_add() is harmful.

d_splice_alias() is only required for NFS (it connects a previously disconnected directory tree with its parent). Without NFS support, d_splice_alias() is unnecessary. The latest Squashfs version with NFS support uses d_splice_alias().

Regards

Phillip