From: Phillip L. <ph...@lo...> - 2006-10-22 03:14:37
On 21 Oct 2006, at 08:07, hoo...@ya... wrote:

> Hello all,
>
> I have just started reading squashfs3.1
> (ftp.slax.org/SLAX-6.x/testing/technology-preview/devel/official-squashfs3.1.patch),
> and got some questions. I am unsure whether they are actual bugs or
> not. Won't you tell me the truth?

No problem doing that :-)

> o hardlinked inode
> Usually, in traditional unix filesystems, when a file is hardlinked,
> all files share a single inode object.
> In squashfs, the inode (at least in memory) is generated for each. Of
> course, the link count and inode number are correct. So it does no harm
> to users, but a stackable filesystem.
> If the inode is shared among the hardlinked files, it will make a stack
> filesystem (unionfs or aufs) happy. Since they are assuming the inode is
> shared as usual.

Hmm, I see your point. The Squashfs behaviour in 3.1 is a hang-over from Squashfs 2.x and earlier, when the inode numbers given to the VFS were not 100% guaranteed to be unique.

In Squashfs 2.x and earlier no real inode numbers were stored, and the inode number passed to the VFS was computed from the 48-bit disk location of the inode. Because VFS inode numbers were 32 bits (and still are on 32-bit systems), this required compacting the 48-bit disk location into a unique 32-bit number. The strategy used exploited the redundancy in the 48-bit disk location, but there was always the possibility that two different inodes would get the same inode number. Doing an iget() lookup would cause problems in this case. Using iget5() was a possibility, but doing new_inode() worked as far as users were concerned, because any duplicate inodes so created were identical and did not cause problems.

Squashfs 3.0 and later stores real inode numbers, which makes iget() lookup possible. Updating Squashfs to use iget() was something I intended to do for Squashfs 3.0 but didn't get the time to do. I instead chose to do this when I added NFS support, as it is needed for NFS.
I didn't consider the issue with regard to hard-links (only supported in Squashfs 3.0 and later) and stackable filesystems. The next version of Squashfs has NFS support, and the work to use iget() has been done. It should be released very soon, so this issue should go away.

> o release_meta_index() and SMP
> While this simple assignment to unlock is a good idea, it is
> doubtful to support SMP. How about appending smp_mb() after the
> assignment, or changing the type of meta->locked from 'unsigned short'
> to 'atomic_t'?

Correct. This could cause problems on SMP systems.

> o block_cache_mutex in squashfs_get_cached_block()
> block_cache_mutex is locked just AFTER testing
> msblk->block_cache[0...CACHED_BLKS].block. Should the lock be held just
> BEFORE testing? Is it safe when multiple processes enter here?

Yes, it is safe. The lock isn't acquired before the list search because this minimises the amount of time the lock is held, to maximise parallelism. After the list is searched, the lock is acquired. Once the lock is held, if the process found the cached block in the list, the list entry is rechecked to ensure it still holds the expected block. This guards against the case where the process has raced against another process which has updated the chosen list entry. If the entry has changed, the process retries the list search. If the entry is still the same (which is expected to be the normal case), the entry is guaranteed not to change, because the process holds the lock.

With regard to SMP systems, acquiring the lock ensures any memory barriers have been performed (the list is only updated with the lock held), which ensures the all-important second check is performed using up-to-date data.

> Also, is it safe the temporary unlock/lock around
>     msblk->block_cache[i].length = squashfs_read_data(...);

Yes, it is safe. Before the lock is released, the list entry is set to SQUASHFS_USED_BLK.
This is a special value which ensures that no cache search matches on it, and that the entry is not reused by any other process. It in effect locks the entry against use by any other process while squashfs_read_data() takes place. The lock is released around squashfs_read_data() because it is a slow operation (reading and decompressing data from disk).

> o fragment_mutex in get_cached_fragment()
> Updating fragment[i].block is done AFTER fragment_mutex is released.
>     msblk->fragment[i].block = start_block;
> Is it safe when multiple processes enter here?

The block update should really be performed with the lock held; however, the current code is safe. The entry in question has just been read from disk via squashfs_read_data(), and was updated to SQUASHFS_INVALID_BLK with the lock held to prevent matches and re-use, as discussed previously. Doing this with the lock held ensures any SMP system sees the SQUASHFS_INVALID_BLK update. The update in question is performed without the lock held, so an SMP system may read stale data; however, the stale data will be SQUASHFS_INVALID_BLK, and so the block will be ignored, which is harmless.

> o d_add() in squashfs_lookup()
> It might be better to call d_splice_alias() instead of d_add(), but I
> don't think this d_add() is harmful.

d_splice_alias() is only required for NFS (it connects a previously disconnected directory tree with its parent). Without NFS support, d_splice_alias() is unnecessary. The latest Squashfs version with NFS support uses d_splice_alias().

Regards

Phillip