Thread: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Chris M. <ma...@ch...> - 2008-07-26 20:42:58

Hi all

I saw Dan Benamy's email recently about getting the kernel modules into
Linux mainline, and by co-incidence I've been working on a cleanup for
the last few weeks, since Greg K-H added it to linux-staging:

http://git.kernel.org/?p=linux/kernel/git/gregkh/staging.git

I've only looked at vmblock thus far, but I'm prepared to give the other
modules similar treatment if it's seen as worthwhile.  If vmblock is to
be replaced with a FUSE alternative then obviously this patch is going
to be academic, but at least it's given me a good insight into the
amount of effort required.

I haven't posted it to LKML yet but was thinking about doing it sometime
soon as an RFC.  The code is *substantially* reduced in size (one third
of the original), I've stripped out all the backwards-compatibility and
operating system independent stuff (otherwise it won't get past first
base) and also made a large number of coding style changes to fit in
with the kernel style. There's plenty more that could/should be done but
at least it's now clean of all checkpatch and sparse warnings (there
were getting on for a thousand).

I haven't changed the CamelCase function names yet and doubtless that
will be frowned upon.  I did replace the custom double linked list
implementation with the standard Linux kernel version, but otherwise it
should be functionally identical to the original; I was careful to check
the object code after each change and have tested that it works in
VMware Player.
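
For readers who have not met the standard kernel list: it is intrusive, meaning
a struct list_head is embedded in each element (the links member of
struct vmb_blocking_info in the patch is one such member) and the enclosing
structure is recovered from the embedded pointer. What follows is only a
minimal userspace sketch of that idea, not <linux/list.h> itself. The names
sketch_list, sketch_entry, struct block_entry and find_block are hypothetical
and were chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head (hypothetical name). */
struct sketch_list {
	struct sketch_list *next, *prev;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define sketch_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* An empty list is a head that points at itself in both directions. */
static void sketch_list_init(struct sketch_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert new_node immediately after head. */
static void sketch_list_add(struct sketch_list *new_node,
			    struct sketch_list *head)
{
	new_node->next = head->next;
	new_node->prev = head;
	head->next->prev = new_node;
	head->next = new_node;
}

/* Unlink node and leave it pointing at itself. */
static void sketch_list_del(struct sketch_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	node->next = node;
	node->prev = node;
}

/* Hypothetical element type: a blocked filename with embedded links. */
struct block_entry {
	struct sketch_list links;
	char filename[64];
};

/* Walk the list and return the entry for filename, or NULL if absent. */
static struct block_entry *find_block(struct sketch_list *head,
				      const char *filename)
{
	struct sketch_list *pos;

	assert(head && filename);
	for (pos = head->next; pos != head; pos = pos->next) {
		struct block_entry *e =
			sketch_entry(pos, struct block_entry, links);

		if (strcmp(e->filename, filename) == 0)
			return e;
	}
	return NULL;
}
```

Because the links live inside the element, no separate node allocation is
needed and one element type can sit on several lists at once, which is a large
part of why a hand-rolled list rarely survives kernel review.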

The upshot is that it's so different from the original VMware source
that it would probably be seen as a fork, so I'm not sure about its
long-term viability. I guess if VMware or the open-vm-tools community
are going to continue to develop the original code (for the benefit of
other O/Ses and older Linux kernels) then that's cool but someone would
have to merge those changes into mainline ongoing.

Anyway if you want to have a look it's at:
http://www.chrismalley.co.uk/patches/add-vmblock-driver.patch

I have the individual commits in a git repository so will be able to do
a patch series against -staging, but frankly the above is much easier
for people who are new to the code to review.

Comments welcome!

cheers
Chris Malley

PS. I don't work for VMware, and this is an independent project not
sponsored by my employer.

Re: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Daniel B. <db...@vm...> - 2008-07-28 19:31:49

On Saturday 26 July 2008 13:42:54 Chris Malley wrote:

> I saw Dan Benamy's email recently about getting the kernel modules into
> Linux mainline, and by co-incidence I've been working on a cleanup for
> the last few weeks, since Greg K-H added it to linux-staging:

Hi Chris,

This is awesome! Thanks!

FWIW, I've gone ahead with porting vmblock to fuse for my internship project, 
but I don't know what the plan is as far as which will be used where in the 
future.

Best regards,
Dan

Re: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Adar D. <ad...@vm...> - 2008-11-11 11:09:17

Attachments: vmblocktest.c manual-blocker.c

Chris,

I'm sorry that none of us replied to your mail, but better 3 months late than never, right?

Thanks very much for taking the time to go through the vmblock code and cleaning it up. As a developer I can imagine how annoying it must be to translate a non-trivial piece of code from one coding style to another, so I very much appreciate your perseverance in the matter.

Before I get to your patch, let's discuss the more interesting issue of vmblock's future. As you surmised, vmblock-fuse was written to replace vmblock. This was done primarily because the informal feedback we received from some kernel developers suggested that the design of vmblock is a non-starter for the kernel. See http://bugzilla.redhat.com/show_bug.cgi?id=294341 for Christoph Hellwig's comments and http://sourceforge.net/mailarchive/message.php?msg_name=46E6FDEC.3090700%40codemonkey.ws for Anthony Liguori's comments. Unfortunately, vmblock-fuse isn't quite ready for prime-time, but until its code is part of open-vm-tools it's difficult to describe exactly why (the open-sourcing process is taking longer than I expected). I suggest that you table the work on vmblock for the time being, and once vmblock-fuse is part of open-vm-tools we can have a more productive discussion about the future of vmblock. How does that sound?

If you do want to continue preparing VMware module code for upstream inclusion, first of all, thanks, and secondly, I recommend vmmemctl. It's a very simple module (especially once the compat glue is removed), it hasn't changed in a long time, and as Linux has recently gained kvm and xen balloon modules, I imagine the VMware one will be fairly non-controversial. Another alternative is the vmxnet module, which is a straightforward paravirtualized network card. The one issue there might be the morphing "feature", which may require us to merge the vmxnet code into the existing upstream pcnet32 module. If you're interested in this, let me know and I'll provide more details.

Since we're on the subject, I also want to say a few words about upstreaming VMware's kernel modules. In general we're getting more and more comfortable about upstreaming as some of us have been socializing the idea within the company. I think the remaining obstacle is this idea that once our code is upstream, we are no longer acting as the distributor, which means we can no longer control its destiny. We've shipped out-of-tree modules for years now and have grown accustomed to the control that this gives us; among other things, we can do things like release updated modules simultaneously with our products, or make whatever changes we want regardless of how locked down the affected distros might be (for example, we can add a new feature to a module intended for a distro that's been declared end-of-life by its vendor). It's very difficult to cede this control once you've enjoyed it for many product releases, which is partly why progress has been slow. To mitigate this problem, we're trying to understand whether, after upstreaming, we can gain some control back by working more closely with distros so that new features or changes that are important to us can be backported into existing distro releases in time for, say, a VMware product release. I don't have any details to share yet, but I'm sure either Ragavan or myself will say more about this in the near future.

Overall I think your vmblock patch is a solid improvement. I didn't look so much at the functionality as I imagine it's the same as before, minus compat glue. If there's something in particular you'd like me to pay closer attention to, let me know. Since you're working on vmblock, I've attached two unit test programs that you might find useful. vmblocktest.c is a general stress test we've used on all of our vmblock implementations (fuse, Linux, FreeBSD, and Solaris). manual-blocker.c was written by Daniel and used for testing vmblock-fuse. You should be able to build both with a minimum of fuss, but let me know if you have issues there.

Here are my (cosmetic) comments:
- The PURPOSE and TESTING portions are especially useful; we're suffering from a dearth of documentation.
- The overloaded interface you mention in the test code was also discussed by Dan Benamy and Miklos at the time that Dan was writing vmblock-fuse. In his implementation he's opted for a new interface where a character representing the command and the path are concatenated and sent to the driver via a single write(2). You can view the thread at http://article.gmane.org/gmane.comp.file-systems.fuse.devel/6533, if you're interested.
- In Kconfig, some of the capitalization of "VMware" is inconsistent; could you fix that?
- Also in Kconfig, the driver should probably be referred to as a "blocking" driver instead of a "block" driver; that should alleviate any confusion regarding vmblock and the block layer.
- In Makefile, the indentation for the backticks isn't consistent. I'm not sure if you care about that.
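
The single-write interface mentioned in the comments above (one command character followed by the path) might look roughly like this from userspace. This is a sketch under stated assumptions, not the vmblock-fuse protocol itself: the command characters 'a' and 'd', the buffer limit, and the function names sketch_format_cmd and sketch_send_cmd are all hypothetical, since the actual values are not given in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical command characters; the real values are not given here. */
#define SKETCH_CMD_ADD_BLOCK	'a'
#define SKETCH_CMD_DEL_BLOCK	'd'

/* Assumed limit: one command byte plus up to 4096 path bytes. */
#define SKETCH_MAX_CMD		4097

/*
 * Build "<cmd><path>" in buf (no trailing NUL) and return its length,
 * or -1 with errno set to ENAMETOOLONG if it does not fit.
 */
static ssize_t sketch_format_cmd(char *buf, size_t buf_size, char cmd,
				 const char *path)
{
	size_t path_len;

	assert(buf && path);
	path_len = strlen(path);
	if (path_len + 1 > buf_size) {
		errno = ENAMETOOLONG;
		return -1;
	}
	buf[0] = cmd;
	memcpy(buf + 1, path, path_len);
	return (ssize_t)(path_len + 1);
}

/* Send one command to an already open control file with a single write(2). */
static int sketch_send_cmd(int fd, char cmd, const char *path)
{
	char buf[SKETCH_MAX_CMD];
	ssize_t len = sketch_format_cmd(buf, sizeof(buf), cmd, path);

	if (len < 0)
		return -1;
	return write(fd, buf, (size_t)len) == len ? 0 : -1;
}
```

Compared with the existing interface, where the size of the write selects the command, the command here travels in the data itself, so the length passed to write(2) is simply the length of the message.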

Again, thanks for helping us out with this effort!

Re: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Chris M. <ma...@ch...> - 2008-11-11 13:37:06

Hi Adar

Thanks for the comprehensive response; I did do a little further cosmetic
work on vmblock which I'll post here after I've tried your test code, but to
be honest I've not had much hacking time since my son was born in August!  I
agree that the future is probably userspace anyway so I'll park it for now,
and maybe take a look at the other modules you suggest.

cheers
Chris

Re: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Adar D. <ad...@vm...> - 2008-11-13 00:45:19

> > Since we're on the subject, I also want to say a few words about
> > upstreaming VMware's kernel modules. In general we're getting more and
> > more comfortable about upstreaming as some of us have been socializing
> > the idea within the company. I think the remaining obstacle is this
> > idea that once our code is upstream, we are no longer acting as the
> > distributor, which means we can no longer control its destiny. We've
> > shipped out-of-tree modules for years now and have grown accustomed to
> > the control that this gives us; among other things, we can do things
> > like release updated modules simultaneously with our products, or make
> > whatever changes we want regardless of how locked down the affected
> > distros might be (for example, we can add a new feature to a module
> > intended for a distro that's been declared end-of-life by its vendor).
> > It's very difficult to cede this control once you've enjoyed it for
> > many product releases, which is partly why progress has been slow. To
> > mitigate this problem, we're trying to understand whether, after
> > upstreaming, we can gain some control back by working more closely
> > with distros so that new features or changes that are important to us
> > can be backported into existing distro releases in time for, say, a
> > VMware product release. I don't have any details to share yet, but I'm
> > sure either Ragavan or myself will say more about this in the near
> > future.
> 
> I don't see why you can't continue to still ship your drivers
> out-of-tree just fine, as you do today, if the code is also in the
> kernel tree as well.
> 
> Why do you think this is somehow "giving up control"?  How does this
> prevent you from still providing specific modules for specific distros,
> pre-built and ready to go?
> 
> Overall, it will end up saving you engineering time and effort if your
> code gets into the tree, as 75% of your driver lines of code will go
> away (no need for the multiple layers of indirection to handle zillions
> of kernel versions.)
> 
> If you look around, this works just fine for almost all hardware
> vendors, so it's not like this is a new model of doing work or anything
> :)

I'm not sure I see exactly how this will work out as painlessly as you say, so let me describe my understanding of the situation and you can correct my mistakes.

Let's suppose we want to continue to ship out-of-tree as well as in-tree drivers. The in-tree drivers live in a single upstream codebase. This codebase can only produce working drivers for the kernel it ships. The out-of-tree drivers live in a single "downstream" codebase littered with compatibility wrappers as we do today. This codebase can be compiled for virtually any kernel, provided we continue to chase down all kernel API changes and wrap them.

Distros will periodically snapshot the upstream kernel and ship some version of our drivers. Additionally, we will periodically snapshot our downstream source and ship a large number of prebuilt drivers and the source itself as part of the VMware Tools, as we do today. Any given VM may or may not have some version of the drivers provided by the distro itself. When installing VMware Tools, our installer will need to choose whether to install the drivers we provide, or to depend on the drivers already found on the system. Some of this logic already exists in our installer today and we can extend it to check the versions of the drivers to make decisions like these, or we can ship dkms in our installer and use it to do this work. Also, it's not enough that these decisions be made when installing or upgrading VMware Tools; they also need to be made when the distro's kernel is upgraded. Again, this is something that dkms can do at boot-time.

With this system in place, VMware is in a position to provide driver updates as it sees fit via the VMware Tools, and to deliver drivers via upstream for the best possible out-of-the-box experience. However, we've paid some high prices to get here:
1) All of our prebuilt driver infrastructure must live on and we must continue to add the latest distros as they are released.
2) We must maintain a separate "downstream" codebase which will continually subject us to API chasing headaches.
3) The "downstream" codebase is essentially a fork of the upstream codebase, so merging changes between the two is painful.
4) There are quite a few moving parts on the user's system tasked with making sure the user has the latest drivers, either from VMware or from the distro. The more complicated the system, the more fragile it will be.
5) The "downstream" codebase is released on VMware's product release schedule. Since it is distro-independent, each of its releases will need to be tested against each distro release. This is a sizable m by n testing matrix.

So, what am I missing?

Re: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Adar D. <ad...@vm...> - 2008-11-14 02:39:33

> > I'm not sure I see exactly how this will work out as painlessly as you
> > say, so let me describe my understanding of the situation and you can
> > correct my mistakes.
> >
> > Let's suppose we want to continue to ship out-of-tree as well as
> > in-tree drivers. The in-tree drivers live in a single upstream
> > codebase. This codebase can only produce working drivers for the
> > kernel it ships. The out-of-tree drivers live in a single "downstream"
> > codebase littered with compatibility wrappers as we do today. This
> > codebase can be compiled for virtually any kernel, provided we
> > continue to chase down all kernel API changes and wrap them.
> 
> The problem is you are wanting to continue to maintain and support this
> "downstream" codebase.  If you drop that, it will save you time and
> money and make your whole life easier.  Backporting the upstream kernel
> drivers to the handful of older kernels that are shipped by the distros
> is much simpler than trying to maintain your tangled-web of macros and
> wrapper functions that have been built up to support every kernel ever
> created.

OK, now I see what you're getting at. We've discussed maintaining a few distro kernels "downstream" before; I think our concern was that while this approach works well for a small number of distros, it doesn't scale. Now, you could make the argument that it needn't scale; out-of-tree updates should be the exception, not the rule. I tend to agree, though we'll need to discuss this some more and decide which guests merit this sort of treatment and which don't.

Since we're on the subject of overriding the distro's kernel modules with our own, what mechanism would you recommend for getting this done? I see several options:
1) Add some more logic to the Tools installer. We're part of the way there (we can build modules from source and deploy prebuilt modules), so we could add some overriding logic based on /lib/modules/updates or something.
2) Use dkms. We'd probably need to ship dkms with the Tools so that this will work on guests that don't come with dkms pre-installed. But I believe dkms is capable of both deploying prebuilt modules as well as cleanly overriding distro modules with its own.
3) Rely on distro driver update programs. Red Hat's driver update program mentions that we can ship a modprobe.d file that will override a Red Hat module in favor of one distributed by a hardware vendor (see "Overriding a Red Hat supplied driver" at http://driverupdateprogram.com). Of course, this method would only work for those distros that support a driver update program, which I believe is RHEL5, SLES9, and SLES10 (and I don't know whether Novell's partner driver process supports this sort of operation).

> I'm guessing that you can remove over 2/3 of your code base if this were
> to happen.
> 
> Note that you are pretty much the only company trying to do this kind of
> thing.  Have you ever thought that perhaps this might not be the correct
> model?  Or did you just think that you all were somehow smarter than
> everyone else?  :)

I plead the fifth. :)

But seriously, I don't know for sure why we go to such lengths to get modules installed in people's kernels. I know that we've done it this way for a long time, and so I'm sure there's enough momentum behind it that it's difficult to change.

If you could point me to some references online that describe how other companies approach this problem, I'd greatly appreciate it.

Re: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Chris M. <ma...@ch...> - 2008-11-18 20:47:18

Just for completeness, here's the updated patch for vmblock, which I've
run through the vmblocktest tool that Adar posted.  Seems a shame not to
post it somewhere so you guys might as well have it!

It's checkpatch and sparse clean, and applies against 2.6.27.

--
From: Chris Malley <ma...@ch...>
Subject: [PATCH] fs: add support for VMware vmblock "blocking" filesystem

This is a pseudo filesystem used by VMware tools to enforce safe
drag-and-drop / copy-and-paste functionality between host and guest.

Adapted from the open-vm-tools project, version 2008.10.10

Signed-off-by: Chris Malley <ma...@ch...>
---
 Documentation/filesystems/vmblock.txt |   68 ++
 fs/Kconfig                            |   21 +
 fs/Makefile                           |    1 +
 fs/vmblock/Makefile                   |    7 +
 fs/vmblock/vmblock_fs.c               | 1159 +++++++++++++++++++++++++++++++++
 include/linux/Kbuild                  |    1 +
 include/linux/vmblock.h               |   49 ++
 7 files changed, 1305 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/filesystems/vmblock.txt
 create mode 100644 fs/vmblock/Makefile
 create mode 100644 fs/vmblock/vmblock_fs.c
 create mode 100644 include/linux/vmblock.h

diff --git a/Documentation/filesystems/vmblock.txt b/Documentation/filesystems/vmblock.txt
new file mode 100644
index 0000000..ff5f98b
--- /dev/null
+++ b/Documentation/filesystems/vmblock.txt
@@ -0,0 +1,68 @@
+VMware Blocking Filesystem driver
+---------------------------------
+
+PURPOSE
+
+VMware have created a driver to enable safe drag and drop operations
+between host and guest, by implementing a pseudo "blocking" filesystem.
+
+The idea is that files can be copied via a temporary directory, but the target
+should not be able to access the file until the copy has finished.  This is
+accomplished by mounting a "view" of the directory, containing only symlinks
+to the real files, and a special control file.  Before copying, a "block"
+command is issued to the control file along with the filename.  Now any attempt
+to access the file via the view will block until a "release" command is issued
+(presumably after the source is happy that the copy has completed).
+
+
+TESTING
+
+1. Make a directory on the guest to store the files (/tmp/VMwareDnD by default):
+	# mkdir /tmp/VMwareDnD
+	# chmod 1777 /tmp/VMwareDnD
+	# echo "a test file" > /tmp/VMwareDnD/foo
+
+2. Load the vmblock module:
+	# modprobe vmblock
+	# ls -p /proc/fs/vmblock
+	dev  mountPoint/
+
+3. Mount the blocking filesystem:
+	# mount -t vmblock none /proc/fs/vmblock/mountPoint
+	# ls -l /proc/fs/vmblock/mountPoint
+	foo -> /tmp/VMwareDnD/foo
+
+4. To place a block on a file (see listing [1] below):
+	# cat /proc/fs/vmblock/mountPoint/foo
+	a test file
+	# addblock_wait10 /tmp/VMwareDnD/foo &
+	# cat /proc/fs/vmblock/mountPoint/foo
+	<ten seconds later....>
+	a test file
+
+
+EXAMPLE USERSPACE CODE
+
+/* addblock_wait10.c */
+#include <linux/limits.h>
+#include <linux/vmblock.h>
+#include <string.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+int main(int argc, char *argv[]) {
+	char buf[PATH_MAX] = "";
+	int fd = open(VMBLOCK_DEVICE, VMBLOCK_DEVICE_MODE);
+	if (argc < 2 || fd < 0)
+		return 1; /* need a filename and an open control file */
+	strncpy(buf, argv[1], PATH_MAX - 1); /* buf stays NUL-terminated */
+	/*
+	 * The size of the write dictates the type of command.
+	 * For example, VMBLOCK_ADD_FILEBLOCK is 98 bytes.
+	 * However the module will actually read PATH_MAX bytes!
+	 */
+	write(fd, buf, VMBLOCK_ADD_FILEBLOCK); /* should return 0 */
+	sleep(10); /* file is now blocked for 10 secs */
+	close(fd); /* block is released */
+}
+/* end of addblock_wait10.c */
diff --git a/fs/Kconfig b/fs/Kconfig
index abccb5d..cbf4837 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -979,6 +979,27 @@ config CONFIGFS_FS
 	  Both sysfs and configfs can and should exist together on the
 	  same system. One is not a replacement for the other.
 
+config VMBLOCK_FS
+	tristate "VMware blocking filesystem (EXPERIMENTAL)"
+	depends on EXPERIMENTAL
+	default n
+	help
+	  Select this option if you plan to run VMware and you want
+	  to enable drag-and-drop and copy-and-paste operations between
+	  host and guest.
+
+	  See <file:Documentation/filesystems/vmblock.txt> for details.
+
+	  If you have no intention of running VMware, say N.
+
+config VMBLOCK_DEBUG
+	bool "VMware blocking filesystem debugging"
+	default n
+	depends on VMBLOCK_FS
+	help
+	  Enable vmblock debugging features such as the facility to list
+	  all the current file blocks.
+
 endmenu
 
 menu "Miscellaneous filesystems"
diff --git a/fs/vmblock/Makefile b/fs/vmblock/Makefile
new file mode 100644
index 0000000..5dfb4db
--- /dev/null
+++ b/fs/vmblock/Makefile
@@ -0,0 +1,7 @@
+#
+# Makefile for the linux vmblock filesystem routines.
+#
+
+obj-$(CONFIG_VMBLOCK_FS) += vmblock.o
+
+vmblock-objs := vmblock_fs.o
diff --git a/fs/vmblock/vmblock_fs.c b/fs/vmblock/vmblock_fs.c
new file mode 100644
index 0000000..129d25a
--- /dev/null
+++ b/fs/vmblock/vmblock_fs.c
@@ -0,0 +1,1159 @@
+/*
+ * Copyright (C) 2006 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
+ */
+
+/*
+ * Rework for inclusion into Linux mainline
+ * Copyright (C) 2008, Chris Malley <ma...@ch...>
+ */
+
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/namei.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/statfs.h>
+#include <linux/string.h>
+#include <linux/vmblock.h>
+
+#define VMBLOCK_FS_NAME		"vmblock"
+#define VMBLOCK_ROOT_INO	1
+#define VMBLOCK_BLOCKSIZE	1024
+#define VMBLOCK_SUPER_MAGIC	0xabababab
+#define VMBLOCK_UNKNOWN_BLOCKER	NULL
+
+#define VMBLOCK_ERROR(fmt, args...)  \
+	pr_err("vmblock:%s:%u " fmt "\n", __func__, __LINE__, ##args)
+#define VMBLOCK_WARNING(fmt, args...)  \
+	pr_warning("vmblock:%s:%u " fmt "\n", __func__, __LINE__, ##args)
+#define VMBLOCK_INFO(fmt, args...)  \
+	pr_info("vmblock:%s:%u " fmt "\n", __func__, __LINE__, ##args)
+#define VMBLOCK_DEBUG(fmt, args...)  \
+	pr_debug("vmblock:%s:%u " fmt "\n", __func__, __LINE__, ##args)
+
+
+struct vmb_inode_info {
+	char		name[PATH_MAX];
+	size_t		name_len;
+	struct dentry	*actual_dentry;
+	struct inode	inode;
+};
+
+struct vmb_blocking_info {
+	struct list_head links;
+	atomic_t refcount;
+	struct file *blocker;
+	struct completion completion;
+	char filename[PATH_MAX];
+};
+
+struct vmb_filldir_info {
+	filldir_t filldir;
+	void *dirent;
+};
+
+static struct kmem_cache *vmb_inode_info_cache;
+static char *root = "/tmp/VMwareDnD";
+module_param(root, charp, 0600);
+MODULE_PARM_DESC(root, "The directory the file system redirects to");
+
+/*
+ * vmb_blocking_wait_on_file() is the only blocking function called from the
+ * filesystem section, so declare it here.
+ */
+static int vmb_blocking_wait_on_file(const char *filename);
+
+/*
+ * Filesystem section
+ *
+ * The following functions implement the filesystem operations necessary to
+ * present a custom view of the underlying directory.
+ *
+ */
+
+/* Filesystem utility functions */
+
+static inline struct vmb_inode_info *vmb_inode_to_iinfo(
+	struct inode *inode)
+{
+	return container_of(inode, struct vmb_inode_info, inode);
+}
+
+static inline struct dentry *vmb_inode_to_actual_dentry(
+	struct inode *inode)
+{
+	return vmb_inode_to_iinfo(inode)->actual_dentry;
+}
+
+static inline struct inode *vmb_inode_to_actual_inode(
+	struct inode *inode)
+{
+	return vmb_inode_to_actual_dentry(inode)->d_inode;
+}
+
+static int vmb_make_full_name(struct inode *dir,
+	struct dentry *dentry, char *buf_out, size_t buf_out_size)
+{
+	struct vmb_inode_info *dir_iinfo;
+	int ret;
+	int size_req;
+	const char *sep;
+	const char *dirname;
+	const char *filename;
+
+	BUG_ON(!buf_out);
+
+	/*
+	 * If dir is supplied, construct the full path of the actual file,
+	 * otherwise it's the root directory.
+	 */
+	if (dir) {
+		BUG_ON(!dentry);
+
+		if (!dentry->d_name.name) {
+			VMBLOCK_ERROR("dentry name is empty");
+			ret = -EINVAL;
+			goto out_name;
+		}
+
+		dir_iinfo = vmb_inode_to_iinfo(dir);
+
+		dirname = dir_iinfo->name;
+		filename = dentry->d_name.name;
+	} else {
+		dirname = root;
+		filename = "";
+	}
+
+	if ((strlen(dirname) <= 1) || (strlen(filename) == 0))
+		sep = "";
+	else
+		sep = "/";
+
+	size_req = strlen(dirname) + strlen(sep) + strlen(filename) + 1;
+
+	if (size_req > buf_out_size) {
+		VMBLOCK_ERROR("path too long");
+		ret = -ENAMETOOLONG;
+		goto out_name;
+	}
+
+	snprintf(buf_out, buf_out_size, "%s%s%s", dirname, sep, filename);
+	ret = 0;
+
+out_name:
+	return ret;
+}
+
+static struct inode *vmb_get_inode(struct super_block *sb,
+	struct inode *dir, struct dentry *dentry, ino_t ino)
+{
+	struct vmb_inode_info *iinfo;
+	struct nameidata actual_nd;
+	struct inode *inode;
+
+	BUG_ON(ino < VMBLOCK_ROOT_INO);
+
+	inode = iget_locked(sb, ino);
+
+	if (!inode)
+		goto out_inode;
+
+	iinfo = vmb_inode_to_iinfo(inode);
+
+	if (inode->i_state & I_NEW) {
+		iinfo->name[0] = '\0';
+		iinfo->name_len = 0;
+		iinfo->actual_dentry = NULL;
+		unlock_new_inode(inode);
+	}
+
+	if (vmb_make_full_name(dir, dentry, iinfo->name,
+						sizeof iinfo->name) < 0) {
+		VMBLOCK_ERROR("could not make full name");
+		iput(inode);
+		goto out_inode;
+	}
+
+	if (path_lookup(iinfo->name, 0, &actual_nd)) {
+		/*
+		 * This file does not exist, so we create an inode that doesn't
+		 * know about its underlying file.  Operations that create files
+		 * and directories need an inode to operate on even if there is
+		 * no actual file yet.
+		 */
+		iinfo->actual_dentry = NULL;
+	} else {
+		iinfo->actual_dentry = actual_nd.path.dentry;
+		path_put(&actual_nd.path);
+	}
+
+out_inode:
+	return inode;
+}
+
+/* Link dentry operations */
+
+static int vmb_dentry_revalidate(struct dentry *dentry,
+	struct nameidata *nd)
+{
+	struct vmb_inode_info *iinfo;
+	struct nameidata actual_nd;
+	struct dentry *actual_dentry;
+	int ret;
+
+	if (!dentry) {
+		VMBLOCK_WARNING("invalid args from kernel");
+		return 0;
+	}
+
+	/*
+	 * If a dentry does not have an inode associated with it then
+	 * we are dealing with a negative dentry. Always invalidate a negative
+	 * dentry which will cause a fresh lookup.
+	 */
+	if (!dentry->d_inode)
+		return 0;
+
+
+	iinfo = vmb_inode_to_iinfo(dentry->d_inode);
+	if (!iinfo) {
+		VMBLOCK_WARNING("dentry has no fs-specific data");
+		return 0;
+	}
+
+	/* Block if there is a pending block on this file */
+	vmb_blocking_wait_on_file(iinfo->name);
+
+	/*
+	 * If the actual dentry has a revalidate function, we'll let it figure
+	 * out whether the dentry is still valid. If not, do a path lookup to
+	 * ensure that the file still exists.
+	 */
+	actual_dentry = iinfo->actual_dentry;
+
+	if (actual_dentry &&
+		actual_dentry->d_op &&
+		actual_dentry->d_op->d_revalidate) {
+			return actual_dentry->d_op->d_revalidate(actual_dentry,
+				nd);
+	}
+
+	if (path_lookup(iinfo->name, 0, &actual_nd)) {
+		VMBLOCK_INFO("[%s] no longer exists", iinfo->name);
+		return 0;
+	}
+	ret = actual_nd.path.dentry &&
+		actual_nd.path.dentry->d_inode;
+	path_put(&actual_nd.path);
+
+	VMBLOCK_DEBUG("[%s] %srevalidated", iinfo->name, ret ? "" : "not ");
+	return ret;
+}
+
+static struct dentry_operations vmb_link_dentry_ops = {
+	.d_revalidate = vmb_dentry_revalidate,
+};
+
+/* Link inode operations */
+
+static int vmb_readlink(struct dentry *dentry, char __user *buffer,
+	int buflen)
+{
+	struct vmb_inode_info *iinfo;
+
+	if (!dentry || !buffer) {
+		VMBLOCK_WARNING("invalid args from kernel");
+		return -EINVAL;
+	}
+
+	iinfo = vmb_inode_to_iinfo(dentry->d_inode);
+	if (!iinfo)
+		return -EINVAL;
+
+	return vfs_readlink(dentry, buffer, buflen, iinfo->name);
+}
+
+static void *vmb_follow_link(struct dentry *dentry, struct nameidata *nd)
+{
+	int ret;
+	struct vmb_inode_info *iinfo;
+
+	if (!dentry) {
+		VMBLOCK_WARNING("invalid args from kernel");
+		ret = -EINVAL;
+		goto out_link;
+	}
+
+	iinfo = vmb_inode_to_iinfo(dentry->d_inode);
+	if (!iinfo) {
+		ret = -EINVAL;
+		goto out_link;
+	}
+
+	ret = vfs_follow_link(nd, iinfo->name);
+
+out_link:
+	return ERR_PTR(ret);
+}
+
+static const struct inode_operations vmb_link_inode_ops = {
+	.readlink	= vmb_readlink,
+	.follow_link	= vmb_follow_link,
+};
+
+/* Root inode operations */
+
+static ino_t vmb_get_next_ino(void)
+{
+	static DEFINE_SPINLOCK(vmb_ino_lock);
+	static ino_t next_ino = VMBLOCK_ROOT_INO + 1;
+	ino_t ret;
+
+	spin_lock(&vmb_ino_lock);
+	ret = next_ino++;
+	spin_unlock(&vmb_ino_lock);
+
+	return ret;
+}
+
+static struct dentry *vmb_lookup(struct inode *dir, struct dentry *dentry,
+	struct nameidata *nd)
+{
+	char *filename;
+	struct inode *inode;
+	int ret;
+
+	if (!dir || !dentry) {
+		VMBLOCK_WARNING("invalid args from kernel");
+		return ERR_PTR(-EINVAL);
+	}
+
+	/*
+	 * The kernel should only pass us our own inodes, but check just to be
+	 * safe.
+	 */
+	if (!vmb_inode_to_iinfo(dir)) {
+		VMBLOCK_WARNING("invalid inode provided");
+		return ERR_PTR(-EINVAL);
+	}
+
+	/* Get a slab from the kernel's names_cache of PATH_MAX-sized buffers */
+	filename = __getname();
+	if (!filename) {
+		VMBLOCK_WARNING("unable to obtain memory for filename");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	ret = vmb_make_full_name(dir, dentry, filename, PATH_MAX);
+	if (ret < 0) {
+		VMBLOCK_WARNING("could not construct full name");
+		__putname(filename);
+		return ERR_PTR(ret);
+	}
+
+	/* Block if there is a pending block on this file */
+	vmb_blocking_wait_on_file(filename);
+	__putname(filename);
+
+	inode = vmb_get_inode(dir->i_sb, dir, dentry,
+						vmb_get_next_ino());
+	if (!inode) {
+		VMBLOCK_WARNING("failed to get inode");
+		return ERR_PTR(-ENOMEM);
+	}
+
+	dentry->d_op = &vmb_link_dentry_ops;
+	dentry->d_time = jiffies;
+
+	/*
+	 * If the actual file's dentry doesn't have an inode, it means the file
+	 * we are redirecting to doesn't exist.  Give back the inode that was
+	 * created for this and add a NULL dentry->inode entry in the dcache.
+	 * (The NULL entry is added so ops to create files/directories are
+	 * invoked by VFS.)
+	 */
+	if (!vmb_inode_to_actual_dentry(inode) ||
+				!vmb_inode_to_actual_inode(inode)) {
+		iput(inode);
+		d_add(dentry, NULL);
+		return NULL;
+	}
+
+	inode->i_mode = S_IFLNK | S_IRWXUGO;
+	inode->i_size = vmb_inode_to_iinfo(inode)->name_len;
+	inode->i_version = 1;
+	inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	inode->i_uid = inode->i_gid = 0;
+	inode->i_op = &vmb_link_inode_ops;
+
+	d_add(dentry, inode);
+	return NULL;
+}
+
+static const struct inode_operations vmb_root_inode_ops = {
+	.lookup	= vmb_lookup,
+};
+
+/* Root file operations */
+
+static int vmb_filldir(void *buf,
+	const char *name,
+	int namelen,
+	loff_t offset,
+	u64 ino,
+	unsigned int d_type)
+{
+	struct vmb_filldir_info *info = (struct vmb_filldir_info *)buf;
+
+	/* Specify DT_LNK regardless */
+	return info->filldir(info->dirent, name, namelen, offset, ino, DT_LNK);
+}
+
+static int vmb_readdir(struct file *file, void *dirent, filldir_t filldir)
+{
+	int ret;
+	struct vmb_filldir_info info;
+	struct file *actual_file;
+
+	if (!file) {
+		VMBLOCK_WARNING("invalid args from kernel");
+		return -EINVAL;
+	}
+
+	actual_file = file->private_data;
+	if (!actual_file) {
+		VMBLOCK_WARNING("no actual file found");
+		return -EINVAL;
+	}
+
+	info.filldir = filldir;
+	info.dirent = dirent;
+
+	actual_file->f_pos = file->f_pos;
+	ret = vfs_readdir(actual_file, vmb_filldir, &info);
+	file->f_pos = actual_file->f_pos;
+
+	return ret;
+}
+
+static int vmb_open(struct inode *inode, struct file *file)
+{
+	struct vmb_inode_info *iinfo;
+	struct file *actual_file;
+
+	if (!inode || !file || !vmb_inode_to_iinfo(inode)) {
+		VMBLOCK_WARNING("invalid args from kernel");
+		return -EINVAL;
+	}
+
+	iinfo = vmb_inode_to_iinfo(inode);
+
+	/*
+	 * Get an open file for the directory we are redirecting to.  This
+	 * ensures we can gracefully handle cases where that directory is
+	 * removed after we are mounted.
+	 */
+	actual_file = filp_open(iinfo->name, file->f_flags, file->f_flags);
+	if (IS_ERR(actual_file)) {
+		VMBLOCK_WARNING("could not open file [%s]", iinfo->name);
+		file->private_data = NULL;
+		return PTR_ERR(actual_file);
+	}
+
+	/*
+	 * If the file opened is the same as the one retrieved for the file
+	 * then we shouldn't allow the open to happen.  This can only occur if
+	 * the redirected root directory specified at mount time is the same as
+	 * where the mount is placed.  Later in vmb_readdir() we'd call
+	 * vfs_readdir() and that would try to acquire the inode's semaphore;
+	 * if the two inodes are the same we'll deadlock.
+	 */
+	if (actual_file->f_dentry && inode == actual_file->f_dentry->d_inode) {
+		VMBLOCK_WARNING("identical inode encountered, "
+				"open cannot succeed");
+		if (filp_close(actual_file, current->files) < 0)
+			VMBLOCK_WARNING("unable to close opened file");
+		return -EINVAL;
+	}
+
+	file->private_data = actual_file;
+	return 0;
+}
+
+static int vmb_release(struct inode *inode, struct file *file)
+{
+	int ret;
+	struct file *actual_file;
+
+	if (!inode || !file) {
+		VMBLOCK_WARNING("invalid args from kernel");
+		return -EINVAL;
+	}
+
+	actual_file = file->private_data;
+	if (!actual_file) {
+		VMBLOCK_WARNING("no actual file found");
+		return -EINVAL;
+	}
+
+	ret = filp_close(actual_file, current->files);
+
+	return ret;
+}
+
+static const struct file_operations vmb_root_file_ops = {
+	.readdir	= vmb_readdir,
+	.open		= vmb_open,
+	.release	= vmb_release,
+};
+
+/* Super block operations */
+
+static struct inode *vmb_alloc_inode(struct super_block *sb)
+{
+	struct vmb_inode_info *iinfo;
+
+	iinfo = kmem_cache_alloc(vmb_inode_info_cache, GFP_KERNEL);
+	if (!iinfo) {
+		VMBLOCK_ERROR("could not allocate inode info");
+		return NULL;
+	}
+
+	return &iinfo->inode;
+}
+
+static void vmb_destroy_inode(struct inode *inode)
+{
+	kmem_cache_free(vmb_inode_info_cache, vmb_inode_to_iinfo(inode));
+}
+
+static int vmb_statfs(struct dentry *dentry, struct kstatfs *stat)
+{
+	if (!stat)
+		return -EINVAL;
+
+	stat->f_type = VMBLOCK_SUPER_MAGIC;
+	stat->f_bsize = 0;
+	stat->f_namelen = NAME_MAX;
+	stat->f_blocks = 0;
+	stat->f_bfree = 0;
+	stat->f_bavail = 0;
+
+	return 0;
+}
+
+static const struct super_operations vmb_super_ops = {
+	.alloc_inode	= vmb_alloc_inode,
+	.destroy_inode	= vmb_destroy_inode,
+	.statfs		= vmb_statfs,
+};
+
+/* File system operations */
+
+static int vmb_fill_super(struct super_block *sb, void *data, int silent)
+{
+	struct inode *root_inode;
+	struct dentry *root_dentry;
+
+	sb->s_magic = VMBLOCK_SUPER_MAGIC;
+	sb->s_blocksize = VMBLOCK_BLOCKSIZE;
+	sb->s_op = &vmb_super_ops;
+
+	root_inode = vmb_get_inode(sb, NULL, NULL, VMBLOCK_ROOT_INO);
+
+	if (!root_inode)
+		return -EINVAL;
+
+	if (!vmb_inode_to_iinfo(root_inode) ||
+		!vmb_inode_to_actual_dentry(root_inode) ||
+		!vmb_inode_to_actual_inode(root_inode) ||
+		!S_ISDIR(vmb_inode_to_actual_inode(root_inode)->i_mode)) {
+		iput(root_inode);
+		return -EINVAL;
+	}
+
+	root_dentry = d_alloc_root(root_inode);
+	if (!root_dentry) {
+		iput(root_inode);
+		return -ENOMEM;
+	}
+	sb->s_root = root_dentry;
+
+	root_inode->i_op = &vmb_root_inode_ops;
+	root_inode->i_fop = &vmb_root_file_ops;
+	root_inode->i_mode = S_IFDIR | S_IRUGO | S_IXUGO;
+
+	VMBLOCK_INFO("%s: file system mounted", VMBLOCK_FS_NAME);
+
+	return 0;
+}
+
+static int vmb_get_sb(struct file_system_type *fs_type,
+	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
+{
+	return get_sb_nodev(fs_type, flags, data, vmb_fill_super, mnt);
+
+}
+
+static struct file_system_type vmb_fs_type = {
+	.owner		= THIS_MODULE,
+	.name		= VMBLOCK_FS_NAME,
+	.get_sb		= vmb_get_sb,
+	.kill_sb	= kill_anon_super,
+};
+
+static void vmb_inode_info_cache_ctor(void *slab_elem)
+{
+	struct vmb_inode_info *iinfo = slab_elem;
+
+	inode_init_once(&iinfo->inode);
+}
+
+static int __init vmb_fs_init(const char *fsroot)
+{
+	int err;
+
+	if (!fsroot) {
+		VMBLOCK_ERROR("no root specified (missing module param?)");
+		err = -EINVAL;
+		goto out_fs;
+	}
+
+	vmb_inode_info_cache = kmem_cache_create(
+				"vmb_inode_info_cache",
+				sizeof(struct vmb_inode_info), 0,
+				SLAB_HWCACHE_ALIGN,
+				vmb_inode_info_cache_ctor);
+
+	if (!vmb_inode_info_cache) {
+		VMBLOCK_ERROR("could not initialize inode cache");
+		err = -ENOMEM;
+		goto out_fs;
+	}
+
+	err = register_filesystem(&vmb_fs_type);
+	if (err) {
+		VMBLOCK_ERROR("could not register filesystem");
+		kmem_cache_destroy(vmb_inode_info_cache);
+		goto out_fs;
+	}
+
+out_fs:
+	return err;
+}
+
+static void __exit vmb_fs_cleanup(void)
+{
+	unregister_filesystem(&vmb_fs_type);
+
+	kmem_cache_destroy(vmb_inode_info_cache);
+}
+
+/*
+ * Blocking section
+ *
+ * The following functions implement the blocking functionality.
+ */
+
+static LIST_HEAD(vmb_blocked_files);
+static rwlock_t vmb_blocked_files_lock;
+static struct kmem_cache *vmb_blocking_info_cache;
+
+static int vmb_blocking_init(void)
+{
+	BUG_ON(vmb_blocking_info_cache);
+
+	vmb_blocking_info_cache = kmem_cache_create("vmb_blocking_info_cache",
+					sizeof(struct vmb_blocking_info), 0,
+					SLAB_HWCACHE_ALIGN, NULL);
+	if (!vmb_blocking_info_cache)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&vmb_blocked_files);
+	rwlock_init(&vmb_blocked_files_lock);
+
+	return 0;
+}
+
+static void vmb_blocking_cleanup(void)
+{
+	BUG_ON(!vmb_blocking_info_cache);
+	BUG_ON(!list_empty(&vmb_blocked_files));
+
+	kmem_cache_destroy(vmb_blocking_info_cache);
+}
+
+static struct vmb_blocking_info *vmb_blocking_alloc_block(
+	struct kmem_cache *cache,
+	const char *filename,
+	const struct file *blocker)
+{
+	struct vmb_blocking_info *block;
+	size_t ret;
+
+	/* Initialize this file's block structure. */
+	block = kmem_cache_alloc(vmb_blocking_info_cache, GFP_KERNEL);
+	if (!block)
+		goto out_alloc_block;
+
+	ret = strlcpy(block->filename, filename, sizeof(block->filename));
+	if (ret >= sizeof(block->filename)) {
+		VMBLOCK_WARNING("filename is too large");
+		kmem_cache_free(vmb_blocking_info_cache, block);
+		return NULL;	/* don't hand back the freed block */
+	}
+
+	INIT_LIST_HEAD(&block->links);
+	atomic_set(&block->refcount, 1);
+	init_completion(&block->completion);
+	block->blocker = (struct file *)blocker;
+
+out_alloc_block:
+	return block;
+}
+
+static void vmb_blocking_free_block(struct kmem_cache *cache,
+	struct vmb_blocking_info *block)
+{
+	BUG_ON(!cache);
+	BUG_ON(!block);
+
+	kmem_cache_free(cache, block);
+}
+
+static struct vmb_blocking_info *vmb_blocking_get_block(
+	const char *filename,
+	const struct file *blocker)
+{
+	struct vmb_blocking_info *curr_block;
+
+	list_for_each_entry(curr_block, &vmb_blocked_files, links) {
+		if ((blocker == VMBLOCK_UNKNOWN_BLOCKER ||
+			blocker == curr_block->blocker) &&
+			strcmp(curr_block->filename, filename) == 0) {
+			atomic_inc(&curr_block->refcount);
+			return curr_block;
+		}
+	}
+
+	return NULL;
+}
+
+static bool vmb_block_exists(const char *filename)
+{
+	struct vmb_blocking_info *block = vmb_blocking_get_block(filename,
+						VMBLOCK_UNKNOWN_BLOCKER);
+
+	if (block) {
+		/* get_block above will have incremented refcount */
+		atomic_dec(&block->refcount);
+		return true;
+	}
+
+	return false;
+}
+
+static int vmb_blocking_add_file_block(const char *filename,
+	const struct file *blocker)
+{
+	int ret = 0;
+	struct vmb_blocking_info *block;
+
+	BUG_ON(!filename);
+
+	/* Create a new block. */
+	block = vmb_blocking_alloc_block(vmb_blocking_info_cache, filename,
+								blocker);
+	if (!block) {
+		VMBLOCK_WARNING("out of memory");
+		ret = -ENOMEM;
+		goto out_add_file_block;
+	}
+	write_lock(&vmb_blocked_files_lock);
+
+	/*
+	 * Prevent duplicate blocks of any filename.  Done under same lock
+	 * as list addition to ensure check for and adding of file are atomic.
+	 */
+	if (vmb_block_exists(filename)) {
+		VMBLOCK_WARNING("block already exists for [%s]", filename);
+		write_unlock(&vmb_blocked_files_lock);
+		vmb_blocking_free_block(vmb_blocking_info_cache, block);
+		ret = -EEXIST;
+		goto out_add_file_block;
+	}
+
+	list_add_tail(&block->links, &vmb_blocked_files);
+
+	write_unlock(&vmb_blocked_files_lock);
+
+	VMBLOCK_INFO("added block for [%s]", filename);
+
+out_add_file_block:
+	return ret;
+}
+
+static void vmb_blocking_free_complete(const struct file *blocker,
+	struct vmb_blocking_info *block)
+{
+	/*
+	 * Placing a block on a file or directory gives its struct
+	 * vmb_blocking_info a reference to itself (refcount starts
+	 * at 1).  When the block is lifted we drop that
+	 * self-reference, then either free the block or wake its
+	 * waiters, depending on whether anyone still holds it.
+	 */
+	if (atomic_dec_and_test(&block->refcount)) {
+		/* Free blocks without any waiters ... */
+		VMBLOCK_INFO("Freeing block with no waiters for blocker "
+					"[%p] (%s)", blocker, block->filename);
+		vmb_blocking_free_block(vmb_blocking_info_cache, block);
+	} else {
+		/* ... or wake up the waiting threads */
+		VMBLOCK_INFO("Completing block for blocker "
+					"[%p] (%s)", blocker, block->filename);
+		complete_all(&block->completion);
+	}
+}
+
+static int vmb_blocking_remove_file_block(const char *filename,
+	const struct file *blocker)
+{
+	int ret = 0;
+	struct vmb_blocking_info *block;
+
+	BUG_ON(!filename);
+
+	write_lock(&vmb_blocked_files_lock);
+
+	block = vmb_blocking_get_block(filename, blocker);
+	if (!block) {
+		write_unlock(&vmb_blocked_files_lock);
+		ret = -ENOENT;
+		goto out_remove_file_block;
+	}
+
+	list_del(&block->links);
+	write_unlock(&vmb_blocked_files_lock);
+
+	/* Undo vmb_blocking_get_block's refcount increment first */
+	atomic_dec(&block->refcount);
+
+	/*
+	 * Now remove /our/ reference (as opposed to references by waiting
+	 * threads)
+	 */
+	vmb_blocking_free_complete(blocker, block);
+
+out_remove_file_block:
+	return ret;
+}
+
+static unsigned int vmb_blocking_remove_all_blocks(const struct file *blocker)
+{
+	struct vmb_blocking_info *curr_block, *tmp;
+	/* struct list_head *tmp; */
+	unsigned int removed = 0;
+
+	write_lock(&vmb_blocked_files_lock);
+
+	list_for_each_entry_safe(curr_block, tmp, &vmb_blocked_files, links) {
+		if (blocker == curr_block->blocker ||
+			blocker == VMBLOCK_UNKNOWN_BLOCKER) {
+
+			list_del(&curr_block->links);
+
+			/*
+			 * We count only entries removed from the -list-,
+			 * regardless of whether or not other waiters exist.
+			 */
+			++removed;
+
+			vmb_blocking_free_complete(blocker, curr_block);
+		}
+	}
+
+	write_unlock(&vmb_blocked_files_lock);
+
+	return removed;
+}
+
+static int vmb_blocking_wait_on_file(const char *filename)
+{
+	struct vmb_blocking_info *block;
+	int error = 0;
+
+	BUG_ON(!filename);
+
+	read_lock(&vmb_blocked_files_lock);
+	block = vmb_blocking_get_block(filename, VMBLOCK_UNKNOWN_BLOCKER);
+	read_unlock(&vmb_blocked_files_lock);
+
+	if (!block)
+		/* This file is not blocked, just return */
+		goto out_wait_on_file;
+
+	VMBLOCK_INFO("(%d) Waiting for completion on [%s]",
+						current->pid, filename);
+
+	wait_for_completion(&block->completion);
+
+	VMBLOCK_INFO("(%d) Woke up from block on [%s]",
+						current->pid, filename);
+
+	/*
+	 * The assumptions here are as follows:
+	 *   1.	The struct vmb_blocking_info holds a reference to itself.
+	 *	(struct vmb_blocking_info's refcount is initialized to 1)
+	 *   2.	struct vmb_blocking_info's self reference is deleted only when
+	 *	it is /also/ removed from the block list.
+	 *
+	 * Therefore, if the reference count hits zero, it's because the block
+	 * is no longer in the list, and there is no chance of another thread
+	 * finding and referencing this block between our dec_and_test and
+	 * freeing it.
+	 */
+	if (atomic_dec_and_test(&block->refcount)) {
+		/* We were the last thread, so clean up */
+		VMBLOCK_INFO("(%d) I am the last to wake up, freeing the "
+				"block on [%s]", current->pid, filename);
+		vmb_blocking_free_block(vmb_blocking_info_cache, block);
+	}
+
+out_wait_on_file:
+	return error;
+}
+
+#ifdef CONFIG_VMBLOCK_DEBUG
+static void vmb_blocking_list_file_blocks(void)
+{
+	struct vmb_blocking_info *curr_block;
+	int count = 0;
+
+	read_lock(&vmb_blocked_files_lock);
+
+	list_for_each_entry(curr_block, &vmb_blocked_files, links) {
+		VMBLOCK_DEBUG("(%d) Filename: [%s], Blocker: [%p]",
+			 count++, curr_block->filename, curr_block->blocker);
+	}
+
+	read_unlock(&vmb_blocked_files_lock);
+
+	if (!count)
+		VMBLOCK_DEBUG("No blocks currently exist");
+}
+#endif
+
+/*
+ * Control section
+ *
+ * The following functions implement the control functionality.
+ */
+
+static ssize_t vmb_control_write(struct file *file, const char __user *buf,
+	size_t cmd, loff_t *ppos)
+{
+	long ret;
+	ssize_t i;
+	char *filename;
+
+#ifdef CONFIG_VMBLOCK_DEBUG
+	if (cmd == VMBLOCK_LIST_FILEBLOCKS) {
+		vmb_blocking_list_file_blocks();
+		return 0;
+	}
+#endif
+
+	filename = getname(buf);
+	ret = PTR_ERR(filename);
+	if (IS_ERR(filename)) {
+		VMBLOCK_WARNING("Could not get filename from user buffer");
+		goto exit;
+	}
+
+	/* Remove all trailing path separators. */
+	for (i = strlen(filename) - 1; i >= 0 && filename[i] == '/'; i--)
+		filename[i] = '\0';
+
+	if (i < 0) {
+		ret = -EINVAL;
+		goto exit_putname;
+	}
+
+	switch (cmd) {
+	case VMBLOCK_ADD_FILEBLOCK:
+		ret = vmb_blocking_add_file_block(filename, file);
+		break;
+	case VMBLOCK_DEL_FILEBLOCK:
+		ret = vmb_blocking_remove_file_block(filename, file);
+		break;
+	default:
+		VMBLOCK_WARNING("unrecognized command (%u) received",
+				  (unsigned)cmd);
+		ret = -EINVAL;
+		break;
+	}
+
+
+exit_putname:
+	putname(filename);
+
+exit:
+	return ret;
+}
+
+static int vmb_control_release(struct inode *inode, struct file *file)
+{
+	vmb_blocking_remove_all_blocks(file);
+
+	return 0;
+}
+
+static const struct file_operations vmb_control_file_ops = {
+	.owner		= THIS_MODULE,
+	.write		= vmb_control_write,
+	.release	= vmb_control_release,
+};
+
+static struct proc_dir_entry *vmb_control_proc_dir_entry;
+
+static int vmb_control_proc_init(void)
+{
+	int ret = 0;
+
+	struct proc_dir_entry *control_proc_entry;
+	struct proc_dir_entry *control_proc_mountpoint;
+
+	/* Create /proc/fs/vmblock */
+	vmb_control_proc_dir_entry = proc_mkdir(
+					VMBLOCK_CONTROL_PROC_DIRNAME, NULL);
+
+	if (!vmb_control_proc_dir_entry) {
+		VMBLOCK_WARNING("could not create /proc/"
+					VMBLOCK_CONTROL_PROC_DIRNAME);
+		ret = -EINVAL;
+		goto out_control_proc_init;
+	}
+
+	vmb_control_proc_dir_entry->owner = THIS_MODULE;
+
+	/* Create /proc/fs/vmblock/mountPoint */
+	control_proc_mountpoint = proc_mkdir(VMBLOCK_CONTROL_MOUNTPOINT,
+					vmb_control_proc_dir_entry);
+
+	if (!control_proc_mountpoint) {
+		VMBLOCK_WARNING("could not create " VMBLOCK_MOUNT_POINT);
+		remove_proc_entry(VMBLOCK_CONTROL_PROC_DIRNAME, NULL);
+		ret = -EINVAL;
+		goto out_control_proc_init;
+	}
+
+	control_proc_mountpoint->owner = THIS_MODULE;
+
+	/* Create /proc/fs/vmblock/dev */
+	control_proc_entry = create_proc_entry(VMBLOCK_CONTROL_DEVNAME,
+			VMBLOCK_CONTROL_MODE, vmb_control_proc_dir_entry);
+
+	if (!control_proc_entry) {
+		VMBLOCK_WARNING("could not create " VMBLOCK_DEVICE);
+		remove_proc_entry(VMBLOCK_CONTROL_MOUNTPOINT,
+					vmb_control_proc_dir_entry);
+		remove_proc_entry(VMBLOCK_CONTROL_PROC_DIRNAME, NULL);
+		ret = -EINVAL;
+		goto out_control_proc_init;
+	}
+
+	control_proc_entry->proc_fops = &vmb_control_file_ops;
+
+out_control_proc_init:
+	return ret;
+}
+
+static void vmb_control_proc_cleanup(void)
+{
+	if (vmb_control_proc_dir_entry) {
+		remove_proc_entry(VMBLOCK_CONTROL_MOUNTPOINT,
+					vmb_control_proc_dir_entry);
+		remove_proc_entry(VMBLOCK_CONTROL_DEVNAME,
+					vmb_control_proc_dir_entry);
+		remove_proc_entry(VMBLOCK_CONTROL_PROC_DIRNAME, NULL);
+	}
+}
+
+static int __init vmb_control_init(void)
+{
+	int ret;
+
+	ret = vmb_blocking_init();
+	if (ret < 0) {
+		VMBLOCK_WARNING("could not initialize blocking ops");
+		goto out_control_init;
+	}
+
+	ret = vmb_control_proc_init();
+	if (ret < 0) {
+		VMBLOCK_WARNING("could not setup proc device");
+		vmb_blocking_cleanup();
+	}
+
+out_control_init:
+	return ret;
+}
+
+static void vmb_control_cleanup(void)
+{
+	vmb_control_proc_cleanup();
+	vmb_blocking_cleanup();
+}
+
+/*
+ * Module section
+ *
+ * The following functions implement the usual kernel module entry and exit
+ * points.
+ */
+
+static int __init vmb_init(void)
+{
+	int err = vmb_control_init();
+	if (err)
+		goto out;
+
+	err = vmb_fs_init(root);
+
+	if (err)
+		vmb_control_cleanup();
+
+out:
+	return err;
+}
+
+static void __exit vmb_exit(void)
+{
+	vmb_fs_cleanup();
+	vmb_control_cleanup();
+}
+
+module_init(vmb_init);
+module_exit(vmb_exit);
+
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("VMware, Inc.");
+MODULE_DESCRIPTION("VMware Blocking File System");
diff --git a/fs/Makefile b/fs/Makefile
index a1482a5..b42fe8c 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -122,3 +122,4 @@ obj-$(CONFIG_HPPFS)		+= hppfs/
 obj-$(CONFIG_DEBUG_FS)		+= debugfs/
 obj-$(CONFIG_OCFS2_FS)		+= ocfs2/
 obj-$(CONFIG_GFS2_FS)           += gfs2/
+obj-$(CONFIG_VMBLOCK_FS)	+= vmblock/
diff --git a/include/linux/Kbuild b/include/linux/Kbuild
index b68ec09..e6b6bd0 100644
--- a/include/linux/Kbuild
+++ b/include/linux/Kbuild
@@ -161,6 +161,7 @@ header-y += veth.h
 header-y += video_decoder.h
 header-y += video_encoder.h
 header-y += videotext.h
+header-y += vmblock.h
 header-y += x25.h
 
 unifdef-y += acct.h
diff --git a/include/linux/vmblock.h b/include/linux/vmblock.h
new file mode 100644
index 0000000..e70ad81
--- /dev/null
+++ b/include/linux/vmblock.h
@@ -0,0 +1,49 @@
+/*********************************************************
+ * Copyright (C) 2006 VMware, Inc. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation version 2 and no later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St, Fifth Floor, Boston, MA  02110-1301 USA
+ *
+ *********************************************************/
+
+/*
+ * vmblock.h --
+ *
+ *   User-level interface to the vmblock device.
+ */
+
+#ifndef _LINUX_VMBLOCK_H
+#define _LINUX_VMBLOCK_H
+
+#define VMBLOCK_FS_NAME		"vmblock"
+
+/* Commands for the control half of vmblock driver */
+#define VMBLOCK_ADD_FILEBLOCK		 98
+#define VMBLOCK_DEL_FILEBLOCK		 99
+#ifdef CONFIG_VMBLOCK_DEBUG
+#	define VMBLOCK_LIST_FILEBLOCKS	 100
+#endif
+#define VMBLOCK_CONTROL_DIRNAME	VMBLOCK_FS_NAME
+#define VMBLOCK_CONTROL_DEVNAME	"dev"
+#define VMBLOCK_CONTROL_MOUNTPOINT	"mountPoint"
+#define VMBLOCK_CONTROL_PROC_DIRNAME	"fs/" VMBLOCK_CONTROL_DIRNAME
+
+#define VMBLOCK_MOUNT_POINT	"/proc/" VMBLOCK_CONTROL_PROC_DIRNAME \
+				"/" VMBLOCK_CONTROL_MOUNTPOINT
+#define VMBLOCK_DEVICE		"/proc/" VMBLOCK_CONTROL_PROC_DIRNAME   \
+				"/" VMBLOCK_CONTROL_DEVNAME
+#define VMBLOCK_DEVICE_MODE	O_WRONLY
+#define VMBLOCK_CONTROL(fd, op, path)  write(fd, path, op)
+#define VMBLOCK_CONTROL_MODE	(S_IRUSR | S_IFREG)
+
+#endif /* _LINUX_VMBLOCK_H */
-- 
1.5.5.1.308.g1fbb5

RE: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Adar D. <ad...@vm...> - 2008-11-18 20:57:25

> Just for completeness, here's the updated patch for vmblock, which I've
> run through the vmblocktest tool that Adar posted.  Seems a shame not to
> post it somewhere so you guys might as well have it!
> 
> It's checkpatch and sparse clean, and applies against 2.6.27.

Thanks, Chris. When the vmblock-fuse code gets added to the open-vm-tools we can bring this patch out and talk about what happens next.

Just out of curiosity, how long did this cleanup effort take? We're trying to gauge how much work is involved in preparing a module for inclusion upstream, and since you're the first to try, your feedback would be appreciated.

RE: [open-vm-tools-devel] Getting the kernel modules into Linux mainline

From: Chris M. <ma...@ch...> - 2008-11-18 22:03:33

On Tue, 2008-11-18 at 12:57 -0800, Adar Dembo wrote:
> > Just for completeness, here's the updated patch for vmblock, which I've
> > run through the vmblocktest tool that Adar posted.  Seems a shame not to
> > post it somewhere so you guys might as well have it!
> > 
> > It's checkpatch and sparse clean, and applies against 2.6.27.
> 
> Thanks, Chris. When the vmblock-fuse code gets added to the open-vm-tools we can bring this patch out and talk about what happens next.
> 
> Just out of curiosity, how long did this cleanup effort take? We're trying to gauge how much work is involved in preparing a module for inclusion upstream, and since you're the first to try, your feedback would be appreciated.
> 

It probably took me longer than it would have taken somebody already
familiar with the code. I systematically went through each file in turn,
doing the easy stuff like whitespace first and committing after each
one, using checkpatch.pl as a guide to how much work was left (i.e. the
number of lines with warnings / total LOC). I had a script to run
checkpatch, compile with sparse, compare object code and print out a
"percent done" as an incentive! It took about 3 weeks (doing an hour or
two per day) to get to the state of the original patch, where it was
clean of warnings but still looked a bit "alien" compared with other
kernel code.
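
The script itself isn't shown in this thread. As a rough sketch of the metric described above, assuming "percent done" means the share of lines that no longer trigger a warning, the arithmetic could look like this (percent_done is a hypothetical name, and the two counts would come from checkpatch.pl output and a line count of the source):

```c
/*
 * Hypothetical helper: percentage of lines that are free of warnings.
 * warning_lines is the number of lines still flagged, total_loc the
 * total number of lines in the files being cleaned up.
 */
static inline double percent_done(unsigned long warning_lines,
				  unsigned long total_loc)
{
	if (total_loc == 0)
		return 100.0;	/* nothing to clean up */
	if (warning_lines > total_loc)
		warning_lines = total_loc;	/* clamp bad input */

	return 100.0 * (double)(total_loc - warning_lines) / (double)total_loc;
}
```

For example, 250 flagged lines out of 1000 would report 75% done.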

By that time I was familiar enough with the code that I could start with
an empty file and (using similar modules like romfs as a guide) build it
up from the bottom with Linux kernel-style identifiers and paste only
the bare minimum code. That was pretty quick, a few days maybe.

Since then I've just had to diff -rup the vmblock directory across
subsequent open-vm-tools releases to grab the latest relevant changes,
which was pretty straightforward and probably doesn't amount to much
ongoing maintenance effort.

If I were doing it again and the source looked like it needed
substantial rework, I wouldn't bother with hundreds of individual
commits; I'd jump straight into creating the module from scratch and
pasting in the relevant code snippets directly.  Some documentation and
test harnesses would be really useful here, because then you could be a
bit less paranoid about diffing object code at each stage.

cheers
Chris