
Tim thinks people would think we used their source anyway, so we might as
well use it and then go the Open Source route.

It looks like for mount and unmount to work, 'stat' on root needs to work,
which means vfs_root needs to work. And if we fake vfs_root, then stat
is gonna try a VOP_GETATTR which also has to work. Sigh.

Since I'm using their source code anyway, I'm also trying to keep the structures
close to the linux ones. It might make it easier to pull in their code,
and to get future changes.

ext2_read_inode is the one that reads the inode from the device.

It was panic'ing in a bad mutex_destroy after a mount/unmount/mount/unmount
sequence. This _might_ be because unmount wasn't destroying inodes for
that filesystem.

looks like a 'cat' of 'hosts' will result in an ext2fs_map/ext2fs_read pair.
Perhaps the read is the fallback case.

We weren't preventing unmounts of busy filesystems. Now unmount prevents
the unmount if any referenced inodes are in the cache. But that seems
to prevent unmount completely. Sigh. Maybe ext2fs_inactive needs to work.

need to examine
	- proper locking
	- proper logical<->physical block mapping
	- proper VN_HOLD/VN_RELE.
	- proper b_flags.  "B_STALE | B_AGE" is not always right.

we weren't dealing with indirect blocks right because we never read them
in (I missed the call to ll_rw_block in block_getblk. Right now we just
go read it in unconditionally, which is probably wrong).

It doesn't seem to work on sparc. The magic number is backwards, but it looks
like it really should be in native format (and linux on sparc is indeed
big-endian). But aha! the 5.1 redhat ext2 source _does_ swap all the on-disk
fields. It _is_ little-endian there too. I dunno if it's worth doing that
here. It sucks that 5.1 diverged for SPARC and they didn't fold the changes
back though (using swab macros that they #ifdef to nothing on x86).
Now it works. Even 64-bit.
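
A minimal sketch of endian-neutral field access, assuming the on-disk
fields are little-endian as noted above. This is not the swab-macro
approach from the 5.1 source; it assembles values byte by byte so the
same code behaves identically on x86 and SPARC. The helper names
(le16_get, le32_get, sb_magic_ok) are made up for illustration. The magic
value 0xEF53 and its offset of 56 bytes into the superblock come from the
ext2 on-disk format.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT2_SUPER_MAGIC  0xEF53	/* ext2 superblock magic */
#define EXT2_MAGIC_OFFSET 56		/* byte offset of s_magic in the superblock */

/* Assemble a little-endian 16-bit value from raw bytes (hypothetical helper). */
static uint16_t
le16_get(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Assemble a little-endian 32-bit value from raw bytes (hypothetical helper). */
static uint32_t
le32_get(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 if the raw superblock bytes carry the ext2 magic, else 0. */
static int
sb_magic_ok(const unsigned char *sb, size_t len)
{
	if (len < EXT2_MAGIC_OFFSET + 2)
		return 0;
	return le16_get(sb + EXT2_MAGIC_OFFSET) == EXT2_SUPER_MAGIC;
}
```

On a big-endian host a plain structure read of s_magic would see 0x53EF,
which matches the "backwards" magic described above; the byte-wise read
does not care about host order.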

Some of the unmount problems were because we were destroying the root
vnode while purging the inode cache, but then would VN_RELE it.
Also, in lookup, we were VN_HOLDing the vnode we got back from iget, even though
iget already held it (which made the unmount return EBUSY - plus we needed
to special case the root vnode when checking for active inodes, since it
will be held).
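
The hold accounting can be modelled in user space. This is only a sketch
with stand-in types: fake_vnode, fake_hold, fake_rele, fake_iget,
fake_lookup and fake_fs_busy are all invented names, not the kernel
interfaces. It also assumes the root vnode carries exactly one hold
while the filesystem is mounted, which the notes above do not state.

```c
#include <stddef.h>

/* Stand-in for a vnode: only the reference count matters here. */
struct fake_vnode {
	int v_count;
};

static void fake_hold(struct fake_vnode *vp) { vp->v_count++; }
static void fake_rele(struct fake_vnode *vp) { vp->v_count--; }

/* Like iget in the notes: hands the vnode back already held. */
static struct fake_vnode *
fake_iget(struct fake_vnode *vp)
{
	fake_hold(vp);
	return vp;
}

/* Lookup must not add a second hold on top of the one iget took. */
static struct fake_vnode *
fake_lookup(struct fake_vnode *vp)
{
	return fake_iget(vp);
}

/*
 * Busy check for unmount. The root is allowed one hold (assumed to be
 * the mount's own); every other cached vnode must be unreferenced.
 */
static int
fake_fs_busy(const struct fake_vnode *cache, size_t n, size_t root_idx)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int allowed = (i == root_idx) ? 1 : 0;

		if (cache[i].v_count > allowed)
			return 1;
	}
	return 0;
}
```

With an extra hold in fake_lookup, the caller's single release would
leave the count at 1 and fake_fs_busy would report busy forever, which
is the EBUSY symptom described above.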

The terrible speed problem was because we were rereading blocks
unconditionally. I think this would particularly suck when searching
indirect blocks, since we'd read each again for every access.  So now,
we only call strategy if B_DONE isn't set, which makes it fast.
But we might want to move away from the Linux way of passing block info around
in a bp, then calling read later, and just go use bread(). ext2_getblk
and friends could just pass around a block number which ext2fs_bread
could then use with bread().
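
A user-space sketch of the "only call strategy if B_DONE isn't set"
check. The struct, the flag value and the function names here are
stand-ins invented for the example, not the kernel's struct buf or
B_DONE, and fake_strategy just counts calls instead of doing I/O.

```c
#define FAKE_B_DONE 0x1		/* stand-in for the kernel's B_DONE flag */

/* Stand-in for a buffer header: flags plus the block it describes. */
struct fake_buf {
	int b_flags;
	int b_blkno;
};

static int fake_device_reads;	/* counts simulated trips to the device */

/* Pretend to read the block from the device and mark the buffer valid. */
static void
fake_strategy(struct fake_buf *bp)
{
	fake_device_reads++;
	bp->b_flags |= FAKE_B_DONE;
}

/* Go to the device only when the buffer contents are not already valid. */
static void
fake_read_block(struct fake_buf *bp)
{
	if (!(bp->b_flags & FAKE_B_DONE))
		fake_strategy(bp);
}
```

Calling fake_read_block three times on the same buffer costs one device
read instead of three, which is the difference that matters when the
same indirect block is consulted for every access.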

It seems like, if you unmount and remount the filesystem, you'll get crap
on subsequent accesses to files (i.e., cat'ing fstab might give you
lilo.conf, and cat'ing inetd.conf might give you fstab). Seems like we
need to call bflush and bfinval on the vfs_dev? We do now, but that doesn't
seem to help. Maybe we need to do something else. It does seem like unmounting
and modunloading aren't terribly stable.  Maybe we're deleting active
inodes? Sigh.

I think we should be freeing memory in ext2fs_inactive. Otherwise, the inodes
may never get freed except when unmounting. Maybe we should just dump the
cache and do the free in inactive always? inactive wasn't quite right, _plus_
we weren't calling pvn_vplist_dirty to invalidate the pages on the vnode
going away.

Files with holes in them didn't work, since the getblk routines would fail.
We needed to detect that, plus the EFBIG errno from them (maybe we should
pick another to make it clearer?). And block_getblk wasn't brelse'ing
the passed-in buf on error, so the next access would block forever. I really
think we should stop passing struct buf's around and just deal in daddr_t's,
using bread (which would hide the B_DONE testing too).
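
A sketch of hole handling when dealing in block numbers rather than
bufs. It relies on the ext2 convention that a block pointer of 0 means
no block is allocated, and on holes reading back as zeros. Everything
else is an assumption for the example: the name demo_read_block, the
tiny DEMO_BLKSIZE, the flat in-memory "disk", and the -1 return code
(the real routines return errnos such as EFBIG, as noted above).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLKSIZE 8	/* tiny block size to keep the example readable */

/*
 * Copy logical block lblk of a file into out. map[] holds one physical
 * block number per logical block, with 0 marking a hole. disk is a flat
 * array of DEMO_BLKSIZE-byte blocks.
 * Returns 0 on success, -1 if lblk lies beyond the map.
 */
static int
demo_read_block(const uint32_t *map, size_t nmap, const unsigned char *disk,
    size_t lblk, unsigned char *out)
{
	if (lblk >= nmap)
		return -1;

	if (map[lblk] == 0) {
		/* Hole: nothing on disk, the data is all zeros. */
		memset(out, 0, DEMO_BLKSIZE);
		return 0;
	}

	memcpy(out, disk + (size_t)map[lblk] * DEMO_BLKSIZE, DEMO_BLKSIZE);
	return 0;
}
```

Keeping "hole" and "out of range" as separate outcomes is the point:
the first is a normal read, only the second is an error.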

Running WordPerfect 8 caused a recursive mutex_enter in the read routine.
It looks like we took a fault during the uiomove() there, while holding
fsp->e_lock. Then we tried to fill the buffer by paging it in, but we
grabbed the lock there too (death). So now we grab the inode's rwlock,
which may or may not be right.

The 'sync' problem seems to be because of inodes left in the cache
from NFS (it vgot them, but never released them). So sync needs to
dump inodes from the cache. But it probably shouldn't dump all of them
for that fs, like I did (and that should be a function anyway). And my sync
doesn't actually deal with the NULL (panic?) vfsp case. sync'ing is
clearly still not right.

We were also panic'ing when starting netscape/wp8, then bringing up and
exiting gnome. This was because we eventually called page_pptonum on
a NULL pointer and died, which seemed to be because we had a buf pointer
with a NULL b_pages field but B_PAGEIO set. This was because pvn_read_kluster
was returning NULL, but we drove on and set up a buf and called strategy. Sigh.
So now we go back to reread which retries the page_find.

Seems like when rebooting, if we have one ext2fs filesystem mounted on
top of another (/linux and /linux/usr/src), we get an assertion failure.
We're probably not holding the right lock.

panic: assertion failed: coveredvp == NULL || vn_vfswlock_held(coveredvp), file: ../../common/fs/vfs.c, line: 440
stopped at      0xfbd01028:     ta      0x7d

perhaps we've destroyed a covered vnode. No doubt 'sync' has removed it
from the cache :-( We probably have to do the IREF flag and all. Sigh.

I need to add some of the original ext2fs copyrights back. Even though it's
copied in pieces, they need to be there.

ext2fs can't mount RH6.1 partitions - it does this:

	NOTICE: ext2fs: Block bitmap for group 0 not in group (block 1048576)!
	NOTICE: ext2fs: group descriptors corrupted!
	I wonder if they use a different blocksize, or if I just have some
	other bug? Perhaps I should investigate more wrappers around the
	linux functions anyway, since that would make it easier to stay
	in sync. Probably time for a rewrite anyway, as much of what I did
	was for byte-swapping which they didn't have, but now they do.
	They _are_ doing 4k blocks/fragments.

	This is because of the block size dance we do at mount.
	We read the superblock assuming a filesystem block size of 1024,
	then if it's different, we change to that size (4096). But that
	means the block the superblock is in changes size, and what was
	block 1 is now part of block 0, so places that were adding the
	superblock offsets are now wrong. Sigh. I hacked that up to
	not add the superblock block #'s if the sizes didn't match,
	but we should do more validation really.
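
The arithmetic behind the block size dance, as a sketch. The superblock
always starts 1024 bytes into the partition, so which filesystem block
holds it depends on the block size, and the group descriptor table
starts in the block after that one. The function names are invented for
the example, and it assumes blksize is a valid non-zero ext2 block size.
Real code would normally take the equivalent value from the superblock's
s_first_data_block field rather than recomputing it.

```c
#include <stdint.h>

#define SB_BYTE_OFFSET 1024u	/* superblock starts 1024 bytes into the partition */

/* Filesystem block that contains the superblock for a given block size. */
static uint32_t
sb_block(uint32_t blksize)
{
	return SB_BYTE_OFFSET / blksize;
}

/* Byte offset of the superblock within that block. */
static uint32_t
sb_offset_in_block(uint32_t blksize)
{
	return SB_BYTE_OFFSET % blksize;
}

/* First block of the group descriptors: the block after the superblock's. */
static uint32_t
gdt_first_block(uint32_t blksize)
{
	return sb_block(blksize) + 1;
}
```

With 1024-byte blocks the superblock is block 1 and the descriptors
start at block 2. With 4096-byte blocks the superblock sits at offset
1024 inside block 0 and the descriptors start at block 1, so any code
that keeps adding the 1k-era block numbers lands in the wrong place.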

	The image of stonekeep's root failed at the 2 gig mark, presumably
	dd on their side isn't large-file-aware. So I'll have to test it
	directly on stargazer with RH6.1.

	My fix seems to work.

	RH6.1 also wants to put non-boot partitions in extended ones,
	so ext2fs should probably do the :c dance pcfs does (and share
	the code).

	ext2fs couldn't handle Mandrake 7.0 partitions either. This seems
	to be two things:
		1. It had the 'large file' support bit set in the superblock,
		   so we didn't mount it. This means it uses the directory
		   acl field of the inode for the high 32-bits, so now we
		   allow that (by shifting in the high bits for non-dirs).
		2. The directory listings were corrupt. This is because they
		   seem to have stolen a byte from the name_len field, which
		   the new filesystem must be using. That seems to be ok
		   now, though we don't deal with that field yet. They must
		   mke2fs with '-O filetype'. Apparently newer mke2fs's
		   enable this and "sparse super blocks" by default. I wonder
		   what this file_type stuff is actually useful for?
	Mandrake is also ultra-secure by default, I guess, which confuses
	things because one cannot read /usr or /usr/bin except as root.
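
Both Mandrake 7.0 changes come down to reinterpreting existing fields,
sketched here. The function and parameter names are made up for the
example, and the inputs are assumed to be already converted from
little-endian on-disk order to host order. The split of the old 16-bit
name_len into a length byte (low) and a file type byte (high) follows
the ext2 directory entry layout used with the filetype feature.

```c
#include <stdint.h>

/*
 * File size with the large-file bit set: for non-directories the field
 * otherwise used as the directory ACL supplies the high 32 bits.
 */
static uint64_t
inode_size(uint32_t size_lo, uint32_t dir_acl, int is_dir)
{
	if (is_dir)
		return size_lo;
	return ((uint64_t)dir_acl << 32) | size_lo;
}

/* Name length: the low byte of the old 16-bit name_len field. */
static unsigned
dirent_name_len(uint16_t name_len_field)
{
	return name_len_field & 0xff;
}

/* File type: the byte taken from the top of the old name_len field. */
static unsigned
dirent_file_type(uint16_t name_len_field)
{
	return name_len_field >> 8;
}
```

Treating the whole 16-bit field as a length on such a filesystem gives
values like 0x0205 for a five-character name, which would explain the
corrupt listings.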

	It would be much better to just have a wrapper around their code.
	I don't know if we can do that and still create both a 32 and
	64-bit version, but we'll see.

