At 12:55 09/07/02, Richard Russon wrote:
>There hasn't been a release of the NTFS tools for some time and Anton's
>mentioned that he'd like to push for a 2.0 release some time soon.
Yep. Although I should add here that I am going to go via a beta phase,
i.e. release 1.9.99beta or simillar first and if noone complains release
2.0.0 a few weeks later. Also before releasing the beta, I want the code to
be in freeze for a few weeks (at least the to be released code - other
utilities can continue development; to be released code can continue
develepment in a different CVS branch). During the freeze I envision
testing of the utilities to happen as well as possibly writing test
utilities for the library.
Please take my comments below as RFC, too. I am just saying what I think...
As usual I do so in a very long winded way and this post is now getting
close to winning the longest email ever prize. (-;
>I've got some ideas for changes, that I think we ought to do before we
>make a major release. Some are trivial, some will require a bit of thought.
Cool. I was hoping to get comments from as many people as possible so we
don't need another rewrite and a 3.0 release in a few months... (-;
>attrib.c is getting much too big. it's more than triple the size of
>anything else. I want to move the the runlist code into runlist.c
>(about a third of the code).
Great idea. I like the name. I take it you volunteer to do the split? (-;
btw. This is so obvious that it is silly to write it so I appologise in
advance, but please use a runlist.h header as well...
>Anton's done a great job of sorting out the APIs,
Thanks. (-:
>but there are a few places where the function naming isn't consistant.
Yes, I know. I have noticed that, too. It kind of just happened is the
explanation. I agree we need to work on that.
> e.g. ntfs_lookup_attr -> ntfs_attr_lookup, etc. It looks a bit reverse
> Polish, but the others are named that way.
Yes, I like that. First ntfs_, second the structure we operate on (here
attr_), then the operation name.
I would like to see that applied to all of the API functions. (Please only
ignore the remnants of the old library in attrib.c, i.e.
get_attrbibute_value and set_attribute_value and there were some others I
think, they are destined for /dev/null as soon as working replacements
exist and are in use throughout, I think we may have the replacements
already, just all users will need updating...)
>A few C things: Do we need to declare basic types as const in function
>declarations? e.g. function (const int i) The variable is passed by
>value, so it doesn't matter to the outside. Can any compilers do
>anything extra knowing the value is const?
gcc-2.x doesn't take advantage of this (so I am told). Don't know about
gcc-3.x. I would like to have const everywhere possible in order to
emphasize the "IN" parameter type but it is not strictly necessary. Do you
have any particular reason not to want to put const? - I only don't put
const when the compiler complains about it and/or when I want to recycle a
variable...
The only thing gcc uses const for is to place static/global variables in
read-only memory so they physically cannot be written to at all which is nice.
>Do we _need_ to use "register" anywhere?
gcc-2.x completely ignores "register". Other compilers don't ignore it. We
don't strictly need to use it, but please keep the existing ones. There are
only 4 (or 5?) present and they are located in very specially chosen places
based on assembler level code optimizations.
>malloc/calloc/realloc/free. Can I rename these to ntfs_*? Then in
>debug we can easily intercept them and perform some memory tracking.
I would prefer not to. If we want to do something with memory we should do
it properly with ElectricFence or simillar tools. They are way more
powerful than we could ever make them without major efforts on our part
which would be a waste of our time IMO.
>Pointers. At the moment, we have control of all the source, but we have
>written a library. We need to check all pointers given to us from the
>outside.
Correct. I believe all externally accessible functions are checked.
Anywhere where they are not, I have missed something... I don't think there
will be too many instances where we need to modify this. There may be some
older functions but certainly all the new functions do all checking
possible. I have been paying a lot of attention to make sure of that. Note
in some places null checks do not exist but that is because the parameter
is allowed to be null...
Note I don't want to checking (beyond catching our own bugs) to
INTERNAL-flagged functions. The functions calling them should already have
done all checking necessary.
>What form should the checking take? if (!ptr) return error;
>or perhaps BUG_ON(!ptr);
IMO, a library should never, ever, do things like BUG(). A userspace
library or utility that crashes like that is a Bad Thing(TM).
if (!ptr) {
// do any necessary cleanup
errno = EINVAL;
return -1;
}
is the way to go for a library, i.e. always report errors via errno and
returning -1. Obviously EINVAL is only for generic parameters being wrong.
I always refernce man errno and check what the most appropriate error code
for the current sitution is. e.g. file descriptor int f, if !f errno = EBADF...
Obviously there are some functions where we can't use -1 as an error return
value. Special arrangements are required there. Examples include boolean
functions which will return non-zero (true) on success.
>We need to document the comment blocks in the coding style
Would be a good idea.
>Error handling. At the moment it's completely unstructured.
That is not quite true...
>On an error do we return 0, -1?
Depends on the function. Usually -1.
>Do we set errno?
For all functions with return -1, definitely. For others, it depends, but
usually yes.
>When should the library display any output?
Good question. Ideally I would like to have the ability to compile or not
compile debug output as it is now. In addition, I would like to be able to
enable/disable error output via an option of some kind. (No I don't know
how that is going to happen and if at all)
In general I think a library should not output messages at all. It should
return an error and the program using the library should decide what it
would like to do about this, perhaps print an error and perhaps not. But
there maybe select cases where a message is on order. One example is where
I am causing the library to segfault in the run list merging code if we run
out of memory during a critical step at the end. There we want to tell the
user we segfaulted on purpose or we will just get a bug report...
Admittedly that induced segfault is awful. But it is only there until I (or
someone else!) figure out how to be able to avoid the memory allocations
too late in the process. I am very open to suggestions there...
>Minor errors, serious errors, never? We need to
>adopt one approach and document it.
I would vote for ./configure --enable-debug -> print loads of stuff to
stderr. ./configure --disable-debugt -> print to stderr only in really
serious cases like the segfault about to be caused.
>Testing. This word is the bane of all programmers. I hate it too, but
>I understand the need for it. We cannot anticipate every use of the
>library, but we should have some form of unit testing. The library is
>still undergoing a lot of changes and it's hard to verify that nothing
>has broken. Any suggestions or guidelines would be welcome.
Sorry no suggestions or guidelines, but a regression test suite where we
just need to type "make test" would be very nice indeed...
>Back to attrib.c. It's a very complicated piece of code.
Yes. It is code to edit a database... That is a complicated thing.
>On top of this it's obfuscated by the need for le16_to_cpu() function calls
>everywhere.
Well it is little endian... Can't do anything about it. We could shorten
the names to something like LC() nad CL() but I don't want to put automatic
type checking in there because I often use the macros to perform type
conversion for me so I don't have to do additional casts. This would break
when using LC()/CL()...
>The code needs simplifying, but how? I've considered a few
>macro solutions, but they tended to be fairly specific.
It depends what exactly you want to simplify. You had mentioned to me
before that you don't like the frequent offset additions like this:
FILE_NAME *fn;
ATTR_RECORD *attr;
fn = (FILE_NAME*)((u8*)attr + le16_to_cpu(attr->value_offset));
I agree those are quite horrible and very long. Considering how they are
used we can make a generic macro, (please suggest a name!), lets call it
GF() for now just to show what it could look like.
We would replace the above code with:
fn = GF(attr, value_offset);
And the macro would look like something like this (untested!):
#define GF(a, b) \
({ \
(void *)((u8*)(a) + le_to_cpu((a)->(b))); \
})
And again untested:
#define le_to_cpu(x) \
{( \
typeof(x) _x; \
if (sizeof(x) == 1) \
_x = (x); \
else if (sizeof(x) == 2) \
_x = le16_to_cpu(x); \
else if (sizeof(x) == 4) \
_x = le32_to_cpu(x); \
else if (sizeof(x) == 8) \
_x = le64_to_cpu(x); \
_x; \
)}
In theory this should not only work, but it should also be completely
optimised away by the compiler so that the remaining assembler code is
identical to the original big, bad, and ugly line. (-;
The prize question is what to call the GF() macro! The function is clear,
to get to a memory location pointed to by a little endian field in a
structure. But I can't think of a name for it...
If anyone can come up with a sensible name we can test it and assuming it
does really work both with gcc-2.96 and gcc-3.something I would be very
happy to see the macro added to types.h or support.h or something and all
that ugly code replaced...
Can you give examples of what else you would like to see simplified? -
That's the only one I remember you complaining about. (-;
Cheers,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
|