Hello everybody!
I've just tested 0.8.0 and found several things lacking:
1) String handling
- Strings don't necessarily have an \0 at the end. They
may have another string after them, like this:
char s1[]="this is a string\n";
char *s2=s1+8; // "a string\n"
Depending on which reference ht finds first, s1 may not be
seen as a string (and I can't tag it as such, because
there's no \0)
- Strings with special characters are not allowed. I
can't tag "?!=/" as a string.
- Strings need not be concatenated in memory; they may
be aligned or fixed-size, with \0 in-between.
e.g. memory layout recognized: "abc\0def\0ghij\0"
not recognized: "abcde\0\0\0fghi\0\0\0\0jklmnop\0"
or e.g. char arr[10][20];
- Even when there is a \0 and the text contains only
[a-z0-9 ], I couldn't tag some strings.
- How can I remove an erroneously set string?
- Detection of Unicode strings
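The zero-padded layout listed above ("abcde\0\0\0fghi\0\0\0\0jklmnop\0", or char arr[10][20]) could in principle be found without any code references by treating runs of \0 as padding between strings rather than assuming strings are packed back to back. A minimal C sketch, assuming a string is a run of at least `min_len` printable ASCII bytes (plus tab/CR/LF) that ends in \0; `find_padded_strings` and `min_len` are hypothetical names, not part of ht:

```c
#include <stddef.h>

/* Printable ASCII plus common whitespace escapes such as "\n". */
static int is_text(unsigned char c)
{
    return (c >= 0x20 && c < 0x7f) || c == '\t' || c == '\n' || c == '\r';
}

/* Hypothetical helper, not ht code: stores the offset of every run of at
 * least min_len text bytes that is terminated by '\0'. Any number of '\0'
 * padding bytes between strings is skipped, so packed ("abc\0def\0") and
 * fixed-size, zero-padded layouts are both found.
 * Returns the number of offsets written to out (at most max_out). */
size_t find_padded_strings(const unsigned char *buf, size_t len,
                           size_t min_len, size_t *out, size_t max_out)
{
    size_t count = 0;
    size_t i = 0;

    while (i < len && count < max_out) {
        /* skip padding between strings */
        while (i < len && buf[i] == '\0')
            i++;

        size_t start = i;
        while (i < len && is_text(buf[i]))
            i++;

        /* accept the run only if it is long enough and ends in '\0' */
        if (i < len && buf[i] == '\0' && i - start >= min_len)
            out[count++] = start;

        /* a non-text byte ended the run: step over it to make progress */
        if (i < len && buf[i] != '\0')
            i++;
    }
    return count;
}
```

On the 24-byte example layout with `min_len` 3, this reports offsets 0, 8 and 16, i.e. three fixed-size 8-byte slots.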
2) Other
- How can I tag data as short/long?
- If an instruction looks like
jmp [eax*4 + 0x...]
hte should recognize the address as an array and tag
not only the first value.
Bonus points for detecting whether it's a code address,
data address, or similar.
Extra bonus points: do a register scan on that instruction
to find the possible values (easy if e.g. an "and eax,
0x1f" comes just before it) and limit the array size.
Otherwise, just store a bit for this array saying "may
be smaller" and scale it down for every
identifier/string/etc. found after the array.
- It would save some screen space if runs of identical bytes
were shown as e.g. "db 0x200 dup(0)".
- Searching for Unicode strings would be nice.
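Both the table-size limit and the "dup" display from item 2 are cheap to compute. A minimal C sketch, assuming the analyser has already seen an "and reg, mask" immediately before the indirect "jmp [reg*4 + ...]" with no other write to that register in between; the index can then only take the values 0..mask, which bounds the table. The second helper measures a run of identical bytes that could be collapsed into one "db n dup(v)" line. Both function names are hypothetical, not part of ht:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: upper bound on the number of entries in a table
 * indexed as [reg*scale + base], assuming "and reg, mask" directly precedes
 * the jump and reg is not modified in between. After the AND, reg <= mask,
 * so indices 0..mask are possible. 64-bit result so mask 0xffffffff does
 * not overflow. */
uint64_t table_entries_from_mask(uint32_t mask)
{
    return (uint64_t)mask + 1;
}

/* Hypothetical display helper: length of the run of identical bytes
 * starting at buf[pos] (0 if pos is past the end). A run of n copies of
 * value v could be shown as "db n dup(v)" instead of n separate lines. */
size_t byte_run_length(const unsigned char *buf, size_t len, size_t pos)
{
    size_t n = 0;
    while (pos + n < len && buf[pos + n] == buf[pos])
        n++;
    return n;
}
```

For "and eax, 0x1f" followed by "jmp [eax*4 + ...]", `table_entries_from_mask(0x1f)` gives 32 entries, so the table is at most 32 * 4 = 128 bytes long.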
But all other things aside - thank you for this program!
user_id=3437
"Strings don't necessarily have an \0 at the end."
Well, the particular test case you gave has the special
problem that the meaning changes depending on which string
is discovered first. This is currently not really supported
by ht. But both strings do have a \0 at the end, don't they?
"- Strings with special characters are not allowed. I
can't tag "?!=/" as a string."
Well, strings with special characters are allowed, but the
above string contains too many special characters. There is
some heuristic involved in detecting strings; see
analyser/language.cc
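The actual rules live in analyser/language.cc and are not reproduced here. Purely as an illustration of why a heuristic can reject "?!=/" while still allowing some punctuation, a hypothetical check might accept a candidate only if a minimum percentage of its bytes are letters, digits or spaces. `looks_like_text` and the threshold are assumptions, not ht's code:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical heuristic, NOT the logic in analyser/language.cc:
 * a candidate counts as text if at least min_percent of its bytes are
 * letters, digits or spaces. Punctuation is allowed, just not too much. */
int looks_like_text(const char *s, size_t len, unsigned min_percent)
{
    size_t plain = 0;

    if (len == 0)
        return 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (isalnum(c) || c == ' ')
            plain++;
    }
    /* integer form of plain / len >= min_percent / 100 */
    return plain * 100 >= (size_t)min_percent * len;
}
```

With a 50% threshold, "Hello, world!" (11 of 13 bytes plain) passes, while "?!=/" (0 of 4) is rejected.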
"- Strings need not be concatenated in memory; "
ht detects the strings when they are referenced. This should
be no problem.
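Detection from references also covers the overlapping case from the original report: each referenced address simply starts a string that runs to the next \0, so s1 and s2 = s1 + 8 end at the same terminator. A minimal sketch, assuming the analyser already has the referenced offsets into a data section; `string_len_at` is a hypothetical helper, not ht code:

```c
#include <stddef.h>

/* Hypothetical helper, not ht code: length of the string that starts at a
 * referenced offset, i.e. the distance to the next '\0'. Returns -1 if the
 * offset is out of range or no terminator follows it in the section.
 * Two references into the same text (s1 and s1 + 8) yield two strings
 * that share one '\0'. */
long string_len_at(const char *section, size_t section_len, size_t ref)
{
    for (size_t i = ref; i < section_len; i++)
        if (section[i] == '\0')
            return (long)(i - ref);
    return -1;
}
```

For char s1[] = "this is a string\n", the reference at offset 0 gives a 17-byte string and the reference at offset 8 gives the 9-byte "a string\n".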
"- Even when there is a \0 and the text contains only
[a-z0-9 ], I couldn't tag some strings."
That might be a bug somewhere.
"- How can I remove an erroneously set string?"
Press Del.
"- Detection of Unicode strings"
This should already be implemented; see analyser/language.cc.
"- How can I tag data as short/long?"
Not implemented, sorry.
"- If an instruction goes like
jmp [eax*4 + 0x...]"
That is too much for the current analyser. We had plans to
implement this in ht 2.0.
"- Searching for unicode strings would be nice."
Well, with a complicated search expression this should be
possible.
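As a rough illustration of what such a search has to match: an ASCII-range string stored as UTF-16LE (the usual Windows "Unicode" encoding) is a sequence of printable bytes, each followed by 0x00. A minimal C sketch, assuming only the ASCII subset at 2-byte-aligned offsets is of interest; `find_utf16le_ascii` is a hypothetical name, not an ht feature:

```c
#include <stddef.h>

/* Hypothetical helper, not an ht feature: returns the offset of the first
 * run of at least min_chars UTF-16LE code units whose low byte is printable
 * ASCII and whose high byte is 0, or -1 if there is none (min_chars of 0
 * finds nothing). Only even offsets are checked. */
long find_utf16le_ascii(const unsigned char *buf, size_t len, size_t min_chars)
{
    size_t run = 0;

    for (size_t i = 0; i + 1 < len; i += 2) {
        int printable = buf[i] >= 0x20 && buf[i] < 0x7f && buf[i + 1] == 0;
        run = printable ? run + 1 : 0;
        if (min_chars > 0 && run >= min_chars)
            return (long)(i + 2 - 2 * run); /* start of the current run */
    }
    return -1;
}
```

A real search would also need to handle non-ASCII code units and odd alignment, which this sketch deliberately ignores.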
user_id=740781
Does it make sense to post patches somewhere?
I had a quick look at CVS and noticed that most files there
were several months old.
How is the 2.0 release going? Is the (would-be) 2.0 source
available anywhere? Is 1.0 still a goal?
But seeing how much 0.8.0 already does, I'd stick with it
and expand/rewrite it as needed ...
Regards,
Phil
user_id=3437
You can find the sources for the rewrite in the "htdata"
repository. Since the original author died and I'm too
involved in other projects, development is currently
somewhat stalled. But if you provide patches, I'll happily
integrate them.