#7 Wishes for data analysis/manipulation

open
nobody
None
5
2004-08-20
2004-08-20
P.Marek
No

Hello everybody!

I've just tested 0.8.0 and found several things lacking:

1) String handling

- Strings don't necessarily have an \0 at the end. They
may have another string after them, like this:
char s1[]="this is a string\n";
char *s2=s1+8; // "a string\n"
Depending on which usage ht first finds, s1 may not be
seen as a string (and I can't tag it as such, because
there's no \0)

- Strings with special characters are not allowed. I
can't tag "?!=/" as a string.

- Strings need not be concatenated in memory; they may
be aligned or fixed-size, with \0 in-between.
eg. memory layout recognized: "abc\0def\0ghij\0"
not recognized: "abcde\0\0\0fghi\0\0\0\0jklmnop\0"
or eg. char arr[10][20];

- Even if there's a \0 somewhere and only [a-z0-9 ] in
there, I couldn't tag some strings.

- How can I remove an erroneously set string?

- Detection of unicode strings
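
The fixed-size layout described above (text padded with \0 up to a slot width, as in char arr[10][20]) could be recognized with a check along these lines. This is only a sketch: is_fixed_width_table is a hypothetical helper, not a function in ht, and it assumes the slot width is already known or guessed by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of ht: returns 1 if buf[0..len) looks like
 * a table of fixed-width slots, each holding printable ASCII text followed
 * by NUL padding up to the slot width; returns 0 otherwise. */
int is_fixed_width_table(const unsigned char *buf, size_t len, size_t width)
{
    assert(buf != NULL);
    if (width == 0 || len == 0 || len % width != 0)
        return 0;
    for (size_t off = 0; off < len; off += width) {
        size_t i = 0;
        /* text part: printable ASCII only */
        while (i < width && buf[off + i] >= 0x20 && buf[off + i] < 0x7f)
            i++;
        /* require at least one character and at least one terminating NUL */
        if (i == 0 || i == width)
            return 0;
        /* padding part: everything after the text must be NUL */
        for (; i < width; i++)
            if (buf[off + i] != 0)
                return 0;
    }
    return 1;
}
```

With the "not recognized" layout from above and a slot width of 8, the check accepts the buffer as three strings; with a wrong width such as 6 it rejects it.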

2) Other

- How can I tag data as short/long?

- If an instruction goes like
jmp [eax*4 + 0x...]
ht should recognize the address as an array, and tag
not only the first value.
Bonus points for detecting whether it's a code address,
data address, or similar.

Extra bonus points: at the instruction, do a register
scan to find possible values (easy if e.g. an "and eax,
0x1f" comes just before) and limit the array size.
Otherwise just store a bit for this array saying "may
be smaller" and scale it down for every
identifier/string/etc. found after the array.

- It would save some screen space if runs of identical
bytes were shown as e.g. "db 0 dup(0x200)"

- Searching for unicode strings would be nice.
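
The jump-table idea above (bound the array by a preceding mask, otherwise shrink it when something else is found after it) might look roughly like this. Both function names are hypothetical and nothing here reflects how ht's analyser is actually structured; it only illustrates the arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: after "and reg, mask" the index register lies in [0, mask],
 * so a table indexed by it needs at most mask + 1 entries.
 * Example: "and eax, 0x1f" gives at most 32 entries. */
uint64_t max_entries_from_mask(uint32_t mask)
{
    return (uint64_t)mask + 1;
}

/* Hypothetical: the "may be smaller" fallback. If another identifier or
 * string starts at next_label_addr behind the table, the table cannot
 * extend past it, so the entry count is clipped accordingly. */
uint64_t clip_entries(uint64_t table_addr, uint32_t entry_size,
                      uint64_t max_entries, uint64_t next_label_addr)
{
    assert(entry_size != 0);
    if (next_label_addr <= table_addr)
        return max_entries;            /* nothing known after the table */
    uint64_t room = (next_label_addr - table_addr) / entry_size;
    return room < max_entries ? room : max_entries;
}
```

For "jmp [eax*4 + 0x1000]" preceded by "and eax, 0x1f", this gives 32 entries of 4 bytes; if a string is later found at 0x1040, the table shrinks to 16 entries.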

But all other things aside - thank you for this program!

Discussion

  • Logged In: YES
    user_id=3437

    "Strings don't necessarily have an \0 at the end."

    Well, the particular test case you gave has the special
    problem that the meaning changes depending on which string
    is discovered first. This is currently not really supported
    by ht. But both strings have a \0 at the end, don't they?

    " Strings with special characters are not allowed. I
    can't tag "?!=/" as a string."

    Well, strings with special characters are allowed, but the
    above string contains too many special characters. There is
    some heuristic involved in detecting strings; see
    analyser/language.cc

    "- Strings need not be concatenated in memory; "
    ht detects the strings when they are referenced. This should
    be no problem.

    "- even if there's a \0 somewhere, and only [a-z0-9 ] in
    there,
    I didn't succeed to tag some strings."
    might be a bug somewhere

    "- How can I remove a errornously set string?"
    press del

    "- Detection of unicode strings"
    Should be implemented. See analyser/language.cc

    "- How can I tag data as short/long?"
    Not implemented, sorry.

    "- If an instruction goes like
    jmp [eax*4 + 0x...]"
    Too much for the current analyser. We had plans to
    implement this in ht 2.0

    "- Searching for unicode strings would be nice."
    Well, with a complicated search expression this should be
    possible.
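
For the common case of ASCII text stored as UTF-16LE (each character followed by a zero byte), a search could be sketched as below. This is a standalone illustration, not ht's search engine or its expression syntax; find_utf16le_ascii is a made-up name, and characters outside ASCII are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the byte offset of the first occurrence of
 * the ASCII string `needle`, encoded as UTF-16LE, in buf[0..len), or -1 if
 * it does not occur. Every byte offset is tried, since strings in a binary
 * are not necessarily 2-byte aligned. */
long find_utf16le_ascii(const unsigned char *buf, size_t len, const char *needle)
{
    assert(buf != NULL && needle != NULL);
    size_t n = strlen(needle);
    if (n == 0 || len < 2 * n)
        return -1;
    for (size_t off = 0; off + 2 * n <= len; off++) {
        size_t i = 0;
        /* low byte must match the character, high byte must be zero */
        while (i < n &&
               buf[off + 2 * i] == (unsigned char)needle[i] &&
               buf[off + 2 * i + 1] == 0)
            i++;
        if (i == n)
            return (long)off;
    }
    return -1;
}
```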

     
  • P.Marek
    2004-08-26

    Logged In: YES
    user_id=740781

    Does it make sense to post patches somewhere?

    I had a quick look in cvs and noticed that most files there
    were some months old.

    How is the 2.0 release going? Is the (would-be) 2.0 source
    available anywhere? Is 1.0 still a goal?

    But seeing how much 0.8.0 already does I'd stick with that
    and expand/rewrite that as needed ...

    Regards,

    Phil

     
  • Logged In: YES
    user_id=3437

    You can find the sources for the rewrite in the "htdata"
    repository. Since the original author died and I'm too
    involved in other projects, development is currently
    somewhat stalled. But if you provide patches I'll happily
    integrate them.