#433 Streamline s_utf8.c

pending-accepted
puredata (375)
5
2012-12-15
2011-10-10
Marvin Humphrey
No

There are a number of UTF-8 manipulation routines which were included
when the library was added with the justification that they might be
used someday. However, time has passed, they are still unused, and some
of them have bugs -- so it is time to delete them with a justification
of YAGNI (You Ain't Gonna Need It).

Discussion

  • Makes sense to me. I'm assigning it to Bryan Jurish, the original author of the UTF-8 support, to see if he agrees.

     
  • Bryan Jurish
    Bryan Jurish
    2011-10-10

    Looks good to me. I can't test anything right now, but Martin's suggestions seem safe and clear. Give it a whirl...

    I didn't see any fix for the NUL-termination problems I saw suggested on the #dataflow transcript though (and I never noticed any such segfaults myself): has that problem been confirmed?

     
  • Bryan Jurish:

    > I didn't see any fix for the NUL-termination problems I saw suggested
    > on the #dataflow transcript though (and I never noticed any such
    > segfaults myself): has that problem been confirmed?

    The NUL-termination problems for u8_inc(), u8_charnum() and u8_offset()
    were fixed under a different issue:

    http://sourceforge.net/tracker/?func=detail&atid=478072&aid=3420484&group_id=55736

    Those functions were all capable of overrunning the end of a
    non-NUL-terminated string; u8_offset() could potentially return an
    incorrect answer depending on what it found there.

    I never duplicated the segfault myself using the steps that Matthieu
    described in #dataflow; however, it was very easy to detect the invalid
    reads by running Pd under Valgrind. If I tried hard I might be able to
    trigger a segfault consistently by carefully crafting bytes to trick
    u8_offset() into giving too big a number, resulting in a cursor location
    beyond the end of a string rather than at the end of a string. In any
    case, the patch both eliminated the invalid reads *and* sped things up,
    so I think it was justified even without a consistently segfaulting test
    case.

    I suspect that u8_inc_ptr() has problems similar to u8_inc() -- but
    since it does not seem to be used anywhere at present, I haven't put any
    time into diagnosing or fixing it. Instead, this patch removes it
    altogether. We can resurrect it from version control if a need arises
    in the future and vet it at that time.

    The patches attached to this issue should be low-risk, as all they do is
    remove functionality rather than change behaviors. The removed symbols
    are unused by Vanilla, and seem also to be unused by Pd-extended. The
    risk is that I missed some usage somewhere, resulting in link-time
    failure.

     


Anonymous


Cancel   Add attachments