Menu

strncat crash in cmd_ln.c

Help
Halle
2011-05-18
2012-09-22
  • Halle

    Halle - 2011-05-18

    Hi Nickolay,

    I'm dealing with a weird bug resulting (I think) from the interaction between
    a Sphinxbase 0.6.1 command line string function and the fact that the
    directory names on the iPhone are dynamic (that is, they are based on where
    the OS chooses to put the app sandbox and not fixed or known to the developer
    the way that an OS directory like /bin would be -- the enclosing directory
    name for an app is randomly-assigned, fairly lengthy, and that name is
    returned to the app programmatically if needed).

    The background is that I am testing the upcoming version of OpenEars on an
    iPhone 1G, and it used to work perfectly, and then beginning yesterday it
    started to invariably crash on the same line in cmd_ln.c. I uninstalled and
    rebooted and rolled back to an old version that was known-working, and that
    version crashed. And, in fact, every version now crashes on this same line
    going all the way back. Very weird.

    Today, after spending some time trying to figure why an app would be crashing
    on some string code when the same code was non-crashing a day or a month ago,
    I think I figured out that the crashing could be because of whatever iOS has
    started to name my app directories, which is perhaps longer or perhaps has an
    undesirable character versus whatever the install directory name was back when
    it was working, and which is out of my control. I was hoping you could help me
    solve this riddle.

    Without further ado, the 100%-replicable (right now) crash is on the function
    call strncat(dest, source, source_len) on line 179 of cmd_ln.c, in the
    function strnappend(char
    dest, size_t dest_allocation, const char *source,
    size_t n). Any logging placed after line 179 is never called. When I log the
    arguments that are going into that call by adding this printf to line 178:

    printf("source is \'%s\' and source length is %lu\n", source, (unsigned
    long)source_len);

    The output right before the crash is as follows:

    source is '/var/mobile/Applications/4C28E8C7-B74F-
    44B1-9742-95904C75115B/OpenEarsSampleProject.app/OpenEars1.languagemodel' and
    size is 111

    Then: boom.

    Do you know why this is crashing, perhaps just with this directory name or
    perhaps for another code-external reason I haven't thought of yet, and can you
    recommend a workaround? The call to strnappend is made by strappend which is
    made by arg_resolve_env which is made by cmd_ln_val_init while starting up
    Pocketsphinx. The line which is failing is the location of the .lm. This code
    is also currently working on all of my test devices as well as having worked
    on the device in question a couple days ago. The device that is failing is
    running an earlier version of iOS than the others, but again, it used to be
    able to do this.

    Sorry to be still using 0.6.1 but it's going to be a couple months at least
    before I have time to upgrade OpenEars to .7 :)

    Thank you very much,

    Halle

     
  • Halle

    Halle - 2011-05-18

    Another clue: if I change strncat to strlcat the crash disappears (so I guess
    that means it's a buffer overrun or similar?), but then after the argument
    list definition print I receive:

    ERROR: "cmd_ln.c", line 590: Bad argument value for -ds: 2
    ERROR: "cmd_ln.c", line 661: cmd_ln_parse_r failed

    For values which are known-working since OpenEars release 1. These are the
    arguments I'm running with:

    -lm /var/mobile/Applications/1B052694-8AE0-4DC0-8503-33AEE3ECF751/OpenEarsSampleProject.app/OpenEars1.languagemodel \
        -bestpath yes \
        -dict /var/mobile/Applications/1B052694-8AE0-4DC0-8503-33AEE3ECF751/OpenEarsSampleProject.app/OpenEars1.dic \
        -ds 2 \
        -hmm /var/mobile/Applications/1B052694-8AE0-4DC0-8503-33AEE3ECF751/OpenEarsSampleProject.app \
        -kdmaxdepth 7 \
        -maxhmmpf 3000 \
        -maxwpf 5 \
        -topn 3
    
     
  • Halle

    Halle - 2011-05-18

    And if I remove the argument for -ds I get the same error for -kdmaxdepth. At
    this phase in the init, as far as I can tell there would be no validation for
    the long pathname strings so there is no reason to assume they are coming
    through entirely correctly either. Could this be some kind of null termination
    thing?

    I also tested giving the dictionary and language model much shorter names so I
    could force the entire pathname to a shorter length, and it didn't help, so
    I'm not sure this is purely about them being long.

     
  • Halle

    Halle - 2011-05-18

    OK, so if I replace strncat(dest, source, source_len) with strlcat(dest,
    source, source_len+1) it appears to work on the device that was previously
    crashing, as well as the devices which were never crashing. Do you think this
    is a robust solution or is it likely to have unforeseen consequences (given
    that the upcoming version of OpenEars allows users to reinit with new
    arguments in the same continuous loop)? Although there appears to be a fix I'm
    a little unsettled at not knowing the underlying reason why this one device
    suddenly stopped being able to successfully pass init arguments regardless of
    build version.

     
  • Nickolay V. Shmyrev

    Hello

    I'm not really sure it's a sphinxbase issue. I tried to think what is wrong in
    the code and don't see a reason. Maybe it's some IPhone bug. Can you reproduce
    it in small test? Maybe if you call strncat with the same arguments it will
    not crash. It might be a memory corruption or a compiler issue.

    Your proposed solution is definitely wrong, it doesn't take into account the
    lenght of dest which already was there. strlcat third argument is the length
    of the final destination, not the length of the source. And there is no
    strlcat here in Linux.

     
  • Halle

    Halle - 2011-05-19

    OK, but after a day of testing it does seem to work perfectly in terms of
    expected input and expected output during initialization (I have strlcat in
    Objective-C++ which is what I'm working with, but I can see that it wouldn't
    work for Sphinxbase as a whole), which is funny if it isn't actually a drop-in
    replacement. I'll fix that.

    I can easily believe the bug is dependent on the iPhone in some way but I
    can't replicate it in a small test insofar as it still isn't clear what it is
    about the input that causes a crash in just this scenario. If it's memory
    corruption, it's amazingly specific to just this round of installs on just
    this device. But I will give it a try to recreate in a test.

    While I have you here, what do you think about my init values for the iPhone:

    -bestpath yes \
    -ds 2 \
    -kdmaxdepth 7 \
    -maxhmmpf 3000 \
    -maxwpf 5 \
    -topn 3

    Would you change any of them?

     
  • Nickolay V. Shmyrev

    -bestpath yes \

    bestpath and fwdflat mostly have sense for a large vocabulary. For small
    vocabulary it's better to disable them

    -ds 2 \

    This has too negative impact on accuracy

    -kdmaxdepth 7 \

    This is senseless

    -maxhmmpf 3000 \
    -maxwpf 5 \
    -topn 3

    Those are good but -topn better needs to be used during model training as
    well.

     
  • Halle

    Halle - 2011-05-20

    Hi Nickolay,

    After doing more reading I think "definitely wrong" is a bit of an
    overstatement. The fact that strlcat works where strncat doesn't when they
    both take identical arguments pretty strongly suggests that there is a mis-
    count due to a null termination character, especially since it is necessary to
    add a byte to length in order to get a drop-in result, which then apparently
    works correctly for all input.

    Here is the relevant passage from the man on strlcat to clarify:

    "strlcpy() and strlcat() take the full size of the buffer (not
    just the length) and guarantee to NUL-terminate the result (as long as
    size is larger than 0 or, in the case of strlcat(), as long as there is
    at least one byte free in dst). Note that you should include a byte for
    the NUL in size."

    That doesn't mean that it's a Sphinxbase mistake (if it could occur with Linux
    I'm sure you would have heard about it many times by now) but so far I haven't
    been able to find any code that can be amended in my iPhone implementation
    that affects it on the exceedingly rare occasions that it shows up, so that's
    the situation I'm working with.

    What is returned by both functions, their only interface difference, doesn't
    seem to be much of issue since it is being dropped into a line in which the
    return value isn't made use of. It is possible to see that the input becomes
    exactly the expected output on both the devices that worked with strncat and
    those that didn't work with strncat by looking at the launch values for
    Pocketsphinx when it starts up, so I think there is a basis for (my) taking
    this as a potential correct fix (for the platform I'm trying to debug) subject
    to lots of further testing.

    This is senseless

    I would hate to be the cause of a senseless argument.

    :D

    One last question for you -- do you see any obviously missing arguments that
    you think could be particularly beneficial to an iPhone run of Pocketsphinx?

    Thanks again,

    Halle

     
  • Nickolay V. Shmyrev

    Hello Halle

    I think what you are using is more or less ok. The issue now is to test
    confidence and OOV detection stuff as well as profile the application to find
    the performance issue. I don't think any serious success could be made by
    tweaking parameters.

     
  • Halle

    Halle - 2011-05-24

    Cool, thank you for taking a look at that, it's really helpful. I made your
    suggested changes regarding -ds and got rid of -kdmaxdepth, and the accuracy
    seems much improved due to not overriding -ds.

    I was originally using that setting because when I started, my driver was
    slower and there was a note in the wiki back then that that would help with
    recognition speed, which seemed necessary as a compensation for the driver
    latency. Now there is a much lower-latency driver in OpenEars so I'm not
    detecting any slowdown as a result of removing it.

    I'm very happy with the current speed and performance I'm seeing with
    Pocketsphinx, so when I get the next chunk of time free to improve OpenEars,
    I'll probably mostly be looking at reducing its memory footprint more. Because
    of the differences between the Core Audio underpinnings and Linux audio that
    we've discussed a couple of times, it is necessary to create a ringbuffer
    structure in order to process Pocketsphinx in memory and provide a continuous
    set of samples for ad_read to read through at any given moment, and for the
    first version of OpenEars which uses it I've erred on the side of too much
    headroom in the ringbuffer sections until it's clear that all of the devices
    return a similar number of samples for any given callback round in which the
    samples are rendered.

     
  • Nickolay V. Shmyrev

    Great, thanks.

     

Log in to post a comment.