I'm dealing with a weird bug resulting (I think) from the interaction between
a Sphinxbase 0.6.1 command line string function and the fact that the
directory names on the iPhone are dynamic (that is, they are based on where
the OS chooses to put the app sandbox and not fixed or known to the developer
the way that an OS directory like /bin would be -- the enclosing directory
name for an app is randomly-assigned, fairly lengthy, and that name is
returned to the app programmatically if needed).
The background is that I am testing the upcoming version of OpenEars on an
iPhone 1G, and it used to work perfectly, and then beginning yesterday it
started to invariably crash on the same line in cmd_ln.c. I uninstalled and
rebooted and rolled back to an old version that was known-working, and that
version crashed. And, in fact, every version now crashes on this same line
going all the way back. Very weird.
Today, after spending some time trying to figure why an app would be crashing
on some string code when the same code was non-crashing a day or a month ago,
I think I figured out that the crashing could be because of whatever iOS has
started to name my app directories, which is perhaps longer or perhaps has an
undesirable character versus whatever the install directory name was back when
it was working, and which is out of my control. I was hoping you could help me
solve this riddle.
Without further ado, the 100%-replicable (right now) crash is on the function
call strncat(dest, source, source_len) on line 179 of cmd_ln.c, in the
function strnappend(char dest, size_t dest_allocation, const char *source,
size_t n). Any logging placed after line 179 is never called. When I log the
arguments that are going into that call by adding this printf to line 178:
printf("source is \'%s\' and source length is %lu\n", source, (unsigned
long)source_len);
The output right before the crash is as follows:
source is '/var/mobile/Applications/4C28E8C7-B74F-
44B1-9742-95904C75115B/OpenEarsSampleProject.app/OpenEars1.languagemodel' and
size is 111
Then: boom.
Do you know why this is crashing, perhaps just with this directory name or
perhaps for another code-external reason I haven't thought of yet, and can you
recommend a workaround? The call to strnappend is made by strappend which is
made by arg_resolve_env which is made by cmd_ln_val_init while starting up
Pocketsphinx. The line which is failing is the location of the .lm. This code
is also currently working on all of my test devices as well as having worked
on the device in question a couple days ago. The device that is failing is
running an earlier version of iOS than the others, but again, it used to be
able to do this.
Sorry to be still using 0.6.1 but it's going to be a couple months at least
before I have time to upgrade OpenEars to .7 :)
Thank you very much,
Halle
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Another clue: if I change strncat to strlcat the crash disappears (so I guess
that means it's a buffer overrun or similar?), but then after the argument
list definition print I receive:
ERROR: "cmd_ln.c", line 590: Bad argument value for -ds: 2
ERROR: "cmd_ln.c", line 661: cmd_ln_parse_r failed
For values which are known-working since OpenEars release 1. These are the
arguments I'm running with:
And if I remove the argument for -ds I get the same error for -kdmaxdepth. At
this phase in the init, as far as I can tell there would be no validation for
the long pathname strings so there is no reason to assume they are coming
through entirely correctly either. Could this be some kind of null termination
thing?
I also tested giving the dictionary and language model much shorter names so I
could force the entire pathname to a shorter length, and it didn't help, so
I'm not sure this is purely about them being long.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
OK, so if I replace strncat(dest, source, source_len) with strlcat(dest,
source, source_len+1) it appears to work on the device that was previously
crashing, as well as the devices which were never crashing. Do you think this
is a robust solution or is it likely to have unforeseen consequences (given
that the upcoming version of OpenEars allows users to reinit with new
arguments in the same continuous loop)? Although there appears to be a fix I'm
a little unsettled at not knowing the underlying reason why this one device
suddenly stopped being able to successfully pass init arguments regardless of
build version.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I'm not really sure it's a sphinxbase issue. I tried to think what is wrong in
the code and don't see a reason. Maybe it's some IPhone bug. Can you reproduce
it in small test? Maybe if you call strncat with the same arguments it will
not crash. It might be a memory corruption or a compiler issue.
Your proposed solution is definitely wrong, it doesn't take into account the
lenght of dest which already was there. strlcat third argument is the length
of the final destination, not the length of the source. And there is no
strlcat here in Linux.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
OK, but after a day of testing it does seem to work perfectly in terms of
expected input and expected output during initialization (I have strlcat in
Objective-C++ which is what I'm working with, but I can see that it wouldn't
work for Sphinxbase as a whole), which is funny if it isn't actually a drop-in
replacement. I'll fix that.
I can easily believe the bug is dependent on the iPhone in some way but I
can't replicate it in a small test insofar as it still isn't clear what it is
about the input that causes a crash in just this scenario. If it's memory
corruption, it's amazingly specific to just this round of installs on just
this device. But I will give it a try to recreate in a test.
While I have you here, what do you think about my init values for the iPhone:
After doing more reading I think "definitely wrong" is a bit of an
overstatement. The fact that strlcat works where strncat doesn't when they
both take identical arguments pretty strongly suggests that there is a mis-
count due to a null termination character, especially since it is necessary to
add a byte to length in order to get a drop-in result, which then apparently
works correctly for all input.
Here is the relevant passage from the man on strlcat to clarify:
"strlcpy() and strlcat() take the full size of the buffer (not
just the length) and guarantee to NUL-terminate the result (as long as
size is larger than 0 or, in the case of strlcat(), as long as there is
at least one byte free in dst). Note that you should include a byte for
the NUL in size."
That doesn't mean that it's a Sphinxbase mistake (if it could occur with Linux
I'm sure you would have heard about it many times by now) but so far I haven't
been able to find any code that can be amended in my iPhone implementation
that affects it on the exceedingly rare occasions that it shows up, so that's
the situation I'm working with.
What is returned by both functions, their only interface difference, doesn't
seem to be much of issue since it is being dropped into a line in which the
return value isn't made use of. It is possible to see that the input becomes
exactly the expected output on both the devices that worked with strncat and
those that didn't work with strncat by looking at the launch values for
Pocketsphinx when it starts up, so I think there is a basis for (my) taking
this as a potential correct fix (for the platform I'm trying to debug) subject
to lots of further testing.
This is senseless
I would hate to be the cause of a senseless argument.
:D
One last question for you -- do you see any obviously missing arguments that
you think could be particularly beneficial to an iPhone run of Pocketsphinx?
Thanks again,
Halle
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I think what you are using is more or less ok. The issue now is to test
confidence and OOV detection stuff as well as profile the application to find
the performance issue. I don't think any serious success could be made by
tweaking parameters.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Cool, thank you for taking a look at that, it's really helpful. I made your
suggested changes regarding -ds and got rid of -kdmaxdepth, and the accuracy
seems much improved due to not overriding -ds.
I was originally using that setting because when I started, my driver was
slower and there was a note in the wiki back then that that would help with
recognition speed, which seemed necessary as a compensation for the driver
latency. Now there is a much lower-latency driver in OpenEars so I'm not
detecting any slowdown as a result of removing it.
I'm very happy with the current speed and performance I'm seeing with
Pocketsphinx, so when I get the next chunk of time free to improve OpenEars,
I'll probably mostly be looking at reducing its memory footprint more. Because
of the differences between the Core Audio underpinnings and Linux audio that
we've discussed a couple of times, it is necessary to create a ringbuffer
structure in order to process Pocketsphinx in memory and provide a continuous
set of samples for ad_read to read through at any given moment, and for the
first version of OpenEars which uses it I've erred on the side of too much
headroom in the ringbuffer sections until it's clear that all of the devices
return a similar number of samples for any given callback round in which the
samples are rendered.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi Nickolay,
I'm dealing with a weird bug resulting (I think) from the interaction between
a Sphinxbase 0.6.1 command line string function and the fact that the
directory names on the iPhone are dynamic (that is, they are based on where
the OS chooses to put the app sandbox and not fixed or known to the developer
the way that an OS directory like /bin would be -- the enclosing directory
name for an app is randomly-assigned, fairly lengthy, and that name is
returned to the app programmatically if needed).
The background is that I am testing the upcoming version of OpenEars on an
iPhone 1G, and it used to work perfectly, and then beginning yesterday it
started to invariably crash on the same line in cmd_ln.c. I uninstalled and
rebooted and rolled back to an old version that was known-working, and that
version crashed. And, in fact, every version now crashes on this same line
going all the way back. Very weird.
Today, after spending some time trying to figure why an app would be crashing
on some string code when the same code was non-crashing a day or a month ago,
I think I figured out that the crashing could be because of whatever iOS has
started to name my app directories, which is perhaps longer or perhaps has an
undesirable character versus whatever the install directory name was back when
it was working, and which is out of my control. I was hoping you could help me
solve this riddle.
Without further ado, the 100%-replicable (right now) crash is on the function
call strncat(dest, source, source_len) on line 179 of cmd_ln.c, in the
function strnappend(char dest, size_t dest_allocation, const char *source,
size_t n). Any logging placed after line 179 is never called. When I log the
arguments that are going into that call by adding this printf to line 178:
printf("source is \'%s\' and source length is %lu\n", source, (unsigned
long)source_len);
The output right before the crash is as follows:
source is '/var/mobile/Applications/4C28E8C7-B74F-
44B1-9742-95904C75115B/OpenEarsSampleProject.app/OpenEars1.languagemodel' and
size is 111
Then: boom.
Do you know why this is crashing, perhaps just with this directory name or
perhaps for another code-external reason I haven't thought of yet, and can you
recommend a workaround? The call to strnappend is made by strappend which is
made by arg_resolve_env which is made by cmd_ln_val_init while starting up
Pocketsphinx. The line which is failing is the location of the .lm. This code
is also currently working on all of my test devices as well as having worked
on the device in question a couple days ago. The device that is failing is
running an earlier version of iOS than the others, but again, it used to be
able to do this.
Sorry to be still using 0.6.1 but it's going to be a couple months at least
before I have time to upgrade OpenEars to .7 :)
Thank you very much,
Halle
Another clue: if I change strncat to strlcat the crash disappears (so I guess
that means it's a buffer overrun or similar?), but then after the argument
list definition print I receive:
ERROR: "cmd_ln.c", line 590: Bad argument value for -ds: 2
ERROR: "cmd_ln.c", line 661: cmd_ln_parse_r failed
For values which are known-working since OpenEars release 1. These are the
arguments I'm running with:
And if I remove the argument for -ds I get the same error for -kdmaxdepth. At
this phase in the init, as far as I can tell there would be no validation for
the long pathname strings so there is no reason to assume they are coming
through entirely correctly either. Could this be some kind of null termination
thing?
I also tested giving the dictionary and language model much shorter names so I
could force the entire pathname to a shorter length, and it didn't help, so
I'm not sure this is purely about them being long.
OK, so if I replace strncat(dest, source, source_len) with strlcat(dest,
source, source_len+1) it appears to work on the device that was previously
crashing, as well as the devices which were never crashing. Do you think this
is a robust solution or is it likely to have unforeseen consequences (given
that the upcoming version of OpenEars allows users to reinit with new
arguments in the same continuous loop)? Although there appears to be a fix I'm
a little unsettled at not knowing the underlying reason why this one device
suddenly stopped being able to successfully pass init arguments regardless of
build version.
Hello
I'm not really sure it's a sphinxbase issue. I tried to think what is wrong in
the code and don't see a reason. Maybe it's some IPhone bug. Can you reproduce
it in small test? Maybe if you call strncat with the same arguments it will
not crash. It might be a memory corruption or a compiler issue.
Your proposed solution is definitely wrong, it doesn't take into account the
lenght of dest which already was there. strlcat third argument is the length
of the final destination, not the length of the source. And there is no
strlcat here in Linux.
OK, but after a day of testing it does seem to work perfectly in terms of
expected input and expected output during initialization (I have strlcat in
Objective-C++ which is what I'm working with, but I can see that it wouldn't
work for Sphinxbase as a whole), which is funny if it isn't actually a drop-in
replacement. I'll fix that.
I can easily believe the bug is dependent on the iPhone in some way but I
can't replicate it in a small test insofar as it still isn't clear what it is
about the input that causes a crash in just this scenario. If it's memory
corruption, it's amazingly specific to just this round of installs on just
this device. But I will give it a try to recreate in a test.
While I have you here, what do you think about my init values for the iPhone:
-bestpath yes \
-ds 2 \
-kdmaxdepth 7 \
-maxhmmpf 3000 \
-maxwpf 5 \
-topn 3
Would you change any of them?
bestpath and fwdflat mostly have sense for a large vocabulary. For small
vocabulary it's better to disable them
This has too negative impact on accuracy
This is senseless
Those are good but -topn better needs to be used during model training as
well.
Hi Nickolay,
After doing more reading I think "definitely wrong" is a bit of an
overstatement. The fact that strlcat works where strncat doesn't when they
both take identical arguments pretty strongly suggests that there is a mis-
count due to a null termination character, especially since it is necessary to
add a byte to length in order to get a drop-in result, which then apparently
works correctly for all input.
Here is the relevant passage from the man on strlcat to clarify:
"strlcpy() and strlcat() take the full size of the buffer (not
just the length) and guarantee to NUL-terminate the result (as long as
size is larger than 0 or, in the case of strlcat(), as long as there is
at least one byte free in dst). Note that you should include a byte for
the NUL in size."
That doesn't mean that it's a Sphinxbase mistake (if it could occur with Linux
I'm sure you would have heard about it many times by now) but so far I haven't
been able to find any code that can be amended in my iPhone implementation
that affects it on the exceedingly rare occasions that it shows up, so that's
the situation I'm working with.
What is returned by both functions, their only interface difference, doesn't
seem to be much of issue since it is being dropped into a line in which the
return value isn't made use of. It is possible to see that the input becomes
exactly the expected output on both the devices that worked with strncat and
those that didn't work with strncat by looking at the launch values for
Pocketsphinx when it starts up, so I think there is a basis for (my) taking
this as a potential correct fix (for the platform I'm trying to debug) subject
to lots of further testing.
I would hate to be the cause of a senseless argument.
:D
One last question for you -- do you see any obviously missing arguments that
you think could be particularly beneficial to an iPhone run of Pocketsphinx?
Thanks again,
Halle
Hello Halle
I think what you are using is more or less ok. The issue now is to test
confidence and OOV detection stuff as well as profile the application to find
the performance issue. I don't think any serious success could be made by
tweaking parameters.
Cool, thank you for taking a look at that, it's really helpful. I made your
suggested changes regarding -ds and got rid of -kdmaxdepth, and the accuracy
seems much improved due to not overriding -ds.
I was originally using that setting because when I started, my driver was
slower and there was a note in the wiki back then that that would help with
recognition speed, which seemed necessary as a compensation for the driver
latency. Now there is a much lower-latency driver in OpenEars so I'm not
detecting any slowdown as a result of removing it.
I'm very happy with the current speed and performance I'm seeing with
Pocketsphinx, so when I get the next chunk of time free to improve OpenEars,
I'll probably mostly be looking at reducing its memory footprint more. Because
of the differences between the Core Audio underpinnings and Linux audio that
we've discussed a couple of times, it is necessary to create a ringbuffer
structure in order to process Pocketsphinx in memory and provide a continuous
set of samples for ad_read to read through at any given moment, and for the
first version of OpenEars which uses it I've erred on the side of too much
headroom in the ringbuffer sections until it's clear that all of the devices
return a similar number of samples for any given callback round in which the
samples are rendered.
Great, thanks.