Domain-specific Vocabularies?

Developers
ashore
2014-02-15
2014-02-16
  • ashore

    ashore - 2014-02-15

    All, for use in RAM-limited devices, I'm interested in reducing RAM use. Our domain, public safety dispatch operations, uses a rather limited number of terms in day-to-day activities, and where names are involved, these could readily be spelled out.

    Am I wrong in thinking/hoping that such a restricted vocabulary might reduce resource needs?

    I wonder if someone could point me to where such work has been done?

     
  • Jonathan Duddington

    Obviously you can remove data files for other languages from the espeak-data directory, but that doesn't reduce RAM use because they are not loaded into RAM unless they are used.

    The *_dict file for the current language is copied into RAM. How much space you can save by shrinking the *_dict file depends on your language. Some are already small.

    In the case of English, en_dict is one of the larger ones, but is still only 112 kBytes. You can remove words which you don't need (e.g. foreign place names) from the dictsource/en_list file and then recompile the en_dict file. Removing rules from the en_rules file is less simple, although some removals are obvious.
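To see how much a trimmed dictionary actually saves, the size of the compiled file can be compared before and after recompiling. The following is a minimal sketch in plain C; the function names are made up for illustration and are not part of eSpeak.

```c
#include <assert.h>
#include <stdio.h>

/* Return the size in bytes of an open binary stream, or -1 on error.
   The stream position is restored before returning. */
long stream_size_bytes(FILE *fp)
{
    assert(fp != NULL);                  /* caller must pass an open stream */
    long saved = ftell(fp);
    if (saved < 0) return -1;
    if (fseek(fp, 0L, SEEK_END) != 0) return -1;
    long size = ftell(fp);
    if (fseek(fp, saved, SEEK_SET) != 0) return -1;
    return size;
}

/* Open a file by path and report its size, or -1 if it cannot be opened. */
long file_size_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) return -1;
    long size = stream_size_bytes(fp);
    fclose(fp);
    return size;
}
```

Calling file_size_bytes("espeak-data/en_dict") before and after the recompile gives the difference directly. The path assumes the program is run from the directory that contains espeak-data.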

    The files espeak-data/phondata, phonindex and phontab are all copied into RAM. These contain the phoneme data for all the languages, so you can reduce their size by re-compiling them to include only your language. To do this, remove entries from the "Additional Phoneme Tables" section at the end of the file phsource/phonemes (this is in the espeakedit download), and then use the espeakedit program to re-compile the phoneme data. Note that a language may use a different language's voice to produce some "foreign" words, so check before removing a phoneme table. Look for rules in the *_rules and *_list files which contain the character sequence ^_.
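The search for ^_ can be done with any text search tool. As a sketch in C, the hypothetical helper below counts the lines of an open text file that contain that two-character sequence. It knows nothing about the dictionary file syntax, so comment lines that happen to contain ^_ are counted too.

```c
#include <assert.h>
#include <stdio.h>

/* Count the lines of an open text stream that contain the sequence "^_".
   A line is counted once however many times the sequence occurs in it. */
long count_voice_switch_lines(FILE *fp)
{
    assert(fp != NULL);
    long count = 0;
    int prev = 0;    /* previous character on the current line */
    int found = 0;   /* 1 once "^_" has been seen on the current line */
    int c;

    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n') {
            count += found;
            found = 0;
            prev = 0;
            continue;
        }
        if (prev == '^' && c == '_')
            found = 1;
        prev = c;
    }
    return count + found;   /* the last line may lack a trailing newline */
}
```

A result of zero for both the *_rules and *_list files of your language suggests, under the assumption above, that no other language's phoneme table is needed.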

    It would also be possible to reduce the size of the espeak program library by removing features which you don't need, or which are not used by your language. But that is more complicated.

    You can perhaps reduce some size constants in the espeak program, such as:

    #define N_WORD_PHONEMES 200 // max phonemes in a word
    #define N_WORD_BYTES 160 // max bytes for the UTF8 characters in a word
    #define N_CLAUSE_WORDS 300 // max words in a clause
    #define N_TR_SOURCE 800 // the source text of a single clause (UTF8 bytes)
    #define N_PHONEME_LIST 1000 // enough for source[N_TR_SOURCE] full of text, else it will truncate
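One way to experiment with smaller values without editing them by hand each time is to guard each definition so that it can be overridden from the compiler command line. This is only a sketch of the general C technique, not how the eSpeak source is necessarily organised. The final check is an assumption read off the comment on N_PHONEME_LIST, and the exact relation the code requires may differ.

```c
/* Defaults as quoted in the post. Each one can be overridden at build
   time, for example: cc -DN_CLAUSE_WORDS=100 ... */
#ifndef N_WORD_PHONEMES
#define N_WORD_PHONEMES 200   /* max phonemes in a word */
#endif

#ifndef N_WORD_BYTES
#define N_WORD_BYTES 160      /* max bytes for the UTF8 characters in a word */
#endif

#ifndef N_CLAUSE_WORDS
#define N_CLAUSE_WORDS 300    /* max words in a clause */
#endif

#ifndef N_TR_SOURCE
#define N_TR_SOURCE 800       /* the source text of a single clause (UTF8 bytes) */
#endif

#ifndef N_PHONEME_LIST
#define N_PHONEME_LIST 1000   /* enough for source[N_TR_SOURCE] full of text */
#endif

/* ASSUMPTION: the phoneme list should not be smaller than the clause
   source buffer, otherwise output is likely to be truncated. */
_Static_assert(N_PHONEME_LIST >= N_TR_SOURCE,
               "N_PHONEME_LIST is too small for N_TR_SOURCE");
```

With guards like these, a build that shrinks N_TR_SOURCE without also considering N_PHONEME_LIST still compiles, while one that shrinks N_PHONEME_LIST below N_TR_SOURCE fails at compile time rather than truncating speech at run time.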

     
    • ashore

      ashore - 2014-02-16

      Big-time thanks for your response. Highly appreciated!

       
