CMU Sphinx / Forums / Help: Merging dictionaries

Soren Ebbesen - 2016-09-13

Is there a tool to easily merge two dictionaries? I have a feeling that the G2P tool can be used, but I haven't figured out how.

If I manually add a word to my dictionary, do I also have to add it manually to my language model (the .lm file)? I mean, if the word doesn't exist in the language model, the probability of getting picked is zero I suppose? Assuming I have to add it to the LM, would you suggest just to add it to the list of 1-grams and give it the same statistics as a "similar" word?

Thanks,
Soren
- Nickolay V. Shmyrev - 2016-09-14
  
  Is there a tool to easily merge two dictionaries?
  
  Yes, Python scripting language
  
  I have a feeling that the G2P tool can be used, but I haven't figured out how.
  
  No, unlikely
  
  If I manually add a word to my dictionary, do I also have to add it manually to my language model (the .lm file)?
  
  The language model is not easily editable; you need to use the LM tools to update it.
  
  I mean, if the word doesn't exist in the language model, the probability of getting picked is zero I suppose?
  
  Yes
  
  Assuming I have to add it to the LM, would you suggest just to add it to the list of 1-grams and give it the same statistics as a "similar" word?
  
  You can check http://cmusphinx.sourceforge.net/wiki/tutoriallmadvanced
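Before rebuilding the language model, it can help to see which dictionary words the model does not know at all. The following is a minimal sketch, assuming the .lm file is in the plain-text ARPA format with a `\1-grams:` section; the function names `lm_unigrams` and `words_missing_from_lm` are made up for illustration and are not part of any CMU Sphinx tool.

```python
import re

# Optional numeric suffix such as "(2)" that marks an alternative pronunciation.
_SUFFIX = re.compile(r"\(\d+\)$")


def lm_unigrams(arpa_lines):
    """Collect the words listed in the \\1-grams: section of an ARPA model."""
    words = set()
    in_unigrams = False
    for line in arpa_lines:
        line = line.strip()
        if line.startswith("\\"):
            # Section markers look like \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            # Assumed layout: log-probability, word, optional back-off weight.
            fields = line.split()
            if len(fields) >= 2:
                words.add(fields[1])
    return words


def words_missing_from_lm(dict_lines, arpa_lines):
    """Return dictionary words (suffixes stripped) with no unigram entry."""
    known = lm_unigrams(arpa_lines)
    missing = []
    for line in dict_lines:
        parts = line.split()
        if not parts:
            continue
        word = _SUFFIX.sub("", parts[0])
        if word not in known and word not in missing:
            missing.append(word)
    return missing
```

Words reported as missing are the ones the recognizer cannot output until the model is rebuilt or adapted with the LM tools.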
  

Daniel Wolf - 2016-12-28

I'm in a similar situation. I want my application to support both a built-in dictionary and a "user dictionary". If a user dictionary is specified, it should be merged into the built-in dictionary.

For example:

Built-in dictionary:

[...]
biscuit B IH S K AH T
bishop B IH SH AH P
[...]

User dictionary:

biscuit B IH S K IH T
biscuit(2) B IH S K UH IY
thimbleweed TH IH M B AH L W IY D

Merged dictionary:

[...]
biscuit B IH S K AH T
biscuit(2) B IH S K IH T
biscuit(3) B IH S K UH IY
bishop B IH SH AH P
thimbleweed TH IH M B AH L W IY D
[...]

So by merge, I mean that

any words not in the built-in dictionary should be added;

any words that already exist should be added as alternative pronunciations;

numeric suffixes should be auto-increased as needed.
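The three merge rules above can be sketched in a few lines of Python. This is only an illustration of the intended semantics, not existing CMU Sphinx functionality; the names `parse_entries` and `merge_dictionaries` are hypothetical, and it assumes one entry per line, with the word first and the phones after it.

```python
import re

# Optional numeric suffix such as "(2)" at the end of a dictionary word.
_SUFFIX = re.compile(r"\(\d+\)$")


def parse_entries(lines):
    """Yield (base_word, pronunciation) pairs from dictionary lines."""
    for line in lines:
        # Assumption: lines starting with ";;;" are comments, as in cmudict.
        if line.startswith(";;;"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        word = _SUFFIX.sub("", parts[0])  # strip "(n)" so variants group together
        yield word, " ".join(parts[1:])


def merge_dictionaries(builtin_lines, user_lines):
    """Merge two dictionaries, renumbering alternative pronunciations."""
    merged = {}
    # Built-in entries come first so they keep the lowest suffix numbers.
    for source in (builtin_lines, user_lines):
        for word, pron in parse_entries(source):
            prons = merged.setdefault(word, [])
            if pron not in prons:  # identical pronunciation: nothing to add
                prons.append(pron)
    out = []
    for word in sorted(merged):
        for i, pron in enumerate(merged[word], start=1):
            label = word if i == 1 else f"{word}({i})"
            out.append(f"{label} {pron}")
    return out
```

Called with the built-in and user entries from the example, this yields the merged list shown above, with the user's two biscuit pronunciations renumbered to (2) and (3).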

I don't want to merge dictionary files, creating a new file. Instead, I'd prefer to leave the two files unchanged and merge the entries in memory, using C/C++.

I understand that there is no built-in functionality for doing this. So here's a rough outline of what I'm planning to do.

Load the built-in dictionary using the -dict option

Read the user dictionary manually, splitting each line into word string and pronunciation string

For each pair: manually strip any numeric suffix, then call ps_add_word.

This approach only works if ps_add_word does the following:

If the word already exists, a new pronunciation is added

If the same word with the same pronunciation already exists, this is a no-op.
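Since it is unclear how ps_add_word treats an exact duplicate, one option is to filter duplicates on the caller's side before adding anything. The sketch below shows that idea in Python rather than C/C++, purely to keep it short. The callbacks `lookup_word` and `add_word` are hypothetical stand-ins for whatever the decoder offers (for example, a wrapper around ps_add_word); their shapes are assumptions, not the PocketSphinx API.

```python
import re

# Optional numeric suffix such as "(2)" at the end of a dictionary word.
_SUFFIX = re.compile(r"\(\d+\)$")


def load_user_dictionary(lines, lookup_word, add_word):
    """Add user entries through add_word, skipping known pronunciations.

    lookup_word(word) -> iterable of pronunciations already known (hypothetical)
    add_word(word, phones) -> adds one pronunciation (hypothetical wrapper)
    Returns the number of pronunciations that were actually added.
    """
    added = 0
    seen = {}  # base word -> set of pronunciations known so far
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        word = _SUFFIX.sub("", parts[0])  # strip the numeric suffix
        phones = " ".join(parts[1:])
        if word not in seen:
            seen[word] = set(lookup_word(word))
        if phones in seen[word]:
            continue  # same word, same pronunciation: treat as a no-op
        add_word(word, phones)
        seen[word].add(phones)
        added += 1
    return added
```

With this filter in place, the approach no longer depends on the second requirement; it only needs ps_add_word to append a new pronunciation when the word already exists.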

So I wonder: Does this approach make sense? Is there a better way that relies on existing functionality?
- Nickolay V. Shmyrev - 2017-01-02
  
  If the same word with the same pronunciation already exists, this is a no-op.
  
  Not sure what it does currently, but we can modify the code to ignore such cases.
  
  So I wonder: Does this approach make sense?
  
  Looks ok
  

larabrian - 2020-07-07

PEP 448 also expanded the abilities of ** by allowing this operator to be used for dumping key/value pairs from one dictionary into a new dictionary. Using the unpacking operator inside a dictionary literal, you can merge two dictionaries in a single expression.

dict1 = {1:'one' , 2:'two'}
dict2 = {3:'three', 4:'four'}
fDict = {**dict1, **dict2}
print(fDict)
{1: 'one', 2: 'two', 3: 'three', 4: 'four'}

