Is there a tool to easily merge two dictionaries? I have a feeling that the G2P tool can be used, but I haven't figured out how.
If I manually add a word to my dictionary, do I also have to add it manually to my language model (the .lm file)? I mean, if the word doesn't exist in the language model, I suppose the probability of it getting picked is zero? Assuming I have to add it to the LM, would you suggest just adding it to the list of 1-grams and giving it the same statistics as a "similar" word?
Thanks,
Soren
I'm in a similar situation. I want my application to support both a built-in dictionary and a "user dictionary". If a user dictionary is specified, it should be merged into the built-in dictionary.
For example:
Built-in dictionary:
[...]
biscuit B IH S K AH T
bishop B IH SH AH P
[...]
User dictionary:
biscuit B IH S K IH T
biscuit(2) B IH S K UH IY
thimbleweed TH IH M B AH L W IY D
Merged dictionary:
[...]
biscuit B IH S K AH T
biscuit(2) B IH S K IH T
biscuit(3) B IH S K UH IY
bishop B IH SH AH P
thimbleweed TH IH M B AH L W IY D
[...]
So by merge, I mean that
any words not in the built-in dictionary should be added;
any words that already exist should be added as alternative pronunciations;
numeric suffixes should be auto-increased as needed.
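The three rules above can be sketched roughly as follows. This is only an illustration in Python (the scripting language suggested elsewhere in this thread), not anything that ships with CMUSphinx; the function names `parse_entries` and `merge_dictionaries` are made up for this example. It assumes one entry per line, with the word separated from its phones by whitespace, and it assumes that an identical word/pronunciation pair should be kept only once. The output is sorted with plain string ordering, which may not match the ordering of a real dictionary file.

```python
import re

# Matches a trailing "(N)" alternate-pronunciation marker on a word.
_SUFFIX = re.compile(r"\(\d+\)$")


def parse_entries(lines):
    """Yield (base_word, phones) pairs, skipping blank or malformed lines."""
    for line in lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        word, phones = parts
        # Drop the numeric suffix and normalise whitespace between phones.
        yield _SUFFIX.sub("", word), " ".join(phones.split())


def merge_dictionaries(builtin_lines, user_lines):
    """Merge two dictionaries in memory and return renumbered entry lines."""
    merged = {}
    # Built-in entries go first so they keep the lowest suffix numbers.
    for source in (builtin_lines, user_lines):
        for word, phones in parse_entries(source):
            prons = merged.setdefault(word, [])
            if phones not in prons:  # assumption: exact duplicates are skipped
                prons.append(phones)

    out = []
    for word in sorted(merged):
        for i, phones in enumerate(merged[word], start=1):
            label = word if i == 1 else f"{word}({i})"
            out.append(f"{label} {phones}")
    return out


builtin = ["biscuit B IH S K AH T", "bishop B IH SH AH P"]
user = [
    "biscuit B IH S K IH T",
    "biscuit(2) B IH S K UH IY",
    "thimbleweed TH IH M B AH L W IY D",
]
for entry in merge_dictionaries(builtin, user):
    print(entry)
```

Neither input is modified; the merged entries exist only in the returned list.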
I don't want to merge dictionary files, creating a new file. Instead, I'd prefer to leave the two files unchanged and merge the entries in memory, using C/C++.
I understand that there is no built-in functionality for doing this. So here's a rough outline of what I'm planning to do.
Load the built-in dictionary using the -dict option
Read the user dictionary manually, splitting each line into word string and pronunciation string
For each pair: manually strip any numeric suffix, then call ps_add_word.
This approach only works if ps_add_word does the following:
If the word already exists, a new pronunciation is added
If the same word with the same pronunciation already exists, this is a no-op.
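Since it is not clear from this thread whether ps_add_word treats a repeated word/pronunciation pair as a no-op, one option is to filter duplicates before calling it. The sketch below shows that guard in Python purely to illustrate the logic; `strip_suffix`, `add_user_entries` and the `add_word` callback are hypothetical names, and the callback stands in for whatever actually hands the entry to the decoder. The guard only knows about pairs it has seen, so catching collisions with the built-in dictionary would require pre-filling `seen` with the built-in entries.

```python
import re


def strip_suffix(word):
    """Remove a trailing "(N)" marker, e.g. "biscuit(2)" -> "biscuit"."""
    return re.sub(r"\(\d+\)$", "", word)


def add_user_entries(user_lines, add_word, seen=None):
    """Call add_word(word, phones) once per previously unseen pair.

    `add_word` is a hypothetical callback standing in for the decoder call.
    Returns the number of times the callback was invoked.
    """
    seen = set() if seen is None else seen
    added = 0
    for line in user_lines:
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank or malformed line
        key = (strip_suffix(parts[0]), " ".join(parts[1].split()))
        if key in seen:
            continue  # same word with the same pronunciation: skip it
        seen.add(key)
        add_word(*key)
        added += 1
    return added


calls = []
add_user_entries(
    ["biscuit B IH S K IH T", "biscuit(2) B IH S K IH T"],
    lambda word, phones: calls.append((word, phones)),
)
print(calls)
```

The same structure carries over to C/C++: keep a set of (word, pronunciation) pairs and call ps_add_word only for pairs not yet in the set.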
So I wonder: Does this approach make sense? Is there a better way that relies on existing functionality?
PEP 448 also expanded the abilities of ** by allowing this operator to be used for dumping key/value pairs from one dictionary into a new dictionary. Leveraging a dictionary literal and the unpacking operator, you can merge two dictionaries in a single expression.
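As a small illustration of that single-expression merge (Python 3.5 or later), the unpacking operator `**` can be used inside a dictionary literal. The variable names here are made up for the example; when a key appears in both dictionaries, the value from the right-most one wins.

```python
defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090, "debug": True}

# Unpack both dictionaries into a new one; later keys override earlier ones.
merged = {**defaults, **overrides}

print(merged)
```

Both source dictionaries are left unchanged; only `merged` holds the combined result.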
Yes, a scripting language such as Python.
No, unlikely.
The language model is not easily editable; you need to use LM tools to update it.
Yes.
You can check http://cmusphinx.sourceforge.net/wiki/tutoriallmadvanced
I'm not sure what it does currently, but we can modify the code to ignore such a case.
Looks OK.