I am training g2p-seq2seq on a Greek dictionary so that I can extend it later, and I have run into the following problem:
After training for a long time (approximately 3 hours) on a fairly large dictionary, evaluation showed WER=1. So I ran interactive mode, and each time I enter a Greek word it outputs:
WARNING:tensorflow:Invalid symbol:A
for every Greek letter, and produces no output. I read here (https://sourceforge.net/p/cmusphinx/discussion/help/thread/951e4dff/) that I should manually add the letters to vocab.g2p, but when I do this and enter interactive mode with:
g2p-seq2seq --interactive --model_dir model_folder_path
I get the error shown in the attachment.
Words have to be in lowercase, as I told you before.
All words are in lowercase. It was my mistake in the example I gave.
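For anyone who wants to double-check the lowercase requirement before training, here is a minimal Python sketch. It assumes each dictionary line is a word followed by whitespace-separated phonemes (that layout is an assumption here, not something confirmed in this thread), and `lowercase_entry` is a hypothetical helper name, not part of g2p-seq2seq. Only the word is lowercased; the phoneme symbols are left untouched.

```python
def lowercase_entry(line: str) -> str:
    """Lowercase the word (first field) of one dictionary line.

    Assumed layout: "<word> <phoneme> <phoneme> ...". The phonemes are
    returned exactly as they were given.
    """
    parts = line.strip().split(None, 1)  # split off the first field only
    if not parts:
        return ""  # blank line: nothing to normalize
    word = parts[0].lower()  # str.lower() also handles Greek letters
    if len(parts) == 1:
        return word
    return f"{word} {parts[1]}"


if __name__ == "__main__":
    print(lowercase_entry("Αβγ A V G"))
```

Running every line of the training dictionary through a function like this would rule out stray uppercase letters as the cause of the "Invalid symbol" warnings.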
I found out that this error may occur because the encoding of vocab.g2p changes when Greek symbols are added. For some reason, I can't save Greek symbols in the Western encoding.
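If "Western" here means a Latin-1 or Windows-1252 style code page, that would explain it: those encodings have no Greek letters, so an editor cannot store them in that format. One way to sidestep the editor is to add the symbols with a small script that reads and writes the file explicitly as UTF-8. The sketch below is only an illustration under two assumptions that are not confirmed in this thread: that vocab.g2p holds one symbol per line, and that the tool reads it as UTF-8. `add_symbols_utf8` is a hypothetical helper, not part of g2p-seq2seq.

```python
from pathlib import Path


def add_symbols_utf8(vocab_path, symbols):
    """Append symbols missing from a vocabulary file, keeping it UTF-8.

    Assumes one symbol per line (an assumption about the file layout).
    Returns the list of symbols that were actually added.
    """
    path = Path(vocab_path)
    existing = []
    if path.exists():
        # Decode explicitly as UTF-8 instead of relying on the system default.
        existing = path.read_text(encoding="utf-8").splitlines()

    seen = set(existing)
    added = []
    for symbol in symbols:
        if symbol not in seen:
            seen.add(symbol)
            added.append(symbol)

    # Write everything back as UTF-8, one symbol per line.
    path.write_text("\n".join(existing + added) + "\n", encoding="utf-8")
    return added


if __name__ == "__main__":
    # Example: add a few lowercase Greek letters to a local vocab file.
    print(add_symbols_utf8("vocab.g2p", ["α", "β", "γ"]))
```

It is worth backing up the model folder first, since the script rewrites the file in place.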