Hi,
I have got pocketsphinx working perfectly in Python: I have created a language model, adapted the acoustic model, and so on. It's all great, so thanks very much for the years of amazing work.
However, I am having trouble with a theoretical deployment issue. I have been told that the language model, dictionary, and related data are too large and too numerous to be committed to the development code base, so they will need to be stored in a NoSQL database. I am using MongoDB.
Can you think of any way that a Sphinx LiveSpeech-type decoder could function properly by querying the database? Or would the entire language model file structure need to be rebuilt temporarily from the database data every time the app is used?
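In case it helps to make the question concrete, the only approach I can picture is fetching the files out of the database into a temp directory at startup and pointing the decoder at them. A rough sketch, assuming the files are stored in GridFS (the database name, file names, and paths here are made up):

```python
import os
import tempfile

import gridfs
from pymongo import MongoClient
from pocketsphinx import LiveSpeech

# Pull the model files out of GridFS into a temp directory,
# since pocketsphinx only reads models from the filesystem.
client = MongoClient("mongodb://localhost:27017")
fs = gridfs.GridFS(client["speech"])  # hypothetical database name

model_dir = tempfile.mkdtemp()
for name in ("my.lm.bin", "my.dict"):  # hypothetical file names
    grid_file = fs.find_one({"filename": name})
    with open(os.path.join(model_dir, name), "wb") as f:
        f.write(grid_file.read())

# Point the decoder at the rebuilt files as usual.
speech = LiveSpeech(
    lm=os.path.join(model_dir, "my.lm.bin"),
    dic=os.path.join(model_dir, "my.dict"),
)
for phrase in speech:
    print(phrase)
```

That works in principle, but it just rebuilds the files on every launch, which is what I was hoping to avoid.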
Then there is the issue of converting Sphinx-specific formats like ARPA into JSON/BSON/CSV for import into the database...
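If it came to that, I assume the import side would need a hand-rolled converter along these lines (a sketch only; the file name and the idea of one document per n-gram are made up, and it only handles the plain-text ARPA layout of header, n-gram sections, and end marker):

```python
import json

def arpa_to_docs(path):
    """Parse an ARPA LM into one dict per n-gram, ready for insert_many()."""
    docs, order = [], 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line.startswith("\\") and line.endswith("-grams:"):
                order = int(line[1])  # e.g. "\2-grams:" -> order 2
                continue
            if not line or line.startswith("\\") or line.startswith("ngram "):
                continue  # \data\ header, count lines, \end\ marker
            parts = line.split()
            prob = float(parts[0])
            words = parts[1:1 + order]
            # Highest-order n-grams carry no trailing backoff weight.
            backoff = float(parts[1 + order]) if len(parts) > 1 + order else None
            docs.append({"order": order, "ngram": " ".join(words),
                         "logprob": prob, "backoff": backoff})
    return docs

docs = arpa_to_docs("my.lm")  # hypothetical file name
print(json.dumps(docs[0]))
# db.ngrams.insert_many(docs)  # with a pymongo database handle
```

The documents could then go into MongoDB easily enough, but querying millions of n-gram documents per utterance seems like it would be far too slow for live decoding.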
This is fundamentally a feasibility question. Thanks for any help or opinions.