[Kaldi-users] Memory requirement for FSTs


Hello,

I have been playing with the Kaldi online recognizers (great work!) and wanted to ask whether the FST approach is useful when running under memory constraints. If I use a traditional ARPA language model plus acoustic models, the total size of the models is < 100 MB (for a 20,000-word vocabulary). But HCLG.fst takes a whopping 500 MB! Why is this so? (Perhaps I should read the papers to find the answers, but in short: why is the size of HCLG.fst >> the sum of the sizes of the individual *.fsts?) Is there some redundancy involved?
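For concreteness, here is a minimal sketch of how such a size comparison could be made on disk. The helper name size_report and the file paths in the usage part are hypothetical placeholders, not anything Kaldi ships; it only compares file sizes and says nothing about memory use at decode time.

```python
import os


def size_report(component_paths, composed_path):
    """Compare the on-disk size of a composed graph with its components.

    Returns (sum of component sizes in bytes,
             composed size in bytes,
             composed / sum ratio).
    """
    total = sum(os.path.getsize(p) for p in component_paths)
    composed = os.path.getsize(composed_path)
    # Guard against an empty component list or zero-byte files.
    ratio = composed / total if total else float("inf")
    return total, composed, ratio


if __name__ == "__main__":
    # Hypothetical paths; substitute the files from your own setup.
    components = ["L.fst", "G.fst"]
    composed_graph = "HCLG.fst"
    if all(os.path.exists(p) for p in components + [composed_graph]):
        total, composed, ratio = size_report(components, composed_graph)
        print(f"components: {total / 1e6:.1f} MB")
        print(f"composed:   {composed / 1e6:.1f} MB ({ratio:.1f}x)")
```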

What might be the alternatives if one wants to further reduce the size of HCLG.fst?

Thanks.