`dict_read` occupies 945 MB for default en-us dictionary

Help
2016-06-21
2016-06-29
  • Daniel Wolf

    Daniel Wolf - 2016-06-21

    I just found out that loading the default en-us dictionary takes up about 945 MB of memory. That seems like an awful lot to me. Is that normal?

    I'm calling ps_init with a trivial configuration:

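    // Build a minimal configuration: acoustic model and dictionary only.
    cmd_ln_t* config = cmd_ln_init(
        nullptr, ps_args(), true,
        "-hmm", "some-path/acoustic-model",
        "-dict", "some-path/cmudict-en-us.dict",
        nullptr);
    // ps_init is the call during which the memory usage jumps.
    ps_decoder_t* decoder = ps_init(config);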
    

    ps_init then calls ps_reinit, which calls dict_init, which calls dict_read.

    The problem is that I'm building a cross-platform application. On Win32, the process can only allocate 1 GB of memory, so I'm getting out-of-memory errors.

     
    • Nickolay V. Shmyrev

      I see 23 MB for ps_init here. You are probably counting something differently; it should be shared memory, not real application memory.

       
      • Daniel Wolf

        Daniel Wolf - 2016-06-22

        I'm not sure whether I understand you correctly. Just before ps_init, the Windows task manager tells me that my application uses 7 MB of memory. Just after ps_init, it tells me that my application uses 952 MB. So during ps_init, 945 MB of some sort of memory must have been allocated. Are you saying that pocketsphinx allocates over 900 MB of shared memory?

         
        • Daniel Wolf

          Daniel Wolf - 2016-06-22

          I ran VMMap on my application. It shows that the process has 943 MB of committed private data. Before ps_init, it shows only 13 MB of committed private data. That looks like real allocated memory to me!
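To take the same before/after reading from inside the process instead of from Task Manager or VMMap, something like the following could be used. This is only a sketch: `query_private_bytes` and `measure_growth` are hypothetical helper names, not part of pocketsphinx. On Windows it reads the `PrivateUsage` field via `GetProcessMemoryInfo`; on other platforms it simply reports that no reading is available.

```cpp
#include <cstdint>
#include <utility>

#ifdef _WIN32
#include <windows.h>
#include <psapi.h>  // GetProcessMemoryInfo; may require linking psapi.lib
#endif

// Hypothetical helper: stores the process's private committed bytes in `out`.
// Returns false if the value could not be read (always the case off Windows).
bool query_private_bytes(std::uint64_t& out) {
#ifdef _WIN32
    PROCESS_MEMORY_COUNTERS_EX pmc = {};
    pmc.cb = sizeof(pmc);
    if (!GetProcessMemoryInfo(GetCurrentProcess(),
                              reinterpret_cast<PROCESS_MEMORY_COUNTERS*>(&pmc),
                              sizeof(pmc))) {
        return false;
    }
    out = static_cast<std::uint64_t>(pmc.PrivateUsage);
    return true;
#else
    (void)out;
    return false;
#endif
}

// Hypothetical helper: runs `action` and stores the change in private bytes
// in `growth`. The action always runs; the return value only says whether
// both readings succeeded.
template <typename F>
bool measure_growth(F&& action, std::int64_t& growth) {
    std::uint64_t before = 0;
    std::uint64_t after = 0;
    const bool have_before = query_private_bytes(before);
    std::forward<F>(action)();
    const bool have_after = query_private_bytes(after);
    if (!have_before || !have_after) {
        return false;
    }
    growth = static_cast<std::int64_t>(after) - static_cast<std::int64_t>(before);
    return true;
}
```

Wrapping the `ps_init` call in a lambda passed to `measure_growth` would give the growth in bytes for that call alone. The figure may differ somewhat from Task Manager's default memory column, which reports a working-set value rather than committed private bytes.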

           
          • Nickolay V. Shmyrev

            What if you cut the dictionary size?

             
            • Daniel Wolf

              Daniel Wolf - 2016-06-22

              I just removed half of the 134723 lines in cmudict-en-us.dict. Now dict_read only takes 615 MB.

              I think I narrowed the problem down: there are lots of calls to __ckd_malloc__, most of which only allocate a handful of bytes. Internally, __ckd_malloc__ calls malloc. And with (almost) every malloc, my application's private working set increases by 4 KB.

              It seems like on my Windows machine, malloc always allocates at least 4 KB, even if just 5 bytes were requested.
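The numbers in this thread can be sanity-checked with a little arithmetic. The sketch below assumes that "MB" means MiB, that a page is 4096 bytes, and that every small allocation commits whole pages; the function names are made up for illustration.

```cpp
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;         // assumed page size
constexpr std::uint64_t kMiB = 1024ull * 1024ull;

// Bytes committed if each allocation is rounded up to whole pages.
constexpr std::uint64_t committed_bytes(std::uint64_t allocations,
                                        std::uint64_t request_size) {
    return allocations * ((request_size + kPageSize - 1) / kPageSize) * kPageSize;
}

// Number of one-page allocations that would explain an observed growth.
constexpr std::uint64_t allocations_for(std::uint64_t observed_bytes) {
    return observed_bytes / kPageSize;
}

// 945 MiB of growth corresponds to 241,920 one-page allocations.
static_assert(allocations_for(945 * kMiB) == 241920, "page count for 945 MiB");

// One 5-byte allocation per dictionary line (134723 lines) would already
// commit 551,825,408 bytes, roughly 526 MiB.
static_assert(committed_bytes(134723, 5) == 551825408ull, "one page per line");
```

Under these assumptions, 945 MB works out to about 1.8 page-sized allocations per line of the 134723-line dictionary, so a few small allocations per entry are enough to reach the observed total. The 945 MB covers all of ps_init, not only dict_read, so this is an order-of-magnitude check rather than an exact accounting.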

               
              • Nickolay V. Shmyrev

                This looks like a misconfiguration of the runtime. Maybe you have a redefine somewhere, for example malloc mapped to VirtualAlloc. By default, malloc should be able to allocate an arbitrary number of bytes.

                You need to check the compiler options and project configuration to see what is happening there; maybe you are running some unusual runtime.

                 
                • Daniel Wolf

                  Daniel Wolf - 2016-06-29

                  Thanks for your answer, Nickolay. I've managed to find the cause. Some time ago, I used a valgrind-like Windows tool (gflags). What I didn't realize was that this tool somehow registers itself deep in the system. Even after terminating its process, rebooting the machine, and rebuilding my application several times, gflags was still active and monitoring my application's memory allocations. And that takes memory, of course...

                   
