
File names in transcription

Help
osman b
2008-04-17
2012-09-22
  • osman b

    osman b - 2008-04-17

    Hi,
    I am trying to use Sphinx for the Turkish language. I read the tutorials and re-ran them. Now I am trying to train HMMs for Turkish phones with my own acoustic training database.
    In my acoustic training database, some wav files have the same name but possibly different transcriptions (or possibly the same one), as below:
    <s> SIFIR BI-R I-KI- U-C- DO-RT BES- ALTI YEDI- SEKI-Z DOKUZ </s> (Turkish/001/0001)
    <s> ON YI-RMI- OTUZ KIRK ELLI- ALTMIS- YETMI-S- SEKSEN DOKSAN YU-Z </s> (Turkish/001/0002)
    <s> SIFIR BI-R I-KI- U-C- DO-RT BES- ALTI YEDI- SEKI-Z DOKUZ </s> (Turkish/002/0001)
    <s> YAPILANMIS- DIS-SAL I-MAJIN </s> (Turkish/002/0002)
    These identically named wav files are kept in different folders.
    How can I tell the training tools that the training database contains identically named wav files that may have different transcriptions? As far as I could understand from the tutorials, the wav files there always have distinct names, so the file name alone is enough to identify the correct transcription, and only the names are included in the transcription file. In my case the file name alone is not enough, because it does not identify the transcription unambiguously (I think I should also give the file's folder to the training tools).

    Is it possible to include the folder of the wav file, as shown above? I mean, can I write my transcription as below?
    <s> SIFIR BI-R I-KI- U-C- DO-RT BES- ALTI YEDI- SEKI-Z DOKUZ </s> (Turkish/001/0001)

    Is this a correct transcription file? Does this slight change to the file name, (Turkish/001/0001), do what I intend?

    Thank you very much

     
    • Nickolay V. Shmyrev

      Yes, it's a problem. I usually copy the files into a single folder under different names with a script like this:

      for f in $(find . -name "*.wav"); do g=${f#./}; cp "$f" "${g//\//_}"; done

      Thus Turkish/001/0001.wav will be copied to the file Turkish_001_0001.wav. But actually it's a very unpleasant limitation of sphinxtrain. It must be fixed somehow.
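Once the wav files are flattened this way, the utterance IDs in the transcription file presumably need the same renaming so that they still match the new file names. A minimal sketch of that step, assuming the ID is always the last parenthesized field on the line; the function name `flatten_ids` is hypothetical and not part of sphinxtrain:

```shell
#!/bin/sh
# Hypothetical helper (not a sphinxtrain tool): rewrite utterance IDs such as
# (Turkish/001/0001) to (Turkish_001_0001) so they match the flattened names.
# Reads a transcription on stdin and writes the result to stdout.
flatten_ids() {
    awk '{
        # Locate the trailing "(...)" field; the slash in "</s>" is left alone.
        if (match($0, /\([^()]*\)[ \t]*$/) > 0) {
            head = substr($0, 1, RSTART - 1)
            id = substr($0, RSTART)
            gsub("/", "_", id)   # replace every slash inside the ID only
            $0 = head id
        }
        print
    }'
}
```

It could be used as `flatten_ids < old.transcription > new.transcription`, where both file names are placeholders. Any file list used for training would need the same renaming.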

       
