Building an Arabic training model for recognition on Android

2017-08-09
  • Seif Mostafa

    Seif Mostafa - 2017-08-09

    Can I use the following in db.phone to build an Arabic model for use on Android?

    0.3
    35 n_base
    6255 n_tri
    37740 n_state_map
    1175 n_tied_state
    175 n_tied_ci_state
    35 n_tied_tmat

    Columns definitions

    base lft rt p attrib tmat ... state id's ...

    SIL - - - filler 0 0 1 2 3 4 N
    _ - - - n/a 1 5 6 7 8 9 N
    ء - - - n/a 2 10 11 12 13 14 N
    ا - - - n/a 3 15 16 17 18 19 N
    ب - - - n/a 4 20 21 22 23 24 N
    ة - - - n/a 5 25 26 27 28 29 N
    ت - - - n/a 6 30 31 32 33 34 N
    ث - - - n/a 7 35 36 37 38 39 N
    ج - - - n/a 8 40 41 42 43 44 N
    ح - - - n/a 9 45 46 47 48 49 N
    خ - - - n/a 10 50 51 52 53 54 N
    د - - - n/a 11 55 56 57 58 59 N
    ذ - - - n/a 12 60 61 62 63 64 N
    ر - - - n/a 13 65 66 67 68 69 N
    ز - - - n/a 14 70 71 72 73 74 N
    س - - - n/a 15 75 76 77 78 79 N
    ش - - - n/a 16 80 81 82 83 84 N
    ص - - - n/a 17 85 86 87 88 89 N
    ض - - - n/a 18 90 91 92 93 94 N
    ط - - - n/a 19 95 96 97 98 99 N
    ظ - - - n/a 20 100 101 102 103 104 N
    ع - - - n/a 21 105 106 107 108 109 N
    غ - - - n/a 22 110 111 112 113 114 N
    ف - - - n/a 23 115 116 117 118 119 N
    ق - - - n/a 24 120 121 122 123 124 N
    ك - - - n/a 25 125 126 127 128 129 N
    ل - - - n/a 26 130 131 132 133 134 N
    م - - - n/a 27 135 136 137 138 139 N
    ن - - - n/a 28 140 141 142 143 144 N
    ه - - - n/a 29 145 146 147 148 149 N
    و - - - n/a 30 150 151 152 153 154 N
    ي - - - n/a 31 155 156 157 158 159 N
    َ - - - n/a 32 160 161 162 163 164 N
    ُ - - - n/a 33 165 166 167 168 169 N
    ِ - - - n/a 34 170 171 172 173 174 N
    _ ِ ا i n/a 1 175 176 177 178 179 N
    ء SIL َ b n/a 2 184 188 193 195 204 N
    ء SIL ُ b n/a 2 184 188 193 195 204 N
    ء SIL ِ b n/a 2 184 188 193 195 204 N
    ء ا َ b n/a 2 181 187 192 198 202 N
    ء ا ُ b n/a 2 181 187 192 198 202 N
    ء ا ِ b n/a 2 181 187 192 198 202 N
    ء ا ِ i n/a 2 182 186 190 196 201 N
    ء ب َ b n/a 2 180 189 194 198 202 N
    ء ب ُ b n/a 2 180 189 194 198 202 N
    ء ب ِ b n/a 2 180 189 194 198 202 N
    ء ة َ b n/a 2 181 187 192 199 203 N
    ء ة ُ b n/a 2 181 187 192 199 203 N
    ء ة ِ b n/a 2 181 187 192 199 203 N
    ء ت َ b n/a 2 184 185 192 198 202 N
    ء ت ُ b n/a 2 184 185 192 198 202 N
    ء ت ِ b n/a 2 184 185 192 198 202 N
    ء ث َ b n/a 2 184 185 192 198 203 N
    ء ث ُ b n/a 2 184 185 192 198 203 N
    ء ث ِ b n/a 2 184 185 192 198 203 N
    ء ج َ b n/a 2 184 185 192 198 202 N
    ء ج ُ b n/a 2 184 185 192 198 202 N
    ء ج ِ b n/a 2 184 185 192 198 202 N
    ء ح َ b n/a 2 184 185 192 198 202 N
    ء ح ُ b n/a 2 184 185 192 198 202 N
    ء ح ِ b n/a 2 184 185 192 198 202 N
    ء خ َ b n/a 2 184 185 192 198 202 N
    ء خ ُ b n/a 2 184 185 192 198 202 N
    ء خ ِ b n/a 2 184 185 192 198 202 N
    ء د َ b n/a 2 180 189 194 198 202 N
    ء د ُ b n/a 2 180 189 194 198 202 N
    ء د ِ b n/a 2 180 189 194 198 202 N
    ء ذ َ b n/a 2 184 185 192 198 202 N
    ء ذ ُ b n/a 2 184 185 192 198 202 N
    ء ذ ِ b n/a 2 184 185 192 198 202 N
    ء ر َ b n/a 2 184 185 192 198 203 N
    ء ر ُ b n/a 2 184 185 192 198 203 N
    ء ر ِ b n/a 2 184 185 192 198 203 N
    ء ز َ b n/a 2 184 185 192 198 202 N
    ء ز ُ b n/a 2 184 185 192 198 202 N
    ء ز ِ b n/a 2 184 185 192 198 202 N
    ء س َ b n/a 2 184 185 192 198 202 N
    ء س ُ b n/a 2 184 185 192 198 202 N
    ء س ِ b n/a 2 184 185 192 198 202 N
    ء ش َ b n/a 2 184 185 192 198 202 N
    ء ش ُ b n/a 2 184 185 192 198 202 N
    ء ش ِ b n/a 2 184 185 192 198 202 N
    ء ص َ b n/a 2 184 185 192 198 202 N
    ء ص ُ b n/a 2 184 185 192 198 202 N
    ء ص ِ b n/a 2 184 185 192 198 202 N
    ء ض َ b n/a 2 180 189 194 198 202 N
    ء ض ُ b n/a 2 180 189 194 198 202 N
    ء ض ِ b n/a 2 180 189 194 198 202 N
    ء ط َ b n/a 2 184 185 192 198 202 N
    ء ط ُ b n/a 2 184 185 192 198 202 N
    ء ط ِ b n/a 2 184 185 192 198 202 N
    ء ظ َ b n/a 2 184 185 192 198 202 N
    ء ظ ُ b n/a 2 184 185 192 198 202 N
    ء ظ ِ b n/a 2 184 185 192 198 202 N
    ء ع َ b n/a 2 181 187 192 198 202 N
    ء ع ُ b n/a 2 181 187 192 198 202 N
    ء ع ِ b n/a 2 181 187 192 198 202 N
    ء غ َ b n/a 2 184 185 192 198 202 N
    ء غ ُ b n/a 2 184 185 192 198 202 N
    ء غ ِ b n/a 2 184 185 192 198 202 N
    ء ف َ b n/a 2 184 185 192 198 202 N
    ء ف ُ b n/a 2 184 185 192 198 202 N
    ء ف ِ b n/a 2 184 185 192 198 202 N
    ء ق َ b n/a 2 180 189 194 198 202 N
    ء ق ُ b n/a 2 180 189 194 198 202 N
    ء ق ِ b n/a 2 180 189 194 198 202 N
    ء ك َ b n/a 2 184 185 192 198 202 N
    ء ك ُ b n/a 2 184 185 192 198 202 N
    ء ك ِ b n/a 2 184 185 192 198 202 N
    ء ل َ b n/a 2 184 185 192 198 202 N
    ء ل َ i n/a 2 183 185 191 197 200 N
    ء ل ُ b n/a 2 184 185 192 198 202 N
    ء ل ِ b n/a 2 184 185 192 198 202 N
    ء م َ b n/a 2 184 185 192 198 202 N
    ء م ُ b n/a 2 184 185 192 198 202 N
    ء م ِ b n/a 2 184 185 192 198 202 N
    ء ن َ b n/a 2 184 185 192 198 202 N
    ء ن ُ b n/a 2 184 185 192 198 202 N
    ء ن ِ b n/a 2 184 185 192 198 202 N
    ء ه َ b n/a 2 181 187 192 199 203 N
    ء ه ُ b n/a 2 181 187 192 199 203 N
    ء ه ِ b n/a 2 181 187 192 199 203 N
    ء و َ b n/a 2 184 185 192 198 202 N
    ء و ُ b n/a 2 184 185 192 198 202 N
    ء و ِ b n/a 2 184 185 192 198 202 N
    ء ي َ b n/a 2 181 187 192 198 202 N
    ء ي ُ b n/a 2 181 187 192 198 202 N
    ء ي ِ b n/a 2 181 187 192 198 202 N
    ء َ َ b n/a 2 181 187 192 198 202 N
    ء َ َ i n/a 2 182 186 190 196 201 N
    ء َ ُ b n/a 2 181 187 192 198 202 N
    ء َ ِ b n/a 2 181 187 192 198 202 N
    ء ُ َ i n/a 2 182 186 190 196 201 N
    ء ُ ُ i n/a 2 182 186 190 196 201 N
    ء ِ َ i n/a 2 182 186 190 196 201 N
    ء ِ ِ i n/a 2 182 186 190 196 201 N
    ا SIL ت b n/a 3 205 226 238 249 262 N
    ا SIL س b n/a 3 205 226 238 249 263 N
    ا SIL ل b n/a 3 205 226 239 250 255 N
    ا _ ل i n/a 3 211 219 237 248 256 N
    ا ا ت
    ...
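The listing above has the layout of a SphinxTrain model definition (mdef) file, whereas the tutorial's your_db.phone is simply one phone per line. The counts in its header are related to each other, which a short script can check. A minimal sketch in Python (function names are hypothetical), assuming, as the rows above show, that every phone row lists its emitting states followed by one terminal N entry and that each base phone has its own transition matrix:

```python
def parse_mdef_header(text):
    """Read the 'count name' lines (e.g. '35 n_base') at the top of an mdef listing."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Header count lines look like "35 n_base"; the version line "0.3" is skipped.
        if len(parts) == 2 and parts[0].isdigit() and parts[1].startswith("n_"):
            counts[parts[1]] = int(parts[0])
    return counts


def check_mdef_counts(counts):
    n_base = counts["n_base"]
    n_tri = counts["n_tri"]
    # Each base (context-independent) phone owns the same number of emitting states.
    states_per_hmm = counts["n_tied_ci_state"] // n_base
    # Every phone row (base + triphone) lists its emitting states plus one "N" entry.
    expected_map = (n_base + n_tri) * (states_per_hmm + 1)
    # Tied context-dependent states are numbered after the CI states (175 onward here).
    senones = counts["n_tied_state"] - counts["n_tied_ci_state"]
    return {
        "states_per_hmm": states_per_hmm,
        "state_map_ok": expected_map == counts["n_state_map"],
        "tmat_matches_base": counts["n_tied_tmat"] == n_base,
        "senones": senones,
    }


header = """0.3
35 n_base
6255 n_tri
37740 n_state_map
1175 n_tied_state
175 n_tied_ci_state
35 n_tied_tmat"""

print(check_mdef_counts(parse_mdef_header(header)))
```

For this header it reports 5 states per HMM, a consistent state map ((35 + 6255) x 6 = 37740) and 1000 tied context-dependent states (1175 - 175).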
    I am using Ubuntu 16.04 LTS for the build and following https://cmusphinx.github.io/wiki/tutorialam/ to build my own model that recognizes a limited vocabulary (500-1500 words).
    I have prepared files corresponding to the following:
    your_db.dic - Phonetic dictionary
    your_db.phone - Phoneset file
    your_db.lm.DMP - Language model
    your_db.filler - List of fillers
    your_db_train.fileids - List of files for training
    your_db_train.transcription - Transcription for training
    your_db_test.fileids - List of files for testing
    your_db_test.transcription - Transcription for testing
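Before training, a quick consistency check between your_db.phone and your_db.dic can catch phones that the dictionary uses but the phone set lacks, a missing SIL entry, or phone names that are not plain ASCII. A minimal sketch in Python, assuming the tutorial's formats (one phone per line in the .phone file; a word followed by its phones on each .dic line); the function names and sample data are hypothetical:

```python
def read_phones(phone_text):
    """Return the phone set from a .phone file (one phone per line)."""
    return {line.strip() for line in phone_text.splitlines() if line.strip()}


def check_dictionary(dic_text, phones):
    """List problems between a .dic file and the phone set."""
    problems = []
    for lineno, line in enumerate(dic_text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        word, pron = parts[0], parts[1:]
        for phone in pron:
            if phone not in phones:
                problems.append(f"line {lineno}: {word}: phone {phone!r} not in phone set")
    if "SIL" not in phones:
        problems.append("phone set: SIL (silence) is missing")
    for phone in sorted(phones):
        # Phone names should be plain ASCII letters/digits, not Arabic script.
        if not (phone.isascii() and phone.isalnum()):
            problems.append(f"phone set: {phone!r} is not plain ASCII")
    return problems


# Hypothetical sample data: the Arabic word column is fine, only phones must be ASCII.
phone_text = "SIL\nM\nR\nHH\nB\nAA\n"
dic_text = "مرحبا M R HH B AA\nبا B A\n"
for problem in check_dictionary(dic_text, read_phones(phone_text)):
    print(problem)
```

With the sample data it flags only the second dictionary line, whose phone A is not in the phone set.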

     
    • Nickolay V. Shmyrev

      It is better to use English letters for phonemes. The tutorial says so.
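One way to follow this advice is to keep the Arabic spelling in the dictionary's word column and give each of the 33 Arabic base phones from the listing an ASCII name. The names below are a hypothetical scheme, not a CMUSphinx standard, and the sketch maps one written character to one phone, exactly as the listing does; it is not a pronunciation model:

```python
# Hypothetical ASCII names for the 33 Arabic base phones in the listing above
# (SIL stays as is). Names are uppercase letters only and all distinct.
ARABIC_TO_PHONE = {
    "ء": "GS", "ا": "AA", "ب": "B", "ة": "TM", "ت": "T", "ث": "TH",
    "ج": "J", "ح": "HH", "خ": "KH", "د": "D", "ذ": "DH", "ر": "R",
    "ز": "Z", "س": "S", "ش": "SH", "ص": "SS", "ض": "DD", "ط": "TT",
    "ظ": "ZZ", "ع": "AI", "غ": "GH", "ف": "F", "ق": "Q", "ك": "K",
    "ل": "L", "م": "M", "ن": "N", "ه": "H", "و": "W", "ي": "Y",
    "\u064E": "A",  # fatha
    "\u064F": "U",  # damma
    "\u0650": "I",  # kasra
}


def to_phones(word):
    """Spell an Arabic word as space-separated ASCII phones, one per character."""
    phones = []
    for ch in word:
        if ch not in ARABIC_TO_PHONE:
            raise ValueError(f"no phone defined for {ch!r} (U+{ord(ch):04X})")
        phones.append(ARABIC_TO_PHONE[ch])
    return " ".join(phones)


# Prints a dictionary line: the Arabic word followed by its ASCII phones.
print("مرحبا", to_phones("مرحبا"))
```

Raising an error on unknown characters, rather than skipping them, makes gaps in the mapping visible before they reach the dictionary.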

       
  • Seif Mostafa

    Seif Mostafa - 2017-08-09

    Yes, Nickolay, I know that, but can I use this, or will it be rejected?
    Thanks!

     
    • Nickolay V. Shmyrev

      Use English letters for phonemes.

       
  • Seif Mostafa

    Seif Mostafa - 2017-08-09

    Yes, but is that your opinion, or MUST I use English letters? And is there a tool I can pass a .txt file of my Arabic words to that outputs the phonemes?

     
    • Nickolay V. Shmyrev

      Is that your opinion, or MUST I use English letters?

      You must use English letters.

      And is there a tool I can pass a .txt file of my Arabic words to that outputs the phonemes?

      echo مرحبا | espeak -v ar -x --sep=" "
      مرحبا m r H b 'a a
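To turn a .txt file of words into dictionary lines, the same espeak call can be run once per word from a script. A minimal sketch in Python, assuming one word per line in the input file and espeak on PATH; `-q` stops espeak from producing audio, and the helper names are hypothetical. espeak prints its own phoneme mnemonics, which include stress marks (' and ,) and use case to tell phonemes apart (H above is not h), so the sketch only strips the stress marks; the resulting phones still need to be reviewed against your .phone file:

```python
import subprocess
import sys


def espeak_phonemes(word, voice="ar"):
    """Run espeak quietly and return its phoneme mnemonics for one word."""
    result = subprocess.run(
        # "--sep= " is the same argument as --sep=" " in the shell command above.
        ["espeak", "-q", "-v", voice, "-x", "--sep= ", word],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def to_dict_line(word, espeak_output):
    """Build a 'word phone phone ...' line, dropping espeak's stress marks."""
    tokens = [t.strip("',") for t in espeak_output.split()]
    return word + " " + " ".join(t for t in tokens if t)


def main(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip()
            if word:
                print(to_dict_line(word, espeak_phonemes(word)))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as espeak_dict.py, it could be run as `python3 espeak_dict.py words.txt > your_db.dic`. Calling espeak once per word keeps each word's phonemes on their own line, at the cost of one process per word.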
      
       

      Last edit: Nickolay V. Shmyrev 2017-08-11
  • Seif Mostafa

    Seif Mostafa - 2017-08-09

    Nickolay, can you please reply?

     
