Phone or word recognition?

Help
Anonymous
2003-02-20
2012-09-22
  • Anonymous

    Anonymous - 2003-02-20

    Hi,

    I want to use the Sphinx engine for integrity protection of speech data (detecting manipulations, etc.). My idea is to use the phonemes as the relevant features to protect the content (like a fingerprint).

    My questions:
    - How robust is allphone mode: is it independent of speaker? Of dictionary? Of language? What about the same speaker at a different time? ...
    - Should I use Sphinx 2 or 3?
    - Would it be better to use only the phonemes, or to use the phoneme transcription of the (dictionary-based) detected words?

    I know the allphone mode has been addressed several times before... but I really didn't get it.

    thx so much :-)

    Sascha

     
    • Ivan Uemlianin

      Ivan Uemlianin - 2003-02-21

      Allphone mode works fine. You don't need a dictionary or a language model. You do need a phoneset, and I imagine you'll want a fairly fine-grained phoneset for this job, which means you'll be training up your own acoustic model (if you know any good AMs 'out there' please tell the group :-).

      Are you interested in doing speaker recognition/differentiation? I wouldn't have thought phonemic transcription was fine-grained enough for speaker recognition. Similarly with detecting manipulations: wouldn't it be better to work with the audio data (e.g. look for unfeasible transitions in the spectral or cepstral files) ... unless you train 'phones' for each manipulation you're interested in ...
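A rough sketch of what "looking for unfeasible transitions" could mean in practice, assuming the cepstral frames have already been read into plain lists of floats. The function name `frame_jumps`, the median-based threshold and the `factor` value are illustrative assumptions, not anything Sphinx provides, and the frame values are made up.

```python
import math
from statistics import median

def frame_jumps(frames, factor=5.0):
    """Return indices i where the step from frame i-1 to frame i is unusually large.

    A step counts as "unusual" when its Euclidean length exceeds `factor`
    times the median step length (an assumed heuristic, not a Sphinx rule).
    """
    steps = [math.dist(a, b) for a, b in zip(frames, frames[1:])]
    if not steps:
        return []
    typical = median(steps)
    return [i + 1 for i, s in enumerate(steps) if s > 0 and s > factor * typical]

# Made-up 2-dimensional "cepstral" frames: smooth drift, then one abrupt jump.
frames = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0],
          [3.0, 0.0], [3.1, 0.0], [3.2, 0.0]]
suspicious = frame_jumps(frames)
```

Here `suspicious` holds the index of the frame where the cut would sit. Real speech also contains legitimate abrupt transitions (stops, silences), so a fixed threshold like this would need tuning on real data.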

       
      • Anonymous

        Anonymous - 2003-02-24

        thx Ivan,

        we didn't have speaker detection/differentiation in mind.
        The idea is to use content-dependent features. Manipulations of the content (e.g. cropping or reassembling words) should lead to a different feature extraction result, which reveals the location of the manipulation. Why I chose phonemes (or rather, want to try them):
        - Phonemic features may be more robust to (allowed) transformations like DA/AD conversion or audio compression than other, spectrum/cepstrum-based features.
        - Phonemic features offer a very(!) low payload description of the content.
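The fingerprint idea above can be sketched as a comparison of two phone sequences: one stored when the recording was protected, one decoded again later. This is only an illustration under assumptions: the phone labels are invented, `locate_changes` is a hypothetical helper, and a real recognizer would also produce recognition errors that this exact comparison does not tolerate.

```python
from difflib import SequenceMatcher

def locate_changes(reference, candidate):
    """Return the non-matching spans between two phone sequences.

    Each entry is (tag, ref_start, ref_end, cand_start, cand_end), where tag
    is 'delete', 'insert' or 'replace' as reported by difflib.
    """
    matcher = SequenceMatcher(a=reference, b=candidate, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]

# Invented phone strings for illustration only.
reference = ["HH", "AH", "L", "OW", "W", "ER", "L", "D"]
cropped = ["HH", "AH", "L", "OW", "L", "D"]  # two phones cut out

changes = locate_changes(reference, cropped)
```

For this input, `changes` contains a single 'delete' entry covering reference positions 4 to 6, i.e. the location of the cut. The stored sequence is small (a few symbols per second of speech), which matches the low-payload argument.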

        So: what about the AMs provided with Sphinx? Are they "good" enough? Should I use sphinx2-(all)phone or s3allphone?

         
