Kaldi Behind a REST API

Developers
fatlog
2015-02-24
2015-05-14
  • fatlog

    fatlog - 2015-02-24

    Hi,

    I am working on a project where we are looking to possibly put Kaldi behind a REST API.
    I was wondering if there are suggested/preferred ways of doing this.
    In the documentation it is suggested to use GridEngine to deploy Kaldi to work in parallel.
    Would this be considered "production" ready? i.e. Could you deploy Kaldi on GridEngine and build a REST API to wrap it?
    Or are you better off just having X number of slaves, each with their own version of Kaldi and wrap this with an API?
    Is there any reason not to use GridEngine in a production environment?

    I suppose what I am really looking for here is advice on how best to deploy Kaldi in a cloud type environment so that it can handle a large volume of requests and is both scalable and performant.

    Thanks
    Robert

     
    • Jan "yenda" Trmal

      I'd say the Kaldi SGE interface is production ready. It is used on a
      daily basis at many research and commercial sites.

      SGE itself is not being actively developed (AFAIK) -- Sun Grid Engine was
      bought by Oracle (and it was renamed to Oracle Grid Engine). Oracle, for
      some reason, lost interest in it and either gave it or sold it or
      transferred the rights to Univa, so now there is Univa Grid Engine. Univa
      is your company if you want/need commercial support -- I think they provide
      support for SGE and OGE as well.
      The SGE predecessor was PBS (Portable Batch System) and OpenPBS (I'm not
      sure of their relationship). There is an open source project, Torque,
      which builds on top of the OpenPBS codebase to add new features and
      bugfixes.
      I'm mentioning it because I believe the job submission interface is largely
      the same, so the Kaldi SGE interface could work on OpenPBS/Torque as well.
      I'm stressing the 'could'; I have no experience with that. Perhaps someone
      else could confirm or refute this?

      For some reason, a lot of HPC sites are using SLURM nowadays. We have a SLURM
      interface as well. It's not used that much (as far as I know it is/was used
      at ICSI), but SLURM itself is actively developed and you can buy commercial
      support for it as well.

      y.

       
      • RickyChan

        RickyChan - 2015-05-14

        To make the job submission interface (I guess you meant queue.pl) work on PBS (Portable Batch System), a few minimal changes are required:

        variable: $SGE_TASK_ID -> $PBS_ARRAYID
        qsub: qsub in PBS doesn't support "-cwd" and "-j y"; just remove them when submitting jobs on PBS
        qstat command: qstat -j $sge_job_id -> qstat -t $sge_job_id

        cue to look for when processing $queue_logfile in queue.pl:
        SGE qsub output -> Your job job_id *** has been submitted
        PBS qsub output -> job_id
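
The differences listed above can be sketched in a few lines of Python. This is only an illustration of the SGE-to-PBS mapping, not a patch to queue.pl (which is written in Perl). The function names are hypothetical, the SGE message format is taken from the line quoted above, and the handling of array-job messages is an assumption.

```python
import re

# Follows the SGE message quoted above: "Your job <id> ... has been submitted".
# The optional "-array" prefix and ".<range>" suffix are assumptions about
# how array jobs are reported.
_SGE_SUBMIT_RE = re.compile(r"Your job(?:-array)? (\d+)[.\s].*has been submitted")


def parse_job_id(qsub_output, scheduler):
    """Extract the job id from the text that qsub prints on submission."""
    text = qsub_output.strip()
    if scheduler == "sge":
        match = _SGE_SUBMIT_RE.search(text)
        if match is None:
            raise ValueError("unrecognised SGE qsub output: %r" % text)
        return match.group(1)
    if scheduler == "pbs":
        # Per the post above, PBS qsub prints just the job id.
        if not text:
            raise ValueError("empty PBS qsub output")
        return text.splitlines()[-1].strip()
    raise ValueError("unknown scheduler: %r" % scheduler)


def array_task_id(env):
    """Return the array task index from SGE_TASK_ID or PBS_ARRAYID, else None."""
    for name in ("SGE_TASK_ID", "PBS_ARRAYID"):
        value = env.get(name)
        # Non-numeric values (e.g. a placeholder on non-array jobs) are ignored.
        if value and value.isdigit():
            return int(value)
    return None


def strip_sge_only_flags(args):
    """Drop "-cwd" and "-j y", which the post above says PBS qsub rejects."""
    out = []
    i = 0
    while i < len(args):
        if args[i] == "-cwd":
            i += 1
        elif args[i] == "-j" and i + 1 < len(args) and args[i + 1] == "y":
            i += 2
        else:
            out.append(args[i])
            i += 1
    return out
```

Inside a running job, the task index would be read with `array_task_id(os.environ)` after `import os`. Since "-cwd" is dropped on PBS, the submitted script probably needs to change into the working directory itself.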

         
  • fatlog

    fatlog - 2015-02-24

    Hi Jan,

    Thanks for the feedback and advice.
    I have been playing around with StarCluster to manage cluster deployment with SGE and it seems fairly stable to me at present. I'll continue to investigate this.

    Separately, would you have any other suggestions on how you would deploy Kaldi in a cloud type environment?

    Are there specific ways of deploying it, or am I realistically looking at spinning up an AWS instance, for example (possibly from a custom image with Kaldi pre-installed), building a web service wrapper on top of it, and managing instance lists in the application itself?

    Thanks
    Robert

     
    • Jan "yenda" Trmal

      I think StarCluster is a good solution. There is Rocks cluster as well, but
      afaik it does not have the AWS dynamicity. Dan wrote some scripts and
      documentation dealing with Amazon EC2 and SGE and Kaldi -- you can find it
      here: https://sourceforge.net/projects/kluster/
      But I think StarCluster has much of the functionality already built in.

      After that, you are on your own, I guess. There is something called
      DRMAA. I don't know the meaning of the abbreviation, but you can see it as
      a "job API", which allows you to submit jobs and query their
      properties without running shell commands. Both SGE and Torque (and I
      think SLURM as well) support that API (again, I don't know how complete or
      how good the support is), but I think this might be the way to go -- there
      are bindings for most scripting languages, including PHP and Java, I
      think, so you could avoid some of the complexities of using shell calls in
      your setup.

      y.
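
As a rough illustration of the "job API" idea described above, here is a minimal Python sketch. It assumes the third-party `drmaa` package (Python bindings for DRMAA) and a scheduler that ships a DRMAA library; neither is mentioned in the thread. The helper name `run_job_via_drmaa` and the script path in the usage comment are hypothetical. The session is passed in as an argument so the function can be tried without a cluster.

```python
def run_job_via_drmaa(session, command, args, timeout=-1):
    """Submit one job through a DRMAA session and block until it finishes.

    `session` is expected to behave like drmaa.Session from the drmaa
    package. A timeout of -1 is assumed to match
    drmaa.Session.TIMEOUT_WAIT_FOREVER.
    Returns (job_id, exit_status).
    """
    template = session.createJobTemplate()
    try:
        template.remoteCommand = command  # executable to run on the cluster
        template.args = list(args)        # its command-line arguments
        job_id = session.runJob(template)
        info = session.wait(job_id, timeout)
    finally:
        # Always release the template, even if submission or waiting fails.
        session.deleteJobTemplate(template)
    return job_id, info.exitStatus


# Hypothetical usage on a machine with a DRMAA-enabled scheduler:
#
#   import drmaa
#   with drmaa.Session() as session:
#       job_id, status = run_job_via_drmaa(
#           session, "/path/to/decode.sh", ["utt1.wav"])
```

A REST handler could call a helper like this instead of shelling out to qsub and qstat, though a real service would more likely submit the job, return its id immediately, and poll for completion rather than block the request.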


       
  • fatlog

    fatlog - 2015-02-24

    Great, thanks for all the help.
    Really appreciate it.