I am working on a project where we are looking to possibly put Kaldi behind a REST API.
I was wondering if there are suggested/preferred ways of doing this.
The documentation suggests using GridEngine to run Kaldi in parallel.
Would this be considered "production ready"? I.e., could you deploy Kaldi on GridEngine and build a REST API to wrap it?
Or are you better off just having X number of slaves, each with their own instance of Kaldi, and wrapping that with an API?
Is there any reason not to use GridEngine in a production environment?
I suppose what I am really looking for here is advice on how best to deploy Kaldi in a cloud-type environment so that it can handle a large volume of requests and is both scalable and performant.
Thanks
Robert
I'd say the Kaldi SGE interface is production ready. It is used on a daily basis at many research and commercial sites.
SGE itself is not being actively developed (AFAIK) -- Sun Grid Engine was bought by Oracle (and renamed to Oracle Grid Engine). Oracle, for some reason, lost interest in it and either gave, sold, or otherwise transferred the rights to Univa, so now there is Univa Grid Engine. Univa is the company to talk to if you want/need commercial support -- I think they provide support for SGE and OGE as well.
SGE's predecessors were PBS (Portable Batch System) and OpenPBS (I'm not sure of their exact relationship). There is an open source project, Torque, which builds on top of the OpenPBS codebase to add new features and bug fixes.
I'm mentioning it because I believe the job submission interface is largely the same, so the Kaldi SGE interface could work on OpenPBS/Torque as well. I'm stressing the 'could' -- I have no experience with that. Perhaps someone else could confirm or deny this?
For some reason, a lot of HPC sites use SLURM nowadays. We have a SLURM interface as well. It's not used that much (as far as I know it is/was used at ICSI), but SLURM itself is actively developed and you can buy commercial support for it as well.
To let the job submission interface (I guess you meant queue.pl) work on PBS (Portable Batch System), only a few minimal changes are required:
- task-index variable: $SGE_TASK_ID -> $PBS_ARRAYID
- qsub flags: qsub in PBS doesn't support "-cwd" and "-j y"; just remove them when jobs are submitted on PBS
- qstat command: qstat -j $sge_job_id -> qstat -t $sge_job_id
- the cue queue.pl looks for when parsing $queue_logfile:
  SGE qsub output -> Your job job_id *** has been submitted
  PBS qsub output -> job_id
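To make those differences concrete, here is a minimal, hypothetical sketch in Python (not actual Kaldi code -- the real logic lives in queue.pl, which is Perl) of how a submission wrapper might branch between the two schedulers. The flag handling and output parsing follow the points above; the -t array syntax is assumed to work on both:

```python
import subprocess

def submit_array_job(script, num_jobs, backend="sge"):
    # Hypothetical sketch of the SGE/PBS differences listed above;
    # Kaldi's real submission logic lives in queue.pl and is more involved.
    if backend == "sge":
        # SGE: -cwd runs the job in the current directory, -j y merges
        # stderr into stdout; tasks see their index in $SGE_TASK_ID.
        cmd = ["qsub", "-cwd", "-j", "y", "-t", f"1-{num_jobs}", script]
    elif backend == "pbs":
        # PBS/Torque: no -cwd or -j y; tasks see their index in $PBS_ARRAYID.
        cmd = ["qsub", "-t", f"1-{num_jobs}", script]
    else:
        raise ValueError(f"unknown backend: {backend}")

    out = subprocess.run(cmd, capture_output=True, text=True,
                         check=True).stdout
    if backend == "sge":
        # SGE prints e.g. 'Your job-array 12345.1-8:1 ("job.sh") has been
        # submitted'; the job id is the third whitespace-separated token.
        return out.split()[2]
    # PBS prints just the job id on a line by itself.
    return out.strip()
```

Polling then differs the same way: qstat -j $job_id on SGE versus qstat -t $job_id on PBS/Torque.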
Thanks for the feedback and advice.
I have been playing around with StarCluster to manage cluster deployment with SGE and it seems fairly stable to me at present. I'll continue to investigate this.
Separately, would you have any other suggestions on how you would deploy Kaldi in a cloud-type environment?
Are there specific ways of deploying it, or am I realistically looking at spinning up AWS instances, for example (possibly from a custom image with Kaldi pre-installed), building a web service wrapper on top of this, and managing the instance list in the application itself?
Thanks
Robert
I think StarCluster is a good solution. There is Rocks Cluster as well, but AFAIK it does not have the AWS dynamicity. Dan wrote some scripts and documentation dealing with Amazon EC2, SGE, and Kaldi -- you can find them here: https://sourceforge.net/projects/kluster/
But I think StarCluster has much of that functionality already built in.
After that, you are on your own, I guess. There is something called DRMAA. I don't know the meaning of the abbreviation, but you can see it as a "job API" which allows you to submit jobs and query their properties without running shell commands. Both SGE and Torque (and I think SLURM as well) support that API (again, I don't know how complete or how good the support is), but I think this might be the way to go -- there are bindings for most scripting languages, including PHP and Java, I think, so you could avoid some of the complexities of shell calls in your setup.
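(For the record, DRMAA stands for Distributed Resource Management Application API.) As a rough illustration of the "job API" idea, here is a minimal sketch using the python-drmaa bindings -- the decoding script and its arguments are made up for the example, and it assumes a scheduler with working DRMAA support and libdrmaa installed:

```python
import drmaa

# Minimal sketch: submit one job through DRMAA and wait for it, without
# shelling out to qsub/qstat. Script name and args are hypothetical.
with drmaa.Session() as session:
    jt = session.createJobTemplate()
    jt.remoteCommand = "./decode.sh"          # hypothetical Kaldi wrapper
    jt.args = ["--config", "decode.conf"]
    jt.joinFiles = True                       # merge stderr into stdout

    job_id = session.runJob(jt)
    print(f"submitted job {job_id}")

    # Block until the job finishes, then inspect its exit status.
    info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print(f"job {info.jobId} exited with status {info.exitStatus}")

    session.deleteJobTemplate(jt)
```

The same template/submit/wait pattern should carry over between SGE, Torque, and SLURM back ends, to whatever extent each one's DRMAA support is complete, which is what makes it attractive as the back end of a REST wrapper.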
Great, thanks for all the help.
Really appreciate it.
Guys, in my mind SGE is something that's useful for training models, but
you'd probably want a completely different solution for scalable
recognition in a cloud environment.
Dan