From: Sean T. <se...@se...> - 2015-02-16 14:18:44
I wanted to bring up the integration of the very useful kaldi_io package
that Jan Chorowski made available in December. Is there any consensus on
whether to provide this code as (probably an optional) part of the Kaldi
release? I understand that Boost-Python is a relatively heavy requirement,
but it is easily available on OSX and Linux.
I continue to wrap the executables themselves in Python function wrappers,
which has made integration with other software easier and contributes to
pipeline testability and robustness.
-- Sean
On Fri, Dec 26, 2014 at 11:02 AM, Sean True <se...@se...> wrote:
> I wanted to echo Ondrej's comment about preferring Python to bash/Perl for
> scripting. Python wrappers for the command-line utilities are useful, and
> I've spent a few hours systematically wrapping them, using the output of
> each tool's --help option as a guide to its functionality.
>
> This gives wrappers of the general form:
>
> import sh
>
> # kaldi_path() and create_options() are helpers defined elsewhere in the
> # wrapper module (not shown in this message).
>
> def acc_lda(transition_gmm_model, features_rspecifier,
>             posteriors_rspecifier, lda_acc_out, *args, **kwargs):
>     """Accumulate LDA statistics based on pdf-ids.
>
>     Executable usage: acc-lda [options] <transition-gmm/model>
>         <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>
>     Options:
>         binary: Write accumulators in binary mode. (bool,true)
>         rand_prune: Randomized pruning threshold for posteriors (float,0)
>     """
>     cmd = sh.Command(kaldi_path("src/bin/acc-lda"))
>     option_defs = {
>         'binary': ('binary', 'bool', 'true'),
>         'help': ('help', 'bool', 'false'),
>         'rand_prune': ('rand-prune', 'float', '0'),
>         'config': ('config', 'string', ''),
>         'print_args': ('print-args', 'bool', 'true'),
>         'verbose': ('verbose', 'int', '0'),
>     }
>     my_options = create_options(option_defs, kwargs)
>     my_args = [transition_gmm_model, features_rspecifier,
>                posteriors_rspecifier, lda_acc_out] + list(args)
>     return cmd(*(my_options + my_args))
>
> There are some refinements that could be added (*args does not make sense
> for this function).
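The create_options helper called in acc_lda is not shown in the message. A minimal sketch of what it might do, assuming it only maps keyword arguments onto Kaldi-style --flag=value strings via the option_defs table (hypothetical, not Sean's actual implementation):

```python
def create_options(option_defs, kwargs):
    """Turn keyword arguments into Kaldi-style '--flag=value' strings.

    Hypothetical sketch: option_defs maps Python-friendly names to
    (kaldi_flag, type, default) tuples, the same shape as in acc_lda.
    """
    options = []
    for name, value in kwargs.items():
        if name not in option_defs:
            raise TypeError("unknown option: %s" % name)
        flag, opt_type, _default = option_defs[name]
        if opt_type == "bool" and isinstance(value, bool):
            # Kaldi spells booleans as true/false on the command line
            value = "true" if value else "false"
        options.append("--%s=%s" % (flag, value))
    return options
```

With this sketch, acc_lda(..., binary=False) would pass --binary=false to acc-lda, and options that are not given are left out so the executable's own defaults apply.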
> Because of the rather elegant Python sh package
> (https://pypi.python.org/pypi/sh), these functions will create pipelines
> when composed:
>
> >>> from sh import ls, wc
> >>> wc(ls("."))
> 8 23 222
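For readers without the sh package, roughly the same composition can be written with the standard library's subprocess module. This is a swapped-in technique, not how sh works internally, and pipe is a hypothetical helper name:

```python
import subprocess

def pipe(*commands):
    """Run argv lists as a pipeline (cmd1 | cmd2 | ...) and return the last
    command's stdout as bytes. Raises CalledProcessError if any stage fails."""
    procs = []
    prev_stdout = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=prev_stdout, stdout=subprocess.PIPE)
        if prev_stdout is not None:
            # Close our copy so the upstream stage gets SIGPIPE if this one exits
            prev_stdout.close()
        procs.append(proc)
        prev_stdout = proc.stdout
    output = procs[-1].communicate()[0]
    # Check every stage, not just the last one
    for argv, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, argv)
    return output
```

For instance, pipe(["ls", "."], ["wc"]) corresponds to the wc(ls(".")) call above. Unlike a plain shell pipeline, it raises CalledProcessError if any stage exits with a non-zero status.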
>
> There are a few places where constructing from help output is not
> straightforward (for instance, fstrand --help does
> not do the expected thing).
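A minimal sketch of the help-parsing step, assuming each option is printed on one line in the form "--name : description (type, default = value)"; parse_help_options is a hypothetical name, and its output has the same shape as the option_defs dict in acc_lda:

```python
import re

# Assumed layout of one option line in a Kaldi tool's --help output, e.g.
#   "  --binary     : Write accumulators in binary mode. (bool, default = true)"
# The greedy ".*" makes the "(type, default = ...)" group bind to the LAST
# parenthesis on the line, so descriptions containing "(...)" still parse.
OPTION_RE = re.compile(r"^\s*--([\w-]+)\s*:.*\((\w+), default = (.*)\)\s*$")

def parse_help_options(help_text):
    """Map Python-friendly names to (kaldi_flag, type, default) tuples."""
    option_defs = {}
    for line in help_text.splitlines():
        m = OPTION_RE.match(line)
        if m is None:
            continue  # usage line, section header, or anything unexpected
        flag, opt_type, default = m.groups()
        # String defaults are printed in quotes, e.g. default = ""
        default = default.strip().strip('"')
        option_defs[flag.replace("-", "_")] = (flag, opt_type, default)
    return option_defs
```

Lines that do not match the assumed layout are silently skipped, so a tool whose --help output looks different (as with fstrand) would simply produce an empty or partial table.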
>
> -- Sean
>
> On Fri, Dec 19, 2014 at 6:48 AM, Ondrej Platek <ond...@gm...> wrote:
> >
> > Hi Matthew,
> >
> > I made some subjective comments below.
> >
> > PS: Note that I like the proposed wrappers, but I am not sure how easy
> > boost::python is to install on all supported platforms.
> >
> > On Fri, Dec 19, 2014 at 9:30 AM, Matthew Aylett <mat...@gm...> wrote:
> >>
> >> Hi
> >>
> >> Apologies, I've been snowed under here.
> >>
> >> I haven't had a chance to look over your work. I also don't have any
> >> views on the 'right' way to do it. My thoughts on this are in a previous
> >> thread, subject "Using SWIG to wrap kaldi for python", where I discussed
> >> this with Ondrej Platek and Vassil Panayotov.
> >>
> >> In the idlak branch there is an example of Python wrappers that I put
> >> together some time ago, based on SWIG. In the end I didn't need them at
> >> this stage, because in the build system the command-line executables
> >> work very well. It's at run time that wrappers are very useful. The
> >> advantage of SWIG is that much of the same work also contributes to C#,
> >> Java and Perl wrappers. In my experience the most important were the
> >> Java wrappers, to help produce a library for Android. I have no
> >> experience with C#, and I moved to Python from Perl, so I only use Perl
> >> in legacy code ;-).
> >>
> >> So some questions to consider:
> >>
> >> 1. Why is Python wrapping required for training? Using subprocess to
> >> run command lines, structured output directories, etc. mirrors the
> >> current Perl recipes; what is the added benefit in this case?
> >
> > Well, bash and Perl are the current scripting languages for Kaldi. I,
> > for example, prefer to use Python instead of both of them.
> >
> >>
> >> 2. If it's for run-time decoding, shouldn't we create a cross-platform C
> >> API? Perhaps things have changed, but C++ APIs were never
> >> cross-compiler compatible in the past, so you couldn't do things like
> >> compile using GNU and link in MSVC. With a C interface you can
> >> distribute libraries. But I am possibly out of date on this.
> >
> > Well, I tried that and gave up, since Kaldi makes good use of OpenFST
> > and I was not able to wrap OpenFST with just plain C (it may be possible).
> > I used Cython and pyfst, mainly because pyfst already solved wrapping
> > OpenFST for me, and I am really glad that 99% of the work of wrapping the
> > OpenFST templates was carried out by somebody else (Victor Chahuneau).
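To illustrate the C-interface point: a library that exports plain C functions can be called from Python with the standard ctypes module, without a wrapper generator. A minimal sketch using libc's strlen as a stand-in, assuming a POSIX system; a Kaldi C API, if one existed, would be declared the same way, whereas templated C++ such as OpenFST offers no such flat symbols to load:

```python
import ctypes
import ctypes.util

# Load the C standard library. find_library("c") usually resolves it on
# Linux/macOS; if it returns None, CDLL(None) falls back to the symbols
# already loaded into the current process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# A flat C function only needs its argument and return types declared;
# no SWIG, Boost.Python or Cython build step is involved.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(data):
    """Call libc's strlen on a bytes object."""
    return libc.strlen(data)
```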
> >>
> >>
> >> 3. If 2 is correct, shouldn't we define our API and wrap that, producing
> >> a formal list of the functionality that should be exposed to things like
> >> client and server applications?
> >>
> >>
> >> I would encourage some care here. Unconstrained wrapping can lead to
> >> systems which HAVE to use the scripting language (we can already see how
> >> difficult it is to move away from the Perl scripting if you wish to).
> >> Also never, never, never reverse wrap (i.e. call Python from within
> >> C++); yes, it can be done, but that way lies madness.
> >>
> >> v best
> >>
> >> Matthew
> >>
> >>
> >> On Thu, Dec 18, 2014 at 11:37 PM, Daniel Povey <dp...@gm...> wrote:
> >>>
> >>> Jan-
> >>> I haven't seen any objections to your setup. I'd say we should plan
> >>> to include it in Kaldi at some point (e.g. within the next few
> >>> months), but in the meantime hopefully you can continue to work on it,
> >>> and maybe come up with some other examples of how it's useful to do
> >>> the interfacing with Python- e.g. some kind of application level or
> >>> service-level thing?
> >>> Dan
> >>>
> >>>
> >>> On Sat, Dec 13, 2014 at 4:01 PM, Yajie Miao <yaj...@gm...> wrote:
> >>> > Hi Jan,
> >>> > This is very nice work! In our PDNN toolkit, we also have simple Python
> >>> > wrappers to read and write Kaldi features, mainly for DNN training.
> >>> > Your implementation looks like a more comprehensive version.
> >>> >
> >>> > Do you have functions/commands to do feature splicing? I ask because
> >>> > we found splicing on the fly in Python highly expensive. That's why we
> >>> > still stick to PFiles instead of Kaldi features (.scp/.ark) for DNN
> >>> > training. I am very interested to know the efficiency of your
> >>> > splicing implementation.
> >>> >
> >>> > Thanks,
> >>> > Yajie
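On the splicing question: splicing can be vectorized in NumPy as a single gather over a matrix of clamped frame indices, rather than a Python loop over frames. A minimal sketch, assuming edge frames are repeated at utterance boundaries; splice_frames is a hypothetical helper, not part of Jan's wrapper or of PDNN:

```python
import numpy as np

def splice_frames(feats, left, right):
    """Stack each frame with `left` preceding and `right` following frames.

    feats: (num_frames, dim) array. Returns (num_frames, (left+right+1)*dim).
    Out-of-range neighbours are clamped to the first/last frame.
    """
    num_frames = feats.shape[0]
    offsets = np.arange(-left, right + 1)
    # (num_frames, context) matrix of source-frame indices, clamped at edges
    idx = np.clip(np.arange(num_frames)[:, None] + offsets[None, :],
                  0, num_frames - 1)
    # One fancy-indexing gather gives (num_frames, context, dim); flatten
    # the context and feature axes into one spliced vector per frame.
    return feats[idx].reshape(num_frames, -1)
```

The result is (left + right + 1) times larger than the input, so for large corpora it may still be preferable to splice per minibatch rather than for a whole archive at once.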
> >>> >
> >>> > On Sat, Dec 13, 2014 at 5:59 PM, Daniel Povey <dp...@gm...> wrote:
> >>> >>
> >>> >> OK, thanks.
> >>> >> cc'ing Yajie in case he wants to comment.
> >>> >> Dan
> >>> >>
> >>> >>
> >>> >> On Sat, Dec 13, 2014 at 2:31 PM, Jan Chorowski <jan...@gm...> wrote:
> >>> >> > Hi All,
> >>> >> >
> >>> >> > the wrapper is built during Kaldi compilation. I build it using the
> >>> >> > provided Makefile. The build depends on:
> >>> >> > 1. Python and numpy (by default it queries the Python interpreter
> >>> >> > found on the path for the header file location).
> >>> >> > 2. Boost with the Boost::Python library. It is quite heavy to build,
> >>> >> > but most Linux distributions ship it. Boost.Python doesn't require
> >>> >> > any code generation steps; the wrapper is defined in a normal C++
> >>> >> > code file.
> >>> >> >
> >>> >> > During the build, the Python and Boost libraries and the Kaldi object
> >>> >> > files are linked into a CPython extension module,
> >>> >> > kaldi/src/python/kaldi_io_internal.so. It works with both static and
> >>> >> > shared Kaldi builds. Further usage requires that Python finds
> >>> >> > kaldi_io.py and kaldi_io_internal.so on the PYTHONPATH; for example,
> >>> >> > the directory can be added to the PYTHONPATH variable in the path.sh
> >>> >> > script of a recipe.
> >>> >> >
> >>> >> > Jan
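The PYTHONPATH requirement can also be met from inside a script. A minimal sketch, assuming the module directory is kaldi/src/python as described above and that KALDI_ROOT is exported (as recipe path.sh files usually do); add_kaldi_python_path is a hypothetical helper:

```python
import os
import sys

def add_kaldi_python_path(kaldi_root):
    """Prepend <kaldi_root>/src/python to sys.path (hypothetical helper).

    This is the in-process equivalent of exporting PYTHONPATH in path.sh,
    so that kaldi_io.py and kaldi_io_internal.so become importable.
    """
    python_dir = os.path.join(os.path.abspath(kaldi_root), "src", "python")
    if python_dir not in sys.path:
        sys.path.insert(0, python_dir)
    return python_dir
```

A script would then call add_kaldi_python_path(os.environ["KALDI_ROOT"]) before import kaldi_io.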
> >>> >> >
> >>> >> >
> >>> >> > On 12/13/2014 3:33 PM, Daniel Povey wrote:
> >>> >> >>
> >>> >> >> Also, Jan- could you send us an email explaining how this works?
> >>> >> >> How does Python "see" the C++ headers? Do you have to invoke some
> >>> >> >> special program, like swig? Do you have to write some special kind
> >>> >> >> of header that shows how the C++ objects are to be interpreted by
> >>> >> >> Python? A brief example would be helpful, if so.
> >>> >> >> How is the resulting program linked, if at all? If you require
> >>> >> >> functions from C++ libraries, are these obtained from the .a or .so
> >>> >> >> files at runtime, or compiled into some kind of executable-like blob
> >>> >> >> at compile time? Does your framework require that Kaldi be compiled
> >>> >> >> using dynamic (.so) libraries?
> >>> >> >>
> >>> >> >> Dan
> >>> >> >>
> >>> >> >>
> >>> >> >> On Sat, Dec 13, 2014 at 12:04 PM, Jan Chorowski <jan...@gm...> wrote:
> >>> >> >>>
> >>> >> >>> Hello Dan,
> >>> >> >>>
> >>> >> >>> thank you for the comments. I tried to make it in the Kaldi spirit;
> >>> >> >>> consistency is important. Of course, the scripts can be removed and
> >>> >> >>> replaced with some more useful examples. I don't have much
> >>> >> >>> experience with bridging Python to C++, so any critique of the
> >>> >> >>> wrappers and the approach taken is welcome.
> >>> >> >>>
> >>> >> >>> Jan
> >>> >> >>>
> >>> >> >>>
> >>> >> >>> On 12/13/2014 2:55 PM, Daniel Povey wrote:
> >>> >> >>>>
> >>> >> >>>> Hi all.
> >>> >> >>>> From a first look, it does look very impressive, and nicely
> >>> >> >>>> documented. I would appreciate it if people on the list who have
> >>> >> >>>> Python experience would comment on this; you can reply either to
> >>> >> >>>> this thread or to me. I don't know if this has been done in the
> >>> >> >>>> "natural" way, or if there is some reason why people in the future
> >>> >> >>>> will say, "why did you do it this way, you should have done XXX".
> >>> >> >>>>
> >>> >> >>>> Jan:
> >>> >> >>>> in the scripts/ directory you seem to have some examples of how you
> >>> >> >>>> can create Python programs that behave very much like Kaldi
> >>> >> >>>> command-line programs, using your framework. This is very useful.
> >>> >> >>>> However, the programs
> >>> >> >>>>   apply-global-cmvn.py
> >>> >> >>>>   compute-global-cmvn-stats.py
> >>> >> >>>> are perhaps a little confusing, because they provide the same
> >>> >> >>>> functionality that you could get with "compute-cmvn-stats ->
> >>> >> >>>> matrix-sum" and "apply-cmvn" on the output of that command, and
> >>> >> >>>> they do so using different formats for the CMVN information. I know
> >>> >> >>>> the format of storing the CMVN stats in a two-row matrix is perhaps
> >>> >> >>>> not perfectly ideal, but it's a standard within Kaldi and it would
> >>> >> >>>> be confusing to deviate from that standard.
> >>> >> >>>> Of course, this is a very minor issue that doesn't affect the
> >>> >> >>>> validity of the framework as a whole. I am just pointing this out;
> >>> >> >>>> the main discussion should be about the framework and whether
> >>> >> >>>> people feel it's the "right" way to do this.
> >>> >> >>>>
> >>> >> >>>> Dan
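For reference, the two-row convention Dan mentions: Kaldi stores CMVN statistics as a 2 x (dim+1) matrix whose first row holds the per-dimension feature sums followed by the frame count, and whose second row holds the sums of squares followed by 0. Since the entries are plain sums, adding per-utterance matrices (as matrix-sum does) gives global statistics. A NumPy sketch of that convention; cmvn_stats and apply_cmvn are hypothetical helpers, not kaldi_io functions:

```python
import numpy as np

def cmvn_stats(feats):
    """Accumulate stats in the 2 x (dim+1) layout:
    row 0 = [sum of features..., frame count], row 1 = [sum of squares..., 0]."""
    num_frames, dim = feats.shape
    stats = np.zeros((2, dim + 1))
    stats[0, :dim] = feats.sum(axis=0)
    stats[0, dim] = num_frames
    stats[1, :dim] = (feats ** 2).sum(axis=0)
    return stats

def apply_cmvn(feats, stats, norm_vars=False):
    """Subtract the mean and optionally divide by the standard deviation.
    norm_vars mirrors apply-cmvn's --norm-vars option (off by default)."""
    dim = stats.shape[1] - 1
    count = stats[0, dim]
    mean = stats[0, :dim] / count
    out = feats - mean
    if norm_vars:
        var = stats[1, :dim] / count - mean ** 2  # E[x^2] - E[x]^2
        out = out / np.sqrt(var)
    return out
```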
> >>> >> >>>>
> >>> >> >>>> On Sat, Dec 13, 2014 at 6:28 AM, Jan Chorowski <jan...@gm...> wrote:
> >>> >> >>>>>
> >>> >> >>>>> Hi all!
> >>> >> >>>>>
> >>> >> >>>>> I've written wrappers to access Kaldi data files from within
> >>> >> >>>>> Python using boost::python (the code is on github:
> >>> >> >>>>> https://github.com/janchorowski/kaldi-git/tree/python/src/python).
> >>> >> >>>>> If you think this would be an interesting addition, please
> >>> >> >>>>> instruct me how to contribute.
> >>> >> >>>>>
> >>> >> >>>>> Best Regards,
> >>> >> >>>>> Jan Chorowski
> >>> >> >>>>> _______________________________________________
> >>> >> >>>>> Kaldi-developers mailing list
> >>> >> >>>>> Kal...@li...
> >>> >> >>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
> >
> > --
> > Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, ond...@gm...
> >
>