From: Sean T. <se...@se...> - 2014-12-29 06:52:20
I wanted to echo Ondrej's comment about preferring Python to bash/perl for
scripting. Python wrappers for the command line utilities are useful ...
I've spent a few hours systematically wrapping them, parsing the output of
the --help command as a guide to functionality.
This gives wrappers of the general form:
def acc_lda(transition_gmm_model, features_rspecifier,
            posteriors_rspecifier, lda_acc_out, *args, **kwargs):
    """Accumulate LDA statistics based on pdf-ids.

    Executable usage: acc-lda [options] <transition-gmm/model>
        <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>

    Options:
        binary: Write accumulators in binary mode. (bool,true)
        rand_prune: Randomized pruning threshold for posteriors (float,0)
    """
    cmd = sh.Command(kaldi_path("src/bin/acc-lda"))
    option_defs = {
        'binary': ('binary', 'bool', 'true'),
        'help': ('help', 'bool', 'false'),
        'rand_prune': ('rand-prune', 'float', '0'),
        'config': ('config', 'string', ''),
        'print_args': ('print-args', 'bool', 'true'),
        'verbose': ('verbose', 'int', '0'),
    }
    myOptions = create_options(option_defs, kwargs)
    myArgs = [transition_gmm_model, features_rspecifier,
              posteriors_rspecifier, lda_acc_out] + list(args)
    return cmd(myOptions + myArgs)
There are some refinements that could be added (*args does not make sense
for this function).
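The create_options helper called in the wrapper is not shown. A minimal
sketch of what such a helper might look like follows. Its behaviour is an
assumption, not the actual implementation: it takes the same
(flag, type, default) tuples as option_defs, emits Kaldi-style --flag=value
strings, and rejects keywords that are not in option_defs.

```python
def create_options(option_defs, kwargs):
    """Hypothetical helper: turn keyword arguments into '--flag=value' strings.

    option_defs maps a Python keyword to (flag_name, type_name, default),
    in the same shape as the option_defs dict of the wrapper.
    """
    options = []
    for key, value in sorted(kwargs.items()):
        if key not in option_defs:
            raise TypeError("unknown option: %r" % (key,))
        flag, type_name, _default = option_defs[key]
        if type_name == 'bool' and isinstance(value, bool):
            # Command-line booleans are spelled in lowercase.
            value = 'true' if value else 'false'
        options.append('--%s=%s' % (flag, value))
    return options


example_defs = {
    'binary': ('binary', 'bool', 'true'),
    'rand_prune': ('rand-prune', 'float', '0'),
}
print(create_options(example_defs, {'rand_prune': 0.5, 'binary': False}))
# prints ['--binary=false', '--rand-prune=0.5']
```

Options that are not passed are simply omitted, so the executable falls back
to its own defaults.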
Because of the rather elegant Python sh package
(https://pypi.python.org/pypi/sh), these functions will create pipelines if
composed:
>>> from sh import ls, wc
>>> wc(ls("."))
8 23 222
There are a few places where constructing from help output is not
straightforward (for instance, fstrand --help does
not do the expected thing).
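
As a rough illustration of using --help output as a guide, the sketch below
extracts option definitions from help text. It assumes option lines of the
form "--name : description (type, default = value)". The sample text is
hand-written for illustration, not captured output, and parse_help_options
is a hypothetical name.

```python
import re

# Assumed line shape: "  --rand-prune : description (float, default = 0)"
OPTION_RE = re.compile(
    r'^\s*--(?P<flag>[\w-]+)\s*:\s*(?P<doc>.*)'
    r'\((?P<type>\w+),\s*default\s*=\s*(?P<default>.*)\)\s*$'
)


def parse_help_options(help_text):
    """Return {python_name: (flag, type, default)} for each option line."""
    option_defs = {}
    for line in help_text.splitlines():
        match = OPTION_RE.match(line)
        if match is None:
            continue  # not an option line (usage text, headings, blanks)
        flag = match.group('flag')
        default = match.group('default').strip().strip('"')
        # Python keywords cannot contain '-', so map it to '_'.
        option_defs[flag.replace('-', '_')] = (flag, match.group('type'), default)
    return option_defs


# Illustrative help text only; the real output of a tool may differ.
SAMPLE_HELP = '''
Options:
  --binary       : Write accumulators in binary mode. (bool, default = true)
  --rand-prune   : Randomized pruning threshold for posteriors (float, default = 0)
  --config       : Configuration file to read (string, default = "")
'''

for name, definition in sorted(parse_help_options(SAMPLE_HELP).items()):
    print(name, definition)
```

Tools whose help output does not follow this shape would yield an empty
dictionary here and need handling by hand.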
-- Sean
On Fri, Dec 19, 2014 at 6:48 AM, Ondrej Platek <ond...@gm...>
wrote:
>
> Hi Matthew,
>
> I made some subjective comments below.
>
> PS: Note that I like the proposed wrappers, but I am not sure how easy
> boost::python is to install on all supported platforms.
>
> On Fri, Dec 19, 2014 at 9:30 AM, Matthew Aylett <mat...@gm...>
wrote:
>>
>> Hi
>>
>> Apologies, I've been snowed under here.
>>
>> I haven't had a chance to look over your work. I also don't have any
>> views on the 'right' way to do it. My thoughts on this are in a previous
>> thread; see the subject "Using SWIG to wrap kaldi for python", where I
>> discussed this with Ondrej Platek and Vassil Panayotov.
>>
>> In the idlak branch there is an example of Python wrappers that I put
>> together some time ago. These are based on SWIG. In the end I didn't need
>> them at this stage, because in the build system command line executables
>> work very well. It's at run time that wrappers are very useful. The
>> advantage of SWIG is that much of the same work will also contribute to
>> C#, Java and Perl wrappers. In my experience the most important were Java
>> wrappers to help produce a library for Android. I have no experience with
>> C# and moved to Python from Perl, so I only use Perl in legacy code ;-).
>>
>> So some questions to consider:
>>
>> 1. Why is Python wrapping required for training? Using sys.Process to
>> run command lines, structured output directories etc. mirrors the current
>> Perl recipes; what is the added benefit in this case?
>
> Well, bash and Perl are the current scripting languages for Kaldi. I, for
> example, prefer to use Python instead of either of them.
>
>>
>> 2. If it's for run time decoding, shouldn't we create a cross-platform C
>> API? Perhaps things have changed, but C++ APIs were never cross-compiler
>> compatible in the past, so you couldn't do stuff like compile using gnu and
>> link in MSN. With a C interface you can distribute libraries. But I am
>> possibly out of date on this.
>
> Well, I tried that and I gave it up since Kaldi nicely uses OpenFST and I
> was not able to wrap OpenFST with just plain C (it may be possible).
> I used Cython and pyfst mainly because pyfst solved the problem of wrapping
> OpenFST for me, and I am really glad that 99% of the wrapping of OpenFST
> templates was carried out by somebody else (Victor Chahuneau).
>>
>>
>> 3. If 2 is correct, shouldn't we define our API and wrap that, producing
>> a formal list of functionality that should be exposed to things like
>> client and server applications?
>>
>>
>> I would encourage some care here. Unconstrained wrapping can lead to
>> systems which HAVE to use the scripting language (we can already see how
>> difficult it is to move away from the Perl scripting if you wish to). Also
>> never, never, never reverse wrap (i.e. call Python from within C++); yes, it
>> can be done, but that way lies madness.
>>
>> v best
>>
>> Matthew
>>
>>
>> On Thu, Dec 18, 2014 at 11:37 PM, Daniel Povey <dp...@gm...> wrote:
>>>
>>> Jan-
>>> I haven't seen any objections to your setup. I'd say we should plan
>>> to include it in Kaldi at some point (e.g. within the next few
>>> months), but in the meantime hopefully you can continue to work on it,
>>> and maybe come up with some other examples of how it's useful to do
>>> the interfacing with Python- e.g. some kind of application level or
>>> service-level thing?
>>> Dan
>>>
>>>
>>> On Sat, Dec 13, 2014 at 4:01 PM, Yajie Miao <yaj...@gm...> wrote:
>>> > Hi Jan,
>>> > This is very nice work! In our PDNN toolkit, we also have simple
>>> > Python wrappers to read and write Kaldi features, mainly for DNN
>>> > training. Your implementation looks like a more comprehensive version.
>>> >
>>> > Do you have the functions/commands to do feature splicing? I ask this
>>> > because we found doing splicing on the fly with Python highly
>>> > expensive. That's why we still stick to PFiles instead of Kaldi
>>> > features (.scp .ark) for DNN training. I am very interested to know
>>> > the efficiency of your splicing implementation.
>>> >
>>> > Thanks,
>>> > Yajie
>>> >
>>> > On Sat, Dec 13, 2014 at 5:59 PM, Daniel Povey <dp...@gm...>
wrote:
>>> >>
>>> >> OK, thanks.
>>> >> cc'ing Yajie in case he wants to comment.
>>> >> Dan
>>> >>
>>> >>
>>> >> On Sat, Dec 13, 2014 at 2:31 PM, Jan Chorowski <jan...@gm...>
>>> >> wrote:
>>> >> > Hi All,
>>> >> >
>>> >> > the wrapper is built during Kaldi compilation. I build it using the
>>> >> > provided Makefile. The build depends on:
>>> >> > 1. Python and numpy (by default it queries the Python interpreter
>>> >> > found on the path for the header file location)
>>> >> > 2. Boost with the Boost::Python library. It is quite heavy to build,
>>> >> > but most Linux distributions ship it. Boost Python doesn't require
>>> >> > any code generation steps; the wrapper is defined in a normal C++
>>> >> > code file.
>>> >> >
>>> >> > During the build, the Python and Boost libraries and the Kaldi object
>>> >> > files are linked into a CPython extension module,
>>> >> > kaldi/src/python/kaldi_io_internal.so. It works with both static and
>>> >> > shared Kaldi builds. Further usage requires that Python finds
>>> >> > kaldi_io.py and kaldi_io_internal.so on the PYTHONPATH; for example,
>>> >> > their directory can be added to the PYTHONPATH variable in the
>>> >> > path.sh script of a recipe.
>>> >> >
>>> >> > Jan
>>> >> >
>>> >> >
>>> >> > On 12/13/2014 3:33 PM, Daniel Povey wrote:
>>> >> >>
>>> >> >> Also, Jan- could you send us an email explaining how this works-
>>> >> >> How does Python "see" the C++ headers? Do you have to invoke some
>>> >> >> special program, like swig? Do you have to write some special kind
>>> >> >> of header that shows how the C++ objects are to be interpreted by
>>> >> >> Python? A brief example would be helpful, if so.
>>> >> >> How is the resulting program linked, if at all? If you require
>>> >> >> functions from C++ libraries, are these obtained from the .a or .so
>>> >> >> files at runtime, or compiled into some kind of executable-like
>>> >> >> blob at compile time? Does your framework require that Kaldi be
>>> >> >> compiled using dynamic (.so) libraries?
>>> >> >>
>>> >> >> Dan
>>> >> >>
>>> >> >>
>>> >> >> On Sat, Dec 13, 2014 at 12:04 PM, Jan Chorowski
>>> >> >> <jan...@gm...>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>> Hello Dan,
>>> >> >>>
>>> >> >>> thank you for the comments. I tried to make it in the Kaldi
>>> >> >>> spirit, consistency is important. Of course, the scripts can be
>>> >> >>> removed and replaced with some more useful examples. I don't have
>>> >> >>> too much experience with bridging Python to C++, so any critique on
>>> >> >>> the wrappers and the approach taken is welcome.
>>> >> >>>
>>> >> >>> Jan
>>> >> >>>
>>> >> >>>
>>> >> >>> On 12/13/2014 2:55 PM, Daniel Povey wrote:
>>> >> >>>>
>>> >> >>>> Hi all.
>>> >> >>>> From a first look, it does look very impressive, and nicely
>>> >> >>>> documented.
>>> >> >>>> I would appreciate it if people on the list who have Python
>>> >> >>>> experience would comment on this- you can either reply to this
>>> >> >>>> thread, or to me.
>>> >> >>>> I don't know if this has been done in the "natural" way, or if
>>> >> >>>> there is some reason why people in the future will say, "why did
>>> >> >>>> you do it this way, you should have done XXX".
>>> >> >>>>
>>> >> >>>> Jan:
>>> >> >>>> in the scripts/ directory you seem to have some examples of how
>>> >> >>>> you can create python programs that behave very much like Kaldi
>>> >> >>>> command-line programs, using your framework. This is very useful.
>>> >> >>>> However, the programs
>>> >> >>>> apply-global-cmvn.py
>>> >> >>>> compute-global-cmvn-stats.py
>>> >> >>>> are perhaps a little confusing because they provide the same
>>> >> >>>> functionality that you could get with "compute-cmvn-stats ->
>>> >> >>>> matrix-sum" and "apply-cmvn" on the output of that command; and
>>> >> >>>> they do so using different formats for the CMVN information. I
>>> >> >>>> know the format of storing the CMVN stats in a two-row matrix is
>>> >> >>>> perhaps not perfectly ideal, but it's a standard within Kaldi and
>>> >> >>>> it would be confusing to deviate from that standard.
>>> >> >>>> Of course, this is a very minor issue that doesn't affect the
>>> >> >>>> validity of the framework as a whole. I am just pointing this out;
>>> >> >>>> the main discussion should be about the framework and whether
>>> >> >>>> people feel it's the "right" way to do this.
>>> >> >>>>
>>> >> >>>> Dan
>>> >> >>>>
>>> >> >>>> On Sat, Dec 13, 2014 at 6:28 AM, Jan Chorowski
>>> >> >>>> <jan...@gm...>
>>> >> >>>> wrote:
>>> >> >>>>>
>>> >> >>>>> Hi all!
>>> >> >>>>>
>>> >> >>>>> I've written wrappers to access Kaldi data files from within
>>> >> >>>>> Python using boost::python (the code is on github
>>> >> >>>>> https://github.com/janchorowski/kaldi-git/tree/python/src/python).
>>> >> >>>>> If you think this would be an interesting addition please
>>> >> >>>>> instruct me how to contribute.
>>> >> >>>>>
>>> >> >>>>> Best Regards,
>>> >> >>>>> Jan Chorowski
>>> >> >>>>>
>>> >> >>>>> _______________________________________________
>>> >> >>>>> Kaldi-developers mailing list
>>> >> >>>>> Kal...@li...
>>> >> >>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
>>> >> >>>
>>> >> >>>
>>> >> >
>>> >
>>> >
>>>
>>>
>>
>>
>>
>>
>
>
> --
> Ondřej Plátek, +420 737 758 650, skype:ondrejplatek,
ond...@gm...
>
>
>