From: Sean T. <se...@se...> - 2015-02-16 14:18:44
I wanted to bring up the integration of the very useful kaldi_io package
that Jan Chorowski made available in December. Is there any consensus on
whether to provide this code as a (probably optional) part of the Kaldi
release? I understand that Boost.Python is a relatively heavy requirement,
but it is easily available on OS X and Linux.

I continue to wrap the executables themselves in Python functional
wrappers, which has made integration with other software easier and
contributes to pipeline testability and robustness.

-- Sean

On Fri, Dec 26, 2014 at 11:02 AM, Sean True <se...@se...> wrote:
> I wanted to echo Ondrej's comment about preferring Python to bash/perl for
> scripting. Python wrappers for the command line utilities are useful ...
> I've spent a few hours systematically wrapping them, parsing the output of
> the --help command as a guide to functionality.
>
> This gives wrappers of the general form:
>
>     import sh
>
>     # kaldi_path() and create_options() are helper functions defined
>     # elsewhere in the wrapper module; they are not shown in this message.
>     def acc_lda(transition_gmm_model, features_rspecifier,
>                 posteriors_rspecifier, lda_acc_out, *args, **kwargs):
>         """Accumulate LDA statistics based on pdf-ids.
>         Executable usage: acc-lda [options] <transition-gmm/model>
>             <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>
>         Options:
>             binary: Write accumulators in binary mode. (bool,true)
>             rand_prune: Randomized pruning threshold for posteriors (float,0)"""
>         cmd = sh.Command(kaldi_path("src/bin/acc-lda"))
>         option_defs = {'binary': ('binary', 'bool', 'true'),
>                        'help': ('help', 'bool', 'false'),
>                        'rand_prune': ('rand-prune', 'float', '0'),
>                        'config': ('config', 'string', ''),
>                        'print_args': ('print-args', 'bool', 'true'),
>                        'verbose': ('verbose', 'int', '0')}
>         myOptions = create_options(option_defs, kwargs)
>         myArgs = [transition_gmm_model, features_rspecifier,
>                   posteriors_rspecifier, lda_acc_out] + list(args)
>         return cmd(myOptions + myArgs)
>
> There are some refinements that could be added (*args does not make sense
> for this function).
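Sean's wrapper calls a `create_options` helper that is not shown in the thread, and he mentions deriving each wrapper's option table from the tool's `--help` output. The sketch below is one hypothetical way to do both steps; `parse_help_options` and this `create_options` are illustrative, not Sean's code. The help-line layout it parses (`--name : description (type, default = value)`) is an assumption about Kaldi's usage output and should be checked against real tools, since Sean notes that some (e.g. fstrand) do not behave as expected.

```python
import re

# ASSUMED layout of one option line in a Kaldi tool's --help output, e.g.
#   --binary     : Write accumulators in binary mode. (bool, default = true)
OPTION_RE = re.compile(
    r"^\s*--(?P<flag>[\w-]+)\s*:\s*(?P<desc>.*?)\s*"
    r"\((?P<type>\w+), default = (?P<default>.*)\)\s*$"
)


def parse_help_options(help_text):
    """Build an option_defs dict in the shape acc_lda() uses:
    python_name -> (flag_name, type_name, default_as_string)."""
    option_defs = {}
    for line in help_text.splitlines():
        m = OPTION_RE.match(line)
        if m:
            flag = m.group("flag")
            # Kaldi flags use dashes; Python keyword names need underscores.
            option_defs[flag.replace("-", "_")] = (
                flag, m.group("type"), m.group("default"))
    return option_defs


def create_options(option_defs, kwargs):
    """Turn Python keyword arguments into Kaldi-style --flag=value strings.

    Unknown keywords raise TypeError instead of being passed through
    silently; Python booleans are written as true/false.
    """
    options = []
    for name, value in kwargs.items():
        if name not in option_defs:
            raise TypeError("unknown option: %s" % name)
        flag = option_defs[name][0]
        if isinstance(value, bool):
            value = "true" if value else "false"
        options.append("--%s=%s" % (flag, value))
    return options
```

For an acc-lda help text containing the `--binary` line, `create_options(parse_help_options(help_text), {"binary": False})` would return `["--binary=false"]`, which the sh-based wrapper then passes on the command line.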
> Because of the rather elegant Python sh package
> (https://pypi.python.org/pypi/sh), these functions will create pipelines
> if composed:
>
>     >>> from sh import ls, wc
>     >>> wc(ls("."))
>     8 23 222
>
> There are a few places where constructing from help output is not
> straightforward (for instance, fstrand --help does not do the expected
> thing).
>
> -- Sean
>
> On Fri, Dec 19, 2014 at 6:48 AM, Ondrej Platek <ond...@gm...> wrote:
> >
> > Hi Matthew,
> >
> > I made some subjective comments below.
> >
> > PS: Note that I like the proposed wrappers, but I am not sure how easy
> > boost::python is to install on all supported platforms.
> >
> > On Fri, Dec 19, 2014 at 9:30 AM, Matthew Aylett <mat...@gm...> wrote:
> >>
> >> Hi
> >>
> >> Apologies, I've been snowed under here.
> >>
> >> I haven't had a chance to look over your work. I also don't have any
> >> views on the 'right' way to do it. My thoughts on this are in a previous
> >> thread; see the subject "Using SWIG to wrap kaldi for python", where I
> >> discussed this with Ondrej Platek and Vassil Panayotov.
> >>
> >> In the idlak branch there is an example of Python wrappers that I put
> >> together some time ago. These are based on SWIG. In the end I didn't
> >> need them at this stage, because for the build system the command-line
> >> executables work very well; it's at run time that wrappers are very
> >> useful. The advantage of SWIG is that much of the same work also
> >> contributes to C#, Java and Perl wrappers. In my experience the most
> >> important were the Java wrappers, which helped produce a library for
> >> Android. I have no experience with C#, and I moved to Python from Perl,
> >> so I only use Perl in legacy code ;-).
> >>
> >> So some questions to consider:
> >>
> >> 1. Why is Python wrapping required for training? Using subprocess to
> >> run command lines, structured output directories etc. mirrors the
> >> current Perl recipes; what is the added benefit in this case?
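Matthew's first question and Sean's `wc(ls("."))` example both come down to chaining command-line tools from Python. For readers who would rather avoid the sh dependency, a pipe like `cmd1 | cmd2` can also be built with the standard library's `subprocess` module. This is a minimal sketch, not part of either proposal; `run_pipeline` is a hypothetical helper, and the demo runs the current Python interpreter as both stages so it works anywhere, where a real pipeline would pass the Kaldi tools' argument lists instead.

```python
import subprocess
import sys


def run_pipeline(*commands):
    """Run argv lists connected stdout -> stdin, like `cmd1 | cmd2 | ...`.

    Returns the last stage's stdout decoded as text, and raises
    subprocess.CalledProcessError if any stage exits with a non-zero status.
    """
    procs = []
    upstream = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            # Drop the parent's copy of the pipe so only the child holds it;
            # an upstream stage then sees a broken pipe if a later one exits.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    # Wait for every stage before checking, so no child is left running.
    codes = [proc.wait() for proc in procs]
    for argv, code in zip(commands, codes):
        if code != 0:
            raise subprocess.CalledProcessError(code, argv)
    return output.decode()


if __name__ == "__main__":
    # Both stages are the running interpreter so the demo is portable.
    producer = [sys.executable, "-c", "print('a b c'); print('d e')"]
    # Counts lines and words on stdin, roughly like `wc -l -w`.
    counter = [sys.executable, "-c",
               "import sys; t = sys.stdin.read(); "
               "print(len(t.splitlines()), len(t.split()))"]
    print(run_pipeline(producer, counter), end="")  # prints: 2 5
```

Unlike sh, this reports failures of any stage, not just the last one, which speaks to the testability and robustness point Sean raises.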
> >
> > Well, bash and Perl are the current scripting languages for Kaldi, and
> > I, for example, prefer Python to both of them.
> >
> >>
> >> 2. If it's for run-time decoding, shouldn't we create a cross-platform C
> >> API? Perhaps things have changed, but C++ APIs were never
> >> cross-compiler compatible in the past, so you couldn't do things like
> >> compile using GNU and link in MSN. With a C interface you can
> >> distribute libraries. But I am possibly out of date on this.
> >
> > Well, I tried that and gave it up, since Kaldi builds on OpenFST and I
> > was not able to wrap OpenFST with just plain C (it may be possible).
> > I used Cython and pyfst, mainly because pyfst solved the wrapping of
> > OpenFST for me, and I am really glad that 99% of the work of wrapping
> > the OpenFST templates was carried out by somebody else (Victor
> > Chahuneau).
> >>
> >> 3. If 2 is correct, shouldn't we define our API and wrap that, producing
> >> a formal list of the functionality that should be exposed to things
> >> like client and server applications?
> >>
> >> I would encourage some care here. Unconstrained wrapping can lead to
> >> systems which HAVE to use the scripting language (we can already see
> >> how difficult it is to move away from the Perl scripting if you wish
> >> to). Also, never, never, never reverse wrap (i.e. call Python from
> >> within C++); yes, it can be done, but that way lies madness.
> >>
> >> v best
> >>
> >> Matthew
> >>
> >> On Thu, Dec 18, 2014 at 11:37 PM, Daniel Povey <dp...@gm...> wrote:
> >>>
> >>> Jan-
> >>> I haven't seen any objections to your setup. I'd say we should plan
> >>> to include it in Kaldi at some point (e.g. within the next few
> >>> months), but in the meantime hopefully you can continue to work on it,
> >>> and maybe come up with some other examples of how it's useful to do
> >>> the interfacing with Python, e.g. some kind of application-level or
> >>> service-level thing?
> >>> Dan
> >>>
> >>> On Sat, Dec 13, 2014 at 4:01 PM, Yajie Miao <yaj...@gm...> wrote:
> >>> > Hi Jan,
> >>> > This is very nice work! In our PDNN toolkit, we also have simple
> >>> > Python wrappers to read and write Kaldi features, mainly for DNN
> >>> > training. Your implementation looks like a more comprehensive version.
> >>> >
> >>> > Do you have functions/commands to do feature splicing? I ask because
> >>> > we found splicing on the fly in Python to be highly expensive. That's
> >>> > why we still stick to PFiles instead of Kaldi features (.scp/.ark)
> >>> > for DNN training. I am very interested to know the efficiency of your
> >>> > splicing implementation.
> >>> >
> >>> > Thanks,
> >>> > Yajie
> >>> >
> >>> > On Sat, Dec 13, 2014 at 5:59 PM, Daniel Povey <dp...@gm...> wrote:
> >>> >>
> >>> >> OK, thanks.
> >>> >> cc'ing Yajie in case he wants to comment.
> >>> >> Dan
> >>> >>
> >>> >> On Sat, Dec 13, 2014 at 2:31 PM, Jan Chorowski <jan...@gm...>
> >>> >> wrote:
> >>> >> > Hi All,
> >>> >> >
> >>> >> > the wrapper is built during Kaldi compilation. I build it using the
> >>> >> > provided Makefile. The build depends on:
> >>> >> > 1. Python and numpy (by default it queries the Python interpreter
> >>> >> >    found on the path for the header file location);
> >>> >> > 2. Boost with the Boost::Python library. It is quite heavy to build,
> >>> >> >    but most Linux distributions ship it. Boost.Python doesn't
> >>> >> >    require any code generation steps; the wrapper is defined in a
> >>> >> >    normal C++ code file.
> >>> >> >
> >>> >> > During the build, the Python and Boost libraries and the Kaldi
> >>> >> > object files are linked into a CPython extension module,
> >>> >> > kaldi/src/python/kaldi_io_internal.so. It works with both static
> >>> >> > and shared Kaldi builds.
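On Yajie's question about the cost of splicing: it is usually high when each spliced frame is built in a Python-level loop. A vectorized version can gather every context frame with a single NumPy integer-array indexing call. The sketch below is illustrative only (it is neither the kaldi_io implementation nor Kaldi's C++ code); it assumes the features are a `(num_frames, dim)` NumPy array, and it repeats the first/last frame where context runs past the utterance edge, which is our understanding of how Kaldi's splice-feats treats boundaries.

```python
import numpy as np


def splice_frames(feats, left_context, right_context):
    """Stack each frame with its neighbours.

    Output row t is [x[t-L], ..., x[t], ..., x[t+R]] for a
    (num_frames, dim) input. Indices past either edge are clamped to the
    first/last frame (assumed to match splice-feats).
    """
    num_frames, dim = feats.shape
    offsets = np.arange(-left_context, right_context + 1)
    # (num_frames, L+R+1) matrix of source-frame indices, clamped to range.
    idx = np.clip(np.arange(num_frames)[:, None] + offsets[None, :],
                  0, num_frames - 1)
    # feats[idx] has shape (num_frames, L+R+1, dim); flatten the last two.
    width = (left_context + right_context + 1) * dim
    return feats[idx].reshape(num_frames, width)
```

The work is one gather producing an array (L+R+1) times the size of the input, so for long utterances memory, rather than interpreter overhead, tends to become the limit.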
> >>> >> > Further usage requires that Python finds kaldi_io.py and
> >>> >> > kaldi_io_internal.so on the PYTHONPATH; the directory can, for
> >>> >> > example, be added to the PYTHONPATH variable in the path.sh script
> >>> >> > of a recipe.
> >>> >> >
> >>> >> > Jan
> >>> >> >
> >>> >> > On 12/13/2014 3:33 PM, Daniel Povey wrote:
> >>> >> >>
> >>> >> >> Also, Jan, could you send us an email explaining how this works?
> >>> >> >> How does Python "see" the C++ headers? Do you have to invoke some
> >>> >> >> special program, like SWIG? Do you have to write some special kind
> >>> >> >> of header that shows how the C++ objects are to be interpreted by
> >>> >> >> Python? A brief example would be helpful, if so.
> >>> >> >> How is the resulting program linked, if at all? If you require
> >>> >> >> functions from C++ libraries, are these obtained from the .a or .so
> >>> >> >> files at runtime, or compiled into some kind of executable-like
> >>> >> >> blob at compile time? Does your framework require that Kaldi be
> >>> >> >> compiled using dynamic (.so) libraries?
> >>> >> >>
> >>> >> >> Dan
> >>> >> >>
> >>> >> >> On Sat, Dec 13, 2014 at 12:04 PM, Jan Chorowski
> >>> >> >> <jan...@gm...> wrote:
> >>> >> >>>
> >>> >> >>> Hello Dan,
> >>> >> >>>
> >>> >> >>> thank you for the comments. I tried to do it in the Kaldi spirit;
> >>> >> >>> consistency is important. Of course, the scripts can be removed
> >>> >> >>> and replaced with some more useful examples. I don't have much
> >>> >> >>> experience with bridging Python to C++, so any critique of the
> >>> >> >>> wrappers and the approach taken is welcome.
> >>> >> >>>
> >>> >> >>> Jan
> >>> >> >>>
> >>> >> >>> On 12/13/2014 2:55 PM, Daniel Povey wrote:
> >>> >> >>>>
> >>> >> >>>> Hi all.
> >>> >> >>>> From a first look, it does look very impressive, and nicely
> >>> >> >>>> documented.
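Jan's instructions say that Python must find kaldi_io.py and kaldi_io_internal.so on the PYTHONPATH, typically via a recipe's path.sh. From inside a Python script, roughly the same effect can be had by prepending the build directory to `sys.path`. This sketch assumes both files sit in kaldi/src/python as Jan describes; `add_kaldi_python_dir` is a hypothetical helper, not part of the package.

```python
import importlib
import importlib.util
import os
import sys


def add_kaldi_python_dir(kaldi_root, module="kaldi_io"):
    """Prepend <kaldi_root>/src/python to sys.path and check that `module`
    can then be imported. Returns the directory that was added.

    Assumes the layout Jan describes: kaldi_io.py and kaldi_io_internal.so
    are both in kaldi/src/python after the build.
    """
    py_dir = os.path.join(os.path.abspath(kaldi_root), "src", "python")
    if not os.path.isdir(py_dir):
        raise FileNotFoundError(py_dir)
    if py_dir not in sys.path:
        sys.path.insert(0, py_dir)
    importlib.invalidate_caches()  # make newly built files visible
    if importlib.util.find_spec(module) is None:
        raise ImportError("%s not found in %s" % (module, py_dir))
    return py_dir
```

In a recipe this could be called as `add_kaldi_python_dir(os.environ["KALDI_ROOT"])`, assuming path.sh exports KALDI_ROOT as recipe scripts commonly do; exporting PYTHONPATH in path.sh, as Jan suggests, remains the simpler route for shell-driven recipes.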
> >>> >> >>>> I would appreciate it if people on the list who have Python
> >>> >> >>>> experience would comment on this; you can reply either to this
> >>> >> >>>> thread or to me. I don't know if this has been done in the
> >>> >> >>>> "natural" way, or if there is some reason why people in the
> >>> >> >>>> future will say, "why did you do it this way, you should have
> >>> >> >>>> done XXX".
> >>> >> >>>>
> >>> >> >>>> Jan:
> >>> >> >>>> in the scripts/ directory you seem to have some examples of how
> >>> >> >>>> you can create Python programs that behave very much like Kaldi
> >>> >> >>>> command-line programs, using your framework. This is very useful.
> >>> >> >>>> However, the programs
> >>> >> >>>>   apply-global-cmvn.py
> >>> >> >>>>   compute-global-cmvn-stats.py
> >>> >> >>>> are perhaps a little confusing, because they provide the same
> >>> >> >>>> functionality that you could get with "compute-cmvn-stats ->
> >>> >> >>>> matrix-sum" and "apply-cmvn" on the output of that command, and
> >>> >> >>>> they do so using a different format for the CMVN information. I
> >>> >> >>>> know the format of storing the CMVN stats in a two-row matrix is
> >>> >> >>>> perhaps not perfectly ideal, but it's a standard within Kaldi and
> >>> >> >>>> it would be confusing to deviate from that standard.
> >>> >> >>>> Of course, this is a very minor issue that doesn't affect the
> >>> >> >>>> validity of the framework as a whole. I am just pointing this
> >>> >> >>>> out; the main discussion should be about the framework and
> >>> >> >>>> whether people feel it's the "right" way to do this.
> >>> >> >>>>
> >>> >> >>>> Dan
> >>> >> >>>>
> >>> >> >>>> On Sat, Dec 13, 2014 at 6:28 AM, Jan Chorowski
> >>> >> >>>> <jan...@gm...> wrote:
> >>> >> >>>>>
> >>> >> >>>>> Hi all!
> >>> >> >>>>>
> >>> >> >>>>> I've written wrappers to access Kaldi data files from within
> >>> >> >>>>> Python using boost::python (the code is on GitHub:
> >>> >> >>>>> https://github.com/janchorowski/kaldi-git/tree/python/src/python).
> >>> >> >>>>> If you think this would be an interesting addition, please let
> >>> >> >>>>> me know how to contribute.
> >>> >> >>>>>
> >>> >> >>>>> Best Regards,
> >>> >> >>>>> Jan Chorowski
> >>> >> >>>>>
> >>> >> >>>>> ------------------------------------------------------------------------------
> >>> >> >>>>> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> >>> >> >>>>> from Actuate! Instantly Supercharge Your Business Reports and
> >>> >> >>>>> Dashboards with Interactivity, Sharing, Native Excel Exports, App
> >>> >> >>>>> Integration & more
> >>> >> >>>>> Get technology previously reserved for billion-dollar corporations,
> >>> >> >>>>> FREE
> >>> >> >>>>> http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
> >>> >> >>>>> _______________________________________________
> >>> >> >>>>> Kaldi-developers mailing list
> >>> >> >>>>> Kal...@li...
> >>> >> >>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
> >
> > --
> > Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, ond...@gm...
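Dan's remark about the two-row CMVN matrix is easier to follow with the layout spelled out. As we understand Kaldi's convention, the stats for D-dimensional features form a 2 x (D+1) matrix: row 0 holds the per-dimension sum of the features with the frame count in the last column, and row 1 holds the per-dimension sum of squares with 0 in the last column. Because the stats are plain sums, adding per-utterance matrices (what matrix-sum does) yields global stats. The NumPy sketch below illustrates that convention with hypothetical helper names; it is not the code of Jan's compute-global-cmvn-stats.py or of Kaldi's apply-cmvn, and it models only mean normalization plus optional variance normalization (off by default here, mirroring what we believe apply-cmvn's default to be).

```python
import numpy as np


def cmvn_stats(feats):
    """Accumulate stats for a (num_frames, dim) array in the assumed
    two-row layout: row 0 = [sum(x), count], row 1 = [sum(x**2), 0]."""
    num_frames, dim = feats.shape
    stats = np.zeros((2, dim + 1))
    stats[0, :dim] = feats.sum(axis=0)
    stats[0, dim] = num_frames
    stats[1, :dim] = (feats ** 2).sum(axis=0)
    return stats


def apply_cmvn(stats, feats, norm_vars=False):
    """Subtract the mean implied by `stats`; optionally divide by the
    standard deviation as well."""
    dim = stats.shape[1] - 1
    count = stats[0, dim]
    mean = stats[0, :dim] / count
    out = feats - mean
    if norm_vars:
        var = stats[1, :dim] / count - mean ** 2  # E[x^2] - E[x]^2
        out = out / np.sqrt(var)
    return out
```

For example, `cmvn_stats(a) + cmvn_stats(b)` gives the same matrix as `cmvn_stats(np.vstack([a, b]))`, which is why a global-stats script can reuse the standard two-row format instead of introducing a new one.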