From: Daniel P. <dp...@gm...> - 2014-12-21 20:36:27
Jan, perhaps when you have time you could respond to the above comments on the list? I'm not 100% sure what to do about this.

BTW, if we do include this, it will likely be optionally compiled, because I don't want the generic Kaldi compilation to be dependent on Boost.

Dan

On Fri, Dec 19, 2014 at 3:48 AM, Ondrej Platek <ond...@gm...> wrote:
> Hi Matthew,
>
> I made some subjective comments below.
>
> PS: Note that I like the proposed wrappers, but I am not sure how easy boost::python is to install on all supported platforms.
>
> On Fri, Dec 19, 2014 at 9:30 AM, Matthew Aylett <mat...@gm...> wrote:
>>
>> Hi
>>
>> Apologies, I've been snowed under here.
>>
>> I haven't had a chance to look over your work. I also don't have any views on the 'right' way to do it. My thoughts on this are in a previous thread; see the subject "Using SWIG to wrap kaldi for python", where I discussed this with Ondrej Platek and Vassil Panayotov.
>>
>> In the idlak branch there is an example of Python wrappers that I put together some time ago. These are based on SWIG. In the end I didn't need them at this stage, because in the build system the command-line executables work very well. It's at run time that wrappers are very useful. The advantage of SWIG is that much of the same work will also contribute to C#, Java and Perl wrappers. In my experience the most important were the Java wrappers, which helped produce a library for Android. I have no experience with C#, and I moved to Python from Perl, so I only use Perl in legacy code ;-).
>>
>> So some questions to consider:
>>
>> 1. Why is Python wrapping required for training? Using sys.Process to run command lines, structured output directories, etc. mirrors the current Perl recipes; what is the added benefit in this case?
>
> Well, bash and Perl are the current scripting languages for Kaldi. For example, I prefer to use Python instead of both of them.
>
>>
>> 2. If it's for run-time decoding, shouldn't we create a cross-platform C API? Perhaps things have changed, but C++ APIs were never cross-compiler compatible in the past, so you couldn't do things like compile using GNU and link in MSN. With a C interface you can distribute libraries. But I am possibly out of date on this.
>
> Well, I tried that and I gave it up, since Kaldi relies on OpenFST and I was not able to wrap OpenFST with just plain C (it may be possible). I used Cython and pyfst mainly because pyfst solved the problem of wrapping OpenFST for me, and I am really glad that 99% of the work of wrapping the OpenFST templates was carried out by somebody else (Victor Chahuneau).
>>
>> 3. If 2 is correct, shouldn't we define our API and wrap that, producing a formal list of the functionality that should be exposed to things like client and server applications?
>>
>> I would encourage some care here. Unconstrained wrapping can lead to systems which HAVE to use the scripting language (we can already see how difficult it is to move away from the Perl scripting if you wish to). Also, never, never, never reverse wrap (i.e. call Python from within C++); yes, it can be done, but that way lies madness.
>>
>> v best
>>
>> Matthew
>>
>> On Thu, Dec 18, 2014 at 11:37 PM, Daniel Povey <dp...@gm...> wrote:
>>>
>>> Jan-
>>> I haven't seen any objections to your setup. I'd say we should plan to include it in Kaldi at some point (e.g. within the next few months), but in the meantime hopefully you can continue to work on it, and maybe come up with some other examples of how it's useful to do the interfacing with Python, e.g. some kind of application-level or service-level thing?
>>> Dan
>>>
>>> On Sat, Dec 13, 2014 at 4:01 PM, Yajie Miao <yaj...@gm...> wrote:
>>> > Hi Jan,
>>> > This is very nice work! In our PDNN toolkit, we also have simple Python wrappers to read and write Kaldi features, mainly for DNN training.
>>> > Your implementation looks like a more comprehensive version.
>>> >
>>> > Do you have the functions/commands to do feature splicing? I ask because we found doing splicing on the fly in Python highly expensive. That's why we still stick to PFiles instead of Kaldi features (.scp/.ark) for DNN training. I am very interested to know the efficiency of your splicing implementation.
>>> >
>>> > Thanks,
>>> > Yajie
>>> >
>>> > On Sat, Dec 13, 2014 at 5:59 PM, Daniel Povey <dp...@gm...> wrote:
>>> >>
>>> >> OK, thanks.
>>> >> cc'ing Yajie in case he wants to comment.
>>> >> Dan
>>> >>
>>> >> On Sat, Dec 13, 2014 at 2:31 PM, Jan Chorowski <jan...@gm...> wrote:
>>> >> > Hi All,
>>> >> >
>>> >> > The wrapper is built during Kaldi compilation; I build it using the provided Makefile. The build depends on:
>>> >> > 1. Python and numpy (by default it queries the Python interpreter found on the path for the header file location);
>>> >> > 2. Boost with the Boost.Python library. It is quite heavy to build, but most Linux distributions ship it. Boost.Python doesn't require any code generation steps; the wrapper is defined in a normal C++ code file.
>>> >> >
>>> >> > During the build, the Python and Boost libraries and the Kaldi object files are linked into a CPython extension module, kaldi/src/python/kaldi_io_internal.so. It works with both static and shared Kaldi builds. Further usage requires that Python finds kaldi_io.py and kaldi_io_internal.so on the PYTHONPATH; the directory can, for example, be added to the PYTHONPATH variable in the path.sh script of a recipe.
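Yajie's question concerns splicing, i.e. stacking each frame with its neighbouring frames before feeding it to a DNN. A minimal NumPy sketch of a vectorized version follows. It assumes the convention of Kaldi's splice-feats that out-of-range context frames are filled by repeating the first or last frame. The function `splice_frames` and its `left`/`right` parameters are hypothetical illustrations, not part of Jan's kaldi_io module, and no timing claims are made.

```python
import numpy as np


def splice_frames(feats: np.ndarray, left: int, right: int) -> np.ndarray:
    """Hypothetical helper: stack each frame with `left` previous and `right` next frames.

    feats: (num_frames, dim) matrix.
    Returns a (num_frames, (left + 1 + right) * dim) matrix whose row t is
    [frame t-left, ..., frame t, ..., frame t+right].
    Assumption: edges are padded by repeating the first/last frame.
    """
    if feats.ndim != 2:
        raise ValueError("expected a 2-D (num_frames, dim) matrix")
    if left < 0 or right < 0:
        raise ValueError("context sizes must be non-negative")
    num_frames, dim = feats.shape
    width = left + 1 + right
    if num_frames == 0:
        # np.pad cannot extend an empty axis in "edge" mode, so handle it here.
        return np.empty((0, width * dim), dtype=feats.dtype)
    # Repeat the first row `left` times and the last row `right` times.
    padded = np.pad(feats, ((left, right), (0, 0)), mode="edge")
    # Column block k holds frame (t - left + k) for every output row t;
    # the only Python loop is over the context offsets, not over frames.
    blocks = [padded[k:k + num_frames] for k in range(width)]
    return np.concatenate(blocks, axis=1)
```

For example, `splice_frames(feats, left=5, right=5)` turns a (num_frames, 40) matrix into a (num_frames, 440) one. Per-frame work stays inside NumPy, but the output is still (left + 1 + right) times larger than the input, which is an inherent cost of materializing spliced features.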
>>> >> >
>>> >> > Jan
>>> >> >
>>> >> > On 12/13/2014 3:33 PM, Daniel Povey wrote:
>>> >> >>
>>> >> >> Also, Jan, could you send us an email explaining how this works? How does Python "see" the C++ headers? Do you have to invoke some special program, like SWIG? Do you have to write some special kind of header that shows how the C++ objects are to be interpreted by Python? A brief example would be helpful, if so.
>>> >> >> How is the resulting program linked, if at all? If you require functions from C++ libraries, are these obtained from the .a or .so files at runtime, or compiled into some kind of executable-like blob at compile time? Does your framework require that Kaldi be compiled using dynamic (.so) libraries?
>>> >> >>
>>> >> >> Dan
>>> >> >>
>>> >> >> On Sat, Dec 13, 2014 at 12:04 PM, Jan Chorowski <jan...@gm...> wrote:
>>> >> >>>
>>> >> >>> Hello Dan,
>>> >> >>>
>>> >> >>> Thank you for the comments. I tried to do it in the Kaldi spirit; consistency is important. Of course, the scripts can be removed and replaced with some more useful examples. I don't have much experience with bridging Python to C++, so any critique of the wrappers and the approach taken is welcome.
>>> >> >>>
>>> >> >>> Jan
>>> >> >>>
>>> >> >>> On 12/13/2014 2:55 PM, Daniel Povey wrote:
>>> >> >>>>
>>> >> >>>> Hi all.
>>> >> >>>> From a first look, it does look very impressive, and nicely documented.
>>> >> >>>> I would appreciate it if people on the list who have Python experience would comment on this; you can either reply to this thread, or to me.
>>> >> >>>> I don't know if this has been done in the "natural" way, or if there is some reason why people in the future will say, "why did you do it this way, you should have done XXX".
>>> >> >>>>
>>> >> >>>> Jan:
>>> >> >>>> in the scripts/ directory you seem to have some examples of how you can create Python programs that behave very much like Kaldi command-line programs, using your framework. This is very useful. However, the programs
>>> >> >>>> apply-global-cmvn.py
>>> >> >>>> compute-global-cmvn-stats.py
>>> >> >>>> are perhaps a little confusing, because they provide the same functionality that you could get with "compute-cmvn-stats -> matrix-sum" and "apply-cmvn" on the output of that command, and they do so using different formats for the CMVN information. I know the format of storing the CMVN stats in a two-row matrix is perhaps not perfectly ideal, but it's a standard within Kaldi and it would be confusing to deviate from that standard.
>>> >> >>>> Of course, this is a very minor issue that doesn't affect the validity of the framework as a whole. I am just pointing this out; the main discussion should be about the framework and whether people feel it's the "right" way to do this.
>>> >> >>>>
>>> >> >>>> Dan
>>> >> >>>>
>>> >> >>>> On Sat, Dec 13, 2014 at 6:28 AM, Jan Chorowski <jan...@gm...> wrote:
>>> >> >>>>>
>>> >> >>>>> Hi all!
>>> >> >>>>>
>>> >> >>>>> I've written wrappers to access Kaldi data files from within Python using boost::python (the code is on GitHub: https://github.com/janchorowski/kaldi-git/tree/python/src/python).
>>> >> >>>>> If you think this would be an interesting addition, please instruct me how to contribute.
>>> >> >>>>>
>>> >> >>>>> Best Regards,
>>> >> >>>>> Jan Chorowski
>>> >> >>>>>
>>> >> >>>>> _______________________________________________
>>> >> >>>>> Kaldi-developers mailing list
>>> >> >>>>> Kal...@li...
>>> >> >>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
>
>
> --
> Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, ond...@gm...
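Dan's remark about the standard two-row CMVN stats matrix can be made concrete. As we understand Kaldi's convention, the stats for D-dimensional features form a 2 x (D+1) matrix: the first row holds the per-dimension sums of the features with the frame count in the last column, and the second row holds the sums of squares with a zero in the last column. Because the layout is additive, global stats are the element-wise sum of per-utterance or per-speaker stats, which is what Dan's "compute-cmvn-stats -> matrix-sum" pipeline produces. The NumPy sketch below keeps that layout; `acc_cmvn_stats`, `apply_cmvn` and the variance floor are hypothetical choices for illustration and do not reproduce every option of Kaldi's apply-cmvn.

```python
import numpy as np


def acc_cmvn_stats(feats: np.ndarray) -> np.ndarray:
    """Hypothetical helper: CMVN stats for a (num_frames, dim) matrix.

    Assumed layout (Kaldi-style): shape (2, dim + 1);
    row 0 = per-dimension sums, with the frame count in the last column;
    row 1 = per-dimension sums of squares, with 0 in the last column.
    """
    num_frames, dim = feats.shape
    x = feats.astype(np.float64)
    stats = np.zeros((2, dim + 1), dtype=np.float64)
    stats[0, :dim] = x.sum(axis=0)
    stats[0, dim] = num_frames
    stats[1, :dim] = (x ** 2).sum(axis=0)  # stats[1, dim] stays 0
    return stats


def apply_cmvn(stats: np.ndarray, feats: np.ndarray,
               norm_vars: bool = False, var_floor: float = 1e-10) -> np.ndarray:
    """Hypothetical helper: normalize feats with stats in the layout above.

    Mean normalization always; variance normalization only if norm_vars.
    var_floor is an illustrative guard against zero variance.
    """
    dim = stats.shape[1] - 1
    count = stats[0, dim]
    if count <= 0:
        raise ValueError("CMVN stats have a non-positive frame count")
    mean = stats[0, :dim] / count
    out = feats - mean
    if norm_vars:
        var = stats[1, :dim] / count - mean ** 2
        out = out / np.sqrt(np.maximum(var, var_floor))
    return out


# Global stats are just a sum of per-utterance stats (cf. matrix-sum).
utts = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[5.0, 6.0]])]
global_stats = sum(acc_cmvn_stats(u) for u in utts)
```

Keeping variance normalization opt-in mirrors apply-cmvn, whose `--norm-vars` option is off by default; storing sums rather than means is what makes the stats mergeable by plain addition.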