From: Daniel P. <dp...@gm...> - 2015-02-16 19:31:47
Sean, can you give us some general idea of how you were using the kaldi_io
package?
Dan
On Mon, Feb 16, 2015 at 10:29 AM, Jan Trmal <af...@ce...> wrote:
> Personally, if we were to include wrappers directly into kaldi, I'd prefer
> SWIG as the wrapper generator. I worked to some extent with ctypes,
> boost::python and swig and all are usable and "just fine" for python.
> My concern, however, is that if this is going to be put into kaldi trunk
> and we want it to be really useful, then someone will have to maintain it,
> take the responsibility for it and make it in sync with the C/C++ code,
> which given the rate of kaldi development will need more than negligible
> commitment. Yes we can argue that "the community will keep it updated" but
> frankly, I have not seen any successful project work without someone
> committing to do even the ugly/boring/maintenance work on a regular basis.
> And for that a wider user base might be more stimulating in the sense that
> the maintainer would see the wrappers are used (which would yield more
> feedback/bugreports). Which means (at least to me) that more languages
> should be supported -- at least python, perl, java... Of all those
> wrapper generators, only SWIG can do that -- i.e. after writing one
> interface file, it can generate wrappers for those languages (and some
> others as well).
>
> Just my two cents..
> y.
>
>
> On Mon, Feb 16, 2015 at 8:52 AM, Sean True <se...@se...>
> wrote:
>
>> I wanted to bring up the integration of the very useful kaldi_io package
>> that Jan Chorowski made available in December. Is there any consensus on
>> whether to provide this code as (probably an optional) part of the Kaldi
>> release? I understand that Boost-Python is a relatively heavy requirement,
>> but it is easily available on OSX and Linux.
>>
>> I continue to wrap the executables themselves in python functional
>> wrappers, which has made the integration with other software easier, and
>> contributes to pipeline testability and robustness.
>>
>> -- Sean
>>
>> On Fri, Dec 26, 2014 at 11:02 AM, Sean True <se...@se...>
>> wrote:
>>
>>> I wanted to echo Ondrej's comment about preferring Python to bash/perl
>>> for scripting. Python wrappers for the command line utilities are useful
>>> ... I've spent a few hours systematically wrapping them, parsing the output
>>> of the --help command as a guide to functionality.
>>>
>>> This gives wrappers of the general form:
>>>
>>> def acc_lda(transition_gmm_model, features_rspecifier,
>>>             posteriors_rspecifier, lda_acc_out, *args, **kwargs):
>>>     """Accumulate LDA statistics based on pdf-ids.
>>>
>>>     Executable usage: acc-lda [options] <transition-gmm/model>
>>>         <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>
>>>     Options:
>>>         binary: Write accumulators in binary mode. (bool,true)
>>>         rand_prune: Randomized pruning threshold for posteriors (float,0)
>>>     """
>>>     cmd = sh.Command(kaldi_path("src/bin/acc-lda"))
>>>     option_defs = {
>>>         'binary': ('binary', 'bool', 'true'),
>>>         'help': ('help', 'bool', 'false'),
>>>         'rand_prune': ('rand-prune', 'float', '0'),
>>>         'config': ('config', 'string', ''),
>>>         'print_args': ('print-args', 'bool', 'true'),
>>>         'verbose': ('verbose', 'int', '0'),
>>>     }
>>>     myOptions = create_options(option_defs, kwargs)
>>>     myArgs = [transition_gmm_model, features_rspecifier,
>>>               posteriors_rspecifier, lda_acc_out] + list(args)
>>>     return cmd(myOptions + myArgs)
>>>
>>> There are some refinements that could be added (*args does not make
>>> sense for this function).
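
The wrapper above calls a `create_options` helper whose body is not shown in the thread. A minimal sketch of what such a helper might look like, assuming each `option_defs` entry maps a Python keyword to a `(flag, type, default)` tuple and that options are passed to the executable as `--name=value`. The body is a guess for illustration, not Sean's actual code:

```python
def create_options(option_defs, kwargs):
    """Turn Python keyword arguments into Kaldi-style --name=value flags.

    Hypothetical helper: the thread names create_options() but does not
    show its body, so everything here is an assumption.
    """
    options = []
    # Sort so the generated command line is deterministic.
    for key, value in sorted(kwargs.items()):
        if key not in option_defs:
            raise TypeError("unknown option: %r" % key)
        flag, kind, _default = option_defs[key]
        if kind == 'bool':
            # Booleans are spelled as lowercase true/false on the command line.
            value = 'true' if value else 'false'
        options.append("--%s=%s" % (flag, value))
    return options


option_defs = {
    'binary': ('binary', 'bool', 'true'),
    'rand_prune': ('rand-prune', 'float', '0'),
}
flags = create_options(option_defs, {'rand_prune': 0.5, 'binary': False})
print(flags)
```

Rejecting unknown keywords early means a misspelled option fails in Python rather than inside the wrapped executable.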
>>> Because of the rather elegant Python sh package (
>>> https://pypi.python.org/pypi/sh) these functions will create pipelines
>>> if composed:
>>>
>>> >>> from sh import ls, wc
>>>
>>> >>> wc(ls("."))
>>>
>>> 8 23 222
>>>
>>> There are a few places where constructing from help output is not
>>> straightforward (for instance, fstrand --help does
>>> not do the expected thing).
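
Building the option table from help output could be sketched roughly as follows. The regular expression and the sample help text are assumptions about how a typical Kaldi tool prints its options (`--flag : description (type, default = value)`); tools that deviate from that shape would need special handling:

```python
import re

# Assumed shape of one option line in a tool's --help output;
# real output may differ between tools and versions.
OPTION_RE = re.compile(
    r"^\s*--(?P<flag>[\w-]+)\s*:\s*(?P<doc>.*?)\s*"
    r"\((?P<type>\w+), default = (?P<default>.*)\)\s*$")


def parse_help(help_text):
    """Map a Python-friendly option name to (flag, type, default)."""
    option_defs = {}
    for line in help_text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            flag = match.group('flag')
            # Dashes are not valid in Python keyword names.
            option_defs[flag.replace('-', '_')] = (
                flag, match.group('type'), match.group('default'))
    return option_defs


# Illustrative sample, not captured from a real run.
SAMPLE_HELP = """\
Accumulate LDA statistics based on pdf-ids.
Usage:  acc-lda [options] <transition-gmm/model> <features-rspecifier>

Options:
  --binary        : Write accumulators in binary mode. (bool, default = true)
  --rand-prune    : Randomized pruning threshold for posteriors (float, default = 0)
"""

option_defs = parse_help(SAMPLE_HELP)
print(option_defs)
```

Lines that do not match the pattern are silently skipped, so a tool with unusual help output yields an empty or partial table rather than an exception.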
>>>
>>> -- Sean
>>>
>>> On Fri, Dec 19, 2014 at 6:48 AM, Ondrej Platek <ond...@gm...>
>>> wrote:
>>> >
>>> > Hi Matthew,
>>> >
>>> > I made some subjective comments below.
>>> >
>>> > PS: Note that I like the proposed wrappers, but I am not sure how easy
>>> boost::python is to install on all supported platforms.
>>> >
>>> > On Fri, Dec 19, 2014 at 9:30 AM, Matthew Aylett <
>>> mat...@gm...> wrote:
>>> >>
>>> >> Hi
>>> >>
>>> >> Apologies, I've been snowed under here.
>>> >>
>>> >> I haven't had a chance to look over your work. I also don't have any
>>> views on the 'right' way to do it. My thoughts on this are in a previous
>>> thread. See subject "Using SWIG to wrap kaldi for python" where I discussed
>>> this with Ondrej Platek and
>>> >> Vassil Panayotov.
>>> >>
>>> >> In the idlak branch there is an example of python wrappers that I put
>>> together some time ago. These are based on SWIG. In the end I didn't need
>>> them at this stage because in the build system command line executables
>>> work very well. It is at run time that wrappers are very useful. The
>>> advantage with SWIG is that much of the same work will also contribute to
>>> C#, Java and Perl wrappers. In my experience the most important were Java
>>> wrappers to help produce a library for Android. I have no experience with
>>> C# and moved to Python from Perl so only use Perl in legacy code ;-).
>>> >>
>>> >> So some questions to consider:
>>> >>
>>> >> 1. Why is python wrapping required for training? Using sys.Process to
>>> run command lines, structured output directories etc. mirrors the current
>>> Perl recipes; what is the added benefit in this case?
>>> >
>>> > Well, bash and Perl are the current scripting languages for Kaldi. I,
>>> for example, prefer to use Python instead of either of them.
>>> >
>>> >>
>>> >> 2. If it's for run time decoding shouldn't we create a cross-platform
>>> C API? Perhaps things have changed but C++ APIs were never cross compiler
>>> compatible in the past so you couldn't do stuff like compile using gnu and
>>> link in MSVC. With a C interface you can distribute libraries. But I am
>>> possibly out of date on this.
>>> >
>>> > Well, I tried that and I gave it up since Kaldi nicely uses OpenFST
>>> and I was not able to wrap OpenFST with just plain C (It may be possible).
>>> > I used Cython and pyfst mainly because pyfst solved for me wrapping up
>>> OpenFST and I am really glad that 99% of wrapping OpenFST templates was
>>> carried out by somebody else (Victor Chahuneau).
>>> >>
>>> >>
>>> >> 3. If 2 is correct shouldn't we define our API and wrap that?
>>> Producing a formal list of functionality that should be exposed to things
>>> like client and server applications?
>>> >>
>>> >>
>>> >> I would encourage some care here. Unconstrained wrapping can lead to
>>> systems which HAVE to use the scripting language (We can already see how
>>> difficult it is to move away from the Perl scripting if you wish to). Also
>>> never, never, never reverse wrap (i.e. call python from within C++). Yes, it
>>> can be done, but that way lies madness.
>>> >>
>>> >> v best
>>> >>
>>> >> Matthew
>>> >>
>>> >>
>>> >> On Thu, Dec 18, 2014 at 11:37 PM, Daniel Povey <dp...@gm...>
>>> wrote:
>>> >>>
>>> >>> Jan-
>>> >>> I haven't seen any objections to your setup. I'd say we should plan
>>> >>> to include it in Kaldi at some point (e.g. within the next few
>>> >>> months), but in the meantime hopefully you can continue to work on
>>> it,
>>> >>> and maybe come up with some other examples of how it's useful to do
>>> >>> the interfacing with Python- e.g. some kind of application level or
>>> >>> service-level thing?
>>> >>> Dan
>>> >>>
>>> >>>
>>> >>> On Sat, Dec 13, 2014 at 4:01 PM, Yajie Miao <yaj...@gm...>
>>> wrote:
>>> >>> > Hi Jan,
>>> >>> > This is very nice work! In our PDNN toolkit, we also have simple
>>> python
>>> >>> > wrappers to read and write Kaldi features, mainly for DNN
>>> training. Your
>>> >>> > implementation looks like a more comprehensive version.
>>> >>> >
>>> >>> > Do you have the functions/commands to do feature splicing? I ask
>>> this
>>> >>> > because we found doing splicing on the fly with Python highly
>>> expensive.
>>> >>> > That's why we still stick to PFiles instead of Kaldi features
>>> (.scp .ark)
>>> >>> > for DNN training. I am very interested to know the efficiency of
>>> your
>>> >>> > splicing implementation.
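
For readers unfamiliar with the operation being discussed: splicing concatenates each frame with its neighbours in a context window. A minimal pure-Python sketch, assuming frames are lists of floats and that edge frames are repeated at the utterance boundaries (my understanding of the usual convention, not something stated in the thread). The nested per-frame loop also hints at why doing this on the fly in Python can be costly:

```python
def splice_frames(frames, left=1, right=1):
    """Concatenate each frame with `left` past and `right` future frames.

    Indices that fall outside the utterance are clamped, so the first and
    last frames are repeated at the edges.
    """
    spliced = []
    last = len(frames) - 1
    for t in range(len(frames)):
        row = []
        for offset in range(-left, right + 1):
            src = min(max(t + offset, 0), last)
            row.extend(frames[src])
        spliced.append(row)
    return spliced


frames = [[1.0], [2.0], [3.0]]
print(splice_frames(frames))
# → [[1.0, 1.0, 2.0], [1.0, 2.0, 3.0], [2.0, 3.0, 3.0]]
```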
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Yajie
>>> >>> >
>>> >>> > On Sat, Dec 13, 2014 at 5:59 PM, Daniel Povey <dp...@gm...>
>>> wrote:
>>> >>> >>
>>> >>> >> OK, thanks.
>>> >>> >> cc'ing Yajie in case he wants to comment.
>>> >>> >> Dan
>>> >>> >>
>>> >>> >>
>>> >>> >> On Sat, Dec 13, 2014 at 2:31 PM, Jan Chorowski <
>>> jan...@gm...>
>>> >>> >> wrote:
>>> >>> >> > Hi All,
>>> >>> >> >
>>> >>> >> > the wrapper is built during Kaldi compilation. I build it using
>>> the provided
>>> >>> >> > Makefile. The build depends on:
>>> >>> >> > 1. Python and numpy (by default it queries the python
>>> interpreter found
>>> >>> >> > on
>>> >>> >> > the path for header file location)
>>> >>> >> > 2. Boost with Boost::Python library. It is quite heavy to
>>> build, but
>>> >>> >> > most
>>> >>> >> > Linux distributions ship it. Boost python doesn't require any
>>> code
>>> >>> >> > generation steps, the wrapper is defined in a normal c++ code
>>> file.
>>> >>> >> >
>>> >>> >> > During build Python and Boost libraries and Kaldi object files
>>> are
>>> >>> >> > linked
>>> >>> >> > into a CPython extension module,
>>> kaldi/src/python/kaldi_io_internal.so.
>>> >>> >> > It
>>> >>> >> > works with both static and shared Kaldi builds. Further usage
>>> requires
>>> >>> >> > that
>>> >>> >> > python finds kaldi_io.py and kaldi_io_internal.so on the
>>> PYTHONPATH - it
>>> >>> >> > can
>>> >>> >> > be for example added to the PYTHONPATH variable in the path.sh
>>> script of
>>> >>> >> > a
>>> >>> >> > recipe.
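
One way to check that the interpreter can actually locate the module is to ask the import machinery directly. A small sketch using only the standard library; `module_visible` is a hypothetical helper name, and the directory passed in stands in for whatever was added to PYTHONPATH:

```python
import importlib
import importlib.util
import sys


def module_visible(name, extra_dir=None):
    """Return True if `name` can be located on the module search path.

    extra_dir plays the role of a directory added through PYTHONPATH.
    """
    if extra_dir is not None and extra_dir not in sys.path:
        sys.path.insert(0, extra_dir)
    # Make sure directories created or changed recently are re-scanned.
    importlib.invalidate_caches()
    return importlib.util.find_spec(name) is not None


# "kaldi/src/python" is the directory named in the message above; adjust
# it to the actual checkout location.
print(module_visible("kaldi_io", extra_dir="kaldi/src/python"))
```

This only confirms that the module can be found; importing the compiled extension can still fail if the shared libraries it links against are missing.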
>>> >>> >> >
>>> >>> >> > Jan
>>> >>> >> >
>>> >>> >> >
>>> >>> >> > On 12/13/2014 3:33 PM, Daniel Povey wrote:
>>> >>> >> >>
>>> >>> >> >> Also, Jan- could you send us an email explaining how this
>>> works-
>>> >>> >> >> How does Python "see" the C++ headers? Do you have to
>>> invoke some
>>> >>> >> >> special program, like swig? Do you have to write some special
>>> kind of
>>> >>> >> >> header that shows how the C++ objects are to be interpreted by
>>> python?
>>> >>> >> >> A brief example would be helpful, if so.
>>> >>> >> >> How is the resulting program linked, if at all? If you
>>> require
>>> >>> >> >> functions from C++ libraries, are these obtained from the .a or .so
>>> files
>>> >>> >> >> at runtime, or compiled into some kind of executable-like blob
>>> at
>>> >>> >> >> compile time? Does your framework require that Kaldi be
>>> compiled
>>> >>> >> >> using dynamic (.so) libraries?
>>> >>> >> >>
>>> >>> >> >> Dan
>>> >>> >> >>
>>> >>> >> >>
>>> >>> >> >> On Sat, Dec 13, 2014 at 12:04 PM, Jan Chorowski
>>> >>> >> >> <jan...@gm...>
>>> >>> >> >> wrote:
>>> >>> >> >>>
>>> >>> >> >>> Hello Dan,
>>> >>> >> >>>
>>> >>> >> >>> thank you for the comments. I tried to make it in the Kaldi
>>> spirit,
>>> >>> >> >>> consistency is important. Of course, the scripts can be
>>> removed and
>>> >>> >> >>> replaced
>>> >>> >> >>> with some more useful examples. I don't have too much
>>> experience with
>>> >>> >> >>> bridging Python to C++, so any critique on the wrappers and
>>> the
>>> >>> >> >>> approach
>>> >>> >> >>> taken is welcome.
>>> >>> >> >>>
>>> >>> >> >>> Jan
>>> >>> >> >>>
>>> >>> >> >>>
>>> >>> >> >>> On 12/13/2014 2:55 PM, Daniel Povey wrote:
>>> >>> >> >>>>
>>> >>> >> >>>> Hi all.
>>> >>> >> >>>> From a first look, it does look very impressive, and nicely
>>> >>> >> >>>> documented.
>>> >>> >> >>>> I would appreciate it if people on the list who have Python
>>> >>> >> >>>> experience
>>> >>> >> >>>> would comment on this- you can either reply to this thread,
>>> or to me.
>>> >>> >> >>>> I don't know if this has been done in the "natural" way, or
>>> if there
>>> >>> >> >>>> is some reason why people in the future will say, "why did
>>> you do it
>>> >>> >> >>>> this way, you should have done XXX".
>>> >>> >> >>>>
>>> >>> >> >>>> Jan:
>>> >>> >> >>>> in the scripts/ directory you seem to have some examples of
>>> how you
>>> >>> >> >>>> can create python programs that behave very much like Kaldi
>>> >>> >> >>>> command-line programs, using your framework. This is very
>>> useful.
>>> >>> >> >>>> However, the programs
>>> >>> >> >>>> apply-global-cmvn.py
>>> >>> >> >>>> compute-global-cmvn-stats.py
>>> >>> >> >>>> are perhaps a little confusing because they provide the same
>>> >>> >> >>>> functionality that you could get with "compute-cmvn-stats ->
>>> >>> >> >>>> matrix-sum" and "apply-cmvn" on the output of that command;
>>> and they
>>> >>> >> >>>> do so using different formats for the CMVN information. I
>>> know the
>>> >>> >> >>>> format of storing the CMVN stats in a two-row matrix is
>>> perhaps not
>>> >>> >> >>>> perfectly ideal, but it's a standard within Kaldi and it
>>> would be
>>> >>> >> >>>> confusing to deviate from that standard.
>>> >>> >> >>>> Of course, this is a very minor issue that doesn't affect the
>>> >>> >> >>>> validity
>>> >>> >> >>>> of the framework as a whole. I am just pointing this out;
>>> the main
>>> >>> >> >>>> discussion should be about the framework and whether people
>>> feel it's
>>> >>> >> >>>> the "right" way to do this.
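
The two-row CMVN statistics format mentioned above can be illustrated as follows. This is a sketch of the layout as I understand the Kaldi convention (row 0 holds per-dimension sums with the frame count in the last column, row 1 holds per-dimension sums of squares), written with plain lists rather than Kaldi's own matrix types; the function names are made up for the example:

```python
def accumulate_cmvn_stats(frames):
    """Build 2 x (dim + 1) stats from a list of feature frames.

    Row 0: per-dimension sums, then the frame count in the last column.
    Row 1: per-dimension sums of squares, last column unused (0.0).
    """
    dim = len(frames[0])
    stats = [[0.0] * (dim + 1), [0.0] * (dim + 1)]
    for frame in frames:
        for d, x in enumerate(frame):
            stats[0][d] += x
            stats[1][d] += x * x
        stats[0][dim] += 1.0
    return stats


def mean_and_variance(stats):
    """Recover per-dimension mean and variance from the stats."""
    dim = len(stats[0]) - 1
    count = stats[0][dim]
    means = [stats[0][d] / count for d in range(dim)]
    variances = [stats[1][d] / count - means[d] ** 2 for d in range(dim)]
    return means, variances


frames = [[1.0, 2.0], [3.0, 6.0]]
stats = accumulate_cmvn_stats(frames)
means, variances = mean_and_variance(stats)
print(stats)             # → [[4.0, 8.0, 2.0], [10.0, 40.0, 0.0]]
print(means, variances)  # → [2.0, 4.0] [1.0, 4.0]
```

Because the statistics are plain sums, accumulators from several speakers or utterances can be added element-wise to obtain global statistics.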
>>> >>> >> >>>>
>>> >>> >> >>>> Dan
>>> >>> >> >>>>
>>> >>> >> >>>> On Sat, Dec 13, 2014 at 6:28 AM, Jan Chorowski
>>> >>> >> >>>> <jan...@gm...>
>>> >>> >> >>>> wrote:
>>> >>> >> >>>>>
>>> >>> >> >>>>> Hi all!
>>> >>> >> >>>>>
>>> >>> >> >>>>> I've written wrappers to access Kaldi data files from
>>> within Python
>>> >>> >> >>>>> using boost::python (the code is on github
>>> >>> >> >>>>>
>>> https://github.com/janchorowski/kaldi-git/tree/python/src/python).
>>> >>> >> >>>>> If
>>> >>> >> >>>>> you think this would be an interesting addition please
>>> instruct me
>>> >>> >> >>>>> how
>>> >>> >> >>>>> to contribute.
>>> >>> >> >>>>>
>>> >>> >> >>>>> Best Regards,
>>> >>> >> >>>>> Jan Chorowski
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> ------------------------------------------------------------------------------
>>> >>> >> >>>>> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT
>>> Server
>>> >>> >> >>>>> from Actuate! Instantly Supercharge Your Business Reports
>>> and
>>> >>> >> >>>>> Dashboards
>>> >>> >> >>>>> with Interactivity, Sharing, Native Excel Exports, App
>>> Integration &
>>> >>> >> >>>>> more
>>> >>> >> >>>>> Get technology previously reserved for billion-dollar
>>> corporations,
>>> >>> >> >>>>> FREE
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> >>> >> >>>>>
>>> http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.clktrk
>>> >>> >> >>>>> _______________________________________________
>>> >>> >> >>>>> Kaldi-developers mailing list
>>> >>> >> >>>>> Kal...@li...
>>> >>> >> >>>>>
>>> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
>>> >>> >> >>>
>>> >>> >> >>>
>>> >>> >> >
>>> >>> >
>>> >>> >
>>> >>>
>>> >>>
>>> >>
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> > --
>>> > Ondřej Plátek, +420 737 758 650, skype:ondrejplatek,
>>> ond...@gm...
>>> >
>>> >
>>> >
>>>
>>
>>
>>
>>
>>
>
>
>
>