From: Jan C. <jan...@gm...> - 2014-12-13 14:28:28
|
Hi all! I've written wrappers to access Kaldi data files from within Python using boost::python (the code is on github https://github.com/janchorowski/kaldi-git/tree/python/src/python). If you think this would be an interesting addition please instruct me how to contribute. Best Regards, Jan Chorowski |
From: Josef N. <jos...@gm...> - 2014-12-13 14:51:02
|
wow, really cool! |
From: Daniel P. <dp...@gm...> - 2014-12-13 19:55:21
|
Hi all.

From a first look, it does look very impressive, and nicely documented. I would appreciate it if people on the list who have Python experience would comment on this; you can reply either to this thread or to me. I don't know if this has been done in the "natural" way, or if there is some reason why people in the future will say, "why did you do it this way, you should have done XXX".

Jan: in the scripts/ directory you seem to have some examples of how you can create Python programs that behave very much like Kaldi command-line programs, using your framework. This is very useful. However, the programs apply-global-cmvn.py and compute-global-cmvn-stats.py are perhaps a little confusing, because they provide the same functionality that you could get with "compute-cmvn-stats -> matrix-sum" and "apply-cmvn" on the output of that command, and they do so using different formats for the CMVN information. I know the format of storing the CMVN stats in a two-row matrix is perhaps not perfectly ideal, but it's a standard within Kaldi and it would be confusing to deviate from that standard.

Of course, this is a very minor issue that doesn't affect the validity of the framework as a whole. I am just pointing this out; the main discussion should be about the framework and whether people feel it's the "right" way to do this.

Dan |
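For readers unfamiliar with the two-row format mentioned above: as far as I understand it, Kaldi stores CMVN statistics for D-dimensional features as a 2 x (D+1) matrix, where row 0 holds the per-dimension sums with the frame count in the last column, and row 1 holds the per-dimension sums of squares. The following is only a minimal pure-Python sketch of that layout. The function names are hypothetical, and this is not the code from Jan's repository or from Kaldi itself.

```python
import math


def accumulate_cmvn_stats(utterances):
    """Accumulate global CMVN stats as a 2 x (dim + 1) list of lists.

    Row 0: per-dimension sums, with the frame count in the last column.
    Row 1: per-dimension sums of squares (last column left at 0.0).
    `utterances` is a list of utterances, each a list of frames,
    each frame a list of floats. (Hypothetical helper for illustration.)
    """
    dim = len(utterances[0][0])
    stats = [[0.0] * (dim + 1) for _ in range(2)]
    for frames in utterances:
        for frame in frames:
            for d, x in enumerate(frame):
                stats[0][d] += x
                stats[1][d] += x * x
            stats[0][dim] += 1.0
    return stats


def apply_cmvn(stats, frames, norm_vars=False):
    """Subtract the global mean; optionally scale to unit variance."""
    dim = len(stats[0]) - 1
    count = stats[0][dim]
    means = [stats[0][d] / count for d in range(dim)]
    scales = [1.0] * dim
    if norm_vars:
        for d in range(dim):
            var = stats[1][d] / count - means[d] ** 2
            # Floor the variance to avoid dividing by zero.
            scales[d] = 1.0 / math.sqrt(max(var, 1e-20))
    return [[(x - means[d]) * scales[d] for d, x in enumerate(frame)]
            for frame in frames]
```

Keeping the count in the extra column is what lets per-speaker matrices be summed element-wise into global statistics, which is presumably why a matrix-sum step is enough to combine them.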
From: Jan C. <jan...@gm...> - 2014-12-13 20:04:56
|
Hello Dan, thank you for the comments. I tried to make it in the Kaldi spirit; consistency is important. Of course, the scripts can be removed and replaced with some more useful examples. I don't have much experience with bridging Python to C++, so any critique of the wrappers and the approach taken is welcome. Jan |
From: Daniel P. <dp...@gm...> - 2014-12-13 20:33:39
|
Also, Jan, could you send us an email explaining how this works? How does Python "see" the C++ headers? Do you have to invoke some special program, like SWIG? Do you have to write some special kind of header that shows how the C++ objects are to be interpreted by Python? A brief example would be helpful, if so. How is the resulting program linked, if at all? If you require functions from C++ libraries, are these obtained from the .a or .so files at runtime, or compiled into some kind of executable-like blob at compile time? Does your framework require that Kaldi be compiled using dynamic (.so) libraries? Dan |
From: Jan C. <jan...@gm...> - 2014-12-13 22:31:58
|
Hi All,

the wrapper is built during Kaldi compilation. I build it using the provided Makefile. The build depends on:
1. Python and NumPy (by default the build queries the Python interpreter found on the PATH for the header file locations).
2. Boost with the Boost.Python library. It is quite heavy to build, but most Linux distributions ship it.

Boost.Python doesn't require any code generation step; the wrapper is defined in a normal C++ source file.

During the build, the Python and Boost libraries and the Kaldi object files are linked into a CPython extension module, kaldi/src/python/kaldi_io_internal.so. It works with both static and shared Kaldi builds. Further usage requires that Python finds kaldi_io.py and kaldi_io_internal.so on the PYTHONPATH; their directory can, for example, be added to the PYTHONPATH variable in the path.sh script of a recipe.

Jan |
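The two environment-dependent steps Jan describes (asking the interpreter where its headers live, and making sure the built modules are importable) can be sketched with the standard library alone. This is a hedged illustration, not the logic of the actual Makefile: the helper names are hypothetical, and only the module names kaldi_io and kaldi_io_internal come from the message above.

```python
import importlib.util
import sysconfig


def python_include_dir():
    """Directory holding Python.h for the running interpreter."""
    return sysconfig.get_paths()["include"]


def numpy_include_dir():
    """Directory holding the NumPy C headers, or None if NumPy is absent."""
    try:
        import numpy
    except ImportError:
        return None
    return numpy.get_include()


def missing_modules(names=("kaldi_io", "kaldi_io_internal")):
    """Return the module names that cannot be found on the import path."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python headers:", python_include_dir())
    print("NumPy headers: ", numpy_include_dir())
    print("Not importable:", missing_modules())
```

If the last line lists either module, the directory containing them is probably not on PYTHONPATH yet, which is the situation the path.sh suggestion addresses.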
From: Daniel P. <dp...@gm...> - 2014-12-13 23:00:05
|
OK, thanks. cc'ing Yajie in case he wants to comment. Dan |
From: Yajie M. <yaj...@gm...> - 2014-12-14 00:01:21
|
Hi Jan, This is very nice work! In our PDNN toolkit, we also have simple Python wrappers to read and write Kaldi features, mainly for DNN training. Your implementation looks like a more comprehensive version. Do you have functions/commands to do feature splicing? I ask because we found doing splicing on the fly in Python highly expensive. That's why we still stick to PFiles instead of Kaldi features (.scp, .ark) for DNN training. I am very interested to know the efficiency of your splicing implementation. Thanks, Yajie |
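To make the splicing operation concrete: each output frame is the concatenation of a frame with a fixed number of its left and right neighbours. The sketch below is an illustration only, with a hypothetical function name; it says nothing about how Jan's wrappers or PDNN implement splicing. Repeating the first and last frame at the utterance edges is an assumption of this sketch.

```python
def splice_frames(frames, left, right):
    """Concatenate each frame with `left` preceding and `right` following frames.

    `frames` is a list of equal-length lists of floats. Indices that fall
    outside the utterance are clamped, so edge frames are repeated.
    """
    num_frames = len(frames)
    spliced = []
    for t in range(num_frames):
        row = []
        for offset in range(-left, right + 1):
            # Clamp to the valid range so edges reuse the first/last frame.
            src = min(max(t + offset, 0), num_frames - 1)
            row.extend(frames[src])
        spliced.append(row)
    return spliced
```

The output is (left + right + 1) times wider than the input, and the inner loop touches every value that many times. That per-frame interpreter overhead is one plausible reason on-the-fly splicing in plain Python is expensive, and why an array-based or C++-side implementation would normally be preferred.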
From: Daniel P. <dp...@gm...> - 2014-12-18 23:37:10
|
Jan, I haven't seen any objections to your setup. I'd say we should plan to include it in Kaldi at some point (e.g. within the next few months), but in the meantime hopefully you can continue to work on it, and maybe come up with some other examples of how it's useful to do the interfacing with Python, e.g. some kind of application-level or service-level thing? Dan |
From: Matthew A. <mat...@gm...> - 2014-12-19 08:30:33
|
Hi

Apologies, I've been snowed under here.

I haven't had a chance to look over your work. I also don't have any views on the 'right' way to do it. My thoughts on this are in a previous thread; see the subject "Using SWIG to wrap kaldi for python", where I discussed this with Ondrej Platek and Vassil Panayotov.

In the idlak branch there is an example of Python wrappers that I put together some time ago. These are based on SWIG. In the end I didn't need them at this stage, because in the build system the command-line executables work very well. It's at run time that wrappers are very useful. The advantage of SWIG is that much of the same work will also contribute to C#, Java and Perl wrappers. In my experience the most important were the Java wrappers, which helped produce a library for Android. I have no experience with C#, and I moved to Python from Perl, so I only use Perl in legacy code ;-).

So some questions to consider:

1. Why is Python wrapping required for training? Using the subprocess module to run command lines, structured output directories etc. mirrors the current Perl recipes; what is the added benefit in this case?

2. If it's for run-time decoding, shouldn't we create a cross-platform C API? Perhaps things have changed, but C++ APIs were never cross-compiler compatible in the past, so you couldn't do things like compile with GNU and link in MSVC. With a C interface you can distribute libraries. But I am possibly out of date on this.

3. If 2 is correct, shouldn't we define our API and wrap that, producing a formal list of functionality that should be exposed to things like client and server applications?

I would encourage some care here. Unconstrained wrapping can lead to systems which HAVE to use the scripting language (we can already see how difficult it is to move away from the Perl scripting if you wish to). Also never, never, never reverse wrap (i.e. call Python from within C++); yes, it can be done, but that way lies madness.
v best

Matthew |
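Question 1 above refers to driving command-line tools from Python; in the standard library that is the job of the subprocess module. The following is only a minimal sketch of that style, not code from this thread: the helper name run_pipeline is hypothetical, and the Kaldi arguments shown in the trailing comment are assumptions that should be checked against each tool's usage message.

```python
import subprocess
import sys


def run_pipeline(commands):
    """Run argv lists as a pipeline (cmd1 | cmd2 | ...).

    Returns the stdout of the last stage as bytes and raises
    RuntimeError if any stage exits with a non-zero status.
    """
    procs = []
    upstream = subprocess.DEVNULL  # the first stage gets an empty stdin
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not subprocess.DEVNULL:
            # Drop our copy of the pipe so the upstream stage is notified
            # if the downstream stage exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for argv, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise RuntimeError("pipeline stage failed: %r" % (argv,))
    return output


# Hypothetical Kaldi usage (tool arguments are assumptions, not verified):
# run_pipeline([
#     ["compute-cmvn-stats", "scp:feats.scp", "ark:-"],
#     ["matrix-sum", "ark:-", "global_cmvn.mat"],
# ])
```

This covers the recipe-style use case without any compiled wrapper; what it cannot do is hand matrices to Python in memory, which is the gap the proposed wrappers aim to fill.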
From: Ondrej P. <ond...@gm...> - 2014-12-19 11:48:20
|
Hi Matthew,

I made some subjective comments below.

PS: Note that I like the proposed wrappers, but I am not sure how easy boost::python is to install on all supported platforms.

On Fri, Dec 19, 2014 at 9:30 AM, Matthew Aylett <mat...@gm...> wrote:
> 1. Why is Python wrapping required for training? Using the subprocess
> module to run command lines, structured output directories etc. mirrors
> the current Perl recipes; what is the added benefit in this case?

Well, bash and Perl are the current scripting languages for Kaldi. I, for example, prefer to use Python instead of both of them.

> 2. If it's for run-time decoding, shouldn't we create a cross-platform C
> API? Perhaps things have changed, but C++ APIs were never cross-compiler
> compatible in the past, so you couldn't do things like compile with GNU
> and link in MSVC. With a C interface you can distribute libraries. But I
> am possibly out of date on this.

Well, I tried that and gave it up, since Kaldi nicely uses OpenFST and I was not able to wrap OpenFST with just plain C (it may be possible). I used Cython and pyfst, mainly because pyfst solved wrapping OpenFST for me, and I am really glad that 99% of the wrapping of the OpenFST templates was carried out by somebody else (Victor Chahuneau).

-- Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, ond...@gm... |
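To make point 2 in the exchange above concrete: a function exported with C linkage can be called from Python through the standard ctypes module, with no compiler-specific wrapper layer in between. The sketch below uses the C library's strlen purely as a stand-in; no Kaldi C API is described in this thread, and the example assumes a POSIX system where ctypes.CDLL(None) exposes the symbols already loaded into the process.

```python
import ctypes

# Open the symbols already linked into the running process (POSIX only);
# on Linux and macOS this includes the C standard library.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the UTF-8 byte length of text as computed by C's strlen."""
    return libc.strlen(text.encode("utf-8"))
```

The difficulty described above still applies: templated C++ such as OpenFST has no C-linkage symbols to bind to, so something has to flatten it into plain functions first.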
From: Daniel P. <dp...@gm...> - 2014-12-21 20:36:27
|
Jan, perhaps when you have time you could respond to the above comments on the list? I'm not 100% sure what to do about this. BTW, if we do include this, it will likely be optionally compiled, because I don't want the generic Kaldi compilation to be dependent on boost.

Dan |
From: Jan C. <jan...@gm...> - 2014-12-22 08:12:26
|
Hi All,

First I'd like to clarify that the wrappers are only meant to read and write a limited set of objects (matrices, vectors, scalars, pairs) from .scp and .ark files into Python with a Pythonic feel (dict-like interface for random access iterators, iterator interface for sequential ones, context management). I didn't consider wrapping FST classes or anything more specific to Kaldi.

In their limited form the wrappers are useful to, e.g., use some Python neural network library to train nets on Kaldi's features and then use the Kaldi decoders, in a manner similar to the Kaldi DNN recipes.

I decided not to write the wrappers as generic SWIG wrappers because I target conversion from Kaldi data files to Python-specific classes, such as ndarray. Wrappers for other languages would need to address other language-specific matrix libraries etc. I don't know SWIG very well, though, and maybe it would be useful to make wrappers for several languages at once.

Since the wrappers are optional, maybe the best way to proceed is to extract them into a separate repository that can be built against an installation of Kaldi, and see whether anyone uses it? I can work on it sometime during the holiday break and let you know once it is published.

@Yajie
Splicing a single utterance is easy. You pad it with zeros, then set the stride to be one frame and the shape to span several frames. Thus it looks like one large spliced array. However, to train a network you want to shuffle examples (i.e. spliced frames) from multiple utterances. I resorted to reading a few hundred megabytes of data into memory, shuffling it, then reading another batch from the disk. I can send you the code if you are interested.

Jan

On 12/21/2014 9:36 PM, Daniel Povey wrote:
> Jan, perhaps when you have time you could respond to the above
> comments on the list?
> I'm not 100% sure what to do about this.
|
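The single-utterance splicing and the buffered shuffling described in Jan's message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Jan's actual code (which was not posted to the list): the names `splice_frames`, `shuffled_batches` and `buffer_frames` are made up here, utterances are assumed to arrive as 2-D NumPy arrays of shape (frames, dim), and the buffer is counted in frames rather than megabytes.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def splice_frames(feats, left=5, right=5):
    """Return a read-only (T, (left+1+right)*D) view of one utterance.

    Row t holds frames t-left .. t+right, with zeros past the edges.
    (Hypothetical helper; the name and defaults are assumptions.)
    """
    feats = np.ascontiguousarray(feats)
    num_frames, dim = feats.shape
    context = left + 1 + right
    # Pad with zeros so that every frame has a full context window.
    padded = np.zeros((num_frames + left + right, dim), dtype=feats.dtype)
    padded[left:left + num_frames] = feats
    row_stride, elem_stride = padded.strides
    # Consecutive rows start one frame apart, so the overlapping windows
    # share memory instead of being copied.
    return as_strided(padded,
                      shape=(num_frames, context * dim),
                      strides=(row_stride, elem_stride),
                      writeable=False)


def shuffled_batches(utterances, buffer_frames=100000, left=5, right=5, rng=None):
    """Yield shuffled blocks of spliced frames drawn from several utterances.

    (Hypothetical helper; buffer_frames is an assumed tuning knob.)
    """
    rng = np.random.default_rng() if rng is None else rng
    pending, count = [], 0
    for feats in utterances:
        pending.append(splice_frames(feats, left, right))
        count += len(feats)
        if count >= buffer_frames:
            block = np.concatenate(pending)  # copies the views into one array
            yield block[rng.permutation(len(block))]
            pending, count = [], 0
    if pending:
        # Flush whatever is left after the last utterance.
        block = np.concatenate(pending)
        yield block[rng.permutation(len(block))]
```

The strided view costs no extra memory per utterance; the copy only happens when a buffer is concatenated for shuffling, which is where the memory budget has to be chosen.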
From: Matthew A. <mat...@gm...> - 2014-12-22 09:03:49
|
Hi

So as I understand it we have two rather different objectives.

1. Use Python libraries and utilities to advance and prototype training, e.g. with neural nets. So this is the build side.
2. Wrap Kaldi for a future API to be used by, say, client and server apps.

For 1, I would question how important it is to wrap the low-level data structures instead of, say, converting/rereading from disk. I guess you want to mix and match functionality during training on the same data. Also, a very useful effect of doing this is that it 'opens up' the system, allowing interaction and testing. This is actually more important for synthesis, where fewer operations require multi-core batch-style processing.

For 2, we should specify an API for run time. We shouldn't need to wrap FST, as that is too low level; it's better to write utility functions and wrap them. Here multi-language support is more useful.

It might be nice to do both in the same sort of way, but depending on boost could be a little extreme for doing it.

Matthew

On Mon, Dec 22, 2014 at 8:12 AM, Jan Chorowski <jan...@gm...> wrote:
> Hi All,
>
> First I'd like to clarify that the wrappers are only meant to read and
> write a limited set of objects (matrices, vectors, scalars, pairs) from
> .scp and .ark into Python with a Pythonic feel (dict-like interface for
> random access iterators, iterator interface for sequential ones, context
> management). I didn't consider wrapping fst classes nor anything more
> specific to Kaldi.
>
> In their limited form the wrappers are useful to e.g. use some python
> neural network library to train nets on Kaldi's features and then use the
> Kaldi decoders, in a manner similar to Kaldi DNN recipes.
>
> I decided not to write the wrappers as generic SWIG wrappers because I
> target conversion from Kaldi data files to Python specific classes, such as
> ndarray. Wrappers for other languages would need to address other
> language-specific matrix libraries etc.
|
From: Sean T. <se...@se...> - 2014-12-29 06:52:20
|
I wanted to echo Ondrej's comment about preferring Python to bash/perl for scripting.

Python wrappers for the command line utilities are useful ... I've spent a few hours systematically wrapping them, parsing the output of the --help command as a guide to functionality. This gives wrappers of the general form:

    import sh  # kaldi_path() and create_options() are helper functions not shown here

    def acc_lda(transition_gmm_model, features_rspecifier,
                posteriors_rspecifier, lda_acc_out, *args, **kwargs):
        """Accumulate LDA statistics based on pdf-ids.

        Executable usage:
          acc-lda [options] <transition-gmm/model> <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>

        Options:
          binary: Write accumulators in binary mode. (bool,true)
          rand_prune: Randomized pruning threshold for posteriors (float,0)"""
        cmd = sh.Command(kaldi_path("src/bin/acc-lda"))
        # Python keyword -> (command-line flag, type, default), taken from --help
        option_defs = {'binary': ('binary', 'bool', 'true'),
                       'help': ('help', 'bool', 'false'),
                       'rand_prune': ('rand-prune', 'float', '0'),
                       'config': ('config', 'string', ''),
                       'print_args': ('print-args', 'bool', 'true'),
                       'verbose': ('verbose', 'int', '0')}
        myOptions = create_options(option_defs, kwargs)
        myArgs = [transition_gmm_model, features_rspecifier,
                  posteriors_rspecifier, lda_acc_out] + list(args)
        return cmd(myOptions + myArgs)

There are some refinements that could be added (*args does not make sense for this function).

Because of the rather elegant Python sh package (https://pypi.python.org/pypi/sh), these functions will create pipelines if composed:

    >>> from sh import ls, wc
    >>> wc(ls("."))
    8 23 222

There are a few places where constructing wrappers from the help output is not straightforward (for instance, fstrand --help does not do the expected thing).

-- Sean

On Fri, Dec 19, 2014 at 6:48 AM, Ondrej Platek <ond...@gm...> wrote:
>
> Hi Matthew,
>
> I made some subjective comments below.
>
> PS: Note that I like the proposed wrappers, but I am not sure how boost::python is easy to install on all supported platforms.
> > On Fri, Dec 19, 2014 at 9:30 AM, Matthew Aylett <mat...@gm...> wrote: >> >> Hi >> >> Apologies, I've been snowed under here. >> >> I haven' had a chance to look over your work. I also don't have any views on the 'right' way to do it. My thoughts on this are in a previous thread. See subject "Using SWIG to wrap kaldi for python" where I discussed this with ondrej platek and >> Vassil Panayotov. >> >> In the idlak branch there is an example of python wrappers that I put together some time ago. These are based on SWIG. In the end I didn't need this at this stage because in the build system command line executables work very well. Its in run time wrappers are very useful. The advantage with SWIG is that the much of the same work will also contribute to C#, Java, Perl wrappers as well. In my experience the most important were Java wrappers to help produce a library for Android. I have no experience with C# and moved to Python from Perl so only use Perl in legacy code ;-). >> >> So some questions to consider: >> >> 1. Why is python wrapping required for training. using sys.Process to run command lines, structured output directories etc mirrors the current Perl recipes, what is the added benefit in this case? > > Well bash and Perl is the current scripting language for Kaldi. For example I prefer to use Python instead of both of them. > >> >> 2. If its for run time decoding shouldn't we create a cross platfom C API? Perhaps things have changed but C++ APIs were never cross compiler compatible in the past so you couldn't do stuff like compile using gnu and link in MSN. With a C interface you can distribute libraries. But I am possibly out of date on this. > > Well, I tried that and I gave it up since Kaldi nicely uses OpenFST and I was not able to wrap OpenFST with just plain C (It may be possible). 
> I used Cython and pyfst mainly because pyfst solved for me wrapping up OpenFST and I am really glad that 99% of wrapping OpenFST templates was carried out by somebody else (Victor Chahuneau). >> >> >> 3. If 2 is correct shouldn't we define our API and wrap that? Producing a formal list of functionality that should be exposed to things like client and server applications? >> >> >> I would encourage some care here. Unconstrained wrapping can lead to systems which HAVE to use the scripting language (We can already see how difficult it is to move away from the Perl scripting if you wish to). Also never, never, never reverse wrap (i.e. call python from within C++), yes it can be done but that way lays madness. >> >> v best >> >> Matthew >> >> >> On Thu, Dec 18, 2014 at 11:37 PM, Daniel Povey <dp...@gm...> wrote: >>> >>> Jan- >>> I haven't seen any objections to your setup. I'd say we should plan >>> to include it in Kaldi at some point (e.g. within the next few >>> months), but in the meantime hopefully you can continue to work on it, >>> and maybe come up with some other examples of how it's useful to do >>> the interfacing with Python- e.g. some kind of application level or >>> service-level thing? >>> Dan >>> >>> >>> On Sat, Dec 13, 2014 at 4:01 PM, Yajie Miao <yaj...@gm...> wrote: >>> > Hi Jan, >>> > This is very nice work! In our PDNN toolkit, we also have simple python >>> > wrappers to read and write Kaldi features, mainly for DNN training. Your >>> > implementation looks like a more comprehensive version. >>> > >>> > Do you have the functions/commands to do feature splicing? I ask this >>> > because we found doing splicing on the fly with Python highly expensive. >>> > That's why we still stick to PFiles instead of Kaldi features (.scp .ark) >>> > for DNN triaining. I am very interested to know the efficiency of your >>> > splicing implementation. 
>>> > >>> > Thanks, >>> > Yajie >>> > >>> > On Sat, Dec 13, 2014 at 5:59 PM, Daniel Povey <dp...@gm...> wrote: >>> >> >>> >> OK, thanks. >>> >> cc'ing Yajie in case he wants to comment. >>> >> Dan >>> >> >>> >> >>> >> On Sat, Dec 13, 2014 at 2:31 PM, Jan Chorowski < jan...@gm...> >>> >> wrote: >>> >> > Hi All, >>> >> > >>> >> > the wrapper is built during Kaldi compilation. I build it using provided >>> >> > Makefile. The build depends on: >>> >> > 1. Python and numpy (by default it queries the python interpreter found >>> >> > on >>> >> > the path for header file location) >>> >> > 2. Boost with Boost::Python library. It is quite heavy to build, but >>> >> > most >>> >> > Linux distributions ship it. Boost python doesn't require any code >>> >> > generation steps, the wrapper is defined in a normal c++ code file. >>> >> > >>> >> > During build Python and Boost libraries and Kaldi object files are >>> >> > linked >>> >> > into a CPython extention module, kaldi/src/python/kaldi_io_internal.so. >>> >> > It >>> >> > works with both static and shared Kaldi builds. Further usage requires >>> >> > that >>> >> > python finds kaldi_io.py and kaldi_io_internal.so on the PYTHONPATH - it >>> >> > can >>> >> > be for example added to the PYTHONPATH variable in the path.sh script of >>> >> > a >>> >> > recipe. >>> >> > >>> >> > Jan >>> >> > >>> >> > >>> >> > On 12/13/2014 3:33 PM, Daniel Povey wrote: >>> >> >> >>> >> >> Also, Jan- could you send us an email explaining how this works- >>> >> >> How does Python "see" the C++ headers? Do you have to invoke some >>> >> >> special program, like swig? Do you have to write some special kind of >>> >> >> header that shows how the C++ objects are to be interpreted by python? >>> >> >> A brief example would be helpful, if so. >>> >> >> How is the resulting program linked, if at all? 
If you require >>> >> >> functions C++ libraries, are these obtained from the .a or .so files >>> >> >> at runtime, or compiled into some kind of executable-like blob at >>> >> >> compile time? Does your framework require that Kaldi be compiled >>> >> >> using dynamic (.so) libraries? >>> >> >> >>> >> >> Dan >>> >> >> >>> >> >> >>> >> >> On Sat, Dec 13, 2014 at 12:04 PM, Jan Chorowski >>> >> >> <jan...@gm...> >>> >> >> wrote: >>> >> >>> >>> >> >>> Hello Dan, >>> >> >>> >>> >> >>> thank you for the comments. I tried to make it in the Kaldi spirit, >>> >> >>> consistency is important. Of course, the scripts can be removed and >>> >> >>> replaced >>> >> >>> with some more useful examples. I don't have too much experience with >>> >> >>> bridging Python to C++, so any critique on the wrappers and the >>> >> >>> approach >>> >> >>> taken is welcome. >>> >> >>> >>> >> >>> Jan >>> >> >>> >>> >> >>> >>> >> >>> On 12/13/2014 2:55 PM, Daniel Povey wrote: >>> >> >>>> >>> >> >>>> Hi all. >>> >> >>>> From a first look, it does look very impressive, and nicely >>> >> >>>> documented. >>> >> >>>> I would appreciate it if people on the list who have Python >>> >> >>>> experience >>> >> >>>> would comment on this- you can either reply to this thread, or to me. >>> >> >>>> I don't know if this has been done in the "natural" way, or if there >>> >> >>>> is some reason why people in the future will say, "why did you do it >>> >> >>>> this way, you should have done XXX". >>> >> >>>> >>> >> >>>> Jan: >>> >> >>>> in the scripts/ directory you seem to have some examples of how you >>> >> >>>> can create python programs that behave very much like Kaldi >>> >> >>>> command-line programs, using your framework. This is very useful. 
>>> >> >>>> However, the programs >>> >> >>>> apply-global-cmvn.py >>> >> >>>> compute-global-cmvn-stats.py >>> >> >>>> are perhaps a little confusing because they provide the same >>> >> >>>> functionality that you could get with "compute-cmvn-stats -> >>> >> >>>> matrix-sum" and "apply-cmvn" on the output of that command; and they >>> >> >>>> do so using different formats for the CMVN information. I know the >>> >> >>>> format of storing the CMVN stats in a two-row matrix is perhaps not >>> >> >>>> perfectly ideal, but it's a standard within Kaldi and it would be >>> >> >>>> confusing to deviate from that standard. >>> >> >>>> Of course, this is a very minor issue that doesn't affect the >>> >> >>>> validity >>> >> >>>> of the framework as a whole. I am just pointing this out; the main >>> >> >>>> discussion should be about the framework and whether people feel it's >>> >> >>>> the "right" way to do this. >>> >> >>>> >>> >> >>>> Dan >>> >> >>>> >>> >> >>>> On Sat, Dec 13, 2014 at 6:28 AM, Jan Chorowski >>> >> >>>> <jan...@gm...> >>> >> >>>> wrote: >>> >> >>>>> >>> >> >>>>> Hi all! >>> >> >>>>> >>> >> >>>>> I've written wrappers to access Kaldi data files from within Python >>> >> >>>>> using boost::python (the code is on github >>> >> >>>>> https://github.com/janchorowski/kaldi-git/tree/python/src/python). >>> >> >>>>> If >>> >> >>>>> you think this would be an interesting addition please instruct me >>> >> >>>>> how >>> >> >>>>> to contribute. >>> >> >>>>> >>> >> >>>>> Best Regards, >>> >> >>>>> Jan Chorowski >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> >>> >> >>>>> ------------------------------------------------------------------------------ >>> >> >>>>> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server >>> >> >>>>> from Actuate! 
|
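The acc_lda wrapper in Sean's message calls a create_options helper that is not shown. A minimal sketch of what such a helper might look like follows. It assumes option_defs maps each Python keyword to a (flag name, type, default) tuple as in the acc_lda example, and that options are passed to the executable in the --name=value form. The function body, the error handling and the bool conversion are assumptions made for illustration, not Sean's actual code.

```python
def create_options(option_defs, kwargs):
    """Turn Python keyword arguments into --name=value command-line flags.

    Hypothetical helper: option_defs maps a Python keyword to a
    (flag_name, type_name, default) tuple, as in the acc_lda wrapper.
    """
    options = []
    # Sort the keywords so the generated command line is reproducible.
    for key, value in sorted(kwargs.items()):
        if key not in option_defs:
            raise TypeError("unknown option: %s" % key)
        flag, kind, _default = option_defs[key]
        if kind == "bool" and isinstance(value, bool):
            # Assumption: boolean options are spelled true/false.
            value = "true" if value else "false"
        options.append("--%s=%s" % (flag, value))
    return options


option_defs = {
    "binary": ("binary", "bool", "true"),
    "rand_prune": ("rand-prune", "float", "0"),
}
flags = create_options(option_defs, {"rand_prune": 0.5, "binary": False})
print(flags)
```

Options that the caller does not pass are simply left out, so the executable falls back to its own defaults; the default stored in option_defs is only carried along for documentation.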
From: Sean T. <se...@se...> - 2015-02-16 14:18:44
|
I wanted to bring up the integration of the very useful kaldi_io package that Jan Chorowski made available in December. Is there any consensus on whether to provide this code as (probably an optional) part of the Kaldi release? I understand that Boost-Python is a relatively heavy requirement, but it is easily available on OSX and Linux. I continue to wrap the executables themselves in python functional wrappers, which has made the integration with other software easier, and contributes to pipeline testability and robustness. -- Sean |
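The functional wrappers Sean mentions are generated by parsing the output of --help. A rough sketch of that parsing step is given below. It assumes each option is printed on one line in the form "--name : description (type, default = value)"; that layout, the sample help text and the parse_help name are assumptions for illustration, and tools that print help differently (Sean mentions fstrand) would need special handling.

```python
import re

# Assumed layout of one option line in the help output:
#   --rand-prune   : Randomized pruning threshold (float, default = 0)
OPTION_LINE = re.compile(
    r"^\s*--(?P<flag>[\w-]+)\s*:\s*(?P<doc>.*?)\s*"
    r"\((?P<kind>\w+),\s*default\s*=\s*(?P<default>.*?)\)\s*$"
)


def parse_help(help_text):
    """Build an option_defs dict from help text (hypothetical helper)."""
    option_defs = {}
    for line in help_text.splitlines():
        match = OPTION_LINE.match(line)
        if match is None:
            continue  # usage lines, blank lines, section headers
        flag = match.group("flag")
        # Python keywords cannot contain '-', so rand-prune -> rand_prune.
        key = flag.replace("-", "_")
        option_defs[key] = (flag, match.group("kind"), match.group("default"))
    return option_defs


# Made-up help text in the assumed layout, not captured from a real run.
SAMPLE_HELP = """\
Accumulate LDA statistics based on pdf-ids.
Usage: acc-lda [options] <model> <features-rspecifier> <posteriors-rspecifier> <lda-acc-out>

Options:
  --binary       : Write accumulators in binary mode. (bool, default = true)
  --rand-prune   : Randomized pruning threshold for posteriors (float, default = 0)
"""

option_defs = parse_help(SAMPLE_HELP)
print(option_defs)
```

The resulting dict has the same shape as the option_defs literal in the acc_lda wrapper, so a generator script could write it straight into the wrapper source.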
From: Jan T. <af...@ce...> - 2015-02-16 15:29:59
|
Personally, if we were to include wrappers directly into kaldi, I'd prefer SWIG as the wrapper generator. I worked to some extent with ctypes, boost::python and swig and all are usable and "just fine" for python. The concern I have is however, if this is going to be put into kaldi trunk and we want it to be really useful, then someone will have to maintain it, take the responsibility for it and make it in sync with the C/C++ code, which given the rate of kaldi development will need more than negligible commitment. Yes we can argue that "the community will keep it updated" but frankly, I didn't see any successful project working without someone committing to do even the ugly/boring/maintenance work on regular basis. And for that a wider user base might be more stimulating in the sense that the maintainer would see the wrappers are used (which would yield more feedback/bugreports). Which means (at least to me) that more languages should be supported -- at least python, perl, java... From all those wrapper generators, only SWIG can do that -- i.e. after writing one interface file, it can generate wrappers for those langs (and some other as well). Just my two cents.. y. On Mon, Feb 16, 2015 at 8:52 AM, Sean True <se...@se...> wrote: > I wanted to bring up the integration of the very useful kaldi_io package > that Jan Chorowski made available in December. Is there any consensus on > whether to provide this code as (probably an optional) part of the Kaldi > release? I understand that Boost-Python is a relatively heavy requirement, > but it is easily available on OSX and Linux. > > I continue to wrap the executables themselves in python functional > wrappers, which has made the integration with other software easier, and > contributes to pipeline testability and robustness. > > -- Sean > > On Fri, Dec 26, 2014 at 11:02 AM, Sean True <se...@se...> > wrote: > >> I wanted to echo Ondrej's comment about preferring Python to bash/perl >> for scripting. 
Python wrappers for the command line utilities are useful >> ... I've spent a few hours systematically wrapping them, parsing the output >> of the --help command as a guide to functionality. >> >> This gives wrappers of the general form: >> >> def acc_lda(transition_gmm_model, features_rspecifier, >> posteriors_rspecifier, lda_acc_out, *args, **kwargs): >> """Accumulate LDA statistics based on pdf-ids. >> Executable usage: acc-lda [options] <transition-gmm/model> >> <features-rspecifier> <posteriors-rspecifier> <lda-acc-out> >> Options: >> binary: Write accumulators in binary mode. (bool,true) >> rand_prune: Randomized pruning threshold for posteriors (float,0)""" >> cmd = sh.Command(kaldi_path("src/bin/acc-lda")) >> option_defs = {'binary': ('binary', 'bool', 'true'), 'help': ('help', >> 'bool', 'false'), 'rand_prune': ('rand-prune', 'float', '0'), 'config': >> ('config', 'string', ''), 'print_args': ('print-args', 'bool', 'true'), >> 'verbose': ('verbose', 'int', '0')} >> myOptions = create_options(option_defs, kwargs) >> myArgs = [transition_gmm_model, features_rspecifier, >> posteriors_rspecifier, lda_acc_out]+list(args) >> return cmd (myOptions + myArgs) >> >> There are some refinements that could be added (*args does not make sense >> for this function). >> Because of the rather elegant Python sh package ( >> https://pypi.python.org/pypi/sh) these functions will create pipelines >> if composed: >> >> >>> from sh import ls, wc >> >> >>> wc(ls(".")) >> >> 8 23 222 >> >> There are a few places where constructing from help output is not >> straightforward (for instance, fstrand --help does >> not do the expected thing). >> >> -- Sean >> >> On Fri, Dec 19, 2014 at 6:48 AM, Ondrej Platek <ond...@gm...> >> wrote: >> > >> > Hi Matthew, >> > >> > I made some subjective comments below. >> > >> > PS: Note that I like the proposed wrappers, but I am not sure how >> boost::python is easy to install on all supported platforms. 
|
From: Daniel P. <dp...@gm...> - 2015-02-16 19:31:47
|
Sean, can you give us some general idea of how you were using the kaldi_io package? Dan |
From: Sean T. <se...@se...> - 2015-02-16 20:18:30
|
Dan, et al. -- My most common use is for doing graphics and reporting on extracted data. Plotting voicing and pitch on top of a spectrogram using matplotlib is an example. I'm also interposing python code between stages of standard kaldi pipelines for monitoring, while maintaining pipeline parallelism (not having to land each stream of data). I could use copy-matrix, I suppose, but the pipelines are complex enough already :-) -- Sean
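A minimal sketch of the kind of pass-through monitoring stage described here, under stated assumptions: it does not use kaldi_io (whose API is not shown in this thread) and instead assumes the stream is a text-mode matrix archive (e.g. written with ark,t:-), where each entry looks like "utt-id  [" followed by one row per line and a closing "]". The function name monitor_text_ark is hypothetical.

```python
import io


def monitor_text_ark(lines, out, log):
    """Forward a text-mode Kaldi matrix archive unchanged and log frame counts.

    Each line is written to `out` as soon as it is read, so the downstream
    stage keeps running in parallel and nothing is landed on disk.
    """
    utt = None
    frames = 0
    for line in lines:
        out.write(line)                     # pass-through comes first
        tokens = line.split()
        if not tokens:
            continue
        if utt is None:
            if "[" not in tokens:
                continue                    # not a matrix header; ignore
            utt = tokens[0]                 # header line: "<utt-id>  ["
            frames = 0
            tokens = tokens[tokens.index("[") + 1:]
        closes = bool(tokens) and tokens[-1] == "]"
        if closes:
            tokens = tokens[:-1]
        if tokens:                          # any remaining numbers form one row
            frames += 1
        if closes:
            log.write("%s: %d frames\n" % (utt, frames))
            utt = None


# Demo on an in-memory archive; in a real pipeline one would pass
# sys.stdin, sys.stdout and sys.stderr instead.
sample = "utt1  [\n  1 2 3\n  4 5 6 ]\nutt2  [\n  7 8 9 ]\n"
out, log = io.StringIO(), io.StringIO()
monitor_text_ark(io.StringIO(sample), out, log)
```

Because the monitor writes its report to a separate stream, it can sit between two stages of a pipe without changing what the next stage reads.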
>> >> >> On Mon, Feb 16, 2015 at 8:52 AM, Sean True <se...@se...> >> wrote: >> >>> I wanted to bring up the integration of the very useful kaldi_io package >>> that Jan Chorowski made available in December. Is there any consensus on >>> whether to provide this code as (probably an optional) part of the Kaldi >>> release? I understand that Boost-Python is a relatively heavy requirement, >>> but it is easily available on OSX and Linux. >>> >>> I continue to wrap the executables themselves in python functional >>> wrappers, which has made the integration with other software easier, and >>> contributes to pipeline testability and robustness. >>> >>> -- Sean >>> >>> On Fri, Dec 26, 2014 at 11:02 AM, Sean True <se...@se...> >>> wrote: >>> >>>> I wanted to echo Ondrej's comment about preferring Python to bash/perl >>>> for scripting. Python wrappers for the command line utilities are useful >>>> ... I've spent a few hours systematically wrapping them, parsing the output >>>> of the --help command as a guide to functionality. >>>> >>>> This gives wrappers of the general form: >>>> >>>> def acc_lda(transition_gmm_model, features_rspecifier, >>>> posteriors_rspecifier, lda_acc_out, *args, **kwargs): >>>> """Accumulate LDA statistics based on pdf-ids. >>>> Executable usage: acc-lda [options] <transition-gmm/model> >>>> <features-rspecifier> <posteriors-rspecifier> <lda-acc-out> >>>> Options: >>>> binary: Write accumulators in binary mode. 
(bool,true) >>>> rand_prune: Randomized pruning threshold for posteriors >>>> (float,0)""" >>>> cmd = sh.Command(kaldi_path("src/bin/acc-lda")) >>>> option_defs = {'binary': ('binary', 'bool', 'true'), 'help': >>>> ('help', 'bool', 'false'), 'rand_prune': ('rand-prune', 'float', '0'), >>>> 'config': ('config', 'string', ''), 'print_args': ('print-args', 'bool', >>>> 'true'), 'verbose': ('verbose', 'int', '0')} >>>> myOptions = create_options(option_defs, kwargs) >>>> myArgs = [transition_gmm_model, features_rspecifier, >>>> posteriors_rspecifier, lda_acc_out]+list(args) >>>> return cmd(myOptions + myArgs) >>>> >>>> There are some refinements that could be added (*args does not make >>>> sense for this function). >>>> Because of the rather elegant Python sh package ( >>>> https://pypi.python.org/pypi/sh) these functions will create pipelines >>>> if composed: >>>> >>>> >>> from sh import ls, wc >>>> >>>> >>> wc(ls(".")) >>>> >>>> 8 23 222 >>>> >>>> There are a few places where constructing from help output is not >>>> straightforward (for instance, fstrand --help does >>>> not do the expected thing). >>>> >>>> -- Sean >>>> >>>> On Fri, Dec 19, 2014 at 6:48 AM, Ondrej Platek <ond...@gm...> >>>> wrote: >>>> > >>>> > Hi Matthew, >>>> > >>>> > I made some subjective comments below. >>>> > >>>> > PS: Note that I like the proposed wrappers, but I am not sure how easy >>>> boost::python is to install on all supported platforms. >>>> > >>>> > On Fri, Dec 19, 2014 at 9:30 AM, Matthew Aylett < >>>> mat...@gm...> wrote: >>>> >> >>>> >> Hi >>>> >> >>>> >> Apologies, I've been snowed under here. >>>> >> >>>> >> I haven't had a chance to look over your work. I also don't have any >>>> views on the 'right' way to do it. My thoughts on this are in a previous >>>> thread. See subject "Using SWIG to wrap kaldi for python" where I discussed >>>> this with Ondrej Platek and >>>> >> Vassil Panayotov.
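The acc_lda wrapper quoted above calls a `create_options` helper that the thread never shows. A minimal sketch of what such a helper might do follows. This is a guess at its behaviour, not Sean's actual code: it assumes `option_defs` maps a Python keyword name to a `(command-line name, type, default)` tuple, as in the quoted example, and that options are passed in Kaldi's usual `--name=value` form.

```python
def create_options(option_defs, kwargs):
    """Turn Python keyword arguments into Kaldi-style --name=value flags.

    Hypothetical reconstruction: option_defs maps a Python keyword name to
    a (cli_name, type_name, default) tuple. Only options the caller passes
    explicitly are emitted, so the executable's own defaults still apply.
    """
    flags = []
    for py_name in sorted(kwargs):  # sorted for a reproducible command line
        if py_name not in option_defs:
            raise TypeError("unknown option: %r" % py_name)
        cli_name, type_name, _default = option_defs[py_name]
        value = kwargs[py_name]
        if type_name == "bool":
            # Boolean options are spelled as the literal strings true/false.
            value = "true" if value else "false"
        flags.append("--%s=%s" % (cli_name, value))
    return flags
```

With the definitions from the quoted wrapper, `create_options(option_defs, {'binary': False})` would yield `['--binary=false']`, which the wrapper then concatenates with the positional arguments. Rejecting unknown keywords early is one possible design choice; it surfaces typos in Python rather than as a usage error from the executable.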
>>>> >> >>>> >> In the idlak branch there is an example of python wrappers that I >>>> put together some time ago. These are based on SWIG. In the end I didn't >>>> need this at this stage because in the build system command line >>>> executables work very well. It's at run time that wrappers are very useful. The >>>> advantage with SWIG is that much of the same work will also contribute >>>> to C#, Java, Perl wrappers as well. In my experience the most important >>>> were Java wrappers to help produce a library for Android. I have no >>>> experience with C# and moved to Python from Perl so only use Perl in legacy >>>> code ;-). >>>> >> >>>> >> So some questions to consider: >>>> >> >>>> >> 1. Why is python wrapping required for training? Using sys.Process >>>> to run command lines, structured output directories etc mirrors the current >>>> Perl recipes; what is the added benefit in this case? >>>> > >>>> > Well, bash and Perl are the current scripting languages for Kaldi. I, for >>>> example, prefer to use Python instead of both of them. >>>> > >>>> >> >>>> >> 2. If it's for run-time decoding, shouldn't we create a cross-platform >>>> C API? Perhaps things have changed but C++ APIs were never cross-compiler >>>> compatible in the past so you couldn't do stuff like compile using gnu and >>>> link in MSN. With a C interface you can distribute libraries. But I am >>>> possibly out of date on this. >>>> > >>>> > Well, I tried that and I gave it up since Kaldi nicely uses OpenFST >>>> and I was not able to wrap OpenFST with just plain C (It may be possible). >>>> > I used Cython and pyfst mainly because pyfst solved for me wrapping >>>> up OpenFST and I am really glad that 99% of wrapping OpenFST templates was >>>> carried out by somebody else (Victor Chahuneau). >>>> >> >>>> >> >>>> >> 3. If 2 is correct shouldn't we define our API and wrap that? >>>> Producing a formal list of functionality that should be exposed to things >>>> like client and server applications?
>>>> >> >>>> >> >>>> >> I would encourage some care here. Unconstrained wrapping can lead to >>>> systems which HAVE to use the scripting language (We can already see how >>>> difficult it is to move away from the Perl scripting if you wish to). Also >>>> never, never, never reverse wrap (i.e. call Python from within C++); yes, it >>>> can be done, but that way lies madness. >>>> >> >>>> >> v best >>>> >> >>>> >> Matthew >>>> >> >>>> >> >>>> >> On Thu, Dec 18, 2014 at 11:37 PM, Daniel Povey <dp...@gm...> >>>> wrote: >>>> >>> >>>> >>> Jan- >>>> >>> I haven't seen any objections to your setup. I'd say we should plan >>>> >>> to include it in Kaldi at some point (e.g. within the next few >>>> >>> months), but in the meantime hopefully you can continue to work on >>>> it, >>>> >>> and maybe come up with some other examples of how it's useful to do >>>> >>> the interfacing with Python- e.g. some kind of application-level or >>>> >>> service-level thing? >>>> >>> Dan >>>> >>> >>>> >>> >>>> >>> On Sat, Dec 13, 2014 at 4:01 PM, Yajie Miao <yaj...@gm...> >>>> wrote: >>>> >>> > Hi Jan, >>>> >>> > This is very nice work! In our PDNN toolkit, we also have simple >>>> python >>>> >>> > wrappers to read and write Kaldi features, mainly for DNN >>>> training. Your >>>> >>> > implementation looks like a more comprehensive version. >>>> >>> > >>>> >>> > Do you have the functions/commands to do feature splicing? I ask >>>> this >>>> >>> > because we found doing splicing on the fly with Python highly >>>> expensive. >>>> >>> > That's why we still stick to PFiles instead of Kaldi features >>>> (.scp .ark) >>>> >>> > for DNN training. I am very interested to know the efficiency >>>> of your >>>> >>> > splicing implementation. >>>> >>> > >>>> >>> > Thanks, >>>> >>> > Yajie >>>> >>> > >>>> >>> > On Sat, Dec 13, 2014 at 5:59 PM, Daniel Povey <dp...@gm...> >>>> wrote: >>>> >>> >> >>>> >>> >> OK, thanks. >>>> >>> >> cc'ing Yajie in case he wants to comment.
>>>> >>> >> Dan >>>> >>> >> >>>> >>> >> >>>> >>> >> On Sat, Dec 13, 2014 at 2:31 PM, Jan Chorowski < >>>> jan...@gm...> >>>> >>> >> wrote: >>>> >>> >> > Hi All, >>>> >>> >> > >>>> >>> >> > the wrapper is built during Kaldi compilation. I build it >>>> using the provided >>>> >>> >> > Makefile. The build depends on: >>>> >>> >> > 1. Python and numpy (by default it queries the Python >>>> interpreter found >>>> >>> >> > on >>>> >>> >> > the path for header file location) >>>> >>> >> > 2. Boost with the Boost::Python library. It is quite heavy to >>>> build, but >>>> >>> >> > most >>>> >>> >> > Linux distributions ship it. Boost::Python doesn't require any >>>> code >>>> >>> >> > generation steps; the wrapper is defined in a normal C++ source >>>> file. >>>> >>> >> > >>>> >>> >> > During the build, the Python and Boost libraries and Kaldi object files >>>> are >>>> >>> >> > linked >>>> >>> >> > into a CPython extension module, >>>> kaldi/src/python/kaldi_io_internal.so. >>>> >>> >> > It >>>> >>> >> > works with both static and shared Kaldi builds. Further usage >>>> requires >>>> >>> >> > that >>>> >>> >> > Python finds kaldi_io.py and kaldi_io_internal.so on the >>>> PYTHONPATH - the directory >>>> >>> >> > can, >>>> >>> >> > for example, be added to the PYTHONPATH variable in the path.sh >>>> script of >>>> >>> >> > a >>>> >>> >> > recipe. >>>> >>> >> > >>>> >>> >> > Jan >>>> >>> >> > >>>> >>> >> > >>>> >>> >> > On 12/13/2014 3:33 PM, Daniel Povey wrote: >>>> >>> >> >> >>>> >>> >> >> Also, Jan- could you send us an email explaining how this >>>> works- >>>> >>> >> >> How does Python "see" the C++ headers? Do you have to >>>> invoke some >>>> >>> >> >> special program, like swig? Do you have to write some >>>> special kind of >>>> >>> >> >> header that shows how the C++ objects are to be interpreted >>>> by python? >>>> >>> >> >> A brief example would be helpful, if so. >>>> >>> >> >> How is the resulting program linked, if at all?
If you >>>> require >>>> >>> >> >> functions C++ libraries, are these obtained from the .a or >>>> .so files >>>> >>> >> >> at runtime, or compiled into some kind of executable-like >>>> blob at >>>> >>> >> >> compile time? Does your framework require that Kaldi be >>>> compiled >>>> >>> >> >> using dynamic (.so) libraries? >>>> >>> >> >> >>>> >>> >> >> Dan >>>> >>> >> >> >>>> >>> >> >> >>>> >>> >> >> On Sat, Dec 13, 2014 at 12:04 PM, Jan Chorowski >>>> >>> >> >> <jan...@gm...> >>>> >>> >> >> wrote: >>>> >>> >> >>> >>>> >>> >> >>> Hello Dan, >>>> >>> >> >>> >>>> >>> >> >>> thank you for the comments. I tried to make it in the Kaldi >>>> spirit, >>>> >>> >> >>> consistency is important. Of course, the scripts can be >>>> removed and >>>> >>> >> >>> replaced >>>> >>> >> >>> with some more useful examples. I don't have too much >>>> experience with >>>> >>> >> >>> bridging Python to C++, so any critique on the wrappers and >>>> the >>>> >>> >> >>> approach >>>> >>> >> >>> taken is welcome. >>>> >>> >> >>> >>>> >>> >> >>> Jan >>>> >>> >> >>> >>>> >>> >> >>> >>>> >>> >> >>> On 12/13/2014 2:55 PM, Daniel Povey wrote: >>>> >>> >> >>>> >>>> >>> >> >>>> Hi all. >>>> >>> >> >>>> From a first look, it does look very impressive, and >>>> nicely >>>> >>> >> >>>> documented. >>>> >>> >> >>>> I would appreciate it if people on the list who have Python >>>> >>> >> >>>> experience >>>> >>> >> >>>> would comment on this- you can either reply to this thread, >>>> or to me. >>>> >>> >> >>>> I don't know if this has been done in the "natural" way, or >>>> if there >>>> >>> >> >>>> is some reason why people in the future will say, "why did >>>> you do it >>>> >>> >> >>>> this way, you should have done XXX". >>>> >>> >> >>>> >>>> >>> >> >>>> Jan: >>>> >>> >> >>>> in the scripts/ directory you seem to have some examples of >>>> how you >>>> >>> >> >>>> can create python programs that behave very much like Kaldi >>>> >>> >> >>>> command-line programs, using your framework. 
This is very >>>> useful. >>>> >>> >> >>>> However, the programs >>>> >>> >> >>>> apply-global-cmvn.py >>>> >>> >> >>>> compute-global-cmvn-stats.py >>>> >>> >> >>>> are perhaps a little confusing because they provide the same >>>> >>> >> >>>> functionality that you could get with "compute-cmvn-stats -> >>>> >>> >> >>>> matrix-sum" and "apply-cmvn" on the output of that command; >>>> and they >>>> >>> >> >>>> do so using different formats for the CMVN information. I >>>> know the >>>> >>> >> >>>> format of storing the CMVN stats in a two-row matrix is >>>> perhaps not >>>> >>> >> >>>> perfectly ideal, but it's a standard within Kaldi and it >>>> would be >>>> >>> >> >>>> confusing to deviate from that standard. >>>> >>> >> >>>> Of course, this is a very minor issue that doesn't affect >>>> the >>>> >>> >> >>>> validity >>>> >>> >> >>>> of the framework as a whole. I am just pointing this out; >>>> the main >>>> >>> >> >>>> discussion should be about the framework and whether people >>>> feel it's >>>> >>> >> >>>> the "right" way to do this. >>>> >>> >> >>>> >>>> >>> >> >>>> Dan >>>> >>> >> >>>> >>>> >>> >> >>>> On Sat, Dec 13, 2014 at 6:28 AM, Jan Chorowski >>>> >>> >> >>>> <jan...@gm...> >>>> >>> >> >>>> wrote: >>>> >>> >> >>>>> >>>> >>> >> >>>>> Hi all! >>>> >>> >> >>>>> >>>> >>> >> >>>>> I've written wrappers to access Kaldi data files from >>>> within Python >>>> >>> >> >>>>> using boost::python (the code is on github >>>> >>> >> >>>>> >>>> https://github.com/janchorowski/kaldi-git/tree/python/src/python). >>>> >>> >> >>>>> If >>>> >>> >> >>>>> you think this would be an interesting addition please >>>> instruct me >>>> >>> >> >>>>> how >>>> >>> >> >>>>> to contribute. 
>>>> >>> >> >>>>> >>>> >>> >> >>>>> Best Regards, >>>> >>> >> >>>>> Jan Chorowski >>>> > >>>> > -- >>>> > Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, >>>> ond...@gm... |