Hello,
I get WARNING messages in the bw.log file during LDA training, like:
utt> 0 de11-068 588 0WARNING: "corpus.c", line 1956: LSN utt id,
AdrianTovar-20080727-bxq/de11-068, does not match ctl utt id, de11-068.
My transcription file looks like:
der adressbereich darf lediglich 30 prozent nutzen(AdrianTovar-20080727-bxq/de11-068)
ein bestimmter vorrat an übertragungskapazität kann zugeordnet werden(AdrianTovar-20080727-bxq/de11-069)
eine beschränkung tritt erst bei besonders intensiver nutzung auf(AdrianTovar-20080727-bxq/de11-070)
and my fileids file looks like:
AdrianTovar-20080727-bxq/de11-068
AdrianTovar-20080727-bxq/de11-069
AdrianTovar-20080727-bxq/de11-070
All files are UTF-8 encoded, the OS is Linux/Ubuntu, and I am using SVN trunk.
I don't understand what the problem may be. Can anybody help me?
Thanks in advance,
Yuri
I would ignore this warning.
The -fullsuffixmatch option of bw controls this behavior. You can set it
in the scripts:
-fullsuffixmatch no
Maybe we need to enable it by default. In the future I would like to work
without fileids at all, requiring the file name to appear only in the
transcription. A patch to do that would be welcome.
Hello Nikolay,
Thank you, but it seems the behaviour did not change when I set
-fullsuffixmatch => "no",
This is the Perl script:
$ST::CFG_FEAT_WINDOW ||= 0;
my $return_value = RunTool
    ('bw', $logfile, $ctl_counter,
     -moddeffn => $moddeffn,
     -ts2cbfn => $statepdeffn,
     -mixwfn => $mixwfn,
     -mwfloor => $mwfloor,
     -tpfloor => $tpfloor,
     -tmatfn => $tmatfn,
     -meanfn => $meanfn,
     -varfn => $varfn,
     -ltsoov => $ST::CFG_LTSOOV,
     -dictfn => $ST::CFG_DICTIONARY,
     -fdictfn => $ST::CFG_FILLERDICT,
     -ctlfn => $listoffiles,
     -part => $part,
     -npart => $npart,
     -cepdir => $ST::CFG_FEATFILES_DIR,
     -cepext => $ST::CFG_FEATFILE_EXTENSION,
     -lsnfn => $transcriptfile,
     -accumdir => $output_buffer_dir,
     -varfloor => $minvar,
     -topn => $topn,
     -abeam => 1e-90,
     -bbeam => 1e-10,
     -agc => $ST::CFG_AGC,
     -cmn => $ST::CFG_CMN,
     -varnorm => $ST::CFG_VARNORM,
     -meanreest => "yes",
     -varreest => "yes",
     '-2passvar' => $var2pass,
     -fullvar => $fullvar,
     -diagfull => $fullvar,
     -feat => $ST::CFG_FEATURE,
     -ceplen => $ST::CFG_VECTOR_LENGTH,
     -cepwin => $ST::CFG_FEAT_WINDOW,
     -fullsuffixmatch => "no",
     -timing => "no");
Log file:
utt> 0 de11-068 588 0WARNING: "corpus.c", line 1956: LSN utt id,
AdrianTovar-20080727-bxq/de11-068, does not match ctl utt id, de11-068.
It must be "yes". "no" is the default.
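To illustrate why the setting matters, here is a minimal Python sketch of the two comparison modes as they appear from the warning text. It is not the code in corpus.c. The helper names transcript_id and ids_match are hypothetical, and the exact matching rule (last path component only by default, any trailing part of the path with "yes") is an assumption inferred from the log message above.

```python
import re

# Hypothetical helpers: they mirror the behaviour suggested by the warning
# text, not the actual implementation in corpus.c.
_TRANS_ID = re.compile(r"\(([^()\s]+)\)\s*$")


def transcript_id(line):
    """Return the utterance id in the trailing parentheses of a transcript line."""
    m = _TRANS_ID.search(line)
    return m.group(1) if m else None


def ids_match(ctl_entry, lsn_id, full_suffix_match):
    """Compare one fileids entry with one transcript id.

    Assumption: with full_suffix_match=False only the last path component of
    the fileids entry is compared; with True the transcript id may be any
    trailing part of the path, cut at a '/' boundary.
    """
    if not full_suffix_match:
        return ctl_entry.rsplit("/", 1)[-1] == lsn_id
    return ctl_entry == lsn_id or ctl_entry.endswith("/" + lsn_id)


ctl = "AdrianTovar-20080727-bxq/de11-068"
line = ("der adressbereich darf lediglich 30 prozent nutzen"
        "(AdrianTovar-20080727-bxq/de11-068)")
lsn = transcript_id(line)

# Default mode: "de11-068" is compared with the full transcript id.
print(ids_match(ctl, lsn, full_suffix_match=False))  # False, as in the warning
# Full suffix mode: the whole path is compared.
print(ids_match(ctl, lsn, full_suffix_match=True))   # True
```

Under these assumptions, a transcript that carries the full path in parentheses only lines up with the fileids entries when -fullsuffixmatch is set to "yes", which is consistent with the warning persisting when it was set to "no".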
Ok, thanks