From: Arnab G. <ar...@gm...> - 2013-04-19 12:30:17
OK, I have used -1 to mean "take till end of file". Instead of a bool to accept larger segments at the end, there is now a float (time in secs) up to which an overshooting segment will be accepted.

-Arnab

On Thu, Apr 18, 2013 at 7:38 PM, Nagendra Kumar Goel <nag...@go...> wrote:
> Arnab,
> I prefer not to use soxi as it's overkill sometimes. Sometimes the data
> may not even be in wav format (sure, I will convert before using feature
> extraction but that's a different pipe).
> How about if we make the syntax requirements more strict, like requiring the
> value to be exactly -1? The only issue will be that it's loaded as float,
> but we could take the difference and require that to be very small. This
> will help you catch bugs in your scripts early on while keeping me safe.
>
> I recall earlier there was some data that had incorrect segmentation (like
> end time was rounded off), causing scripts to unnecessarily fail for some
> segments. However that data has been cleaned up.
>
> Nagendra
>
> -----Original Message-----
> From: Arnab Ghoshal [mailto:ar...@gm...]
> Sent: Thursday, April 18, 2013 2:03 PM
> To: Nagendra Kumar Goel
> Cc: Daniel Povey; kal...@li...
> Subject: Re: [Kaldi-developers] extract-segments
>
> The reason I don't like the special value is that there is a check to reject
> segments that are too small. This is a command line option and is visible to
> the user. The special value (in the current code it's really an interval) is
> hidden and one can only know about it by reading the code. But the hidden
> option has a higher priority than the visible option. So while it is
> reasonable for a user to expect any segments with invalid start and end
> times (i.e. start >= end) to be rejected, sometimes the whole file may
> actually get included instead.
> This is, in fact, how we found the problem: a scripting bug caused some end
> times to be 0, which went undetected till some process way down the line
> died due to a very big segment that shouldn't have been there.
>
> There is also an option to accept invalid end times (false by default) and I
> am not sure what the reason is for having that functionality.
>
> The way I would have solved your particular problem is to get the start
> (which will be 0) and end times for the single utterance files, while
> keeping the segments format unchanged. You could use soxi to get the end
> time.
>
> Let me know if this works for you.
>
> -Arnab
>
> On Thu, Apr 18, 2013 at 6:35 PM, Nagendra Kumar Goel
> <nag...@go...> wrote:
>> I have been using this to mix in data that is segmented with data that
>> is in sentence-by-sentence files. I didn't care if it's 0 or -1.
>>
>> Is there a specific reason you don't like it? It solves a real problem
>> for me.
>>
>> From: Daniel Povey [mailto:dp...@gm...]
>> Sent: Thursday, April 18, 2013 1:32 PM
>> To: Arnab Ghoshal; Nagendra Kumar Goel
>> Cc: kal...@li...
>> Subject: Re: [Kaldi-developers] extract-segments
>>
>> I think Nagendra may have been using this, he should chime in.
>> Dan
>>
>> On Thu, Apr 18, 2013 at 1:30 PM, Arnab Ghoshal <ar...@gm...> wrote:
>>
>> Hi all,
>>
>> we just noticed that there is an (unmentioned) assumption in
>> extract-segments.cc that an end time of (0, -1] in the segments file
>> means "include till the end of the file". But there are additional
>> logical bugs that cause an end time of 0 to have the same effect. I
>> do not like having this special value of the end time and plan to
>> remove it. But is there anybody who has a good reason to keep such a
>> functionality?
>>
>> -Arnab
>>
>> _______________________________________________
>> Kaldi-developers mailing list
>> Kal...@li...
>> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
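
The end-time handling agreed on in this thread can be sketched roughly as follows. This is a minimal Python illustration, not the code in extract-segments.cc: the function name `resolve_segment`, its parameter names, and every default value are assumptions made for the example. It shows the three ideas from the discussion: an end time must match -1 (within a tiny tolerance, since it is read as a float) to mean "till end of file", an end time past the end of the file is accepted only within a float overshoot limit, and anything else with start >= end (including an end time of 0) is rejected rather than silently expanded to the whole file.

```python
END_OF_FILE = -1.0  # sentinel end time meaning "take till end of file"


def resolve_segment(start, end, file_duration,
                    max_overshoot=0.5,        # assumed default, in seconds
                    min_segment_length=0.1,   # assumed default, in seconds
                    sentinel_tol=1e-6):       # assumed float tolerance
    """Return the (start, end) times in seconds to extract, or None to reject.

    All names and defaults here are hypothetical; they only illustrate
    the behaviour described in the thread.
    """
    # The end time is parsed as a float, so compare against the sentinel
    # by requiring the difference to be very small instead of using ==.
    if abs(end - END_OF_FILE) <= sentinel_tol:
        end = file_duration

    # Accept an end time that overshoots the file only up to max_overshoot,
    # and clamp it to the file duration.
    if end > file_duration:
        if end - file_duration > max_overshoot:
            return None
        end = file_duration

    # Invalid times are always rejected. This covers end == 0 produced by
    # a scripting bug, as well as other negative end times such as -0.5.
    if start < 0 or end <= start:
        return None

    # The user-visible minimum length check applies to every segment.
    if end - start < min_segment_length:
        return None

    return (start, end)
```

With these assumed defaults, `resolve_segment(0.0, -1.0, 10.0)` covers the whole file, while `resolve_segment(2.0, 0.0, 10.0)` is rejected instead of being expanded to the end of the file.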