The following features are being requested:
1) A preprocessing stage is required, in which various functions can be applied to the data used for training models.
This window should provide the following options:
*A function to load a set (list) of filenames from a file; these files will be used for training/testing purposes.
*An option to process the list of files given above and check them against the given pronunciation dictionaries. Two subsets of filenames should then be created: a set that is usable with the current selection of dictionaries, and a set that contains out-of-dictionary words (problematic files).
*A check should also be performed to see whether the audio and transcription filenames match.
*An option to edit transcription files according to a regular expression. The edited files will later be copied to the model directory.
*All these options should be captured as a preprocessing configuration, so that one can load it again when doing a similar experiment.
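The dictionary check, the audio/transcription match and the regular-expression edit listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, transcriptions are assumed to be plain whitespace-separated words, and the `.wav`/`.lab` extensions in the usage note are just examples, not something the tool prescribes.

```python
import re
from pathlib import Path


def split_by_dictionary(transcriptions, dictionary_words):
    """Partition files into (usable, problematic) lists.

    transcriptions: dict mapping filename -> transcription text
                    (assumed: whitespace-separated words).
    dictionary_words: set of words present in the selected dictionaries.
    """
    usable, problematic = [], []
    for name, text in transcriptions.items():
        if all(word in dictionary_words for word in text.split()):
            usable.append(name)
        else:
            # At least one out-of-dictionary word was found.
            problematic.append(name)
    return usable, problematic


def unmatched_files(audio_names, transcription_names):
    """Compare files by base name (extension removed).

    Returns (audio without transcription, transcription without audio),
    each as a sorted list of base names.
    """
    audio = {Path(name).stem for name in audio_names}
    trans = {Path(name).stem for name in transcription_names}
    return sorted(audio - trans), sorted(trans - audio)


def edit_transcription(text, pattern, replacement):
    """Apply one regular-expression substitution to a transcription."""
    return re.sub(pattern, replacement, text)
```

For example, `unmatched_files(["a.wav", "c.wav"], ["a.lab", "b.lab"])` reports `c` as audio without a transcription and `b` as a transcription without audio.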
2) A way to load a particular set of HTK configuration parameters is also required.
3) It should also be possible to load these configuration files when the tool is used via the command line.
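A minimal sketch of how items 2) and 3) might fit together is given below. It assumes the HTK configuration file uses the usual `NAME = VALUE` lines with `#` starting a comment; the option names `--htk-config` and `--preprocess-config` and both function names are hypothetical and only serve the illustration.

```python
import argparse


def parse_htk_config(text):
    """Parse 'NAME = VALUE' lines into a dict of strings.

    Assumption: '#' starts a comment and values do not contain '#'.
    Lines without '=' are ignored.
    """
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def build_arg_parser():
    """Command-line options for loading saved configurations (hypothetical names)."""
    parser = argparse.ArgumentParser(description="Load saved configurations")
    parser.add_argument("--htk-config", help="path to an HTK configuration file")
    parser.add_argument("--preprocess-config", help="path to a saved preprocessing configuration")
    return parser
```

The parsed arguments would then give the paths to read, for instance `build_arg_parser().parse_args(["--htk-config", "x.cfg"]).htk_config`.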
Logged In: YES
user_id=1840987
Originator: NO
All done except for the check for matching audio and transcription files.