CMU Sphinx / Forums / Help: Which corpus used to train WSJ acoustic model

peterentropy - 2010-04-09

Hi, can you please advise which of the following WSJ corpora were used to
train the WSJ acoustic models available with Sphinx 4?

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S6A

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S6B

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S6C

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC94S13A

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC94S13B

http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC94S13C

Is there a document that gives these details?

Many thanks. Peter

Nickolay V. Shmyrev - 2010-04-09

http://cmusphinx.sourceforge.net/wiki/sphinx4:wsjtasks33optimization

The training data is the set of 321 speakers, from both the speaker
independent and speaker dependent portions in the training and development
test sets in the wsj0 and wsj1 database.

So both http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC93S6A
and http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC94S13A
were used. It's the default CMU WSJ setup.

peterentropy - 2010-04-11

Many thanks nshmyrev.

This is probably the wrong place to request this.

I would like to obtain a typical sample of the audio from the corpora used to
build your WSJ acoustic models.

I am writing from Europe and I would like to get a feel for the rate and type
of speech and accent to feed into my program.

This may be forbidden under the terms of agreement with the supplier; in that
case, someone speaking in the manner of the training set audio would suffice.

Any suggestions would be most helpful.

Peter

Nickolay V. Shmyrev - 2010-04-11

You can find samples in pocketsphinx/test/data/wsj

peterentropy - 2010-04-11

Fantastic.

Thank you

peterentropy - 2010-04-11

I refer to the page you directed me to:
http://cmusphinx.sourceforge.net/wiki/sphinx4:wsjtasks33optimization

Where is the parent of this page?

It is not apparent on http://cmusphinx.sourceforge.net/wiki/

There is a page http://cmusphinx.sourceforge.net/wiki/sphinx4:webhome and it refers to
"WSJTaskS3.3Optimization S3.3 Decoder optimization for WSJ" with the following
link, which is broken:
http://cmusphinx.sourceforge.net/wiki/sphinx4:wsjtasks3.3optimization_s3.3_decoder_optimization_for_wsj

Can you please advise which page contains the link to:
http://cmusphinx.sourceforge.net/wiki/sphinx4:wsjtasks33optimization

In addition and still regarding that page, there is discussion about the
variables: language weight (lw), relative beam width (beam), and new word beam
width (nwbeam). From the config.xml files I am familiar with:

<config>
    <property name="absoluteBeamWidth" value="500"/>
    <property name="relativeBeamWidth" value="1E-80"/>
    <property name="absoluteWordBeamWidth" value="20"/>
    <property name="relativeWordBeamWidth" value="1E-60"/>
    <property name="wordInsertionProbability" value="1E-16"/>
    <property name="languageWeight" value="7.0"/>
    <property name="silenceInsertionProbability" value=".1"/>
    <property name="frontend" value="epFrontEnd"/>
    <property name="recognizer" value="recognizer"/>
    <property name="showCreations" value="false"/>
</config>

But I am not familiar with new word beam width (nwbeam). Is this a Sphinx 3
setting that is not available under that name in Sphinx 4?

Many thanks

Nickolay V. Shmyrev - 2010-04-11

> Where is the parent of this page?

There is no parent.

> But not new word beam width (nwbeam). Is this a Sphinx 3 setting that is not
> available under that name in Sphinx 4?

Sphinx 3 beams are different from Sphinx 4 beams, so you can't use the same
values. If you are looking for a WSJ config for Sphinx 4, you can find it in
tests/performance/wsj5k or tests/performance/wsj20k.
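
For anyone who wants to compare the Sphinx 4 beam settings between two
configuration files, a minimal Python sketch along these lines may help. It
assumes the settings appear as top-level <property name="..." value="..."/>
elements inside <config>, as in the snippet quoted earlier in this thread. The
helper names global_properties and beam_settings are made up for illustration
and are not part of Sphinx.

```python
import xml.etree.ElementTree as ET

# Global properties as quoted in this thread, wrapped in a <config> root.
# In practice you would read the text of a real config file instead.
CONFIG_XML = """
<config>
    <property name="absoluteBeamWidth" value="500"/>
    <property name="relativeBeamWidth" value="1E-80"/>
    <property name="absoluteWordBeamWidth" value="20"/>
    <property name="relativeWordBeamWidth" value="1E-60"/>
    <property name="wordInsertionProbability" value="1E-16"/>
    <property name="languageWeight" value="7.0"/>
    <property name="silenceInsertionProbability" value=".1"/>
</config>
"""


def global_properties(xml_text):
    """Return the top-level <property> name/value pairs as a dict of strings."""
    root = ET.fromstring(xml_text)
    # findall("property") matches direct children of the root only, so
    # properties nested inside other elements are ignored.
    return {p.get("name"): p.get("value") for p in root.findall("property")}


def beam_settings(props):
    """Keep only the properties whose name mentions a beam width."""
    return {name: value for name, value in props.items() if "BeamWidth" in name}


if __name__ == "__main__":
    props = global_properties(CONFIG_XML)
    for name, value in sorted(beam_settings(props).items()):
        print(f"{name} = {value}")
```

Running the same two functions over the text of two different config files
and comparing the resulting dictionaries shows which beam values differ
between them. Values are kept as strings so that notations such as 1E-80 are
displayed exactly as written in the file.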
