BML Speech Element
BML <speech> Element
The following documentation covers the limited implementation of the BML <speech> behavior found in SmartBody. SmartBody's speech behavior implements an out-of-date version of the BML specification, which accounts for its significant differences from current BML.
Voice Selection
Before performing speech behaviors, SmartBody characters need to be initialized with proper voice information. This is done with the set character .. voice command:
set char <char-id> voice <impl-id> <parameters>
where
- <char-id> is the identifier of the character whose voice is to be modified.
- <impl-id> is the identifier of the voice implementation.
- <parameters> includes all parameters used by the voice implementation for this voice.
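The command form above can be assembled programmatically. The following is a minimal sketch, not part of SmartBody: the helper name voice_command and the example character id and directory are hypothetical, and it assumes the command tokens are separated by single spaces as the syntax line suggests.

```python
def voice_command(char_id, impl_id, *parameters):
    """Format a 'set char <char-id> voice <impl-id> <parameters>' string."""
    parts = ["set", "char", char_id, "voice", impl_id, *parameters]
    # Assumption: tokens are whitespace separated, so an empty token or one
    # containing whitespace would shift the positional arguments.
    for part in parts:
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid command token: {part!r}")
    return " ".join(parts)


# Hypothetical character id and audio directory, for illustration only.
print(voice_command("doctor", "audiofile", "sounds/doctor"))
# prints: set char doctor voice audiofile sounds/doctor
```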
Remote Voices
SmartBody can use external modules to process the speech behavior, communicating via VHuman Speech Messages.
set char <char-id> voice remote <voice-id>
AudioFile Voices
AudioFile voices use pre-recorded audio on disk, paired with viseme and word-break timing data.
set char <char-id> voice audiofile <audio-dir>
SmartBody matches speech behaviors to audio using the ref=".." attribute. The value of ref is used to find the .wav and .bml files in the voice directory. The .wav file contains the audio, and the .bml file describes the timing of visemes and word breaks. The .bml file is created from the audio using the Word Breaker tool found in tools/word-breaker.
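The lookup described above can be sketched as follows. This is an illustration rather than SmartBody's actual lookup code: it assumes the two files are named <ref>.wav and <ref>.bml and sit directly in the voice directory, and the function names are hypothetical. A check like this can be useful for validating an audio directory before running a scene.

```python
from pathlib import Path
from typing import List, Tuple


def audiofile_resources(audio_dir: str, ref: str) -> Tuple[Path, Path]:
    """Return the (.wav, .bml) paths that a ref value is assumed to map to."""
    base = Path(audio_dir)
    # Assumption: files are named after the ref value, directly in audio_dir.
    return base / f"{ref}.wav", base / f"{ref}.bml"


def missing_resources(audio_dir: str, ref: str) -> List[Path]:
    """List the expected files that do not exist on disk."""
    return [p for p in audiofile_resources(audio_dir, ref) if not p.is_file()]


# Hypothetical directory; "yes_excited" is the ref used in the speech example.
wav, bml = audiofile_resources("sounds/doctor", "yes_excited")
print(wav.name, bml.name)
# prints: yes_excited.wav yes_excited.bml
```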
Speech Behavior
SmartBody recognizes speech in two forms: plain text and SSML. The two forms are distinguished by the type=".." attribute, which holds the appropriate MIME type: "text/plain" or "application/ssml+xml" respectively. If the attribute is missing, the format is assumed to be plain text.
Examples:
<act>
<bml>
<speech><!-- type="text/plain" by default -->
Hello from SmartBody!
</speech>
</bml>
</act>
<act>
<bml>
<speech type="application/ssml+xml">
<emphasis>Hello</emphasis><break/> from SmartBody!
</speech>
</bml>
</act>
Both examples speak the text "Hello from SmartBody!", but SSML may allow additional control, such as the emphasis and break in the second example.
When using an AudioFile voice, the speech behavior must include a ref=".." attribute to identify the audio and timing resources:
<act>
<bml>
<speech ref="yes_excited">
Yes!
</speech>
</bml>
</act>
Word Break Synchronization
Speech behaviors support non-standard sync-points via named markers at word breaks. The exact marker format depends on the format of the text: plain text uses <tm> tags with an id attribute, while SSML uses the standard <mark> tag with a name attribute.
<act>
<bml>
<speech type="text/plain">
<tm id="wb0"/>Hello<tm id="wb1"/>from<tm id="wb2"/>SmartBody!
</speech>
</bml>
</act>
<act>
<bml>
<speech type="application/ssml+xml">
<mark name="wb0"/>Hello<mark name="wb1"/>from<mark name="wb2"/>SmartBody!
</speech>
</bml>
</act>
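Writing the markers by hand is error-prone for longer utterances, so they can be generated. The sketch below is a hypothetical helper, not a SmartBody tool: it places one marker before each whitespace-separated word and reuses the wb0, wb1, ... names from the examples above, which are a naming choice rather than a documented requirement.

```python
from xml.sax.saxutils import escape

PLAIN_TYPE = "text/plain"
SSML_TYPE = "application/ssml+xml"


def speech_with_word_breaks(text, content_type=PLAIN_TYPE):
    """Build a <speech> element with a named marker before each word."""
    if content_type == PLAIN_TYPE:
        marker = '<tm id="wb{}"/>'       # plain text: <tm> with an id
    elif content_type == SSML_TYPE:
        marker = '<mark name="wb{}"/>'   # SSML: <mark> with a name
    else:
        raise ValueError(f"unsupported speech type: {content_type!r}")
    # Escape each word so characters such as & and < stay valid XML.
    body = "".join(
        marker.format(index) + escape(word)
        for index, word in enumerate(text.split())
    )
    return f'<speech type="{content_type}">{body}</speech>'


print(speech_with_word_breaks("Hello from SmartBody!"))
print(speech_with_word_breaks("Hello from SmartBody!", SSML_TYPE))
```

The returned element still needs to be wrapped in <act> and <bml> elements, as in the examples above, before it is sent to a character.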
