KB113: Speech Tags

What Are Speech Tags?

You can use Speech Tags wherever you can specify text to be spoken using Text to Speech (TTS). Speech Tags can change the quality of the voice itself or the pronunciation of a word. Some tags, such as [silence 1s], are used alone, but most come as an opening and closing pair that wraps a word or phrase, such as [digits]123[/digits].

You can use the Script panel's Play button to listen to your tagged text and make sure it sounds right. If you hear the voice speak the text of a tag, the tag was not recognized as a tag; check its spelling against the reference below. Be sure that each opening tag has a corresponding closing tag. If tags are not properly matched, you may not hear the voice at all.
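Before pressing Play, it can help to check tag pairing mechanically. The following Python sketch is one way to do that; it is not part of the product. It assumes that [silence] is the only tag used alone, that tag names are lowercase letters, and that paired tags nest like brackets. The function name find_unmatched_tags is hypothetical.

```python
import re

# Assumption: only [silence ...] is used alone; every other tag is paired.
STANDALONE_TAGS = {"silence"}

# Matches [name], [name argument], and [/name].
TAG_PATTERN = re.compile(r"\[(/?)([a-z]+)(?:\s+[^\]]*)?\]")


def find_unmatched_tags(text):
    """Return a list of pairing problems found in the tagged text."""
    problems = []
    open_tags = []  # stack of tag names still waiting for a closing tag
    for match in TAG_PATTERN.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if name in STANDALONE_TAGS and not is_closing:
            continue  # standalone tags need no closing tag
        if not is_closing:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            problems.append(f"unexpected closing tag [/{name}]")
    problems.extend(f"missing closing tag for [{name}]" for name in open_tags)
    return problems


print(find_unmatched_tags("Call [digits]123[/digits]. [silence 1s] Bye."))
print(find_unmatched_tags("Call [digits]123."))
```

An empty list means every opening tag found its closing tag. The check does not verify spelling, so a misspelled tag that is opened and closed consistently still passes.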

Speech Tag Reference

Here is a list of available tags.

[alt]	Tags a word so that it is spoken in an alternate manner, often associated with an alternate sense of the word. Modern voices use context to determine pronunciation, so this tag is rarely needed.
[conversational]	With certain voices, such as Matthew and Joanna, this tag alters the intonation to sound more relaxed and low-key, as in a natural conversation. See also [news].
[digits]	Reads out a number as digits.
[english]	When using a non-English voice, tags a word or phrase that should be spoken using English pronunciation.
[french]	When using an English voice, tags a word or phrase that should be spoken using French pronunciation.
[german]	When using an English voice, tags a word or phrase that should be spoken using German pronunciation.
[ipa]	Pronounces the enclosed word using IPA phonetic spelling.
[italian]	When using an English voice, tags a word or phrase that should be spoken using Italian pronunciation.
[news]	With certain voices, such as Matthew and Joanna, this tag alters the intonation to sound more like a newscaster. See also [conversational].
[past]	Tags a word as being a past-tense verb. Modern voices use context to determine pronunciation, so this tag is rarely needed.
[silence]	Specifies a pause in the speech. You can use a value in seconds (s) or milliseconds (ms).
[pinyin]	For Mandarin Chinese, pronounces the enclosed word using Pinyin phonetic spelling.
[pitch]	Sets the pitch for a word or phrase. As an argument, you can use the word default, x-low, low, medium, high, x-high, or a relative value in %. This feature is not supported on all voices.
[rate]	Sets the rate for a word or phrase. As an argument, you can use the word default, x-slow, slow, medium, fast, x-fast, or a value between 20% and 200%.
[sampa]	Pronounces the enclosed word using X-SAMPA phonetic spelling.
[spell]	Spells a word instead of speaking it normally. This can also be used with the letter 'a', which might otherwise sound like the word "a" in "a dog".
[spanish]	When using an English voice, tags a word or phrase that should be spoken using Spanish pronunciation.
[spoken]	Use this tag to wrap text as it should be spoken. This text will not appear in a user-facing transcript. Almost always used next to [written].
[verb]	Tags a word as being a verb, to alter the pronunciation. Modern voices use context to determine pronunciation, so this tag is rarely needed.
[volume]	Sets the volume for a word or phrase. As an argument, you can use the word default, silent, x-soft, soft, medium, loud, x-loud, or a relative value in decibels such as +ndB or -ndB.
[written]	Use this tag to wrap text as it should be written. This text may appear in a user-facing transcript, but is never spoken. Almost always used next to [spoken].
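
If you generate tagged text from a program, small helpers can keep the tags consistent. The Python sketch below is an illustration only. It assumes that a tag argument follows the tag name after a space, as in [silence 1s]; the reference above does not spell out the argument syntax for [rate], so confirm it with the Play button. The helper names wrap, rate, and written_and_spoken are hypothetical.

```python
# Keywords listed for [rate] in the reference above.
RATE_WORDS = {"default", "x-slow", "slow", "medium", "fast", "x-fast"}


def wrap(tag, text, argument=None):
    """Wrap text in an opening and closing speech tag."""
    # Assumption: the argument follows the tag name after a space.
    opening = f"[{tag} {argument}]" if argument else f"[{tag}]"
    return f"{opening}{text}[/{tag}]"


def rate(text, value):
    """Wrap text in a [rate] tag after checking the argument."""
    if value not in RATE_WORDS:
        percent = int(value.rstrip("%"))  # raises ValueError if not a number
        if not 20 <= percent <= 200:
            raise ValueError(f"rate must be between 20% and 200%, got {value}")
    return wrap("rate", text, value)


def written_and_spoken(written, spoken):
    """Pair a transcript form with the form that is actually spoken."""
    return wrap("written", written) + wrap("spoken", spoken)


print(wrap("digits", "123"))
print(rate("Please listen carefully.", "slow"))
print(written_and_spoken("Dr.", "Doctor"))
```

Checking the [rate] argument in code catches out-of-range percentages before the text reaches the voice.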