EQWatcher Evolution: Text-to-Speech


Changing voice characteristics is actually pretty simple, but can also get pretty ugly.

By default, the Text-to-Speech engine processes XML tags (don't get scared now) if the first character of the string is a "<".  XML tags are placed within the text to be spoken, and change voice characteristics (and a few other things) at that exact point in the speech.  Please note that they ONLY modify the CURRENT line of speech, so the next line you speak will be normal again.

First, an explanation of what a tag is and how to use one.  A tag can be used in 2 ways:

#1 To change how all of the following text in the line is spoken, use a single tag that looks like this: <TAG XXXX/> (the slash at the end is what makes it apply to everything that follows).

#2 To change how a specific range of text is spoken, use 2 tags, the first being <TAG XXXX> and the second being </TAG> (all of the text between the 2 tags is affected by the change).

***NOTE: Either way, the tags ONLY affect the CURRENT text to speak. Any new speech calls will be normal again.

Examples:


<volume level="50">This text is affected</volume> and this text is not.

<voice required="Name=Microsoft Mary">This is spoken by Mary, if she exists on this computer,</voice> and this is spoken by the default voice.

<volume level="50"/>This entire line is affected and spoken at volume level 50%.

<voice required="Name=Microsoft Mary"/>This entire line is spoken by Mary, if she exists on this computer.


I will explain the use of 7 different options: volume, rate, pitch, emphasis, spell it out, silence, and voice.

Volume


Volume is set using the following format "<volume level="[Level]">", where Level is a number between 0 and 100. 
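
As a small illustration (the sentence itself is made up), a ranged volume tag that lowers only part of a line might look like this:

```xml
<volume level="25">This part is spoken quietly,</volume> and this part is at the normal volume.
```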


Rate


There are two ways to set the rate of speech, absolute and relative.  A relative rate is ADDED to the current rate of speech: if the current rate is 5 and you say 5, it will be 10; if the current rate is 5 and you say -5, it will be 0.  An absolute rate sets the rate to the exact number specified.  Possible values for the rate are -10 to 10.  A value outside of this range is still legal to use, but it is truncated to -10 or 10, whichever end it falls beyond.  -10 is slowest, and 10 is fastest.

The format for setting the absolute rate is "<rate absspeed="[Rate]">", and the format for setting the relative rate is "<rate speed="[Rate]">".
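
A sketch of both forms, with made-up sentences, where each line is a separate line of speech.  The third line nests a relative tag inside an absolute one, and assumes the addition works as described above, giving 5 + 5 = 10 for the inner text:

```xml
<rate absspeed="-5"/>This entire line is spoken slowly, at rate -5.
<rate speed="3">This part is 3 faster than the current rate,</rate> and this part is not.
<rate absspeed="5">This is spoken at rate 5, <rate speed="5">and this at rate 10.</rate></rate>
```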


Pitch


Setting the pitch is quite similar to setting the rate.  There are still two ways, absolute and relative.  The values are still -10 to 10, etc.  -10 is lowest, and 10 is highest.

The format for setting the absolute pitch is "<pitch absmiddle="[Pitch]">", and the format for setting the relative pitch is "<pitch middle="[Pitch]">".
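
For illustration (the sentences are made up), an absolute pitch applied to a whole line and a relative pitch applied to a range might look like this:

```xml
<pitch absmiddle="-10"/>This entire line is spoken at the lowest pitch.
<pitch middle="5">This part is raised by 5 from the current pitch,</pitch> and this part is not.
```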


Emphasis


The "emph" tag instructs the voice to emphasize a specific word or section of text.  This tag CANNOT be "empty" which means it must have both a begin tag and end tag like <emph>text</emph>, rather than having the single <emph/>.


Spell it Out


The "spell" tag instructs the voice to spell out a specific word or section of text, rather than read it as a word or phrase.  This tag CANNOT be "empty" which means it must have both a begin tag and end tag like <spell>text</spell>, rather than having the single <spell/>.
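
A made-up example follows.  Since the spell tag sits in the middle of the sentence, the line starts with a <SAPI> tag so that it still begins with a "<" (see "Applying these tags in EQWatcher"):

```xml
<SAPI>The zone name is spelled <spell>Kunark</spell>.
```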


Silence


This tag is used to insert a pause for a specified number of milliseconds.  This tag MUST be "empty", which means it cannot have a begin tag and end tag, just one single tag that looks like this: <silence msec="[Milliseconds]"/>
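
Two made-up examples: the first pauses for one second before speaking, and the second puts half-second pauses between words (it starts with <SAPI> so that the line still begins with a "<"):

```xml
<silence msec="1000"/>This is spoken after a one second pause.
<SAPI>Pull in three<silence msec="500"/>two<silence msec="500"/>one.
```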


Voice


This tag is used to select a voice to use, and is more complicated than the other tags.  The preferred voice is searched for in the system's list of voices by specified attributes, which can be any or all of: Age, Gender, Language, Name, Vendor, and VendorPreferred.

The voice tag has two attributes, required and optional.  The voice that ends up being selected will have ALL of the "required" attributes, and more of the "optional" attributes than the other installed voices.  If several voices match an equal number of optional attributes, one of those will be selected at random (I'm not positive it will actually be "random" rather than the same voice each time, but that's what it says in Microsoft's documentation).  If no voice is found that matches all of the required attributes, no voice change will occur.

The format of the voice tag should look something like this: <voice required="[Attributes]"><voice optional="[Attributes]">, or <voice required="[Attributes]" optional="[Attributes]">.

Attributes: This includes any number of the Age, Gender, etc. options, separated by semicolons.  Each attribute takes the form "[Attribute]=[Value]" if you wish the attribute's value to be equal to this value, or "[Attribute]!=[Value]" if you wish the attribute's value to NOT be equal to this value.  Several attributes together then look like "[Attribute]=[Value];[Attribute]=[Value]".
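
As a sketch of combining an "=" attribute with a "!=" attribute (the sentence is made up, and which voice is chosen depends on what is installed):

```xml
<voice required="Gender=Female;Age!=Child"/>This entire line is spoken by a female voice that is not a child voice, if one is installed.
```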

Possible values of Age are Child, Teen, Adult.  Possible values of Gender are Female, Male.  Possible values of Language are 409 for English, and I have no idea about the rest.  Possible values of Name are exactly how they are shown in the Speech control panel (LH Michael, LH Michelle, Microsoft Mary, Microsoft Mike, Microsoft Sam, SampleTTSVoice).  The only value of Vendor I have seen is Microsoft, and I have not seen any values of VendorPreferred.

Here's an example or two:

<voice required="Gender=Female" optional="Age=Teen"> This selects a female voice, preferably teen (although if there is no teen female voice, a female voice will be selected at random).

<voice required="Age=Teen" optional="Gender=Male"> This selects a teen-age voice, preferably male (although if there is no male teen voice, a teen voice will be selected at random).

<voice required="Name=LH Michelle"> This selects the "LH Michelle" voice (if it doesn't exist, there is no voice change).


Applying these tags in EQWatcher


For general EQWatcher commands using Text-to-Speech, all you really need to know is that the sentence must start with a "<".  This causes the TTS interpreter to look for XML tags.  Therefore, the first part of the sentence must be an XML tag.  If you need to use a tag in the middle rather than at the beginning, you can simply use <SAPI> as the first tag, with no ill effect.

Examples:

speak <SAPI>This is an example showing <emph>emphasis</emph>

speak <rate absspeed="10"/>This gets spoken at top speed

For EQWatcher scripts, you must remember that a quotation mark within a string must be escaped with a backslash (\"), so the compiler does not think it is the end of the string.

Examples:

SpeakSync("<SAPI>This is an example showing <emph>emphasis</emph>");

SpeakSync("<rate absspeed=\"10\"/>This gets spoken at top speed");