Character API Tutorial

This tutorial provides a step-by-step introduction to the capabilities of the 'animate' and 'lipsync' APIs. It provides specific techniques and code snippets as it builds up to a Reference Implementation based on the Character API and Amazon Polly.

With the Character API, your app can use image strips that are downloaded from the cloud, and then looped and seamlessly switched using common JavaScript and CSS techniques. Each strip is fully specified by the parameters of its URL. Strips with new combinations of parameters are created as needed using cloud-computing resources, and then cached for rapid delivery to your application, be it mobile or desktop.

Image strips are easy to understand and work with. Later in the tutorial you will see how most Character API applications actually use texture maps, also known as sprite sheets. A texture map consists of many smaller images packed densely together into a larger image, along with instructions on how to compose the smaller images to form a given frame.

As you go through this tutorial, it will be helpful to try out the examples. To do so you will need your own API Key, from the AWS Marketplace. You do not need any other product to use the Character API. The API usage itself is metered at $0.007 per call, with unlimited caching.

The main endpoint of the Character API is 'animate'. This is a GET endpoint, and is completely stateless. The only required parameters are an API key, 'character', 'version' and 'format'.

You can use it to create a still image:

<img src="http://mediasemantics.com/animate?key=12345678&character=CarlaHead&version=1.1&format=png"/>

Or you can use it to create a vertical strip of frames:

<img src="http://mediasemantics.com/animate?key=12345678&character=CarlaHead&version=1.1&format=png&action=blink"/>
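Because each strip is fully specified by its URL, it can be convenient to build these URLs from a parameter object instead of by hand. The following is a minimal sketch: the helper name buildAnimateUrl is hypothetical and not part of the Character API, and the key is the same placeholder used throughout this tutorial.

```javascript
// Hypothetical helper: builds an 'animate' URL from a plain object.
// Every name and value is passed through encodeURIComponent so that
// characters such as '<', '/' and '&' cannot break the query string.
function buildAnimateUrl(params) {
    var base = "http://mediasemantics.com/animate";
    var parts = [];
    for (var name in params) {
        if (Object.prototype.hasOwnProperty.call(params, name)) {
            parts.push(encodeURIComponent(name) + "=" + encodeURIComponent(params[name]));
        }
    }
    return base + "?" + parts.join("&");
}

// Usage: the still image and the blink strip differ only in 'action'.
var stillUrl = buildAnimateUrl({ key: "12345678", character: "CarlaHead", version: "1.1", format: "png" });
var blinkUrl = buildAnimateUrl({ key: "12345678", character: "CarlaHead", version: "1.1", format: "png", action: "blink" });
```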

You can use 'animate' to produce a background image for a div. The CSS 'height' and 'backgroundPosition' attributes make it easy to create a sliding window on the image strip so that you see only one frame at a time:

<div id="anim" style="background-image:url(http://mediasemantics.com/animate?key=12345678&character=CarlaHead&version=1.1&action=blink&format=png); width:250px; height:200px">
</div>

We can set the div's height to 200, which is the height of one frame within the strip, so only the first frame of the strip is showing. Let's write some code to show the rest:

<script>    
    var a = [0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
    var frame = 0;
    setInterval(function() {
        var div = document.getElementById("anim");
        div.style.backgroundPosition = "0px -" + a[frame] * 200 + "px";
        frame++;
        if (frame == a.length) frame = 0;
    }, 1000/12);    // 12 fps
</script>

The array 'a' has a list of frame indices within the strip. We set a timer to run at 12 frames per second. The variable 'frame' is an index into this array. On each tick of the timer, the expression 'a[frame]' tells us which image to display within the strip. We use the backgroundPosition property to slide the strip to the right position. The backgroundPosition property takes an 'x' value and a 'y' value, but the 'x' value is always 0. With a 'y' of 0, we display the first frame (frame 0) of the strip. With a 'y' of -200 we display the second frame, and so forth.

The end result is that you see the character periodically blink:

Note that the Reference Implementation uses the JavaScript requestAnimationFrame API instead of setInterval, for better animation performance. It also uses a simple technique to drop frames if needed, to maintain a consistent animation speed.
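One common way to get frame dropping with requestAnimationFrame is to derive the frame from elapsed time rather than counting ticks. The sketch below illustrates that idea only; it is not the Reference Implementation's code, and the names frameForElapsed and startLoop are hypothetical.

```javascript
// Pure helper: which entry of the frame array should be visible after
// elapsedMs milliseconds, looping forever at the given frame rate.
function frameForElapsed(elapsedMs, fps, frameCount) {
    var ticks = Math.floor(elapsedMs * fps / 1000);
    return ticks % frameCount;
}

// Browser-only driver. If a callback arrives late, the computed frame
// simply skips ahead, so the animation keeps its speed.
function startLoop(div, a, frameHeight, fps) {
    var start = null;
    function step(now) {
        if (start === null) start = now;
        var frame = frameForElapsed(now - start, fps, a.length);
        div.style.backgroundPosition = "0px -" + a[frame] * frameHeight + "px";
        requestAnimationFrame(step);
    }
    requestAnimationFrame(step);
}

// In a page, with the blink array from above:
// startLoop(document.getElementById("anim"), a, 200, 12);
```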

You will soon see that the actions you provide to a character can be quite a bit more complex than just blinking. In fact, the idea is that you tell the character what you want it to do at a high level, using actions such as Look, Point, and Say. Later in this tutorial we'll show how the Character API can create the frame array you saw in the code above, so that you don't have to.

But first, consider what happens when you switch the 'backgroundImage' attribute of the div. For example we could start our div with a still and then switch it to the blink strip. Notice how the first frame of the blink strip is identical to the still. Character API strips normally start and end in a neutral position precisely so that you can switch between them with no "jumps". Of course it does take time to download the next strip, so in general we always preload it, so that the visual switch can happen in a seamless manner, without any "blank" periods between strips.
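The preload-then-switch idea can be sketched as follows. The function name switchStrip and its optional createImage argument are assumptions made for illustration (the argument only exists so the logic can be exercised outside a browser).

```javascript
// Download the next strip in the background, and only swap the div's
// background once the image has fully loaded, so no blank frame shows.
function switchStrip(element, url, createImage) {
    var img = createImage ? createImage() : new Image();
    img.onload = function () {
        element.style.backgroundImage = "url(" + url + ")";
        element.style.backgroundPosition = "0px 0px";   // start at the first frame
    };
    img.src = url;   // starts the download
    return img;
}

// In a page:
// switchStrip(document.getElementById("anim"), blinkStripUrl);
```

Until the load completes, the div keeps showing whatever strip it already had.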

The ability to stitch together different image strips as needed is sometimes referred to as "non-linear" media, and is key to allowing characters to react to user events, and present personalized information, such as a stock quote, or tomorrow's weather.

The Character API is cloud-based, and easily scales to meet your traffic needs. You can think of it as a huge collection of character images and image strips that are always at your disposal. Really the images are created "just-in-time", to your requirements, and then cached.

While you pay a fraction of a penny for each call to the Character API, you are free to cache the results on your own server, by implementing a server-based animation cache. This way you only access the Character API when your very first customer views a particular animation, and thereafter the request comes directly from your server's cache. Not only is this allowed, but it is actually the preferred way to use the API for applications involving web services, chatbots, and text-to-speech. By moving the logic that calls the Character API to your own server, you are able to leverage other cloud-based services from different vendors.
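As a rough illustration of a server-based animation cache, the sketch below memoizes requests by URL so that the Character API is only called the first time a given URL is seen. It is an in-memory sketch under stated assumptions: createStripCache and fetchStrip are hypothetical names, and a real server would more likely persist results to disk or a CDN.

```javascript
// fetchStrip(url) is supplied by the caller and is assumed to return a
// promise for the strip (image bytes plus any animation data).
function createStripCache(fetchStrip) {
    var cache = new Map();
    return function getStrip(url) {
        if (!cache.has(url)) {
            var pending = Promise.resolve(fetchStrip(url));
            // Do not keep failed requests, so a later call can retry.
            pending.catch(function () { cache.delete(url); });
            cache.set(url, pending);
        }
        return cache.get(url);
    };
}
```

Storing the promise itself means that two customers who request the same new animation at the same moment still trigger only one upstream call.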

Character animation using image strips is a compromise between several different factors, and as such, it may not be the best solution in all cases, but it tends to perform well given today's distribution of bandwidth, client, and server-based compute power.

This tutorial will show you how to generalize the code shown above to create a simple and efficient client-side engine for loading and playing back animation. While the Character API is optimized for the delivery of non-linear animation over the web, you can also use it to generate sequences of images that can be stitched together into an mp4 video, using a widely used open-source tool called 'ffmpeg'. While this is not covered in this tutorial, most of the topics discussed here still apply.



Design Choices

The Character API includes a wealth of characters in several different styles, from cartoons to realistic figures.

<img src="http://mediasemantics.com/animate?key=12345678&character=CSFelixFoxFront&version=1.0&format=png">
<img src="http://mediasemantics.com/animate?key=12345678&character=TomHead&version=1.2&format=png">
<img src="http://mediasemantics.com/animate?key=12345678&character=SusanHead&version=3.0&format=png">

Many of them, including Susan, come in different styles, each with a different camera angle, zoom factor, etc. For example Susan also comes in Bust and Body styles.

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanBust&version=3.0&format=png">
<img src="http://mediasemantics.com/animate?key=12345678&character=SusanBody&version=3.0&format=png">

Everyone wants their character to look unique, but the reality is that there is a high cost to developing a custom character from first principles. Thankfully, the Character API includes built-in character customization. Many of the stock characters can be reclothed and recolored to create a wide range of effective characters.

For example, let's say you like Susan, but you want to lose the jacket.

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanBody&version=3.0&over=none">

That's nice, but maybe her white shirt disappears into the background on your website. Maybe you want to pick up a color to match a corporate logo:

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanBody&version=3.0&over=none&topcolor=008080">

Some character styles have packages of "addons" that represent different clothing or hair pieces, to give you an even wider range of appearances. Maybe on the weekend you make her show up totally casual.

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanBody&version=3.0&addons=ClothesPack1,ClothesPack2&addonversions=3.0,3.0&foot=sandals1&bottom=jeans1&top=blouse1&over=none&format=png">

Maybe your application calls for a medical professional, a policeman, or a soldier. Simply dial up the right clothing from several Career addons.

You may be wondering why a version always needs to be specified for each character and addon. To improve edge caching, the cache expiration time on the resulting images is effectively infinite. New versions of a character or an addon are released from time to time, but the old ones are never removed. This allows you to deploy updated characters, which may include updated appearances or behavior, on your own schedule, by incrementing the version number in your URL.

Each character supports a certain set of actions. In general, all characters can perform a range of actions related to the presentation of information - speaking, gesturing, emoting.

Actions that target specific body parts can often be combined. For example here is the "lookleft" action, loaded into a div similar to how we did Carla's blink earlier in this tutorial:

<div id="anim1" style="background-image:url(http://mediasemantics.com/animate?key=12345678&character=SusanBody&version=3.0&action=lookleft&format=png); width:500px; height:400px">
</div>

And here is the "gestureleft" action:

<div id="anim1" style="background-image:url(http://mediasemantics.com/animate?key=12345678&character=SusanBody&version=3.0&action=gestureleft&format=png); width:500px; height:400px">
</div>

Here are both actions together:

<div id="anim1" style="background-image:url(http://mediasemantics.com/animate?key=12345678&character=SusanBody&version=3.0&action=<lookleft/><gestureleft/>&format=png); width:500px; height:400px">
</div>

If you have a single action, you can specify it as a single word, as we have seen so far. But in general, actions and sequences of actions are represented as XML tags. So

<par><lookleft/><gestureleft/></par>

means look left and gesture left, combining the two in parallel. Later we will see how the <say/> action can include text and actions aligned to this text.
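When the action is XML rather than a single word, it contains characters such as '<', '/' and '>' that should be escaped before being placed in a URL. A small sketch, using hypothetical helper names (tag, par, actionParam):

```javascript
// Build a self-closing action tag, e.g. "<lookleft/>".
function tag(name) {
    return "<" + name + "/>";
}

// Wrap several actions in a <par> element.
function par(names) {
    return "<par>" + names.map(tag).join("") + "</par>";
}

// Produce a URL-safe 'action' parameter.
function actionParam(xml) {
    return "&action=" + encodeURIComponent(xml);
}

var xml = par(["lookleft", "gestureleft"]);
// xml is "<par><lookleft/><gestureleft/></par>"
var param = actionParam(xml);
```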

New characters, styles, and addons are constantly being added to the catalog, often by request. Media Semantics, the makers of the Character API, are also able to add custom characters and addons that are tied to a specific API key.

For a detailed catalog of characters and their actions, please see the characters page.



Bandwidth Considerations

The image strip approach to character animation is sensitive to bandwidth. There are several choices that you can make that directly affect the size of the image strips, namely Compression Type, Frame Size, and Number of Frames.

Compression Type

Image strips are coded using either the PNG or the JPEG format, as determined by the 'format' parameter on the Animate URL. Which you choose depends on a couple of factors. PNG animation includes transparency, while JPEG does not. So if your application includes a character that appears over top of other content on your website, then you must use PNG. PNG tends to be very efficient for cartoon characters, because of the long runs of solid color. Being a lossless format, it will faithfully capture every pixel.

On the other hand, JPEG is especially good at coding realistic characters. The JPEG format comes with a quality parameter - if not specified the value 75 is used. While the bandwidth can be reduced by lowering the quality, the result will be more artifacts, particularly when you do have a solid run of color. Furthermore, if you are switching from one image strip to another with non-linear media, bear in mind that the switch from the last frame of one strip to the first frame of another strip may result in a slight difference in these artifacts, as the JPEG algorithm uses different tradeoffs for different image strips. These differences tend to be hard to notice in practice, in particular for higher compression quality factors.

A more significant consideration with JPEG is what to do with the background, since JPEG images are always fully opaque. By default the background on a jpeg strip will be white. You can provide a solid color with the 'backcolor' attribute.

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanHead&version=3.0&backcolor=808080&format=jpeg"/>

You can also do a simple vertical gradient:

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanHead&version=3.0&backgradient=vertical&backcolor1=0060CC&backcolor2=003366&format=jpeg"/>

Finally you can specify the URL for an image to use as a background using 'backimage'. The image can be either PNG or JPEG format, and should be the exact same size as the size of a single frame in your strip. You provide the url to the background as a parameter to the Animate URL. For example let's say you had the following background url:

<img src="http://www.mediasemantics.com/img/tutorial/onesandzeroes.jpeg"/>

Since this URL contains colons and slashes, we'll need to use the JavaScript encodeURIComponent function to pack it into our url.

<img id="anim1">

<script>
var img = document.getElementById("anim1");
img.src = "http://mediasemantics.com/animate?key=12345678&character=SusanHead&version=3.0&format=jpeg&backimage=" + 
          encodeURIComponent("http://www.mediasemantics.com/img/tutorial/onesandzeroes.jpeg");
</script>

By carefully selecting a background image, you can make your character blend in naturally with your site while still being able to take advantage of the better compression afforded by the jpeg format.

If you use an image, be sure to place it in a publicly accessible location. It will be consulted by the Character API each time it needs to generate a new strip.

IMPORTANT: If you want to change the background image, please upload your new file under a different name, and use that new name in the 'backimage' parameter. The Character API assumes that the image at a given URL is invariant, and can be cached indefinitely.

Frame Size

Clearly the size of each frame in your strip has a direct impact on the size of the strip in kilobytes. Unless otherwise specified, the frame width and height are a standard size that matches the style of the character. For example, headshots tend to be 250x200 pixels.

You can specify a different frame size using the 'width' and 'height' attributes.

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanHead&version=3.0&format=png&width=125&height=150">

As you can see, the character does not resize - all we are doing is cropping the image. With this tighter cropping you will normally need to adjust the position of the character to center it correctly. You can do this with 'charx' and 'chary', the horizontal and vertical offset to be applied to the character. These are values in pixels, with positive y moving the character down within the frame.

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanHead&version=3.0&format=png&width=125&height=150&charx=-60&chary=-30">

Cropping down a character using width, height, charx, and chary is a great way to achieve bandwidth savings, but it is also an important consideration in determining how much real-estate the character takes on your page. The default size of a full body character is 500x400, which is very wide. This can be cropped substantially, but bear in mind that some actions, such as pointing, run the risk of being cut off. If you crop a body character to just the head, then you will want to avoid a pointing action altogether, as it won't be seen.

One thing to bear in mind is that the entire image strip can be scaled up or down. Modern browsers are very good at scaling images, however the results are only as good as the pixel density you start out with. You can take a 250x200 pixel image and scale it down to half size and it will still look great. But there is little point in doing this, since you are downloading more information than you need. Likewise, you can scale the cropped image above to double its effective size with no increase in bandwidth, but the results may look a little blurry.

<img src="http://mediasemantics.com/animate?key=12345678&character=SusanHead&version=3.0&format=png&width=125&height=150&charx=-60&chary=-30" style="width:250px">

For this reason we normally recommend that you not scale the image strip, but use other techniques to achieve the desired size and bandwidth tradeoffs.

Some characters, notably cartoon characters, allow you to use the 'charscale' attribute to specify a scale. Consider the Charlie character. The default frame size is 307x397.

<img src="http://mediasemantics.com/animate?key=12345678&character=Charlie&version=1.0&format=png"/>

Think of the 'charscale' factor as scaling up the entire character, default frame size and all. So the frame size for charscale=1.5 is 460x595.

<img src="http://mediasemantics.com/animate?key=12345678&character=Charlie&version=1.0&format=png&charscale=1.5"/>

You can then further crop this down to the head with a smaller frame size, as we saw before. Characters that support a 'charscale' are based on vector art, and are re-rendered so as to provide you with the maximum amount of detail.
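The frame-size arithmetic for 'charscale' can be sketched as below. The truncation of fractional pixels is an assumption inferred from the single Charlie example above (307x397 at 1.5 giving 460x595); the actual rounding rule used by the Character API may differ for other values, and scaledFrameSize is a hypothetical helper.

```javascript
// Estimate the default frame size after applying a charscale factor.
// Assumption: fractional pixels are dropped (Math.floor).
function scaledFrameSize(width, height, charscale) {
    return {
        width: Math.floor(width * charscale),
        height: Math.floor(height * charscale)
    };
}

var size = scaledFrameSize(307, 397, 1.5);
// size.width is 460 and size.height is 595, matching the example above
```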

Number of Frames

Clearly the size of the image strip is directly related to the number of frames in the strip. Recall how we used an array to encode the actual sequence of image strip indices in our blink animation:

<script>
    var a = [0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
</script>

At the default rate of 12 frames per second, this allowed us to achieve 15 frames, or about 1 1/4 seconds worth of animation out of a strip with 3 frames. We furthermore looped the sequence to get an infinite length of animation.
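The relationship between the frame array, the number of images that must be downloaded, and the playing time can be checked with a few lines. The helper name stripStats is hypothetical.

```javascript
// Summarize a frame-index array: how many distinct images the strip
// needs, how many frames are played, and how long one pass lasts.
function stripStats(a, fps) {
    return {
        images: new Set(a).size,
        frames: a.length,
        seconds: a.length / fps
    };
}

var stats = stripStats([0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 12);
// stats.images is 3, stats.frames is 15, stats.seconds is 1.25
```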

Consider a strip of a character talking. In a cartoon, it is acceptable for a character's head to move very little, or not at all, as the mouth forms different words. Since there are only about 9 different positions (visemes), the resulting image strip might have as few as 9 images, even if the character speaks for several minutes. In practice it will be more, since the character will need to blink occasionally. On the other hand, when we have a realistic character talking, we expect to see her head move frequently while speaking. So the complexity of the action, including the amount of automatic action you allow, has a direct impact on the total height of the image strip.

Clearly you wouldn't want to put an hour of speech into a single strip - the trick is to break it down into separate image strips that are then stitched together. This lets you overlap the playback of one image strip with the creation of the next. It also lets you play image strips at random, or based on the user's actions. If you succeed, your user will get the impression that the character has a "life of its own". But to achieve this we need a better playback engine.



Improving the Playback Engine

Let's take advantage of everything we've learned to create a more general playback engine for use with the Character API.

Start with a div to contain our animation:

<div id="anim"></div>

Now let's add a link that will cause the character to perform an action:

<a href="javascript:execute('<blink/>')">blink</a>

Now some code to set up some global variables and initialize the div:

<script>

init();

var animateBase;
var savedURL;
var data;
var frame;
var timerId;
var state;

function init() {
    animateBase = "http://mediasemantics.com/animate?character=CarlaHead&version=1.1&format=png";
    state = "";
    var element = document.getElementById("anim");
    element.style.width = "250px";
    element.style.height = "200px";
    element.style.backgroundImage = "url(" + animateBase + ")";
}

It's convenient to have an 'animateBase' variable that contains all the parameters that are invariant for your application. The meaning of the other variables will become clear in due course.

As a side effect of producing the image strip, the Character API produces some animation data that can be retrieved from the response header "x-msi-animationdata". In the case of action=blink, it produces:

{"images":3, "imageHeight":200, "initialState":"", "frames":[[0],[1],[2],[1],[0]], "finalState":"", "fps":12}

The first thing we need to do is to call the Character API and obtain the image data and the animation data.

function execute(action) {
    savedURL = animateBase + "&action=" + encodeURIComponent(action) + "&state=" + encodeURIComponent(state);
    var xhr = new XMLHttpRequest();
    xhr.open("GET", savedURL, true);
    xhr.addEventListener("load", function() {
        data = JSON.parse(xhr.getResponseHeader("x-msi-animationdata"));
        var preload = new Image;
        preload.onload = function() {animate();};
        preload.src = savedURL;
    }, false);
    xhr.send();
}
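Before playing a strip, it can be worth checking that the animation data parsed from the header is internally consistent. The following sketch assumes image-strip mode, where the first argument of each frame is an index into the strip; checkAnimationData is a hypothetical helper, not something the Character API provides.

```javascript
// Returns a list of human-readable problems; an empty list means the
// data looks usable by the playback code in this tutorial.
function checkAnimationData(data) {
    var problems = [];
    if (!Array.isArray(data.frames)) {
        problems.push("missing frames array");
        return problems;
    }
    data.frames.forEach(function (entry, i) {
        var image = entry[0];
        if (!(Number.isInteger(image) && image >= 0 && image < data.images)) {
            problems.push("frame " + i + " refers to image " + image);
        }
    });
    if (!(data.fps > 0)) {
        problems.push("fps must be positive");
    }
    return problems;
}

var sample = JSON.parse('{"images":3, "imageHeight":200, "initialState":"", "frames":[[0],[1],[2],[1],[0]], "finalState":"", "fps":12}');
var problems = checkAnimationData(sample);   // empty for the blink data
```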

Here is the beginning of the animate() code.

function animate() {
    var element = document.getElementById("anim");
    element.style.backgroundImage = "url(" + savedURL + ")";
    element.style.backgroundPosition = "0px 0px";

Next we set the backgroundImage of the actual div to the image we just preloaded. Because of local caching, only the initial XMLHttpRequest would have incurred an actual Character API call, and then only if the image strip was not already cached in the browser cache from an earlier request.

Finishing off animate(), we set the global 'frame' to be 0 and start an interval timer.

    frame = 0;
    timerId = setInterval(function() {animateTick()}, 1000 / data.fps);
    state = data.finalState;
}   

Here is the basic outline for the animateTick() function, which gets called at 12 frames per second.

function animateTick() {
    // exit cases
    if (frame >= data.frames.length)
    {
        clearInterval(timerId);
        animateComplete();
        return;
    }
    // first arg is the image frame to show
    var element = document.getElementById("anim");
    element.style.backgroundPosition = "0px -" + data.frames[frame][0] * data.imageHeight + "px";
    frame++;
}

function animateComplete() {
}

The first thing we do in animateTick() is check if the frame index is still in bounds. If it falls off the end, we clear the interval timer and call animateComplete(). Assuming we haven't completed, we set the backgroundPosition of the div to "scroll" the image strip through the window one frame at a time, indirecting through data.frames[frame] to get the actual image index in the image strip to play. You will recognize this as the same code that we ran in our first blink animation.
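The empty animateComplete() is a natural hook for stitching strips together. One possible arrangement, sketched here with hypothetical names (createActionQueue, enqueue, onComplete), is a small queue that starts the next action whenever the previous one finishes.

```javascript
// 'run' is the function that starts an action, e.g. execute().
function createActionQueue(run) {
    var pending = [];
    var busy = false;
    function startNext() {
        if (pending.length === 0) {
            busy = false;
            return;
        }
        busy = true;
        run(pending.shift());
    }
    return {
        // Add an action; it starts immediately if nothing is playing.
        enqueue: function (action) {
            pending.push(action);
            if (!busy) startNext();
        },
        // Call this from animateComplete().
        onComplete: startNext
    };
}

// In a page:
// var queue = createActionQueue(execute);
// function animateComplete() { queue.onComplete(); }
// queue.enqueue("<blink/>");
```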

Go ahead and try clicking the "blink" link now:



Character State

So far we have dealt only with actions that begin and return to a very special "default" state represented by an empty string. But consider what happens when we run:

<lookleft/>

The data for this action looks like this (note the finalState value):

{"images":7, "imageHeight":200, "initialState":"", "frames":[[0],[1],[2],[3],[4],[5],[6]], "finalState":"front,handsbyside,lookleft,mouthnormal,eyesnormal", "fps":12}

In the code shown in the previous section, we had a global variable 'state' that kept track of the state during the handoff from one strip to another. The variable gets passed into the request as the Character API parameter 'state', allowing the API to determine the starting position for the action. This starting position is repeated in the animation data as 'initialState', along with the state 'finalState' that results from performing the action, where it is then harvested and put back into the global 'state' variable for use in setting up the next action.
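The state handoff can be isolated into a small object, which makes it easy to see how the final state of one strip becomes the starting state of the next. This is a sketch only; createStateTracker and its methods are hypothetical names, and the URL layout follows the execute() function shown earlier.

```javascript
function createStateTracker(animateBase) {
    var state = "";   // "" is the default state
    return {
        // URL for the next action, starting from the current state.
        urlFor: function (action) {
            return animateBase +
                "&action=" + encodeURIComponent(action) +
                "&state=" + encodeURIComponent(state);
        },
        // Call with the parsed animation data once a strip has loaded.
        update: function (animationData) {
            state = animationData.finalState;
        }
    };
}

var tracker = createStateTracker("http://mediasemantics.com/animate?character=CarlaHead&version=1.1&format=png");
var first = tracker.urlFor("<lookleft/>");
tracker.update({ finalState: "front,handsbyside,lookleft,mouthnormal,eyesnormal" });
var second = tracker.urlFor("<blink/>");   // now starts from the looking-left state
```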

Let's put this together with a simple system of 3 actions.

Now see what happens when you press "blink" if your character is the default state or the looking-left state.

Your application may be fine with always returning to the default state between strips, however this simple mechanism gives you a whole new dimension of control to work with. For example you might implement an "idle controller" that calls execute() periodically with 'blink', and with 'lookleft', 'lookupleft', etc. based on where the user clicks on the page.

You may want to stick with the simple assumption that all strips begin and end with the default state (""). As a convenience, you can add a 'return=true' parameter in your Character API call. This will ensure that, regardless of the final state of the character, it will always return to the default state.



Smooth Interruption

In some cases you will want a user's input to interrupt one action and start another. So let's add an abort() function that will stop the current action.

function abort() {
    // not very smooth!
    clearInterval(timerId);
    init();
}

This would cause the character to instantly "pop" back to its initial state - not very smooth. To do better, we can take advantage of "recover=true". With this option, the Character API generates some additional frames at the end of the frame array that are used to smoothly interrupt an action.

We'll use another global 'stopping'.

var stopping = false;

Our new, improved, abort() function just does this:

function abort() {
    stopping = true;
}

We'll add our 'recover' option:

    animateBase = "http://mediasemantics.com/animate?character=CarlaHead&version=1.1&format=png&recover=true";

Let's set up another action to illustrate.

<a href="javascript:execute('<lookleft/><pause/><lookuser/>')">lookleft, pause, lookuser</a>

The frame data returned is as follows:

[[0],[1],[2],[3],[4],[5],[6,25],[6,25],[6,25],[6,25],[6,25],[6,25],[6,25],[6,25],[6,25],
[6,25],[6,25],[6,25],[6],[5],[4],[3],[2],[1],[0,-1],[6],[5],[4],[3],[2],[1],[0,32],[0,-1]]

When a frame has a second argument, its meaning is "jump to this frame if you are stopping". The special value -1 in the second argument indicates that this is the end of the animation, and it is needed because there are now multiple places where the animation can stop.

We'll modify our animateTick() as follows:

function animateTick() {
    // exit cases
    if (frame == -1 || frame >= data.frames.length)
    {
        clearInterval(timerId);
        animateComplete();
        return;
    }
    // first arg is the image frame to show
    var element = document.getElementById("anim");
    element.style.backgroundPosition = "0px -" + data.frames[frame][0] * data.imageHeight + "px";
    // second arg is -1 if this is the last frame to show, or a recovery frame to go to if stopping early
    if (data.frames[frame][1] == -1)
        frame = data.frames[frame][1];
    else if (stopping && data.frames[frame][1])
        frame = data.frames[frame][1];
    else
        frame++;
}

To see this in action, first press the first link and watch the animation proceed uninterrupted. Then try it again, but immediately press abort().

Not every frame will have recovery information - generally if the character is mid-action, it will wait until the action is complete to recover. You can think of abort() as being a request to "hurry up" the animation. Note that the "recover=true" option generally does not increase the size of the image strips generated - it just adds a little more information that allows our client-side code to smoothly recover from an interruption.
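The jump rules are easier to follow when they are separated from the DOM and run against a small frame array. The sketch below mirrors the logic of the modified animateTick(); the function names and the shortened frame data are made up for illustration and are not actual Character API output.

```javascript
// Decide which frame entry to visit next, or -1 when the animation ends.
// A missing (or 0) second argument is treated as "no recovery frame",
// matching the truthiness check used in animateTick().
function nextFrameIndex(frames, frame, stopping) {
    var jump = frames[frame][1];
    if (jump === -1) return -1;
    if (stopping && jump) return jump;
    return frame + 1;
}

// Return the image indices that would be shown if abort() were called
// at tick number stopAtTick (use Infinity for "never").
function simulate(frames, stopAtTick) {
    var shown = [];
    var frame = 0;
    var tick = 0;
    while (frame !== -1 && frame < frames.length) {
        shown.push(frames[frame][0]);
        frame = nextFrameIndex(frames, frame, tick >= stopAtTick);
        tick++;
    }
    return shown;
}

// Hypothetical data: entries 0-5 are the normal path, 6-7 the recovery path.
var demo = [[0], [1], [2, 6], [2, 6], [1], [0, -1], [1], [0, -1]];
var uninterrupted = simulate(demo, Infinity);   // [0, 1, 2, 2, 1, 0]
var interrupted = simulate(demo, 0);            // [0, 1, 2, 1, 0]
```

In the interrupted run the held pose is cut short, but the character still passes through the intermediate image on its way back to neutral.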



External Commands

There are times when you will want to synchronize events in your application with key points in your animation. You can use the <cmd/> action to trigger an event that you can handle in your code. You can also specify parameters that are passed to your function as an object.

Consider the following action XML:

<lookleft/><cmd target="1"/><lookuser/>

The attribute can be anything you like, and you can have several of them. The data generated for this command looks like this:

[[0],[1],[2],[3],[4],[5],[6,0,{"target":1}],[5],[4],[3],[2],[1],[0]]

To handle the third frame argument we need one more line in our animateTick() function:

    // third arg is an extensible side-effect object that is triggered when a given frame is reached
    if (data.frames[frame][2])
        embeddedCommand(data.frames[frame][2]);

Now we can implement a simple embeddedCommand() function:

function embeddedCommand(data) {
    console.log(JSON.stringify(data));
}

Let's give this a try. A console log output should appear the instant Carla's head reaches the looking-left state, if you have your Developer window's Console tab open on this page.
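It can also be handy to know ahead of time which frames carry commands, for example to log them while debugging. A minimal sketch, with a hypothetical helper name (commandsIn), run against the frame data shown above:

```javascript
// List every frame entry that has a third argument, together with the
// parameter object that would be passed to embeddedCommand().
function commandsIn(frames) {
    var found = [];
    frames.forEach(function (entry, i) {
        if (entry[2]) {
            found.push({ frame: i, params: entry[2] });
        }
    });
    return found;
}

var frames = [[0],[1],[2],[3],[4],[5],[6,0,{"target":1}],[5],[4],[3],[2],[1],[0]];
var commands = commandsIn(frames);
// one command, on frame entry 6, with params {target: 1}
```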



Playing Audio

To a first approximation, you add audio to an animation by playing an mp3 file with the same length as the animation.

In order to play audio on a mobile device, you need to take care to load the audio file in direct response to a user event, such as a touch. However it is perfectly okay to pause the audio on the "canplaythrough" event, load some animation, and then restart the audio. Let's illustrate:

Start with a new tag somewhere on your site:

<audio id="audio"></audio>

Now create a new function playAudio():

function playAudio(action, file) {
    var audio = document.getElementById("audio");
    audio.oncanplaythrough = function() {
        audio.pause();
        // Now load the animation
        execute(action);
    }
    audio.src = file;
    audio.play();
}

We can then add this to the end of the animate() function developed before:

function animate() {
    // Start the animation...
    
    // Start the audio, if any
    var audio = document.getElementById("audio");
    audio.play();
}

Now let's invoke playAudio() with the following action:

<say>The quick brown fox.</say>

and an audio file that you can listen to here:

Here is the result so far.

As you can see, there is one fatal problem with this arrangement - the lips don't move. Indeed, how can they when the Character API has no knowledge of the audio file that is playing?



The LipSync API

To make the character's mouth move, we first need to analyze the audio file for lip sync information. If you are using pre-recorded audio files, then you can use the LipSync API through this sample. Simply upload your audio file and you will get back a block of phoneme data. When you do this for our sample audio file, we get something like this:

z0kjh+/u+mS7vhPB4YZT2DsBdtd+WE1Gix5UvFFDhBdTIGYRciap0UW+XTHR9Nlk9JaKSxbFKRrLPmKQc50hDkbPh
JQD0Ql4pxmTp49Arujy+6y0dUD1rIk8BR2thEBkj9DZEi+Ba/lp6UmAvo69YcIUUK+CXPFcRZ4ucV+DdDo=

You can think of this as the "lipsync signature" of the audio file. It represents all the data that is needed to produce a lip-synced animation. Now let's add a 'lipsync' parameter to playAudio(), and pass this information through to execute():

function playAudio(action, file, lipsync)
{
    ...
    execute(action, lipsync);
    ...
}

and finally onto the Animate URL inside execute():

function execute(action, lipsync) {
    ...
    savedURL = animateBase + "&action=" + encodeURIComponent(action) + "&state=" + encodeURIComponent(state) + "&lipsync=" + encodeURIComponent(lipsync);
    ...
}
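The encodeURIComponent call matters here because the sample lipsync data contains '+', '/' and '=' characters. A raw '+' in a query string is commonly decoded as a space, which would corrupt the data. A short sketch (lipsyncParam is a hypothetical helper name):

```javascript
// Produce a URL-safe 'lipsync' parameter from the lipsync data.
function lipsyncParam(signature) {
    return "&lipsync=" + encodeURIComponent(signature);
}

// Using the first few characters of the sample data above:
var encoded = lipsyncParam("z0kjh+/u+mS7vhPB");
// encoded is "&lipsync=z0kjh%2B%2Fu%2BmS7vhPB"
```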

Now let's try again, with the lipsync data we obtained above.

Now we see that the lips move appropriately.

For completeness, we will want to modify abort() to immediately stop the audio:

function abort() {
    stopping = true;
    var audio = document.getElementById("audio");
    audio.pause();
}

Note that this section discussed the use of a simpler, but slightly older HTML5 audio API. The Reference Implementation still uses HTML5 audio as a fallback on Internet Explorer, but uses a newer standard called WebAudio for modern browsers.



Using Textures

One limitation of image strips is that you frequently have only one portion of an image that is animating. In our blink strip at the beginning of this tutorial, only a rectangular portion around the eyes changes at any given time. A more efficient way to encode animation is to use texture maps.

If you specify &webgl=true on your animate request, you will see that the result is a rectangular image that packs together many pieces of character imagery of different sizes. For example, the result of running the action

<say><lookleft/>test</say>

looks like this:

Moreover, the control data contains a new section called "recipes", which represents the instructions for creating a given frame. Instead of the first argument of each frame being an index into the image strip, it is now an index into the recipes array:

{..., "frames":[[0],[1],[2],[3],...], ..., "recipes":[[[0,0,0,0,320,240],[92,135,190,240,237,106],[117,41,0,240,190,200]], ..., 
                                                      [[0,0,0,0,320,240],[117,41,0,240,190,200],[173,163,190,389,77,43]]}

A given frame is created by layering several images from the texture file - typically the background, the base of the body, a head, some eyes, a mouth. The bandwidth savings result from reusing images: small mouth or eye images can be pasted on top of the same head image to create several different frames. A recipe is simply an array of layers. Each layer is represented by an array of the form:

[target-x, target-y, source-x, source-y, width, height]
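
To make the layer format easier to read, it can help to give the six numbers names. The following is a small illustrative sketch; describeLayer is a hypothetical helper, and the sample layer is the second layer of the first recipe shown above:

```javascript
// Hypothetical helper: names the six numbers that make up a recipe layer.
function describeLayer(layer) {
    return {
        targetX: layer[0],  // where the piece goes in the finished frame
        targetY: layer[1],
        sourceX: layer[2],  // where the piece sits in the texture map
        sourceY: layer[3],
        width: layer[4],    // size of the piece (same in source and target)
        height: layer[5]
    };
}

var sampleLayer = describeLayer([92, 135, 190, 240, 237, 106]);
// sampleLayer.sourceX is 190 and sampleLayer.width is 237
```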

One way to do this composition is to create an HTML5 Canvas the full size of your character, and use the drawImage JavaScript API to copy each layer from the source coordinates in the texture map to the target coordinates in the canvas. To do this, you might use code such as the following:

var canvas = document.getElementById("myCanvas");
var ctx = canvas.getContext("2d");
ctx.clearRect(0, 0, canvas.width, canvas.height);
var frameRecord = data.frames[frame];
var recipe = data.recipes[frameRecord[0]];
for (var i = 0; i < recipe.length; i++) {
    ctx.drawImage(texture,
        recipe[i][2], recipe[i][3],
        recipe[i][4], recipe[i][5],
        recipe[i][0], recipe[i][1],
        recipe[i][4], recipe[i][5]);
}

If you are developing a game, then you may already use a more sophisticated approach to animation such as WebGL. Modern browsers use GPUs (Graphics Processing Units) to process 2D images efficiently. In particular, the entire texture image is loaded directly into fast GPU memory, and changing frames becomes essentially a matter of changing a few pointers into the texture map.

While you can use WebGL libraries, such as PixiJS, with the Character API, they are not needed to achieve good performance in most applications.



Using Secondary Textures

We talked about how a practical application will consist of several short episodes of character animation that are looped and switched as needed. Typically each such episode consists of a sentence of text accompanied by one or more actions. The Reference Implementation uses a further technique to improve the reuse of imagery between episodes.

The idea is to load multiple textures instead of just one. All but the last one are special "named" textures that are known to be highly reusable, and hence more likely to be cached. To obtain a named texture, you use the 'texture' parameter:

http://mediasemantics.com/animate?key=12345678&character=CarlaHead&version=1.1&format=png&webgl=true&texture=LookLeft

Here is the named texture "LookLeft":

Consider what happens when we repeat the command from the previous section with the 'with' parameter:

http://mediasemantics.com/animate?key=12345678&character=CarlaHead&action=test&with=Front,LookLeft

The 'with' parameter tells the Character API that you already have the Front and LookLeft named textures, and that this imagery should not be included in the resulting image. Here is the result:

Notice how the use of secondary textures allows us to subtract these common images from the result, leaving just the mouths in this case. The resulting animation data still contains complete instructions for assembling the final image. The recipe layer has an additional element that is an index into an array of named textures. This last member is present only for layers that are elided from the image.

[target-x, target-y, source-x, source-y, width, height, named-texture-index]

Assuming that the named textures have been loaded into an array 'namedTextures', the new animation code to account for this might look like this:

for (var i = 0; i < recipe.length; i++) {
    var src;  // image to copy this layer from
    var namedTextureIndex = recipe[i][6];  // present only for elided layers
    if (typeof namedTextureIndex === "number")
        src = namedTextures[namedTextureIndex];
    else
        src = texture;  // regular texture
    ctx.drawImage(src,
        recipe[i][2], recipe[i][3],
        recipe[i][4], recipe[i][5],
        recipe[i][0], recipe[i][1],
        recipe[i][4], recipe[i][5]);
}

The Reference Implementation leverages secondary textures to optimize the overall bandwidth required.
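
As a rough illustration of how the two kinds of request fit together, the sketch below builds the URL for each named texture and the URL for an episode that relies on them. Both function names are hypothetical, and the sketch assumes that 'base' already carries the key, character, version, format and webgl parameters. It also assumes that the order of the names in the 'with' list matches the order of your 'namedTextures' array, which this tutorial does not confirm:

```javascript
// Hypothetical helper: URL for one named texture, using the 'texture' parameter.
function namedTextureURL(base, name) {
    return base + "&texture=" + encodeURIComponent(name);
}

// Hypothetical helper: URL for an episode that omits imagery already held
// in the listed named textures, using the 'with' parameter.
function episodeURL(base, action, names) {
    var url = base + "&action=" + encodeURIComponent(action);
    if (names.length > 0) {
        // Comma-separated list, as in with=Front,LookLeft
        var encoded = names.map(function (name) {
            return encodeURIComponent(name);
        });
        url += "&with=" + encoded.join(",");
    }
    return url;
}

// Usage with the sample key from this tutorial
var base = "http://mediasemantics.com/animate?key=12345678&character=CarlaHead&version=1.1&format=png&webgl=true";
var names = ["Front", "LookLeft"];
var namedTextureURLs = names.map(function (name) {
    return namedTextureURL(base, name);
});
var episode = episodeURL(base, "<say><lookleft/>test</say>", names);
```

Each URL in namedTextureURLs would be loaded once and reused across episodes, while the episode URL changes with every new action.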



Next Steps

This tutorial has introduced you to a new way of thinking about character animation.

Throughout this tutorial we have called the Character API 'animate' endpoint directly from client code. When you are ready, please proceed to the next tutorial to learn how to create a Caching Server and access Text-to-Speech.






Copyright © 2020 Media Semantics, Inc. All rights reserved.