Introducing the Character API

The Character API is a RESTful, cloud-based character animation service. Unlike the Character Builder, the Character API is specifically for JavaScript developers, and makes use of cloud computing, edge caching, and either CSS or WebGL animation to deliver a personalized avatar experience to any device, at scale.

Your app uses image strips that are downloaded from the cloud, and then looped and seamlessly switched using simple JavaScript and CSS code. Each strip is fully specified by the parameters of its URL. Strips with new combinations of parameters are created as needed using cloud-computing resources, and then edge-cached for rapid delivery to your application, be it mobile or desktop.

Game developers will recognize an image strip as a special kind of sprite sheet. The Character API can also generate traditional WebGL sprite sheets with extremely efficient packing and playback characteristics. WebGL is described later in the tutorial.

You do not need any other product to use the Character API. The API usage itself is metered at $0.007 per render, with unlimited caching.

The main endpoint of the Character API is 'animate', which is a GET method. The only required parameters, other than the API key, are 'character', 'version' and 'format'.

You can use it to create a still image:

<img src=""/>

Or you can use it to create an "image strip":

<img src=""/>

You can use 'animate' to produce a background image for a div. The CSS 'height' and 'backgroundPosition' properties make it easy to create a sliding window on the image strip so that you see only one frame at a time:

<div id="anim" style="background-image:url( version=1.1&action=blink&format=png); width:250px; height:200px"> </div>

We set the div's height to 200, which is the height of one frame within the strip, so only the first frame of the strip is showing. Let's write some code to show the rest:

<script>
  var a = [0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
  var frame = 0;
  setInterval(function() {
    var div = document.getElementById("anim");
    div.style.backgroundPosition = "0px -" + a[frame] * 200 + "px";
    frame++;
    if (frame == a.length) frame = 0;
  }, 1000/12); // 12 fps
</script>

The array 'a' has a list of frame indices within the strip. We set a timer to run at 12 frames per second. The variable 'frame' is an index into this array. On each tick of the timer, the expression 'a[frame]' tells us which image to display within the strip. We use the backgroundPosition property to slide the strip to the right position. The backgroundPosition property takes an 'x' value and a 'y' value, but the 'x' value is always 0. With a 'y' of 0, we display the first frame (frame 0) of the strip. With a 'y' of -200 we display the second frame, and so forth.
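The mapping from image index to backgroundPosition can be captured in a small helper (a sketch of the math above; the function name is our own):

```javascript
// Map an image index within the strip to a CSS background-position value.
// frameHeight is the height of one frame in pixels (200 in our example).
function backgroundPositionFor(imageIndex, frameHeight) {
  return "0px -" + (imageIndex * frameHeight) + "px";
}

console.log(backgroundPositionFor(0, 200)); // "0px -0px"   - first frame
console.log(backgroundPositionFor(2, 200)); // "0px -400px" - third frame
```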

The end result is that you see the character periodically blink:

You will soon see that the actions you provide to a character can be quite a bit more complex than just blinking. In fact, the idea is that you tell the character what you want it to do at a high level, using actions such as Look, Point, and Say. Later in this tutorial we'll show how the Character API can create the frame array you saw in the code above, so that you don't need to.

But first, consider what happens when you switch the 'backgroundImage' property of the div. For example we could start our div with a still and then switch it to the blink strip. Notice how the first frame of the blink strip is identical to the still. Character API strips normally start and end in a neutral position precisely so that you can switch between them with no "jumps". Of course it does take time to download the next strip, so in general we always preload it, so that the visual switch can happen in a seamless manner, without any "blank" periods between strips.

The ability to stitch together different image strips as needed is sometimes referred to as "non-linear" media, and is key to allowing characters to react to user events, and present personalized information, such as a stock quote, or tomorrow's weather.

The Character API is cloud-based, and easily scales to meet your traffic needs. You can think of it as a huge collection of character images and image strips that are always at your disposal. Really the images are created "just-in-time", to your requirements, and then cached.

While you pay a fraction of a penny for each call to the Character API, you are free to cache the results on your own server, by implementing a server-based animation cache. This way you only access the Character API when your very first customer views a particular animation, and thereafter the request comes directly from your server's cache. Not only is this allowed, but it is actually the preferred way to use the API for applications involving web services, chatbots, and text-to-speech. By moving the logic that calls the Character API to your own server, you are able to leverage other cloud-based services from different vendors as part of a processing pipeline.

Character animation using image strips is a compromise between several different factors, and as such, it may not be the best solution in all cases, but it tends to perform well given today's distribution of bandwidth, client, and server-based compute power.

This tutorial will show you how to generalize the code shown above to create a simple and efficient client-side engine for loading and playing back image strips, and a server-side engine that will call Text-to-Speech and the Character API on the client's behalf to cache the results. We will explain each line of code, so that you can tailor it to your own application.

As you go through this tutorial, it will be helpful to follow along, but to do so you will need your own API Key.

Design Choices

The Character API includes a catalog of available characters. These characters come in several different styles, from cartoons to realistic figures.

<img src=""> <img src=""> <img src="">

Many of them, including Susan, come in different styles, each with a different camera angle, zoom factor, etc. For example Susan also comes in Bust and Body styles.

<img src=""> <img src="">

Everyone wants their character to look unique, but the reality is that there is a high cost to developing a custom character from first principles. Thankfully, the Character API includes built-in character customization. Many of the stock characters can be reclothed and recolored to create a wide range of effective characters.

For example, let's say you like Susan, but you want her to lose the jacket.

<img src="">

That's nice, but maybe her white shirt disappears into the background on your website. Let's say you want to pick up a color to match a corporate logo:

<img src="">

Media Semantics provides Addons, including Clothes Packs and Hair Packs that work with certain characters to give you an even wider range of appearances.

Maybe on the weekend you make her show up totally casual.

<img src=",ClothesPack2&addonversions=3.0,3.0&foot=sandals1&bottom=jeans1&top=blouse1&over=none&format=png">

Maybe your application calls for a medical professional, a policeman, or a soldier. Simply dial up the right clothing from several Career Packs.

You may be wondering why a version always needs to be specified for each character and addon. To improve edge caching, the cache expiration time on the resulting images is effectively infinite. New versions of a character or an addon are released from time to time, but the old ones are never removed. This allows you to deploy updated characters, which may include updated appearances or behavior, on your own schedule, by incrementing the version number in your calling code.
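Since every call carries the same fixed parameters (key, character, version, and any addon versions), it can help to build the URL from one place in your code. Here is a sketch; the base URL is a placeholder, and the helper name is our own:

```javascript
// Build an 'animate' URL from a parameter object. Parameter names match
// those used in this tutorial; the base URL here is a placeholder.
function animateURL(base, params) {
  var parts = [];
  for (var key in params)
    parts.push(key + "=" + encodeURIComponent(params[key]));
  return base + "?" + parts.join("&");
}

var url = animateURL("https://example.com/animate", {
  character: "susan", version: "3.0", action: "blink", format: "png"
});
```

Bumping the character version for your whole application then becomes a one-line change.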

Just like there is a catalog of characters, there is also a catalog of actions. In general, all characters can perform a range of actions related to the presentation of information - speaking, gesturing, emoting.

Actions that target specific body parts can often be combined. For example here is the "lookleft" action, loaded into a div similar to how we did Carla's blink earlier in this tutorial:

<div id="anim1" style="background-image:url( version=3.0&action=lookleft&format=png); width:500px; height:400px"> </div>

And here is the "gestureleft" action:

<div id="anim1" style="background-image:url( version=3.0&action=gestureleft&format=png); width:500px; height:400px"> </div>

Here are both actions together:

<div id="anim1" style="background-image:url( version=3.0&action=<lookleft/><gestureleft/>&format=png); width:500px; height:400px"> </div>

If you have a single action, you can specify it as a single word, as we have seen so far. But in general, actions and sequences of actions are represented as XML tags. So


means look left, then gesture left, combining the two if possible. Later we will see how the <say/> action can include text and actions aligned to this text.

It can be tedious to specify a return action for each action, or to specify every blink or head movement.

If a character is told to use a hand action such as a <point/>, then it will automatically return to a "hands by side" position after a certain amount of time. Similarly if a character is talking, then it will automatically return to looking at the user after a certain time. With automatic return you normally only need to worry about "pushing" the character in a certain direction, and you can normally count on it returning to the default position in a natural manner.

You can also use the autoaction parameter to allow the character to blink and use subtle head movements while speaking. This parameter takes a number from 0 to 3. The default value is 0, which means that no automatic action is added. The value 1 means that only very simple action, such as blinking, will be added. The value 2 includes blinking and occasional head movement. The value 3 includes blinking and a natural amount of head movement. A character with a higher autoaction value will tend to take more bandwidth.

<img src=""/>

New characters, styles, and addons are constantly being added to the catalog, often by request. Media Semantics, the makers of the Character API, are also able to add custom characters and addons that are tied to a specific API key.

The characters themselves are created using industry-standard 2D and 3D animation tools. Developers who wish to create their own characters, actions, or formats, can do so within the framework of the Character Builder and Character Server products and deploy their solutions to their own server.

For a detailed catalog of characters and their actions, please see the characters page.

Bandwidth Considerations

The image strip approach to character animation is sensitive to bandwidth. There are several choices that you can make that directly affect the size of the image strips, namely Compression Type, Frame Size, and Number of Frames.

Compression Type

Image strips are encoded using either the PNG or the JPEG format, as determined by the 'format' parameter on the Animate URL. Which you choose depends on a couple of factors. PNG animation includes transparency, while JPEG does not. So if your application includes a character that appears on top of other content on your website, then you must use PNG. PNG tends to be very efficient for cartoon characters, because of the long runs of solid color. Being a lossless format, it will faithfully capture every pixel.

On the other hand, JPEG is especially good at coding realistic characters. The JPEG format comes with a quality parameter - if not specified the value 75 is used. While the bandwidth can be reduced by lowering the quality, the result will be more artifacts, particularly when you do have a solid run of color. Furthermore, if you are switching from one image strip to another with non-linear media, bear in mind that the switch from the last frame of one strip to the first frame of another strip may result in a slight difference in these artifacts, as the JPEG algorithm uses different tradeoffs for different image strips. These differences tend to be hard to notice in practice, in particular for higher compression quality factors.

A more significant consideration with JPEG is what to do with the background, since JPEG images are always fully opaque. By default the background on a jpeg strip will be white. You can provide a solid color with the 'backcolor' attribute.

<img src=""/>

You can also do a simple vertical gradient:

<img src=" backcolor1=0060CC&backcolor2=003366&format=jpeg"/>

Finally you can specify the URL for an image to use as a background using 'backimage'. The image can be either PNG or JPEG format, and should be the exact same size as the size of a single frame in your strip. You provide the url to the background as a parameter to the Animate URL. For example let's say you had the following background url:

<img src=""/>

Since this information contains colons and slashes, we'll need to use the JavaScript encodeURIComponent function to pack it into our url.

<img id="anim1">
<script>
  var img = document.getElementById("anim1");
  img.src = "" + encodeURIComponent("");
</script>

By carefully selecting a background image, you can make your character blend in naturally with your site while still being able to take advantage of the better compression afforded by the jpeg format.

If you use an image, be sure to place it in a publicly accessible location. It will be consulted by the Character API each time it needs to generate a new strip.

IMPORTANT: If you want to change the background image, please upload your new file under a different name, and use that new name in the 'backimage' parameter. In other words, the Character API assumes that the image at a given URL is invariant, and can be cached indefinitely.

Frame Size

Clearly the size of each frame in your strip has a direct impact on the size of the strip in kilobytes. Unless otherwise specified, the frame width and height is a standard size that matches the style of the character. For example, headshots tend to be 250x200 pixels.

You can specify a different frame size using the 'width' and 'height' attributes.

<img src=" version=3.0&format=png&width=125&height=150">

As you can see, the character does not resize - all we are doing is cropping the image. With this tighter cropping you will normally need to adjust the position of the character to center it correctly. You can do this with 'charx' and 'chary', the horizontal and vertical offset to be applied to the character. These are values in pixels, with positive y moving the character down within the frame.

<img src=" format=png&width=125&height=150&charx=-60&chary=-30">

Cropping down a character using width, height, charx, and chary is a great way to achieve bandwidth savings, but it is also an important consideration in determining how much real-estate the character takes on your page. The default size of a full body character is 500x400, which is very wide. This can be cropped substantially, but bear in mind that some actions, such as pointing, run the risk of being cut off. If you crop a body character to just the head, then you will want to avoid a point action altogether, as it won't be seen.

One thing to bear in mind is that the entire image strip can be scaled up or down. Modern browsers are very good at scaling images, however the results are only as good as the pixel density you start out with. You can take a 250x200 pixel image and scale it down to half size and it will still look great. But there is little point in doing this, since you are downloading more information than you need. Likewise, you can scale the cropped image above to double its effective size with NO loss in bandwidth, but the results may look a little blurry.

<img src=" format=png&width=125&height=150&charx=-60&chary=-30" style="width:250px">

For this reason we normally recommend that you not scale the image strip, but use other techniques to achieve the desired size and bandwidth tradeoffs.
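The bandwidth side of this tradeoff is simple area math. A sketch (the function name is our own):

```javascript
// How many times more pixels you download than you display when you
// fetch at native size and scale down with CSS.
function downloadWasteFactor(nativeW, nativeH, displayW, displayH) {
  return (nativeW * nativeH) / (displayW * displayH);
}

// Fetching a 250x200 headshot but displaying it at 125x100:
console.log(downloadWasteFactor(250, 200, 125, 100)); // 4
```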

Some characters, notably cartoon characters, allow you to use the 'charscale' attribute to specify a scale. Consider the Charlie character. The default frame size is 307x397.

<img src=""/>

Think of the 'charscale' factor as scaling up the entire character, default frame size and all. So the frame size for

<img src=""/>

is 460x595. You can then further crop this down to the head with a smaller frame size, as we saw before. Characters that support a 'charscale' are re-rendered at the server so as to provide you with maximum amount of detail.

Number of Frames

Clearly the size of the image strip is directly related to the number of frames in the strip. Recall how we used an array to encode the actual sequence of image strip indices in our blink animation:

<script> var a = [0, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]; </script>

At the default rate of 12 frames per second, this allowed us to achieve 15 frames, or about 1 1/4 seconds worth of animation, out of a strip with only 3 images. We furthermore looped the sequence to get an indefinite length of animation.
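The duration math is worth making explicit: the playback length depends on the frame sequence, not on the number of images in the strip.

```javascript
// Seconds of animation produced by a frame sequence at a given frame rate.
function sequenceSeconds(frameCount, fps) {
  return frameCount / fps;
}

console.log(sequenceSeconds(15, 12)); // 1.25 - our blink sequence
```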

Consider a strip of a character talking. In a cartoon, it is acceptable for a character's head to move very little, or not at all, as the mouth forms different words. Since there are only about 9 different positions (visemes), the resulting image strip might have as few as 9 images, even if the character speaks for several minutes. In practice it will be more, since the character will need to blink occasionally. On the other hand, when we have a realistic character talking, we expect to see her head move frequently while speaking. So the complexity of the action, including the amount of automatic action you allow, has a direct impact on the total height of the image strip.

Clearly you wouldn't want to put an hour of speech into a single strip - the trick is to break it down into separate image strips that are then stitched together. This lets you overlap the playback of one image strip with the creation of the next. It also lets you play image strips at random, or based on the user's actions. If you succeed, your user will soon get the impression that the character has a "life of its own". But to achieve this we need a better playback engine.
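The stitching idea can be sketched as a small queue. This is not part of the API; fetchStrip and playStrip below are stand-ins for the XHR/preload and animate() code developed later in this tutorial:

```javascript
// A strip queue: while one strip plays, the next can already be fetched,
// so playback hops from strip to strip without gaps.
function StripQueue(fetchStrip, playStrip) {
  this.pending = [];       // strips fetched but not yet played
  this.playing = false;
  this.fetchStrip = fetchStrip;
  this.playStrip = playStrip;
}

StripQueue.prototype.enqueue = function(action) {
  var self = this;
  this.fetchStrip(action, function(strip) {
    self.pending.push(strip);
    self.drain();
  });
};

StripQueue.prototype.drain = function() {
  if (this.playing || this.pending.length === 0) return;
  this.playing = true;
  var self = this;
  this.playStrip(this.pending.shift(), function() {
    self.playing = false;
    self.drain();  // start the next strip immediately, if one is ready
  });
};
```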

How can WebGL help with bandwidth?

We will see a little later in this tutorial how WebGL can result in very efficient playback on modern hardware. The webgl=true option causes the Character API to pack the images more densely, but at the expense of requiring a little more complexity to decode. The advantages become clear with larger, more complex animations. Here is the result of webgl=true on a script with some speech and some subtle head movement:

We will stick with simple image strips to develop most concepts in this tutorial, and we will finish up with a proper treatment of WebGL in the final section.

Improving the Playback Engine

Let's take advantage of everything we've learned to create a more general playback engine for use with the Character API.

Start with a div to contain our animation:

<div id="anim"></div>

Now let's add a link that will cause the character to perform an action:

<a href="javascript:execute('<blink/>')">blink</a>

Now some code to set up some global variables and the frame size and initial image on the div:

<script>
  init();
  var animateBase;
  var savedURL;
  var data;
  var frame;
  var timerId;
  var state;

  function init() {
    animateBase = "";
    state = "";
    var element = document.getElementById("anim");
    element.style.width = "250px";
    element.style.height = "200px";
    element.style.backgroundImage = "url(" + animateBase + ")";
  }

It's convenient to have an 'animateBase' variable that contains all the parameters that are invariant for your application. The meaning of the other variables will become clear in due course.

As a side effect of producing the image strip, the Character API produces some animation data that can be retrieved from the response header "x-msi-animationdata". In the case of action=blink, it produces:

{"images":3, "imageHeight":200, "initialState":"", "frames":[[0],[1],[2],[1],[0]], "finalState":"", "fps":12}

The first thing we need to do to switch to this action is to call the Character API and obtain both the image data and the animation data. To do this, we use a combination of XMLHttpRequest and a throwaway image used for preloading.

function execute(action) {
  savedURL = animateBase + "&action=" + encodeURIComponent(action) + "&state=" + encodeURIComponent(state);
  var xhr = new XMLHttpRequest();
  xhr.open("GET", savedURL, true);
  xhr.addEventListener("load", function() {
    data = JSON.parse(xhr.getResponseHeader("x-msi-animationdata"));
    var preload = new Image;
    preload.onload = function() {animate();};
    preload.src = savedURL;
  }, false);
  xhr.send();
}

The details are as follows. We use XMLHttpRequest to load both the strip and the animation data from the Character API. The actual image data is thrown out, but it is stored in the browser cache. We then immediately assign the exact same URL (savedURL) to the preload image. Since the image is in the browser cache, the preload is almost instantaneous, and no additional API call is incurred. When the preload completes, we call animate() to do the switch.

Here is the beginning of the animate() code.

function animate() {
  var element = document.getElementById("anim");
  element.style.backgroundImage = "url(" + savedURL + ")";
  element.style.backgroundPosition = "0px 0px";

To switch the strip, we set the backgroundImage of the actual div to the image we just preloaded. Again, only the initial XMLHttpRequest would have incurred an actual Character API call, and then only if the image strip was not already cached in the browser cache from an earlier request.

Finishing off animate(), we set the global 'frame' to be 0 and start an interval timer.

  frame = 0;
  timerId = setInterval(function() {animateTick()}, 1000 / data.fps);
  state = data.finalState;
}

Here is the basic outline for the animateTick() function, which gets called at 12 frames per second.

function animateTick() {
  // exit cases
  if (frame >= data.frames.length) {
    clearInterval(timerId);
    animateComplete();
    return;
  }
  // first arg is the image frame to show
  var element = document.getElementById("anim");
  element.style.backgroundPosition = "0px -" + data.frames[frame][0] * data.imageHeight + "px";
  frame++;
}

function animateComplete() {
}

The first thing we do in animateTick() is check if the frame index is still in bounds. If it falls off the end, we clear the interval timer and call animateComplete(). Assuming we haven't completed, we set the backgroundPosition of the div to "scroll" the image strip through the window one frame at a time, indirecting through data.frames[frame] to get the actual image in the image strip to play. You will recognize this as the same code that we ran in our first blink animation.

Go ahead and try clicking the "blink" link now:

Character State

So far we have dealt only with actions that begin and return to a very special "default" state represented by an empty string. But consider what happens when we run:


The data for this action looks like this (note the finalState value):

{"images":7, "imageHeight":200, "initialState":"", "frames":[[0],[1],[2],[3],[4],[5],[6]], "finalState":"front,handsbyside,lookleft,mouthnormal,eyesnormal", "fps":12}

In the code shown in the previous section, we had a global variable 'state' that kept track of the state during the handoff from one strip to another. The variable gets passed into the request as the Character API parameter 'initialstate', allowing the API to determine the starting position for the action. This starting position is repeated in the animation data as 'initialState', along with the state 'finalState' that results from performing the action, where it is then harvested and put back into the global "state" variable for use in setting up the next action.

Let's put this together with a simple system of 3 actions.

Now see what happens when you press "blink" if your character is in the default state or the looking-left state.

Your application may be fine with always returning to the default state between strips, however this simple mechanism gives you a whole new dimension of control to work with. For example you might implement an "idle controller" that calls execute() periodically with 'blink', and with 'lookleft', 'lookupleft', etc. based on where the user clicks on the page.
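An idle controller can be sketched in a few lines. This is an illustration, not part of the API; it relies on the execute() function developed above, and the action list is just an example:

```javascript
// Small idle actions the character can perform while otherwise unoccupied.
var idleActions = ["<blink/>", "<lookleft/>", "<lookupleft/>"];

function pickIdleAction() {
  return idleActions[Math.floor(Math.random() * idleActions.length)];
}

function startIdle(executeFn, intervalMs) {
  // Returns a timer id that can be passed to clearInterval().
  return setInterval(function() { executeFn(pickIdleAction()); }, intervalMs);
}

// Usage: startIdle(execute, 5000);  // nudge the character every 5 seconds
```

You would stop the idle timer with clearInterval() before running a deliberate action, such as a say.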

You may want to stick with the simple assumption that all strips begin and end with the default state (""). As a convenience, you can add a 'return=true' parameter in your Character API call. This will ensure that, regardless of the final state of the character, it will always return to the default state.

Smooth Interruption

In some cases you will want a user's input to stop one action and start another. So let's add an abort() function that will stop the current action.

function abort() {
  // not very smooth!
  clearInterval(timerId);
  init();
}

This would cause the character to instantly "pop" back to its initial state - not very smooth. To do better, we can take advantage of "recover=true". With this option, the Character API generates some additional frames at the end of the frame array that are used to smoothly interrupt an action.

We'll use another global 'stopping'.

var stopping = false;

Our new, improved, abort() function just does this:

function abort() { stopping = true; }

We'll add our 'recover' option:

animateBase = "";

Let's set up another action to illustrate.

<a href="javascript:execute('<lookleft/><pause/><lookuser/>')">lookleft, pause, lookuser</a>

The frame data returned is as follows:

[[0],[1],[2],[3],[4],[5],[6,25],[6,25],[6,25],[6,25],[6,25],[6,25],[6,25],[6,25],[6,25], [6,25],[6,25],[6,25],[6],[5],[4],[3],[2],[1],[0,-1],[6],[5],[4],[3],[2],[1],[0,32],[0,-1]]

When a frame has a second argument, its meaning is "jump to this frame if you are stopping". The special value -1 in the second argument indicates that this is the end of the animation, and it is needed because there are now multiple places where the animation can stop.

We'll modify our animateTick() as follows:

function animateTick() {
  // exit cases
  if (frame == -1 || frame >= data.frames.length) {
    clearInterval(timerId);
    animateComplete();
    return;
  }
  // first arg is the image frame to show
  var element = document.getElementById("anim");
  element.style.backgroundPosition = "0px -" + data.frames[frame][0] * data.imageHeight + "px";
  // second arg is -1 if this is the last frame to show, or a recovery frame to go to if stopping early
  if (data.frames[frame][1] == -1)
    frame = data.frames[frame][1];
  else if (stopping && data.frames[frame][1])
    frame = data.frames[frame][1];
  else
    frame++;
}

To see this in action, first press the first link and watch the animation proceed uninterrupted. Then try it again, but immediately press abort().

Not every frame will have recovery information - generally if the character is mid-action, it will wait until the action is complete to recover. You can think of abort() as being a request to "hurry up" the animation. Note that the "recover=true" option generally does not increase the size of the image strips generated - it just adds a little more information that allows our client-side code to smoothly recover from an interruption.

External Commands

There are times when you will want to synchronize events in your application with key points in your animation. You can use the <cmd/> action to trigger an event that you can handle in your code. You can also specify parameters that are passed to your function as an object.

Consider the following action XML:

<lookleft/><cmd target="1"/><lookuser/>

The attributes can be anything you like, and you can have several of them. The data generated for this command looks like this:


To handle the third frame argument we need one more line in our animateTick() function:

  // third arg is an extensible side-effect string that is triggered when a given frame is reached
  if (data.frames[frame][2])
    embeddedCommand(data.frames[frame][2]);

Now we can implement a simple embeddedCommand() function:

function embeddedCommand(cmd) {
  console.log(JSON.stringify(cmd));
}

Let's give this a try. A console log output should appear the instant Carla's head reaches the looking-left state, if you have your Developer window's Console tab open on this page.
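In a real application you will likely want to dispatch on the command's attributes rather than just log them. Here is one hedged sketch, assuming the parameter object carries a 'target' property as in the `<cmd target="1"/>` example above; the handler table and its keys are hypothetical:

```javascript
// Map command targets to handler functions of your own choosing.
var commandHandlers = {
  "1": function(cmd) { /* e.g. reveal a bullet point */ },
  "2": function(cmd) { /* e.g. advance a slide */ }
};

function embeddedCommand(cmd) {
  var handler = commandHandlers[cmd.target];
  if (handler) handler(cmd);
  else console.log("unhandled command: " + JSON.stringify(cmd));
}
```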

Playing Audio

To a first approximation, you add audio to an animation by playing an mp3 file with the same length as the animation.

In order to play audio on a mobile device, you need to take care to load the audio file in direct response to a user event, such as a touch. However it is perfectly okay to pause the audio on the "canplaythrough" event, load some animation, and then restart the audio. Let's illustrate:

Start with a new tag somewhere on your site:

<audio id="audio"></audio>

Now create a new function playAudio():

function playAudio(action, file) {
  var audio = document.getElementById("audio");
  audio.oncanplaythrough = function() {
    audio.pause();
    // Now load the animation
    execute(action);
  }
  audio.src = file;
  audio.play();
}

We can then add this to the end of the animate() function developed before, right after starting the interval:

function animate() {
  ...
  // Start the audio, if any
  var audio = document.getElementById("audio");
  audio.play();
}

Now let's invoke playAudio() with the following action:

<say>The quick brown fox.</say>

and the audio file, which you can listen to here:

Here is the result so far.

As you can see, there is one fatal problem with this arrangement - the lips don't move. Indeed, how can they when the Character API has no knowledge of the audio file that is playing?

The LipSync API

To make the character's mouth move, we first need to analyze the audio file for lip sync information. If you are using pre-recorded audio files, then you can use the LipSync API through this form. Simply upload your audio file and you will get back a block of phoneme data. When you do this for our sample audio file, we get something like this:

z0kjh+/u+mS7vhPB4YZT2DsBdtd+WE1Gix5UvFFDhBdTIGYRciap0UW+XTHR9Nlk9JaKSxbFKRrLPmKQc50hDkbPhJQD0Ql4pxmTp49Arujy+6y0dUD1rIk8BR2thEBkj9DZEi+Ba/lp6UmAvo69YcIUUK+CXPFcRZ4ucV+DdDo=

You can think of this as the "lipsync signature" of the audio file. It represents all the data that is needed to produce a lip-synced animation.

Now let's add a 'lipsync' parameter to playAudio(), and pass this information through to execute():

function playAudio(action, file, lipsync) {
  ...
  execute(action, lipsync);
  ...
}

and finally onto the Animate URL inside execute():

function execute(action, lipsync) {
  ...
  savedURL = animateBase + "&action=" + encodeURIComponent(action) + "&state=" + encodeURIComponent(state) + "&lipsync=" + encodeURIComponent(lipsync);
  ...
}

Now let's try again, with the lipsync data we obtained above.

Now we see that the lips move appropriately.

For completeness, we will want to modify abort() to immediately stop the audio:

function abort() {
  stopping = true;
  var audio = document.getElementById("audio");
  audio.pause();
}

A Simple Caching Server

So far we have used only client-side processing. But chances are your application already talks to an existing server. Why not let that server call the Character API on the browser's behalf and cache the results? Not only would this lead to reduced usage of the Character API, but, as we will see in the next section, it would enable additional server-side processing to occur, such as Text-to-Speech.

Here we will use nodejs and express to create an API running on our own server. The simplest server, "server.js", looks something like:

var express = require('express');
var app = express();

app.get('/animate', function (req, res) {
  res.setHeader('content-type', 'text/plain');
  res.send("hello world");
});

app.listen(3000, function() {
  console.log('Listening on port 3000');
});

This is all you need to set up an API endpoint called 'animate' that does nothing more than return the string "hello world". To set this up you will normally create a new directory 'animate', make it the current directory, and then initialize it with nodejs:

$ mkdir animate
$ cd animate
$ npm init
$ npm install express
$ npm install request

It is often convenient to host your node application within Apache by way of a ProxyPass entry in your Apache config file, /etc/httpd/conf/httpd.conf.

ProxyPass /node http://localhost:3000

You can now run your server using

node server.js

and access your service with

Let's have a cache directory located in the folder containing server.js. You will want to give it write permission for web users, typically with

sudo chgrp apache cache
sudo chmod g+w cache

assuming you are using Apache to host your node server.

The first thing we do when we receive the request is to create a hash from the information in the parameters. For all intents and purposes this is a unique signature for the request.

var express = require('express');
var fs = require('fs');
var request = require('request');
var app = express();

app.get('/animate', function(req, res) {
    var crypto = require('crypto');
    var hash = crypto.createHash('md5');
    for (var key in req.query)
        hash.update(req.query[key]);

Use this hash to create some filenames in the cache directory:

    var filename = hash.digest("hex");
    if (!req.query.format) req.query.format = "png";
    var imageFile = './cache/' + filename + '.' + req.query.format;
    var dataFile = './cache/' + filename + '.js';

For now let's put a placeholder to deal with the case where there is a cache miss:

    if (!fs.existsSync(imageFile)) {
        // TODO fetch image and data file from Character API
    }
    else {
        finish(res, req.query.format, imageFile, dataFile);
    }
});

We can use finish() to return the format that was requested, along with the 'x-msi-animationdata' response header.

function finish(res, format, imageFile, dataFile) {
    var data = fs.readFileSync(dataFile, "utf8");
    res.setHeader('x-msi-animationdata', data);
    res.setHeader('content-type', 'image/' + format);
    res.setHeader('Cache-Control', 'max-age=31536000, public');  // 1 year
    fs.createReadStream(imageFile).pipe(res);
}

To fill in the placeholder, we proceed as follows:

if (!fs.existsSync(imageFile)) {
    var urlAnimate = "";
    var newquery = { key: "123" };  // any other fixed parameters can go here, and the client can omit them
    for (var key in req.query)
        if (!newquery[key]) newquery[key] = req.query[key];
    request.get({url: urlAnimate, qs: newquery, encoding: null}, function(err, httpResponse, body) {
        if (httpResponse.statusCode == 404) {
            res.setHeader('content-type', 'text/plain');
            res.write(body);
            res.end();
            return;
        }
        fs.writeFile(dataFile, httpResponse.headers["x-msi-animationdata"] || "", "binary", function(err) {
            fs.writeFile(imageFile, body, "binary", function(err) {
                finish(res, req.query.format, imageFile, dataFile);
            });
        });
    });
}

Here we use the request module to call the Character API, passing in the parameters from the original request. When the result comes back, we pull out the 'x-msi-animationdata' header and write it to the data file, then write the image data to the image file. Then, having populated the cache, we call the same finish() function that we used in the cache hit case.

On the client side, the only thing you need to do is change the API domain in 'animateBase' to point at your new server. Your new API perfectly mimics the Character API. The only difference is that you can now withhold the Character API key from animate requests, as it is filled in by your server code. In fact, the smart thing to do is to make your version of the API less open-ended by fixing some parameters, such as the character name, moving them out of your client code and into the server code. This makes your API endpoint more specific to your application, and less of a target for Cross-Site Request Forgery attacks.
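As a sketch of that parameter-fixing idea, the merge below always pins the key and character on the server, so a client cannot override them. (The values and the buildQuery() helper are illustrative placeholders, not part of the Character API.)

```javascript
// Sketch: pinning parameters server-side so the client cannot override them.
// The 'key' and 'character' values here are placeholders.
function buildQuery(clientQuery) {
    var fixed = { key: "123", character: "SusanHead" };  // pinned on the server
    var newquery = {};
    for (var key in fixed) newquery[key] = fixed[key];   // fixed values win
    for (var key in clientQuery)
        if (!newquery[key]) newquery[key] = clientQuery[key];  // pass through the rest
    return newquery;
}
```

A request that tries to supply its own character or key is silently overridden, so the endpoint can only ever render your application's character.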

Adding Text-to-Speech

While you might have a Text-to-Speech engine installed directly on your server, many developers simply use another cloud service. Media Semantics does not provide a Text-to-Speech method; however, several vendors do. In particular, the Amazon Polly service allows you to cache the resulting audio files, which is essential for our purposes.

Here is a version, playTTS(), that takes only a single action parameter.

function playTTS(action) {
    var audio = document.getElementById("audio");
    audio.oncanplaythrough = function() {
        audio.pause();
        // Now load the animation
        execute(action);
    }
    savedURL = animateBase +
        "&action=" + encodeURIComponent(action) +
        "&state=" + state +
        "&audio=true";
    audio.src = savedURL;
}

Notice that we copy the line that computes savedURL from execute(), but we add an 'audio=true' parameter, to indicate that the audio file is being requested. We then pass this URL as the source for the audio object.

Next, on the server, we generate an audio filename in addition to our image and data files, taking care to ignore the new audio parameter for the purposes of computing the hash:

var hash = crypto.createHash('md5');
for (var key in req.query)
    if (key != "audio")  // we want urls with and without audio=true to hash to the same string
        hash.update(req.query[key]);
var filename = hash.digest("hex");
if (!req.query.format) req.query.format = "png";
var audioFile = './cache/' + filename + '.mp3';
var imageFile = './cache/' + filename + '.' + req.query.format;
var dataFile = './cache/' + filename + '.js';

Our finish() function now needs to return either an audio file or an image file plus animation data.

function finish(res, audio, format, audioFile, imageFile, dataFile) {
    if (audio) {
        res.setHeader('content-type', 'audio/mpeg');
        fs.createReadStream(audioFile).pipe(res);
    }
    else {
        fs.readFile(dataFile, "utf8", function(err, data) {
            res.setHeader('x-msi-animationdata', data);
            res.setHeader('content-type', 'image/' + format);
            res.setHeader('Cache-Control', 'max-age=31536000, public');  // 1 year
            fs.createReadStream(imageFile).pipe(res);
        });
    }
}

Now there will be two different URL requests coming from the client: an 'audio=true' request, followed by a regular image request. The image request does not start until the audio request is complete and at least some of the audio data has downloaded, so by the time the image request comes through, it will see a cache hit and will be fulfilled almost instantly.

To set up our call to Amazon Polly, we need to add the following logic at the top of the file (substitute your own API keys as appropriate):

var AWS = require('aws-sdk');
var awsConfig = require('aws-config');

AWS.config = awsConfig({
    region: 'us-east-1',
    maxRetries: 3,
    accessKeyId: 'xxxxxxxxxxxxxx',
    secretAccessKey: 'xxxxxxxxxxxxxx',
    timeout: 15000
});

Here, then, is the new cache logic:

if (!fs.existsSync(imageFile)) {
    // e.g. <say>Look <cmd/> here.</say> -> Look here.
    var textOnly = req.query.action.replace(new RegExp("<[^>]*>", "g"), "").replace(/  +/g, " ");
    var polly = new AWS.Polly();
    var pollyData = {
        OutputFormat: 'mp3',
        Text: textOnly,
        VoiceId: "Joanna"
    };
    polly.synthesizeSpeech(pollyData, function(err, data) {
        fs.writeFile(audioFile, data.AudioStream, function(err) {
            // now lipsync it
            var urlLipSync = '';
            var lipSyncData = {
                transcript: textOnly,
                audio: data.AudioStream.toString('base64'),
                key: "12345678"
            };
            request.post({url: urlLipSync, formData: lipSyncData}, function(err, httpResponse, body) {
                // pass the lipsync result to animate
                var urlAnimate = "";
                var newquery = { lipsync: body, key: "12345678" };
                for (var key in req.query)
                    if (!newquery[key]) newquery[key] = req.query[key];
                request.get({url: urlAnimate, qs: newquery, encoding: null}, function(err, httpResponse, body) {
                    if (httpResponse.statusCode == 404) {
                        res.setHeader('content-type', 'text/plain');
                        res.write(body);
                        res.end();
                        return;
                    }
                    fs.writeFile(dataFile, httpResponse.headers["x-msi-animationdata"] || "", "binary", function(err) {
                        fs.writeFile(imageFile, body, "binary", function(err) {
                            finish(res, req.query.audio, req.query.format, audioFile, imageFile, dataFile);
                        });
                    });
                });
            });
        });
    });
}
else {
    finish(res, req.query.audio, req.query.format, audioFile, imageFile, dataFile);
}

Think of the cache miss logic as a pipeline. The first stage is Text-to-Speech, and results in an audio file, which is saved away. The second stage is LipSync, which results in the lipsync data, which is simply passed on to the Animate stage. The resulting image and control data are then written to their respective files. After the cache-miss logic completes, we can simply call finish(), to fulfill the request from the newly-cached data.
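The shape of that pipeline can be restated more compactly with async/await; in this sketch the three stage functions are stand-ins for the Polly, LipSync, and Animate calls above, not real implementations.

```javascript
// Stand-in stages; the real code calls Polly, the LipSync API, and Animate.
async function textToSpeech(text) { return "audio(" + text + ")"; }
async function lipSync(text, audio) { return "lipsync(" + audio + ")"; }
async function animate(lipsync) { return "frames(" + lipsync + ")"; }

// The cache-miss pipeline: each stage feeds the next.
async function renderMiss(text) {
    var audio = await textToSpeech(text);      // stage 1: audio file, saved away
    var lipsync = await lipSync(text, audio);  // stage 2: lipsync data
    return animate(lipsync);                   // stage 3: image + control data
}
```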

Let's see how this works with:


In the case of the Amazon Polly service, there is a feature called Speech Marks which lets you download the "viseme", or lipsync, information for a voice request. To get both audio and Speech Mark information, you'll need to call the Polly API twice with the exact same text, once requesting an output in mp3 format, and once requesting an output in JSON format.

// Call Polly as before
var polly = new AWS.Polly();
var pollyData = {
    OutputFormat: 'mp3',
    Text: textOnly,
    VoiceId: "Joanna"
};
polly.synthesizeSpeech(pollyData, function(err, data) {
    // And save it away
    fs.writeFile(audioFile, data.AudioStream, function(err) {
        // But now switch the output to json with visemes
        pollyData.OutputFormat = 'json';
        pollyData.SpeechMarkTypes = ['viseme'];
        polly.synthesizeSpeech(pollyData, function(err, data) {
            // Zipping up the result gives you a lipsync signature just like the LipSync API does
            var zip = new require('node-zip')();
            zip.file('lipsync', data.AudioStream);
            var dataZipBase64 = zip.generate({base64: true, compression: 'DEFLATE'});
            // Now you can pass dataZipBase64 as the animate lipsync param...

If you are using AWS Polly voices, then you will find that requesting Speech Marks results in slightly better quality and performance than the LipSync API. This makes sense, since the phoneme data is an intermediate step in the Speech Synthesis process. Not all speech vendors expose phoneme information, but you should use it where available.

Using WebGL

Modern browsers use GPUs (Graphic Processing Units) to process 2D images efficiently. While not all browsers support WebGL, most do. In this tutorial we will use PixiJS, which simplifies the process of adding WebGL and will automatically fall back to HTML5 Canvas on those browsers that don't support it.

One limitation of WebGL is that you must dedicate a rectangular area as the "stage", and that stage cannot be transparent. You can still use the backimage parameter to add a background image that makes the stage appear to blend in with the surrounding site.

You will need to load PixiJS 4.0:

<script src="pixi.min.js"></script>

First you create a renderer and add it to your document:

var renderer = PIXI.autoDetectRenderer(250, 200);
document.body.appendChild(renderer.view);

You could also add the renderer to a div. At first the renderer is black; as a refinement, you can hide it until the first time you render the stage. You will need to add the &webgl=true parameter to your animateBase:

animateBase = "... &webgl=true& ...";

We add these globals:

var stage = new PIXI.Container();
var texture;
var n = 0;

We replace our execute() function with the following version:

function execute(action) {
    savedURL = animateBase +
        "&action=" + encodeURIComponent(action) +
        "&state=" + encodeURIComponent(state);
    var xhr = new XMLHttpRequest();
    xhr.open("GET", savedURL, true);
    xhr.addEventListener("load", function() {
        data = JSON.parse(xhr.getResponseHeader("x-msi-animationdata"));
        // use the pixijs Loader to load the image as a texture
        ++n;
        PIXI.loader.add("image" + n, savedURL).load(function() {
            texture = PIXI.loader.resources["image" + n].texture;
            animate();
        });
    }, false);
    xhr.send();
}

The logic is the same, except that rather than use a new Image object to load the image, we use PIXI's loader. Once again, we appear to call the Character API twice, once to get the animation data and once to get the image, but the second call is fulfilled from the browser cache. Let's turn now to the animate() and animateTick() functions.

function animate() {
    frame = 0;
    timerId = setInterval(function() {animateTick()}, 1000 / data.fps);
}

function animateTick() {
    // exit case as before

    // first arg is the image frame to show
    var recipe = data.recipes[data.frames[frame][0]];
    stage.removeChildren();
    for (var i = 0; i < recipe.length; i++) {
        var sprite = new PIXI.Sprite(
            new PIXI.Texture(texture,
                new PIXI.Rectangle(recipe[i][2], recipe[i][3], recipe[i][4], recipe[i][5])));
        sprite.x = recipe[i][0];
        sprite.y = recipe[i][1];
        stage.addChild(sprite);
    }
    renderer.render(stage);

    // optional treatment of second and third args as before

    frame++;
}

With webgl=true, the control data contains a new section called "recipes", which represents the instructions for creating a given frame. Instead of the first argument of each frame being an index into the image strip, it is now an index into the recipes array:

{...,
 "frames":[[0],[1],[2],[3],...],
 ...,
 "recipes":[
   [[0,0,0,0,320,240],[92,135,190,240,237,106],[117,41,0,240,190,200]],
   ...,
   [[0,0,0,0,320,240],[117,41,0,240,190,200],[173,163,190,389,77,43]]
 ]}

A given frame is created by layering several images from the sprite sheet - typically the background, the base of the body, a head, some eyes, a mouth.

The bandwidth savings result from reusing images: small mouth or eye images can be pasted on top of the same head image to create several different frames. A recipe has a series of steps. Each step is represented by an array of the form:

[target-x, target-y, source-x, source-y, width, height]
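In the Canvas fallback, a step of this form maps directly onto the 2D drawImage() call; here is a sketch (drawStep is a hypothetical helper, not part of the API or of PixiJS):

```javascript
// Sketch: applying one recipe step [tx, ty, sx, sy, w, h] with the 2D canvas API.
// It copies a w-by-h region of the sprite sheet at (sx, sy) to (tx, ty) on the stage.
function drawStep(ctx, sheet, step) {
    var tx = step[0], ty = step[1], sx = step[2], sy = step[3], w = step[4], h = step[5];
    ctx.drawImage(sheet, sx, sy, w, h, tx, ty, w, h);
}
```

Looping drawStep() over every step of a frame's recipe reproduces that frame; PixiJS does the equivalent with sprites, as shown above.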

With WebGL it is incredibly efficient to do this type of image assembly: each subimage is a sprite, which is merely a pointer to the actual bits in the sprite sheet. No actual bit copying happens until the renderer.render(stage) command, where the composition is done using GPU hardware where possible.

Wrapping Up

This tutorial has introduced you to a new way of thinking about character animation.

  • The Character API enables characters that are continuous, non-linear, and fully reactive to user input, through a simple CSS image strip mechanism, or a more efficient WebGL sprite sheet mechanism.

  • Cloud computing and edge caching let you build your application as if all possible images are available at any time from the cloud. Since an image is fully specified by its URL parameters, novel combinations are created on-the-fly using cloud computing, and then cached at multiple levels for rapid delivery to your app.

  • The mechanism works well with modern mobile and desktop browsers, and provides a good tradeoff between local compute power and bandwidth.

  • The Character API is simple, yet flexible. This tutorial shows how you can call the Character API directly from your client code, or within your own server API as part of a pipeline of cloud services from different vendors.

We can't wait to see what you build with the Character API!

Please send questions and comments to

Copyright © 2017 Media Semantics, Inc. All rights reserved.
