Chatbot API Tutorial

The 'reply' API is a cloud-based chatbot engine. It can be used alone for text-based chat, or in conjunction with the Character API's 'animate' API to build animated characters that answer questions and lead conversations using voice and gestures. The 'reply' endpoint is the same API that is used by the Chatbot module in the Character Builder service.

As with 'animate' and 'lipsync', the 'reply' API is completely stateless. Each request contains the text input to respond to, data about the conversation state, and the URL of a rulebase. The response consists of the text output and an updated version of the conversation state data.

The rulebase is simply a JSON file containing rules. It is possible to edit this file directly using a text editor, or to generate the file automatically using your own tooling. However the preferred method for creating, testing, and publishing new rulebases is to use the Minds and Models tabs within the Character Builder service.

When you publish a Mind in Character Builder, you are actually creating a rulebase file in the cloud. This file can be located anywhere, as long as it is accessible on the public internet by a url. You can pass the URL as-is to the 'reply' endpoint, or you can first copy the file to your own server if you prefer.

When a "reply" request is serviced, it will first see if it has the rulebase is in its cache. If not, it will load the file from the provided url. Despite their large size, loading a rulebase for the first time is generally less than a second. The rulebase remains cached on a Chatbot API server, further ensuring that each subsequent reply request is processed rapidly (generally within 100 ms). You will typically have several clients chatting with the same rulebase at once, however each conversation remains independent, because all of the conversation state is represented by the data object that arrives with each request and returns, slightly modified, along with the response.



Creating a rulebase

To get started, we'll use the Character Builder to create a Mind for us to use. If you do not already have a Character Builder account, you will need to create one now. The quickest way to get started is to create a new Chatbot module - this will create a Mind with default parameters.

On the Minds tab, select the Mind you would like to use and press Publish. You will find the URL for this file in the "Chatbot API URL" field near the bottom of the screen when you Edit a Mind. This URL changes each time you Publish a Mind or a Model. Rulebases are assumed to be invariant for a given URL, so if you change your rulebase, you will want to give it a new URL in order to avoid hitting a stale copy that the Chatbot API servers may have cached. When you Publish a Mind in Character Builder, the URL for the previously published Mind remains valid for a period of time, giving you a chance to switch your calling code over to the new URL.

You will need a valid subscription in order to use any of the Publish features on Character Builder. If you are not ready to create a rulebase at this time, you can use the following sample rulebase url:

http://s3-us-west-2.amazonaws.com/samples.mediasemantics.com/sample-5-6-20.json


Calling the API from curl

We can call 'reply', a POST method, using a tool such as Postman, or curl.

$ curl -s -d key=12345678 -d input="hello" -d url="http://s3-us-west-2.amazonaws.com/samples.mediasemantics.com/sample-5-6-20.json" http://mediasemantics.com/reply | json_pp
{
   "output" : "[head-nod] Hello.",
   "data" : {
      "world" : "HelloInput(i1)",
      "history" : "> hello\n[head-nod] Hello.\n"
   }
}

The API returns a JSON object. We started this conversation without a data parameter, which is equivalent to passing -d data="{}", i.e. an empty JSON object. The call returned a 'data' object containing the state of the newly started conversation. To continue the conversation, you pass this object back, unmodified, as the 'data' parameter on your next call to 'reply'.
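For example, the next request in this conversation would pass back the data object returned above (the input here is arbitrary, and the output will depend on which rules match):

$ curl -s -d key=12345678 -d input="how are you" \
       --data-urlencode 'data={"world":"HelloInput(i1)","history":"> hello\n[head-nod] Hello.\n"}' \
       -d url="http://s3-us-west-2.amazonaws.com/samples.mediasemantics.com/sample-5-6-20.json" \
       http://mediasemantics.com/reply | json_pp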



A client-based implementation

In order to maintain a conversation, you will need to store this state somewhere - one solution is to keep it at the client. To see how, here is a simple single-page HTML app that lets you converse with the bot:

<!DOCTYPE html>
<html>
<head>
</head>
<body>

<div id="div1">
</div>
<br>
<input id="input1" type="text" autofocus onkeypress="keypress()"><button onclick="submit()">Submit</button>

<script>
var data = {};
function submit() {
    // Grab the user input and clear the input box
    var input = document.getElementById('input1').value;
    document.getElementById('input1').value = "";
    
    // Add the input to the transcript
    var div = document.getElementById('div1')
    div.innerHTML += "> " + input + "<br>";
    
    // Call the reply API
    var xhr = new XMLHttpRequest();
    xhr.open('POST', "http://mediasemantics.com/reply", true);
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    xhr.onload = function () {
        var o = JSON.parse(xhr.response);
        // Add the output to the transcript
        div.innerHTML += o.output + "<br>";
        // Save the data for the next round
        data = o.data;
    };
    var key = "<your keycode here>";
    var rulebase = "<your rulebase url here>";
    xhr.send("key=" + key + "&url=" + encodeURIComponent(rulebase) + "&data=" + encodeURIComponent(JSON.stringify(data)) + "&input=" + encodeURIComponent(input));
}

function keypress(e) {
    if (e.keyCode == 13)
        submit();
}
</script>

</body>
</html>

Because this page calls a web API, you should serve it from a local web server (for example, on localhost) rather than opening it directly from the file system.

The result should look something like this:



A server-based implementation

While you can call the API directly from a client, we normally recommend calling it from a server application that you manage, so that you can handle authorization, session detection, and storage. In an animated virtual person application, this is typically the same server that acts as your caching server.

This tutorial is accompanied by a Reference Implementation that includes the HTML client above, modified to talk to a simple Node.js server built with Express. The Reference Implementation's server.js file exposes its own 'reply' endpoint, but as a GET method: it does not require an API key, and it takes a userid in place of the data object.

http://<your domain here>/reply?input=hello&userid=<your userid here>

The return contains the response:

{"output":"[head-nod] Hello."}

If your application includes a login, then you will already have a user id that you can use as a key to store the conversation state. If you do not require a login, then you can generate a random user id that can be used for the duration of the client session, to identify that user. You can even save this random userid in a session cookie, so that you can allow the user to carry on a continuous conversation even when they navigate to another page on your site. The Reference Implementation shows an example of how a userid might be generated randomly.
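For illustration, here is a minimal sketch (not necessarily the Reference Implementation's exact code) of generating a random userid on the client and keeping it in a session cookie:

// Reuse the userid from a session cookie if one exists, otherwise generate one.
var match = document.cookie.match(/(?:^|; )userid=([^;]*)/);
var userid = match ? match[1] : null;
if (!userid) {
    userid = Math.random().toString(36).substring(2, 12);  // simple random id (assumed scheme)
    document.cookie = "userid=" + userid;                   // no expiry, so it lasts for the browser session
}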

A very simple implementation can use the file system to store the data object returned from the first call, and then reload it on subsequent calls. It passes this data to the Chatbot API's 'reply' endpoint, adding the API key and the rulebase URL.

var express = require('express');
var bodyParser = require('body-parser');
var fs = require('fs');
var request = require('request');

var app = express();

app.get('/reply', function(req, res, next) {
    if (!req.query.userid) throw new Error("missing userid");
    if (!req.query.input) throw new Error("missing input");
    
    // Load the data for this user from disk, if available
    let filename = "./conversations/" + req.query.userid + '.json';
    fs.readFile(filename, "utf8", function(err, raw) {
        let data;
        if (err) 
            data = {}; // Otherwise use empty data
        else 
            data = JSON.parse(raw);
        
        // Call Chatbot API
        let params = {
            key:"<your keycode here>",
            input:req.query.input,
            data:JSON.stringify(data),
            url:"<your rulebase url here>"
        };
        request.post({url:"http://mediasemantics.com/reply", form:params}, function (err, httpResponse, body) {            
            if (err) return next(err);
            if (httpResponse.statusCode != 200) return next(new Error("chat error"))
            let ret = JSON.parse(body);
            
            // Write the data back again
            fs.writeFile(filename, JSON.stringify(ret.data), function(err) {
                if (err) return next(err);
                
                // Return the response
                res.statusCode = 200;
                if ((req.get("Origin")||"").indexOf("localhost") != -1) res.setHeader('Access-Control-Allow-Origin', req.get("Origin"));
                if ((req.get("Origin")||"").indexOf("youdomain.com") != -1) res.setHeader('Access-Control-Allow-Origin', req.get("Origin"));
                res.setHeader('content-type', 'application/json');
                res.write(JSON.stringify({output:ret.output}));
                res.end();
            });
        });
    });
});
    
app.listen(3000, function() {
  console.log('Listening on port 3000');
});

Assuming you place this in a file server.js, you can install the dependencies with "npm install express body-parser request" and run it with "node server.js". Before you start the server, ensure that you have an empty "conversations" subdirectory, with read/write permissions, in the same directory as server.js.

The POST in our client code becomes a GET:

// Call the reply API
var xhr = new XMLHttpRequest();
xhr.open('GET', "http://localhost:3000/reply?userid=" + userid + "&input=" + encodeURIComponent(input), true);
xhr.onload = function () {
    var ret = JSON.parse(xhr.response);
    div.innerHTML += ret.output + "<br>";
};
xhr.send();

This is the essence of the Reference Implementation, but we will add a few refinements in the following sections.

To see it in action, you can try a simple conversation that demonstrates that state is preserved:

> My name is Joe.
Ok Joe.
(refresh page)
> Who am I?
I know you as Joe.

Clearly, the simple file-based mechanism in the Reference Implementation does not scale to multiple instances; a better solution would be to use a database such as AWS DynamoDB. Storing the conversation state data object on the server has several advantages. In a system with a login, it allows the user to continue a conversation across multiple sessions on multiple devices. More importantly, it allows for the accumulation of longer-term knowledge about the user, such as their name, their interests, etc. While you can treat the data object as an opaque bundle that simply needs to be stored and passed back, more advanced uses include mining the predicates in the 'world' field for valuable information and storing it in a form that is more digestible for your application.

Note that the 'history' field contains only a partial history of the conversation for use by the engine, so you might consider maintaining a complete transcript on your server by logging the actual inputs and outputs. These logs can be analyzed to guide further rule development. There are obviously significant privacy implications associated with both logs and conversation state, and a data-destruction policy will need to be factored into any complete solution.
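On the storage question, here is a hedged sketch of what swapping the file-based load and save for DynamoDB might look like, using the AWS SDK for JavaScript (v2). The 'Conversations' table name and its 'userid' key are assumptions, not part of the Reference Implementation:

var AWS = require('aws-sdk');
var db = new AWS.DynamoDB.DocumentClient();

// Load the conversation state for a user, falling back to an empty object.
function loadData(userid, callback) {
    db.get({TableName: "Conversations", Key: {userid: userid}}, function(err, result) {
        if (err) return callback(err);
        callback(null, (result.Item && result.Item.data) || {});
    });
}

// Save the (slightly modified) conversation state returned by the 'reply' call.
function saveData(userid, data, callback) {
    db.put({TableName: "Conversations", Item: {userid: userid, data: data}}, callback);
}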



Control inputs and tags

It can be helpful to provide the chat system with other kinds of "non-verbal" inputs that make it aware of the context in which the user is operating. For example, you might send a control input when the user navigates to a link, or scrolls to a different point in the page:

> [navigate products]
> [scroll order]

Control inputs are always enclosed in square brackets. You can write rules that react specifically to control inputs. Note that responses that are meant to be delivered by a character can also have non-verbal elements, such as glancing in a certain direction, or saying something in a certain manner. These commands take the form of output tags, also in square brackets, that are meant for the animation and/or speech systems.
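For example, a client might wrap the GET request from the server-based implementation in a small helper and use it to send a control input whenever the user navigates. The sendInput() helper and the 'productsLink' element are hypothetical; the helper simply submits the given string as the 'input' parameter of our server's 'reply' endpoint:

// Hypothetical helper that sends any input string (verbal or control) to the
// server-based /reply endpoint and appends any output to the transcript.
function sendInput(input) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', "http://localhost:3000/reply?userid=" + userid +
                    "&input=" + encodeURIComponent(input), true);
    xhr.onload = function () {
        var ret = JSON.parse(xhr.response);
        if (ret.output) document.getElementById('div1').innerHTML += ret.output + "<br>";
    };
    xhr.send();
}

// Send a control input when the user follows a link to the products page.
document.getElementById('productsLink').onclick = function() {
    sendInput("[navigate products]");
};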



Auto and Idle

By convention, a client can provide the [auto] control input when a bot first appears to the user, to allow it to "autostart" with an opening statement.

> [auto]
Welcome to ACME corp! How can I help?
> looking for rocketsleds
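For example, the client could send [auto] as soon as the page has loaded, using the hypothetical sendInput() helper sketched in the previous section:

window.addEventListener('load', function() {
    sendInput("[auto]");  // let the bot open the conversation
});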

The Reference Implementation lets a chatbot take the initiative after an "idle" period. A rulebase can contain one or more "Idle" rules for this purpose. Idles normally work by having the client poll the server periodically to see if an idle output is available. Polling is done using the [idle n] control input, where n is the number of seconds since the last input/output transaction. An empty response is returned if there is no idle. For example, a sequence might look like this:

> hello
Hi!
> [idle 1]
> [idle 2]
> [idle 3]
How can I help?

Polling is inefficient, and the Chatbot API allows us to do better. At the point where the "Hi!" response is returned, the state object contains an 'idle' value that represents the n value for the next idle input that will produce an output. So instead of asking for [idle 1] and [idle 2], the client can simply wait 3 seconds and then ask directly for [idle 3]. This mechanism is implemented in the Reference Implementation.
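Here is a minimal sketch of one way a client could take advantage of this, assuming your own server passes the 'idle' value from the returned data object back to the client as an 'idle' field in its response (that field is an assumption, not part of the interface described above):

// After each reply, schedule a single [idle n] request at the time suggested by
// the engine, instead of polling every second. sendInput() is the hypothetical
// helper from the earlier section on control inputs.
function scheduleIdle(ret) {
    if (ret.idle) {
        setTimeout(function() {
            sendInput("[idle " + ret.idle + "]");  // may return an empty output if the state has changed
        }, ret.idle * 1000);
    }
}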



Detecting chat sessions

It is important for the bot to know whether this is the first time the user has seen it. By storing the data in a permanent location, as we have, a conversation never really ends: the user can return with the same userid many days later and the bot will respond as if nothing had happened. While this is generally desirable, the very act of coming back to a conversation after a period of absence, say 5 minutes, is itself an important piece of context.

By convention, the control command [start] indicates a new user, and [restart n] indicates an absence, where n is the number of minutes since the last communication. The Reference Implementation automatically inserts a [start] or a [restart n] control command just prior to the next input from the client. To do this, it checks for the presence, and the last-modified time, of the data file, and uses this information to decide whether to insert a control command (a sketch follows the example below).

...
> thank you.
You are welcome!
(2 days elapse)
> [restart 2880]
> [auto]
Welcome back! How can I help?
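Here is a minimal sketch of how the server might make this decision, under the same file-based storage assumptions as the earlier example; the 5-minute threshold is an assumption:

// Decide whether to prepend a [start] or [restart n] control input, based on
// the presence and age of the user's conversation file.
fs.stat(filename, function(err, stats) {
    var control = null;
    if (err) {
        control = "[start]";  // no data file yet, so this is a new user
    } else {
        var minutes = Math.round((Date.now() - stats.mtimeMs) / 60000);
        if (minutes >= 5) control = "[restart " + minutes + "]";  // assumed absence threshold
    }
    // If control is set, send it to the 'reply' endpoint first, then send the
    // user's actual input as usual.
});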


Calling external APIs

The Chatbot API's 'reply' endpoint cannot perform any external queries or initiate any API calls within your backend. However, you can use tags and control inputs to coordinate these activities from your own server.

Your rules can include tags that are intended for your own 'reply' API on the return path. The contents of these tags can be in any format you like, as long as they are enclosed in square brackets. For example, you might respond to a rule with an output such as "The price is [select price where sku='12345']". To make this work, your own 'reply' endpoint scans the output for tags that it recognizes before returning it to the client. If a tag is found, you can delay returning the output until the tag has been replaced with the appropriate result, for example after performing the required database call or web API request.
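As a sketch of what this might look like in the server code above, suppose the rulebase emits the hypothetical [select price where sku='...'] tag; lookupPrice() stands in for your own database call:

// Scan the engine's output for a recognized tag and resolve it before responding.
var match = ret.output.match(/\[select price where sku='(\w+)'\]/);
if (match) {
    lookupPrice(match[1], function(err, price) {  // hypothetical database lookup
        if (err) return next(err);
        res.write(JSON.stringify({output: ret.output.replace(match[0], "$" + price)}));
        res.end();
    });
} else {
    res.write(JSON.stringify({output: ret.output}));
    res.end();
}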

In some cases you will use a tag to initiate a transaction, for example "[transfer 100 from 12345 to 54321]". Rather than return the output directly to the user, you might turn around and send the result back to the chat engine as a control input, e.g. "[confirmation 123]". You can then use a rule to respond with an appropriate user message, while also altering the state of the conversation with important new context. A sequence diagram is helpful for visualizing this kind of exchange.

This type of request "ping-pong" between different services is common in modern web systems. The Chatbot API's stateless design allows your dialog subsystem to go "serverless", and evolve independently from the rest of your system, provided that any tags and control inputs have been well-defined.



Minds and Models

This tutorial focuses largely on the integration aspects of the Chatbot API, but it is helpful to understand a bit more about the capabilities of the underlying chat engine.

The Chatbot API is a modern Natural Language system that is based on a symbolic approach to Natural Language Processing (NLP). It combines many different capabilities, not all of which will be relevant to every project. Some key features include:

A Mind, or rulebase, is a collection of largely independent units called Models. You might have a Model for introductions, a Model for small talk about sports, a Model for simplifying wordy sentences, etc. Minds can also employ Stock Models that provide a significant jump-start to chatbot development.

A Model is a collection of rules that transform text that the User types (inputs) into text that the Bot emits (outputs). A single transformation from input to output might use multiple rules, and those rules might span multiple Models. It is helpful to view rules as very simple Agents, each with a specific purpose. Multiple rules can cooperate to form a "bridge" from the input to the output. However, only one complete bridge wins out to form a response, and you can think of rules as competing for the chance to contribute to that output. Models are most powerful when they are self-contained and relatively independent. As with biological systems, the Mind is an emergent property of the cooperation and competition between rules.

There are several different types of Rules. Some rules serve to "restate" the input into a simpler, more canonical form. Other rules are used to "respond" to a specific input with a specific output, in context. Some rules serve only to "generate" the output in a manner that might vary, or take into account repetition-avoidance, or even mood. Rules can operate on Control and Text inputs (syntax), but also on intermediate results, often in the form of Predicates. Rules can be influenced by, and can contribute to, World Knowledge. World Knowledge is stored and accessed using Predicates and Predicate Unification, in a manner similar to Prolog.

The right output often depends on the mental "state" of the Bot. Models can use a State Machine formalism to model dialogs and how they evolve. This allows a Bot to lead a dialog, and not simply respond, tit-for-tat, to each input. Models contain States, and States can be grouped into Groups. One State is normally active at a time, and only the rules in the active State, or in the Groups that contain the active State, are enabled. At the outermost level, each Model can specify rules that are always active. A very general Model focused on sentence simplification might contain no States, but contribute general transformation rules that apply anywhere and are always active. Other Models may have just a few "trigger" rules that remain active, lurking, ready to place the system into a particular State defined by that Model. For example, a Sports "chit-chat" Model may be triggered by any input that includes certain key words, such as a team name. Other applications might include much more scripted dialog, leading to elaborate state diagrams with sequences and branches that reflect the possible paths through the dialog.



Wrapping Up

The Chatbot API and Reference Implementation provide you with a modern framework for building chat-based applications. They can be used alone in purely text-based systems, or in conjunction with the Character API to create an embodied conversational agent.

We look forward to seeing what you build with the Chatbot API. As you embark on this journey, please keep our artistic and solution development services in mind. With over a decade of experience building solutions in this area, we look forward to being your infrastructure and solution partners.

Please send questions and comments to sales@mediasemantics.com.








Copyright © 2020 Media Semantics, Inc. All rights reserved.