Chatbot API Tutorial

The Chatbot API is a cloud-based chatbot engine. It can be used alone for text-based chat, or in conjunction with the Character API's 'animate' endpoint, to build embodied agents - animated characters that answer questions and lead conversations using voice, emotion, and gestures.

This tutorial accompanies the Chatbot Reference Implementation available at https://github.com/mediasemantics/chatapi. The Reference Implementation includes a React-based chat window and provides you with a working chatbot that you can add to any website.

The Chatbot API is completely stateless. Each request contains the text input to respond to, data about the conversation state, and a 'mind id' that specifies the rule database to be used. The response consists of the text output and an updated version of the conversation state data.

In the Chatbot API, a Mind is a collection of Models, and Models are collections of Facts and Rules. Facts and Rules are authored in the Models tab within the Chatbot API's dashboard. You will typically have several clients chatting with the same Mind at once; however, each conversation remains independent. This is because all of the conversation state is represented by a state object that arrives with each request and returns, slightly modified, with the response.



Creating a Mind

To get started, you can use the Chatbot API's Dashboard to create a Mind. Go to the Minds tab and click Add, adjust the name, then Add again.

Next, we need to create a model. Go to the Models tab and click Add. Then select it and click Edit.

Create a first rule by clicking Add Rule. Fill in the Rule Details section with an Input Pattern and an Output:

You can test your rule immediately in the Model Tester section at the bottom:

Go back to the Models tab, and click Publish.

Now go back to the Minds tab, and Edit the Mind. Click Use. Select your new model from the list and click Select. Your mind is now using your new model.

Click Home, select your mind, and click Publish.

Note that the Model Tester always takes into account your latest Mind and Model changes, but those changes will not be seen by callers of 'reply' until you Publish them. If you change the models that make up a Mind, you will also need to republish the Mind. It is generally safe to update Minds and Models while users are chatting.

Click Info to see some important information about your mind, namely its mind id. You will use this information in the next step.



Calling the API from curl

We can call 'reply' using a tool such as Postman, or curl.

$ curl -s -d key=xxxxxxxxxxxxxxxxxxxxxxxxx -d input="hello" -d mindid="xxxxxxx" http://api.mediasemantics.com/reply | json_pp

You will need to replace xxxxxxxxxxxxxxxxxxxxxxxxx with an API Key that you obtain from the Chatbot API's Dashboard. You will also need to replace the mind id xxxxxxx with the mind id for your Mind. Here is the output:

{
   "data" : {
      "history" : "> hello\nhi!\n"
   },
   "output" : "hi!"
}

The API returns a JSON object. We started the conversation without a data parameter, which is equivalent to using -d data="{}", i.e. an empty JSON object. The call returns a data object containing the state of the newly started conversation. To continue this conversation, you simply pass the same object back on your next call to 'reply'.
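
For example, to continue this conversation you might pass the data object from the first response back in on the next call (the input here is only an example; the response depends on your rules):

$ curl -s -d key=xxxxxxxxxxxxxxxxxxxxxxxxx -d input="how are you" -d mindid="xxxxxxx" --data-urlencode data='{"history":"> hello\nhi!\n"}' http://api.mediasemantics.com/reply | json_pp

The response will again contain an output string and an updated data object to pass along with the following request.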



A Client-Based Implementation

In order to maintain a conversation, you will need to store this state somewhere - one option is to store it at the client. Here is a simple single-page HTML app that lets you converse with the bot:

<!DOCTYPE html>
<html>
<head>
</head>
<body>

<div id="div1">
</div>
<br>
<input id="input1" type="text" autofocus onkeypress="keypress(event)"><button onclick="submit()">Submit</button>

<script>
var data = {};
function submit() {
    // Grab the user input and clear the input box
    var input = document.getElementById('input1').value;
    document.getElementById('input1').value = "";
    
    // Add the input to the transcript
    var div = document.getElementById('div1');
    div.innerHTML += "> " + input + "<br>";
    
    // Call the reply API
    var xhr = new XMLHttpRequest();
    xhr.open('POST', "http://api.mediasemantics.com/reply", true);
    xhr.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
    xhr.onload = function () {
        var o = JSON.parse(xhr.response);
        // Add the output to the transcript
        if (o.output) div.innerHTML += o.output + "<br>";
        // Save the data for the next round
        data = o.data;
    };
    var key = "<your keycode here>";
    var mindid = "<your mind id here>";
    xhr.send("key=" + key + "&mindid=" + mindid + "&data=" + encodeURIComponent(JSON.stringify(data)) + "&input=" + encodeURIComponent(input));
}

function keypress(e) {
    if (e.keyCode == 13)
        submit();
}
</script>

</body>
</html>

Because this page calls a web API, you should serve it from a localhost web server rather than opening it directly as a file.

The result should look something like this:



A Server-Based Implementation

While you can call the API directly from a client, we normally recommend that you call it from a server application that you manage. After all, your API key needs to remain a secret to prevent others from copying your page and gaining access to your bot. In addition, you will want to store the conversation state data object in a permanent fashion, to avoid losing the conversation when the page is closed. You will want your own server to provide authorization, session detection, and storage. In an embodied agent application, this is often the same server as the caching server.

The Reference Implementation includes both a "plain html" and a React-based client. Both are based on the above code, but modified to talk to a simple Node.js server. The Reference server.js file includes its own 'reply' endpoint, but it is now a GET method. It does not require an API key, and it replaces the data object with a userid. On each call, the userid is used to look up the previous data object. On each return, the data object is saved away again. We have essentially replaced a stateless POST API with a stateful GET API that lets you manage the conversation state on your own terms.

http://localhost:3000/reply?input=hello&userid=5132543

The return contains the same response, but without the conversation state:

{"output":"hi!"}

If your application includes a login, then you will already have a user id that you can use as a key to store the conversation state. If you do not require a login, then you can generate a random user id that can be used for the duration of the client session, to identify that user. You can even save this random userid in a session cookie, so that you can allow the user to carry on a continuous conversation even when they navigate to another page on your site. The Reference Implementation shows an example of how a userid can be generated randomly.
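
As a rough sketch (the Reference Implementation's exact approach may differ), the client could reuse a userid stored in a session cookie, or generate a new one:

// Reuse an existing userid from a session cookie, or generate a new random one (illustrative)
var match = document.cookie.match(/(?:^|; )userid=(\d+)/);
var userid;
if (match) {
    userid = match[1];
} else {
    userid = Math.floor(Math.random() * 1000000000).toString();
    document.cookie = "userid=" + userid + "; path=/";  // no expiry, so it lasts for the browser session
}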

A very simple implementation can use the file system to store the data object returned for the first call, and then reload it again on subsequent calls.

var express = require('express');
var fs = require('fs');
var request = require('request');

var app = express();

app.get('/reply', function(req, res, next) {
    if (!req.query.userid) throw new Error("missing userid");
    if (!req.query.input) throw new Error("missing input");
    
    // Load the data for this user from disk, if available
    let filename = "./conversations/" + req.query.userid + '.json';
    fs.readFile(filename, "utf8", function(err, raw) {
        let data;
        if (err) 
            data = {}; // Otherwise use empty data
        else 
            data = JSON.parse(raw);
        
        // Call Chatbot API
        let params = {
            key: "<your keycode here>",
            mindid: "<your mind id here>",
            input: req.query.input,
            data: JSON.stringify(data)
        };
        request.post({url:"http://api.mediasemantics.com/reply", form:params}, function (err, httpResponse, body) {            
            if (err) return next(err);
            if (httpResponse.statusCode != 200) return next(new Error("chat error"))
            let ret = JSON.parse(body);
            
            // Write the data back again
            fs.writeFile(filename, JSON.stringify(ret.data), function(err) {
                if (err) return next(err);
                
                // Return the response
                res.statusCode = 200;
                if ((req.get("Origin")||"").indexOf("localhost") != -1) res.setHeader('Access-Control-Allow-Origin', req.get("Origin"));
                if ((req.get("Origin")||"").indexOf("yourdomain.com") != -1) res.setHeader('Access-Control-Allow-Origin', req.get("Origin"));
                res.setHeader('content-type', 'application/json');
                res.write(JSON.stringify({output:ret.output}));
                res.end();
            });
        });
    });
});
    
app.listen(3000, function() {
  console.log('Listening on port 3000');
});

Assuming you place this in a file server.js, you can run it with "node server.js". You will first need to install the 'express' and 'request' packages (npm install express request). Before you start the server, you should also ensure that you have an empty subdirectory "conversations", with read/write permissions, in the server.js directory.
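
Alternatively, you could have the server create the directory on startup. A one-line sketch:

// Create the conversations directory if it does not already exist
if (!fs.existsSync('./conversations')) fs.mkdirSync('./conversations');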

The POST in our client code becomes a GET:

// Call the reply API
var xhr = new XMLHttpRequest();
xhr.open('GET', "http://localhost:3000/reply?userid=" + userid + "&input=" + encodeURIComponent(input), true);
xhr.onload = function () {
    var ret = JSON.parse(xhr.response);
    div.innerHTML += ret.output + "<br>";
};
xhr.send();

Clearly, the simple file-based mechanism in the Reference Implementation does not scale to multiple instances, and a better solution would be to use a database such as AWS DynamoDB.
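
For example, here is a minimal sketch of the same load/save pattern using the AWS SDK's DynamoDB DocumentClient, assuming a hypothetical "conversations" table keyed on "userid" (the table and attribute names are illustrative):

var AWS = require('aws-sdk');
var dynamo = new AWS.DynamoDB.DocumentClient();

// Load the conversation data for this user, or an empty object if the user is new
function loadData(userid, callback) {
    dynamo.get({TableName: "conversations", Key: {userid: userid}}, function(err, result) {
        if (err) return callback(err);
        callback(null, (result.Item && result.Item.data) || {});
    });
}

// Save the updated conversation data returned by 'reply'
function saveData(userid, data, callback) {
    dynamo.put({TableName: "conversations", Item: {userid: userid, data: data}}, callback);
}

These functions would take the place of the fs.readFile and fs.writeFile calls in the handler above.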

While you can treat the data object as an opaque bundle that simply needs to be stored and passed back, more advanced uses include mining the predicates in the 'world' field for valuable information and storing it in a manner that is more digestible for your application. Note that the 'history' field contains only a partial history of the conversation for use by the engine, and you might consider maintaining a complete transcript on your server by logging the actual inputs and outputs. These logs can be analyzed to guide further rule development. There are significant privacy implications associated with both logs and conversation state, and a privacy and data-destruction policy will need to be factored into any complete solution.
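
As an illustration, a simple way to keep such a transcript is to append each exchange to a per-user log file in server.js (the ./logs directory and file naming are just an example):

// Append one exchange to a per-user transcript file (assumes a writable ./logs subdirectory)
function logExchange(userid, input, output) {
    fs.appendFile("./logs/" + userid + ".log", "> " + input + "\n" + output + "\n", function(err) {
        if (err) console.error("transcript logging failed", err);
    });
}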

This is the essence of the Reference Implementation, but we will build a few refinements in the following sections.



Control Inputs and Tags

It can be helpful to provide the chat system with other, "non-verbal" inputs that make it aware of the context in which the user is operating. For example, you might send a control input when the user navigates to a link, or even when they scroll to a different point in the page.

> [nav products]
> [scroll order]

Control inputs are always shown in square brackets. You can write rules that react specifically to control inputs. Not all inputs require an output - in many cases a bot will just respond to a control input by adjusting some state in the conversation data object. In this case the reply will be an empty string.
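
For example, a client could send a control input through the same local 'reply' endpoint it uses for regular text. This sketch reuses the userid and transcript div from the client code above, and the bracketed ids are just examples:

// Send a non-verbal control input, e.g. when the user navigates (illustrative)
function sendControlInput(controlText) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', "http://localhost:3000/reply?userid=" + userid + "&input=" + encodeURIComponent(controlText), true);
    xhr.onload = function () {
        var ret = JSON.parse(xhr.response);
        // An empty output means the bot simply adjusted its conversation state
        if (ret.output) div.innerHTML += ret.output + "<br>";
    };
    xhr.send();
}

// e.g. sendControlInput("[nav products]");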

Note that outputs that are meant to be delivered by a character can also have non-verbal elements, such as to glance in a certain direction, or to say something in a certain manner. These commands take the form of output tags, also in square brackets, that are meant for the animation and/or the speech system. This is described in more detail in the High-level actions tutorial.



Auto-Start and Chat Sessions

By storing the data in a permanent location, as we have, a conversation never really ends. The user can return with the same userid many days later and the bot would respond as if nothing had happened. While this is good, the very act of coming back to a conversation after a period of absence, say 5 minutes, is itself an important piece of context, and deserves a control input.

The Reference Implementation client sends an [autostart] command when the chat UI first appears. The Reference Implementation server then translates this command into one of three different commands, depending on whether the user is known and how long a gap there has been in the conversation.

In some systems the same chatbot UI is placed on multiple pages, and the UI is instantiated with a distinct page id for the page the UI is located on. This page id is typically a programmatic id that has meaning in the chat rules, e.g. "LandingPage". This id is often sent as an additional argument to these three commands, i.e. [new page], [return n page], [nav page].

The purpose of these three commands is to assist in creating an appropriate prompt, be it a greeting for a new user, a return greeting, or perhaps an acknowledgement that the user has navigated to a new page on the same site. Of course whether an actual prompt is sent to the user is up to the chat rules. In an embodied agent situation, remember that the output can be non-verbal - it can be as simple as a short nod, or a glance at the page.

...
> thank you.
You are welcome!
(2 days elapse)
> [return 2880]
Welcome back! How can I help?
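
As a sketch of how a server might perform this translation, assuming it records a lastContact timestamp alongside each user's conversation data (the Reference Implementation's exact logic and thresholds may differ):

// Translate [autostart] into [new page], [return n page], or [nav page] (illustrative)
function translateAutostart(input, page, lastContact) {
    if (input != "[autostart]") return input;
    if (!lastContact) return "[new " + page + "]";                     // first contact with this user
    var minutes = Math.floor((Date.now() - lastContact) / 60000);
    if (minutes >= 5) return "[return " + minutes + " " + page + "]";  // returning after a gap
    return "[nav " + page + "]";                                       // continuing on a new page
}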


Idle

The Reference Implementation also lets a chatbot take the initiative after an "idle" period. A rulebase can contain one or more "Idle" rules for this purpose. To understand how idles work, imagine having the client poll the server periodically to see if an idle is available. Polling is done using the [idle n] control input, where n represents a count of seconds since the last input/output transaction. An empty response is returned if there is no idle. For example, a sequence might look like this:

> hello
Hi!
> [idle 1]
> [idle 2]
> [idle 3]
How can I help?

Polling is inefficient, and the Chatbot API allows us to do better. At the point where the "Hi!" response is returned, the state object contains an 'idle' value that represents the n value for the next idle input that will result in an actual output. So instead of asking for [idle 1] and [idle 2], the client can wait for 3 seconds and then ask directly for [idle 3]. The resulting output will contain a new idle value that then determines when the next check-in will occur.

> hello
Hi!
> [idle 3]
How can I help?
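
One way to use this on the client is to schedule a single timer for the next useful idle input. As a sketch, assuming your own server passes the 'idle' value from the state object through to the client alongside the output, and reusing the sendControlInput helper sketched earlier:

var idleTimer = null;

// Schedule the next [idle n] input when the server indicates one will produce output (illustrative)
function scheduleIdle(seconds) {
    if (idleTimer) clearTimeout(idleTimer);
    if (!seconds) return;
    idleTimer = setTimeout(function() {
        sendControlInput("[idle " + seconds + "]");
    }, seconds * 1000);
}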


Calling External APIs

With its stateless design, the Chatbot API's 'reply' endpoint cannot itself perform any external queries or initiate any transactions - it is strictly a translator. However you can use tags and control inputs to coordinate these activities from your own server.

Your rules can include tags that are intended for your own server, on the return path. The contents of these tags can be in any format you like, as long as they are enclosed in square brackets. For example, you might use a rule to respond to "what is the price" with an output such as "The price is [select price where sku='12345']". In this case your server needs to do more than just blindly pass everything through. When it receives the output from the Chatbot API, it first scans it for tags that it recognizes. If a tag is found, the response is delayed until the tag is replaced with the appropriate results, such as after performing the required database query or web API call.
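
Here is a sketch of the kind of scan-and-replace the server might perform, with a hypothetical lookupPrice function standing in for the database or web API call:

// Scan the Chatbot API output for a recognized tag and replace it before responding (illustrative)
function resolveTags(output, callback) {
    var match = output.match(/\[select price where sku='(\d+)'\]/);
    if (!match) return callback(null, output);       // no recognized tags - pass the output through
    lookupPrice(match[1], function(err, price) {     // hypothetical database or web API lookup
        if (err) return callback(err);
        callback(null, output.replace(match[0], price));
    });
}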

In some cases a single user input can result in multiple Chatbot API calls. For example "I'd like to transfer 100 dollars" might result in a structured output tag with all the context your server needs to dispatch the transaction, such as "[transfer,100,12345,54321]". The result of the transaction might be a confirmation number. Rather than return this confirmation directly to the user, you could send a control input back to the Chatbot API, such as "[confirmation 123]", along with the latest state data object. There you can then use a rule to respond with an appropriate user message, while also altering the state data object with this important new context. A sequence diagram is helpful for visualizing this:

This type of request ping-pong between different services is common in modern web systems. The Chatbot API's stateless design allows your dialog subsystem to go "serverless", and evolve independently from the rest of your system.



Minds and Models

This tutorial focuses largely on the integration aspects of the Chatbot API, but it is helpful to understand a bit more about the capabilities of the underlying chat engine.

The Chatbot API is a modern Natural Language system that is based on a symbolic approach to Natural Language Processing (NLP). It combines many different capabilities, not all of which will be relevant to every project. Some key features include:

A Mind, or rulebase, is a collection of largely independent units called Models. You might have a Model for introductions, a Model for small talk about sports, a model for simplifying wordy sentences, etc. Minds can also employ Stock Models that provide a significant jump-start to chatbot development.

A Model is a collection of rules that transform text that the User types (inputs) into text that the Bot emits (outputs). A single transformation from input to output might use multiple rules, and those rules might span multiple Models. It is helpful to view rules as very simple Agents, each with a specific purpose. Multiple rules can cooperate to form a "bridge" from the input to the output, but only one complete bridge can win out to form the response, so you can also think of rules as competing for the chance to contribute to that output. Models are most powerful when they are self-contained and relatively independent. The Mind is an emergent property of the cooperation and competition between rules.

There are several different types of Rules. Some rules serve to "restate" the input into a simpler, more canonical form. Other rules are used to "respond" to a specific input with a specific output, in context. Some rules serve only to "generate" the output in a manner that might vary, or take into account repetition-avoidance, or even mood. Rules can operate on Control and Text inputs (syntax), but also on intermediate results, often in the form of Predicates. Rules can be influenced by, and can contribute to, World Knowledge. World Knowledge is stored and accessed using Predicates and Predicate Unification, in a manner similar to Prolog.



Wrapping Up

The Chatbot API and Reference Implementation provide you with a modern framework for building chat-based applications. They can be used alone in purely text-based systems, or in conjunction with the Character API to create embodied conversational agents.

This tutorial has focused on how to connect a Chatbot API mind to your application. The next tutorial in this series will focus on writing rules for Chatbot Models.








Copyright © 2024 Media Semantics, Inc. All rights reserved.