Lack of labeled training data for chatbots – our solution for the problem

One of the toughest problems in building bots is the scarcity of available training data. Like all machine learning systems, chatbots and virtual customer assistants work best when they have been trained on good-quality data. In the beginning, every bot has to be taught what things mean. In the context of understanding what customers want when they contact customer service, this means teaching the bot which topics (i.e. intents) certain questions refer to.

As a simple example, a bank's customers might write “I have not received my salary” or “Where is my transfer”. These look like two different types of questions, but they refer to the same intent, i.e. how long it takes for a transfer to get from bank A to bank B. To understand those sentences, a chatbot needs to learn from labeled data: a human has to label a set of questions with the correct intent (“How long does it take …”) and feed them to the bot. From those examples the bot can learn to generalize across differently worded questions and match a new question with the correct intent.
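To make the idea of labeled data concrete, here is a minimal Python sketch of question-to-intent matching. It is only an illustration, not our production system: the intent names (transfer_time, card_lost), the two card-related questions and the function names are made up for this example, and the matching uses plain word overlap (cosine similarity over word counts) rather than a learned model.

```python
import math
import re
from collections import Counter

# Hypothetical labeled training data: (question, intent) pairs.
# The first two questions come from the example above; the intent names
# and the card-related questions are assumptions for illustration.
LABELED = [
    ("I have not received my salary", "transfer_time"),
    ("Where is my transfer", "transfer_time"),
    ("I lost my card", "card_lost"),
    ("My card was stolen", "card_lost"),
]


def bag_of_words(text):
    """Lowercase the text and count how often each word occurs."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_intent(question, labeled=LABELED):
    """Return the intent of the most similar labeled question."""
    query = bag_of_words(question)
    _, intent = max(labeled, key=lambda pair: cosine(query, bag_of_words(pair[0])))
    return intent


print(predict_intent("Has my salary transfer arrived"))  # transfer_time
print(predict_intent("Someone has stolen my card"))      # card_lost
```

A matcher like this only works when the new question shares words with a labeled one. A question such as “The money has still not arrived” shares no words with either transfer example, so it would not be matched reliably.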

The challenge is that there are countless ways to express the same intent. Language is highly nuanced and personal. To build a chatbot with high accuracy, one would typically need tens of thousands of labeled questions to train it. Collecting and labeling them is time-consuming and expensive.

To overcome the lack of labeled training data, we have created deep-learning-based networks that can reach satisfactory question-to-intent matching accuracy with only a few sentences of training data per intent. By reducing the need for labeled training data, this makes it possible to deploy accurate chatbots more efficiently. The video above shows how it works. If you are interested or have questions about the solution, reach out to hello [ at ] alphablues [ dot ] com.
