Look Who’s Talking: An Interview with PolyAI

Alex McKay - Soldo

This interview is part of prepaid cards for business provider Soldo’s Digital Disruptors series, which highlights the individuals changing the world with technology.

An interview with Stefan van der Fluit

Stefan is a three-time startup founder and Business Development Manager at PolyAI. PolyAI has raised $12M to develop a machine learning platform for conversational artificial intelligence. In the past two years, the team has designed a conversational model for customer service environments that combines the best of a company’s existing resources with AI-powered interactions.

What is AI to you?  Definitions seem to vary hugely – and most people think it’s either sheer magic or a bunch of very boring algorithms…

Absolutely – and the AI sector is also full of people who have set unrealistic sci-fi style expectations of what is, or will be, possible! So let’s be clear and realistic.

AI to us is a combination of natural language understanding and machine learning. We’re a machine learning company first, so nothing that we do is pre-programmed or hacked together. The logic is that anything that hasn’t had previous exposure to a situation won’t know how to handle it. So, as you would with a child, we first expose a machine to situations it might come across in the future, we train it on those situations, and it is then equipped to respond. The better the response in any situation, the better we can consider the AI to have functioned.

But isn’t this just an unbelievably sophisticated decision tree?

No – it’s categorically something very different. PolyAI’s Encoder Model has been pre-trained on over a billion conversations, so out of the box it’s pretty capable of understanding how people ask certain things in different ways and of holding a natural, human conversation.

From its universe of data, the Encoder can identify that a customer is most likely talking about booking a reservation, for example. Once we understand the intent, we can look in the database for useful answers. But the database is not a decision tree. It’s more like a landscape filled with information which has been categorised by the Encoder Model based on relevance and similarity. Decision trees are procedural, and very black and white. The landscape model is both flexible and subject to change as relevance changes over time, or as we learn from each interaction.

So in the example of a table booking, we go into the corner of the landscape where table reservation intent lies. We re-rank possible answers to the query in that landscape in real time. Essentially, we take candidate answers, rank them against each other by their perceived relevance to the context of the customer’s question, and then send the most relevant answer back to the customer. This is done at every single turn of a conversation. So there’s no “If this, then that”: we treat each point of a conversation as a discrete entity.
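The per-turn ranking described above can be illustrated with a minimal sketch. This is not PolyAI's implementation: the real system uses a learned Encoder Model, whereas the `embed` function here is a toy bag-of-words stand-in, and the names `embed`, `cosine`, `rank_responses` and the `CANDIDATES` list are all hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a learned encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_responses(context: str, candidates: list[str]) -> list[tuple[float, str]]:
    """Score every candidate against the context, most relevant first."""
    context_vec = embed(context)
    scored = [(cosine(context_vec, embed(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical pool of existing answers (e.g. drawn from call logs or forums).
CANDIDATES = [
    "Sure, I can book a table for you. What time would you like?",
    "We are open from noon until eleven every day.",
    "Your order has been cancelled.",
]

# Each turn is ranked on its own; there is no if/then branching between turns.
for turn in ["I would like to book a table for two", "When are you open on Sunday?"]:
    best_score, best_reply = rank_responses(turn, CANDIDATES)[0]
    print(f"{turn!r} -> {best_reply!r}")
```

The point of the sketch is the shape of the loop: every turn scores the whole candidate pool afresh and returns the top-ranked existing answer, rather than following a fixed branch from the previous turn.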

Deciding what to say back to a customer is only half the story. How do you make that response human, too?

Actually, we don’t generate responses.  We can’t create text out of thin air, and even if we could, it wouldn’t be particularly brand-aligned. Instead, for every client, we look at their resources: call logs, email logs and especially the open forums where customers often help themselves.

We can take questions and answers from forums and call logs and make them available to the re-ranker. Not only does this give us responses to extremely nuanced questions, it also means we can speak the customer’s language – if that’s what a client wants.

If a business wants to present itself very formally, we will probably limit the selection of conversations sent to the Encoder Model. But if the business wants to appear more down-to-earth, we can broaden the conversational interpretation to responses from a customer service team or from embedded responses on a forum. It gives us an axis of formality/informality, and it also means we have a constant cultural reference to remain relevant as audiences and languages change.

By the way, being human doesn’t mean deceiving the customer. We don’t pretend that our AI is human. At the beginning of a conversation, our bot will say “Hi, I’m Travelodge’s Virtual Agent”, for example. Many of our clients give their customers the choice to speak to a Virtual Agent or a real one.

Interestingly, we need to train humans to be humans, too. You can interrupt our AI; you can ask it questions tangential to the original purpose of the call; but we did some user testing last week and people don’t do that. Why? Because, behaviourally, they’ve been using Google Assistant, Siri, and the other current crop of really limited chatbots which can only do what you directly ask them to do. We are actually going to have to re-educate people to be more human themselves!

What’s the metric of success for PolyAI?

Right now, it’s accuracy: accuracy in understanding intent and finding the appropriate response, which naturally ties into the percentage success rate of end-to-end automated calls and ultimately satisfaction and NPS. That’s actually our business model: we don’t charge customers for access to our technology; we charge them per successful transaction, which aligns our interests with our clients’. 

Is PolyAI about cutting costs and staff, more effective triaging of service calls, dealing with more customers…?

It’s a multitude of things. We are trying to impact different parts of the value stream. 

Let’s start with the agent.  It’s common knowledge that Customer Service Agents aren’t the happiest employees.  Call centres have pretty low retention rates and pretty short tenures within a business. There’s often not much room for progression. If we’re able to remove the mundane, transactional tickets and have CSAs focus on the heartfelt, empathetic, challenging questions, then everyone has something to gain. The agent has higher job satisfaction, the company has lower turnover of staff and a lower staff base overall. But those staff are all highly skilled, higher paid individuals who have greater decision making power themselves. 

This matters because we now live in a world of what I would call an “Amazon level of customer expectation and customer service”, where the customer demands instant replies on whichever channel they want.

Today, every single company has to have a customer service function. But it’s hard for an everyday business to finance that level of responsiveness from a human cost perspective. With AI, however, we can enable that. So we’re democratising good customer service.

That’s only going to get harder with ever more channels, though. Channels like voice control – whether that’s Amazon’s Alexa, Google Home etc., or in-car contexts. The channels just keep on coming.

Yes, but there’s a lot to pick out here. 

First, right now if you want to have a chat AI, a voice AI, or even the things that aren’t AI, they’re typically from different vendors.  Two product teams for two different streams, but theoretically they need to serve the same customers with the same data and the same answers. It’s expensive for the client, and really hard to integrate and resource. That’s not sustainable in the long term. So our dream outcome is central processing, feeding any answer back to any channel in any language.

Then there are the tactical mistakes that some Customer Service Directors have made when new channels emerge. For example, when a lot of companies rolled out chat, a new and therefore experimental channel, they thought it made sense to put their best people from the call centre onto chat. But those employees began to leave, because chat neutralised precisely the characteristics that made them so good on the phone: conversational skill, empathy, relationship-building, etc. Chat is transactional, so it demands a different set of people. New channels usually mean new operational mistakes.

And you mentioned voice specifically. We call ourselves “voice first” because it’s the hardest medium to crack. On voice, we go completely off-piste and off-script because decision-tree models just can’t cope. You can’t control the way a customer is going to speak with you. It’s much easier to limit customer interactions by text because you can ask one-liner questions, you can give minimum detail and even suggest clickable answers, steering the customer in the direction you want them to take. With voice, you can’t do that, so I think voice is by far the most fertile area to look for innovation in customer service.