Why Virtual Assistants Need a Body
The last few months have seen a blossoming of virtual assistants, including Amazon’s Alexa and the Google Assistant, whose “Duplex” feature offers to make phone calls on our behalf for appointments with the hairdresser or doctor. As attractive as they can be for earthlings of the year 2019, disembodied voices are nonetheless a far cry from what these assistants could be. We envisage the virtual beings of the future with both a body and a face, which will help them better convey their message and respond to our moods, in the service of building a relationship that better meets our needs.
Creating such beings calls for long-term research into their ability to understand language, so as to avoid the “I’m sorry, I don’t know how to help you with that” of today’s virtual assistants, but also into their way of speaking, so that users feel comfortable with the system and are open to asking a machine for help. Communication, however, is much more than an exchange of information; it is not just the simple series of questions and answers of IBM’s Watson artificial-intelligence system, launched in 2017 (the software offers natural-language answers to users’ questions about banking or health, for example).
Laboratories across the globe are thus working to create communicative and interpersonally aware virtual beings that can communicate through voice, of course, but also with gestures, facial expressions, and head movements. To achieve this, some researchers have based their work on actors, others have called in professional animators, and yet others, like us, seek to understand human behaviour in communication and to extract from this research the algorithms that will later be embedded in virtual beings. For we believe that virtual assistants created through the analysis of interaction between real people have every chance of being better liked and understood, as well as more effective.
We therefore engage in rigorous study of human behaviour in order to design our virtual beings. We use videos of people in conversation (on television, online, or in scientific archives), and we put people in the most natural situations possible. For example, we asked two high-school students to work together on algebra homework, and we asked a person to recommend a film to somebody else. These videos are viewed second by second, and sometimes up to one hundred times in a row, to analyse verbal and non-verbal behaviour, and how the two are connected.
The knowing smile
Smiling is an interesting example: where do smiles occur during these interactions, and do they always have the same shape and meaning? One of our studies showed that when two high-school students work on algebra together, if one teases the other but accompanies the remark with a smile, their interpersonal rapport tends to increase, and they tend to learn better. When the teasing comes without a smile, however, their rapport tends to decrease, and learning is less effective. Teasing, a conversational strategy often seen as threatening, can therefore also signal that the relationship between the two interlocutors is a privileged one. The difference between these two functions of teasing is conveyed here by a simple smile.
This initial finding prompted us to examine the effects of other conversational strategies, such as the strategy of imitation—nodding the head when the other nods theirs, or repeating encouraging “hmms” when the other does so—and to implement them in our laboratory’s virtual beings in order to test their effectiveness. By varying the different communicative actions of the virtual human, we can measure how human interlocutors see these beings.
A question of balance
Despite an ever-greater knowledge of the mechanisms underlying human communication, and ever more powerful computing machines, creating a virtual being that can respond in any and all situations is still unrealistic today. It is, however, becoming possible to develop virtual people that can interact in specific contexts, such as virtual tutors for learning (a language, algebra...), or virtual companions that can, for instance, help the elderly adopt the health practices prescribed by their doctors. In this latter case, it has turned out to be essential to strike a balance between the warmth and the expertise that these virtual companions demonstrate. A study has shown that frequent smiling increased the impression of warmth but decreased the level of perceived expertise, while small rhythmic hand movements reinforced the impression of expertise.
Even for specific applications, designing a virtual assistant remains quite an art! But it is also more than that, for there are ethical questions that researchers, and society as a whole, must soon answer. To what extent is it in our interest to create a virtual being that can establish relationships and communicate in a totally natural way with human beings? Or should it instead retain a certain imperfection, so that humans do not become too attached to it, or are not manipulated by a virtual interlocutor that has become too skilled?
This also raises the sensitive question of data storage: to meet the expectations of its user, a virtual being must first closely analyse his or her emotions, moods, and needs, and then store this information. What will the company selling the commercial assistant do with it? What will happen to this information if it is hacked, or falls into the wrong hands? The recent example of Alexa recordings used by Amazon, without users’ knowledge, to improve its service provides food for thought.
The points of view, opinions, and analyses published in this column are the author’s. They in no way represent a position on the part of the CNRS.