Why Virtual Assistants Need a Body
The last few months have seen a blossoming of virtual assistants, including Amazon’s Alexa and the Google Assistant, whose “Duplex” feature offers to make phone calls on our behalf for appointments with the hairdresser or doctor. As attractive as they can be for earthlings of the year 2019, disembodied voices are nonetheless a far cry from what these assistants could be. We envisage the virtual beings of the future with both a body and a face, which will help them better convey their message and respond to our moods, in the service of building a relationship that better meets our needs.
Creating such beings calls for long-term research into their ability to understand language, so as to avoid the “I’m sorry, I don’t know how to help you with that” of today’s virtual assistants, but also into their way of speaking, so that users feel comfortable with the system and are open to asking a machine for help. Communication, however, is much more than an exchange of information; it is not just the simple series of questions and answers of IBM’s Watson artificial-intelligence system, launched in 2017 (the software offers natural-language answers to users’ questions about banking or health, for example).
Laboratories across the globe are thus working to create communicative and interpersonally aware virtual beings that can communicate through voice, of course, but also with gestures, facial expressions, and head movements. To achieve this, some researchers have based their work on actors, others have called in professional animators, and yet others, like us, seek to understand human behaviour in communication and to extract from this research the algorithms that will later be embedded in virtual beings. For we believe that virtual assistants created through the analysis of interaction between real people have every chance of being better liked and understood, as well as more effective.
We therefore engage in rigorous study of human behaviour in order to design our virtual beings. We use videos of people in conversation (on television, online, or in scientific archives), and we put people in the most natural situations possible. For example, we asked two high-school students to work together on algebra homework, and we asked a person to recommend a film to somebody else. These videos are viewed second by second, and sometimes up to one hundred times in a row, to analyse verbal and non-verbal behaviour, and how the two are connected.
The knowing smile
Smiling is an interesting example: where do smiles occur during these interactions, and do they always have the same shape and meaning? One of our studies showed that when two high-school students work on algebra together, if one teases the other but accompanies the remark with a smile, their interpersonal rapport tends to increase, and they tend to learn better. When the teasing comes without a smile, however, their rapport tends to decrease, and learning is less effective. Teasing, a conversational strategy often seen as threatening, can therefore also signal that the relationship between the two interlocutors is a privileged one. The difference between these two functions of teasing is conveyed here by a simple smile.
This initial finding prompted us to examine the effects of other conversational strategies, such as the strategy of imitation—nodding the head when the other nods theirs, or repeating encouraging “hmms” when the other does so—and to implement them in our laboratory’s virtual beings in order to test their effectiveness. By varying the different communicative actions of the virtual human, we can measure how human interlocutors see these beings.
A question of balance
Despite an ever-greater knowledge of the mechanisms underlying human communication, and ever more powerful computing machines, creating a virtual being that can respond in any and all situations is still unrealistic today. It is, however, becoming possible to develop virtual people that can interact in specific contexts, such as virtual tutors for learning (a language, algebra...), or virtual companions that can, for instance, help the elderly adopt the health practices prescribed by their doctors. In this latter case, it has turned out to be essential to strike a balance between the warmth and the expertise that these virtual companions demonstrate. A study has shown that frequent smiling increased the impression of warmth but decreased the level of perceived expertise, while small rhythmic hand movements reinforced the impression of expertise.
Even for specific applications, designing a virtual assistant remains quite an art! But it is also more than that, for there are ethical questions that researchers, and society as a whole, must soon answer. To what extent is it in our interest to create a virtual being that can establish relationships and communicate in a totally natural way with human beings? Or should it instead retain a certain imperfection, so that humans do not become too attached to it, or are not manipulated by a virtual interlocutor that has become too skilled?
This also raises the sensitive question of data storage: to meet the expectations of its user, a virtual being must first closely analyse his or her emotions, moods, and needs, and then store this information. What will the company selling the commercial assistant do with it? What will happen to this information if it is hacked, or falls into the wrong hands? The recent example of Alexa recordings used by Amazon, without users’ knowledge, to improve its service provides food for thought.
The points of view, opinions, and analyses published in this column are the author’s. They in no way represent a position on the part of the CNRS.