The junk science of emotion-recognition technology

Sanjana Varghese, Oct 21, 2019, 10:00AM EST

What do Facebook, Disney, Amazon, and Kellogg’s Cornflakes have in common? They want to know how you feel. Whether they’re quietly filing patents or partnering with companies that sell this kind of software to tweak their ads for more of your attention, big corporations are investing heavily in learning how their customers feel.

The basic idea behind emotion-recognition tech is that a specific kind of software or wearable hardware can not only tell what you’re feeling from your face, your voice or the way you walk, but also convert that data into money and “insights” for someone else. This is despite the fact that the scientific evidence for emotion recognition isn’t really there. Still, it’s big business, and an increasing number of companies want a piece of the action; the market for “emotion detection and recognition” was valued at $12 billion by the market research firm Mordor Intelligence last year. Mordor estimates that this figure will grow to $92 billion by 2024.

Emotion-recognition technology covers a range of different methods: gait analysis, voice analysis and, most commonly, face tracking, where video footage of people’s faces is mined to train algorithms. In face-tracking emotion recognition, your features are mapped, with certain points used as “landmarks,” such as the corners of your mouth, which are theoretically supposed to turn up when you’re happy and down when you’re angry. Changes in those landmarks (scrunching up your nose, furrowing your brow, a quirk upwards of one corner of your mouth) are then assigned to emotions, sometimes as a score from zero to one, or as a percentage. Human coders used to be the primary way of labelling these changes, but developments in artificial intelligence have made it possible for computer vision algorithms to carry out that work instead.

The broad area that emotion-recognition technology falls under is called “affective computing,” which examines the relationship between technology and emotions. Affectiva, born out of the MIT Media Lab, is probably the biggest company working in this area; it’s been around since 2006 and has arguably led the rest of the field. The company has developed a massive dataset comprising over 5.3 million faces, mostly collected from its partners, which are companies that have video footage of people watching advertisements, television shows and “viral content.” In the last year, it has expanded that dataset to include people from GIFs and, crucially, people driving “in the wild.”

Affective computing in cars, often called automotive AI, is a particular area of focus: using video footage of drivers to identify whether they’re drowsy or distracted, with a stated end goal of a more “intimate in-cab experience” and “safe driving.” One of Affectiva’s other products, called Affdex for Market Research, tracks users’ reactions as they watch advertisements and videos. Those reactions are converted into points and collated into an easily usable dashboard for the person who then has to decide whether a background of millennial pink or salmon pink would keep people’s attention for an eighth of a second more.

If a program doesn’t have access to your face, it will simply use your voice instead. Another emotion-recognition startup that’s received media coverage is Empath, which relies on proprietary software to track and analyze changes in your voice, such as your pitch and the way your voice sounds when it wavers, in order to detect four emotions: joy, anger, calm and sorrow. A range of products fill out Empath’s suite, including the “My Mood” forecast, an app that employees can tap to track changes in their emotions (using their voice) and how those shifts correlate with weather patterns. (Team managers can see both individual employees’ moods and aggregated team moods.)

Empath’s Web Empath API, which works across Windows, iOS and Android, adds “emotion detection capabilities” to existing apps and services, such as its “smart call centre” suite of products, which are supposed to provide a better experience for customers while reducing employee turnover. But it’s unclear how tracking employees’ emotions could produce these outcomes, or whether employees would even have an incentive to be honest about how they were feeling (such as in the My Mood app) if they knew their bosses would see it. (Affectiva declined a request for comment, and Empath has not yet responded to my inquiry.)

It’s hard to tell whether these technologies actually stand up to serious scrutiny, given that the industry is still emerging and there’s not really a lot of research that verifies the claims these companies make. In fact, the opposite tends to be true. A study published in July, which reviewed more than 1,000 papers in the field and was co-authored by five professors from different academic backgrounds, found that the scientific basis for emotion-recognition technology that relies on tracking facial movements just doesn’t exist. “Companies are mistaken in what they claim,” says the lead author Lisa Feldman Barrett, a psychology professor at Northeastern University who has been working in the area for many years. “Emotions are not located in your face, or in your body, or in any one signal. When you reach an emotional state, there are a cascade of changes.... The meaning of any one signal, like a facial movement, is not stable. So measuring a single signal, or even two or three signals, will be insufficient to infer a person’s emotional state.”

These technologies seem to work on reverse inference: the assumption that if someone is smiling, they’re happy, or if someone is pouting, they’re sad. “It’s not only that people are ignoring the evidence, but also that they are relying on their common sense beliefs and then attempting to monetize them,” says Barrett. Research on this kind of technology has primarily focused on face tracking, as it seems to be the most widely used method, but similar logic could apply to vocal and gait analysis.

It would be easy to write off emotion-recognition technology as another fantasy of tech bros, a part of a culture that sees people as walking dollar signs that fit into narrow boxes. But emotion-recognition technology is just another part of a wider array of Silicon Valley products that have become increasingly pervasive, such as Ring, Amazon’s DIY surveillance camera network. Like these services, emotion-recognition technology operates on the premise that the more data you have, the better. This data-hungry state of mind is prevalent even in responses to criticisms of the tech industry: concerns that facial recognition isn’t as effective for darker skin simply lead companies to say that they’ll harvest information from a wider range of people (consensually or not). Affectiva, for example, underscores that its collected data is increasingly diverse, with faces from all over the world scanned and added to its dataset. But even if emotion recognition can recognize a smile on my face as well as on the face of someone across the world, that doesn’t make me any less queasy about having it trained on me.

Companies already collect a vast trove of information on you as you navigate around the Internet every day, such as your gender, age, sexuality, likely political leanings, tastes and dislikes. This collection often happens in ways that we’re not aware of, extends far further than we may think and, as we all know, can have significant consequences. Adding technology which claims to recognize your emotions, rolled out by some of the world’s most powerful companies, is a recipe for disaster. We’ve already seen who gets categorized as angry and violent by facial recognition (hint: not white people), and it’s obvious who loses when justice is carried out by algorithms. The development of a commercialized emotion-recognition technology industry, despite the fact that the evidence for it looks flimsy, demonstrates that there’s no area the tech industry won’t try to plunder for some kind of value while glossing over any negative implications.

A blog post on Affectiva’s website claims that “We knew that if AI had emotional intelligence, it could interact with humans in the same way that people engage with one another.” I work in the service industry, and when I smile at someone as I take their order, the odds are high that giving them what they asked me for does not fill me with genuine happiness. But if I’m smiling at a friend who has just walked into wherever I’m working, I am usually genuinely happy that they’re there. If you were to feed video footage of those interactions through various kinds of “emotion-recognition” software, that distinction would be unlikely to emerge. I hate to say it, but we live in a society. Emotion recognition, like a lot of other contemporary, “disruptive” technology, relies on flattening the way that we actually interact with each other. If X, then Y. But that’s just not how people actually work. It’s just how machines do.

Sanjana Varghese is a journalist and researcher based in London.

New startups bank on being able to discern how you feel. Smile for the computer, sweetie.

The Outline