“Clow-dia,” I say once. Twice. A third time. Defeated, I say the Americanized version of my name: “Claw-dee-ah.” Finally, Siri recognizes it. Having to adapt our way of speaking to interact with speech-recognition technologies is a familiar experience for people whose first language is not English or who do not have conventionally American-sounding names. I have now stopped using Siri, Apple’s voice-based virtual assistant, because of it.
The growth of this tech in the past decade (not just Siri but Alexa, Cortana and others) has exposed a problem: racial bias. One recent study, published in the Proceedings of the National Academy of Sciences USA, showed that speech-recognition programs are biased against Black speakers. On average, the authors found, all five programs from leading technology companies, including Apple and Microsoft, showed significant racial disparities; they were roughly twice as likely to incorrectly transcribe audio from Black speakers as audio from white speakers.
This effectively censors voices that are not part of the “standard” languages or accents used to create these technologies. “I don’t get to negotiate with these devices unless I adapt my language patterns,” says Halcyon Lawrence, an assistant professor of technical communication and information design at Towson University, who was not part of the study. “That is problematic.” For Lawrence, who has a Trinidad and Tobagonian accent, or for me as a Puerto Rican, part of our identity comes from speaking a particular language, having an accent or using a set of speech forms such as African American Vernacular English (AAVE). Having to change such an integral part of one’s identity in order to be recognized is inherently cruel.
The inability to be understood also affects other marginalized communities, such as people with visual or movement disabilities who rely on voice recognition and speech-to-text tools, says Allison Koenecke, a computational graduate student and first author of the PNAS study. For someone with a disability who is dependent on these technologies, being misunderstood could have serious consequences. There are probably many culprits for these disparities, but Koenecke points to the most likely: the data used for training, which are predominantly from white, native speakers of American English. By using databases that are narrow both in the words that are used and in how they are said, training systems exclude accents and other ways of speaking that have unique linguistic features. Humans, presumably including those who create these technologies, have accent and language biases. For example, research shows that the presence of an accent affects whether jurors find people guilty and whether patients find their doctors competent.
Recognizing these biases would be an important way to avoid implementing them in technologies. But developing more inclusive technology takes time, effort and money, and the decision to invest them is often market-driven. (In response to several queries, only a Google spokesperson responded in time for publication, saying, in part, “We’ve been working on the challenge of accurately recognizing variations of speech for several years and will continue to do so.”)
Safiya Noble, an associate professor of information studies at the University of California, Los Angeles, admits that it’s a tricky challenge. “Language is contextual,” says Noble, who was not involved in the study. “But that doesn’t mean that companies shouldn’t strive to decrease bias and disparities.” To do this, they need the input of humanists and social scientists who understand how language actually works.
From the tech side, feeding more diverse training data into the programs could close this gap, Koenecke says. Noble adds that tech companies should also test their products more widely and have more diverse workforces so people from different backgrounds and perspectives can directly influence the design of speech technologies. Koenecke suggests that automated speech-recognition companies use the PNAS study as a preliminary benchmark and keep using it to assess their systems over time.
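For readers curious what such a benchmark might look like in practice, here is a minimal sketch in Python. It assumes the comparison rests on word error rate, a standard measure of transcription accuracy (the share of words a system substitutes, drops or inserts relative to a human transcript), averaged separately for each group of speakers. The function names, group labels and sample sentences are hypothetical illustrations, not data or code from the study.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the system
                            curr[j - 1] + 1,      # word inserted by the system
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def mean_wer_by_group(samples):
    """Average word error rate per group.

    samples: iterable of (group, human_transcript, machine_transcript).
    """
    rates = defaultdict(list)
    for group, reference, hypothesis in samples:
        rates[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(vals) / len(vals) for group, vals in rates.items()}


# Hypothetical toy data, invented for illustration only.
samples = [
    ("group_a", "call my sister claudia", "call my sister claudia"),
    ("group_a", "set a timer for ten minutes", "set a timer for two minutes"),
    ("group_b", "call my sister claudia", "call my sister cloudy"),
    ("group_b", "set a timer for ten minutes", "set a time for tin minutes"),
]

rates = mean_wer_by_group(samples)
for group, rate in sorted(rates.items()):
    print(f"{group}: mean word error rate = {rate:.2f}")
```

Rerunning a comparison like this on the same recordings after each system update would show whether the gap between groups is narrowing, which is the kind of tracking over time that Koenecke describes.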
In the meantime, many of us will continue to struggle between identity and being understood when interacting with Alexa, Cortana or Siri. But Lawrence chooses identity every time: “I’m not switching,” she says. “I’m not doing it.”