Three parts Bosnian text. Thirteen parts Kurdish. Fifty-five parts Swahili. Eleven thousand parts English.

This is part of the data recipe for Facebook’s new large language model, which the company claims is able to detect and rein in harmful content in over 100 languages. Bumble uses similar technology to detect rude and unwanted messages in at least 15 languages. Google uses it for everything from translation to filtering newspaper comment sections. All have comparable recipes and the same dominant ingredient: English-language data.
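To see how lopsided those proportions are, here is a rough back-of-the-envelope sketch in Python. It uses only the four figures quoted above, which are just a slice of the full recipe, so the resulting shares describe these four languages relative to one another and not the model's whole training set. The names `parts` and `shares` are ours, chosen for illustration.

```python
# Relative "parts" quoted at the top of this article.
# Assumption: these four entries are only a slice of the full recipe,
# so the shares below are relative to this slice, not the whole data set.
parts = {"Bosnian": 3, "Kurdish": 13, "Swahili": 55, "English": 11_000}


def shares(parts):
    """Return each language's fraction of the listed total."""
    total = sum(parts.values())
    return {lang: count / total for lang, count in parts.items()}


if __name__ == "__main__":
    for lang, share in shares(parts).items():
        print(f"{lang:8s} {share:.4%}")
```

Among these four languages alone, English accounts for more than 99 percent of the text, and there are roughly 3,667 parts of English for every part of Bosnian.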

For years, social media companies have focused their automatic content detection and removal efforts more on content in English than on content in the world’s 7,000 other languages. Facebook left almost 70 percent of Italian- and Spanish-language Covid misinformation unflagged, compared to only 29 percent of similar English-language misinformation. Leaked documents reveal that Arabic-language posts are regularly and erroneously flagged as hate speech. Poor local-language content moderation has contributed to human rights abuses, including genocide in Myanmar, ethnic violence in Ethiopia, and election disinformation in Brazil. At scale, decisions to host, demote, or take down content directly affect people’s fundamental rights, particularly those of marginalized people with few other avenues to organize or speak freely.

The problem is in part one of political will, but it is also a technical challenge. Building systems that can detect spam, hate speech, and other undesirable content in all of the world’s languages is already difficult. What makes it harder is that many languages are “low-resource,” meaning they have little digitized text data available to train automated systems. Some of these low-resource languages have few speakers and internet users, but others, like Hindi and Indonesian, are spoken by hundreds of millions of people, multiplying the harms created by errant systems. Even if companies were willing to invest in building individual algorithms for every type of harmful content in every language, they might not have enough data to make those systems work effectively.

A new technology called “multilingual large language models” has fundamentally changed how social media companies approach content moderation. Multilingual language models—as we describe in a new paper—are similar to GPT-4 and other large language models (LLMs), except they learn more general rules of language by training on texts in dozens or hundreds of different languages. They are designed specifically to make connections between languages, allowing them to extrapolate from those languages for which they have a lot of training data, like English, to better handle those for which they have less training data, like Bosnian.

These models have proven capable of simple semantic and syntactic tasks in a wide range of languages, like parsing grammar and analyzing sentiment, but it’s not clear how capable they are at the far more language- and context-specific task of content moderation, particularly in languages they are barely trained on. And besides the occasional self-congratulatory blog post, social media companies have revealed little about how well their systems work in the real world.

Why might multilingual models be less able to identify harmful content than social media companies suggest?

One reason is the quality of the data they train on, particularly in lower-resourced languages. In the large text data sets often used to train multilingual models, the least-represented languages are also the ones that most often contain text that is offensive, pornographic, poorly machine translated, or just gibberish. Developers sometimes try to make up for poor data by filling the gap with machine-translated text, but a model trained this way will still have difficulty understanding language the way people actually speak it. For example, if a language model has only been trained on text machine-translated from English into Cebuano, a language spoken by 20 million people in the Philippines, the model may never have seen “kuan,” a slang term used by native speakers that has no comparable term in other languages.
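One simple way to picture this gap is to check which terms never appear in a body of training text. The Python sketch below is a toy illustration only: the function name `unseen_terms` is ours, and the two small "corpora" are placeholder strings standing in for machine-translated and natively written text, not real Cebuano data. Real models split text into subword pieces, so a word missing from the training data can still be encoded; what this check captures is only whether the model was ever exposed to the term in context.

```python
import re


def unseen_terms(training_texts, terms):
    """Return the terms that never occur as a whole word in the training texts."""
    vocabulary = set()
    for text in training_texts:
        # Lowercase and split on word characters to build a simple vocabulary.
        vocabulary.update(re.findall(r"\w+", text.lower()))
    return [term for term in terms if term.lower() not in vocabulary]


# Hypothetical stand-in strings, not real Cebuano sentences.
machine_translated = [
    "placeholder sentence translated from english",
    "another translated placeholder sentence",
]
# The same data plus one stand-in line that includes the slang term.
native_written = machine_translated + ["placeholder sentence with kuan in it"]

if __name__ == "__main__":
    print(unseen_terms(machine_translated, ["kuan"]))
    print(unseen_terms(native_written, ["kuan"]))
```

With only the machine-translated stand-ins, "kuan" comes back as unseen; once a line containing it is added, the list is empty.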