Facebook today open-sourced a dataset designed to surface age, gender, and skin tone biases in computer vision and audio machine learning models. The company claims that the corpus, Casual Conversations, is the first of its kind to feature paid participants who explicitly provided their age and gender, as opposed to having this information labeled by third parties or estimated using models.
Biases can make their way into the data used to train AI systems, amplifying stereotypes and leading to harmful consequences. Research has shown that state-of-the-art image-classifying AI models trained on ImageNet, a popular dataset containing photos scraped from the internet, automatically learn humanlike biases about race, gender, weight, and more. Countless studies have demonstrated that facial recognition is susceptible to bias. It's even been shown that prejudices can creep into the AI tools used to create art, potentially contributing to false perceptions about social, cultural, and political aspects of the past and hindering awareness about important historical events.
Casual Conversations, which contains over 4,100 videos of 3,000 participants, some from the Deepfake Detection Challenge, aims to combat this bias by including labels of apparent skin tone. Facebook says that the tones are estimated using the Fitzpatrick scale, a classification schema for skin color developed in 1975 by American dermatologist Thomas B. Fitzpatrick. The Fitzpatrick scale is a way to ballpark the response of types of skin to ultraviolet light, from Type I (pale skin that always burns and never tans) to Type VI (deeply pigmented skin that never burns).
Facebook says that it recruited trained annotators for Casual Conversations to determine which skin type each participant had. The annotators also labeled videos with ambient lighting conditions, which helped to measure how models treat people with different skin tones under low-light conditions.
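As a rough illustration of the kind of subgroup measurement these labels make possible, the sketch below groups a model's predictions by skin type and lighting condition and reports accuracy for each group. It is not Facebook's evaluation code: the field names (`skin_type`, `lighting`, `prediction`, `label`) and the sample records are hypothetical, invented here for illustration only.

```python
from collections import defaultdict


def subgroup_accuracy(records):
    """Return accuracy for each (skin_type, lighting) subgroup."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        key = (r["skin_type"], r["lighting"])
        totals[key] += 1
        correct[key] += int(r["prediction"] == r["label"])
    return {key: correct[key] / totals[key] for key in totals}


# Hypothetical, made-up records; field names are assumptions, not the dataset's schema.
records = [
    {"skin_type": "I", "lighting": "bright", "prediction": 1, "label": 1},
    {"skin_type": "I", "lighting": "low", "prediction": 1, "label": 1},
    {"skin_type": "VI", "lighting": "bright", "prediction": 1, "label": 1},
    {"skin_type": "VI", "lighting": "low", "prediction": 0, "label": 1},
    {"skin_type": "VI", "lighting": "low", "prediction": 1, "label": 1},
]

accuracy = subgroup_accuracy(records)
for group, value in sorted(accuracy.items()):
    print(group, value)
```

A large gap between groups, such as lower accuracy for Type VI skin in low light than for Type I, is the sort of disparity a benchmark like this is meant to expose.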
A Facebook spokesperson told VentureBeat via email that a U.S. vendor was hired to select annotators for the project from a range of backgrounds, ethnicities, and genders. The participants, who hailed from Atlanta, Houston, Miami, New Orleans, and Richmond, were paid.
"As a field, industry and academic experts alike are still in the early days of understanding fairness and bias when it comes to AI ... The AI research community can use Casual Conversations as one important stepping stone toward normalizing subgroup measurement and fairness research," Facebook wrote in a blog post. "With Casual Conversations, we hope to spur further research in this important, emerging field."
In support of Facebook's point, there's a body of evidence that computer vision models in particular are susceptible to harmful, pervasive prejudice. A paper last fall by University of Colorado, Boulder researchers demonstrated that AI from Amazon, Clarifai, Microsoft, and others maintained accuracy rates above 95% for cisgender men and women but misidentified trans men as women 38% of the time. Independent benchmarks of major vendors' systems by the Gender Shades project and the National Institute of Standards and Technology (NIST) have demonstrated that facial recognition technology exhibits racial and gender bias and have suggested that current facial recognition programs can be wildly inaccurate, misclassifying people upwards of 96% of the time.
Beyond facial recognition, features like Zoom's virtual backgrounds and Twitter's automatic photo-cropping tool have historically disfavored people with darker skin. Back in 2015, a software engineer pointed out that the image recognition algorithms in Google Photos were labeling his Black friends as "gorillas." And nonprofit AlgorithmWatch showed that Google's Cloud Vision API at one time automatically labeled a thermometer held by a dark-skinned person as a "gun" while labeling a thermometer held by a light-skinned person as an "electronic device."
Experts attribute many of these errors to flaws in the datasets used to train the models. One recent MIT-led audit of popular machine learning datasets found an average of 3.4% annotation errors, including one where a picture of a Chihuahua was labeled "feather boa." An earlier version of ImageNet, a dataset used to train AI systems around the world, was found to contain photos of naked children, porn actresses, college parties, and more, all scraped from the web without those individuals' consent. Another computer vision corpus, 80 Million Tiny Images, was found to have a range of racist, sexist, and otherwise offensive annotations, such as nearly 2,000 images labeled with the N-word, and labels like "rape suspect" and "child molester."
But Casual Conversations is far from a perfect benchmark. Facebook says it didn't collect information about where the participants are originally from. And in asking their gender, the company only provided the choices "male," "female," and "other," leaving out gender identities such as nonbinary.
The spokesperson also clarified that Casual Conversations is available to Facebook teams only as of today and that employees won't be required, but will be encouraged, to use it for evaluation purposes.
Exposés about Facebook's approaches to fairness haven't done much to engender trust within the AI community. A New York University study published in July 2020 estimated that Facebook's machine learning systems make about 300,000 content moderation mistakes per day, and problematic posts continue to slip through Facebook's filters. In one Facebook group that was created last November and rapidly grew to nearly 400,000 people, members calling for a nationwide recount of the 2020 U.S. presidential election swapped unfounded accusations about alleged election fraud and state vote counts every few seconds.
For Facebook's part, the company says that while it considers Casual Conversations a good, bold first step, it'll continue pushing toward developing techniques that capture greater diversity over the next year or so. "In the next year or so, we hope to explore pathways to expand this data set to be even more inclusive with representations that include more geographical locations, activities, and a wider range of gender identities and ages," the spokesperson said. "It's too soon to comment on future stakeholder participation, but we're certainly open to speaking with stakeholders in the tech industry, academia, researchers, and others."