Casual Conversations v2 is a consent-driven, publicly available dataset that allows researchers to better evaluate the fairness and robustness of specific types of AI models to make them more inclusive.
Casual Conversations v2 expands on the first version by including data from countries other than the United States, such as Brazil, India, Indonesia, Mexico, Vietnam, and the Philippines.
For AI to serve communities fairly, researchers need diverse and inclusive datasets with which to rigorously evaluate their models. In computer vision and speech recognition applications in particular, they need data to assess how well a model works for different demographic groups. Gathering this data can be difficult because of complex geographic and cultural contexts, inconsistency between sources, and labeling challenges.
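One common way to assess how well a model works for different demographic groups is to break a single metric down by group and look at the spread. The Python sketch below illustrates the idea under stated assumptions: the record fields (`age_group`, `label`, `prediction`) and the toy values are hypothetical and do not reflect the actual schema or contents of Casual Conversations v2.

```python
from collections import defaultdict


def accuracy_by_group(records, group_key):
    """Return {group: accuracy} for records carrying 'label' and 'prediction'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        group = record[group_key]
        total[group] += 1
        if record["prediction"] == record["label"]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best- and worst-served group."""
    return max(per_group.values()) - min(per_group.values())


# Toy, made-up records; field names are assumptions for illustration only.
records = [
    {"age_group": "18-30", "label": 1, "prediction": 1},
    {"age_group": "18-30", "label": 0, "prediction": 0},
    {"age_group": "31-45", "label": 1, "prediction": 0},
    {"age_group": "31-45", "label": 0, "prediction": 0},
]

per_group = accuracy_by_group(records, "age_group")
gap = accuracy_gap(per_group)
print(per_group, gap)
```

A large gap between groups suggests the model serves some groups worse than others. In practice the same breakdown would be repeated for each attribute of interest and for metrics other than accuracy.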
This comprehensive dataset includes a granular list of 11 self-provided and annotated categories that can be used to further assess the algorithmic fairness and robustness of these AI systems. The release of this dataset, created in consultation with internal civil rights experts, is one of the key highlights of Meta’s civil rights progress.
The dataset comprises 26,467 video monologues recorded in seven countries, from 5,567 paid participants who provided self-identified attributes such as age and gender. It is the next generation of the consent-driven Casual Conversations dataset, which Meta released in 2021.
According to Meta, it is the first open-source dataset of videos collected from multiple countries with highly accurate and detailed demographic information, intended to help test AI models for fairness and robustness.