Google Partners With African Universities To Launch WAXAL African Language Dataset

Google is partnering with African universities and research institutions to launch WAXAL, a large-scale open speech dataset for African languages.
The dataset covers 21 Sub-Saharan African languages, including Hausa, Yoruba, Igbo, Luganda, Swahili, and Acholi. Google says WAXAL is designed to support more than 100 million speakers who have largely been excluded from voice-based technologies because of a shortage of high-quality language data.
While voice assistants and speech-powered tools are widely used globally, Africa’s more than 2,000 languages remain underrepresented in AI systems. This gap has limited access to voice-enabled services across sectors such as education, healthcare, and business.
Developed over three years with funding from Google, WAXAL contains 1,250 hours of transcribed natural speech and over 20 hours of high-quality studio recordings. These recordings can also be used to create realistic synthetic voices.
“The ultimate impact of WAXAL is the empowerment of people in Africa,” said Aisha Walcott-Bryant, Head of Google Research Africa. She noted that the dataset provides a foundation for students, researchers, and entrepreneurs to build technology in their own languages and for their communities.
African institutions played a central role in the project. Universities and organisations including Makerere University in Uganda, the University of Ghana, and Digital Umuganda in Rwanda led data collection efforts, working closely with Google researchers.
Unlike many global datasets, ownership of the data remains with the partner institutions. This approach allows African researchers and students to develop local applications and tools without relying on external companies.
Joyce Nakatumba-Nabende, a Senior Lecturer at Makerere University, said the dataset would help researchers build speech technologies that reflect Africa’s diverse languages and cultural contexts.
At the University of Ghana, more than 7,000 volunteers contributed their voices to the project. According to Associate Professor Isaac Wiafe, the initiative is already supporting innovation in areas such as health, education, and agriculture.
The WAXAL dataset is now publicly available, giving developers, researchers, and startups access to foundational speech data to build more inclusive AI tools across Africa.
