Common Voice: Piloting Alternative Language Data Licenses workshop at Maseno University, Kenya.

Radically open licenses such as Creative Commons CC0, under which no attribution is required, no terms of use can be prescribed, and no control is afforded to communities, may not be appropriate for certain language communities. In communities that have been systematically exploited throughout history, the idea of giving up their data to foreign technology corporations for no economic return may be unappealing. Together with the Mozilla Foundation, we are working on a pilot program whose vision is to enable language communities to use governance tools (such as licenses) that center their needs and context, and to make those tools more accessible and widely used.
For this project, the Dholuo community has been chosen to co-create and pilot a novel, community-centered license on the Common Voice platform, as a proof of concept that can then be replicated and adapted.
[Read here: https://foundation.mozilla.org/en/common-voice/in-country-programmes/]
The "Common Voice: Piloting Alternative Language Data Licenses" workshop at Maseno University in Kenya has brought together participants from the Dholuo language community, linguists, members of the legal fraternity, and AI researchers to discuss and co-create a community license.
Understanding AI and Natural Language Processing (NLP): A Pathway to Inclusive Technology
Maseno University, Kenya – As artificial intelligence (AI) continues to reshape the digital landscape, Natural Language Processing (NLP) has emerged as a critical component in enabling machines to understand and generate human language. The workshop at Maseno University, titled "Common Voice: Piloting Alternative Language Data Licenses", highlighted the significance of NLP in promoting inclusivity for African languages.
The workshop emphasized the pressing need for AI-driven solutions that accommodate underrepresented African languages. Many languages, including Dholuo, lack sufficient digital resources to be effectively integrated into modern NLP systems. The project aims to bridge this gap by collecting and licensing linguistic data to build robust voice-enabled technologies.




