Project Spark
Igniting Community-Driven Generative AI for Community-Owned Language Models
Background
Many use cases in the public and private sectors require localised control. In these cases it is highly beneficial for communities, whether local organisations, regional departments, or entire nations, to build and maintain their own language models independently. These models need not be cutting-edge; they must simply be effective enough to meet practical community needs.
Research Problem
Recent advances in model training methods and transformer architectures have, in theory, significantly reduced the resources required to build effective generative AI language models. However, it remains unclear how practically accessible these advancements are to everyday communities with modest resources. The central uncertainty is whether these technical improvements can realistically enable communities to own their AI.
Research Objectives
Our objective is to conduct a Proof-of-Concept (PoC) exploring whether recent training and architectural advances realistically enable everyday communities to independently create, manage, and sustain effective language models tailored to their unique needs.
Research Questions
Is it possible to build an effective language model on a modest budget?
What are the key challenges and enabling factors for community-owned language model initiatives?
Methodology
Employ a collaborative, volunteer-driven project structure.
Use Python, open-source data, and open-source models to create and fine-tune generative AI models.
Study the model architecture to establish the skills and knowledge required to bring development in-house.
Engage communities through technical volunteers, hands-on learners, and community ambassadors.
Document all phases from model training to deployment, including qualitative and quantitative assessments.
Expected Outcomes
Successful development of a fully operational community-managed language model suitable for diverse public and private sector applications, demonstrating tangible community benefits.
Partial success resulting in actionable insights, detailed guidance, and clear recommendations for further advancements.
Should the PoC not yield a viable model: comprehensive documentation and key lessons learned, clearly outlining the challenges encountered and potential strategies for future success.
Timeline
The project is structured for completion by August 2025.
Join Our Efforts
We are inviting participation from:
Technical Volunteers: Individuals experienced in Python, data scraping, and transformer-based models.
Hands-on Learners: Those eager to engage directly in data preparation, model training, and fine-tuning.
Community Ambassadors: Individuals passionate about building inclusive AI communities and promoting collaborative solutions.
Sponsors and Supporters: Organizations or individuals interested in providing essential support to maintain resources and enable continuous research.