Project Spark

Igniting Community-Driven Generative AI for Community-Owned Language Models

Background

There are many diverse use cases in the public and private sectors that require localised control, making it highly beneficial for communities, whether they are local organisations, regional departments, or entire nations, to independently build and maintain their own language models. These models do not necessarily need to be cutting-edge; rather, they must be sufficiently effective to meet practical community needs.

Research Problem

Recent advances in model training methods and transformer architectures have, in theory, significantly reduced the resources required to build effective generative AI language models. However, it remains unclear how practically accessible these advancements are to everyday communities with modest resources. The central uncertainty is whether these technical improvements can realistically enable communities to own their AI.

Research Objectives

Our objective is to conduct a Proof-of-Concept (PoC) to explore whether recent training and architectural advancements realistically enable everyday communities to independently create, manage, and sustain effective language models tailored to their unique needs.

Research Questions

  • Is it possible to build an effective language model on a modest budget?

  • What are the key challenges and enabling factors for community-owned language model initiatives?
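
To make the budget question concrete, the compute cost of a training run can be roughly estimated before any hardware is rented. The sketch below uses the widely cited rule of thumb that training a transformer takes about 6 floating-point operations per parameter per training token. The `TrainingRun` class, its field names, and every number in the example are illustrative assumptions, not project figures.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical description of one training run; all values are assumptions."""
    params: float              # number of model parameters
    tokens: float              # number of training tokens
    peak_flops: float          # peak throughput of one accelerator, in FLOP/s
    utilisation: float         # fraction of peak actually achieved (0 to 1)
    price_per_gpu_hour: float  # rental price per accelerator-hour, any currency


def training_flops(run: TrainingRun) -> float:
    # Rule of thumb: roughly 6 FLOPs per parameter per training token.
    return 6.0 * run.params * run.tokens


def gpu_hours(run: TrainingRun) -> float:
    # Effective throughput is the peak scaled by the achieved utilisation.
    effective_flops_per_second = run.peak_flops * run.utilisation
    return training_flops(run) / effective_flops_per_second / 3600.0


def estimated_cost(run: TrainingRun) -> float:
    return gpu_hours(run) * run.price_per_gpu_hour


if __name__ == "__main__":
    # Illustrative values only: a 100M-parameter model trained on 2B tokens.
    run = TrainingRun(
        params=1e8,
        tokens=2e9,
        peak_flops=1e14,
        utilisation=0.3,
        price_per_gpu_hour=2.0,
    )
    print(f"Training FLOPs: {training_flops(run):.2e}")
    print(f"GPU-hours:      {gpu_hours(run):.1f}")
    print(f"Estimated cost: {estimated_cost(run):.2f}")
```

With these illustrative values the estimate comes to roughly 11 accelerator-hours. The estimate scales linearly with both model size and token count, so it is best read as an order-of-magnitude guide. It also excludes data preparation, evaluation, failed runs, and volunteer time.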

Methodology

  • Employ a collaborative, volunteer-driven project structure.

  • Use Python, open-source data, and open-source models to create and fine-tune generative AI models.

  • Study the architecture to establish the skills and knowledge required to bring this work "in house".

  • Engage communities through technical volunteers, hands-on learners, and community ambassadors.

  • Document all phases from model training to deployment, including qualitative and quantitative assessments.

Expected Outcomes

  1. Successful development of a fully operational community-managed language model suitable for diverse public and private sector applications, demonstrating tangible community benefits.

  2. Partial success resulting in actionable insights, detailed guidance, and clear recommendations for further advancements.

  3. Comprehensive documentation and significant learning insights should the PoC not yield a viable model, clearly outlining encountered challenges and potential strategies for future success.

Timeline

The project is structured to be completed by August 2025.

Join Our Efforts

We are inviting participation from:

  • Technical Volunteers: Individuals experienced in Python, data scraping, and transformer-based models.

  • Hands-on Learners: Those eager to engage directly in data preparation, model training, and fine-tuning.

  • Community Ambassadors: Enthusiastic individuals passionate about building inclusive AI communities and promoting collaborative solutions.

  • Sponsors and Supporters: Organisations or individuals interested in providing essential support to maintain resources and enable continuous research.