

Challenge 11.2
How can technology unlock the cultural and economic potential of the Gaelic language by addressing the sparsity of useful, available data — a problem faced by many languages around the world?
Challenge Sponsor: Scottish Government, Directorate for Education Reform
CivTech is a Scottish Government programme that brings the public, private and third sectors together to build things that make people’s lives better. We take Challenges faced by government departments, public sector organisations and charities, and invite anyone with a brilliant idea to work hand-in-hand with us to create the solution.
Challenge summary
Scottish Gaelic, like many under-resourced languages, faces a significant barrier in developing advanced language technologies: data sparsity. This Challenge aims to overcome this barrier, ensuring Gaelic language and culture thrive in the 21st century and beyond. Successfully addressing data sparsity will enrich communities, safeguard cultural heritage and unlock economic opportunities, providing lasting benefits for speakers and learners globally.
Key information for applicants
Please note: you must apply for this Challenge via Public Contracts Scotland
Launch date
22 July 2025
Closing date
Midday, 2 September 2025
Exploration Stage interviews
6 October 2025
Exploration Stage
3 to 21 November 2025
Accelerator interviews
4 December 2025
Accelerator Stage
19 January to 1 May 2026
Maximum contract value
£650,000 + VAT
Q&A session
There will be an online Q&A session on Monday 18 August 2025 from 11:00 am–12:00 pm. It will be hosted on Microsoft Teams and recorded to comply with procurement rules. Click here to register for the session.
This date may be subject to change.
What is the problem, and how does it affect the Challenge Sponsor organisation, service users and/or People of Scotland?
Gaelic is a living language in Scotland. It is the main language of many families and communities, especially in the Outer Hebrides, despite predictions of its ‘imminent demise’ going back generations.
In some respects, the Gaelic speech community has adapted well to the demands of contemporary life. Gaelic now enjoys a vibrant presence in education, the media and certain areas of commerce. Public awareness of its value in tourism and cultural heritage also continues to grow. Other signs of positive growth are the steady expansion of Gaelic-medium education, rising engagement with online learning platforms (e.g. SpeakGaelic or Duolingo) and the 2022 census, which recorded the first increase in speaker numbers in fifty years.
Yet as the language evolves, so too do the issues it faces. One of the most pressing is in the digital sphere: the rapid development of language learning technologies presents a significant challenge, but also a transformative opportunity. Success in this area could elevate Gaelic’s status and help secure its future in the digital age.
A stubborn barrier to developing Gaelic language technology, however, is data sparsity – a problem that’s common to most minority languages. The limited availability of Gaelic text and speech data constrains the accuracy of speech and language models, which results in unreliable machine translation and speech recognition, and flawed outputs from tools such as ChatGPT or other large language models (LLMs). Overcoming this limitation is essential if we are to build more robust and trustworthy Gaelic language technologies. In turn, this will help support the language’s growth and allow Scottish society to more fully benefit from a thriving Gaelic culture.
This data gap limits the development of tools that could support:
Learning support for Gaelic-medium education and lifelong learning.
Digital inclusion, ensuring Gaelic is not left behind in public service delivery.
Engagement with the large international learner community (e.g. in North America).
The creative industries and the wider cultural sector.
Policy innovation, such as drawing on traditional environmental knowledge for climate action.
Economic opportunity through branding, tourism and entrepreneurship.
It remains difficult to measure the exact number of Gaelic learners. For instance, SpeakGaelic had 10 million content views in a year, a figure that includes followers on social media platforms including YouTube, 60–70% of whom are from outside the UK. This shows the capacity of Gaelic language content to engage people interested in the language. This and other information is available in MG ALBA’s Annual Report for 2024/25: https://mgalba.com/annual-report/2025/
In the 1980s, the great Gaelic poet Murdo MacFarlane predicted the language would die within 40 years. His fellow speakers took that as a clarion call, laying the groundwork for the innovations that have helped Gaelic to grow its speaker numbers, media profile and economic contributions. That momentum must now continue into the age of AI, to ensure one of Europe’s oldest literary languages thrives in the 21st century.
How will we know the Challenge has been solved?
This Challenge supports the commercial opportunity to develop a solution that directly benefits the Gaelic language.
We will know it has been solved when developers and scientists have access to foundational speech and text data for developing advanced and varied language technologies, on par with well-supported minority languages (e.g. Basque, Irish and Welsh). The data will be diverse and inclusive enough to provide technologies that work for a wide range of dialects, purposes and linguistic domains. Examples of such technologies include:
Media subtitling (e.g. news broadcasts), supporting accessibility for learners and the hearing-impaired.
Personalised speech synthesis voices for use in education, broadcasting and assistive technologies.
Language-enabled search and discovery tools for Gaelic content across cultural archives and media libraries.
Interactive conversational agents (e.g. chatbots) for tourism and language learning.
It is important to emphasise that the problem of data sparsity is ubiquitous: it affects almost all of the world’s languages. So, while we are interested in a solution for Gaelic, we believe that it could powerfully affect minority language dynamics around the globe.
Who are the end users likely to be?
We expect the end users of a solution to the Challenge itself will be language technologists, independent software developers and IT companies. They require access to high-quality Gaelic speech and text data to build tools on par with those available for other well-supported languages.
The technologies that follow from this solved Challenge could benefit a large and diverse user set:
Speech tools could support those with dementia or aphasia to communicate in their mother tongue, enhancing person-centred care.
Gaelic-medium students and other language learners will benefit from automatic transcription and feedback on pronunciation – such tools would also remove barriers for students needing additional support.
Businesses and entrepreneurs will more easily integrate Gaelic language into branding, tourism and digital services, enhancing the value of their products and creating new economic opportunities.
Content creators – from BBC ALBA to community podcasters – will gain efficient transcription, translation, subtitling and voice-synthesis tools that broaden accessibility and audience engagement.
Artists and technologists will be able to explore new digital storytelling and heritage projects.
Has the Challenge Sponsor attempted to solve this problem before?
No current solution addresses the scale and specificity of the Challenge. While tools and initiatives exist to assist with certain aspects, they are limited in scope or not directly aligned with the development of Gaelic speech and language technologies at the required scale.
Language resources are available for Gaelic, such as a standard orthography and aids such as dictionaries and grammars, but these do not ameliorate the data sparsity challenge. It is useful to compare the situation with Basque, another minority language: 4 billion words of Basque text are accessible, whereas the extant corpus of Gaelic text is 150 million words. Furthermore, the quality of the Gaelic corpus is uneven; much of it is scraped automatically and of dubious provenance (e.g. from low-quality machine translation). So, both the quality and the size of the existing Gaelic data present challenges.
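The scale gap described above can be made concrete with a quick calculation. The sketch below uses only the two word counts quoted in this section (4 billion for Basque, 150 million for Gaelic); the variable names are illustrative and the figures are approximate.

```python
# Word counts quoted in the text above (approximate figures).
BASQUE_WORDS = 4_000_000_000
GAELIC_WORDS = 150_000_000

# How many times larger the accessible Basque corpus is.
ratio = BASQUE_WORDS / GAELIC_WORDS

# Gaelic corpus size as a percentage of the Basque corpus.
gaelic_share_pct = 100 * GAELIC_WORDS / BASQUE_WORDS

print(f"Basque corpus is about {ratio:.1f}x the size of the Gaelic corpus")
print(f"Gaelic corpus is about {gaelic_share_pct:.2f}% of the Basque corpus")
```

On these figures, the Basque corpus is roughly 27 times larger, and the Gaelic corpus is under 4% of its size.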
One emerging pilot that is relevant to the challenge is ‘Opening the Well’, a crowdsourcing transcription platform pioneered by the University of Edinburgh, in collaboration with Tobar an Dualchais / Kist o Riches. This will help to increase the speech and language data for developing language technologies, but it is limited in scope and scale. For instance, the speech recordings involved are mostly from mid-20th century oral narrative and thus diverge somewhat from contemporary Gaelic linguistic patterns. More information can be found on this website:
https://blogs.ed.ac.uk/garg/2025/06/04/gaelic-in-the-digital-age-inside-the-eist-project/
The University of Edinburgh and the University of Glasgow are working towards transcribing some of the BBC Sound Archive data and disseminating it through online portals. But these initiatives, again, are relatively slow-moving. Additionally, legal, ethical and copyright issues preclude making much of this data publicly available. For instance, permission to use the data may not have been agreed at the time of collection, or the uses and applications of the data may have changed and now be far removed from what was imagined at the time. A component of the Challenge might be to gather as much of this data as possible together in an open training corpus, but we believe that it is essential to gather a new dataset of contemporary Gaelic.
Are there any interdependencies or blockers?
We are not aware of any specific blockers or conflicts, but it is important to consider that Gaelic literacy rates are low amongst heritage speakers. Additionally, we believe that any solution based upon crowdsourcing will need to budget for community coordinators to be effective. Getting buy-in from the Gaelic-speaking population takes dedicated effort, but we believe that this approach will produce the most significant pay-off.
Will a solution need to integrate with any existing systems / equipment?
The University of Edinburgh is currently developing a speech-to-text API that will be available by the fourth quarter of 2025. This will likely need to be integrated. The costs are currently covered by a Scottish Government grant (project title: Ecosystem for Interactive Speech Technologies). This project is also piloting the development of interviewer chatbots, and will be able to share results from that around the same time period.
If the solution uses text-to-speech (TTS), this will need to be developed. Alternatively, Cereproc provides a TTS model named ‘Ceitidh’, which is free to non-profits and for educational use.
Is this part of an existing service?
No.
Any technologies or features the Challenge Sponsor wishes to explore or avoid?
We believe that an imaginative and ambitious use of technology could significantly improve Gaelic data sparsity.
For example, much is currently being made of the potential of advanced artificial intelligence (AI), and in truth many of the products CivTech has developed over the past few years have AI as part of the tech stack, using componentry such as machine learning and pattern recognition, or indeed LLMs.
But there is no obligation on your part to go down a particular route: as long as the proposed solution offers the opportunity to solve the Challenge in question, we will consider it.
We do not wish to restrict ourselves to specific approaches and are open to ones that we may not have considered before. Based on our experience, we would encourage proposals and solutions that explore or contain elements of the following:
Incorporating incentives for participation, such as scoreboards or prizes
Utilising current state-of-the-art Gaelic language technology creatively (e.g. speech recognition, large language models and speech synthesis)
Investigating the possibility of developing a smartphone app that provides automated ethnographic interviews on multiple topics (so that users can visit it repeatedly)
An alternative possibility is a phone-in service that conducts interviews (live or automatic) with Gaelic speakers
Handling all permissions (e.g. informed consent, GDPR considerations and usage permissions) at the user registration point
Establishing a portfolio of workflows, helping to address other minority languages at different stages of technological development – this could inform future commercialisation of the solution
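As one illustration of handling permissions at the registration point, the sketch below shows how a contributor's consent could be captured once and then checked before any later use of their data. This is a minimal sketch under stated assumptions, not a prescribed design: the `ConsentRecord` class, its field names and the use labels ("research", "commercial") are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ConsentRecord:
    """Permissions captured once, when a contributor registers (hypothetical schema)."""

    contributor_id: str
    informed_consent: bool   # contributor has accepted the information sheet
    gdpr_acknowledged: bool  # data-protection notice has been acknowledged
    permitted_uses: frozenset  # illustrative labels, e.g. {"research", "commercial"}
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def allows(self, use: str) -> bool:
        """Allow a use only if both baseline consents and that specific use were granted."""
        return (
            self.informed_consent
            and self.gdpr_acknowledged
            and use in self.permitted_uses
        )


# Example: a contributor who agreed to research use only.
record = ConsentRecord("speaker-001", True, True, frozenset({"research"}))
print(record.allows("research"))
print(record.allows("commercial"))
```

Storing the record as an immutable object with a timestamp means a later change of mind would be recorded as a new entry rather than an overwrite, which keeps an audit trail of what was agreed and when.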
Another pilot to mention is ongoing postgraduate research at the University of Edinburgh to develop an interviewer chatbot. We are exploring both text- and speech-based approaches, and – if successful – this work could optionally be leveraged for the present Challenge.
What is the commercial opportunity beyond a CivTech contract?
We are open to diverse commercial approaches. We offer the following suggestions as initial and flexible parameters to stimulate further discussion.
Gaelic has an unusually high learner-to-speaker ratio compared to most languages. Some estimate there are around 7 learners for every fluent speaker, whereas Spanish has just 0.15 learners per speaker. Crucially, however, Gaelic is just one of thousands of languages facing data sparsity challenges. A successful solution for Gaelic will deliver a model suitable for other low-resource languages too.
The tools built during this project – which we imagine will include processing pipelines, transcription interfaces, crowdsourcing apps and licensing templates – could be packaged and offered to other language communities. Commercial opportunities may also lie in training courses and technical support to provide additional revenue.
We are interested in ethical licensing and how it could ensure benefits for the communities who provided the data. For example, we are aware of ‘open-source but protected’ agreements, which ensure that the data produced from the Challenge can benefit minority language communities but permit developers to leverage it commercially.
We believe that Gaelic would benefit from further apps and digital products. For example, a speech-enabled AI tutor, which gives real-time pronunciation feedback and lets learners practise speaking Gaelic in a natural way, could be offered through a subscription model. Done effectively, this could complement or compete with existing online language learning services. Regional Gaelic voice packs could be sold as add-ons, helping users hear text read in authentic accents. Even something as simple as a spelling and grammar checking app could generate income through one-off purchases or licensing to smartphone manufacturers.
Finally, some of the most valuable outcomes of the CivTech Challenge will be relational rather than purely technical. Demonstrating trust, transparency and effective collaboration with minority-language communities will significantly enhance the team’s public reputation. Internationally, many similar opportunities exist for partnerships between universities and linguistic communities, often funded by governments and research councils. Success in this Challenge will give the winning team a proven track record, opening doors to future collaborations.
We hope this demonstrates that we remain open to the commercial plans and aspirations of those that seek to solve the Challenge.
Who are the stakeholders?
Scottish Government Gaelic and Scots Division Language Policy team
Professor Will Lamb / ÈIST project (University of Edinburgh)
Further and Higher education institutions (including Sabhal Mòr Ostaig and Universities of Glasgow, Aberdeen and the Highlands & Islands)
Those involved in Gaelic Broadcasting (including for example MG ALBA; BBC; independent producers across the industry, content creators)
Those involved in Gaelic publishing (including for example Acair, Stòrlann, Gaelic Books Council)
Cultural and Historical organisations (including for example Tobar an Dualchais / Kist o Riches; historical societies etc.)
Who’s in the Challenge Sponsor team?
The core Challenge Team will comprise the Scottish Government’s Gaelic and Scots Division (Directorate for Education Reform), acting as project sponsor, and Professor Will Lamb of the University of Edinburgh, who will act as Advisor and the primary domain expert. Additional researchers, as well as public-sector and academic partners (e.g. BBC ALBA, MG ALBA and the University of Glasgow), may be invited to contribute as project needs evolve, by mutual agreement.
Professor Lamb will supply linguistic and technical guidance and the University of Edinburgh may offer non-funding support on request. For example, the University may grant access to relevant Gaelic language corpora and associated background IP under a separate written licence that defines permitted uses, duration and attribution.
What is the policy background to the Challenge?
The Scottish Government is committed to the safeguarding and revival of Gaelic within Scotland. Within education this is regulated by the legal framework created by the 1980 and 2016 Education (Scotland) Acts. For broadcasting it stems primarily from the Communications Act 2003. In the realm of language planning the key legislation is the Gaelic Language (Scotland) Act 2005 and its creation of Bòrd na Gàidhlig. The Act’s primary policy initiative, Bòrd na Gàidhlig’s National Gaelic Language Plan, is the major policy document informing Gaelic development. The individual corporate Gaelic Language Plans produced under the National Plan’s strategic objectives are the instruments through which individual initiatives on behalf of Gaelic are pursued in the public sector.
The Scottish Government will shortly be enacting the Scottish Languages Bill, whose provisions elevate the work being done on behalf of Gaelic in each of these spheres bar broadcasting. We are also implementing the recommendations of the Short Life Working Group on Economic and Social Opportunities for Gaelic. The latter explicitly seeks to tackle the economic obstacles facing the language, and more fully realise the economic benefits arising from it, and the Challenge outlined here would work towards those goals.