A team of developers has created software to help Indigenous communities across Canada with language learning, but the developers say the project may soon end for lack of funding.
The National Research Council’s Indigenous Languages Technology (ILT) project, first conceptualized in 2015 by Dr. Anna Kazantseva, a researcher with a PhD in computational linguistics, has worked on learning and teaching technologies for over 25 Indigenous languages alongside a number of collaborators.
“I think all of [the team], me included, were hoping to use what we know, use our skills, to help revitalize these languages,” says Kazantseva.
Federal Budget 2017 specified “$6 million for the National Research Council of Canada to develop, in collaboration with Indigenous stakeholders, information technology to preserve oral histories by converting speech to text and creating other interactive educational materials.” The funding went first towards the ILT project and later to other Indigenous language technology projects as well.
The budget funding supported the project for three years before running out in 2020, at which point the NRC began supporting the project from its own internal budget. The project now faces a deadline to secure new funding by March 2022; otherwise, the researchers may end the project and release the software to the public as open source.
Indigenous languages are structurally very different from languages such as English or French, and the tools, teaching strategies and resources must be tailored to the language. Funding for language work, however, is scarce in many Indigenous communities.
With language teachers often receiving little compensation for teaching, developing curricula and working in communities, funding is a significant issue, says Gerry Lawson, a member of the project’s Indigenous advisory committee and a language worker who is a lead on Indigitization, a digitization project which has also received funding through the NRC.
“I know that every bit of funding that goes to a federal department is funding that doesn’t go to a community, but I also feel that we’re all too busy in communities. We don’t have the luxury of focusing on something that will come to fruition in five or ten or fifteen years, and the NRC has exactly that sort of time and money and inclination,” Lawson said.
The project is run through the NRC’s Digital Technologies Research Centre. Collaborators include the Yukon Native Language Centre, the Syilx Language House, Indigitization, the Pirurvik Centre and Carleton University.
Unique languages create a challenge in teaching
The project’s goal, first and foremost, is to create technologies that will assist in Indigenous language revitalization as dictated by the communities themselves.
The ILT project first began with the development of a verb conjugator, in collaboration with the Mohawk immersion school Onkwawenna Kentyohkwa, which allows users to create verbs by choosing a root action, a tense and the person/people performing the action.
Many Indigenous languages are polysynthetic and verb-based — meaning that a single word may be able to express what might take a full sentence in English. A root word can take on other morphemes (units of meaning) to create long “sentence-words.”
Dictionaries for polysynthetic languages sometimes categorize words by their root word, but the sheer volume of potential combinations makes cataloguing difficult.
A comprehensive dictionary for a polysynthetic language such as Mohawk would need to contain tens of thousands of entries, so a verb conjugator simplifies the verb-building process and helps learners understand the structure of the language.
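To make the idea concrete, here is a minimal sketch of how a conjugator of this kind might assemble a verb from a root, a tense and a person. It is an illustration only, not the ILT project's code: the function names are hypothetical, the morphemes are made-up placeholders rather than real Mohawk forms, and the simple prefix-root-suffix pattern is a deliberate simplification of how polysynthetic verbs are actually built.

```python
from itertools import product

# Hypothetical placeholder morphemes for illustration only.
# These are NOT real Mohawk (or any other language's) forms.
PERSON_PREFIXES = {"1sg": "A-", "2sg": "B-", "3sg": "C-"}
ROOTS = {"walk": "root1", "see": "root2"}
TENSE_SUFFIXES = {"present": "-x", "past": "-y", "future": "-z"}


def conjugate(root: str, tense: str, person: str) -> str:
    """Build one verb form by joining a person prefix, a root and a tense suffix."""
    return PERSON_PREFIXES[person] + ROOTS[root] + TENSE_SUFFIXES[tense]


def all_forms() -> list[str]:
    """List every form the toy tables can generate."""
    return [
        conjugate(root, tense, person)
        for root, tense, person in product(ROOTS, TENSE_SUFFIXES, PERSON_PREFIXES)
    ]


if __name__ == "__main__":
    print(conjugate("see", "past", "1sg"))
    print(len(all_forms()), "forms from 2 roots, 3 tenses and 3 persons")
```

Even these tiny tables yield 18 forms, and each additional root, tense or person multiplies the total. That is why generating forms from rules can be more practical than listing every word in a dictionary.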
The software was created manually with the help of fluent speakers, but the project team hopes to use AI to speed up the process in the future. “Now that we have all this knowledge encoded, I’d like to see if we can use machine learning to add some robustness,” says Kazantseva.
The software is still in beta, and there are currently other verb conjugators in development, including one for the Michif language.
Data sovereignty and open-source technology will be priorities
Data protection and data sovereignty are important principles for the project. Once a piece of software is completed, all data is returned to the community and is not archived with the project. Some communities choose to keep their software open to the community only, while others decide to make it open to the public.
Although the data is kept private, the project team plans to ensure that software remains open source. For example, the verb conjugator created for the Mohawk language is being used as a model to develop verb conjugators for other languages.
“We’re trying to make it more accessible so that a language activist or a linguist would be able to use the programs, and they wouldn’t have to know how to code,” says Fineen Davis, a developer working on the ILT project. “But we’re not quite there yet.”
If the project ends, that accessibility would let anyone use the software for their own language, whether or not they are a developer.
The researchers working on the project also note that technology isn’t central to language revitalization — it’s a tool that, when tailored to the needs of a language community, can assist in language learning.
“Technology is only useful if it serves the needs of teachers and their students — its role should be secondary,” Roland Kuhn, ILT project lead, said in an email to Research Money. “That is why from the beginning of the ILT project, we have asked teachers what technologies they think will help them.”
A lack of funding leading to an uncertain future
For now, the future of the project remains uncertain. It is receiving internal NRC funding for this fiscal year, but has no funding secured beyond that. “NRC cannot continue to support the project on its own indefinitely,” said Kuhn.
Should the project continue, the team will keep refining the software it has created and build new software as needed. In case the project ends next year, the team is doing what it can to ensure that its work can still be used by Indigenous communities in the future.
“We are preparing for the worst-case scenario in which the project is unfunded after March 31, 2022, by releasing all of the software we produce as open-source, and making it as user-friendly as possible,” said Kuhn.
“In this way, the accomplishments of the project will survive, through uptake by users all over Canada and the rest of the world.”