{"id":2068,"date":"2025-12-11T10:10:38","date_gmt":"2025-12-11T15:10:38","guid":{"rendered":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/?p=2068"},"modified":"2025-12-11T10:10:38","modified_gmt":"2025-12-11T15:10:38","slug":"ai-for-language-and-cultural-preservation","status":"publish","type":"post","link":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/csci-tech\/ai-for-language-and-cultural-preservation\/","title":{"rendered":"AI for Language and Cultural Preservation"},"content":{"rendered":"<h1 style=\"text-align: center\"><span style=\"font-weight: 400\">AI for Language and Cultural Preservation<\/span><\/h1>\n<h1><span style=\"font-weight: 400\">Abstract<\/span><\/h1>\n<p><span style=\"font-weight: 400\">Nearly half of the world&#8217;s languages face extinction, threatening irreplaceable knowledge and cultural connections. This paper examines how artificial intelligence can support endangered language documentation and revitalization when guided by community priorities. Through case studies, from Hawaiian speech recognition to Cherokee learning platforms, the paper identifies both opportunities (improved access, engaging tools, cross-distance connection) and challenges (privacy risks, cultural appropriation, misinformation, sustainability). The central argument: effective preservation requires community leadership, robust consent frameworks, and sustained support rather than commodified technological quick fixes. The paper concludes with principles for responsible AI use that strengthens living languages and their cultural contexts.<\/span><\/p>\n<h1><span style=\"font-weight: 400\">Introduction<\/span><\/h1>\n<p><span style=\"font-weight: 400\">Approximately 40% of the world&#8217;s 6,700 languages risk extinction as speaker populations decline (Jampel, 2025). This crisis extends beyond communication loss. Languages embody cultural identity, historical memory, and community bonds. 
When languages disappear, speakers lose direct access to ancestral knowledge, particularly where oral histories predominate. Research suggests that connection to linguistic heritage correlates with improved adolescent mental health outcomes and reduced rates of certain chronic conditions.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Artificial intelligence has emerged as one approach to preservation, offering documentation at unprecedented scale and interactive learning platforms. However, concerns extend beyond environmental costs to whether technology can authentically serve community needs.<\/span><\/p>\n<p><span style=\"font-weight: 400\">This paper argues that while AI provides powerful tools for endangered language work through natural language processing and speech recognition, success depends on careful integration with Indigenous communities&#8217; priorities, values, and active participation.<\/span><\/p>\n<p><span style=\"font-weight: 400\">It examines, in turn, AI&#8217;s technical foundations, real-world applications and case studies, diverse stakeholder perspectives, and future promises and challenges facing the field.<\/span><\/p>\n<h1><span style=\"font-weight: 400\">Technical Foundations: How AI Works in Language Preservation\u00a0<\/span><\/h1>\n<p><span style=\"font-weight: 400\">To understand how AI contributes to language preservation, it is helpful to see how Natural Language Processing provides the foundational methods for analyzing language, while modern language models apply these methods to learn from data and produce representations and outputs that support preservation tasks.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Natural Language Processing<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Natural Language Processing (NLP) sits at the core of how AI is used to process and manipulate text. 
Two key areas inform language documentation: Computational Linguistics (developing tools and methods for analyzing language data) and Semantics (studying how meaning operates in language).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Semantics broadly concerns deriving meaning from language. It spans the linguistic side, where it handles lexical and grammatical meaning tied to computational linguistics, and the philosophical side, which examines distinctions between fact and fiction, emotional tone (e.g., positive, neutral, negative), and relationships between different corpora (Ali et al., 2025, p. 133).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Semantics can address challenges like word ambiguity; for example, <\/span><i><span style=\"font-weight: 400\">lie<\/span><\/i><span style=\"font-weight: 400\"> could mean a falsehood or resting horizontally. Computational Linguistics tools like Bidirectional Encoder Representations from Transformers (BERT) use contextual analysis to disambiguate such terms. Other common challenges include capturing idiomatic meanings in translation (Ali et al., 2025, p. 134).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Computational Linguistics began in the 1940s and 1950s, but recent advances in Machine Learning and Deep Learning have driven major developments in NLP. Neural networks, which are data-processing structures inspired by the human brain, allow machines to learn from sample data to perform complex tasks by recognizing, classifying, and correlating patterns.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In more recent years, Generative AI has gained prominence. 
It relies on transformer architectures, a type of neural network that analyzes entire sequences simultaneously to determine which parts are most important, enabling effective learning from large datasets.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In short, NLP implementation involves preprocessing textual data through steps such as tokenization (breaking text into smaller units), stemming or lemmatization (reducing words to their root forms, e.g., <\/span><i><span style=\"font-weight: 400\">talking<\/span><\/i><span style=\"font-weight: 400\"> \u2192 <\/span><i><span style=\"font-weight: 400\">talk<\/span><\/i><span style=\"font-weight: 400\">), and stop-word removal (eliminating common or low-value words like <\/span><i><span style=\"font-weight: 400\">and<\/span><\/i><span style=\"font-weight: 400\">, <\/span><i><span style=\"font-weight: 400\">for<\/span><\/i><span style=\"font-weight: 400\">, <\/span><i><span style=\"font-weight: 400\">with<\/span><\/i><span style=\"font-weight: 400\">). The processed data is then used to train models for specific tasks.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Common NLP applications relevant to language preservation include part-of-speech tagging, which labels words in a sentence based on their grammatical roles (e.g., nouns, verbs, adjectives, adverbs); word-sense disambiguation, which resolves multiple possible meanings of a word; speech recognition, which converts spoken language into text; machine translation, which enables translation between languages; sentiment analysis, which identifies emotional tone in text; and automatic resource mining, which involves the automated collection of linguistic resources (Amazon Web Services, n.d.).<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Language Models\u00a0<\/span><\/h2>\n<p><span style=\"font-weight: 400\">BERT, developed by Google, is trained mainly with masked language modeling, where it predicts missing words from surrounding context. 
The original BERT also included a next-sentence prediction task to judge whether one sentence follows another, although many modern variants modify or omit this objective (BERT, n.d.). Multilingual BERT (mBERT) extends this ability to multiple languages (Ali et al., 2025, p. 136).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Building on these advances, researchers working on Cherokee are applying and extending NLP techniques to support language preservation and revitalization. According to Dr. David Montgomery, a citizen of the Cherokee Nation, \u201cIt would be a great service to Cherokee language learners to have a translation tool as well as an ability to draft a translation of documents for first-language Cherokee speakers to edit as part of their translation tasks\u201d (Zhang et al., 2022, p. 1535).<\/span><\/p>\n<p><span style=\"font-weight: 400\">To realize this potential, the research effort focuses on adapting existing NLP frameworks and creating tools specifically suited to Cherokee. Effective data collection and processing depend on capabilities such as automatic language identification and multilingual embedding models. For example, aligning Cherokee and English texts requires projecting sentences from both languages into a common semantic space to evaluate their similarity. Most standard NLP tools do not provide these capabilities, so they must be custom-built for this context (Zhang et al., 2022, p. 1535).<\/span><\/p>\n<h1><span style=\"font-weight: 400\">Real-World Applications and Case Studies\u00a0<\/span><\/h1>\n<p><span style=\"font-weight: 400\">Researchers and developers are building AI tools to support language preservation across communities.<\/span><\/p>\n<p><span style=\"font-weight: 400\">For example, the First Languages A.I. Reality (FLAIR) Initiative develops adaptable AI tools for Indigenous language revitalization worldwide. 
Co-founder Michael Running Wolf (Northern Cheyenne Tribe) describes the project\u2019s goal as increasing the number of active speakers through accessible technologies. One notable product, \u201cLanguage in a Box,\u201d is a portable, voice-based learning system that delivers customizable guided lessons for different languages (Jampel, 2025).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Indigenous scientists are also creating culturally grounded AI tools for youth engagement. Danielle Boyer developed Skobot, a talking robot designed to speak Indigenous languages (Jampel, 2025), while Jacqueline Brixey created Masheli, a chatbot that communicates in both English and Choctaw. Brixey notes that despite more than 220,000 enrolled Choctaw Nation members, fewer than 7,000 are fluent speakers today (Brixey, 2025).<\/span><\/p>\n<figure id=\"attachment_2070\" aria-describedby=\"caption-attachment-2070\" style=\"width: 441px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-2070\" src=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/danielle-boyer-and-student-wearing-skobots_web-300x200.jpeg\" alt=\"Students with Skobots on their shoulders stand next to Danielle Boyer\" width=\"441\" height=\"294\" srcset=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/danielle-boyer-and-student-wearing-skobots_web-300x200.jpeg 300w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/danielle-boyer-and-student-wearing-skobots_web-1024x683.jpeg 1024w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/danielle-boyer-and-student-wearing-skobots_web-768x512.jpeg 768w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/danielle-boyer-and-student-wearing-skobots_web-600x400.jpeg 600w, 
https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/danielle-boyer-and-student-wearing-skobots_web.jpeg 1071w\" sizes=\"auto, (max-width: 441px) 100vw, 441px\" \/><figcaption id=\"caption-attachment-2070\" class=\"wp-caption-text\">Students with Skobots on their shoulders stand next to Danielle Boyer (The STEAM Connection, n.d.)<\/figcaption><\/figure>\n<h2><span style=\"font-weight: 400\">Hawaiian Language Revitalization &#8211; ASR\u00a0<\/span><\/h2>\n<p><span style=\"font-weight: 400\">A collaboration among The MITRE Corporation, the University of Hawai\u2018i at Hilo, and the University of Oxford explored Automatic Speech Recognition (ASR) for Hawaiian, a low-resource language. Using dozens of hours of labeled audio and millions of pages of digitized Hawaiian newspaper text, researchers fine-tuned models such as Whisper (large and large-v2), achieving a Word Error Rate (WER) of about 22%, that is, roughly 22 word-level errors per 100 words of reference transcript (Chaparala et al., 2024, p. 4). This accuracy is promising for research and assisted workflows, but transcripts at this error rate remain difficult for beginner and intermediate learners to rely on without human review.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The models struggled with key phonetic features, particularly the glottal stop (\u02bbokina \u27e8\u02bb\u27e9) and vowel length distinctions, due to their subtle acoustic properties. Occasionally, the model substituted spaces for glottal stops, possibly reflecting English patterns, in which glottal stops naturally occur before word-initial vowels. Whisper\u2019s relative success with Hawaiian benefited from available training data, including 338 hours of Hawaiian and 1,381 hours of M\u0101ori, and from Hawaiian\u2019s Latin-based alphabet. Other under-resourced languages lacking such advantages may face greater transcription challenges (Chaparala et al., 2024, p. 
4).<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Missing Scripts Initiative &#8211; Input Methods\u00a0<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The Missing Scripts Initiative, led by ANRT (National School of Art and Design, France) in collaboration with UC Berkeley\u2019s Script Encoding Initiative and the University of Applied Sciences, Mainz, addresses a major gap: nearly half of the world\u2019s writing systems lack digital representation.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Launched in 2024 as part of the International Decade of Indigenous Languages, the initiative recognizes that encoding these scripts into standard formats is not enough: users also need functional input methods that let them type and interact with these writing systems. Developing these digital typefaces requires collaboration among linguists, developers, and native speakers. The initiative&#8217;s primary objectives are encoding these scripts (a standardization process that assigns a unique numerical identifier to each character) and producing digital fonts. 
This work supports UNESCO\u2019s global efforts to preserve and revitalize Indigenous linguistic heritage (UNESCO, n.d.).<\/span><\/p>\n<figure id=\"attachment_2071\" aria-describedby=\"caption-attachment-2071\" style=\"width: 405px\" class=\"wp-caption aligncenter\"><img loading=\"lazy\" decoding=\"async\" class=\" wp-image-2071\" src=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/Screenshot-2025-12-11-at-10.04.12-AM-300x208.png\" alt=\"Diagram showing the computational process of translating Af\u00e1ka, the script of the Ndyuka language (an English-based creole of Suriname), into English at the Missing Scripts Program\" width=\"405\" height=\"281\" srcset=\"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/Screenshot-2025-12-11-at-10.04.12-AM-300x208.png 300w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/Screenshot-2025-12-11-at-10.04.12-AM-1024x711.png 1024w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/Screenshot-2025-12-11-at-10.04.12-AM-768x534.png 768w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/Screenshot-2025-12-11-at-10.04.12-AM-1536x1067.png 1536w, https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/Screenshot-2025-12-11-at-10.04.12-AM.png 1546w\" sizes=\"auto, (max-width: 405px) 100vw, 405px\" \/><figcaption id=\"caption-attachment-2071\" class=\"wp-caption-text\">Full process of computationally translating Af\u00e1ka, the script of the Ndyuka language (an English-based creole of Suriname), into English at the Missing Scripts Program (The Missing Scripts, n.d.)<\/figcaption><\/figure>\n<h2><span style=\"font-weight: 400\">Cherokee Case Study \u2013 Tokenization &amp; Community-based Language Learning<\/span><\/h2>\n<p><span style=\"font-weight: 
400\">Researchers at UNC Chapel Hill found that Cherokee\u2019s rich morphology, in which a single word can express an entire English sentence, poses unique NLP challenges. Character-level modeling using Latin script proved more effective than traditional word-level tokenization. Moreover, because Cherokee\u2019s word order varies depending on discourse context, translating entire documents at once may be more effective than translating one sentence at a time (Zhang et al., 2022, pp. 1535\u20131536).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Beyond technical modeling, researchers emphasized community-driven learning platforms that combine human input with AI. Inspired by systems like Wikipedia and Duolingo, these collaborative tools crowdsource content from speakers and learners. These platforms address two critical challenges simultaneously: the scarcity of training data for endangered languages and the resulting limitations in model performance. This approach transforms language learning from an individual task into a collective effort aimed at cultural preservation (Zhang et al., 2022, p. 
1532).<\/span><\/p>\n<h1><span style=\"font-weight: 400\">Community Perspectives: Strengths and Concerns\u00a0<\/span><\/h1>\n<p><span style=\"font-weight: 400\">A study by Akdeniz University researchers examined community perspectives on AI for language preservation, highlighting both benefits and challenges.\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Strengths<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Community members emphasized the transformative role of mobile apps: \u201cMobile apps have democratized access to our language, allowing learners from geographically dispersed areas to engage with it daily.\u201d Interactive games and voice recognition tools make learning more engaging and accessible, while digital platforms foster connection and belonging among geographically dispersed speakers.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Translation tools and automated content generation have also proven valuable, with one linguist commenting that these technologies have been \u201cgame-changers in making our stories universally accessible.\u201d Participants also underscored the value of cross-disciplinary collaboration, with one project manager noting that partnerships between tech developers and Indigenous communities have \u201copened new pathways for innovation.\u201d AI\u2019s adaptability was seen as another strength, allowing solutions to be customized for each language community, for example by prioritizing translation tools over transcription systems depending on local needs (Soylu &amp; \u015eahin, 2024, p. 15).<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Concerns<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Participants also voiced serious concerns about ethics, privacy, and cultural sensitivity. 
One community leader stressed the importance of ensuring that \u201cthese technologies respect our cultural values and the integrity of our languages.\u201d Limited internet infrastructure, funding instability, and intergenerational gaps remain ongoing barriers. As another participant observed, \u201cBridging the gap between our elders and technology is ongoing work.\u201d Long-term sustainability depends on reliable funding and culturally informed consent practices (Soylu &amp; \u015eahin, 2024, p. 15).\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400\">The Human and Cultural Dimensions\u00a0<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Indigenous innovators emphasize that AI cannot replace human elders and tradition keepers. Technology should complement traditional practices like classes and intergenerational transmission. \u201cLanguage is a living thing,\u201d one that requires living speakers, cultural context, and human relationships (Jampel, 2025).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Language preservation carries profound emotional and cultural significance. It is not merely the deployment of \u2018fancy technology\u2019 but often a response to the deep wounds caused by historical oppression, including forced assimilation, the systematic suppression of Indigenous languages, and the displacement of communities from their ancestral lands (Brixey, 2025). For many, language revitalization is not just an educational effort but an act of cultural healing and the restoration of what was forcibly taken.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Critical Concerns and Emerging Risks<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Beyond community-identified challenges, broader concerns about AI\u2019s role in language preservation have emerged, particularly regarding quality control and misinformation. 
In December 2024, the Montreal Gazette reported the sale of AI-generated \u201chow-to\u201d books for endangered languages, including Abenaki, Mi\u2019kmaq, Mohawk, and Omok (a Siberian language extinct since the 18th century). These books contained inaccurate translations and fabricated content, which Abenaki community members described as demeaning and harmful, undermining both learners\u2019 efforts and trust in legitimate revitalization work (Jiang, 2025).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Many Indigenous communities also remain cautious about adopting AI. Jon Corbett, a Nehiyaw-M\u00e9tis computational media artist and professor at Simon Fraser University, noted that some communities \u201cdon\u2019t see the relevance to our culture, and they\u2019re skeptical and wary of their contribution. Part of that is that for Indigenous people in North America, their language has been suppressed and their culture oppressed, so they\u2019re weary of technology and what it can do\u201d (Jiang, 2025). This caution reflects historical trauma and highlights critical questions about control, ownership, and ethical deployment of AI in cultural contexts.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Toward Ethical and Decolonized Approaches<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Scholars emphasize decolonizing speech technology, which means respecting Indigenous knowledge systems rather than imposing Western frameworks. In 2019, Onowa McIvor and Jessica Ball, affiliated with the University of Victoria in Canada, underscored the importance of community-level initiatives supported by coherent policy and government backing (Soylu &amp; \u015eahin, 2024, p. 13).<\/span><\/p>\n<p><span style=\"font-weight: 400\">Before computational tools are developed, speaker communities&#8217; basic needs must be met: \u201crespect, reciprocity, and understanding.\u201d Researchers must avoid treating languages as commodities or prioritizing dataset size over community wellbeing. 
Common goals must be established before research begins. Only through such groundwork can AI technologies truly serve language revitalization rather than becoming another tool of extraction and exploitation (Zhang et al., 2022, p. 1531).<\/span><\/p>\n<p><span style=\"font-weight: 400\">These perspectives reveal that while technology offers promising pathways for language revitalization, success depends fundamentally on addressing both technical and sociocultural barriers through genuinely community-centered approaches that honor the living, relational nature of language itself.\u00a0<\/span><\/p>\n<h1><span style=\"font-weight: 400\">Future Challenges and Considerations<\/span><\/h1>\n<h2><span style=\"font-weight: 400\">The Low-Resource Language Challenge<\/span><\/h2>\n<p><span style=\"font-weight: 400\">A key obstacle in applying AI to endangered languages is the lack of large training datasets. Translation systems for high-resource languages like English and Spanish rely on millions of parallel sentence pairs for accurate translation (Jampel, 2025), but many endangered languages have limited or no written resources. Some lack a script entirely, requiring more intensive dataset curation and multimodal approaches.<\/span><\/p>\n<p><span style=\"font-weight: 400\">To address this, Professor Jacqueline Brixey and Dr. Ron Artstein compiled a dataset combining audio, video, and text, with many texts translated into English, allowing models to leverage multiple modalities (Brixey, 2025). Similarly, Jared Coleman at Loyola Marymount University is developing translation tools for Owens Valley Paiute, a \u201cno-resource\u201d language with no public datasets. His system first teaches grammar and vocabulary to the model, then has it translate using this foundation, mimicking the strategies humans use when working with limited data. 
Coleman emphasizes: \u201cOur goal isn\u2019t perfect translation but producing outputs that accurately convey the user\u2019s intended meaning\u201d (Jiang, 2025).\u00a0<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Capturing Linguistic and Cultural Features<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Major models like ChatGPT often perform poorly with Indigenous languages. Brixey notes: \u201cChatGPT could be good in Choctaw, but it&#8217;s currently ungrammatical; it shares misinformation about the tribe\u201d (Jampel, 2025). Models can fail to understand cultural nuance, or can privilege dominant-culture perspectives, potentially mishandling sensitive information. These failures underscore the need for better security controls and validation mechanisms to mitigate the potential harm of linguistic misinformation.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">Technical challenges extend to basic digitization processes as well. For example, most Cherokee textual materials exist as physical manuscripts or printed books, which are readable by humans but not machine-processable. This limits applications such as automated language-learning tools. Optical Character Recognition (OCR), using systems like Tesseract-OCR and Google Vision OCR, can convert these materials into machine-readable text with reasonable accuracy. However, OCR performance is highly sensitive to image quality. Texts with cluttered layouts or illustrations, common in children\u2019s books, often yield lower recognition rates, posing ongoing challenges for digitization and digital preservation efforts (Zhang et al., 2022, p. 1536).<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Ethical and Governance Issues<\/span><\/h2>\n<p><span style=\"font-weight: 400\">The exploitation of Indigenous languages has deep historical roots that continue to shape debates on AI development. 
In 1890, anthropologist Jesse Walter Fewkes recorded Passamaquoddy stories and songs, some sacred and meant to remain private, but the community was denied access for nearly a century, highlighting longstanding issues of linguistic sovereignty (Jampel, 2025).<\/span><\/p>\n<p><span style=\"font-weight: 400\">More recently, in late 2024, the Standing Rock Sioux Tribe sued an educational company for exploiting Lakota recordings without consent, profiting from tribal knowledge, and demanding extra fees to restore access (Jampel, 2025).\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">In response, researchers like Brixey and Boyer implement protective measures, allowing participants to withdraw recordings and exclude their knowledge from AI development. These practices uphold data sovereignty, ensuring Indigenous communities retain control over their cultural knowledge and limiting commercialization. There is also a strong emphasis on keeping these technologies within Indigenous communities, preventing them from being commercialized or sold externally (Jampel, 2025).<\/span><\/p>\n<p><span style=\"font-weight: 400\">As such, AI for language preservation requires clear policies for data governance and ethics. Some projects illustrate how AI can be ethically applied. New Zealand\u2019s Te Hiku Media \u201cK\u014drero M\u0101ori\u201d project uses AI for M\u0101ori language preservation under the Kaitiakitanga license, which forbids misuse of local data. CTO Keoni Mahelona emphasizes working with elders to record voices for transcription, demonstrating that AI tools can support Indigenous languages while respecting cultural values and community control (Jiang, 2025). 
Balancing technological openness with cultural sensitivity remains essential.<\/span><\/p>\n<h2><span style=\"font-weight: 400\">Resource and Infrastructure Needs<\/span><\/h2>\n<p><span style=\"font-weight: 400\">Beyond technical and ethical challenges, practical resource constraints significantly limit the scope and sustainability of language preservation initiatives. Securing funding for long-term projects remains one of the most persistent obstacles, as language revitalization requires sustained commitment over decades rather than short-term grant cycles. Training represents another critical need: communities require skilled teachers, technology experts, and materials developers who understand both the technical systems and the cultural context.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Infrastructure gaps pose fundamental barriers to participation. Many Indigenous communities lack reliable internet access and technology availability, limiting who can engage with digital language tools. Even when technologies are developed, communities need training to use and maintain AI tools independently, ensuring that these systems serve rather than create dependencies. Addressing these resource and infrastructure needs is essential for moving from pilot projects to sustainable, community-controlled language preservation ecosystems.<\/span><\/p>\n<h1><span style=\"font-weight: 400\">Conclusion<\/span><\/h1>\n<p><span style=\"font-weight: 400\">AI and NLP technologies hold significant promise for language preservation, addressing a critical need as many languages approach extinction due to declining numbers of speakers.<\/span><\/p>\n<p><span style=\"font-weight: 400\">However, these technologies face inherent technical limitations. Low-resource languages often lack sufficient written materials or even a formal script, making model training difficult. 
Large language models (LLMs) trained primarily on English and other major languages struggle to capture the lexical, grammatical, and semantic nuances of endangered languages.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Equally important is the role of communities. Successful preservation depends on Indigenous leadership, ethical oversight, sustained collaboration, and adequate funding. AI should not be seen as a replacement for human knowledge but as one tool among many in a broader preservation toolkit.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Ultimately, digital preservation can empower communities to maintain and revitalize their linguistic heritage. Languages are living systems that thrive through active human relationships, and technology\u2019s role is to support, not replace, these connections between people, language, and culture.<\/span><\/p>\n<h1><span style=\"font-weight: 400\">Bibliography\u00a0<\/span><\/h1>\n<p><span style=\"font-weight: 400\">Ali, M., Bhatti, Z. I., &amp; Abbas, T. (2025). Exploring the Linguistic Capabilities and Limitations of AI for Endangered Language Preservation. <\/span><i><span style=\"font-weight: 400\">Journal of Development and Social Sciences<\/span><\/i><span style=\"font-weight: 400\">, <\/span><i><span style=\"font-weight: 400\">6<\/span><\/i><span style=\"font-weight: 400\">(2), 132\u2013140.<\/span><a href=\"https:\/\/doi.org\/10.47205\/jdss.2025(6-II)12\"> <span style=\"font-weight: 400\">https:\/\/doi.org\/10.47205\/jdss.2025(6-II)12<\/span><\/a><\/p>\n<p><i><span style=\"font-weight: 400\">BERT<\/span><\/i><span style=\"font-weight: 400\">. (n.d.). Retrieved November 6, 2025, from<\/span><a href=\"https:\/\/huggingface.co\/docs\/transformers\/en\/model_doc\/bert\"> <span style=\"font-weight: 400\">https:\/\/huggingface.co\/docs\/transformers\/en\/model_doc\/bert<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Brixey, J. (Lina). (2025, January 22). 
<\/span><i><span style=\"font-weight: 400\">Using Artificial Intelligence to Preserve Indigenous Languages\u2014Institute for Creative Technologies<\/span><\/i><span style=\"font-weight: 400\">.<\/span><a href=\"https:\/\/ict.usc.edu\/news\/essays\/using-artificial-intelligence-to-preserve-indigenous-languages\/\"> <span style=\"font-weight: 400\">https:\/\/ict.usc.edu\/news\/essays\/using-artificial-intelligence-to-preserve-indigenous-languages\/<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Chaparala, K., Zarrella, G., Fischer, B. T., Kimura, L., &amp; Jones, O. P. (2024). <\/span><i><span style=\"font-weight: 400\">Mai Ho\u2019om\u0101una i ka \u2019Ai: Language Models Improve Automatic Speech Recognition in Hawaiian<\/span><\/i><span style=\"font-weight: 400\"> (arXiv:2404.03073). arXiv.<\/span><a href=\"https:\/\/doi.org\/10.48550\/arXiv.2404.03073\"> <span style=\"font-weight: 400\">https:\/\/doi.org\/10.48550\/arXiv.2404.03073<\/span><\/a><\/p>\n<p><i><span style=\"font-weight: 400\">Digital preservation of Indigenous languages: At the intersection of<\/span><\/i><span style=\"font-weight: 400\">. (n.d.). Retrieved November 6, 2025, from<\/span><a href=\"https:\/\/www.unesco.org\/en\/articles\/digital-preservation-indigenous-languages-intersection-technology-and-culture\"> <span style=\"font-weight: 400\">https:\/\/www.unesco.org\/en\/articles\/digital-preservation-indigenous-languages-intersection-technology-and-culture<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Jampel, S. (2025, July 31). <\/span><i><span style=\"font-weight: 400\">Can A.I. 
Help Revitalize Indigenous Languages?<\/span><\/i><span style=\"font-weight: 400\"> Smithsonian Magazine.<\/span><a href=\"https:\/\/www.smithsonianmag.com\/science-nature\/can-ai-help-revitalize-indigenous-languages-180987060\/\"> <span style=\"font-weight: 400\">https:\/\/www.smithsonianmag.com\/science-nature\/can-ai-help-revitalize-indigenous-languages-180987060\/<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Jiang, M. (2025, February 22). Preserving the Past: AI in Indigenous Language Preservation. <\/span><i><span style=\"font-weight: 400\">Viterbi Conversations in Ethics<\/span><\/i><span style=\"font-weight: 400\">.<\/span><a href=\"https:\/\/vce.usc.edu\/weekly-news-profile\/preserving-the-past-ai-in-indigenous-language-preservation\/\"> <span style=\"font-weight: 400\">https:\/\/vce.usc.edu\/weekly-news-profile\/preserving-the-past-ai-in-indigenous-language-preservation\/<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Soylu, D., &amp; \u015eahin, A. (2024). The Role of AI in Supporting Indigenous Languages. <\/span><i><span style=\"font-weight: 400\">AI and Tech in Behavioral and Social Sciences<\/span><\/i><span style=\"font-weight: 400\">, <\/span><i><span style=\"font-weight: 400\">2<\/span><\/i><span style=\"font-weight: 400\">(4), 11\u201318.<\/span><a href=\"https:\/\/doi.org\/10.61838\/kman.aitech.2.4.2\"> <span style=\"font-weight: 400\">https:\/\/doi.org\/10.61838\/kman.aitech.2.4.2<\/span><\/a><\/p>\n<p><i><span style=\"font-weight: 400\">Students with Skobots on their shoulders stand next to Danielle Boyer. The STEAM Connection<\/span><\/i><span style=\"font-weight: 400\">. (n.d.). [Graphic]. 
Retrieved November 6, 2025, from<\/span><a href=\"https:\/\/th-thumbnailer.cdn-si-edu.com\/8iThtG8bZkWMxUq0goedXRXlzio=\/fit-in\/1072x0\/filters:focal(616x411:617x412)\/https:\/\/tf-cmsv2-smithsonianmag-media.s3.amazonaws.com\/filer_public\/a5\/f3\/a5f3877c-f738-423d-bcff-8a44efcbe48f\/danielle-boyer-and-student-wearing-skobots_web.jpg\"> <span style=\"font-weight: 400\">https:\/\/th-thumbnailer.cdn-si-edu.com\/8iThtG8bZkWMxUq0goedXRXlzio=\/fit-in\/1072x0\/filters:focal(616x411:617x412)\/https:\/\/tf-cmsv2-smithsonianmag-media.s3.amazonaws.com\/filer_public\/a5\/f3\/a5f3877c-f738-423d-bcff-8a44efcbe48f\/danielle-boyer-and-student-wearing-skobots_web.jpg<\/span><\/a><\/p>\n<p><i><span style=\"font-weight: 400\">The Missing Scripts<\/span><\/i><span style=\"font-weight: 400\">. (n.d.). [Graphic]. Retrieved November 6, 2025, from<\/span><a href=\"https:\/\/sei.berkeley.edu\/the-missing-scripts\/\"> <span style=\"font-weight: 400\">https:\/\/sei.berkeley.edu\/the-missing-scripts\/<\/span><\/a><\/p>\n<p><i><span style=\"font-weight: 400\">What is NLP? &#8211; Natural Language Processing Explained &#8211; AWS<\/span><\/i><span style=\"font-weight: 400\">. (n.d.). Amazon Web Services, Inc. Retrieved November 6, 2025, from<\/span><a href=\"https:\/\/aws.amazon.com\/what-is\/nlp\/\"> <span style=\"font-weight: 400\">https:\/\/aws.amazon.com\/what-is\/nlp\/<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Zhang, S., Frey, B., &amp; Bansal, M. (2022). How can NLP Help Revitalize Endangered Languages? A Case Study and Roadmap for the Cherokee Language. In S. Muresan, P. Nakov, &amp; A. Villavicencio (Eds.), <\/span><i><span style=\"font-weight: 400\">Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)<\/span><\/i><span style=\"font-weight: 400\"> (pp. 1529\u20131541). 
Association for Computational Linguistics.<\/span><a href=\"https:\/\/doi.org\/10.18653\/v1\/2022.acl-long.108\"> <span style=\"font-weight: 400\">https:\/\/doi.org\/10.18653\/v1\/2022.acl-long.108<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>AI for Language and Cultural Preservation Abstract Nearly half of the world&#8217;s languages face extinction, threatening irreplaceable knowledge and cultural connections. This paper examines how artificial intelligence can support endangered language documentation and revitalization when guided by community priorities. Through case studies, from Hawaiian speech recognition to Cherokee learning platforms, the paper identifies both opportunities [&hellip;]<\/p>\n","protected":false},"author":744,"featured_media":2071,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[65],"tags":[85,87,96,60,254,227],"class_list":{"0":"post-2068","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-csci-tech","8":"tag-ai","9":"tag-ai-ethics","10":"tag-artificial-intelligence","11":"tag-csci-tech","12":"tag-nlp","13":"tag-technology","14":"entry"},"featured_image_src":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/Screenshot-2025-12-11-at-10.04.12-AM-600x400.png","featured_image_src_square":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-content\/uploads\/sites\/35\/2025\/12\/Screenshot-2025-12-11-at-10.04.12-AM-600x600.png","author_info":{"display_name":"Wing Kiu Lau 
'26","author_link":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/author\/wlau\/"},"_links":{"self":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts\/2068","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/users\/744"}],"replies":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/comments?post=2068"}],"version-history":[{"count":0,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/posts\/2068\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/media\/2071"}],"wp:attachment":[{"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/media?parent=2068"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/categories?post=2068"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/students.bowdoin.edu\/bowdoin-science-journal\/wp-json\/wp\/v2\/tags?post=2068"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}