Azerbaijani LLM on SageMaker: How Azercell Solved the Rare Language Problem

Q: What is the source of this material?

Originally published on the AWS Machine Learning Blog. Hamidun News processes and adapts materials with the help of AI.

Q: When was it published?

2026-05-29. Reading time: 3 min.


Hamidun News Editorial

AI monitoring · AWS Machine Learning Blog




Azercell, an Azerbaijani telecom operator, developed its own language model on Amazon SageMaker AI. The company set itself an ambitious goal: to build a production-grade LLM for a morphologically complex language with scarce ready-made data and no existing off-the-shelf solutions.

Why Azerbaijani is a complex case

Azerbaijani is a typical agglutinative language with rich morphology. A single word can carry several suffixes, each of which changes its meaning or grammatical function. That calls for different approaches to tokenization and model training than those used for most Indo-European languages. On top of the morphological complexity comes a critical constraint: far less open training data exists in Azerbaijani than in English, Russian, or Spanish. Standard LLM training recipes, developed and tested on large text corpora, cannot be applied directly here.

Morphological complexity requires specialized tokenization
Data scarcity: over 100 times less text than for major languages
Lack of existing examples and best practices for LLMs in Azerbaijani
Need to adapt foundation models trained on English-language data
Requirement to integrate the model into production telecom systems
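To make the morphology point concrete, here is a minimal sketch of how a single Azerbaijani word stacks suffixes. It is a toy illustration only: the suffix list, the stem set, and the function name split_suffixes are hypothetical, and nothing here reflects the tokenizer Azercell actually built. The example word evlərimizdən means "from our houses" (ev "house", -lər plural, -imiz "our", -dən "from").

```python
# Toy illustration: a hand-written suffix list, not a real morphological
# analyzer and not the tokenizer described in the article.
SUFFIXES = ["lər", "lar", "imiz", "ımız", "dən", "dan", "də", "da"]


def split_suffixes(word: str, stems: set[str]) -> list[str]:
    """Peel known suffixes off the end of `word` until a known stem remains."""
    pieces: list[str] = []
    rest = word
    while rest not in stems:
        # Try longer suffixes first so "dən" is preferred over "də".
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if rest.endswith(suffix) and len(rest) > len(suffix):
                pieces.insert(0, suffix)
                rest = rest[: -len(suffix)]
                break
        else:
            # No suffix matched: keep the remainder as a single piece.
            break
    return [rest] + pieces


if __name__ == "__main__":
    print(split_suffixes("evlərimizdən", {"ev"}))
```

A tokenizer trained mostly on English text has no reason to place its boundaries at these suffix seams, which is one plausible reading of why the article stresses morphology-aware tokenization.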

How Azercell solved the task

The company partnered with the AWS Generative AI Innovation Center. Over six intensive weeks of joint work, specialists from both sides built a production-ready framework on Amazon SageMaker. The solution had three key components: careful preparation and cleaning of the existing data, specialized tokenization that accounts for Azerbaijani morphology, and a training process optimized for smaller data volumes. Engineers relied on transfer learning, adapting already trained models rather than training from scratch on an Azerbaijani corpus.
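The article does not say how the data was cleaned, so the following is only a sketch of what a basic text-cleaning pass commonly looks like, using the Python standard library. The function name clean_corpus and the min_chars threshold are assumptions made for illustration, not details from the source.

```python
import re
import unicodedata


def clean_corpus(lines, min_chars=20):
    """Normalize, filter, and de-duplicate raw text lines (illustrative only)."""
    seen = set()
    cleaned = []
    for line in lines:
        # NFC normalization makes visually identical strings compare equal,
        # e.g. a precomposed "ö" versus "o" plus a combining diaeresis.
        text = unicodedata.normalize("NFC", line)
        # Collapse runs of whitespace and trim the ends.
        text = re.sub(r"\s+", " ", text).strip()
        # Drop fragments too short to be useful training text.
        if len(text) < min_chars:
            continue
        # Exact-match de-duplication on the normalized text.
        if text in seen:
            continue
        seen.add(text)
        cleaned.append(text)
    return cleaned
```

With a small corpus, every duplicated or malformed line takes up a larger share of the training signal, so even a simple pass like this matters more than it would for a major language.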

What resulted: two roles for the model

Azercell's model serves two roles. First, it powers a customer-facing chatbot that answers subscribers' questions about services and tariffs in Azerbaijani. Second, it supports internal business processes: handling incoming requests, analyzing call-center speech, classifying issues, and personalizing service recommendations. Working directly in Azerbaijani avoids the loss of meaning that comes with translation and helps the model handle local context and nuances of speech.

What this means

This is the first public example of a fully functional LLM for Azerbaijani developed on cloud infrastructure. The case shows that cloud platforms can be used to adapt LLMs not only to languages with little training data but also to industry-specific tasks. For other companies in the region, the signal is clear: building your own language model is realistic and achievable within several weeks.

