Azerbaijani LLM on SageMaker: How Azercell Solved the Rare Language Problem
Azercell, an Azerbaijani telecom operator, developed its own language model on Amazon SageMaker AI. The company set itself an ambitious goal: a production-grade LLM for a morphologically complex language with scarce ready-made training data and no existing solutions on the market.
Why Azerbaijani is a complex case
Azerbaijani is an agglutinative language with rich morphology. A single word can carry several suffixes, each of which changes its meaning or grammatical function. Tokenization and model training therefore call for different approaches than those used for Indo-European languages. On top of the morphological complexity comes a critical constraint: the volume of open training data in Azerbaijani is far smaller than for English, Russian, or Spanish. Standard LLM training methods, developed and tested on large text corpora, do not carry over directly.
- Morphological complexity requires specialized tokenization
- Data scarcity: over 100 times less text than for major languages
- Lack of existing examples and best practices for LLMs in Azerbaijani
- Need to adapt foundation models trained on English-language data
- Requirement to integrate the model into production telecom systems
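To make the tokenization point concrete, here is a minimal sketch of how one Azerbaijani word breaks into a stem and a chain of suffixes. The suffix list and the `segment` function are hypothetical and deliberately tiny; they are not Azercell's tokenizer, and a real system would more likely learn subword units from data than rely on a hand-written list.

```python
# Toy illustration of agglutination, not a production tokenizer.
# SUFFIXES is a hypothetical, incomplete inventory: real Azerbaijani has many
# more suffixes, each with vowel-harmony variants.
SUFFIXES = ["lər", "lar", "imiz", "ımız", "dən", "dan", "də", "da"]


def segment(word: str, min_stem: int = 2) -> list[str]:
    """Peel known suffixes off the end of a word, longest match first."""
    pieces: list[str] = []
    stem = word
    while True:
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            # Keep at least min_stem characters so short words stay intact.
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                pieces.insert(0, suffix)
                stem = stem[: -len(suffix)]
                break
        else:
            # No suffix matched on this pass: what remains is the stem.
            break
    return [stem] + pieces


# "evlərimizdən" means "from our houses":
# ev (house) + lər (plural) + imiz (our) + dən (from)
print(segment("evlərimizdən"))
print(segment("kitablarımızda"))

# The letter "ə" takes two bytes in UTF-8, so byte-level vocabularies built
# mostly from English text spend extra tokens on Azerbaijani words.
print(len("evlərimizdən"), len("evlərimizdən".encode("utf-8")))
```

A greedy stripper like this will mis-split words whose stem happens to end in a suffix-like string, which is one reason morphology-aware tokenization is harder than a lookup table suggests.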
How Azercell solved the task
The company partnered with the AWS Generative AI Innovation Center. Over six intensive weeks of joint work, specialists from both sides built a production-ready framework on Amazon SageMaker. The solution had several key components: careful preparation and cleaning of the available data, specialized tokenization that accounts for Azerbaijani morphology, and a training process optimized for smaller data volumes. Engineers used transfer learning, adapting already trained models instead of training from scratch on an Azerbaijani corpus.
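The data preparation step can be sketched roughly as follows. This is an assumption-laden illustration rather than Azercell's pipeline: the function names, the 20-character minimum, and the exact-match deduplication strategy are all hypothetical. The one language-specific detail it shows is casing: in the Azerbaijani alphabet, uppercase I lowercases to dotless ı and uppercase İ to i, which Python's default `str.lower()` does not do.

```python
import hashlib
import re
import unicodedata


def az_lower(text: str) -> str:
    """Lowercase with Azerbaijani casing rules: I -> ı and İ (U+0130) -> i."""
    return text.replace("\u0130", "i").replace("I", "ı").lower()


def normalize(text: str) -> str:
    """Compose Unicode to NFC and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def clean_corpus(lines, min_chars=20):
    """Drop very short lines and case-insensitive exact duplicates.

    min_chars is an arbitrary illustrative threshold, not a recommended value.
    """
    seen = set()
    kept = []
    for raw in lines:
        text = normalize(raw)
        if len(text) < min_chars:
            continue
        # Hash the case-folded text so duplicates differing only in case match.
        key = hashlib.sha256(az_lower(text).encode("utf-8")).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


sample = [
    "Tarif  planı haqqında   məlumat verin.",
    "TAR\u0130F PLANI HAQQINDA MƏLUMAT VER\u0130N.",  # same sentence, uppercased
    "Salam",  # too short to keep
]
print(clean_corpus(sample))
```

With little data available, each duplicate or mis-normalized line carries more weight than it would in a large English corpus, so this kind of cleaning is usually done before the tokenizer is trained.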
What resulted: two roles for the model
Azercell's model serves two roles. First, it powers a customer-facing chatbot that answers subscribers' questions about services and tariffs in Azerbaijani. Second, it supports internal business processes: processing incoming requests, analyzing call-center speech, classifying issues, and personalizing service recommendations. Working directly in Azerbaijani avoids the loss of meaning that translation introduces and helps the model handle local context and nuances of speech.
What this means
This is the first public example of a fully functional LLM for Azerbaijani developed on cloud infrastructure. The case shows that cloud platforms can be used to adapt LLMs not only to rare languages but also to industry-specific tasks. For other companies in the region, the signal is clear: investing in a language model of your own is realistic and achievable within several weeks.