Habr AI→ original

Habr showed how to train a mini-LLM in C# using ILGPU and integrated AMD graphics

Habr showcased a complete cycle of building a micro-LLM in C# without Python and without requiring an NVIDIA card. The author used ILGPU and OpenCL, trained…

AI-processed from Habr AI; edited by Hamidun News
Habr showed how to train a mini-LLM in C# using ILGPU and integrated AMD graphics
Source: Habr AI. Collage: Hamidun News.
◐ Listen to article

A detailed practical breakdown was published on Habr, which shows a rare scenario for the local AI community: a small language model can be assembled and trained from scratch in C#, without Python and without mandatory NVIDIA graphics, and then saved to GGUF and run in LM Studio. The author used the ILGPU library and OpenCL, relying on built-in AMD graphics, and got a working, albeit very limited, LLM prototype of about 422 KB in size. The project was conceived as an educational demonstration of the full cycle, not as an attempt to compete with large open-source models.

The article discusses the basic architecture of a transformer: tokenization, embedding layer, self-attention, feed-forward, normalization, and output projection. For training, the author prepared two small datasets: a short pretrain corpus with general facts about the world and an instruction-tuning sample with questions and answers. The goal was not to get a strong model, but to manually go through all stages — from vocabulary to inference.

The final configuration turned out to be extremely compact: vocabulary of 512 tokens, context length of 64, embedding of 64, hidden layer of 128, one transformer layer and two attention heads. In total, this is 103,744 parameters. Before export, the model can be saved to a binary file of about 6.

87 MB, but the final GGUF file is about 422 KB and is recognized as GGUF V3. After training, the pretrain loss dropped to approximately 0.212, and after fine-tuning for dialog format, the finetune loss dropped to 0.

3926. For such a scale, this is more of an indication that the scheme works technically rather than proof of answer quality. The training logic itself is also important.

First, the model goes through a pretraining phase on a short corpus of dozens of sentences, then a separate instruction tuning stage on dialog examples. The program interface has a menu for retraining, tokenizer testing, weight saving, and launching an interactive chat. For the response, the author added a primitive confidence filter: if the minimum confidence across tokens falls below the threshold, the system outputs not an answer, but the phrase "I don't know".

This is a crude but understandable way to hide obvious garbage, which is almost inevitable for a model of this size. This is what makes the demo suitable for at least basic pipeline validation. The most difficult part turned out to be not the forward pass and not the training itself, but compatibility with the llama.

cpp ecosystem. The author first tried to build the model in a configuration more similar to LLaMA, but ran into errors when loading and tensor count mismatches. In the end, he had to simplify the architecture, rebuild the tokenizer based on Microsoft.

ML.Tokenizers, and carefully describe GGUF metadata: architecture, context length, attention parameters, special tokens, and chat template. A separate technical nuance — the model actually doesn't hold a long dialog history, so for chat a template with only the last user message is used, otherwise the tiny context quickly breaks inference.

The practical result is modest but indicative. The model answers simple queries, opens in LM Studio and can work through a compatible stack, however llama.cpp warns about degradation of generation quality, and the author directly states that training is still not good enough.

In addition, inference breaks on some queries with question marks and exclamation marks, so the project remains an experiment and training ground. But the work itself is important for another reason: it shows that the entry threshold for low-level LLM experiments can be significantly lowered. For developers from the .

NET world, this is a clear example of how to assemble a minimal model, verify an OpenCL scenario outside of CUDA monopoly, and understand what details actually make up the local running of a language model.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

What do you think?
Loading comments…