Andrej Karpathy fit GPT into 243 lines of pure Python

Q: What is the source?

Originally published on Habr AI. Hamidun News processes and adapts the material with AI.

Q: When was it published?

2026-02-25. Reading time: 3 min.

Andrej Karpathy released microGPT, a full implementation of the transformer architecture in 243 lines of pure Python with no external dependencies.

Hamidun News Editorial



There is a genre of programming that is closer to poetry than to engineering: a complex system is compressed to its essence, with everything unnecessary removed, until only the mathematics and logic remain. This is what Andrej Karpathy, a founding member of OpenAI and former head of AI at Tesla, did in February 2026 when he published microGPT, a complete implementation of the transformer architecture that trains and generates text in 243 lines of pure Python.

To appreciate the scale of this achievement, you need some context. Modern language models like GPT-4 or Claude mean billions of parameters, thousands of GPUs, months of training, and codebases maintained by hundreds of engineers. Behind all this industrial power, it is easy to forget that at the core lies a relatively elegant mathematical construction described in the 2017 paper "Attention Is All You Need." Karpathy took this construction and showed that it fits on a few screens of code, without PyTorch, without NumPy, without a single external library.

The microGPT code implements all the key components of the transformer architecture: tokenization, positional encoding, self-attention, the forward pass through fully connected layers, normalization, and backpropagation for training. Every mathematical operation is written out by hand, including matrix multiplication, softmax, and the activation functions. Anyone with a basic grasp of linear algebra and Python can therefore open the file and trace the whole path from input text to generated token without running into framework abstractions.
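To give a sense of what "written out by hand" can look like, here is a minimal sketch of matrix multiplication, softmax, and single-head causal self-attention using only the standard library. It is not taken from microGPT; the function names (matmul, softmax, causal_self_attention) and the list-of-lists matrix layout are assumptions made for illustration, and it covers only the forward pass, not training.

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(x, wq, wk, wv):
    """Single-head causal self-attention over a sequence x (seq_len x d_model).

    wq, wk, wv are hypothetical projection matrices (d_model x d_head).
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for t, q_t in enumerate(q):
        # Position t may only attend to positions 0..t (the causal mask).
        scores = [scale * sum(a * b for a, b in zip(q_t, k[s])) for s in range(t + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([sum(w * v[s][j] for s, w in enumerate(weights))
                    for j in range(len(v[0]))])
    return out

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy "embeddings"
    for row in causal_self_attention(tokens, identity, identity, identity):
        print(row)
```

With identity projections, the first position can only see itself, so its output is its own value vector; later positions produce a softmax-weighted mix of everything up to and including themselves.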

Karpathy humbly calls microGPT an "art project," and the label is more accurate than it might appear. It is not a tool for practical use: a model trained this way will not hold a meaningful conversation and will not replace ChatGPT. Pure Python without optimized libraries runs several orders of magnitude slower than specialized frameworks. The value of the project lies elsewhere. It demystifies a technology that is shaping the modern world.

For Karpathy, this approach is nothing new. He has long been regarded as one of the best popularizers of deep learning. His neural networks course at Stanford became a classic, and the YouTube series "Neural Networks: Zero to Hero" helped tens of thousands of people understand the fundamentals. The microGPT project continues this line but raises the bar: where Karpathy previously explained architectures with the help of PyTorch, he has now removed the last layer of abstraction. Nothing remains between the reader and the mathematics of the transformer.

The consequences of this step go beyond education. The artificial intelligence industry is experiencing a paradoxical moment: the technology is becoming increasingly influential, but at the same time increasingly opaque. Companies are closing their models, publishing fewer technical details, and the gap between those who create AI and those who use it is growing. In this context, projects like microGPT fulfill a crucial function — they return fundamental understanding of the technology to the public sphere. When a politician, journalist, or simply a curious engineer from an adjacent field wants to understand what GPT really is, 243 lines of code provide a more honest answer than any marketing document.

There is also a practical aspect. For students and early-career researchers, microGPT is an ideal sandbox. You can modify the attention mechanism and see what happens. You can change the activation function, experiment with the context window size, or add your own variant of positional encoding. When the entire code is in front of you and every line is understandable, experimentation turns from black magic into the scientific method.
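As an illustration of the kind of experiment described above, the sketch below makes the activation function a swappable argument of a small feed-forward block and adds the fixed sinusoidal positional encoding from "Attention Is All You Need" as one possible variant to try. None of this is microGPT's own code; the names relu, gelu, sinusoidal_position, and mlp, and the weight layout (one row of input weights per output unit), are assumptions chosen for the example.

```python
import math

def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)

def gelu(x):
    """GELU, using the common tanh approximation."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def sinusoidal_position(pos, d_model):
    """Fixed sinusoidal encoding for one position: sin on even indices, cos on odd."""
    vec = []
    for i in range(d_model):
        angle = pos / (10000 ** (2 * (i // 2) / d_model))
        vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return vec

def mlp(x, w1, w2, activation=relu):
    """Two-layer feed-forward block applied to one vector x.

    w1 and w2 hold one row of input weights per output unit (hypothetical layout).
    """
    hidden = [activation(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

if __name__ == "__main__":
    w1 = [[1.0, 0.0], [0.0, 1.0]]
    w2 = [[1.0, 1.0]]
    x = [1.0, -1.0]
    # Same weights, different activation: compare the outputs side by side.
    print("relu:", mlp(x, w1, w2, activation=relu))
    print("gelu:", mlp(x, w1, w2, activation=gelu))
    print("position 3:", sinusoidal_position(3, 4))
```

Passing the activation as a parameter keeps each experiment to a one-line change, which is the property the paragraph above attributes to a small, fully readable codebase.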

In the end, microGPT is a reminder that behind the trillion-dollar valuations of AI companies and the talk of artificial superintelligence stands mathematics that fits on a few pages. Scale and computational power turn this mathematics into something remarkable, but its essence remains within reach of understanding. And as long as there are people like Karpathy willing to spend time making the complex simple, the industry has a chance to remain not only powerful but also transparent.

