Pelicans on Bicycles: Simon Willison's Amusing Test for LLMs
Simon Willison tests LLMs with a prompt to 'generate an SVG pelican on a bicycle.' While it seems like a joke, the results reveal genuine insights about…
AI-processed from Habr AI; edited by Hamidun News
Simon Willison, co-creator of the Django web framework, came up with an unusual way to test LLMs: asking the model to draw a pelican riding a bicycle as SVG. At first glance this looks like a joke, but the results have proved more informative than many serious benchmarks.
Where the Pelican Test Came From
Willison's idea is straightforward: drawing in SVG while handling a composite scene (pelican + bicycle + motion) exposes the real limits of a model's capabilities. SVG is structured code, so plausible-looking text is not enough: the output has to be well-formed markup whose coordinates place each shape sensibly. It's like asking the AI not only to think but to build, translating an idea into a specific format.
Almost every new LLM version interprets the task in its own way: some generate syntactically correct SVG with an anatomically recognizable pelican, others produce whimsical birds with approximate geometry, and still others confuse a bicycle with a cyclist or draw something completely unexpected.
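The "syntactically correct SVG" part of the test can be checked mechanically. The sketch below is only an illustration, not Willison's own tooling: it uses Python's standard `xml.etree.ElementTree` parser, and the helper name `inspect_svg`, the `SHAPE_TAGS` set, and the sample strings are hypothetical. It reports whether a model's output parses as an SVG document and how many basic shape elements it contains. It says nothing about whether the drawing looks like a pelican.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Basic SVG drawing elements; an illustrative selection, not an exhaustive list.
SHAPE_TAGS = {"circle", "ellipse", "line", "path", "polygon", "polyline", "rect"}


def inspect_svg(text):
    """Return (is_well_formed_svg, shape_count) for a model's raw output.

    Hypothetical helper: checks syntax only, not drawing quality.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Unclosed tags, stray text, etc. make the output unusable as SVG.
        return False, 0
    # ElementTree represents namespaced tags as "{namespace}name".
    if root.tag != f"{{{SVG_NS}}}svg":
        return False, 0
    shapes = sum(1 for el in root.iter() if el.tag.split("}")[-1] in SHAPE_TAGS)
    return True, shapes


# Made-up sample outputs: two "wheels" and a "frame", then a truncated document.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<circle cx="30" cy="70" r="15"/>'
    '<circle cx="70" cy="70" r="15"/>'
    '<path d="M30 70 L50 40 L70 70"/>'
    '</svg>'
)
broken = '<svg xmlns="http://www.w3.org/2000/svg"><circle cx="30" cy="70" r="15">'

print(inspect_svg(sample))  # (True, 3)
print(inspect_svg(broken))  # (False, 0)
```

A check like this only separates output that a renderer can open from output it cannot. Judging proportions, composition, and whether the bird is actually on the bicycle still requires looking at the rendered image.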
What the Experiment Reveals
The test reveals several qualities of a model without relying on classical metrics:
- Understanding of geometry, proportions, and space
- Ability to generate structured, working code
- Interpretation of composite images (animal + object + action in one)
- Creativity and ability to find non-trivial solutions
- Control over details and ability to maintain context
While the SVG pelican doesn't directly help assess performance on production tasks, the results often correlate with the model's overall capability and comprehension.
In Russian: Coding Cats
The authors of the Habr article repeated the experiment in Russian with a prompt that translates as 'create an SVG cat that codes.' The results differed from the English version: Russian-language models interpret the task differently. Some put a laptop in the cat's paws, others draw a screen showing code, and still others seat the cat at a desk in front of a monitor. This suggests that cultural context and language features influence how the task is perceived, even at the level of basic geometric objects and scenarios.
What This Means
Willison's SVG test is a reminder that evaluating LLM capabilities cannot be reduced to standard benchmarks and curated datasets. Sometimes the simplest and most amusing questions reveal the limits of neural networks more honestly than complex professional tests. And each new model passes this test in its own way, leaving traces of its 'thinking.'