
Pelicans on Bicycles: Simon Willison's Amusing Test for LLMs


AI-processed from Habr AI; edited by Hamidun News
Source: Habr AI. Collage: Hamidun News.

Simon Willison, co-creator of the Django web framework, came up with an unusual way to test LLMs: asking a model to draw a pelican riding a bicycle as SVG. At first glance this looks like a joke, but the results have proved more informative than many serious benchmarks.

Where the Pelican Test Came From

Willison's idea is straightforward: drawing in SVG while handling a composite scene (pelican + bicycle + motion) exposes the real limits of a model's capabilities. SVG is structured code, so the model cannot paint pixels; it has to describe every shape in text, with explicit coordinates and valid markup. It is like asking the model not only to think but also to build, translating an idea into a specific format. Almost every new LLM version interprets the task in its own way: some generate syntactically correct SVG with an anatomically recognizable pelican, others produce whimsical birds with approximate geometry, and still others confuse the bicycle with a cyclist or draw something completely unexpected.
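The "syntactically correct SVG" part of this is the one thing that can be checked mechanically. The following is a minimal sketch of such a check, not something described in the original article: the function name `is_wellformed_svg` and the two sample strings are hypothetical, and the check only confirms that the text parses as XML with an `<svg>` root. It says nothing about whether the picture looks like a pelican.

```python
import xml.etree.ElementTree as ET

# Standard SVG namespace URI.
SVG_NS = "http://www.w3.org/2000/svg"


def is_wellformed_svg(text: str) -> bool:
    """Return True if `text` parses as XML and its root element is <svg>.

    Hypothetical helper for illustration; it checks syntax only,
    not the quality or content of the drawing.
    """
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        return False
    # Accept both a namespaced root and a bare <svg> root.
    return root.tag in (f"{{{SVG_NS}}}svg", "svg")


# Made-up sample outputs: one valid, one with an unclosed <circle>.
good = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<circle cx="50" cy="50" r="20"/></svg>'
)
bad = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<circle cx="50" cy="50" r="20"></svg>'
)

print(is_wellformed_svg(good))
print(is_wellformed_svg(bad))
```

A check like this only filters out broken markup; judging the pelican itself still requires rendering the file and looking at it.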

What the Experiment Reveals

The test probes several model qualities without relying on classical metrics:

  • Understanding of geometry, proportions, and space
  • Ability to generate structured, working code
  • Interpretation of composite images (animal + object + action in one)
  • Creativity and ability to find non-trivial solutions
  • Control over details and ability to maintain context

While the SVG pelican does not directly measure performance on production tasks, the results often correlate with a model's overall capability and how well it understands the task.
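Of the qualities listed above, only the "structured, working code" item lends itself to automated inspection. As a rough illustration, and assuming one wanted a crude structural summary to compare outputs, the sketch below tallies which SVG elements a model used. The function `count_svg_elements` and the sample markup are hypothetical and not part of the test as described; element counts are not a measure of drawing quality.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_svg_elements(svg_text: str) -> Counter:
    """Tally element names in an SVG document (hypothetical helper).

    Raises xml.etree.ElementTree.ParseError if the markup is not
    well-formed XML.
    """
    root = ET.fromstring(svg_text.strip())
    # ElementTree prefixes tags with "{namespace}"; keep only the local name.
    return Counter(el.tag.rsplit("}", 1)[-1] for el in root.iter())


# Made-up output: two wheels and one stroke for a frame.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 100">'
    '<circle cx="50" cy="70" r="20"/>'
    '<circle cx="150" cy="70" r="20"/>'
    '<path d="M50 70 L100 40 L150 70"/>'
    '</svg>'
)

counts = count_svg_elements(sample)
print(counts["circle"], counts["path"])
```

Such a tally can show, for instance, whether an output contains two wheel-like circles at all, but the other qualities in the list still have to be judged by eye.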

In Russian: Coding Cats

The authors of the Habr article repeated the experiment in Russian with the prompt 'create an SVG cat that codes.' The results differed from the English version, with models reading the Russian-language task in their own ways. Some put a laptop in the cat's paws, others have the cat hold a screen with code on it, and still others draw a cat sitting at a desk in front of a monitor. This suggests that cultural context and language influence how the task is interpreted, even at the level of basic geometric objects and scenes.

What This Means

Willison's SVG test is a reminder that evaluating LLM capabilities cannot be reduced to standard benchmarks and the datasets that models are trained on. Sometimes the simplest and most amusing questions reveal the limits of neural networks more honestly than complex professional tests. And each new model passes this test in its own way, leaving traces of its 'thinking.'
