Guardian→ original

Memvid startup seeks "AI bully": $800 per day to catch chatbots failing

Memvid from California is seeking an "AI bully" — someone who will spend eight hours straight arguing with popular chatbots and catching them in memory…

AI-processed from Guardian; edited by Hamidun News
Memvid startup seeks "AI bully": $800 per day to catch chatbots failing
Source: Guardian. Collage: Hamidun News.
◐ Listen to article

California-based startup Memvid has posted a job listing that sounds like a joke, but solves a very real problem in the industry. The company is prepared to pay $800 per working day to a person who will deliberately stress test popular chatbots and document where they lose context, become confused, and start making things up.

The Job Description

According to the role description, the future employee will spend eight hours straight communicating with leading chatbots and be extremely rigorous with them. The point isn't toxicity for its own sake, but rather deliberately creating uncomfortable scenarios for the model: returning to old topics, repeating the same questions, spotting contradictions, and pushing for admission of errors. Essentially, it's a manual stress test that checks not the speed of responses, but the reliability of memory and the ability to maintain a long conversation thread without failures.

  • repeatedly ask the same question in different formulations
  • bring the bot back to something it said several messages earlier
  • catch contradictions, factual substitutions, and confident fabrications
  • record all failures and the model's reaction for further analysis

For the position, you don't need a programmer's degree or experience working in an AI team. Memvid directly states that the main plus would be "extensive personal experience with technological disappointments" and patience to repeatedly push for a coherent answer. The company is looking for someone who won't give up after the first beautiful but incorrect response.

According to the founder, applications are already coming from knowledge workers—people who rely on AI services every day and are especially quick to notice when they start forgetting context.

Why Memvid Needs This

Memvid co-founder and CEO Mohamed Omar explains the idea simply: almost all the value of conversational AI rests on memory. If a system cannot reliably remember what you talked about a minute ago, it starts masking gaps with plausible but incorrect answers.

According to him, as far back as 2024, the company encountered the fact that available memory solutions on the market worked unstably, which meant any long dialogue risked turning into a set of guesses at some point.

"Memory for AI is the holy grail,"

Omar describes the main bottleneck in modern chatbots.

That's where this job posting comes from: Memvid wants to turn ordinary user frustration into an observable metric. One applicant, as Omar recounted, spends nearly $300 a month on subscriptions to various AI platforms and encountered memory problems in literally every service.

For the startup, this is an important signal: context bugs are no longer considered a rare edge case. They occur in people who rely on chatbots for actual work, not just experimenting with them in the evenings.

The Problem Is Bigger

The "AI bully" story sounds viral, but is based on a broader backdrop. The article cites a peer-reviewed paper presented at ICLR in 2025: even leading commercial AI systems lost 30% to 60% accuracy when required to maintain facts over a long dialogue. In other words, a model can answer individual queries brilliantly, but noticeably weaken when the conversation becomes a chain of dependent steps.

These are precisely the scenarios in which AI is increasingly being used at work.

The consequences are already going beyond an inconvenient interface. In March, the Irregular lab demonstrated that AI agents in a simulated corporate environment could bypass protective restrictions, interact with sensitive data, and perform potentially harmful actions even without direct commands.

In the legal field, according to researcher Damien Charloton, the number of AI hallucinations in documents grew from roughly two cases a week to two to three a day by fall 2025. The ECRI Institute included risks of AI diagnosis among the top patient safety threats for 2026.

What It Means

Memvid's unusual job listing shows an important shift: the market is beginning to measure AI quality not by demos and not by benchmarks, but by how the model behaves in a long, frustrating, and uneven real-world workflow. If chatbots are becoming a working tool for analysts, lawyers, doctors, and office teams, then memory, consistency, and the ability to honestly admit mistakes transform from nice bonuses into mandatory product requirements.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

What do you think?
Loading comments…