Anthropic keeps making tests harder because of cheating with Claude
Anthropic's performance optimization team has run into an unusual challenge: its Claude model has become too effective at solving test assignments for job candi
AI-processed from TechCrunch; edited by Hamidun News
As artificial intelligence develops rapidly and powerful language models become available to a wide range of users, traditional methods of assessing specialists' qualifications face unprecedented challenges. The performance optimization team at the American company Anthropic, known for developing the Claude model, has encountered an unusual problem: its own creation, along with similar AI tools, has begun to "cheat" in job interviews.
Since the beginning of 2024, Anthropic has used take-home coding assignments to test the technical skills of candidates for engineering positions. The format is meant to let applicants demonstrate their knowledge in a calmer, more considered setting than a standard interview allows. However, AI coding tools such as GitHub Copilot, together with improvements in large language models themselves, have made these assignments too easy to solve with machine help. As a result, objective assessment became difficult: it was impossible to say with certainty whether the applicant or an AI had solved the task.
To counter this trend and prevent cheating, Anthropic engineers must constantly revise the test assignments and make them harder. The goal is to create problems that require deep contextual understanding, unconventional approaches, and creativity, areas where modern AI models, despite their impressive capabilities, still fall short of humans. Such assignments go beyond reproducing known algorithms or writing routine code: candidates must analyze complex systems, make decisions under uncertainty, integrate disparate knowledge, and demonstrate original thinking. Tasks like these are harder to automate and harder to "feed" to a language model in exchange for a ready-made solution.
The situation highlights how much harder it has become to assess human skills now that powerful AI tools are widely available. The boundary between human competence and the capabilities of artificial intelligence is increasingly blurred. Companies around the world are beginning to consider new approaches to recruitment and personnel evaluation that account for today's technological landscape.
Perhaps the future lies in evaluating not only the final result but also the process of solving the problem itself, in analyzing the thought process, in testing the ability to adapt and learn. Anthropic, facing this challenge at the forefront, is effectively demonstrating how the industry is forced to adapt to changing conditions where AI becomes not only a tool for work but a factor that changes the rules of the game in the hiring process.
In conclusion, the constant increase in complexity of test assignments at Anthropic is a striking example of how technological progress requires flexibility and innovation in all spheres of human activity, including the personnel selection process. This forces us to reconsider the very concept of professional competence and seek new, more reliable methods of evaluation that can distinguish genuine human talent from skillfully generated AI responses. An era in which AI becomes a universal assistant requires new approaches to assessing what it truly means to be a competent specialist.