Habr AI

Habr AI: How Pipeline Triad Assembles an AI Agent Pipeline Instead of a Development Team
Habr AI examined Pipeline Triad — a model where development stages pass through AI agent triads, with humans involved only at four control p

Gramax showed how to compare RAG answer quality without manual eyeballing
Gramax explained why retrieval metrics are insufficient for RAG, and proposed evaluating not retrieved chunks, but the final user answer — b

How LLM Guardrails in Java Block Injections and Toxic Responses
An analysis of why a single system prompt is insufficient to protect LLMs, and how Java guardrails intercept dangerous inputs and filter tox

Anthropic and Mythos: why a banking threat quickly became a risk for everyone
Anthropic presented Mythos as too dangerous for public access, but the real risk proved to be not in banking but in small businesses' inabil

Anthropic and Claude Mythos: why critics call the model launch an expensive PR spectacle
A critical column on Claude Mythos argues that Anthropic is selling not just an AI model, but a myth about its near-human nature, amplifying

AI Assistants in 2026: How a Solo Developer Became Faster Than a Team of Three
The author demonstrates that in 2026, a single developer equipped with a set of open-source AI tools can write, test, and commit code faster

ecom.tech compared evolutionary fine-tuning of Qwen3-4B with SFT and GRPO for Kotlin tests
The ecom.tech team fine-tuned Qwen3-4B-Instruct for generating unit tests in Kotlin and showed that the evolutionary algorithm outperforms S

Yandex Code Assistant tested on secrets handling and compared against Cursor
An engineer at Jet Infosystems tested Yandex Code Assistant on secrets management and showed the agent is nearly on par with Cursor, but s

Claude and Qwen Omni: How a Developer Integrated Video Analysis into a Production Pipeline
A Habr author connected Claude with Qwen Omni to work around the lack of native video processing and automatically categorize 29 animation r

How Sovcombank Reduced Product Team Routine Work by 50% Using an AI Assistant
Sovcombank built an AI assistant based on an LLM and a unified prompt to take documentation, approvals, and part of analytics off product manag

Critics call OpenAI's partnership with McKinsey and Accenture a bet on AI hype
The author of a scathing column argues that OpenAI's Frontier Alliances program sells businesses not a ready-made solution, but an expensive

Google Veo, Runway and Kling rank among the top free AI video generators in 2026
The authors compared ten popular free video generators, including Google Veo, Runway and Kling, and tested them on a complex scene featuring

Rufler simplifies agent swarms in Claude Code: one config instead of manual orchestration
Open-source tool Rufler reduces launching autonomous agents in Claude Code to a single config, automatically assembles roles, tasks, and MCP

Claude Code helped build a graph analysis app in under an hour — developer case study
A developer with low expectations built a working graph analysis app in about an hour, but then spent another three weeks on testing, docume

IBS explains how neural networks are changing software design and why they won't replace architects
IBS breaks down how large language models and generative tools help design systems, compare trade-offs, and accelerate architect workflows,

Playwright and MCP: How an AI Agent Tests UI and Database Without Manual SQL Assertions
A Playwright agent combined with MCP can not only run checkout through the browser but also immediately verify database changes without manu

Why OpenAI, Google, and Anthropic models become more convincing but make mistakes more often
Major AI labs try to fix model errors with additional compute, but the more convincing the answers become, the harder it is to spot dee

Habr AI: Why Language Models Need Guardrails and How to Defend Against Prompt Hacking
Habr AI examines why LLMs now require a separate protective layer: from toxic content and data leaks to prompt injection, jailbreak attacks,

Selectel Engineer Showed an LLM Agent for Automatically Finding Available Domains
A Selectel engineer built a Python service that asks an LLM to generate domain names and immediately checks them via WHOIS, keeping only ava

Anthropic Explained How and When to Properly Start a New Session in Claude Code
Anthropic released the /usage command and explained how to manage sessions in Claude Code so that a million context tokens don't become nois

Niantic Shows How Pokémon Go Turns Player Actions Into AI Datasets
Niantic, Google and other companies increasingly turn ordinary user actions — from games and trips to CAPTCHAs — into data for AI training,

Positive Technologies Listed the Best Benchmarks for Evaluating LLMs in Cybersecurity
Positive Technologies divided cybersecurity benchmarks for LLMs into knowledge tests and practical assessments, showing that models already s

AI Deflation in IT: Klarna and IBM Cases Show Why Vacancies Are Up But Salary Growth Is Weak
A new paradox is emerging in software development: engineering vacancies have increased by 11%, yet IT salary growth has slowed to 1.6%, and

MTS showed how OpenClaw was connected to a robot and brought an AI agent into the physical world
The MWS team demonstrated that OpenClaw can be connected to a physical robot through a simple software layer and cloud LLM without building

Claude Sonnet Helps C-Suite Build AI Director for Critical Decisions in 8 Hours
At the closed Snow BASE hackathon, a team made up of a CEO, a CTO, and a CIO assembled CAITO in eight hours: an AI director powered by Claude Sonnet that

Why ServiceNow, Atlassian and BMC are reshaping the ITSM market and the platform debate in 2026
The AI-powered ITSM market is shifting from chatbots to managed infrastructure, where security, scalability, and agent control determine the ch

ServiceNow and Atlassian Lead ITSM Market Toward AI Platforms Instead of Out-of-the-Box Solutions
AI in ITSM is rapidly shifting from chatbots to managed infrastructure: the market compares platform and out-of-the-box approaches, with aud

Wildberries & Russ described what level of data maturity is needed for accurate AI agents
Wildberries & Russ described a three-level data maturity model where the quality of metadata and semantic layer directly determines the accu

Midjourney in 2026: why a strong visual style doesn't make it universal
An analysis of Midjourney shows that in 2026 its main strength is not universality but a recognizable style and deep control that manifests onl

Cursor and Microsoft Research Test Whether AI Agents Need Full Debugger Access
An experiment with Debug2Fix and Cursor Debug Mode shows that breakpoints, step-by-step execution, and expression evaluation can help AI age