Habr AI

LLM agents in real CI/CD choose rule circumvention over legitimate task completion
An experiment in real CI/CD infrastructure showed that nearly all LLMs completed the task, but none followed the intended path: agents pref

AI for Smart Home: Llama 8B Locally, Real Pitfalls and How to Avoid the Cloud
Practical guide: connecting Llama 8B, Ollama and Home Assistant into an offline stack, performance expectations and deployment pitfalls.

Claude Code and 11 Agents: How a QA Team Automated Up to 80% of Testing Routine
A QA team built a system of 11 AI agents based on Claude Code that converts Jira tasks into test cases, automated tests, and Merge Requests

Why LLMs Lie and Forget Facts: Breaking Down Memory Mechanisms of Language Models
Language models don't store facts like databases — they generate plausible text. We explore four reasons why LLMs hallucinate and forget.

LLM Hallucinated a Crisis Hotline: Why Prompts Won't Stop Hallucinations
A language model recommended a children's hotline number to a distressed girl instead of a crisis center. A prompt restriction didn't help—a

T1 Cloud: H200 and L40S — Technical Review of GPUs for Generative AI Tasks
T1 Cloud published a technical review of H200 and L40S GPU servers with data center photos and explained how to properly select an accelerat

NVIDIA Nemotron 3 Super 120B: Testing on Real Analytics Tasks on a Single GPU
The Luxms BI team spent a week testing NVIDIA Nemotron 3 Super 120B on real enterprise analytics tasks — 120B parameters and 256K context to

International bestseller on large language models released in Russian
BHV publishing house released a translation of an international bestseller on LLMs, a practical guide for developers who need to understand

PSB Showed How It Implements AI in Banking: Chatbots, RAG and Business Services
PSB revealed how it uses generative AI for SMBs and employees: through the "Katyusha" assistant, RAG consultations, messenger payments and i

Yandex Cloud explains why frontend leads AI integration in DataLens
Yandex Cloud demonstrated through DataLens how to move the first layer of AI integration to a frontend BFF, enabling faster chat assistant l

Claude Code on Windows: Setting Up a Stable and Fast Development Environment
An AWS team engineer managing 150+ accounts documented how to achieve stable, fast Claude Code operation on Windows without switching to Lin

Anthropic, OpenAI and LangChain explained why AI agents need a harness
Anthropic, OpenAI and LangChain are shifting focus from the models themselves to the agent harness, an orchestration, memory and tools layer th

Models from Anthropic and other vendors can invoke hidden tools without permission
A researcher described a flaw in which Anthropic models, Gemini, and Grok can invoke an unauthorized tool if the function exists in the environment

Fintech group "Svoi" explains how to make LLM agents cheaper and more accurate in code
The fintech group "Svoi" released a practical guide on how to turn an LLM from "improved search" into a managed agent, reduce token costs

How a Habr Author Turned Seven n8n Scenarios into an Autonomous AI News System
In one and a half months, the author transformed a fragile set of seven n8n scenarios into a unified Python pipeline with 11 workers, 5 AI a

TAPe achieves RF-DETR and YOLO level detection on COCO with under 100K parameters
TAPe authors claimed to achieve COCO detection at the level of strong RF-DETR and YOLO models, maintaining under 100 thousand parameters, 7-

Why OpenCode and strong models write green but useless tests — and how to fix it
A fresh model and a powerful agent like OpenCode won't help if the codebase is filled with any-types, and the team asks AI to simply write t

NVIDIA opened free API access to 100+ AI models with OpenAI-compatible endpoints
NVIDIA began issuing free keys to access over 100 AI models: developers get an OpenAI-compatible API, a limit of 40 requests per minute, and access

Why the brain is hundreds of millions of times more efficient than GPT-4 and where neuromorphic chips are heading
The author explores why the human brain consumes orders of magnitude less energy for cognitive tasks than GPT-4, and how neuromorphic chips

Research on ChatGPT: Does the feminine grammatical form in a prompt affect task-solving quality?
An author's experiment on LiveCodeBench showed that in GPT-5.4 mini, female self-presentation in a Russian prompt slightly reduces pass@1, e

RuStore deployed AI in information security: how VK automates task review, code review, and DAST testing
RuStore's security team uses AI for initial triage of security tasks, merge request review, and dynamic testing to reduce routine work from

OpenGrall Presented an Architecture for AI Robots Where a Language Model Handles Strategy
The OpenGrall framework proposes dividing cognition and control: a language model handles strategy, while TinyML handles execution and safet

Habr AI: How Pipeline Triad Assembles an AI Agent Pipeline Instead of a Development Team
Habr AI examined Pipeline Triad — a model where development stages pass through AI agent triads, with humans involved only at four control p

Gramax showed how to compare RAG answer quality without manual evaluation by eye
Gramax explained why retrieval metrics are insufficient for RAG, and proposed evaluating not retrieved chunks, but the final user answer — b

How LLM Guardrails in Java Block Injections and Toxic Responses
An analysis of why a single system prompt is insufficient to protect LLMs, and how Java guardrails intercept dangerous inputs and filter tox

Anthropic and Mythos: why a banking threat quickly became a risk for everyone
Anthropic presented Mythos as too dangerous for public access, but the real risk proved to be not in banking but in small businesses' inabil

Anthropic and Claude Mythos: why critics call the model launch an expensive PR spectacle
A critical column on Claude Mythos argues that Anthropic is selling not just an AI model, but a myth about its near-human nature, amplifying

AI Assistants in 2026: How a Solo Developer Became Faster Than a Team of Three
The author demonstrates that in 2026, a single developer equipped with a set of open-source AI tools can write, test, and commit code faster

ecom.tech compared evolutionary fine-tuning of Qwen3-4B with SFT and GRPO for Kotlin tests
The ecom.tech team fine-tuned Qwen3-4B-Instruct for generating unit tests in Kotlin and showed that the evolutionary algorithm outperforms S

Yandex Code Assistant tested on secrets handling and compared against Cursor
An engineer at 'Infosystems Jet' tested Yandex Code Assistant on secrets management and showed the agent is nearly on par with Cursor, but s