Source: Habr AI

749 total articles · 444 this week · last update: May 2

LLM · Habr AI

LLM agents in real CI/CD choose rule circumvention over legitimate task completion

An experiment on real CI/CD infrastructure showed that nearly all LLMs completed the task, but none followed the intended path: the agents preferred…

2026-04-28 · 2 min

AI for the Smart Home: Llama 8B Locally, Real Pitfalls and How to Avoid the Cloud

A practical guide to combining Llama 8B, Ollama, and Home Assistant into an offline stack, with performance expectations and deployment pitfalls.

2026-04-28 · 2 min

Claude Code and 11 Agents: How a QA Team Automated Up to 80% of Testing Routine

A QA team built a system of 11 AI agents based on Claude Code that converts Jira tasks into test cases, automated tests, and merge requests…

2026-04-28 · 3 min

Why LLMs Lie and Forget Facts: Breaking Down Memory Mechanisms of Language Models

Language models don't store facts the way databases do: they generate plausible text. We explore four reasons why LLMs hallucinate and forget.

2026-04-28 · 3 min

LLM Hallucinated a Crisis Hotline: Why Prompts Won't Stop Hallucinations

A language model recommended a children's hotline number to a distressed girl instead of a crisis center. A prompt restriction didn't help…

2026-04-28 · 3 min

T1 Cloud: H200 and L40S — Technical Review of GPUs for Generative AI Tasks

T1 Cloud published a technical review of H200 and L40S GPU servers, with data center photos, and explained how to choose the right accelerator…

2026-04-28 · 2 min

NVIDIA Nemotron 3 Super 120B: Testing on Real Analytics Tasks on a Single GPU

The Luxms BI team spent a week testing NVIDIA Nemotron 3 Super 120B on real enterprise analytics tasks: 120B parameters and a 256K context…

2026-04-28 · 2 min

International bestseller on large language models released in Russian

BHV publishing house has released a Russian translation of an international bestseller on LLMs, a practical guide for developers who need to understand…

2026-04-28 · 2 min

PSB Showed How It Implements AI in Banking: Chatbots, RAG and Business Services

PSB revealed how it uses generative AI for SMBs and employees: the "Katyusha" assistant, RAG consultations, messenger payments, and…

2026-04-28 · 3 min

Yandex Cloud explains why frontend leads AI integration in DataLens

Using DataLens as an example, Yandex Cloud showed how to move the first layer of AI integration into a frontend BFF (backend for frontend), enabling faster chat assistant…

2026-04-28 · 3 min

Claude Code on Windows: Setting Up a Stable and Fast Development Environment

An engineer on an AWS team managing 150+ accounts documented how to get Claude Code running stably and quickly on Windows without switching to Linux…

2026-04-28 · 2 min

Anthropic, OpenAI and LangChain explained why AI agents need a harness

Anthropic, OpenAI, and LangChain are shifting focus from the models themselves to the agent harness: the orchestration, memory, and tools layer…

2026-04-28 · 2 min

Models from Anthropic and others can invoke hidden tools without permission

A researcher described a flaw in which Anthropic's models, Gemini, and Grok can invoke an unauthorized tool if the function exists in the environment…

2026-04-28 · 3 min

Fintech group "Svoi" explains how to make LLM agents cheaper and more accurate when working with code

The fintech group "Svoi" released a practical guide on turning an LLM from an "improved search" into a managed agent, reducing token costs…

2026-04-28 · 2 min

How a Habr Author Turned Seven n8n Scenarios into an Autonomous AI News System

In a month and a half, the author turned a fragile set of seven n8n scenarios into a unified Python pipeline with 11 workers, 5 AI…

2026-04-28 · 2 min

TAPe achieves RF-DETR- and YOLO-level detection on COCO with under 100K parameters

The TAPe authors claim to have achieved COCO detection on par with strong RF-DETR and YOLO models while staying under 100 thousand parameters…

2026-04-28 · 2 min

Why OpenCode and strong models write green but useless tests — and how to fix it

A new model and a powerful agent like OpenCode won't help if the codebase is full of "any" types and the team simply asks the AI to write tests…

2026-04-28 · 2 min

NVIDIA opened free API access to 100+ AI models with OpenAI-compatible endpoints

NVIDIA has started issuing free keys for over 100 AI models: developers get an OpenAI-compatible API, a limit of 40 requests per minute, and access…

2026-04-28 · 2 min

Why the brain is hundreds of millions of times more efficient than GPT-4 and where neuromorphic chips are heading

The author explores why the human brain consumes orders of magnitude less energy on cognitive tasks than GPT-4, and how neuromorphic chips…

2026-04-28 · 3 min

Research on ChatGPT: does the feminine grammatical form in a prompt affect task-solving quality?

The author's experiment on LiveCodeBench showed that, with GPT-5.4 mini, self-presentation in the feminine form in a Russian prompt slightly reduces pass@1…

2026-04-28 · 2 min

RuStore deployed AI in information security: how VK automates task review, code review, and DAST testing

RuStore's security team uses AI for initial triage of security tasks, merge request review, and dynamic testing to reduce routine work…

2026-04-28 · 2 min

OpenGrall Presented an Architecture for AI Robots Where a Language Model Handles Strategy

The OpenGrall framework proposes separating cognition from control: a language model handles strategy, while TinyML handles execution and safety…

2026-04-28 · 3 min

Habr AI: How Pipeline Triad Assembles an AI Agent Pipeline Instead of a Development Team

Habr AI examined Pipeline Triad, a model in which development stages pass through triads of AI agents, with humans involved only at four control points…

2026-04-28 · 3 min

Gramax showed how to compare RAG answer quality without eyeballing it manually

Gramax explained why retrieval metrics are insufficient for RAG and proposed evaluating the final answer the user receives rather than the retrieved chunks…

2026-04-28 · 2 min

How LLM Guardrails in Java Block Injections and Toxic Responses

An analysis of why a single system prompt is not enough to protect an LLM, and how Java guardrails intercept dangerous inputs and filter toxic…

2026-04-28 · 2 min

Anthropic and Mythos: why a banking threat quickly became a risk for everyone

Anthropic presented Mythos as too dangerous for public access, but the real risk turned out to lie not in banking but in small businesses' inability…

2026-04-28 · 2 min

Anthropic and Claude Mythos: why critics call the model launch an expensive PR spectacle

A critical column on Claude Mythos argues that Anthropic is selling not just an AI model but a myth about its near-human nature, amplifying…

2026-04-28 · 3 min

AI Assistants in 2026: How a Solo Developer Became Faster Than a Team of Three

The author demonstrates that in 2026 a single developer equipped with a set of open-source AI tools can write, test, and commit code faster…

2026-04-28 · 2 min

ecom.tech compared evolutionary fine-tuning of Qwen3-4B with SFT and GRPO for Kotlin tests

The ecom.tech team fine-tuned Qwen3-4B-Instruct to generate unit tests in Kotlin and showed that the evolutionary algorithm outperforms SFT…

2026-04-28 · 3 min

Yandex Code Assistant tested on secrets handling and compared against Cursor

An engineer at "Infosystems Jet" tested Yandex Code Assistant on secrets management and showed that the agent is nearly on par with Cursor, but…

2026-04-28 · 2 min