MarkTechPost

The Qwen team released FlashQLA, accelerating linear attention by up to 3× on NVIDIA Hopper
QwenLM released FlashQLA, a CUDA kernel library for Gated Delta Networks that delivers up to a 3× performance gain on NVIDIA Hopper GPUs for pr

OpenAI Privacy Filter: How to Build a Production Pipeline for PII Detection and Masking
The OpenAI Privacy Filter guide breaks down a complete pipeline for detecting and masking personal data, from model loading to automatic te

DeepSeek, Google, and Meta: 10 Techniques for LLM KV-Cache Compression to Reduce Inference Memory
The KV cache has become a memory hog for large LLMs, and a new survey covers 10 approaches, from H2O and SnapKV to TurboQuant and DeepSeek's M

Poolside released Laguna XS.2 and M.1 — open models for agentic programming
Poolside unveiled two Laguna models for agentic coding: the open XS.2 runs locally, while the more powerful M.1 is designed for long tasks w

LlamaIndex ParseBench: How to Test Document Parsing via Python and Hugging Face
A practical walkthrough shows how to build a document parser evaluation pipeline using the LlamaIndex ParseBench dataset: load PDFs from Hug

smol-audio from Deep-unlearning: A collection of Colab notebooks for audio model fine-tuning
Deep-unlearning released smol-audio — a collection of Colab-compatible notebooks for fine-tuning Whisper, Parakeet, Voxtral, Granite Speech

Top 10 Physical AI Models Controlling Real Robots in 2026
Over the past 18 months, the gap between LLMs and real robotics has narrowed dramatically: physical AI models are already operating in factories, war

Hugging Face and Gemma 3 1B: Building a Production-Ready Generation Pipeline in Colab
A breakdown of how to run Gemma 3 1B Instruct in Colab via Hugging Face Transformers, with secure authentication, chat templates, and a repro

Z.ai releases GLM-5V-Turbo, a native multimodal model for visual programming
Chinese lab Z.ai has released GLM-5V-Turbo — a model that recognizes architectural diagrams and screenshots, then immediately generates work

Google Gemma 4, NVIDIA, and OpenClaw: Local AI Agents Without Per-Token Billing
Google and NVIDIA are promoting local deployment of Gemma 4 on RTX, Jetson, and DGX Spark so that always-on AI agents like OpenClaw run fast

Talkie-1930: Researchers released a 13B model with no knowledge of the internet or World War II
Talkie-1930 is an open 13B model trained only on English texts up to 1931 to study historical thinking, data leakage, and AI's ability to gene

MarkTechPost Demonstrates How to Build a Lightweight VLA Agent with a Latent World Model and MPC
In a new tutorial, MarkTechPost breaks down how to build a simplified embodied agent: it operates on RGB frames, learns a latent world model

Arcee AI Released Trinity Large Thinking, an Open Reasoning Model for AI Agents
Arcee AI open-sourced the Trinity Large Thinking weights under the Apache 2.0 license, focusing on long agentic scenarios, multi-step reasoning, and

NVIDIA Showcased a Complete Model Optimization Pipeline with FastNAS Pruning and Fine-Tuning
NVIDIA released a practical guide to Model Optimizer: a single Colab notebook demonstrates ResNet20 training, FastNAS pruning under FLOPs li

TII Releases Falcon Perception, a 0.6B Model for Object Segmentation and Text-Based Search
TII unveiled Falcon Perception — a compact 0.6-billion-parameter vision-language model that searches and segments objects from plain text qu

Google DeepMind Enables an LLM to Rewrite Game-Theory Algorithms and Surpass Experts
Google DeepMind demonstrated that AlphaEvolve can rewrite the code of algorithms for games with incomplete information and find solutions that outpe

Z.AI showed how to build production-ready agentic systems on GLM-5 with tool calling
Z.AI released a detailed GLM-5 tutorial: from SDK setup and an OpenAI-compatible API to streaming, tool calling, JSON output, and a multi-turn

Netflix Open-Sources Void, a Model for Removing Objects from Video While Accounting for Scene Physics
The Netflix and INSAIT team released Void as open source — a system that removes objects from video while simultaneously recalculating falls

How Artificial Intelligence Helps Clothing Brands Design Fashion's Future
Algorithms already help fashion brands create collections faster, forecast trends, reduce overproduction, and personalize shopping, but also

How to Build a Netflix Void Pipeline for Object Removal from Video Using CogVideoX
A detailed walkthrough shows how to deploy the Netflix Void model, download required checkpoints, prepare input data, and run object removal

Gladstone Institutes Present MaxToki — an AI Model That Predicts Cell Aging
Gladstone Institutes' MaxToki model learns to see not a 'snapshot' of a cell, but its trajectory over time, assesses aging acceleration, and

TinyFish Launched a Unified Web Platform for AI Agents with Search, Fetch, Browser, and Agent
TinyFish combined search, page rendering, browser sessions, and autonomous web workflows in a single platform for AI agents with a single AP

Google added Skills to Chrome and turned AI prompts into one-click scenarios
Google launched the Skills feature in Chrome: Gemini users will be able to save frequently used prompts as reusable scenarios and run them i

Google DeepMind Presents Gemini Robotics-ER 1.6 for Robot Autonomy and Instrument Reading
Google DeepMind updated Gemini Robotics-ER to version 1.6: a model for robots that better understands space, handles multiple video streams,

MarkTechPost broke down the complete training cycle of large language models: from data to deployment
MarkTechPost released a detailed breakdown of how LLMs are built today: from pretraining on massive corpora to SFT, RLHF, reasoning optimiza

Google Introduced Gemini 3.1 Flash TTS, a Controllable Speech Model with Dialogue Support and 70+ Languages
Google launched Gemini 3.1 Flash TTS in preview: the model synthesizes speech from text in 70+ languages, supports two-voice dialogues, and allows

Mem0 and OpenAI: How to Build a Universal Long-Term Memory Layer for AI Agents
A new tutorial breaks down the combination of Mem0, OpenAI models, and ChromaDB: the stack extracts facts from ordinary conversations, stores them

SmolAgents: How to Build a Multi-Agent AI System with Code and Dynamic Orchestration
A breakdown of SmolAgents implementation shows how lightweight AI agents execute code, invoke tools, work with memory, and coordinate tasks

NetKet and JAX: How to Build a Transformer Model for Frustrated Spin Systems
The guide shows how to build a research VMC pipeline using NetKet, JAX, and a Transformer architecture for modeling a frustrated spin chain J1

OpenAI Presented GPT-Rosalind — an AI Model for Biology, Genomics, and Drug Development
OpenAI launched GPT-Rosalind — a specialized model for biology and pharmaceuticals that helps accelerate drug development, genomic data anal