
Small Language Model (SLM)

A small language model (SLM) is a language model with a compact parameter count—typically 1 to 13 billion parameters—optimized for efficient inference on consumer devices, edge hardware, or single-GPU servers rather than large data center deployments.

Small language models (SLMs) are language models designed to deliver useful text understanding and generation within tight compute and memory budgets. While no strict parameter boundary defines the category, models in the roughly 1- to 13-billion-parameter range are commonly described as small relative to frontier systems that exceed 100 billion parameters. Prominent examples include Microsoft's Phi-3 and Phi-4 families (3.8B to 14B parameters), the 2B and 9B variants of Google's Gemma 2, Apple's on-device models powering Apple Intelligence (approximately 3B parameters), and the 1B and 3B tiers of Meta's Llama 3.2.

SLMs achieve competitive performance relative to their size through several techniques. Training on high-quality curated data (synthetic textbook-style text, code, and structured reasoning examples) can yield better models than training on noisily scraped web data at the same parameter count, a finding publicized by Microsoft's Phi research beginning in 2023. Knowledge distillation trains a smaller student model to reproduce the outputs or internal representations of a larger teacher, transferring much of the teacher's capability into fewer parameters. Post-training quantization reduces numerical precision from 16-bit floats to 8-bit or 4-bit integers, cutting the memory footprint of the weights by roughly 2x or 4x respectively, usually with minimal accuracy loss. Efficient architectures like grouped-query attention and sliding-window attention further reduce inference cost.
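To make the distillation idea concrete, here is a minimal sketch of one common formulation: the student is penalized by the KL divergence between temperature-softened teacher and student output distributions. This is an illustration under assumptions, not the method used by any model named here. The function names, the temperature of 2.0, and the logit values are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplying by temperature**2 is a common convention that keeps the
    loss scale comparable across different temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits over a 4-token vocabulary (illustrative values only).
teacher = [4.0, 1.5, 0.5, -1.0]
close_student = [3.8, 1.6, 0.4, -0.9]  # nearly agrees with the teacher
far_student = [0.0, 0.0, 3.0, 0.0]     # prefers a different token

print(distillation_loss(teacher, close_student))
print(distillation_loss(teacher, far_student))
```

The student that nearly agrees with the teacher gets a much smaller loss than the one that prefers a different token. In practice this term is usually combined with the ordinary next-token loss on ground-truth data.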

The practical significance of SLMs is deployment flexibility and data privacy. Running inference locally eliminates API round-trip latency, avoids transmitting sensitive user data to external servers, removes dependency on internet connectivity, and eliminates per-token API costs at scale. These properties matter for enterprise applications with data-governance constraints, consumer applications on mobile and personal computers, latency-sensitive scenarios like real-time typing assistance, and use cases in regions with limited connectivity infrastructure.

By 2026, most major AI developers ship SLMs as first-class products. Apple Intelligence runs many language tasks on iPhones and Macs using on-device models, without sending those prompts to the cloud. Microsoft's Phi-4 models are integrated into developer tooling and Windows Copilot. The quality gap between SLMs and large models has narrowed substantially for structured tasks such as summarization, information extraction, code completion, and classification, while complex multi-step reasoning and broad world knowledge still favor larger systems. On-device SLMs also enable personalization through local fine-tuning on user data, with no data leaving the device.

Example

A healthcare software company deploys a 7-billion-parameter SLM on hospital servers to extract structured data from clinical notes. All patient records stay on-premises and no cloud API calls are made, which simplifies meeting HIPAA requirements.
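As a rough sizing check for a deployment like this one, the memory needed for the weights alone is the parameter count multiplied by the bytes per weight. The sketch below is back-of-the-envelope arithmetic only. It ignores the KV cache, activations, and the small overhead of quantization scale factors, and the helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for model weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS_7B = 7_000_000_000  # the 7-billion-parameter model from the example

for label, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{label}: {weight_memory_gb(PARAMS_7B, bits):.1f} GB")

# Prints:
# fp16: 14.0 GB
# int8: 7.0 GB
# int4: 3.5 GB
```

Under these assumptions, 4-bit quantization brings the weights of a 7B model from about 14 GB down to about 3.5 GB, which is why such a model can fit on a single modest GPU.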
