Federated Learning
Federated learning is a machine learning technique that trains a shared model across many decentralized devices or servers without centralizing raw data. Only model parameter updates are sent to a coordinating server, which limits how much of the underlying data is exposed.
Federated learning is a distributed machine learning paradigm conceptualized by Google in 2016 and formalized in a 2017 paper. Instead of aggregating raw data on a central server, it keeps data on the devices or institutions that own it (smartphones, hospitals, financial institutions) and trains the model locally on each participating node.
In a standard federated training round, each client downloads the current global model, trains it on its local dataset for a fixed number of steps, and uploads only the resulting update (gradients or model deltas) to the server. The server aggregates these updates, typically via Federated Averaging (FedAvg), which averages the client models weighted by each client's number of local training examples, to produce an improved global model, which is then redistributed. This cycle repeats until convergence.
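The round described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the model is a toy one-dimensional linear regression, the clients are simulated in-process, and the names `local_train`, `fed_avg`, `run_round`, and `simulate` are hypothetical helpers invented for this example. Clients here return full weights rather than deltas; because every client starts the round from the same global model, averaging weights and averaging deltas give the same result.

```python
import random


def local_train(weights, data, lr=0.1, epochs=5):
    """Client step: run SGD on y = w*x + b using only this client's data."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return [w, b]


def fed_avg(client_weights, client_sizes):
    """Server step: average client models, weighted by local sample count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


def run_round(global_weights, clients):
    """One round: broadcast the model, train locally, aggregate."""
    updates, sizes = [], []
    for data in clients:
        # Each client starts from a copy of the current global model.
        updates.append(local_train(list(global_weights), data))
        sizes.append(len(data))
    return fed_avg(updates, sizes)


def simulate(rounds=20, seed=0):
    """Simulate three clients of unequal size drawing from y = 2x + 1."""
    rng = random.Random(seed)
    clients = []
    for size in (10, 30, 60):  # assumed client dataset sizes
        xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
        clients.append([(x, 2.0 * x + 1.0) for x in xs])
    weights = [0.0, 0.0]  # initial global model [w, b]
    for _ in range(rounds):
        weights = run_round(weights, clients)
    return weights


if __name__ == "__main__":
    w, b = simulate()
    print(f"learned w={w:.4f}, b={b:.4f}")
```

The raw `(x, y)` pairs never leave `local_train`; only the two model parameters are passed to `fed_avg`. A real deployment would typically also sample a subset of clients per round and protect the uploaded updates in transit, which this sketch omits.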
Federated learning addresses two central concerns: data privacy and regulatory compliance. Organizations subject to GDPR, HIPAA, or financial secrecy laws often cannot share raw data across borders or institutions. By keeping data local, federated learning enables collaborative model training with substantially reduced legal and ethical exposure, although the shared updates can still leak information about the training data and may need additional safeguards. It can also reduce bandwidth requirements compared to centralizing large datasets.
As of 2026, federated learning is in production at scale. Google uses it for on-device next-word prediction in Gboard and for voice models; Apple applies it to features like QuickType and Siri without uploading user content. Active research challenges include communication efficiency, handling non-IID data (data that is not independent and identically distributed across clients), and defending against model poisoning attacks. Frameworks such as TensorFlow Federated, PySyft, and NVIDIA FLARE have made the technique accessible beyond academic settings.