Vanguard built a Virtual Analyst on AWS following eight AI-ready data principles
Vanguard, which manages $9+ trillion in assets, launched an internal Virtual Analyst powered by AWS. The solution was built on data, not neural networks: the…
AI-processed from AWS Machine Learning Blog; edited by Hamidun News
First Data, Then Models
The key takeaway from Vanguard's experience: AI transformation begins not with choosing a neural network or purchasing computing power, but with getting your data house in order. The company's engineers formulated eight principles of AI-ready data that formed the foundation of the entire project before a single line of model code was written. The principles span the full lifecycle, from semantics and structure to security and monitoring:
- Unified taxonomy — a common vocabulary for all metrics, KPIs, and business entities, so that "yield" in one division means the same as in another
- Data lineage — traceability of each metric from primary source to analytical warehouse
- Timeliness — guarantee of data freshness at the moment of every query
- Machine-readable metadata — schemas and descriptions understood not only by humans but by automation
- Access control — granular security policies at row and column level
- Quality monitoring — automatic validation of data correctness in real time
- Standardized formats — uniform schemas and conventions adopted by all teams
- Documentation — reproducibility of each dataset and explainability of each calculation
Without these principles, AI models are prone to hallucinating on incorrect or ambiguous data. Vanguard addressed the problem systematically rather than piecemeal, and that became the foundation of its success.
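Three of the principles above (unified taxonomy, timeliness, and quality monitoring) lend themselves to automated checks. The following is a minimal sketch of what such a check could look like, not Vanguard's implementation: the `TAXONOMY` set, the `Dataset` class, and the `readiness_issues` function are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical shared vocabulary: the metric names every team agrees on.
TAXONOMY = {"yield_pct", "aum_usd", "expense_ratio_pct"}


@dataclass
class Dataset:
    """Illustrative stand-in for a dataset and its basic metadata."""
    name: str
    columns: list
    last_updated: datetime
    rows: list


def readiness_issues(ds: Dataset, now: datetime, max_age: timedelta) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    issues = []

    # Unified taxonomy: every column must use an agreed metric name.
    unknown = sorted(set(ds.columns) - TAXONOMY)
    if unknown:
        issues.append(f"taxonomy: unknown columns {unknown}")

    # Timeliness: the data must be fresher than the allowed maximum age.
    if now - ds.last_updated > max_age:
        issues.append("timeliness: last update is older than the allowed maximum age")

    # Quality monitoring: no declared column may be missing or null in any row.
    for index, row in enumerate(ds.rows):
        missing = [col for col in ds.columns if row.get(col) is None]
        if missing:
            issues.append(f"quality: row {index} is missing {missing}")

    return issues
```

A dataset that passes returns an empty list, so a pipeline could refuse to publish any dataset for which `readiness_issues` returns something. The remaining principles (lineage, access control, documentation) need richer metadata than this sketch models.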
AWS Under the Hood
For the technical implementation, Vanguard deployed an integrated suite of AWS services. Amazon S3 serves as a single data lake consolidating sources from different divisions. AWS Glue runs the ETL pipelines, and its Data Catalog provides centralized metadata storage: this is where descriptions, schemas, and business definitions of all datasets live.
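To make the idea of centralized, machine-readable metadata concrete, here is a small in-memory sketch of a catalog entry that carries a schema together with business definitions. It does not call the AWS Glue API; the `Column` and `TableEntry` classes, the `render_schema` function, and the example S3 path are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    dtype: str
    description: str  # business definition, written once and reused everywhere


@dataclass
class TableEntry:
    """Illustrative stand-in for one table's record in a metadata catalog."""
    name: str
    location: str  # hypothetical data lake path
    columns: list = field(default_factory=list)


def render_schema(entry: TableEntry) -> str:
    """Render the entry as plain text that a person or a program can consume."""
    lines = [f"table {entry.name} ({entry.location})"]
    for col in entry.columns:
        lines.append(f"  {col.name}: {col.dtype} - {col.description}")
    return "\n".join(lines)


entry = TableEntry(
    name="fund_metrics",
    location="s3://example-bucket/fund_metrics/",  # hypothetical bucket
    columns=[
        Column("fund", "string", "Fund identifier"),
        Column("yield_pct", "double", "Yield, in percent"),
    ],
)
print(render_schema(entry))
```

Keeping the description next to the type is the point of the design: the same record can feed documentation for people and context for automated tools.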
Model training and deployment are implemented on Amazon SageMaker. Orchestration of complex multi-step processes is handled through AWS Step Functions, and data quality and pipeline performance monitoring is done through Amazon CloudWatch with configured alerts and dashboards for the team. On top of this infrastructure sits the Virtual Analyst: it takes questions in natural language, translates them into data queries, and returns structured analytics with charts and textual interpretations.
Business teams get insights without SQL, Python, or queuing requests to data specialists.
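The question-to-query flow can be illustrated with a deliberately simple toy. The source does not describe how the Virtual Analyst performs the translation, so the keyword matching below is a stand-in technique chosen only to show the shape of the flow; `METRIC_SYNONYMS`, `QuerySpec`, `to_query_spec`, and `run` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical synonym table: business wording -> canonical metric name.
METRIC_SYNONYMS = {
    "yield": "yield_pct",
    "assets": "aum_usd",
    "expense ratio": "expense_ratio_pct",
}


@dataclass(frozen=True)
class QuerySpec:
    metric: str
    group_by: Optional[str]


def to_query_spec(question: str) -> QuerySpec:
    """Map a natural-language question to a structured query (toy keyword match)."""
    text = question.lower()
    metric = next((m for phrase, m in METRIC_SYNONYMS.items() if phrase in text), None)
    if metric is None:
        raise ValueError("no known metric found in the question")
    group_by = "fund" if "by fund" in text else None
    return QuerySpec(metric, group_by)


def run(spec: QuerySpec, rows: list) -> dict:
    """Average the requested metric, optionally grouped by one column."""
    groups = {}
    for row in rows:
        key = row[spec.group_by] if spec.group_by else "all"
        groups.setdefault(key, []).append(row[spec.metric])
    return {key: sum(values) / len(values) for key, values in groups.items()}


rows = [
    {"fund": "A", "yield_pct": 2.0},
    {"fund": "A", "yield_pct": 4.0},
    {"fund": "B", "yield_pct": 1.0},
]
spec = to_query_spec("What is the average yield by fund?")
print(run(spec, rows))
```

The synonym table is where a unified taxonomy pays off: if "yield" meant different things in different divisions, no single mapping to a canonical metric would be possible.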
Business Results
"The path to AI-ready data is not a one-time project, but an operating culture," emphasize the case authors in the AWS Machine Learning Blog.
Vanguard documents concrete, measurable impact. The time needed to prepare typical analytical reports has been cut dramatically. Portfolio management and risk analysis teams put questions directly to the system instead of assigning tasks to data analysts and waiting hours or days for answers. Importantly, the Virtual Analyst does not replace human expertise: it takes on the routine part of the work (aggregation, filtering, and initial data interpretation) and frees analysts for higher-level tasks such as hypothesis development and strategic decision-making.
What This Means
The Vanguard case is one of the most detailed public descriptions of how a large financial company builds AI analytics in production. The eight principles of AI-ready data serve as a practical checklist for any organization that wants real business value from AI, not just a polished pilot on perfectly prepared test data.