Save 80% of Your Machine Learning Training Bill on Kubernetes
Mon Jun 6, 2022, by Phil Winder, in Cloud Native, MLOps, Case Study
Winder.AI worked with Grid.AI to stress test managed Kubernetes services with the aim of reducing training time and cost. A summary of this work includes:

- Stress testing the scaling performance of the big three managed Kubernetes services
- Reducing the cost of training a 1000-node model by 80%
- The finding that some cloud vendors are better (cheaper) than others

The Problem: How to Minimize the Time and Cost of Training Machine Learning Models

Artificial intelligence (AI) workloads are resource hogs.