LLMs, AI agents, and machine-learning models are all the rage now - everybody uses them.
Running them as Kubernetes apps in the cloud, however, is a bit tricky - at the very least, you need expensive specialized hardware (GPUs) and a shift in mindset to work around the special nature of these workloads.
Let’s dig into what you need to do, which tools are available, and what fun and frustrating challenges await you when you decide to run containerized GPU workloads on Kubernetes in the cloud today - the picture might change tomorrow, who knows?
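As a small taste of what’s ahead: at the Kubernetes level, a GPU is requested like any other extended resource, via the device plugin’s resource name - `nvidia.com/gpu` for NVIDIA hardware. A minimal sketch (the pod name and image tag are illustrative; the node must already run the NVIDIA device plugin):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-demo                  # hypothetical example name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04  # example CUDA base image
    command: ["nvidia-smi"]       # print the GPUs visible to the container, then exit
    resources:
      limits:
        nvidia.com/gpu: 1         # schedules the pod onto a node with a free GPU
```

Note that GPUs are requested in `limits` only and cannot be overcommitted - each count is a whole device, which is part of what makes these workloads special.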