Habr AI→ original

dBrain.cloud integrated LocalAI and Kubeflow into a container platform for enterprise AI

dBrain.cloud built a two-tier AI infrastructure: LocalAI for quickly launching ready-made models and Kubeflow for the full MLOps cycle. The hardest part…

AI-processed from Habr AI; edited by Hamidun News
dBrain.cloud integrated LocalAI and Kubeflow into a container platform for enterprise AI
Source: Habr AI. Collage: Hamidun News.
◐ Listen to article

The dBrain team shared how they built two different layers for AI workload on their container platform: LocalAI for rapid deployment of pre-built models and Kubeflow for the complete development cycle. Practice showed that the bulk of work was not in the models themselves, but in integrating them into existing infrastructure and networks.

Two Platform Layers

For pre-built models, dBrain chose LocalAI. This open source tool enables chat models, image and video generation, speech recognition and synthesis, as well as multimodal scenarios. A significant advantage for a resource-constrained platform is the ability to flexibly load and unload models, use GPU where needed, while providing developers with a local endpoint compatible with the OpenAI API. This is especially important when compute resources are limited and workload distribution between services must be rebalanced quickly.

Within the platform, LocalAI turned out to be the simplest stage. Integration mainly came down to adapting manifests to internal deployment templates. This layer is needed for clients who don't require their own ML pipeline, but need rapid deployment of ready-made models in containers.

For their own models, the team took a heavier approach and integrated key parts of Kubeflow, turning the platform into a full-fledged MLOps circuit. This stack included:

  • KServe for model hosting and management
  • Trainer for training and optimization
  • Notebooks for rapid experimentation and fine-tuning
  • Katib for hyperparameter tuning
  • Model Registry and Pipelines for storage and process automation

Why Without Knative

The most contentious part of the integration was KServe, which has two inference modes: Knative mode and Standard mode. The documentation primarily guides teams toward Knative. For many, this appears to be the default option. This approach has strong advantages: services can scale to zero without traffic, then quickly wake up on request, and the platform gains convenient mechanisms for revisions, traffic splitting, and canary releases.

But dBrain deliberately chose a different path. The reason was not performance, but operational cost. Knative brings along a separate networking layer, additional queue proxy containers inside pods, and dependency on gateway implementations like Istio or Kourier. For an enterprise platform, this means more components to maintain, more places where things can break, and more complex diagnostics. Ultimately, the team chose Standard mode, which relies on regular Kubernetes Deployments and Services and better fits the already existing operational model.

Migration to Gateway API

Choosing Standard mode didn't solve all problems—it simply shifted them to the networking layer. For this mode to work, Gateway API is required. dBrain already had basic support for this standard, but most of the platform's services had historically been published via Ingress. Keeping the old scheme alongside the new one didn't work: in their architecture, KServe Standard mode couldn't properly coexist with the ingress model. The team considered targeted adaptation or switching the ingress controller, but considered it a temporary measure.

Instead, they decided to complete the migration and move the platform's entire networking model to Gateway API. This was a more expensive step upfront, but it eliminated intermediate compromises and prepared the infrastructure for Kubernetes' new standard. In effect, integrating AI services became an infrastructure reform that affected not only the ML stack, but the publication of all services.

What It Means

The dBrain case illustrates the current reality of enterprise AI well: choosing a model or framework is no longer enough. The real work begins where you need to combine rapid deployment of ready-made models, full-fledged MLOps for in-house development, and predictable operation in Kubernetes. Success goes not to the trendiest stack, but to the one that can be stably maintained in production. This is why infrastructure decisions here influence business as much as the choice of models themselves.

ZK
Hamidun News
AI news without noise. Daily editorial selection from 400+ sources. A product by Zhemal Khamidun, Head of AI at Alpina Digital.

Want to stop reading about AI and start using it?

AI News is a curated feed of AI/tech news. Hamidun Academy teaches you to use AI systematically in your work.

What do you think?
Loading comments…