MLOps: Machine Learning Pipelines Using Kubeflow

Pavan Kumar · Published in Nerd For Tech · 5 min read · Aug 31, 2022

Effective MLOps on Kubernetes using Kubeflow

Machine Learning!! Machine Learning!! Machine Learning!!

Yes, this is the current scenario. Everywhere I go, I see machine learning. AI and ML have become the most important components of many applications: an e-commerce website needs an ML model, and sites like Netflix use ML models to suggest content to their customers. There are many such examples where ML plays a very important role in accelerating a business. But how easy is the lifecycle of building an ML model, especially in a containerized ecosystem? Phew, it's a nightmare. For a typical ML model, these would be the basic steps:

a) Training

b) Testing

c) Serving

Assume that all of the aforementioned steps are isolated workflows. Gathering them into one pipeline by hand is next to impossible. Don't you worry, Kubeflow comes to our rescue!! The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable, and scalable. Kubeflow is the ML toolkit for Kubernetes.

Kubeflow Components

One such component, and the one we will highlight in this article, is Kubeflow Pipelines.

Kubeflow components

What is the entire story all about? (TL;DR)

  1. Understanding Kubeflow Pipelines.
  2. Developing a Sample Model.

Prerequisites

  1. A Kubernetes Cluster ( EKS, AKS, Kind, etc ).
  2. Kubeflow Pipelines Installed.

Story Resources

  1. GitHub Link: https://github.com/pavan-kumar-99/medium-manifests
  2. GitHub Branch: kubeflow

Installing Kubeflow

You can select the most relevant way of installing Kubeflow based on your Kubernetes distribution here. I have a local on-prem cluster, and I have installed only Kubeflow Pipelines using the documentation here.
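For reference, the standalone Kubeflow Pipelines deployment is installed with kustomize manifests along these lines (a sketch based on the standalone deployment docs; the version pin is illustrative, so check the docs for the current release):

$ export PIPELINE_VERSION=1.8.5
$ kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
$ kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
$ kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

The UI can then be reached by port-forwarding the ml-pipeline-ui service:

$ kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80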

Once Kubeflow Pipelines is installed, the UI would look something like this:

Kubeflow UI

Before you run your first Kubeflow pipeline, there are some important terms that you should be familiar with. Let us now understand them.

Pipeline:

A Kubeflow pipeline is a description of an ML workflow, including all of the components in the workflow and how they combine in the form of a graph. The pipeline includes the definition of the inputs (parameters) required to run the pipeline and the inputs and outputs of each component.

A sample pipeline that is already created in the Kubeflow samples directory

Component:

A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, data transformation, model training, and so on. A component is analogous to a function, in that it has a name, parameters, return values, and a body.

Sample Component
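To make the function analogy concrete, a component can also be defined declaratively. Here is a minimal sketch of a hand-written component specification, assuming the KFP v1 component.yaml schema (the name, image, and command are placeholders):

name: Process data
description: A sample component that just prints a message.
implementation:
  container:
    image: python:3.9
    command:
    - python3
    - -c
    - print("Processing the data...")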

Graph:

A graph is a pictorial representation in the Kubeflow Pipelines UI of the runtime execution of a pipeline. The graph shows the steps that a pipeline run has executed or is executing, with arrows indicating the parent/child relationships between the pipeline components represented by each step.

Example of a Kubeflow Graph

Run:

A run is a single execution of a pipeline. Runs comprise an immutable log of all experiments that you attempt and are designed to be self-contained to allow for reproducibility. You can track the progress of a run by looking at its details page on the Kubeflow Pipelines UI, where you can see the runtime graph, output artifacts, and logs for each step in the run.

Kubeflow Run

Alright, we now have a basic understanding of the terms in Kubeflow. Let us now run our first Kubeflow pipeline.

Running the Kubeflow Pipeline

All the ML Kubeflow Pipelines are built using Kubeflow Pipelines SDK. The Kubeflow Pipelines SDK provides a set of Python packages that you can use to specify and run your machine learning (ML) workflows. A pipeline is a description of an ML workflow, including all of the components that make up the steps in the workflow and how the components interact with each other.

Alright, let us now understand the code. We are creating two components as part of this code, namely process_data_op and train_op (which only print some statements for now). As mentioned above, a pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, data transformation, model training, and so on.

After that, we create a Kubeflow pipeline using the sequential_pipeline function. In that function, we define the steps of our pipeline by calling the components created earlier, and we define the dependency between them using the after function, as sketched below.
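Here is a minimal sketch of what medium-pipeline.py could look like, assuming the Kubeflow Pipelines SDK v1 (the base image and printed messages are placeholders; see the repository for the actual code):

import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def process_data():
    # Placeholder data-processing step: it only prints for now.
    print("Processing the data...")


def train():
    # Placeholder training step: it only prints for now.
    print("Training the model...")


# Wrap the plain Python functions into pipeline components.
process_data_op = create_component_from_func(process_data, base_image="python:3.9")
train_op = create_component_from_func(train, base_image="python:3.9")


@dsl.pipeline(
    name="sequential-pipeline",
    description="Runs data processing, then model training.",
)
def sequential_pipeline():
    process_task = process_data_op()
    train_task = train_op()
    # Declare the dependency: training starts only after processing.
    train_task.after(process_task)


if __name__ == "__main__":
    # Compile the pipeline into a YAML file that can be uploaded
    # through the Kubeflow Pipelines UI.
    kfp.compiler.Compiler().compile(sequential_pipeline, "sequential_pipeline.yaml")

To reproduce this with the actual code from the story's repository: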

$ git clone https://github.com/pavan-kumar-99/medium-manifests.git \
-b kubeflow
$ cd medium-manifests
$ python3 medium-pipeline.py

That’s it. Running the script compiles the pipeline and generates the pipeline YAML.

The Pipeline YAML generated from the Kubeflow pipeline

Let us now upload this Pipeline YAML.

Upload the Pipeline YAML
Upload the pipeline to Kubeflow

Once the pipeline is uploaded, you can run it from the Kubeflow Pipelines UI.

Kubeflow Pipeline Output

So this is how our pipeline looks in a graphical representation. All the events, logs, and visualizations (if created) can now be seen in the console as well.
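If you would rather trigger runs programmatically than through the UI, the SDK client can also submit the compiled package. A sketch, assuming the KFP v1 SDK, a port-forwarded API endpoint, and the sequential_pipeline.yaml file compiled above:

import kfp

# Connect to the Kubeflow Pipelines API. This host assumes
# "kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80".
client = kfp.Client(host="http://localhost:8080")

# Upload the compiled pipeline package and start a run of it.
client.create_run_from_pipeline_package(
    "sequential_pipeline.yaml",
    arguments={},
    run_name="sequential-pipeline-first-run",
)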

Well, this was a very simple Kubeflow pipeline. This article should give you sufficient knowledge to get started with Kubeflow Pipelines. In my next article in the Kubeflow series, I will take a real-world use case and explain how we can use Kubeflow to design an end-to-end (E2E) ML solution, including model serving with TFServing.

Until next time…..
