eBook details

Machine Learning on Kubernetes

MLOps is an emerging field that aims to bring the repeatability, automation, and standardization of software engineering to data science and machine learning engineering. By implementing MLOps with Kubernetes, data scientists, IT professionals, and data engineers can collaborate to build machine learning solutions that deliver business value for their organization.

You'll begin by understanding the different components of a machine learning project. Then, you'll design and build a practical end-to-end machine learning project using open source software. As you progress, you'll understand the basics of MLOps and the value it can bring to machine learning projects. You will also gain experience in building, configuring, and using an open source, containerized machine learning platform. In later chapters, you will prepare data, build and deploy machine learning models, and automate workflow tasks using the same platform. Finally, the exercises in this book will help you get hands-on experience in Kubernetes and open source tools, such as JupyterHub, MLflow, and Airflow.

By the end of this book, you'll have learned how to effectively build, train, and deploy a machine learning model using the machine learning platform you built.

  • Machine Learning on Kubernetes
  • Contributors
  • About the authors
  • About the reviewers
  • Preface
    • Who this book is for
    • What this book covers
    • To get the most out of this book
    • Download the example code files
    • Download the color images
    • Conventions used
    • Get in touch
    • Reviews
    • Share Your Thoughts
  • Part 1: The Challenges of Adopting ML and Understanding MLOps (What and Why)
  • Chapter 1: Challenges in Machine Learning
    • Understanding ML
    • Delivering ML value
    • Choosing the right approach
      • The importance of data
    • Facing the challenges of adopting ML
      • Focusing on the big picture
      • Breaking down silos
      • Fail-fast culture
    • An overview of the ML platform
    • Summary
    • Further reading
  • Chapter 2: Understanding MLOps
    • Comparing ML to traditional programming
    • Exploring the benefits of DevOps
    • Understanding MLOps
      • ML
      • DevOps
      • ML project life cycle
      • Fast feedback loop
      • Collaborating over the project life cycle
    • The role of OSS in ML projects
    • Running ML projects on Kubernetes
    • Summary
    • Further reading
  • Chapter 3: Exploring Kubernetes
    • Technical requirements
    • Exploring Kubernetes major components
      • Control plane
      • Worker nodes
      • Kubernetes objects required to run an application
    • Becoming cloud-agnostic through Kubernetes
    • Understanding Operators
    • Setting up your local Kubernetes environment
      • Installing kubectl
      • Installing minikube
      • Installing OLM
    • Provisioning a VM on GCP
    • Summary
  • Part 2: The Building Blocks of an MLOps Platform and How to Build One on Kubernetes
  • Chapter 4: The Anatomy of a Machine Learning Platform
    • Technical requirements
    • Defining a self-service platform
    • Exploring the data engineering components
      • Data engineer workflow
    • Exploring the model development components
      • Understanding the data scientist workflow
    • Security, monitoring, and automation
    • Introducing ODH
      • Installing the ODH operator on Kubernetes
      • Enabling the ingress controller on the Kubernetes cluster
      • Installing Keycloak on Kubernetes
    • Summary
    • Further reading
  • Chapter 5: Data Engineering
    • Technical requirements
    • Configuring Keycloak for authentication
      • Importing the Keycloak configuration for the ODH components
      • Creating a Keycloak user
    • Configuring ODH components
      • Installing ODH
    • Understanding and using JupyterHub
      • Validating the JupyterHub installation
      • Running your first Jupyter notebook
    • Understanding the basics of Apache Spark
      • Understanding Apache Spark job execution
    • Understanding how ODH provisions an Apache Spark cluster on demand
      • Creating a Spark cluster
      • Understanding how JupyterHub creates a Spark cluster
    • Writing and running a Spark application from Jupyter Notebook
    • Summary
  • Chapter 6: Machine Learning Engineering
    • Technical requirements
    • Understanding ML engineering
    • Using a custom notebook image
      • Building a custom notebook container image
    • Introducing MLflow
      • Understanding MLflow components
      • Validating the MLflow installation
    • Using MLflow as an experiment tracking system
      • Adding custom data to the experiment run
    • Using MLflow as a model registry system
    • Summary
  • Chapter 7: Model Deployment and Automation
    • Technical requirements
    • Understanding model inferencing with Seldon Core
      • Wrapping the model using Python
      • Containerizing the model
      • Deploying the model using the Seldon controller
    • Packaging, running, and monitoring a model using Seldon Core
    • Introducing Apache Airflow
      • Understanding DAG
      • Exploring Airflow features
      • Understanding Airflow components
      • Validating the Airflow installation
      • Configuring the Airflow DAG repository
      • Configuring Airflow runtime images
    • Automating ML model deployments in Airflow
      • Creating the pipeline by using the pipeline editor
    • Summary
  • Part 3: How to Use the MLOps Platform and Build a Full End-to-End Project Using the New Platform
  • Chapter 8: Building a Complete ML Project Using the Platform
    • Reviewing the complete picture of the ML platform
    • Understanding the business problem
    • Data collection, processing, and cleaning
      • Understanding data sources, location, and the format
      • Understanding data processing and cleaning
    • Performing exploratory data analysis
      • Understanding sample data
    • Understanding feature engineering
      • Data augmentation
    • Building and evaluating the ML model
      • Selecting evaluation criteria
      • Building the model
      • Deploying the model
    • Reproducibility
    • Summary
  • Chapter 9: Building Your Data Pipeline
    • Technical requirements
    • Automated provisioning of a Spark cluster for development
    • Writing a Spark data pipeline
      • Preparing the environment
      • Understanding data
      • Designing and building the pipeline
      • Using the Spark UI to monitor your data pipeline
    • Building and executing a data pipeline using Airflow
      • Understanding the data pipeline DAG
      • Building and running the DAG
    • Summary
  • Chapter 10: Building, Deploying, and Monitoring Your Model
    • Technical requirements
    • Visualizing and exploring data using JupyterHub
    • Building and tuning your model using JupyterHub
    • Tracking model experiments and versioning using MLflow
      • Tracking model experiments
      • Versioning models
    • Deploying the model as a service
      • Calling your model
    • Monitoring your model
      • Understanding monitoring components
      • Configuring Grafana and a dashboard
    • Summary
  • Chapter 11: Machine Learning on Kubernetes
    • Identifying ML platform use cases
      • Considering AutoML
      • Commercial platforms
      • ODH
    • Operationalizing ML
      • Setting the business expectations
      • Dealing with dirty real-world data
      • Dealing with incorrect results
      • Maintaining continuous delivery
      • Managing security
      • Adhering to compliance policies
      • Applying governance
    • Running on Kubernetes
      • Avoiding vendor lock-in
      • Considering other Kubernetes platforms
    • Roadmap
    • Summary
    • Further reading
    • Why subscribe?
  • Other Books You May Enjoy
    • Packt is searching for authors like you
    • Share Your Thoughts