Horizontal Pod Autoscaling
Introduction
One exciting feature of Kubernetes is the ability to horizontally scale a workload. For example, if you have one pod serving traffic and its CPU usage begins to blow up, Kubernetes can automatically create more pods to handle the work for you! In this section we’re going to look at a horizontal pod autoscaler and watch as it scales our application.
Prerequisites
- You should have the noobernetes deployment and service applied to your Kubernetes cluster
- (optional) You should have the watch command installed
Installing the metrics-server
The metrics-server provides information about the CPU and memory usage of containers running in your cluster. Many Kubernetes distributions install it by default; with Docker for Mac, however, we have to install it ourselves.
The official repository provides a quick way to install the metrics-server, but we need to tweak it a little to get it working with Docker for Mac.
If you’d like to use a pre-tweaked file, you can find it at applications/metrics-server/metrics.yaml.
> kubectl apply -f applications/metrics-server/metrics.yaml
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
serviceaccount/metrics-server created
deployment.apps/metrics-server created
service/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
The metrics-server will begin scraping our running containers for metrics.
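Metrics usually take a little while to show up after the metrics-server starts. As a minimal sketch, the helper below retries a command until it succeeds; retry is a hypothetical helper name, and using kubectl top pods as the readiness check is an assumption rather than part of the original setup.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...
# Runs COMMAND until it exits successfully, or gives up after ATTEMPTS tries.
# (Hypothetical helper for illustration; not part of kubectl.)
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

Against a running cluster you could then call retry 12 10 kubectl top pods, which checks every 10 seconds for up to about two minutes and returns as soon as CPU and memory figures are available.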
Writing our Horizontal Pod Autoscaler (HPA)
We will add a file called horizontal-pod-autoscaler.yaml for our HPA manifest.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: noobernetes-hpa
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: noobernetes
  targetCPUUtilizationPercentage: 50
We can now apply this to our cluster just like any other Kubernetes resource.
> kubectl apply -f manifests/horizontal-pod-autoscaler.yaml
horizontalpodautoscaler "noobernetes-hpa" created
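To make the 50% target concrete: the HPA controller picks a replica count of roughly ceil(currentReplicas * currentUtilization / targetUtilization), bounded by minReplicas and maxReplicas. The sketch below reproduces only that arithmetic. desired_replicas is a hypothetical helper name, and the real controller also applies a tolerance (10% by default) and other safeguards that are ignored here.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTIL_PERCENT TARGET_UTIL_PERCENT [MIN] [MAX]
# Hypothetical helper: approximates the HPA scaling formula with integer math.
# MIN and MAX default to the minReplicas/maxReplicas from our manifest (1 and 10).
desired_replicas() {
  current=$1
  util=$2
  target=$3
  min=${4:-1}
  max=${5:-10}
  # Integer ceiling of (current * util / target)
  desired=$(( (current * util + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then
    desired=$min
  fi
  if [ "$desired" -gt "$max" ]; then
    desired=$max
  fi
  echo "$desired"
}
```

For example, desired_replicas 1 120 50 gives 3 (one pod running at 120% of its request against a 50% target), while desired_replicas 8 200 50 is capped at the maximum of 10.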
Before we try to scale our pods, we must first update our Deployment to specify a CPU resource request. The HPA measures CPU utilization as a percentage of the CPU each pod requests, so without a request it has nothing to compare usage against. The request is added under resources for the container in the Deployment’s pod spec. Your Deployment should now look as follows. Note that after changing the Deployment, you will need to apply it again by running kubectl apply -f deployment.yaml from within the manifests folder in your application.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: noobernetes
spec:
  selector:
    matchLabels:
      app: noobernetes
  template:
    metadata:
      name: noobernetes
      labels:
        app: noobernetes
    spec:
      containers:
        - name: noobernetes-container
          image: noobernetes:hello-world
          resources:
            requests:
              cpu: 200m
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: SUPER_SECRET
              value: "This is my secret string"
      restartPolicy: Always
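With a request of 200m, the 50% target corresponds to an average of about 100m of CPU per pod. The small sketch below shows that conversion; cpu_utilization_percent is a hypothetical helper name, and the usage figures passed to it are made-up inputs, not measurements from this cluster.

```shell
# cpu_utilization_percent USED_MILLICORES REQUESTED_MILLICORES
# Hypothetical helper: expresses CPU usage as a percentage of the pod's request,
# which is the unit the HPA compares against its target.
# Uses integer division, so the result is rounded down.
cpu_utilization_percent() {
  used=$1
  requested=$2
  echo $(( used * 100 / requested ))
}
```

For example, cpu_utilization_percent 100 200 gives 50 (exactly on target), and cpu_utilization_percent 150 200 gives 75, which is above the target and would lead the HPA to add pods. The HPA averages this percentage across all pods in the Deployment.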
Now let’s check out our HPA.
> kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
noobernetes-hpa Deployment/noobernetes <unknown>/50% 1 10 1 2m23s
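We can see that it’s targeting our Deployment and that the target is the 50% CPU utilization we specified earlier. The <unknown> in the TARGETS column means the HPA has not received CPU metrics for our pods yet; once the metrics-server has collected some data, it should be replaced by the current utilization (for example, 0%/50% while the application is idle). The output also shows the current replica count and its minimum and maximum boundaries.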
Watch it scale!
We’re now ready to see our HPA in action. We’re going to use the watch command so that we can see it live! Run each of the following in its own terminal pane or tab:
watch kubectl get pods
watch kubectl get hpa
This will refresh our view every 2 seconds so that we can follow along as Kubernetes scales our application. If you don’t have watch installed, you can just re-run the command when you’d like to see updates.
Generating load
Now that we’re monitoring our pods and HPA, we’re ready to generate some load on our server.
Let’s use the kubectl run command to spin up a pod that we can use to send traffic to our service in a loop. With the run command you can run any container image as a Kubernetes pod. We’re going to use busybox, a lightweight utility image. Run the following command in another pane or tab of your terminal:
> kubectl run -i --tty load-generator --image=busybox /bin/sh
If you don't see a command prompt, try pressing enter.
/ # while true; do wget -q -O- http://noobernetes:4000; done
It may take some time, but eventually you should see CPU usage climb and more pods get automatically spun up by the HPA.
Conclusion
HPAs are powerful tools that let us automate scaling in our cluster. While scaling on CPU is helpful, Kubernetes 1.10 and later also support scaling on custom and external metrics. Check out some of the resources for more information on how to implement different scaling metrics.
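As a starting point for other metrics, here is a sketch of the same CPU-based HPA written against the autoscaling/v2 API, where targets live in a metrics list that can also hold custom and external entries. It assumes your cluster serves autoscaling/v2 (Kubernetes 1.23 or later); the file name horizontal-pod-autoscaler-v2.yaml is a hypothetical choice for illustration.

```shell
# Write an autoscaling/v2 equivalent of our HPA manifest.
# The file name is hypothetical; the names and values mirror the v1 manifest above.
cat > horizontal-pod-autoscaler-v2.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: noobernetes-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: noobernetes
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
EOF
```

You could then apply it with kubectl apply -f horizontal-pod-autoscaler-v2.yaml in place of the v1 manifest.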