k8s_autoscale

Introduction

Autoscaling allows a cluster to adjust to demand dynamically, without manual intervention from the people operating the cluster.

Kubernetes autoscaling helps optimize resource usage and costs by automatically scaling a cluster up and down in line with demand.

Kubernetes enables autoscaling at the cluster/node level as well as at the pod level.

Autoscaler

Autoscaling eliminates the need for constant manual reconfiguration to match changing application workload levels. Kubernetes can autoscale by adjusting the capacity (vertical autoscaling) and number (horizontal autoscaling) of pods, and/or by adding or removing nodes in a cluster (cluster autoscaling).

scale level

There are actually three autoscaling features for Kubernetes: Horizontal Pod Autoscaler, Vertical Pod Autoscaler, and Cluster Autoscaler. Let’s take a closer look at each and what they do.

Horizontal Pod Autoscaler (HPA)

Horizontal scaling, sometimes referred to as “scaling in/out,” allows Kubernetes to dynamically (i.e., automatically) increase or decrease the number of running pods as an application’s usage changes.

A cluster operator declares a target usage for metrics such as CPU or memory utilization, as well as the desired minimum and maximum number of replicas. The cluster then reconciles the number of replicas accordingly, scaling the number of running pods up or down based on their current usage and the desired target.

HPA Design
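For example, a minimal HPA object targeting 50% average CPU utilization with between 2 and 10 replicas could look like the sketch below (the Deployment name nginx and the thresholds are purely illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:              # the workload whose replica count HPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # desired average CPU utilization across the pods

Roughly the same effect can be achieved imperatively with kubectl autoscale deployment nginx --cpu-percent=50 --min=2 --max=10.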

Vertical Pod Autoscaler (VPA)

Vertical scaling refers to adding more resources (such as CPU or memory) to an existing pod; it’s sometimes referred to as “scaling up/down.”

The cluster operator declares a target usage for metrics such as CPU or memory utilization. The cluster then reconciles the size (CPU, memory) of the service’s pod or pods based on their current usage and the desired target.

Vertical Pod Autoscaler (VPA) frees users from the necessity of setting up-to-date resource requests and limits for the containers in their pods. When configured, it sets the requests automatically based on usage, allowing proper scheduling onto nodes so that the appropriate amount of resources is available for each pod. It also maintains the ratio between limits and requests that was specified in the initial container configuration.

It can both scale down pods that are over-requesting resources and scale up pods that are under-requesting resources, based on their usage over time.

Autoscaling is configured with a Custom Resource Definition object called VerticalPodAutoscaler. It lets you specify which pods should be vertically autoscaled, as well as whether and how the resource recommendations are applied.

VPA Design

For each VPA resource, there are four modes in which VPA operates:

  • “Auto”: VPA assigns resource requests on pod creation as well as updates them on existing pods using the preferred update mechanism. Currently this is equivalent to “Recreate” (see below). Once restart free (“in-place”) update of pod requests is available, it may be used as the preferred update mechanism by the “Auto” mode. NOTE: This feature of VPA is experimental and may cause downtime for your applications.
  • “Recreate”: VPA assigns resource requests on pod creation as well as updates them on existing pods by evicting them when the requested resources differ significantly from the new recommendation (respecting the Pod Disruption Budget, if defined). This mode should be used rarely, only if you need to ensure that the pods are restarted whenever the resource request changes. Otherwise prefer the “Auto” mode which may take advantage of restart free updates once they are available. NOTE: This feature of VPA is experimental and may cause downtime for your applications.
  • “Initial”: VPA only assigns resource requests on pod creation and never changes them later.
  • “Off”: VPA does not automatically change resource requirements of the pods. The recommendations are calculated and can be inspected in the VPA object.
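For instance, a recommendation-only setup simply sets updateMode: "Off" and reads the result from the VPA object’s status (the names my-app-vpa and my-app below are placeholders):

apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa            # placeholder name
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app              # placeholder Deployment
  updatePolicy:
    updateMode: "Off"         # compute recommendations only, never evict or update pods

# the recommended requests show up under Status -> Recommendation
$ kubectl describe vpa my-app-vpa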

VPA does not modify the pod template in the Deployment; instead, the actual resource requests of the running pods are updated:

containers:
- name: nginx
  image: nginx:1.13.12
  ports:
  - containerPort: 80
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"

enable VPA

# step1: enable metric server https://github.com/kubernetes-incubator/metrics-server

# step2: install the VerticalPodAutoscaler resource and its controller
$ git clone https://github.com/kubernetes/autoscaler.git
$ cd autoscaler/vertical-pod-autoscaler

$ unset REGISTRY
$ unset TAG

# start VPA controller etc
$ ./hack/vpa-up.sh # openssl must be version 1.1.1 or higher

$ kubectl create -f examples/hamster.yaml
# The above command creates a deployment with 2 pods, each running a single container that requests 100 millicores and tries to utilize slightly above 500 millicores. The command also creates a VPA config pointing at the deployment. VPA will observe the behavior of the pods and after about 5 minutes they should get updated with a higher CPU request
$ kubectl get vpa
$ kubectl describe vpa
...
apiVersion: "autoscaling.k8s.io/v1"
kind: VerticalPodAutoscaler
metadata:
name: hamster-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: Deployment
name: hamster
resourcePolicy:
containerPolicies:
- containerName: '*'
minAllowed:
cpu: 100m
memory: 50Mi
maxAllowed:
cpu: 1
memory: 500Mi
controlledResources: ["cpu", "memory"]
updatePolicy:
updateMode: "Auto"

# debug
$ kubectl -n kube-system get pods|grep vpa
$ kubectl -n kube-system get deploy|grep vpa
$ kubectl get customresourcedefinition| grep verticalpodautoscalers

# stop using VPA
# Note that if you stop running VPA in your cluster, the resource requests for the pods already modified by VPA will not change, but any new pods will get resources as defined in your controllers (i.e. deployment or replicaset) and not according to previous recommendations made by VPA.
$ ./hack/vpa-down.sh

Note

  • Updating running pods is an experimental feature of VPA. Whenever VPA updates the pod resources, the pod is recreated, which causes all running containers to be restarted. The pod may be recreated on a different node.

  • Vertical Pod Autoscaler should not be used with the Horizontal Pod Autoscaler (HPA) on CPU or memory at this moment. However, you can use VPA with HPA on custom and external metrics.

  • VPA recommendation might exceed available resources (e.g. Node size, available size, available quota) and cause pods to go pending. This can be partly addressed by using VPA together with Cluster Autoscaler.

  • Multiple VPA resources matching the same pod have undefined behavior.

Cluster Autoscaler (CA)

HPA and VPA essentially make sure that all of the services running in your cluster can dynamically handle demand while not over-provisioning during slower usage periods. That’s a good thing.

Cluster Autoscaler is what allows the cluster itself to autoscale, increasing and decreasing the number of nodes available for your pods to run on.

Cluster Autoscaler reaches out to a cloud provider’s API and scales the number of nodes attached to the cluster up or down accordingly. The configuration therefore differs between cloud providers, and it only works in cloud environments.

Cluster Autoscaler is a tool that automatically adjusts the size of the Kubernetes cluster when one of the following conditions is true:

  • there are pods that failed to run in the cluster due to insufficient resources (scale up: add nodes)
  • there are nodes in the cluster that have been underutilized for an extended period of time and their pods can be placed on other existing nodes (scale down: migrate the pods elsewhere and remove the node).
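As a rough sketch only (the node-group name, bounds, and image tag are placeholders; the real manifests live under the per-provider docs in the kubernetes/autoscaler repository), the core of a Cluster Autoscaler deployment is a container running the cluster-autoscaler binary with the provider and node-group bounds passed as flags:

containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0   # placeholder tag
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws                  # differs per provider (aws, azure, alicloud, ...)
  - --nodes=2:10:my-node-group            # min:max:node-group-name (placeholder)
  - --scale-down-unneeded-time=10m        # how long a node must be underutilized before removal
  - --skip-nodes-with-local-storage=false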

Here is an example for Alibaba Cloud.

Cloud providers with Cluster Autoscaler support

  • AWS
  • GKE
  • Azure
  • AliCloud

Ref