2024.07.25

Tuning the CPU utilization threshold for node scale-in with Cluster Autoscaler

saratogax

I use Cluster Autoscaler to manage node scaling in Kubernetes.

Cluster Autoscaler

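These days Karpenter seems to be the recommended choice on EKS, but replacing a third-party product that we have run for a long time and built up know-how on looks like a lot of work.

Either way, the basic behavior is the same: nodes are added when the Kubernetes cluster is short on resources and pods cannot be scheduled, and nodes are removed when their resources have gone unused for a long time.

That said, the scaling rules felt like a black box to me, so I had never looked into them. This time, nodes that appeared to be idle were slow to scale in, so I investigated.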

Contents

Node scale-out (scale-up)
Node scale-in (scale-down)
Adjusting the --scale-down-utilization-threshold setting
Summary

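Node scale-out (scale-up)

The Cluster Autoscaler documentation describes this in much the same terms as the introduction above: nodes are added when the Kubernetes cluster is short on resources and pods cannot be scheduled.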

Scale-up creates a watch on the API server looking for all pods. It checks for any unschedulable pods every 10 seconds (configurable by --scan-interval flag). A pod is unschedulable when the Kubernetes scheduler is unable to find a node that can accommodate the pod. For example, a pod can request more CPU that is available on any of the cluster nodes. Unschedulable pods are recognized by their PodCondition. Whenever a Kubernetes scheduler fails to find a place to run a pod, it sets “schedulable” PodCondition to false and reason to “unschedulable”. If there are any items in the unschedulable pods list, Cluster Autoscaler tries to find a new place to run them.
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-up-work
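
The unschedulable check described in the quote can be illustrated with a small Python sketch. It assumes pods are plain dicts shaped like the JSON returned by `kubectl get pods -o json`, and that the condition shows up in the pod status as type `PodScheduled` with status `False` and reason `Unschedulable`. The function name and the sample pods are hypothetical, and this is not Cluster Autoscaler's own code.

```python
from typing import Any


def is_unschedulable(pod: dict[str, Any]) -> bool:
    """Return True if the scheduler has marked the pod as unschedulable."""
    conditions = pod.get("status", {}).get("conditions") or []
    for cond in conditions:
        if (
            cond.get("type") == "PodScheduled"
            and cond.get("status") == "False"
            and cond.get("reason") == "Unschedulable"
        ):
            return True
    return False


# Hypothetical pods for illustration only.
pending_pod = {
    "metadata": {"name": "web-1"},
    "status": {
        "phase": "Pending",
        "conditions": [
            {"type": "PodScheduled", "status": "False", "reason": "Unschedulable"}
        ],
    },
}
running_pod = {
    "metadata": {"name": "web-0"},
    "status": {
        "phase": "Running",
        "conditions": [{"type": "PodScheduled", "status": "True"}],
    },
}

pods = [pending_pod, running_pod]
unschedulable = [p["metadata"]["name"] for p in pods if is_unschedulable(p)]
print(unschedulable)
```

If the resulting list is non-empty, that is the situation in which Cluster Autoscaler would consider adding a node.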

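Node scale-in (scale-down)

The conditions for scale-in are explained in the passage below.

In short, a node becomes a candidate for removal when the sum of CPU requests and the sum of memory requests of all pods running on it are below 50% of the node's allocatable capacity. DaemonSet pods and mirror pods are counted by default, which can be changed with the --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization flags.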

Every 10 seconds (configurable by --scan-interval flag), if no scale-up is needed, Cluster Autoscaler checks which nodes are unneeded. A node is considered for removal when all below conditions hold:

The sum of cpu requests and sum of memory requests of all pods running on this node (DaemonSet pods and Mirror pods are included by default but this is configurable with --ignore-daemonsets-utilization and --ignore-mirror-pods-utilization flags) are smaller than 50% of the node’s allocatable. (Before 1.1.0, node capacity was used instead of allocatable.) Utilization threshold can be configured using --scale-down-utilization-threshold flag.
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-down-work
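
To make the utilization condition concrete, here is a minimal Python sketch of the arithmetic. It is an illustration under assumptions, not the autoscaler's implementation: the node size and pod requests are made-up numbers, the helper names are hypothetical, and the strict "smaller than" comparison simply follows the wording of the FAQ. Requiring both CPU and memory to be below the threshold is expressed here as comparing the larger of the two ratios.

```python
from dataclasses import dataclass

GIB = 1024**3


@dataclass
class PodRequests:
    cpu_millis: int      # CPU request in millicores (e.g. 500m -> 500)
    memory_bytes: int    # memory request in bytes


def node_utilization(pods, alloc_cpu_millis, alloc_memory_bytes):
    """Return (cpu_ratio, memory_ratio) of summed requests over allocatable."""
    cpu = sum(p.cpu_millis for p in pods) / alloc_cpu_millis
    mem = sum(p.memory_bytes for p in pods) / alloc_memory_bytes
    return cpu, mem


def below_threshold(cpu, mem, threshold=0.5):
    """Both ratios must be below the threshold, i.e. the larger one is."""
    return max(cpu, mem) < threshold


# Hypothetical node: 3920m allocatable CPU, 15 GiB allocatable memory.
pods = [
    PodRequests(1000, 2 * GIB),
    PodRequests(1000, 1 * GIB),
    PodRequests(160, 1 * GIB),
]
cpu, mem = node_utilization(pods, 3920, 15 * GIB)

print(f"cpu={cpu:.2%} mem={mem:.2%}")
print("candidate at 0.5:", below_threshold(cpu, mem, 0.5))
print("candidate at 0.6:", below_threshold(cpu, mem, 0.6))
```

With these sample numbers the CPU ratio is a little over 55%, so the node is not a candidate at the default 0.5 but becomes one at 0.6, even though memory is well under both.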

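This threshold can be changed with the option below, so raising it should get a bit more work out of each node.

--scale-down-utilization-threshold

Of course, it needs to be tuned while monitoring node resources such as CPU and memory, but 50% looks like a fairly generous setting.

Adjusting the --scale-down-utilization-threshold setting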

Let's edit the configuration file and add the option to the startup command.

I'll set the threshold to 60%.

# Excerpt
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --expander=least-waste
  - --scale-down-utilization-threshold=0.6

Until now, the Cluster Autoscaler log contained messages like the following, which say that the node cannot be scaled in because its utilization is above the scale-in threshold:

Node ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal unremovable: cpu requested (55.12% of allocatable) is above the scale-down utilization threshold

After the change, the node in question was scaled in as well, which also saved on cost.
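
When choosing a new threshold, it can help to pull the reported percentages out of such log lines and see which nodes a given value would unblock. The following Python sketch does that for the message format shown above. The regular expression is written against that single sample line, so treating it as the general log format is an assumption, and the function names are hypothetical.

```python
import re

# Sample line copied from the log excerpt above.
LOG_LINE = (
    "Node ip-xxx-xxx-xxx-xxx.ap-northeast-1.compute.internal unremovable: "
    "cpu requested (55.12% of allocatable) is above the scale-down "
    "utilization threshold"
)

# Pattern derived from the sample line only (an assumption about the format).
PATTERN = re.compile(
    r"Node (?P<node>\S+) unremovable: (?P<resource>\w+) requested "
    r"\((?P<pct>[\d.]+)% of allocatable\)"
)


def parse_unremovable(line):
    """Return (node, resource, percent) or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("node"), m.group("resource"), float(m.group("pct"))


def would_unblock(percent, threshold):
    """True if a node at this utilization is below the given threshold."""
    return percent < threshold * 100


node, resource, pct = parse_unremovable(LOG_LINE)
for threshold in (0.5, 0.6):
    print(node, resource, pct, threshold, would_unblock(pct, threshold))
```

For the 55.12% node in the log, a threshold of 0.5 leaves it blocked while 0.6 lets it be considered for scale-in, which matches what happened after the change.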